facial feature movement instead. It is most likely
that the subjects being studied will move at some
stage in the applied museum environment.
2. Having more than one face on screen can also cause
a problem. Since the dynamic systems are designed
around single motion vectors, multiple vectors on
screen at once could produce erroneous results,
although it would be possible to modify the code to
alleviate these potential errors. In the museum
environment, it is most likely that people will gather
in crowds to watch the robot in operation.
3. In a real-world environment, a specific constant
frame rate cannot be guaranteed. The severity of the
impact depends on the sensitivity of the algorithm in
question, but results are most likely to be affected in
some way. In addition, image noise in a dark location
may be misinterpreted as movement.
4. Another problem with dynamic systems is their use
of relative positions. Normally, if the subject begins
with a neutral facial expression and then moves to
another emotion over the sequence, the database of
movements can detect these changes and determine
the new emotion correctly. If the sequence of images
starts mid-emotion, however, the database would most
likely fail to find the appropriate emotional result.
5. Although the motion detection algorithm itself is
fairly simple and quick to run, complex post-processing
algorithms are needed to match the detected motion
against the facial expressions stored in a database.
Due to the aforementioned issues, our research
focuses on the static approach to emotion recognition,
taking and analysing one image at a time. The pattern
matching problem can be solved with pre-existing
techniques such as neural networks, template matching,
statistical methods and sequential processing. Each of
these methods requires only a small amount of time to
process a single iteration of its algorithm, but given
the amount of data involved in image processing tasks,
the total runtime runs into seconds. Neural networks
and template matching require relatively small amounts
of code to execute a single iteration, which makes them
desirable for embedded systems. For static systems in
the closely related area of face finding, neural
networks make up the majority of the architectures used
(Sung and Poggio, 1998; Rowley et al., 1998;
Schneiderman, 2000).
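To illustrate why a single iteration is cheap while a
full image scan is not, the following sketch (with a
hypothetical 20x20 window size; not one of the networks
described later in this paper) shows one forward pass
of a single neuron in C:

    #include <math.h>

    #define WIN 20  /* hypothetical window size */

    /* One iteration: weighted sum of a WIN x WIN pixel window,
     * squashed through a sigmoid activation. */
    float neuron_iteration(const float window[WIN][WIN],
                           const float weights[WIN][WIN],
                           float bias)
    {
        float sum = bias;
        for (int y = 0; y < WIN; y++)
            for (int x = 0; x < WIN; x++)
                sum += weights[y][x] * window[y][x];
        return 1.0f / (1.0f + expf(-sum));  /* sigmoid */
    }

A single call costs only WIN x WIN multiply-accumulates,
but sliding the window across every position of even a
640 x 480 image requires roughly 300,000 such calls per
neuron, which is where the seconds-long runtimes come
from.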
1.1 Summary
Static facial emotion detection is closely related to
the problem of finding a face in an image, to which
neural networks are generally applied. Most methods
rely on finding the whole face, while others are
parts-based, attempting to locate individual facial
elements. For both practical and academic purposes, our
application of the FPGA neural network focuses on a
single-image, parts-based system, an approach which has
received relatively little attention. Several new
techniques for reducing the search area will be
introduced. The Facial Action Coding System (FACS)
database (Ekman, 1978) will be used as a base
reference, and also for determining which Action Units
(AUs) of facial muscle movement make up an emotion.
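As a sketch of what such an AU-to-emotion mapping looks
like, the C fragment below encodes the commonly cited
FACS prototypes for three basic emotions (the AU
combinations shown are illustrative textbook examples,
not necessarily the exact sets used in our system):

    /* Illustrative FACS prototypes (after Ekman). */
    struct emotion_rule {
        const char *emotion;
        int aus[8];            /* required Action Units, 0-terminated */
    };

    static const struct emotion_rule rules[] = {
        { "happiness", { 6, 12, 0 } },       /* cheek raiser + lip corner puller */
        { "sadness",   { 1, 4, 15, 0 } },    /* inner brow raiser + brow lowerer
                                                + lip corner depressor */
        { "surprise",  { 1, 2, 5, 26, 0 } }, /* brow raisers + upper lid raiser
                                                + jaw drop */
    };

    /* Returns 1 if every AU required by the rule appears in the
     * list of detected AUs. */
    static int rule_matches(const struct emotion_rule *r,
                            const int *detected, int n)
    {
        for (int i = 0; r->aus[i]; i++) {
            int found = 0;
            for (int j = 0; j < n; j++)
                if (detected[j] == r->aus[i]) { found = 1; break; }
            if (!found) return 0;
        }
        return 1;
    }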
2 METHODOLOGY
Our studies into FACS reveal that there are three main
areas of the face responsible for the determination of
emotion: the eyes and eyebrows, the nose and expression
lines, and the mouth, each of which slightly overlaps
the others. Three neural networks are employed, one for
the recognition of each area, and each in turn outputs
several AU classifications. As each area can exhibit a
combination of several different AUs, a fourth network
is used for overall emotion recognition, taking the
outputs of the three previous networks as its inputs.
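The sketch below shows how the four networks fit
together (the function name eval_network, its stub
body, and the output counts are all illustrative
placeholders for the FPGA forward-pass routines):

    #define N_EYE_AUS   8   /* illustrative AU output counts per region */
    #define N_NOSE_AUS  4
    #define N_MOUTH_AUS 8
    #define N_EMOTIONS  6

    /* Placeholder: in the real system this would dispatch to the
     * trained FPGA neural network routines. */
    static void eval_network(const float *in, int n_in,
                             float *out, int n_out)
    {
        (void)in; (void)n_in;
        for (int i = 0; i < n_out; i++)
            out[i] = 0.0f;  /* stub */
    }

    static void recognise_emotion(const float *eye,   int eye_len,
                                  const float *nose,  int nose_len,
                                  const float *mouth, int mouth_len,
                                  float emotion[N_EMOTIONS])
    {
        float aus[N_EYE_AUS + N_NOSE_AUS + N_MOUTH_AUS];

        /* Three region networks, each emitting AU classifier outputs. */
        eval_network(eye,   eye_len,   aus, N_EYE_AUS);
        eval_network(nose,  nose_len,  aus + N_EYE_AUS, N_NOSE_AUS);
        eval_network(mouth, mouth_len,
                     aus + N_EYE_AUS + N_NOSE_AUS, N_MOUTH_AUS);

        /* The fourth network maps the combined AU outputs to an emotion. */
        eval_network(aus, N_EYE_AUS + N_NOSE_AUS + N_MOUTH_AUS,
                     emotion, N_EMOTIONS);
    }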
While pattern recognition methods such as neural
networks and template matching can be trained to
recognise parts of the face using raw pixel data, it is
often advantageous to use some form of feature
extraction. Not only can feature extraction provide
information which cannot be learnt from pixel data
alone, but non-pixel data also helps to solve the
problem of differing illumination between photographs
(Viola and Jones, 2002). However, both template
matching and neural network methods are fairly slow,
especially when they must be executed thousands of
times, once for each pixel position in the image.
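The rectangle features of Viola and Jones are a
well-known example of such non-pixel features: they are
computed over an integral image, so any rectangle sum
costs four lookups regardless of its size, and the
difference between two adjacent, equal-area rectangles
cancels out uniform brightness offsets. A minimal
sketch, assuming an 8-bit greyscale image stored
row-major:

    #include <stdint.h>

    /* Integral image: ii[y*w+x] = sum of all pixels above and to
     * the left of (x, y), inclusive. */
    void integral_image(const uint8_t *img, uint32_t *ii, int w, int h)
    {
        for (int y = 0; y < h; y++) {
            uint32_t row = 0;
            for (int x = 0; x < w; x++) {
                row += img[y * w + x];
                ii[y * w + x] = row + (y > 0 ? ii[(y - 1) * w + x] : 0);
            }
        }
    }

    /* Sum of pixels in the rectangle [x0,x1] x [y0,y1], inclusive,
     * from just four lookups. */
    uint32_t rect_sum(const uint32_t *ii, int w,
                      int x0, int y0, int x1, int y1)
    {
        uint32_t a = (x0 > 0 && y0 > 0) ? ii[(y0 - 1) * w + (x0 - 1)] : 0;
        uint32_t b = (y0 > 0) ? ii[(y0 - 1) * w + x1] : 0;
        uint32_t c = (x0 > 0) ? ii[y1 * w + (x0 - 1)] : 0;
        return ii[y1 * w + x1] + a - b - c;
    }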
In order to accelerate recognition speeds, a series of
existing algorithms is used in conjunction with
modified and new ideas, alongside our FPGA neural
network routines. Figure 1 shows the architecture of
the system, which places simple processing (with the
most data) at the start and more complex processing at
each subsequent stage, each operating on progressively
less data. The algorithm includes aspects of Hue
Saturation Luminance (HSL) colour conversion, skin
detection, neural networks, and the theory of linear
perspective.
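As a concrete example of the cheapest first stage, the
sketch below converts an RGB pixel to HSL and applies a
simple per-pixel skin test (the threshold values are
illustrative placeholders, not the tuned values used in
the system):

    #include <math.h>

    /* RGB components in [0,1]; returns 1 if the pixel passes a
     * simple HSL skin threshold. */
    static int is_skin(float r, float g, float b)
    {
        float max = fmaxf(r, fmaxf(g, b));
        float min = fminf(r, fminf(g, b));
        float l = (max + min) / 2.0f;   /* luminance */
        float d = max - min;
        float h, s;

        if (d == 0.0f)
            return 0;                   /* grey pixel: hue undefined */

        /* saturation */
        s = (l > 0.5f) ? d / (2.0f - max - min) : d / (max + min);

        /* hue in degrees */
        if (max == r)      h = 60.0f * fmodf((g - b) / d, 6.0f);
        else if (max == g) h = 60.0f * ((b - r) / d + 2.0f);
        else               h = 60.0f * ((r - g) / d + 4.0f);
        if (h < 0.0f) h += 360.0f;

        /* Illustrative skin range: reddish hue, moderate saturation. */
        return h < 50.0f && s > 0.1f && s < 0.7f
            && l > 0.2f && l < 0.9f;
    }

Because this test touches each pixel only once with a
handful of arithmetic operations, it can discard large
non-skin regions before any of the more expensive
neural network stages run.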
In addition to these methods, it is also possible to
take the theory of proxemics into account. Proxemics is
the psychological study of human interaction and the
use of space. Numerous experiments have shown that
people, depending on the context of their conversation,
stand a certain distance apart from each other whilst
talking. The distance involved depends on the type of
conversation and how well the participants know each
other.