tected without taking into account the gesture, with no
limitation in the number of gestures being detected, as
long as wrists were not concealed. Additionally, there
was no need for an initialization step. Figure 1 shows
some detection examples.
Figure 1: Five positive results showing both wrist detections
(dark rectangle) and complete hands (white rectangle).
2 SAMPLE SET CONSTRUCTION
Cascade classifiers need both positive and negative
samples for their training. Negative samples should
be as numerous and dissimilar as possible, and they
should not contain the target object, i.e., hands. Positive
samples should show as many different target
object views as possible, in different conditions. In
relation to frontal faces, for example, it is advisable
to maximize three variables: subjects, facial gestures
and light source directions. There are many databases
where these conditions are met (orl, ) (Georghiades
et al., 2001). Hands, however, add a fourth variable:
background. While a face sample need not show
anything other than a face, it is not possible
to show a hand without showing pieces of background
between extended fingers. Any gesture apart from a
fist will show what lies behind the hand, and thus it
becomes part of the positive sample.
A set of positive hand samples created for a cascade
classifier should therefore also include as many different
backgrounds as possible, allowing the classifier to infer
what the real target object is. Although there are some
hand databases available, it is difficult to find the four
requisites together. In (Triesch, 2000), for example,
there are around 15 different backgrounds and 25 sub-
jects, but only 9 gestures. Thus, this set is suitable for
the training of a single classifier for each gesture, as in
(Kölsch and Turk, 2004b), but not for a more general
one. Given the difficulty of meeting
the constraints that a hand dataset imposes during its
creation, we propose a method that tries to reduce
them to only two requirements: different light sources
and different gestures. Thus, hand gestures are per-
formed by a single person under different light con-
ditions, filmed against a background having a single
color or a relatively narrow range of colors (chroma
key or color keying). Then, each sample may undergo
a slight geometrical transformation (stretching and/or
rotation), and finally a random image, chosen from a
large set of images not showing hands, replaces
the chroma-key background. Using the chroma key technique,
it is possible to create a set of positive hand sam-
ples large enough to train and test a cascade classi-
fier, avoiding the troubles of gathering many different
people and backgrounds, thus saving time.
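
The substitution step described above can be illustrated with a minimal sketch. It assumes a green screen and RGB frames stored as NumPy arrays; the function names (`chroma_mask`, `substitute_background`, `random_positive`) and the threshold values are hypothetical choices for illustration and are not taken from the method as implemented.

```python
import numpy as np

def chroma_mask(rgb, min_green=100, dominance=1.3):
    """Return a boolean mask that is True where a pixel looks like green screen.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    # A pixel is treated as chroma when green is bright and clearly dominant.
    return (g >= min_green) & (g > dominance * r) & (g > dominance * b)

def substitute_background(rgb, background, mask=None):
    """Replace chroma pixels of `rgb` with the co-located pixels of `background`."""
    if rgb.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    if mask is None:
        mask = chroma_mask(rgb)
    out = rgb.copy()
    out[mask] = background[mask]  # hand pixels are left untouched
    return out

def random_positive(rgb, backgrounds, rng):
    """Composite the hand over one background drawn at random from `backgrounds`."""
    idx = int(rng.integers(len(backgrounds)))
    return substitute_background(rgb, backgrounds[idx])
```

Calling `random_positive(frame, backgrounds, np.random.default_rng())` several times on the same frame yields several positive samples that share the hand but differ in background. A real recording would likely need a more tolerant key (for example, in a hue-based color space) to cope with shadows and color spill on the screen.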
For testing purposes, we assembled a recording set
with a green chroma screen, six spotlights and a
single ambient light with fixed intensity and color.
The spotlights were placed on the actor's left, on his
right and in front of him: three at hand
height, and three around two meters above hand
height, as seen in Figure 2. This way, it was possible
to record six sequences with different light conditions.
A total of 288 images were extracted
from the recorded sequences, showing more than 20
gestures under each light setup.
Figure 2: Side and front view of the recording set. White
circles represent light sources.
Then, the chroma key was substituted four times in
each image with a random background image, taken
from a group of more than 5000, generating four new
samples, each of which was also mirrored. No geometric
transformation was applied to the original images. Figure 3
shows the main process steps for a given gesture.
The final dataset consists of a total of
2304 20x20 grey level images. Figure 4 shows sam-
ples from our previous dataset and samples from the
new one.
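
The dataset size follows from the figures given above: 288 source images, four background substitutions each, and one mirrored copy per substituted image. The sketch below reproduces that count and shows one possible way to turn each composite into a pair of 20x20 grey-level samples. The luma weights, the nearest-neighbour downsampling, the horizontal direction of the mirroring and all function names are assumptions made for illustration only.

```python
import numpy as np

def to_grey(rgb):
    """Convert an RGB image to 8-bit grey levels using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return np.rint(rgb.astype(np.float32) @ weights).astype(np.uint8)

def shrink(grey, size=20):
    """Downsample to size x size by nearest-neighbour sampling (a simplification)."""
    rows = np.arange(size) * grey.shape[0] // size
    cols = np.arange(size) * grey.shape[1] // size
    return grey[np.ix_(rows, cols)]

def expand(composites):
    """Yield each composite and its horizontal mirror as 20x20 grey-level samples."""
    for rgb in composites:
        sample = shrink(to_grey(rgb))
        yield sample
        yield sample[:, ::-1]  # mirrored copy (assumed horizontal)

# Counts taken from the text: 288 images, 4 backgrounds each, each also mirrored.
SOURCE_IMAGES = 288
BACKGROUNDS_PER_IMAGE = 4
MIRROR_FACTOR = 2
total = SOURCE_IMAGES * BACKGROUNDS_PER_IMAGE * MIRROR_FACTOR  # 2304
```

Mirroring doubles the set at no recording cost and, under the assumption that it is horizontal, makes a hand recorded on one side stand in for the opposite hand.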
VISAPP 2006 - IMAGE UNDERSTANDING