COMPARING FACES: A COMPUTATIONAL AND PERCEPTUAL

STUDY

L. Brodo

, M. Bicego

, G. Brelstaff

, A. Lagorio

, M. Tistarelli

, E. Grosso

DSL - University of Sassari - via Tempio, 9 - 07100 Sassari - Italy

DEIR - University of Sassari - via Torre Tonda, 34 - 07100 Sassari - Italy

CRS4, Parco Scientiﬁco e Tecnologico POLARIS - 09100 Pula (CA) - Italy

DAP - University of Sassari - piazza Duomo, 6 - 07041 Alghero (SS) - Italy

Keywords:

Face differences, log polar mapping, psychophysical evaluation, difference maps.

Abstract:

The problem of extracting distinctive parts from a face is addressed. Rather than examining a priori speciﬁed

features such as nose, eyes, mouth or others, the aim here is to extract from a face the most distinguishing or

dissimilar parts with respect to another given face, i.e. ﬁnding differences between faces. A computational

approach, based on log polar patch sampling and evaluation, has been compared with results obtained from a

newly designed perceptual test involving 45 people. The results of the comparison conﬁrm the potential of the

proposed computational method.

1 INTRODUCTION

Automatic face analysis is an active research area in

which interest has grown over recent years.One of the

most challenging and interesting issues in the auto-

mated analysis of images of faces is the detection of

“facial features”, intended as characteristic parts of

the face. Many approaches have been proposed for

the extraction of such facial features (see (Campadelli

and Lanzarotti, 2004) and references therein). Most

of these are devoted to the detection of a priori speci-

ﬁed features, such as the nose, eyes, mouth, eyebrows

or other, non anatomically referenced, ﬁducial points.

In practice, however, for face recognition and authen-

tication, it is necessary to consider additional features,

in particular those that precisely characterize a given

face. Rather than simply extracting standard patterns

to distinguish the face of subject “A” from that of sub-

ject “B”, it is important to extract from the face-image

of subject “A” as many as possible of the features that

differ signiﬁcantly from, or are not even present in,

face “B”.

Recently, an area-based approach aimed at “ﬁnding

differences” between faces was proposed (a prelimi-

nary version appeared in (Bicego et al., 2005)). It ex-

tracts from one face-image the most distinguishing, or

dissimilar, areas with respect to another face-image,

or to a population of faces. In particular, the pro-

posed algorithm extracts, from two face-images, a set

of sub-images centered at different locations within

each image. This process samples most of the face,

in a way similar to that adopted in patch-based im-

age classiﬁcation (Dorko and Schmid, 2003)) and im-

age characterization (Jojic et al., 2003). At each lo-

cation, data are sampled according to a “multi-scale”

regime in which image patches encode grey-scale pat-

tern at different spatial resolutions. A log polar map-

ping (Grosso and Tistarelli, 2000) has been adopted

for this purpose. The image patches thus extracted

constitute two data-set features, each characterizing a

single face. Next a classiﬁer is trained so as to best

distinguish between the two face-classes purely on

the basis of the grey-levels values of the pixels within

each patch. By identifying the loci of the patches in

the resultant classiﬁcation space the degree of “dis-

tinctiveness” can be assessed as the distance from the

trained hyperplane. Since the classiﬁer is trained to

separate patches of the ﬁrst face from patches of the

second, we hypothesize that the most important dif-

ferences between the two faces will be encoded by the

patches furthest from the separating hyperplane (i.e.

those that the classiﬁer weights highest). In (Bicego

et al., 2005) examples of the most important patches

were extracted and shown for several different im-

ages.

In this paper the computational method has been

enhanced and improved, particularly when computing

the difference between patches and when visualizing

the results. However, the main aim here is to inves-

tigate the question: are the differences extracted and

assigned importance by our algorithm also judged im-

portant by human observers? In section 3 we present

an initial perceptual study that provides some prelim-

inary evidence that human observers may indeed con-

188

Brodo L., Bicego M., Brelstaff G., Lagorio A., Tistarelli M. and Grosso E. (2006).

COMPARING FACES: A COMPUTATIONAL AND PERCEPTUAL STUDY.

In Proceedings of the First International Conference on Computer Vision Theory and Applications, pages 188-191

DOI: 10.5220/0001367701880191

 SciTePress

sider important the patch locations identiﬁed by our

algorithm.

2 COMPUTATIONAL

EVALUATION

The idea here is to determine those areas of a given

face-image that differ most from any other face-

image. In brief, we achieve this by projecting into a

feature space two sets of image patches, sampled from

two face-images, and scoring the patches by their mu-

tual distances. The most distant features found in the

feature space are likely to be the more distinctive face

areas for the speciﬁc faces.

In detail, our algorithm extracts, from the two

face-images, a set of patches centered upon speciﬁc

points—where these points are uniformly distributed

across the face-image such that most, or all, of the

face area is covered by the sampling process. Each

patch maps on to a coordinate in a multi-dimensional

feature space by virtue of its sample grey-levels. We

adopt simple feature formulation approach by con-

sidering the sample grey-levels in each patch as or-

dered coordinate values as resulting from the log polar

sampling—in practice deﬁning a 400-D space. The

patches from one face-image will tend to form their

own cluster in this space: the other face-image ought

to form a different cluster. Our extracted patches thus

constitute two data-clusters of location-independent

features, each of which characterize one of the two

faces. Based on the distribution of those patches

within feature space, degrees of distinctiveness of

each face patch can be formulated according to its

distance from the projection of the other data-cluster.

Patches with the highest weights are then interpreted

as encoding the most important differences between

the two face-images.

Since face recognition involves information appar-

ent at a various spatial resolutions a multi-scale analy-

sis should provide an advantage over any single scale

analysis A multi-scale analysis could repeat the clas-

siﬁcation procedure with patches of various sizes, and

then judiciously combine the results to obtain the im-

portant differences. We adopt a variant multi-scale

approach designed to avoid two notable pitfalls: (a)

blind analysis - whereby information revealed at one

scale is not usefully available at other scales, and (b)

repeated image processing - which adds to the overall

computational expense.

Our solution is to sample the face-image using

patches derived from a log-polar mapping (Grosso

and Tistarelli, 2000). This mapping can also be moti-

vated by its resemblance to the distribution of the re-

ceptive ﬁelds in the human retina, where the sampling

resolution is higher at the central fovea and decreases

toward the periphery. The resultant sampling process

ensures that each patch contains both low scale (ﬁne

resolution) and contextual (low resolution) informa-

tion.

Facial features are then selected in two steps:

1. two distinct sets of patches are extracted from the

two face-images at speciﬁc image locations;

2. for each of the two faces, the patches are ranked

according to their distances from the other cluster

in feature space.

2.1 Multi-scale Face Sampling

The patches sampled from the original face-image are

centered at a pre-speciﬁed set of points. To ensure

translation-independence the locations of these points

ideally ought to be selected randomly (Bicego et al.,

2005). Yet since that would require very many sam-

pling points, in order to completely cover the two

faces we adopt here a regular sampling regime for

which the faces have been manually registered before-

hand.

The face-image is re-sampled at each point follow-

ing a log-polar scheme (Grosso and Tistarelli, 2000)

so that the resulting set of patches represents a local

space-variant remapping of the original image, cen-

tered at that point. Analytically, the log-polar scheme

describes the mapping postulated to occur between

the retina (retinal plane (r, q)) and the visual cor-

tex log-polar or cortical plane (x, h)). The size of

the “receptive ﬁelds” follows a linear increment mov-

ing from the central region (fovea) outwards into the

periphery. Due to lack of space, full details of the

log-polar transformation are not given here, interested

reader are referred to (Grosso and Tistarelli, 2000).

The set of log-polar image patches, sampled from

each face-image, are vectorized, and represent the

face in feature space.

2.2 Determining Face Differences

As stated early, the “distinctiveness” of each patch is

related to its locus in feature space with respect to the

other face. In particular, those patches of the ﬁrst face,

found near loci of the second face in feature space

are less distinctive since they may easily be confused

with the patches of that second face. On the other

hand, patches located near the ﬁrst face-set should be

usefully representative.

More formally, let S

the set of patches of face

A and B, respectively. The weight of distinctiveness

ω of a patch p

(x, y), centered at the position (x, y)

in the face A is computed as:

ω(p

(x, y)) = d(p

(x, y),S

) (1)

COMPARING FACES: A COMPUTATIONAL AND PERCEPTUAL STUDY

189

where

d(p

(x, y),S

) = min



)

(x, y),p



))

(2)

where d

is some distance metric between feature

vectors. Here, for simplicity, we adopt a Euclidean

metric. It might be worthwhile investigating other

metrics, such as those due to transforming feature

space via say a Principal Component Analysis or Lin-

ear Discriminant Analysis.

We measure both the difference of face A to face

B and vice versa since the two distinctiveness results

can and do vary. In each case, the results are projected

back on to the spatial image of the face using a paral-

lel ﬂood-ﬁlling technique. This renders a “difference”

map in which the grey level of each pixel indicates the

level of distinctiveness.

3 PERCEPTUAL STUDY

We describe here an informal study of how human

observers report seeing difference between faces with

the aim of comparing the result obtained with that of

our algorithm. This is in anticipation of an objective

psychophysical investigation that we intend to present

in the future.

A perceptual experiment was implemented in Mat-

lab on a laptop PC. Human subjects, with normal,

or corrected vision, were selected for a set of tri-

als. In all 45 university students (7 male, 38 female)

were tested. Each trial began (after 2 seconds of

mid-gray screen) by presenting a stimulus consisting

of two monochromatic face-images side-by-side on a

mid-gray background. After a ﬁxed time-interval the

stimulus was replaced by a single cartoon-image (of

roughly the same size) of a “general face” or mock-up

upon which the subject was then asked to navigate and

click using the PC’s mouse. The task, explained be-

forehand via a training example, was to indicate any

part of the face where they had seen an important dif-

ference during the stimulus presentation. After 5 sec-

onds the mock-up was replaced by the mid-gray back-

ground ready for the next trial to be initiated. A set of

trials consisted of repeating this procedure until each

of six chosen face-pairs had been presented to the sub-

ject. The results were later reviewed by overlaying

the clicked points on the mock-up and displaying it

on screen or paper—e.g see Fig. 2.

Viewing parameters were ﬁxed as follows: view-

ing distance: 50 cm; image height: 9 cm (10 deg,

310 pixels); image width: 6 cm (7 deg, 200 pixels);

image-pair separation: 4 cm (5 deg); stimulus width:

14 cm (19 deg); full contrast screen setting under in-

door ambient illumination.

In the training example—Fig. 2(Exp. 1)—the two

Figure 1: Top line: stimulus faces. Bottom line: results of

perceptual study increasing the time interval: 0.5, 1, 2, 4 s.

The displayed maps are accumulative, i.e. each is the sum

of all the previous map plus the current one.

images were identical except for the artiﬁcial super-

position of a easily seen dark spot on one cheek. In

the trials image pairs presented two different persons,

except in one case where the same person with, and

without, facial make-up and earrings was employed.

Stimulus presentation time ought to allow the ob-

server to have time at least to scan both faces. Since,

it was initially unclear what interval might sufﬁce we

repeated each set of trials four times on each occasion

doubling presentation time—i.e. 0.5, 1, 2 and 4 s.

Note, learning effect might thus contaminate the re-

sults of the later set of trials, and so they are avoided in

the next section. For short intervals one might expect

a concentration on speciﬁc location (featural process),

while for long intervals the tendency may be to con-

vey the attention to the overall face (see (Collishaw

and Hole, 2000)). This is indicated the mock-ups in

the lower parts of Fig. 1, which incrementally shows

the face areas clicked upon during the four trial sets

(the i-th image accumulates the results from the ﬁrst

i trials). It seems that as the interval increases, ob-

servers focus their attention upon conﬁgurational as-

pects of the face such as cheeks, the upper lip zone

and between the eyes.

4 EXPERIMENTAL

COMPARISON

Here we graphically compare difference maps pro-

duced by our algorithm with the mock-up results from

our perceptual experiment. To this end the algorithm

was run on the same face-image pairs presented in the

experiment—as follows. Each log-polar patch had a

resolution of 23 eccentricity steps and 35 receptive

ﬁelds at each, with a 10% overlap along the two direc-

tions. The images were cropped in order to eliminate

the inﬂuence of the background, often omitting the

ears. Here we employ only the mock-ups that com-

bine the 0.5 and 1 s time intervals in order to reduce

any learning effect contamination. Fig. 2 compares

three results. In general, these graphical comparisons

are indicative of the high, if not perfect, degree of

agreement found between the algorithmic-produced

VISAPP 2006 - IMAGE UNDERSTANDING

190

Experiment 1

Experiment 2

Experiment 3

Figure 2: Comparison of computational and perceptual re-

sults: For each experiment, the ﬁrst row contains the origi-

nal images, the second the results of computational and per-

ceptual experiments.

difference maps and the human-generated mock-ups.

Below these comparisons are discussed in turn.

• Comparison 1. This is the training example in-

tended to test the system in artiﬁcially controlled

conditions. The two images are identical, ex-

cept for the black dot attached to the cheek. The

perceptual mock-up result indicates most dots in

the correct zone, with a small spatially random

component. The algorithm also maps the correct

zone as the most important difference—via a light

area. Both difference maps are shown: (a) that of

the difference between the face-with-dot from the

face-without-dot, and (b) vice versa. In the lat-

ter case, the maximum difference appears darker—

presumably because that spot zone is actually more

similar to other parts of the face-with-dot image.

Otherwise the two maps have similar structure, as

might be expected. In this case, the algorithmic and

mock-up results are in good overall agreement.

• Comparison 2. This is a more realistic example

involving two different faces. The perceptual re-

sult, the mock-up, indicates the majority of dots lo-

cated on the mouth, the eyes and the nose, while a

few points are found around the face contour. The

algorithm is in agreement, especially highlighting

the eye zones where the glasses appear to be fun-

damental in discriminating between the two faces.

Neither the forehead nor the cheeks appear to be

important. The two difference maps are structurally

similar, except at the upper part of the right eye of

the second face. This part, greatly highlighted by

the algorithm indicates the right eyebrow, which

appears very different from the one on the left (sim-

ilar to those of the ﬁrst face). Thus the algorithm is

revealing a high level of details here.

• Comparison 3. This realistic example, compares

a male face to a female face. The mock-up re-

sulting from the perceptual trials distinguishes the

eyes, the eyebrows, the mouth, the nose and the

hair junction. The eyes and eyebrows are clearly

identiﬁed by the algorithm, whereas less emphasis

has been given to the mouth and to the nose. The

hair junction has been detected only in one face,

conﬁrming that it is worth while to compute both

difference maps. The large erroneous difference in

the bottom left corner of the ﬁrst face, is probably

due to the neck that is present in the face-image. It

is interesting to note that the algorithm is able to

discover the inclination of eye-line of the ﬁrst face

and represents it in the difference map.

5 CONCLUSIONS

Here we addressed the problem of ﬁnding differences

between faces from two complementary angles: al-

gorithmic analysis and perceptual testing. In several

experiments the difference maps computed showed a

high degree of similarity to those made apparent by

the perceptual testing.

REFERENCES

Bicego, M., Grosso, E., and Tistarelli, M. (2005). On ﬁnd-

ing differences between faces. In Kanade, T., Jain, A.,

and Ratha, N., editors, Audio- and Video-based Bio-

metric Person Authentication, volume LNCS 3546,

pages 329–338. Springer.

Campadelli, P. and Lanzarotti, R. (2004). Fiducial point lo-

calization in color images of face foregrounds. Image

and Vision Computing, 22:863–872.

Collishaw, S. M. and Hole, G. J. (2000). Featural and con-

ﬁgurational processes in the recognition of faces of

different familiarity. Perception, 29:893–909.

Dorko, G. and Schmid, C. (2003). Selection of scale-

invariant parts for object class recognition. In Proc.

Int. Conf. on Computer Vision, volume 1, pages 634–

640.

Grosso, E. and Tistarelli, M. (2000). Log-polar stereo for

anthropomorphic robots. In Proc. European Confer-

ence on Computer Vision, volume 1, pages 299–313.

Springer-Verlag.

Jojic, N., Frey, B., and A.Kannan (2003). Epitomic analy-

sis of appearance and shape. In Proc. Int. Conf. on

Computer Vision, volume 1, pages 34–41.

COMPARING FACES: A COMPUTATIONAL AND PERCEPTUAL STUDY

191