tion application. The system runs at approximately 14 frames/second on a standard P4 1.2 GHz computer equipped with a webcam at a resolution of 320x240.
The first image of Figure 7 corresponds to the situation where a successful match has been obtained; in this case, the camera frame has been matched to the third reference image (shown in Figure 6). All the feature points used to assess the validity of this match are shown in light gray (the short line attached to each point corresponds to the displacement between the two matched images). These points are also used to compute the tensor relation between the current view and two of the reference images (here, reference images 1 and 2). Labels can then be displayed and point to the correct location by virtue of the tensor transfer operation. As the camera moves and new images are captured, the matched points are tracked, the tensor is updated, and the labels are transferred again. Figure 7 shows other images of the sequence in which the labels indeed always point at the right location.
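To make the transfer step concrete, the sketch below implements the standard trifocal point-transfer formula, x''^k = x^i l'_j T_i^{jk}, which maps a point into the current view from its correspondences in the two reference views. The tensor layout, the choice of auxiliary line, and the function name are our own illustrative assumptions, not the paper's implementation.

import numpy as np

def transfer_point(T, x1, x2):
    # T  : 3x3x3 trifocal tensor linking reference views 1, 2 and the
    #      current view, indexed as T[i, j, k]
    # x1 : homogeneous point in reference view 1
    # x2 : homogeneous point in reference view 2
    # Pick a line l2 through x2 (a vertical line here; in practice the
    # line through x2 perpendicular to the epipolar line of x1 is a
    # safer, non-degenerate choice).
    l2 = np.array([1.0, 0.0, -x2[0] / x2[2]])
    # Point transfer: x3^k = sum over i, j of x1^i * l2_j * T[i, j, k]
    x3 = np.einsum('i,j,ijk->k', x1, l2, T)
    return x3 / x3[2]

A label anchored at (x1, x2) in the reference views would then be drawn at transfer_point(T, x1, x2) in the current frame.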
In normal operation, only the labels are shown, not the feature points used for matching. This is illustrated in Figures 4 and 5 where, this time, the six reference images of Figure 5 are used to annotate the video sequence of Figure 4.
8 CONCLUSION
An augmented reality system has been presented in which a video sequence can be augmented with textual annotations. The augmentation follows a two-step process. First, each incoming frame of the captured video is matched against a set of reference images. Currently, a simple but efficient matching scheme is used in which points are correlated by comparing their respective neighborhoods. Because each putative match is validated geometrically, most false matches are eliminated. However, we are currently investigating other matching strategies that would make the matching process more robust to perspective variation and changes in illumination.
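The paper does not give implementation details for this step; the following is a minimal sketch of such a pipeline using OpenCV, where corners are matched by normalized cross-correlation of their neighborhoods and putative matches are validated geometrically through RANSAC estimation of the fundamental matrix. The detector, window size, thresholds, and function name are illustrative assumptions.

import cv2
import numpy as np

def match_and_validate(ref_gray, cur_gray, win=11, ncc_min=0.8):
    # Detect corners in both images (parameters are illustrative).
    r = win // 2
    ref_pts = cv2.goodFeaturesToTrack(ref_gray, 300, 0.01, 10).reshape(-1, 2)
    cur_pts = cv2.goodFeaturesToTrack(cur_gray, 300, 0.01, 10).reshape(-1, 2)

    def patch(img, p):
        x, y = int(p[0]), int(p[1])
        if r <= x < img.shape[1] - r and r <= y < img.shape[0] - r:
            return img[y - r:y + r + 1, x - r:x + r + 1]
        return None

    # Correlate each reference corner with every candidate corner by
    # normalized cross-correlation of their neighborhoods.
    src, dst = [], []
    for p in ref_pts:
        tpl = patch(ref_gray, p)
        if tpl is None:
            continue
        best, score = None, ncc_min
        for q in cur_pts:
            nb = patch(cur_gray, q)
            if nb is None:
                continue
            ncc = cv2.matchTemplate(nb, tpl, cv2.TM_CCOEFF_NORMED)[0, 0]
            if ncc > score:
                best, score = q, ncc
        if best is not None:
            src.append(p)
            dst.append(best)

    src, dst = np.float32(src), np.float32(dst)
    if len(src) < 8:
        return None  # not enough putative matches
    # Geometric validation: keep only the matches consistent with the
    # epipolar geometry recovered by RANSAC.
    F, mask = cv2.findFundamentalMat(src, dst, cv2.FM_RANSAC, 1.0, 0.99)
    if mask is None:
        return None
    keep = mask.ravel() == 1
    return src[keep], dst[keep]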
The second step requires continuous estimation of the trifocal tensor relation. Having at hand a reliable set of matches associated with the reference images is key to this operation. Good tensor estimates can therefore be obtained quickly, making it possible to transfer the labels from the reference views, where they were inserted, to the current view. By tracking the points over time, the tensor estimate can be updated, resulting in stable label insertion.
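The paper does not detail the tracker; as a sketch under that caveat, the validated matches could be propagated frame to frame with pyramidal Lucas-Kanade optical flow before the tensor is re-estimated from the surviving tracks. The parameters and function name below are illustrative, not the paper's.

import cv2
import numpy as np

# Illustrative Lucas-Kanade parameters (not from the paper).
LK = dict(winSize=(15, 15), maxLevel=2,
          criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.03))

def track_matches(prev_gray, cur_gray, pts):
    # Propagate the matched points into the new frame, drop the points
    # the tracker loses, and return the survivors; the trifocal tensor
    # is then re-estimated from these before the labels are transferred.
    p0 = pts.reshape(-1, 1, 2).astype(np.float32)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None, **LK)
    ok = status.ravel() == 1
    return p0[ok].reshape(-1, 2), p1[ok].reshape(-1, 2)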
The main advantage of using projective entities (such as the trifocal tensor and the fundamental matrix) is that no calibration information is required. The system can thus easily accommodate different cameras, as well as zoom changes occurring during the augmentation process. Neither 3D pose information nor metric information about the scene is required.