One key assumption of the Costeira–Kanade
method is not often satisfied in practice in traffic se-
quences: that objects move independently. The mo-
tion matrix M
k
for each object k is of the form:
M
k
=
⎡
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎣
i
t
1,k
t
x1
.
.
.
.
.
.
i
t
n,k
t
xn
j
t
1,k
t
y1
.
.
.
.
.
.
j
t
n,k
t
yn
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎦
(12)
Our road sequences are around just one or two sec-
onds. During these short time intervals, vehicles do
not rotate by themselves but for lane changes and road
curves. However, it can hardly be appreciated due to
the usually large distance to the camera. Nevertheless,
there is always a continuous and oscillating relative
rotation with respect the camera, caused by the pitch
motion of the car to which it is attached. Since this ro-
tation is the same for all objects, I
k
=[i
1,k
... i
n,k
] and
J
k
=[j
1,k
... j
n,k
] are equal for each k = 1...p. Thus,
the rank of M is reduced to 3 plus the number of lin-
early independent translation vectors t
k
. At this point,
we introduce the following simplifying assumptions:
every vehicle appears, at least, as two blobs (fea-
tures), and in any sequence at least m = 4 features
are tracked. Consequently, the maximum number of
objects is p =
m
2
and, in theory, the rank of W lies
within the range
4 ≤ r ≤ min(m,3+
m
2
) (13)
Translation vectors t
k
contribute too to the motion
degeneracy. Trajectories along such short intervals
are almost always straight lines and relative vehicle
velocities mostly constant. Therefore translation vec-
tors tend also to be linearly dependent, being related
by a constant equal to the ratio of their speeds.
In sum, our trajectories are motion degenerate and
the rank of W is, in practice, almost always 4, some-
times a little bit greater. Hence, only values r = 4, 5,6
are usually worth to try.
Now, we turn to the problem of assessing each
possible value of r according to the blockiness of its
sorted interaction matrix Q
∗
. The bad news is that
even the noiseless interaction matrix Q
∗
is no more
block–diagonal. However, for the correct rank of W
and even in presence of noise, the interaction matrix
in our sequences is still quite block–diagonal and the
energy function of equation (5) can be again used
to find out possible block limits. Figure 1 shows an
example of rank r determination for the sequence 3.
Note that for r = 5 the block–diagonal aspect of the
computed Q
∗
is maximum. In fact, r = 5 provides
the right motion segmentation: the second 3×3 block
corresponds to three lamp posts, and each of the other
blocks is made of a couple of features which belong
to the same vehicle.
Let be l
1
,l
2
...l
r
the list of computed possible block
limits taken as those columns of Q
∗
r
= V
∗
r
V
∗t
r
where
ε
shows a sharp increase. According to equation (5), a
normalized blockiness measure is defined as:
b(r)=
1
r
r
∑
k=2
sign(l
k
− l
k−1
− 1)(
ε
(l
k
) −
ε
(l
k−1
))
(14)
We select the right r as the one for which the
normalized energy within all the possible computed
blocks is maximum. However, blocks of size 1 fea-
ture are not taken into account, because for r = m, Q
∗
m
is perfectly diagonal with 1×1 blocks and we do not
want to interpret this situation as a case of maximum
blockiness. Figure 1 (top) shows an example of the
blockiness measure of Q
∗
r
for the possible values of r
according to equation (13).
Regarding strategy 1, the blocks for the best r yield
the motion segmentation. As for strategy 2, we use
the value of the blockiness measure to make a deci-
sion concerning the existence of more than one block.
A simple threshold (set at 0.7) can differentiate the
two cases in our experiments. In figure 2 the obtained
motion segmentation is shown.
4.2 3D Velocity Computation
The Han-Kanade (Han and Kanade, 2004) algorithm
allows to recover the velocity ratios of the features,
i.e. v
k
/v
l
for each k,l = 1 ... m. The veloc-
ity values are useful not only to group the features
in objects but to distinguish between the objects that
are approaching and the ones that are moving farther
away from the camera, since their velocities have op-
posite sign.
The problem is that we would need to know which
of the points are static in order to obtain the real ve-
locity ratios, as it can be seen in section 3. But that
is, precisely, our final goal: to find out which of the
points correspond to vehicles and which to static fea-
tures (such as lamp posts or traffic signals).
The method described in section 3 solves the case
when the scene is 3D and the velocities of moving ob-
jects span a 3D space (rank(
ˆ
W)=6). Unfortunately,
degenerate cases can arise due to degenerate shape
and/or motion. Specifically, in the traffic sequences
we work with, the most common situation is the ex-
istence of one or multiple moving objects in the same
direction or perhaps opposite sense. As before, the
static structure of the objects is 3D, but 3D velocities
v
k
span only a one dimensional space, since they dif-
fer in module or sign but not in direction.
Therefore, we have to deal with a degenerate case
MOTION SEGMENTATION THROUGH FACTORIZATION - Application to Night Driving Assistance
273