was a correlation between the detectors' false positives/negatives,
or the signals did not share much mutual
information, in which case the peaks were caused
by noise.
3.2.3 Efficiency Analysis
In order to gauge how efficient the gradient ascent
approach is compared to the full-search, we calcu-
lated the number of iterations required to converge to
the correct foreground-detection thresholds for each
of 200 frames in a multimodal (thermal infrared and
visible spectrum) video sequence. We used a median
background image for both the visible and infrared
sequences. We initialised our simplex at 10 different
scales. In only two tests (out of 2000) did it converge
to a sub-optimal solution. This occurred at the two
smallest scales. We found that larger scales, in gen-
eral, required more iterations to converge, but were
more likely to converge to a more precise solution.
The average number of iterations to convergence was
26.72. When compared to a full-search, using 256
thresholds for each signal, the Simplex method is over
2400 times faster.
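As a rough check on this figure, the ratio can be worked out directly. This assumes that one full-search step (evaluating one pair of thresholds) is compared like-for-like with one simplex iteration; the cost of a single iteration is not stated here.

```latex
\[
\frac{256 \times 256}{26.72} = \frac{65536}{26.72} \approx 2453
\]
```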
4 EXPERIMENTAL RESULTS
4.1 Foreground Detection
To test our algorithm, we used it to choose thresh-
olds for foreground detection for multi-modal (ther-
mal infrared and visible spectrum) video data. The
surveillance-type video was captured using a joint
IR-Visible camera rig (Ó Conaire et al., 2005). We
used the non-parametric background model described
in (Elgammal et al., 2000) to separately model the
colour and thermal background of the scene. For each
pixel, the models each return the probability that the
pixel belongs to the background. Since we used a lin-
ear quantisation of the threshold space, we got bet-
ter resolution by using the negative logarithm of the
probability. Specifically, we used min(−log(p), 255)
in the foreground detection map for each pixel, where
p is the background probability. This spread out the
detection values (similar to histogram equalisation),
so that they were not all clumped into one bin.
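The mapping from background probability to detection value can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `detection_map`, the use of the natural logarithm, and the small `eps` guard against log(0) are all assumptions, since the text does not specify them.

```python
import numpy as np

def detection_map(p_background, eps=1e-300):
    """Map background probabilities to detection values min(-log(p), 255).

    Assumptions (not stated in the text): natural logarithm, and
    probabilities clipped to [eps, 1] so that p = 0 does not produce inf.
    """
    p = np.clip(np.asarray(p_background, dtype=np.float64), eps, 1.0)
    # Low background probability gives a large detection value, capped at 255.
    return np.minimum(-np.log(p), 255.0)
```

The values are left as floats here; rounding them to the 256 linearly spaced threshold levels would be a separate step.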
Our tests were run on three multi-modal sequences
of approximately 850 frames each. Two were daytime
scenes and one was captured at night. In order to eval-
uate our approach to thresholding, we compare the
thresholds produced by our method to those produced
by Kapur thresholding (Kapur et al., 1985). Kapur
et al. also used an information theoretic approach to
thresholding. Using the signal’s histogram, their ap-
proach was to explain positive and negative detections
as two different signals and choose the threshold that
would maximise the sum of the two-class entropies.
In a comparison of thresholding methods (Rosin and
Ioannidis, 2003), Kapur thresholding was determined
to have the best all-round performance. The results of
our experiments are shown in figure 2.
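For reference, the Kapur criterion described above (maximising the sum of the two class entropies) can be sketched as below. This is an illustrative version written from that description, not the code used in the comparison; the function name `kapur_threshold` and the convention that bins 0..t form one class are assumptions.

```python
import numpy as np

def kapur_threshold(hist):
    """Return the bin index t that maximises the sum of the two class entropies.

    Bins 0..t are treated as one class and bins t+1..end as the other
    (an assumed convention).
    """
    p = np.asarray(hist, dtype=np.float64)
    p = p / p.sum()  # normalise the histogram to a probability distribution
    best_t, best_h = 0, -np.inf
    for t in range(len(p) - 1):
        w0 = p[: t + 1].sum()
        w1 = p[t + 1 :].sum()
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # one class is empty, so its entropy is undefined
        q0 = p[: t + 1] / w0  # within-class distributions
        q1 = p[t + 1 :] / w1
        h0 = -np.sum(q0[q0 > 0] * np.log(q0[q0 > 0]))
        h1 = -np.sum(q1[q1 > 0] * np.log(q1[q1 > 0]))
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t
```

Note that this criterion looks only at the histogram of a single signal, whereas the MI approach relies on agreement between the two modalities.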
In the daytime scenes, there is strong mutual infor-
mation and the results are good. The Kapur thresholds
behave in exactly the opposite way to our approach.
While the Kapur threshold is very stable in the visible
spectrum, the MI threshold varies significantly. Conversely,
the Kapur threshold is very unstable
in the infrared spectrum, whereas the MI threshold is very stable.
Our method seems to perform counter-intuitively,
since the thermal infrared images are far noisier than
the visible spectrum. However, if one imagines two
well separated distributions, as is the case when there
is a high signal-to-noise ratio, then there is a wide
range of thresholds that would give very good perfor-
mance. In a noisy signal, the noise and signal are not
as well separated, so there is only a very narrow band
of thresholds that give the correct separation. This is
why our method has a very stable threshold for the
infrared images, as there is only a very narrow range
of values where the infrared agrees with the visible
spectrum. The visible spectrum threshold, on the other
hand, can vary a lot without causing any performance
degradation, since the noise is so low.
In the night-time scene, there is very little mutual
information between the visible and infrared foreground
maps. Pedestrians are practically undetectable
in visible spectrum images. This leads to a low value
at the MI surface peak and poor thresholds for both
modalities. The MI value itself can be used as a quality
measure to determine the reliability of the thresholds
returned. However, the mutual information is dependent
on how much foreground is present, so we
considered a more robust quality measure
that takes the foreground size into account. If we
compute $f$, defined as the fraction of all pixels that
both maps agree is foreground, then the highest possible
MI value is $M_{\max} = -f \log(f) - (1-f) \log(1-f)$.
By dividing the obtained MI score by $M_{\max}$,
we obtain a quality (or reliability) measure of the returned
thresholds. This quality score was computed
for all sequences and is shown in figure 2(d).
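The quality measure can be sketched as follows for two binary foreground maps. This is a minimal illustration under stated assumptions: the function names are hypothetical, natural logarithms are used throughout (the ratio does not depend on the base as long as both terms use the same one), and a score of 0 is returned when the maximum is zero, a case the text does not address.

```python
import numpy as np

def binary_entropy(f):
    """Entropy -f log(f) - (1-f) log(1-f), taken as 0 at f = 0 or f = 1."""
    if f <= 0.0 or f >= 1.0:
        return 0.0
    return float(-f * np.log(f) - (1.0 - f) * np.log(1.0 - f))

def mutual_information(a, b):
    """Mutual information (in nats) between two boolean maps of equal shape."""
    a = np.asarray(a, dtype=bool).ravel()
    b = np.asarray(b, dtype=bool).ravel()
    # 2x2 joint distribution over (a, b): rows index a, columns index b.
    joint = np.array([[np.mean(~a & ~b), np.mean(~a & b)],
                      [np.mean(a & ~b), np.mean(a & b)]])
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a, shape (2, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b, shape (1, 2)
    nz = joint > 0  # skip empty cells, which contribute nothing
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa * pb)[nz])))

def threshold_quality(a, b):
    """MI score divided by the maximum computed from the agreed foreground fraction f."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    f = float(np.mean(a & b))  # fraction of pixels both maps mark as foreground
    m_max = binary_entropy(f)
    if m_max == 0.0:
        return 0.0  # assumed convention: no agreed foreground, no usable score
    return mutual_information(a, b) / m_max
```

With two identical maps the score evaluates to 1, and with statistically independent maps it evaluates to 0.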
Future work will involve determining how to cater
for scenarios where the threshold quality score is low.
This scenario could mean that one or both signals are
performing very poorly (such as the visible spectrum
in night-time scenes), or that there is no mutual information
to utilise (such as when there are no objects or
people in the scene). One approach could be to revert
to using a single-band thresholding method for each
signal (such as Kapur). Another approach might be to
use the motion information in each of the modalities.
DETECTION THRESHOLDING USING MUTUAL INFORMATION