1999). Further performance measures can be added
as required. Results are presented primarily in
tabular form, with other representations, such as
box plots (Oja, 1999), optionally available. As with
the other stages, further methods of presenting the
data can be integrated if needed. All test results can
be written automatically to a SQL database during
the test, which allows data analysis with powerful
external tools.
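As an illustration, the following MATLAB sketch shows how per-run results might be summarized in tabular form and as a box plot. The algorithm names and the sirRuns matrix are placeholders, not the testsuite's actual interface.

    % Minimal sketch: summarize per-run SIR values for several algorithms.
    % sirRuns is a placeholder (100 runs x 3 algorithms); the real values
    % would come from the testsuite's result stage.
    algos   = {'FastICA', 'JADE', 'EFICA'};
    sirRuns = 15 + 5*randn(100, numel(algos));

    % tabular summary: mean and standard deviation per algorithm
    fprintf('%-10s %8s %8s\n', 'algorithm', 'mean', 'std');
    for k = 1:numel(algos)
        fprintf('%-10s %8.2f %8.2f\n', algos{k}, mean(sirRuns(:,k)), std(sirRuns(:,k)));
    end

    % optional box plot (requires the Statistics Toolbox)
    boxplot(sirRuns, 'Labels', algos);
    ylabel('SIR [dB]');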
Test Engine
The test engine controls all settings and parameters
passed to the modules during the test cycle. This helps
to find optimal settings and parameters in a systematic
fashion. Several steps in determining the algorithms'
best performance are automated, including the pro-
gression of noise intensity and sample reduction.
Noise progression is done by successively varying the
signal-to-noise ratio (SNR) and the percentage of
noise in the mixture up to a maximum. Sample re-
duction is done by gradually decreasing the size of the
window used by the algorithms. These sweeps show how
the algorithms' performance is affected by random noise
and by the amount of data available, respectively. Noise
progression and sample reduction are mutually exclu-
sive. Automated sweeps over parameter values are also
integrated into the test engine.
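The following MATLAB sketch illustrates the two automated sweeps described above. Here runAlgorithm and sirOfGain are hypothetical stand-ins for the testsuite's module interfaces, and A and S denote a mixing matrix and source signals prepared earlier; none of these names come from the paper itself.

    % Sketch of the noise progression and sample reduction sweeps.
    % runAlgorithm and sirOfGain are hypothetical placeholders.
    X = A * S;                                    % noise-free mixtures
    results.noise  = [];
    results.window = [];

    for p = 0.05:0.05:0.50                        % growing fraction of added noise
        Xn = X + p * std(X(:)) * randn(size(X));  % additive white Gaussian noise
        W  = runAlgorithm(Xn);                    % unmixing matrix from the algorithm under test
        results.noise(end+1) = sirOfGain(W * A);
    end

    for T = [5000 2500 1000 500 250]              % sample reduction: shrink the data window
        W = runAlgorithm(X(:, 1:T));
        results.window(end+1) = sirOfGain(W * A);
    end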
As mentioned before, results can be stored in a SQL
database as well as in .mat files for further performance
analysis during the test cycle and between the steps.
This way, signal generation and modification need to be
performed only once, and the generated and prepared
data can be reused with different algorithms. This
feature has been implemented using the MATLAB
database toolbox.
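A minimal sketch of how such persistence could look with the MATLAB database toolbox is given below. The data source name 'bsstests', the table and column names, and the credentials are assumptions for illustration, not the paper's actual configuration.

    % Sketch of result persistence (assumed data source and table names).
    conn = database('bsstests', 'user', 'password');           % ODBC/JDBC connection
    cols = {'algorithm', 'dimension', 'sir_db'};
    fastinsert(conn, 'results', cols, {'FastICA', 10, 17.3});  % append one result row
    close(conn);

    % .mat files let the generated and prepared data be reused across algorithms
    save('prepared_data.mat', 'S', 'A', 'X');                  % sources, mixing matrix, mixtures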
Finally, the test engine generates a result report
showing the pertinent data. This includes options
to visualize the results, such as sample plots of the
source, mixed and unmixed data. Additionally, box
plots of the performance measures for the different
algorithms are supplied in order to improve the general
comparability of the algorithms.
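A possible form of the report's sample plots is sketched below; S, X and W are placeholders for the source signals, the mixtures and an estimated unmixing matrix, and the layout is only indicative.

    % Sketch of the sample plots in the result report (placeholder variables).
    subplot(3,1,1); plot(S(1,:));     title('source');
    subplot(3,1,2); plot(X(1,:));     title('mixture');
    subplot(3,1,3); plot(W(1,:) * X); title('recovered component');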
3 EVALUATION RESULTS
Using the developed testsuite, the following seven
algorithms have been tested: (I.) The FlexICA
algorithm, developed by S. Choi, A. Cichocki and S.
Amari (Cichocki, 2002), (II.) The EVD algorithm,
developed by P. Georgiev and A. Cichocki (Oja,
2001), (III.) The EVD24 algorithm, developed by
P. Georgiev and A. Cichocki (Oja, 2001), (IV.) The
FastICA algorithm, developed by J. Hurri, H. Gävert,
J. Särelä, and A. Hyvärinen (Oja, 2001), (V.) JADE
algorithm, see J.-F. Cardoso (Cardoso, 1999), (VI.)
CubICA algorithm, see T. Blaschke and L. Wiskott
(Blaschke, 2003), (VII.) EFICA algorithm, see Z.
Koldovsky and P. Tichavsky (Koldovsky, 2005).
Using the testsuite to systematically try different
settings and parameters for the algorithms, a general
observation was that high-pass filtering during the
pre-processing stage yielded better results than unfil-
tered or low-pass filtered data. Therefore, all datasets
were high-pass filtered before applying the respective
algorithms. All tests were performed on two types of
datasets: real world data and synthetically created
random non-negative source signals with T=5000
samples. The synthetic datasets were created using the
testsuite's integrated data generation module. All
experiments were conducted in a Monte Carlo fashion
with at least 100 independent runs using randomly
generated mixing matrices, and all of them were applied
to the four sub-problems. The figures given in the next
sections represent the mean values of the SIR over
these runs, which guarantees stable results. The results
the algorithms yielded are presented in the following
sections.
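The Monte Carlo procedure can be sketched as follows. The source model, the filter settings and the runAlgorithm placeholder are illustrative assumptions, and the SIR is computed here from the global gain matrix G = WA using a common definition that may differ in detail from the one used in the paper.

    % Sketch of one Monte Carlo evaluation (n sources, T samples per run).
    nRuns = 100; n = 10; T = 5000;
    sir = zeros(nRuns, 1);
    for r = 1:nRuns
        S = sign(randn(n, T)) .* abs(randn(n, T)).^1.5;  % placeholder super-Gaussian sources
        A = randn(n);                                    % random mixing matrix
        X = A * S;
        [b, a] = butter(4, 0.05, 'high');                % high-pass pre-filtering, as above
        X = filtfilt(b, a, X')';                         % zero-phase filtering along time
        W = runAlgorithm(X);                             % algorithm under test (hypothetical)
        G2 = (W * A).^2;                                 % squared global gain, permutation ignored
        sir(r) = mean(10*log10(diag(G2) ./ (sum(G2, 2) - diag(G2))));  % mean per-source SIR in dB
    end
    meanSIR = mean(sir);                                 % value reported in the figures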
Large scale problem
The first sub-problem deals with the algorithms' per-
formance for an increasing dimension of the dataset.
The datasets used for testing contain an equal number
of sub- and super-Gaussian sources, e.g. for dimension
6 the dataset contains 3 sources of each type. With
increasing dimension the SIR drops, as illustrated in
the upper part of figure 2 for synthetic data and in the
lower part of figure 2 for real world speech data. It can
be seen that the two EVD variants fare poorly compared
to FastICA and FlexICA; for synthetic data they are
basically unable to separate the mixtures. FastICA and
FlexICA, on the other hand, perform better: up to 20
sources can be separated without the SIR falling below
15 dB. For real world data the algorithms perform better
than for the synthetic sources, which can probably be
attributed to the super-Gaussian nature of the signals.
For the comparison of the second algorithm group, a
randomly generated proportion of sub-Gaussian and
super-Gaussian sources combined in one dataset was
used, e.g. for dimension 4 the dataset contains either
2 sources of each type or 1 source of the first and 3
sources of the other type. As expected, the SIR drops
with increasing dimension (see upper part of figure 3).
It is clearly apparent that FlexICA performs poorly
compared to the others. The best of these four
algorithms, EFICA, has a decreasing SIR, reaching the
15 dB mark for more than 64 dimensions.