are applied as long as the resulting subsets yield bet-
ter values of the criterion function than the previously
evaluated ones of the same dimension. In the top-
down counterpart, SBFS, an exclusion of a feature
is followed by a series of successive conditional in-
clusions if an improvement to the previous sets can
be made. The feature to be included into the current
feature set or excluded from it is always the one that
improves the set most or degrades the value of the cri-
terion function least (Pudil et al., 1994).
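For illustration, the floating search described above can be sketched as follows in Python. The function name, its signature and the generic criterion argument are assumptions made for this sketch only; they do not reproduce the implementation used in the study.

```python
def sffs(n_features, criterion, target_dim):
    """Sketch of Sequential Forward Floating Selection (SFFS).

    n_features -- number of candidate features, indexed 0..n_features-1
    criterion  -- function mapping a tuple of feature indices to a score
                  (higher is better), e.g. kNN classification accuracy
    target_dim -- dimension at which the search is stopped
    Returns a dict mapping each dimension to the best subset found for it.
    """
    current = tuple()
    best = {0: (tuple(), float("-inf"))}   # dimension -> (best subset, score)

    while len(current) < target_dim:
        # Inclusion: add the feature that improves the criterion most.
        remaining = [f for f in range(n_features) if f not in current]
        add = max(remaining, key=lambda f: criterion(current + (f,)))
        current = current + (add,)
        score = criterion(current)
        if len(current) not in best or score > best[len(current)][1]:
            best[len(current)] = (current, score)

        # Conditional exclusion: keep removing the least significant feature
        # as long as the reduced set beats the best set of that dimension.
        while len(current) > 2:
            candidates = [tuple(g for g in current if g != f) for f in current]
            reduced = max(candidates, key=criterion)
            red_score = criterion(reduced)
            if red_score > best[len(reduced)][1]:
                current = reduced
                best[len(reduced)] = (reduced, red_score)
            else:
                break

    return {k: sub for k, (sub, _) in best.items() if k > 0}
```

The backward counterpart, SBFS, proceeds analogously but starts from the full feature set, with exclusion as the main step and conditional inclusions as the backtracking step.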
The n Best Features Selection method simply selects the n features that are individually best in the sense of maximizing the criterion function. It is the simplest alternative for feature subset selection, but also the least reliable, since features that are individually good may be strongly correlated with each other. Therefore, it was only used for comparison in this study.
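A minimal sketch of this ranking, using the same hypothetical criterion function as above:

```python
def n_best_features(n_features, criterion, n):
    """Rank the features by their individual criterion values and return
    the n highest-ranking ones; correlations between the selected
    features are ignored, which is why the method is unreliable."""
    ranked = sorted(range(n_features),
                    key=lambda f: criterion((f,)),
                    reverse=True)
    return tuple(ranked[:n])
```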
The best possible way to design the process identification system would have been to select the feature set and the classification method simultaneously. However, because the kNN classifier had previously been found suitable for the process identification task (Haapalainen et al., 2005), the effectiveness of the different feature subsets produced by the feature selection methods was evaluated using the classification accuracy of the 3NN classifier as the criterion function. In addition, the k-nearest neighbour method has also been used to measure the goodness of a feature set in (Jain and Zongker, 1997) and (Kudo and Sklansky, 2000).
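As an illustration, such a criterion function could be realized along the following lines, here using scikit-learn's KNeighborsClassifier; the helper name and the choice of library are assumptions of this sketch, not the implementation of the study.

```python
from sklearn.neighbors import KNeighborsClassifier

def make_knn_criterion(X_train, y_train, X_test, y_test, k=3):
    """Build a criterion function J(S): the classification accuracy of a
    kNN classifier restricted to the feature columns in subset S.
    X_train and X_test are NumPy arrays of shape (samples, features)."""
    def criterion(subset):
        cols = list(subset)
        if not cols:                       # an empty subset cannot be evaluated
            return float("-inf")
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train[:, cols], y_train)
        return clf.score(X_test[:, cols], y_test)   # fraction classified correctly
    return criterion
```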
The SFS, SBS and n Best Features methods were selected for this study because they are easy to apply and require relatively little computation time. Compared to the basic sequential feature selection methods, the main advantage of the floating methods is that the resulting feature sets of different dimensions are not necessarily nested, as they are with the SFS and SBS methods. This is because the floating methods are able to correct erroneous decisions made at earlier steps of the algorithm.
Therefore, these methods provide a close to optimal
solution to the problem of feature subset selection
(Pudil et al., 1994). Because of this characteristic,
they are also highly applicable to problems involving
nonmonotonic feature selection criterion functions,
which was the case in this study. In addition, even
though the floating feature selection methods are only
nearly optimal, they are much faster than the optimal
but computationally prohibitive Branch and Bound
algorithm (Narendra and Fukunaga, 1977).
In order to evaluate classification accuracy when
using different feature sets, the data were divided into
training and test data sets, which consisted of 2/3 and
1/3 of the data, respectively. The training data set was
used to train the 3NN classifier, and the test data set
was used to evaluate the classification accuracy.
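Put together, the evaluation setup could look roughly as follows; the data in this sketch are synthetic placeholders, and the helpers are the ones sketched in the previous sections.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 54))       # placeholder data: 240 welds, 54 features
y = rng.integers(0, 2, size=240)     # placeholder process labels

# Two thirds of the data for training, one third for testing, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

criterion = make_knn_criterion(X_train, y_train, X_test, y_test, k=3)
subsets = sffs(X.shape[1], criterion, target_dim=10)
```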
5 RESULTS
A search was made for the feature subsets that maximize the classification accuracy of the 3NN classifier. The feature selection methods were applied to both the original and a normalized feature set.
The latter was formed by normalizing the feature val-
ues of the original feature set to have an average of
zero and a standard deviation of one. The results of
the classification using feature subsets constructed by
the various feature selection methods are presented in
Tables 1 a) and b).
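The normalization described above amounts to standard z-scoring of each feature; a minimal sketch:

```python
import numpy as np

def normalize_features(X):
    """Scale each feature (column) of X to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # guard against constant features
    return (X - mean) / std
```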
Tables 1 a) and b) show the best classification accuracy obtained with the feature sets formed by each of the feature selection methods. With each method, feature subsets of every dimension between 1 and 54 (the dimension of the original feature set) were formed, classification was performed with each of these subsets, and the best classification accuracy obtained was recorded in the tables. The percentages in the middle row indicate the proportions of correctly classified processes, and the numbers in the bottom row give the dimension of the feature set used in classification. It should be noted that the best feature subsets produced by the different feature selection methods are composed of unequal numbers of features.
The classification results of feature subsets formed
from the unnormalized features are presented in Ta-
ble 1 a), and the results of the subsets constructed
from the normalized features are shown in Table 1
b). It can be seen that the best subsets selected from the normalized feature set are notably larger than those selected from the original feature set. However, the dimensions
and classification accuracies of the different feature
subsets are difficult to compare since only the sub-
sets yielding the best classification results were con-
sidered at this point. Only the backward methods,
SBS and SBFS, seem to yield better feature subsets
when applied to normalized data. Nevertheless, the
dimensions of these sets, 17 and 29, are much larger
than those of the subsets formed from the unnormal-
ized feature data, which are both of dimension 7. For
comparison, one can examine the classification accuracies obtained with smaller subsets of the normalized feature set formed with the backward methods. These results are presented in Table 2. It can
be seen that quite good classification results are also
obtained with the smaller feature sets. However, these results do not reach the level of the classification results of the subsets produced from the unnormalized feature set by the forward methods, SFS and SFFS.
From the point of view of this study, it was consid-
ered more important to find a moderately small fea-
ture set that yields excellent classification results than
to reduce the dimension of the feature set used to the
absolute minimum. It can be stated, however, that the
best classification results are obtained with small fea-