spectrogram to acquire a matrix of (32 × 12) and
saving this matrix for further use.
The memory needed by this algorithm is 32 × 12
integers per sample. Since these integers lie in the
range [0, 255], one byte suffices for each. Each
sample therefore takes 384 bytes, and storing n
samples takes 384 × n bytes, so memory grows
linearly with n, i.e. Θ(n).
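The arithmetic above can be checked with a minimal sketch. The names `template_bytes` and `template` are illustrative assumptions, and the all-zero byte buffer only stands in for a real stored matrix.

```python
# Storage estimate for stored samples, following the figures in the text.
ROWS, BANDS = 32, 12          # matrix shape given in the text
BYTES_PER_VALUE = 1           # values lie in [0, 255], so one byte each


def template_bytes(num_samples: int) -> int:
    """Total bytes needed to store num_samples matrices of ROWS x BANDS."""
    return ROWS * BANDS * BYTES_PER_VALUE * num_samples


# One matrix packed as raw bytes (all zeros here, purely illustrative).
template = bytearray(ROWS * BANDS)

print(len(template))          # 384
print(template_bytes(800))    # 307200
```

For the 800-sample database used in the experiments, this comes to roughly 300 KB.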
The time complexity of this algorithm can be
calculated by counting the comparisons made for
each sample. The number of layers is constant (6
layers in this case: 2^0 to 2^5), and in the presented
problem the 12 frequency bands are also constant.
The number of comparisons per sample is therefore
constant and the same for all samples, so comparing
an input against n stored samples takes Θ(n) time.
However, the sorting step takes O(n log n), which
makes the overall time complexity O(n log n); this is
still acceptable for a fast algorithm.
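The two cost terms can be illustrated with a small sketch. The scoring function below is a placeholder assumption (negative sum of absolute differences), not the comparison rule of the presented algorithm. The only property it shares with the text is that the work per stored sample is bounded by a constant, here LAYERS × BANDS values.

```python
LAYERS = 6    # layers 2^0 .. 2^5, as stated in the text
BANDS = 12    # frequency bands, as stated in the text
SIZE = LAYERS * BANDS


def score(query, template):
    """Placeholder similarity (an assumption): constant work per template."""
    return -sum(abs(q - t) for q, t in zip(query, template))


def rank(query, templates):
    """Score every template in O(n), then sort the scores in O(n log n)."""
    scored = [(score(query, t), idx) for idx, t in enumerate(templates)]
    scored.sort(reverse=True)          # best score first
    return [idx for _, idx in scored]


# Deterministic dummy templates; template i is shifted by 7 * i (mod 256).
templates = [[(7 * i + j) % 256 for j in range(SIZE)] for i in range(50)]
query = list(templates[7])             # identical to template 7

print(rank(query, templates)[0])       # 7
```

If only the best few answers are needed, a partial selection such as `heapq.nlargest` would avoid the full sort, but the sketch keeps the sort to mirror the analysis.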
The small amount of memory needed per sample
and the speed with which a word is recognized make
this algorithm well suited to voice commands on
mobile devices such as cell phones and PDAs.
Since each command consists of only one or two
words, and considering the results in the next
section, this simple algorithm appears efficient and
useful for voice command.
5 RESULTS
The presented algorithm has been implemented and
tested for a single speaker with 100 words. The
results have been compared with those of a widely
used speech recognition method, HMM. To measure
the robustness of the algorithm to noise, different
kinds of noise were added to the test data.
Table 1 shows the results.
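Adding noise at a chosen level can be sketched as follows. This assumes that the dB levels in Table 1 are signal-to-noise ratios and that the white noise is Gaussian; neither is stated in the text. The function names and the sine test signal are illustrative only.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return the signal plus Gaussian white noise scaled to the given SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Pick gain so that signal_power / (gain^2 * noise_power) = 10^(snr_db/10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
for snr in (20, 10, 0):
    noisy = add_white_noise(clean, snr, rng)
    print(snr, round(measured_snr_db(clean, noisy), 3))
```

Babble noise would be mixed the same way, with a recording of overlapping speech in place of the Gaussian samples.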
For these experiments, a database of samples was
generated containing about 8 different
pronunciations of each of 100 words, which add up
to about 800 samples. All samples except one per
word were introduced to the system. The held-out
samples were then presented to the system for
recognition. The entry called "Clean" in Table 1
refers to these results.
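The split described above can be sketched as follows. It assumes exactly 8 pronunciations per word and holds out the last one; the text says "about 8" and does not say which pronunciation was held out. The sample identifiers are hypothetical.

```python
WORDS = 100
PRONUNCIATIONS = 8    # "about 8" in the text; taken as exactly 8 here

# Hypothetical sample identifiers: (word index, pronunciation index).
samples = [(w, p) for w in range(WORDS) for p in range(PRONUNCIATIONS)]


def split_hold_one_out(samples, held_out=PRONUNCIATIONS - 1):
    """Reserve one pronunciation per word for testing; keep the rest."""
    train = [s for s in samples if s[1] != held_out]
    test = [s for s in samples if s[1] == held_out]
    return train, test


train, test = split_hold_one_out(samples)
print(len(samples), len(train), len(test))   # 800 700 100
```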
Afterwards, different amounts of two kinds of
noise, white noise and babble noise, were added to
the test data, which was then presented for
recognition again. The other entries of Table 1 show
these results. "First Answer" means that the first
recognized answer is the correct answer, and "Third
Answer" means that one of the first three answers is
the correct answer. The same data has been tested
with the HMM approach, and its results are included
for comparison.
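"First Answer" and "Third Answer" correspond to what is often called top-1 and top-3 accuracy. A minimal sketch of that measure is given below; the function name and the toy word lists are invented for illustration and are not taken from the experiments.

```python
def top_k_accuracy(ranked_answers, truths, k):
    """Fraction of tests whose correct word is among the first k answers."""
    hits = sum(1 for ranked, truth in zip(ranked_answers, truths)
               if truth in ranked[:k])
    return hits / len(truths)


# Toy data (invented for illustration only).
ranked_answers = [
    ["open", "close", "call"],
    ["call", "open", "close"],
    ["close", "call", "open"],
    ["stop", "open", "call"],
]
truths = ["open", "open", "open", "open"]

print(top_k_accuracy(ranked_answers, truths, 1))   # 0.25
print(top_k_accuracy(ranked_answers, truths, 3))   # 1.0
```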
Table 1: Experimental results.

Noise          Level    HMM      First Answer   Third Answer
Clean          -        100 %    98 %           99 %
White Noise    20 dB    99 %     91 %           96 %
White Noise    10 dB    74 %     90 %           96 %
White Noise    0 dB     4 %      84 %           91 %
Babble Noise   20 dB    98 %     98 %           99 %
Babble Noise   10 dB    92 %     92 %           95 %
Babble Noise   0 dB     39 %     44 %           72 %

Table 1 shows that while the accuracy of the HMM
algorithm drops sharply on noisy data, the presented
algorithm largely retains its accuracy even with
intense noise. The results also indicate that, because
of the smoothing property of averaging, this
algorithm has good resistance to white noise. Babble
noise, however, destroys the information in the
lower frequencies and can therefore reduce the
accuracy of this algorithm. For this reason, the three
lowest frequency bands of the spectrograms were
ignored to achieve the better results reported in
Table 1.
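Ignoring the lowest bands amounts to dropping columns from the stored matrix. The sketch below assumes the 32 × 12 matrix has one row per time slice and one column per frequency band, ordered from lowest to highest; the text does not state this layout, and `drop_low_bands` is an illustrative name.

```python
FRAMES, BANDS = 32, 12
SKIPPED_LOW_BANDS = 3   # the three lowest bands are ignored, per the text


def drop_low_bands(matrix, skip=SKIPPED_LOW_BANDS):
    """Return a copy of the matrix without its first `skip` band columns."""
    return [row[skip:] for row in matrix]


# Dummy 32 x 12 matrix whose entries encode their own band index.
matrix = [[band for band in range(BANDS)] for _ in range(FRAMES)]
trimmed = drop_low_bands(matrix)

print(len(trimmed), len(trimmed[0]))   # 32 9
print(trimmed[0][0])                   # 3
```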
ICINCO 2006 - SIGNAL PROCESSING, SYSTEMS MODELING AND CONTROL