cases of tests with MS or FS, which demonstrates the
high generalization capacity of the combined system.
In the case of LPC and MFC coefficients, the
combined-trained database is less effective: the
best results are obtained when the tests are made on
the same type of database as was used in the
training process.
The PRR variation follows the WRR variation,
as expected, because nothing was done specifically
to enhance PRR.
The results also improve clearly when the HMMs
model triphones rather than monophones.
FEATURES EXTRACTION AND TRAINING STRATEGIES IN CONTINUOUS SPEECH RECOGNITION FOR
ROMANIAN LANGUAGE