It is difficult to ensure that the position, orientation
and distance of the speaker’s face relative to the video
camera remain constant for every sample taken. Thus,
descriptors that are invariant to translation, rotation
and scale have to be used to represent the MHI for
accurate recognition of the consonants. The features
used to describe the MHI should also be insensitive
to small variation of mouth movement between dif-
ferent samples of the same consonants. This paper
adopts image moments as region-based features to
represent the approximation of the MHI. Image mo-
ments are chosen because they can be normalized to
achieve scale, translation and rotation invariance.
Before extracting the moment-based features, the SWT is
applied to the MHI to obtain a transform representation
of the MHI that is insensitive to small variations of
the mouth and lip movement.
2.2 Stationary Wavelet Transform (SWT)
The 2-D SWT is used for denoising and to minimize the
variations between different MHIs of the same consonant.
Although the classical discrete wavelet transform (DWT)
can serve this purpose, it is translation variant
(Mallat, 1998): a small shift of the image in the
spatial domain yields very different wavelet
coefficients. The SWT restores translation invariance
by omitting the downsampling step of the DWT, at the
cost of a redundant representation.
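The effect of downsampling on translation variance can
be illustrated on a 1-D signal. The following is a
minimal sketch, not the implementation used in this
paper: it assumes a Haar filter and circular boundary
handling, and the function names are hypothetical.

```python
import numpy as np

def haar_dwt_detail(x):
    """Level-1 Haar detail coefficients WITH downsampling (decimated DWT)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_swt_detail(x):
    """Level-1 Haar detail coefficients WITHOUT downsampling (undecimated),
    assuming circular (periodic) boundary handling."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / np.sqrt(2.0)

signal = np.array([0, 0, 4, 4, 0, 0, 0, 0], dtype=float)
shifted = np.roll(signal, 1)  # the same signal moved by one sample

# Decimated coefficients change completely after the one-sample shift.
dwt_a = haar_dwt_detail(signal)
dwt_b = haar_dwt_detail(shifted)

# Undecimated coefficients are simply a shifted copy of each other.
swt_a = haar_swt_detail(signal)
swt_b = haar_swt_detail(shifted)
swt_is_shifted_copy = np.allclose(np.roll(swt_a, 1), swt_b)
```

In this example the decimated detail coefficients of the
original signal are all zero while those of the shifted
signal are not, whereas the undecimated coefficients
move together with the signal. The undecimated output
has as many coefficients as the input, which is the
redundancy mentioned above.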
2-D SWT at level 1 is applied on the MHI to pro-
duce a spatial-frequency representation of the MHI.
SWT decomposition of the MHI generates four im-
ages, namely approximation (LL), horizontal detail
coefficients (LH), vertical detail coefficients (HL) and
diagonal detail coefficients (HH) through iterative fil-
tering using low pass filters H and high pass filters
G. The approximate image is a smoothed version of the
MHI and carries the most information among the four
images. The LH, HL and HH sub-images show the
fluctuations of the pixel intensity values in the
horizontal, vertical and diagonal directions,
respectively. The image moment features are computed
from the approximate sub-image.
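A level-1 2-D decomposition of this kind can be sketched
as follows. This is an illustration under stated
assumptions rather than the filter bank used in this
paper: Haar filters stand in for H and G, boundaries are
treated as circular, the random array is only a
stand-in for an MHI, and the assignment of the names LH
and HL to filtering directions follows one common
convention.

```python
import numpy as np

def swt2_haar_level1(img):
    """Level-1 2-D undecimated Haar transform (circular boundaries).
    Returns (LL, LH, HL, HH), each with the same shape as img."""
    img = np.asarray(img, dtype=float)

    def low(a, axis):   # low-pass filter H (Haar averaging)
        return (a + np.roll(a, -1, axis=axis)) / np.sqrt(2.0)

    def high(a, axis):  # high-pass filter G (Haar differencing)
        return (a - np.roll(a, -1, axis=axis)) / np.sqrt(2.0)

    lo_x = low(img, axis=1)    # filter along the horizontal direction
    hi_x = high(img, axis=1)
    LL = low(lo_x, axis=0)     # approximation: low-pass in both directions
    LH = high(lo_x, axis=0)    # assumed naming: low along x, high along y
    HL = low(hi_x, axis=0)     # assumed naming: high along x, low along y
    HH = high(hi_x, axis=0)    # high-pass in both directions
    return LL, LH, HL, HH

rng = np.random.default_rng(0)
mhi = rng.random((8, 8))       # hypothetical stand-in for an MHI
LL, LH, HL, HH = swt2_haar_level1(mhi)
```

Because no downsampling is performed, all four
sub-images keep the size of the input, and a constant
input produces zero detail coefficients.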
2.3 Moment-based Features
Image moments are low-dimensional descriptors of
image properties. Image moment features can be
normalized to achieve translation, rotation and scale
invariance (Mukundan and Ramakrishnan, 1998) and are
thus suitable as features to represent the
approximation of the MHI.
Geometric moments
Geometric moments are the projection of the image
function f(x, y) onto a set of monomial functions. The
regular geometric moments are not invariant to
rotation, translation or scaling.
Translation invariance of the features can be
achieved by placing the centroid of the image at the
origin of the coordinate system (x, y); this results in
the central moments. The central moments can be
further normalized to achieve scale invariance. The
normalized central moments are invariant to changes in
position and scale of the mouth within the MHI.
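The two normalization steps can be sketched as follows,
assuming the standard definitions in which the central
moment mu_pq is taken about the image centroid and the
normalized central moment is mu_pq divided by
mu_00^(1 + (p + q)/2). The function name and the test
images are hypothetical.

```python
import numpy as np

def normalized_central_moment(img, p, q):
    """Normalized central moment eta_pq of a 2-D intensity image."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()                      # zeroth-order geometric moment
    xc = (x * img).sum() / m00           # centroid coordinates
    yc = (y * img).sum() / m00
    # Central moment: centroid moved to the origin (translation invariance).
    mu_pq = ((x - xc) ** p * (y - yc) ** q * img).sum()
    # Division by a power of m00 removes the dependence on scale.
    return mu_pq / m00 ** (1 + (p + q) / 2.0)

blob = np.zeros((16, 16))
blob[3:7, 2:8] = 1.0                     # small rectangle
moved = np.zeros((16, 16))
moved[8:12, 7:13] = 1.0                  # same rectangle, translated
bigger = np.kron(blob, np.ones((2, 2)))  # same rectangle, twice the size

eta20 = normalized_central_moment(blob, 2, 0)
eta20_moved = normalized_central_moment(moved, 2, 0)
eta20_bigger = normalized_central_moment(bigger, 2, 0)
```

On a pixel grid the translation invariance is exact,
while the scale invariance holds only approximately
because of sampling.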
The normalized central moments can be derived up
to any order. In this paper, 49 normalized geometric
moments up to the 9th order are computed from
the MHI as one of the feature descriptors to represent
the different consonants. For the purpose of comparing
the different techniques, the total number of
moments is kept the same: the Zernike moment set
contains 49 moments, so the same number is used
for the geometric moments.
Hu moments
Hu (1962) introduced seven nonlinear combinations of
normalized central moments, known as absolute moment
invariants, that are invariant to translation, scale
and rotation of the input patterns.
The first six absolute moment invariants are used in
this approach as features to represent the approximate
image of the MHI for each consonant. The seventh
moment invariant is a skew invariant defined to
differentiate mirror images and is not used because it
is not required in this application.
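The first six invariants can be sketched from the
normalized central moments using the standard
expressions given by Hu (1962). The helper names and
the L-shaped test pattern are hypothetical, and the code
is an illustration rather than the implementation used
in this paper.

```python
import numpy as np

def eta(img, p, q):
    """Normalized central moment eta_pq (p: power of x, q: power of y)."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
    return mu / m00 ** (1 + (p + q) / 2.0)

def hu_first_six(img):
    """First six absolute moment invariants of Hu (1962)."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    n30, n03 = eta(img, 3, 0), eta(img, 0, 3)
    n21, n12 = eta(img, 2, 1), eta(img, 1, 2)
    a, b = n30 + n12, n21 + n03          # recurring third-order sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring third-order differences
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = c ** 2 + d ** 2
    phi4 = a ** 2 + b ** 2
    phi5 = c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2)
    phi6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6])

# Hypothetical asymmetric test pattern (an L shape) and a rotated copy.
pattern = np.zeros((12, 9))
pattern[2:10, 2:4] = 1.0
pattern[8:10, 2:8] = 1.0
hu_original = hu_first_six(pattern)
hu_rotated = hu_first_six(np.rot90(pattern))
```

A rotation by 90 degrees maps the pixel grid onto itself,
so the two feature vectors agree up to floating-point
error; for arbitrary angles the agreement is limited by
resampling.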
Zernike moments
Zernike moments are computed by projecting
the image function f(x, y) onto the orthogonal
Zernike polynomials. The main advantage of Zernike
moments is their simple rotational property: rotating
the image changes only the phase of each moment, not
its magnitude. Zernike moments are also independent
features due to the orthogonality of the Zernike
polynomials (Teague, 1980). This paper uses the
absolute values of the Zernike moments as
rotation-invariant features (Khotanzad and Hong, 1990)
of the SWT of the MHI. The 49 Zernike moments from
0th order up to 12th order are used as features to
represent the approximate image of the MHI for each
consonant.
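The count of 49 moments and the rotation invariance of
the magnitudes can be checked with a short sketch. It
assumes the standard Zernike definitions (order n,
repetition m with 0 <= m <= n and n - m even), a mapping
of the image onto the unit disk chosen here for
illustration, and a random array as a stand-in for the
approximate image; the function names are hypothetical
and a constant pixel-area factor is omitted.

```python
import numpy as np
from math import factorial

def zernike_indices(max_order):
    """All (n, m) with 0 <= m <= n <= max_order and n - m even."""
    return [(n, m) for n in range(max_order + 1)
            for m in range(n + 1) if (n - m) % 2 == 0]

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm evaluated at radius rho."""
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s)
                * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_magnitudes(img, max_order=12):
    """Magnitudes |A_nm| of the Zernike moments up to max_order."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    xn = (2 * x - (w - 1)) / (w - 1)   # assumed mapping to [-1, 1]
    yn = (2 * y - (h - 1)) / (h - 1)
    rho, theta = np.hypot(xn, yn), np.arctan2(yn, xn)
    inside = rho <= 1.0                # only pixels inside the unit disk
    feats = []
    for n, m in zernike_indices(max_order):
        v_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
        a_nm = (n + 1) / np.pi * np.sum(img[inside] * v_conj[inside])
        feats.append(abs(a_nm))        # the magnitude discards the phase
    return np.array(feats)

rng = np.random.default_rng(0)
approx = rng.random((16, 16))          # hypothetical approximate image
features = zernike_magnitudes(approx)
features_rot = zernike_magnitudes(np.rot90(approx))
```

Orders 0 to 12 yield exactly 49 (n, m) pairs, and the
magnitudes of the original and the rotated image agree
up to floating-point error because a 90 degree rotation
maps the square grid onto itself.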
2.4 Classification using Artificial Neural Network
Classification involves assigning new inputs to
one of a number of predefined discrete classes.
ICINCO 2006 - ROBOTICS AND AUTOMATION