SINGLE HAND GESTURE RECOGNITION BASED ON DWT AND DCT FEATURE
EXTRACTION AND NEURO-FUZZY CLASSIFIER
Kavitha Jaganathan
Faculty of Creative Industries
UTAR University
[email protected]
Dr. Lili Nur Liyana, Dr. Razali Yakob
School of Computer Science
University Putra Malaysia
[email protected]
Dr. M. Jaganathan
Faculty of Communication
Taylor’s University
[email protected]
ABSTRACT
Hand gestures in Bharatanatyam dance carry valuable information. Anyone who wants to become
proficient in this art form must learn the meaning of each hand gesture and practice it until it
closely matches the canonical form. In this paper, a combined feature extraction method based on
the Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) is proposed. A two-level
DWT decomposition is applied to each image of size 128 × 128. The two-dimensional DCT is then
applied to the chosen sub-band, and the DCT coefficients are converted to a vector. Finally, a
neuro-fuzzy classifier assigns each image to one of the given classes. A suitably sized set of
well-illuminated images for different applications has been created; image processing operations
such as rotation, scaling and translation can also be applied to the original database to provide
more options for further studies. The experimental results show that the proposed method performs
well on most single hand gestures. The dataset of single hand gestures in Bharatanatyam dance has
been successfully created and could also serve as a benchmark dataset. The proposed system
recognizes single hand gestures with an accuracy of 93%: 56 out of 60 images of single hand
gestures are correctly classified. This is because the identified parameters capture the right
signal, providing the 70 most useful features for classification and recognition.
Keywords: Discrete Wavelet Transform, Discrete Cosine Transform, neuro fuzzy classifier, scaling
coefficient, joint spatial, Adaptive neuro fuzzy inference system (ANFIS).
1. Introduction
The culture of India is rich and highly diverse. This paper implements a fusion of image
processing techniques with the aim of enabling the computer to assess the accuracy of hand
movements in Bharatanatyam dance. Bharatanatyam hand gestures fall into two categories: i)
Asamyukta Hasta (single hand gestures); and ii) Samyukta Hasta (double hand gestures). There are
28 Asamyukta Hasta and 24 Samyukta Hasta. These single hand gestures have remained relatively unchanged
over the years (Verma, 2009).
E-Proceeding of the International Conference on Social Science Research, ICSSR 2015
(e-ISBN 978-967-0792-04-0). 8 & 9 June 2015, Meliá Hotel Kuala Lumpur, Malaysia.
Organized by http://WorldConferences.net
Most previous dance gesture studies focus on the motion of other body parts or on the skeleton
structure for dance gesture recognition (Dong et al., 2006; Heryadi et al., 2012; Saha et al.,
2013(a)). Earlier works attempt to recognize gestures with classifiers that operate directly on a
single low-level feature such as color or texture. We therefore propose a combined feature
extraction method to fill this research gap. The main objective is to build a single hand gesture
recognition system based on contemporary feature extraction and pattern classification
techniques. The sub-objectives are as
follows:
a. To propose the combination of discrete wavelet transform and discrete cosine transform in
feature extraction phase.
b. To compare feature space descriptions and identify which is suitable for single hand gesture
recognition in Bharatanatyam dance.
c. To increase the performance of single hand gesture recognition in Bharatanatyam dance.
In addition, it is worth highlighting that the combination of the discrete wavelet transform and
the discrete cosine transform in the feature extraction phase increases the performance of the
overall recognition system. Section 2 reviews the work done so far in this area. Section 3
describes the proposed system and its architecture. Section 4 presents the experimental results
and discussion. Section 5 concludes and outlines future work.
2. Related Work
Computer vision provides innovative solutions to many computer-aided digital image processing
applications. One significant research area is human gesture recognition, whose applications
benefit the many institutions and individuals that employ them. This section presents related
works dealing with image processing techniques and classification models.
Table 2.1. Analysis on surveys and reviews of hand gesture recognition

| Author(s) | Year | Description |
| Rautaray and Agrawal | 2012 | An analysis of hand gesture recognition focusing on its main phases, framework and software platform |
| Shangeetha R. K., Valliammai V., Padmavathi S. | 2012 | An implementation of the distance transform for both hands; its robustness varies when the hands overlap |
| Corera and Krishnarajah | 2011 | An article on challenges in hand gesture recognition and its related applications |
| Wachs et al. | 2011 | A discussion of soft-computing-based methods for hand gesture recognition |
| Chaudhary et al. | 2011 | A review of facial movement and hand gesture recognition |
Table 2.2. Summary of static single hand gesture recognition approaches

| Authors | Year | Feature Extraction Technique | Application | Recognition Rate / Remarks |
| Saha et al. | 2013 | Boundary extraction using Sobel | Hand gesture recognition for Bharatanatyam dance | 85.1% |
| Mozarkar & Warnekar | 2013 | Hybrid saliency technique | Hand gesture recognition for Bharatanatyam dance | 85.29% |
| Feng and Yuan | 2013 | HoG feature extraction algorithm | Random hand gesture recognition | |
| Vieriu et al. | 2013 | Contour extraction | Random hand gesture recognition | High recognition rate in both bright and dark environments; 93.3% |
| Shangeetha et al. | 2012 | Distance transform | Indian Sign Language recognition | Recognition accuracy reduces when fingers are bent; 91% |
| Yun et al. | 2012 | Multi-feature fusion | Random hand gesture recognition | The algorithm is stable with small experimental error; 99.6% |
| Hariharan et al. | 2011 | Orientation filter | Hand gesture recognition for Bharatanatyam dance | |
| Ghosh and Ari | 2011 | Localized contour sequence | Human-computer interaction application | |
| Rajam and Balakrishnan | 2010 | Feature point extraction method | Indian Sign Language recognition | 98.125% |
3. Methodology
3.1 Proposed Framework
Figure 3.1. Framework of the proposed method
Figure 3.2. The collection of the dataset
Figure 3.1 shows the proposed framework. As can be seen, it is divided into three major phases: pre-processing, feature extraction and classification.
3.2 Dataset Collection
The images in Figure 3.2 were captured from ten different performers aged between 20 and 33
years, drawn from the Temple of Fine Arts Academy and UTAR. The dataset consists of 28 classes
with 20 single hand gesture images per class. The background of the images is standardized to a
white concrete wall, and all images were taken with an iPhone 4 with a 5-megapixel iSight camera.
3.3 Skin Color Detection
The steps of the skin color detection used in this paper are shown in Figure 3.3.
Figure 3.3 Skin Color Detection
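The paper does not spell out the thresholding rule behind Figure 3.3. As an illustrative sketch only, a common fixed-range YCbCr skin detector could look as follows; the Cb/Cr ranges used here are the classic Chai and Ngan thresholds, assumed rather than taken from the paper:

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean skin mask via fixed YCbCr thresholds.

    The ranges 77 <= Cb <= 127 and 133 <= Cr <= 173 are the widely
    used Chai and Ngan values, assumed here because the paper does
    not list its exact thresholds.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> Cb, Cr conversion
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```

The resulting binary mask marks candidate skin pixels against the white wall background and feeds the binary-object feature extraction described in Section 3.4.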
3.4 Feature Extraction
The input data is transformed into a reduced representation set of features, or feature vector
(Russel, 2013), when the input to an algorithm is too large to process and is suspected to be
highly redundant (much data, but not much information). The Discrete Wavelet Transform (DWT) and
Discrete Cosine Transform (DCT) are applied in the feature extraction process. The features
extracted are binary object features. To extract them, a separate binary image is created: each
pixel in the region of interest is assigned 1 and everything else 0. The projections are computed
by summing the pixels along the rows and columns of the image. The horizontal projection
$h_i(r)$ and vertical projection $v_i(c)$ of the binary image $b_i$ are defined in Equations 3.1
and 3.2 respectively.

$$h_i(r) = \sum_{c} b_i(r, c) \qquad (3.1)$$

$$v_i(c) = \sum_{r} b_i(r, c) \qquad (3.2)$$
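The row and column projections of a binary image can be computed directly with NumPy, as this small sketch shows (the toy 4 × 5 mask is illustrative only):

```python
import numpy as np

# Toy binary region-of-interest image: 1 inside the hand, 0 elsewhere.
b = np.zeros((4, 5), dtype=np.uint8)
b[1:3, 1:4] = 1

h = b.sum(axis=1)  # horizontal projection h_i(r): sum along each row
v = b.sum(axis=0)  # vertical projection v_i(c): sum along each column
```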
The two-dimensional Discrete Wavelet Transform (2D-DWT) produces as many coefficient values as
there are pixels in the original image. The two-dimensional Discrete Cosine Transform (2D-DCT) is
therefore applied as the next step, and the extracted coefficients are used to represent the
image for classification.
3.4.1 Discrete Wavelet Transform (DWT)
The Discrete Wavelet Transform (DWT) is a high-level feature extraction technique. The basic idea
of the DWT is to provide a time-frequency representation, transforming one representation of a
function into another. It performs simultaneous representation of an image at different
resolution levels, which is also known as multi-resolution analysis. The 2D-DWT represents an image
in term of a set of shifted and dilated wavelet functions, and scaling functions that form an
orthonormal basis for L2(R2). Given a J-scale DWT, an image x(s,t) of size N × N is decomposed as
in Equation 3.3, with the basis functions defined in Equation 3.4.
$$x(s,t) = \sum_{k,i=0}^{N_J-1} u_{J,k,i}\,\Phi^{LL}_{J,k,i}(s,t) \;+\; \sum_{B\in\mathcal{B}}\sum_{j=1}^{J}\sum_{k,i=0}^{N_j-1} w^{B}_{j,k,i}\,\Psi^{B}_{j,k,i}(s,t) \qquad (3.3)$$

$$\Phi^{LL}_{J,k,i}(s,t) \equiv 2^{-J/2}\,\Phi\!\left(2^{-J}s-k,\;2^{-J}t-i\right), \qquad \Psi^{B}_{j,k,i}(s,t) \equiv 2^{-j/2}\,\Psi^{B}\!\left(2^{-j}s-k,\;2^{-j}t-i\right), \quad B\in\mathcal{B}=\{LH,\,HL,\,HH\} \qquad (3.4)$$
LL3
HL3
LH3
HH3
HL2
HL1
LH2
HH2
LH1
HH1
Figure 3.4. Joint spatial and frequency
representation of three levels of 2D DWT
Figure 3.5 Output of DWT - two decompositions
L and H denote the low- and high-frequency bands respectively, and the labels 1, 2 and 3 denote
the decomposition level. LL, the upper-left quadrant, consists of the approximation coefficients.
HL and LH are the lower-left and upper-right bands respectively, with the rows and columns
filtered accordingly. HH, the lower-right quadrant, is derived analogously to the upper-left
quadrant but using the analysis high-pass filter belonging to the given wavelet. The images are
transformed into their respective coefficients, which separate the vertical, horizontal and
diagonal sub-bands.
The original image is first filtered with a low-pass filter (LPF) and a high-pass filter (HPF)
along each row. The images resulting from the LPF and HPF are denoted L1 and H1 respectively, and
are combined into A1 = [L1, H1]. A1 is down-sampled by 2 and then filtered with the LPF and HPF
along each column, producing L2 and H2, combined as A2 = [L2, H2]. A2 is down-sampled by 2 to
obtain the compressed image. This corresponds to one level of decomposition; to obtain a higher
compression ratio, the steps above are repeated
depending on the number of decomposition levels required. Figure 3.5 shows the result of the DWT
with two levels of decomposition.
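A two-level 2D-DWT like the one described above can be computed with PyWavelets. The paper does not name the mother wavelet, so the Haar wavelet is assumed here purely for illustration:

```python
import numpy as np
import pywt  # PyWavelets

# Two-level 2D DWT of a 128 x 128 image (Haar wavelet assumed).
img = np.random.rand(128, 128)
coeffs = pywt.wavedec2(img, 'haar', level=2)

cA2 = coeffs[0]              # LL2 approximation sub-band, 32 x 32
cH2, cV2, cD2 = coeffs[1]    # level-2 detail sub-bands (LH2, HL2, HH2), 32 x 32 each
cH1, cV1, cD1 = coeffs[2]    # level-1 detail sub-bands (LH1, HL1, HH1), 64 x 64 each
```

Each decomposition level halves both image dimensions, so the final approximation of a 128 × 128 input after two levels is 32 × 32, matching the layout of Figure 3.5.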
3.4.2 Discrete Cosine Transform (DCT)
In this paper, the two-dimensional DCT (2D-DCT) is applied to separate the image into parts, or
sub-bands, of differing importance. It is used for data reduction and compression; compression
reduces the amount of data that must be stored before it is sent to the classifier. The DCT packs
the most information into the fewest coefficients. The general equation for the 2D-DCT is defined
in Equation 3.5.
$$F(u,v) = \left(\tfrac{2}{N}\right)^{1/2}\left(\tfrac{2}{M}\right)^{1/2} \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} \Lambda(i)\,\Lambda(j)\,\cos\!\left[\tfrac{\pi u}{2N}(2i+1)\right]\cos\!\left[\tfrac{\pi v}{2M}(2j+1)\right] f(i,j) \qquad (3.5)$$

with $\Lambda(k) = 1/\sqrt{2}$ for $k = 0$ and $\Lambda(k) = 1$ otherwise.
where N × M is the size of the input image, f(i, j) is the intensity of the pixel at row i and
column j, and F(u, v) is the DCT coefficient at row u and column v of the DCT matrix. Most of the
image information appears in the top-left corner of the DCT matrix, since much of the signal
energy lies at low frequencies. A large number of coefficients can be discarded because the
middle- and high-frequency values are often small enough to be ignored. Hence, compression is
achieved, and the low-frequency DCT coefficients are selected as features.
Figure 3.6 Example of wavelet coefficients representation.
Figure 3.6(b) shows that compression is achieved after the DCT is applied. The middle and high
frequencies are discarded, eliminating some of the wavelet coefficients shown in Figure 3.6(a).
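The energy-compaction property described above can be checked with an orthonormal 2D-DCT from SciPy. The smooth ramp image below is a stand-in test signal, not data from the paper:

```python
import numpy as np
from scipy.fft import dctn

# Orthonormal 2D-DCT (Equation 3.5) of a smooth 32 x 32 test image.
x = np.outer(np.linspace(0.0, 1.0, 32), np.linspace(0.0, 1.0, 32))
F = dctn(x, norm='ortho')

total_energy = np.sum(F ** 2)        # equals np.sum(x ** 2) (orthonormal transform)
low_energy = np.sum(F[:4, :4] ** 2)  # 4 x 4 low-frequency (top-left) corner
ratio = low_energy / total_energy    # for smooth images, close to 1
```

For a smooth image, almost all of the energy lands in the top-left corner, which is exactly why the low-frequency coefficients alone suffice as features.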
3.4.3 Combination of DWT and DCT Features
Thirty-five features are chosen from the final-level DWT approximation coefficients of the input
image. The 35 highest-energy DCT coefficients are then extracted and selected as features. For
each image, the selected DWT and DCT features are combined, so 70 features per image are used for
training in the next phase. The combination of DWT and DCT is illustrated in Figure 3.7.
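The 70-feature construction can be sketched end to end. The paper states the counts (35 DWT approximation coefficients plus the 35 highest-energy DCT coefficients) but not the exact selection rules, so the raster-order pick for the DWT part and the magnitude ranking for the DCT part below are assumptions, as is the Haar wavelet:

```python
import numpy as np
import pywt
from scipy.fft import dctn

def extract_features(img, n=35):
    """Hedged sketch of the combined 2n-dimensional feature vector:
    n final-level DWT approximation coefficients plus the n
    highest-energy DCT coefficients of that approximation."""
    cA = pywt.wavedec2(img, 'haar', level=2)[0]   # final-level approximation
    dwt_feats = cA.ravel()[:n]                    # first n approximation coefficients
    F = dctn(cA, norm='ortho').ravel()
    top = np.argsort(np.abs(F))[::-1][:n]         # n highest-energy DCT coefficients
    dct_feats = F[np.sort(top)]
    return np.concatenate([dwt_feats, dct_feats])  # 2n = 70 features

features = extract_features(np.random.rand(128, 128))
```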
Figure 3.7. Combination of DWT and DCT coefficients
3.5 Neuro Fuzzy Classifier
The final step of the proposed methodology is classification, in which each observation is
individually analyzed into a set of quantifiable properties (Ying, 2013). A neuro-fuzzy
classifier combines an artificial neural network with a fuzzy logic system. An illustration of
the neuro-fuzzy classifier by Sun and Jang (1993) is shown in Figure 3.8.
Figure 3.8. An Illustration of Neuro fuzzy classifier
Figure 3.8 demonstrates the neuro-fuzzy classifier framework with two input variables, x1 and x2.
The training data are categorized into three classes, C1, C2 and C3. In this paper, an
alternative adaptive neuro-fuzzy classifier is proposed, in which the rule weights and parameters
are optimized. The k-means algorithm is used to initialize the fuzzy rules. A Gaussian membership
function is used for the fuzzy set descriptions because of its simple derivative expressions, and
the rule weights are adapted according to the number of samples per rule. Scaled conjugate
gradient (SCG) training is used because it is faster than steepest descent and some second-order
derivative methods, and it is suitable for large-scale problems (Cetişli & Barkana, 2010). After
the DWT and DCT are applied, the coefficient matrix is
converted to a vector using zigzag ordering, and the first coefficients of the vector are
selected. Each image thus yields one vector, and the set of vectors is divided into training and
testing sets for the neuro-fuzzy classifier.
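The zigzag ordering used to vectorize the coefficient matrix is the standard JPEG-style scan over anti-diagonals, which can be written compactly as:

```python
import numpy as np

def zigzag(M):
    """Read a 2-D coefficient matrix in JPEG-style zigzag order:
    anti-diagonals traversed in alternating direction, so that
    low-frequency coefficients come first."""
    rows, cols = M.shape
    out = []
    for s in range(rows + cols - 1):
        diag = [M[i, s - i] for i in range(max(0, s - cols + 1), min(rows, s + 1))]
        out.extend(diag if s % 2 else diag[::-1])
    return np.array(out)

# Keeping the first k entries of zigzag(F) selects the low-frequency
# coefficients, as described in the text.
order = zigzag(np.arange(9).reshape(3, 3))
```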
4. Results and Discussion
4.1 Evaluation of Proposed Single Hand Gesture Recognition System
In order to validate the proposed method, a set of images consisting of 6 classes is used. It
includes the hand gestures Bramharam, Chaturam, Hamsapakshakam, Kapitham, Trupathakam and
Trisolam. Table 4.0 describes the hand gesture classes chosen for the evaluation.
4.2 Experiment Results
The output of the proposed approach is the image of the predicted hand gesture, the predicted
class and the meaning of the predicted gesture. Figure 4.1 shows that images 1 and 21 are
correctly classified, while Figure 4.2(a) shows that image 2, which belongs to Class 1, is
incorrectly classified into Class 2, and Figure 4.2(b) shows image 19 misclassified into Class 5.
In addition, the overall classification result is plotted in the graph in Figure 4.3.
Table 4.0. Description of hand gesture classes for system evaluation

| Class | Name | Meaning |
| 1 | Bramharam | An auspicious occasion or festival |
| 2 | Chaturam | Breaking into pieces |
| 3 | Hamsapakshakam | Breaking into pieces |
| 4 | Kapitham | Dispersing water of the river |
| 5 | Trupathakam | Trident / Knot |
| 6 | Trisolam | Milking cows / Grasping the end of the robes |
Figure 4.1. Example of correctly classified image: a) image 1 in Class 1; b) image 21 in Class 3.
Figure 4.2. Example of misclassified image: a) image 2 in Class 2; b) image 19 in Class 5.
Figure 4.3. Overall result of classification task
Figure 4.3 shows the overall result of the classification task performed by the system. Image 2,
which should be in Class 1, is misclassified into Class 2. Similarly, images 19 and 20 of Class 2
are classified into Class 5, and image 39 of Class 4 is wrongly classified into Class 2.
Meanwhile, all images in Classes 3, 5 and 6 are correctly classified. The accuracy of the
proposed approach is calculated using the precision formula. Table 4.1 summarizes the result
analysis: the number of sample images in the testing set, and the numbers of correctly classified
and misclassified images.
Table 4.1. Result analysis of the proposed system

| No. of samples | No. of correctly classified images (tp) | No. of misclassified images (fp) |
| 60 | 56 | 4 |
From the analysis above, it is concluded that the proposed system recognizes single hand gestures
in Bharatanatyam dance with an accuracy of 93%: 56 out of 60 tested samples are classified
correctly. The misclassified hand gesture images may be due to several factors, such as poor
background and foreground separation and lighting conditions.
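The reported figure follows directly from the precision formula applied to the counts in Table 4.1:

```python
# Precision = tp / (tp + fp), using the counts from Table 4.1.
tp, fp = 56, 4
precision = tp / (tp + fp)
print(round(precision * 100, 1))  # 93.3
```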
Table 4.2. Comparison applied in proposed system
Figure 4.4 Comparison of performance
From Table 4.2 and Figure 4.4, the proposed method, which applies DWT, DCT and 70 features to the
hand gesture image alone, gives the highest recognition accuracy at 93%. When the binary image is
used, however, the accuracy drops to 75%; the combination of DWT, DCT and 70 features then
achieves the second highest accuracy at 88%. Increasing the number of features to 80 and 400
decreases the accuracy to 83% and 73% respectively. Meanwhile, the accuracy is only 55% when PCA
is applied directly, and the combination of DWT, DCT, 70 features and PCA gives the least
accurate result, 45%. The comparison shows that 70 is the best number of features. A larger
number of features reduces recognition accuracy, and applying PCA provides no significant
benefit: excess features introduce redundancy, which negatively affects the learning model and
significantly decreases system performance.
5. Conclusion
In conclusion, hand gesture recognition has improved over the past few years, driven largely by
its usefulness in applications of computer vision technology. This study proposed image
processing algorithms to recognize and classify single hand gestures in Bharatanatyam dance. The
developed system fills the identified research gap, and the performance of the proposed method is
encouraging.
5.1 Limitation
The limitation of this research concerns the robustness of the proposed system. In this study,
the environment of the images in the sample set is controlled and the background color is
standardized to white. The accuracy of the system might change if the images in the sample set
exhibit other variations, such as color noise, uneven illumination, shadows and occlusion.
5.2 Future Works
For future work, several recommendations can be considered:
a. Carry out an experiment to test the robustness of the system using sample images with
complex backgrounds and other variations.
b. This study used a relatively small set of training images, only 60 in total; applying the
method to a larger training dataset is suggested.
c. Implementing other well-known classifiers such as ANN, KNN and SVM is highly suggested, as
they might offer a higher recognition rate and greater robustness.
Acknowledgement
This paper is supported under a scholarship of Universiti Tunku Abdul Rahman (UTAR).
References
Chaudhary, A., Raheja, J. L., Das, K., & Raheja, S. (2011). Intelligent approaches to interact
with machines using hand gesture recognition in natural way: a survey. International Journal of
Computer Science & Engineering Survey (IJCSES), 2(1), 122-133.
Corera, S., & Krishnarajah, N. (2011). Capturing hand gesture movement: a survey on tools
techniques and logical considerations. Proceedings of chi sparks.
Feng, K.-p., & Yuan, F. (2013). Static hand gesture recognition based on HOG characters and support
vector machines. Instrumentation and Measurement, Sensor Network and Automation (IMSNA),
2013 2nd International Symposium on.
Ghosh, D. K., & Ari, S. (2011). A static hand gesture recognition algorithm using k-mean based radial
basis function neural network. Information, Communications and Signal Processing (ICICS) 2011 8th
International Conference on.
Hariharan, D., Acharya, T., & Mitra, S. (2011). Recognizing hand gestures of a dancer. Pattern
recognition and machine intelligence (pp. 186-192): Springer.
Mozarkar, S., & Warnekar, C. (2013). Recognizing Bharatnatyam Mudra Using Principles of Gesture
Recognition. International Journal of Computer Science and Network, 2(4), 7.
Priyal, S. P., & Bora, P. K. (2010). A study on static hand gesture recognition using moments. Signal
Processing and Communications (SPCOM), 2010 International Conference on.
Rajam, P. S., & Balakrishnan, G. (2010). Indian sign language recognition system to aid deaf-dumb
people. Computing Communication and Networking Technologies (ICCCNT), 2010 International
Conference on.
Rautaray, S. S., & Agrawal, A. (2012). Vision based hand gesture recognition for human computer
interaction: a survey. Artificial Intelligence Review, 1-54.
Saha, S., Ghosh, L., Konar, A., & Janarthanan, R. (2013(b)). Fuzzy L Membership Function Based Hand
Gesture Recognition for Bharatanatyam Dance. Computational Intelligence and Communication
Networks (CICN), 2013 5th International Conference on.
Saha, S., Ghosh, S., Konar, A., & Nagar, A. K. (2013(a)). Gesture Recognition from Indian Classical
Dance Using Kinect Sensor. Computational Intelligence, Communication Systems and Networks
(CICSyN), 2013 Fifth International Conference on.
Shangeetha, R. K., Valliammai, V., & Padmavathi, S. (2012, 14-15 Dec. 2012). Computer vision based
approach for Indian Sign Language character recognition. Machine Vision and Image Processing
(MVIP), 2012 International Conference on.
Vieriu, R.-L., Mironica, I., & Goras, B.-T. (2013). Background invariant static hand gesture recognition
based on Hidden Markov Models. Signals, Circuits and Systems (ISSCS), 2013 International
Symposium on.
Wachs, J. P., Kölsch, M., Stern, H., & Edan, Y. (2011). Vision-based hand-gesture
applications. Communications of the ACM, 54(2), 60-71.
Yun, L., Lifeng, Z., & Shujun, Z. (2012). A Hand Gesture Recognition Method Based on Multi-Feature
Fusion and Template Matching. Procedia Engineering, 29, 1678-1684.