Journal of Multimedia
ISSN 1796-2048, Volume 9, Number 2, February 2014

CONTENTS

REGULAR PAPERS

Beef Marbling Image Segmentation Based on Homomorphic Filtering
    Bin Pang, Xiao Sun, Deying Liu, and Kunjie Chen ... 189

Semantic Ontology Method of Learning Resource based on the Approximate Subgraph Isomorphism
    Zhang Lili and Jinghua Ding ... 196

Trains Trouble Shooting Based on Wavelet Analysis and Joint Selection Feature Classifier
    Yu Bo, Jia Limin, Ji Changxu, Lin Shuai, and Yun Lifen ... 207

Massive Medical Images Retrieval System Based on Hadoop
    YAO Qing-An, ZHENG Hong, XU Zhong-Yu, WU Qiong, LI Zi-Wei, and Yun Lifen ... 216

Kinetic Model for a Spherical Rolling Robot with Soft Shell in a Beeline Motion
    Zhang Sheng, Fang Xiang, Zhou Shouqiang, and Du Kai ... 223

Coherence Research of Audio-Visual Cross-Modal Based on HHT
    Xiaojun Zhu, Jingxian Hu, and Xiao Ma ... 230

Object Recognition Algorithm Utilizing Graph Cuts Based Image Segmentation
    Zhaofeng Li and Xiaoyan Feng ... 238

Semi-Supervised Learning Based Social Image Semantic Mining Algorithm
    AO Guangwu and SHEN Minggang ... 245

Research on License Plate Recognition Algorithm based on Support Vector Machine
    Dong ZhengHao and FengXin ... 253

Adaptive Super-Resolution Image Reconstruction Algorithm of Neighborhood Embedding Based on Nonlocal Similarity
    Junfang Tang and Xiandan Xu ... 261

An Image Classification Algorithm Based on Bag of Visual Words and Multi-kernel Learning
    LOU Xiong-wei, HUANG De-cai, FAN Lu-ming, and XU Ai-jun ... 269

Clustering Files with Extended File Attributes in Metadata
    Lin Han, Hao Huang, Changsheng Xie, and Wei Wang ... 278

Method of Batik Simulation Based on Interpolation Subdivisions
    Jian Lv, Weijie Pan, and Zhenghong Liu ... 286

Research on Saliency Prior Based Image Processing Algorithm
    Yin Zhouping and Zhang Hongmei ... 294

A Novel Target-Objected Visual Saliency Detection Model in Optical Satellite Images
    Xiaoguang Cui, Yanqing Wang, and Yuan Tian ... 302

A Unified and Flexible Framework of Imperfect Debugging Dependent SRGMs with
Testing-Effort
    Ce Zhang, Gang Cui, Hongwei Liu, Fanchao Meng, and Shixiong Wu ... 310

A Web-based Virtual Reality Simulation of Mounting Machine
    Lan Li ... 318

Improved Extraction Algorithm of Outside Dividing Lines in Watershed Segmentation Based on PSO Algorithm for Froth Image of Coal Flotation
    Mu-ling TIAN and Jie-ming Yang ... 325

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014, p. 189

Beef Marbling Image Segmentation Based on Homomorphic Filtering

Bin Pang, Xiao Sun, Deying Liu, and Kunjie Chen*
College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
*Corresponding author. Email: [email protected]

Abstract—In order to reduce the influence of uneven illumination and reflected light on accurate beef segmentation, a beef marbling segmentation method based on homomorphic filtering is introduced. For beef rib-eye region images in the frequency domain, a homomorphic filter is used to enhance the gray, R, G and B chroma images. The impact of the high-/low-frequency gain factors on the accuracy of beef marbling segmentation is then investigated. Appropriate gain factor values are determined from the error rate of beef marbling segmentation, and the resulting error rates are compared with those obtained without homomorphic filtering. The experimental results show that the error rate of beef marbling segmentation is remarkably reduced with a low-frequency gain factor of 0.6 and a high-frequency gain factor of 1.425; among the chroma images, the average error rate of marbling segmentation in the G chroma image (5.38%) is the lowest; and compared with the result without homomorphic filtering, the average error rate in the G chroma image decreases by 3.73 percentage points.

Index Terms—Beef; Marbling; Homomorphic Filter; Image Segmentation

I. INTRODUCTION

Beef color, marbling and surface texture are key factors used by trained expert graders to classify beef quality [1].
Of all factors, the beef marbling score is regarded as the most important indicator [2]. The Ministry of Agriculture of the People's Republic of China has defined four grades of beef marbling and has published corresponding standard marbling score photographs. Referring to the standard photographs, graders determine the abundance of intramuscular fat in the rib-eye muscle and then assign the marbling score [3]. Since the classification of the beef marbling score largely depends on the subjective visual judgment of graders, estimates for the same beef region may differ. Therefore, developing an objective beef marbling grading system that is independent of subjective estimation is imperative for the beef industry.

Beef marbling, an important evaluation indicator in the existing beef quality classification criteria, is usually determined by the abundance of intramuscular fat in the beef rib-eye region. Machine vision and image processing technology are considered the most effective methods for automatic identification of beef marbling grades [4]. In automatic identification, the first step is to precisely segment the beef marbling.

© 2014 ACADEMY PUBLISHER  doi:10.4304/jmm.9.2.189-195

Numerous methods for beef marbling image segmentation have been reported over the past 20 years. Ref. [5] was the first to segment the image of a beef rib-eye section into fat and muscle areas by image processing, calculate the total fat area, and relate the fat area to the sensory evaluation of beef quality. Ref. [3] proposes a beef marbling image segmentation method based on graders' vision thresholds and automatic thresholding to correctly separate the fat flecks from the muscle in the rib-eye region, and then compares the proposed method with prior algorithms. Ref.
[6] proposes an algorithm for automatic beef marbling segmentation according to marbling features and color characteristics, which uses simple thresholding to remove the background and then uses clustering and thresholding, with contrast enhancement via a customized grayscale, to extract the marbling; the algorithm adapts to different image acquisition environments. Because beef marbling is complex and changeable, no clear boundary can be discerned between muscle and fat areas, so marbling can hardly be segmented precisely. The results of Ref. [7] show that the fuzzy c-means (FCM) algorithm works well for segmenting beef marbling images, with high robustness. On this basis, Ref. [8] uses a sequence of image processing algorithms to estimate the intramuscular fat content in beef longissimus dorsi and then uses a kernel fuzzy c-means (KFCM) clustering method to segment the beef image into lean, fat, and background. Ref. [9] presents a fast modified FCM algorithm for beef marbling segmentation, again suggesting that FCM is highly effective. Refs. [10, 11] introduce a method to segment the longissimus dorsi area and marbling from the rib-eye image by using morphological filtering, dilation, erosion and logical operations. Ref. [12] uses computer image processing to segment the lean tissue region from the beef rib-eye cross-section image and to extract color features of each image, and then uses a BP neural network to predict the color grade of beef lean tissue. Refs. [13, 16] establish prediction models for beef marbling grading, indicating that beef marbling grades can be determined using fractal dimension and image processing methods. Ref. [14] develops a beef image online acquisition system according to the requirements of the beef automatic grading industry; to reduce the system's computing time, only the Cr chroma image is considered when extracting the effective rib-eye region with image processing methods. Ref.
[15] uses machine vision and a support vector machine (SVM) to determine color scores of beef fat. The fat is separated from the rib-eye by a sequence of image processing algorithms (boundary tracking, thresholding, morphological operations, etc.), and twelve fat color features are then used as inputs to train SVM classifiers.

As machine vision technology aims to assess marbling grades objectively, a machine vision system first collects the entire rib-eye muscle image of a beef sample. The sample image can then be segmented into separate marbling region and rib-eye region images with an image processing algorithm. As a result, marbling features can be computed from the processed images, which makes it possible to determine beef marbling grades more objectively and consistently than visual inspection. However, when collecting beef rib-eye images, unfavorable lighting and acquisition conditions unavoidably cause problems such as overall darkness, local shadow, and local reflection, which increase the difficulty of subsequent marbling segmentation and reduce its precision.

Homomorphic filtering is a method often used to remove multiplicative noise. Illumination and reflectance are not separable in the image itself, but their approximate locations in the frequency domain can be identified. Since illumination and reflectance combine multiplicatively, the components are made additive by taking the logarithm of the image intensity, so that these multiplicative components can be separated linearly in the frequency domain. Illumination variation can thus be treated as multiplicative noise and reduced by filtering in the log domain.
To make the illumination of an image more even, the high-frequency components are increased and the low-frequency components are decreased, because the high-frequency components are assumed to represent mostly the reflectance in the scene (the amount of light reflected off the objects), whereas the low-frequency components are assumed to represent mostly the illumination. That is, high-pass filtering in the log-intensity domain is used to suppress low frequencies and amplify high frequencies. As a result, the uneven illumination of color images can be effectively corrected [17-25].

In this paper, homomorphic filtering is used to correct the non-uniform illumination in the beef rib-eye region, and the effects of the filtering gain factors and of the four chroma images on marbling segmentation precision are analyzed. On this basis, an accurate beef marbling segmentation method based on homomorphic filtering of the G chroma image is introduced. The specific work is as follows:

(a) Homomorphic filtering is a generalized technique for signal and image processing, involving a nonlinear mapping to a different domain in which linear filtering is applied, followed by mapping back to the original domain. It is sometimes used for image enhancement, as it simultaneously normalizes the brightness across an image and increases contrast. In order to find the optimal chroma image for accurately extracting the beef marbling area, homomorphic filtering is applied separately to the gray, R, G and B chroma images of the beef rib-eye region in the frequency domain, and the beef marbling areas are then extracted by the Otsu method.
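The filter-then-threshold procedure in (a) can be sketched as follows. This is a minimal illustration assuming numpy, not the authors' Matlab code: the gain factors default to the paper's final choice (rL = 0.6, rH = 1.425), while the constant c and the cut-off D0 are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

def homomorphic_filter(img, r_l=0.6, r_h=1.425, c=1.0, d0=30.0):
    """Homomorphic enhancement of one chroma channel (float array, values >= 0):
    log -> FFT -> high-frequency-emphasis filter -> inverse FFT -> exp."""
    z = np.log(img + 1.0)                    # log makes illumination/reflectance additive
    Z = np.fft.fftshift(np.fft.fft2(z))      # centre the spectrum so DC sits in the middle
    rows, cols = img.shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    D2 = u[:, None] ** 2 + v[None, :] ** 2   # squared distance from the spectrum centre
    # Gaussian high-pass emphasis: gain r_l at DC, approaching r_h at high frequencies
    H = (r_h - r_l) * (1.0 - np.exp(-c * D2 / d0 ** 2)) + r_l
    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))
    return np.exp(s) - 1.0                   # undo the logarithm

def otsu_threshold(gray):
    """Otsu's automatic threshold for an 8-bit image: pick the level that
    maximises the between-class variance of the two resulting classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256, dtype=float)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = (levels[:t] * p[:t]).sum() / w0
        m1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

Each of the gray, R, G and B chroma images would be passed through `homomorphic_filter` and the result thresholded with `otsu_threshold` to separate the fat (marbling) from the muscle.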
(b) Homomorphic filtering is used to correct the illumination and reflection variations of beef rib-eye images, which affect beef marbling extraction to some extent. In order to select appropriate high/low frequency gain factor values for the homomorphic filter, so as to enhance the contrast ratio in the beef rib-eye region, the impact of the high/low frequency gain factors on the accuracy of beef marbling segmentation is investigated. For different high/low frequency gain factor values, the error rate curves of marbling segmentation in the gray, R, G and B chroma images are plotted. The minimum error rate curves of the four chroma images are then plotted, and the trends of the minimum error rates with respect to the high/low frequency gain factors are discussed.

(c) In order to achieve the optimal beef marbling segmentation effect, the segmentation error rates of the different chroma images are analyzed and compared. The average values of the high/low frequency gain factors are selected to segment the marbling, and the error rates with homomorphic filtering are compared to those without homomorphic filtering.

The rest of the paper is organized as follows. The materials and proposed methods are presented in Section 2. The impact of the homomorphic filter gain factors and the different chroma images on the accuracy of beef marbling segmentation is discussed in Section 3. Finally, conclusions are given in Section 4.

II. PROPOSED METHOD

Under natural illumination, 10 beef rib-eye images (640×480 pixels) were collected with a Minolta Z1 digital camera and stored in JPG format on a PC. The PC has a Pentium(R) Dual-Core CPU (base frequency 2.6 GHz), 2.0 GB of memory, and a Windows XP operating system. Image processing and data analysis are performed in Matlab. Before segmentation, preprocessing is needed to separate the rib-eye region for subsequent marbling segmentation.
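The derivation that follows rests on the logarithm turning the multiplicative illumination-reflection model into an additive one; a short numerical check (the pixel values are arbitrary):

```python
import math

# Illumination-reflection model at a single pixel: f = i * r.
i, r = 40.0, 0.35
f = i * r
# Taking the logarithm makes the two components additive: ln f = ln i + ln r,
# so a linear filter in the log domain can treat them separately.
assert abs(math.log(f) - (math.log(i) + math.log(r))) < 1e-12
```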
The separation includes threshold setting, region growing, and morphological processing (details in Ref. [11]). Homomorphic filtering is used to correct the uneven illumination in the beef images and thus reduce the effects of darkness and reflection on subsequent image processing, providing a favorable foundation for accurate segmentation of beef marbling. The principle is as follows.

In the illumination-reflection model, an image f(x, y) can be expressed as the product of the illumination component i(x, y) and the reflection component r(x, y):

    f(x, y) = i(x, y) r(x, y)    (1)

where 0 < i(x, y) < \infty and 0 < r(x, y) < 1.

First, the logarithm of f(x, y) is taken:

    z(x, y) = \ln f(x, y) = \ln i(x, y) + \ln r(x, y)    (2)

Applying the Fourier transform,

    F[z(x, y)] = F[\ln i(x, y)] + F[\ln r(x, y)]    (3)

or

    Z(u, v) = I(u, v) + R(u, v)    (4)

Filtering Z(u, v) with the filter's transfer function H(u, v) gives

    S(u, v) = H(u, v) Z(u, v) = H(u, v) I(u, v) + H(u, v) R(u, v)    (5)

Applying the inverse Fourier transform to S(u, v),

    s(x, y) = F^{-1}[S(u, v)] = F^{-1}[H(u, v) I(u, v)] + F^{-1}[H(u, v) R(u, v)]    (6)

Let

    i'(x, y) = F^{-1}[H(u, v) I(u, v)]    (7)

and

    r'(x, y) = F^{-1}[H(u, v) R(u, v)]    (8)

Then equation (6) can be written as

    s(x, y) = i'(x, y) + r'(x, y)    (9)

Finally, because z(x, y) is the logarithm of the original image f(x, y), the inverse (exponential) operation generates a satisfactory enhanced image g(x, y):

    g(x, y) = e^{s(x, y)} = e^{i'(x, y)} e^{r'(x, y)} = i_0(x, y) r_0(x, y)    (10)

where

    i_0(x, y) = e^{i'(x, y)}    (11)

and

    r_0(x, y) = e^{r'(x, y)}    (12)

are the illumination component and reflection component of the output image, respectively.

A Gaussian high-pass filter is selected as the homomorphic filter's transfer function:

    H(u, v) = (r_H - r_L)[1 - e^{-c D^2(u, v) / D_0^2}] + r_L    (13)

where D_0 is the cut-off frequency, D(u, v) is the distance of point (u, v) from the centre of the frequency plane, c is a constant, r_H \in (0, \infty) is the high-frequency gain factor, and r_L \in (0, 1] is the low-frequency gain factor. Appropriate values of the high/low gain factors should be selected so as to enhance the contrast ratio of the image in the beef rib-eye region, sharpen the image edges and details, and make marbling segmentation more effective.

The processed beef rib-eye images undergo gray-scale transformation; then the gray, R, G and B chroma images undergo the above homomorphic filtering. The Otsu automatic threshold method is used to divide the rib-eye region into the target (muscle) and the background (fat). With the optimal threshold T, the image g(x, y) is binarized:

    g(x, y) = 0 if g(x, y) <= T;  255 if g(x, y) > T    (14)

In order to evaluate the effect of beef marbling segmentation, the precision of segmentation is analyzed. The marbling segmentation error rate Q is defined by the difference in pixel counts between the marbling region extracted after processing and the marbling region manually segmented from the original image [14]. With q(x, y) the pixel count of the manually segmented marbling region and q'(x, y) the pixel count of the extracted marbling region, the error rate is

    Q = |q'(x, y) - q(x, y)| / q(x, y) × 100%    (15)

Manual segmentation is performed in Photoshop, and the pixel count of the marbling region is recorded. In order to reduce manual extraction error, each image is manually segmented 3 times and the average value, truncated to an integer, is used as the marbling pixel count.

III. RESULTS AND DISCUSSION

A. Beef Marbling Extraction Based on Homomorphic Filtering

One image (Fig. 1) is randomly selected from the collected beef images. After preprocessing as described in Section 2, the rib-eye image is obtained (Fig. 2) and then undergoes gray-scale transformation (Fig. 3) for homomorphic filtering with different frequency gain factors; the filtered rib-eye images are shown in Fig. 4.

Figure 1. Original beef sample image
Figure 2. Beef rib-eye image
Figure 3. Rib-eye gray image
Figure 4. Homomorphically filtered rib-eye gray images: (a) rL = 0.8, rH = 1.2; (b) rL = 0.2, rH = 0.2; (c) rL = 0.9, rH = 1.8
Figure 5. [Marbling segmentation error rate curves; panels: (a) R chroma image, (b) G chroma image, (c) B chroma image, (d) gray chroma image]

As shown in Figs. 2 and 3, because of insufficient light, the rib-eye image lacks brightness, so the contrast between marbling and muscle is small and some tiny marbling is unclear. After homomorphic filtering, the brightness is improved (Fig. 4a), and in particular the edges are sharpened, so tiny marbling fragments are enhanced. However, different values of rL and rH give different filtering effects. When the gain factors are small, the image brightness becomes too high and the contrast between marbling and muscle is significantly reduced (Fig. 4b), which is unfavorable for subsequent segmentation. When the gain factors are large, the high-frequency part is excessively enhanced and the brightness decreases (Fig. 4c), which is also unfavorable for subsequent segmentation. Therefore, appropriate gain factor values should be selected to improve beef marbling segmentation precision.

B.
Selection of Homomorphic Filtering Gain Factors and Their Effects on Beef Marbling Segmentation Precision

Homomorphic filtering is used to correct the illumination and reflection components of the rib-eye images, which affects beef marbling segmentation to some extent. Appropriate values of the homomorphic filtering gain factors rL and rH are selected so as to enhance the contrast ratio in the beef rib-eye region. One image is selected from the 10 images, and the rib-eye region is segmented as described in Section 2; different values of rL and rH are then chosen to construct different filters, and the gray, R, G and B chroma images undergo homomorphic filtering separately. Finally, the marbling is extracted and the error rates are calculated as described in Section 2. The results are shown in Fig. 5.

Fig. 5 shows that, when rL is held constant, the beef marbling extraction error rates in all four chroma images first decrease slowly and then increase sharply with increasing rH. Each error rate curve corresponding to a value of rL has a minimum; for instance, in the gray chroma image, when rL = 0.4 and rH = 0.8, the error rate reaches a minimum of 0.08%. The minimum error rates of the four chroma images over rL and rH are then used to obtain the curves in Fig. 6 and Fig. 7.

Fig. 6 shows that with increasing rL the minimum error rate first decreases and then increases, and is concentrated within rL = 0.4-0.8. Fig. 7 shows that with increasing rH the minimum error rate also first decreases and then increases, and is concentrated within rH = 0.8-1.8.
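The selection procedure just described (compute the error rate of Eq. (15) for every gain-factor pair, then take per-factor minima as in Figs. 6 and 7) can be sketched as follows. The `error_rate` callable is a hypothetical stand-in for the full filter-segment-compare pipeline, and all names are illustrative:

```python
def marbling_error_rate(q_manual, q_extracted):
    """Eq. (15): Q = |q' - q| / q * 100 for marbling pixel counts."""
    return abs(q_extracted - q_manual) / float(q_manual) * 100.0

def sweep_gain_factors(error_rate, r_l_grid, r_h_grid):
    """Evaluate the segmentation error rate over a grid of (r_L, r_H) pairs."""
    return {(r_l, r_h): error_rate(r_l, r_h)
            for r_l in r_l_grid for r_h in r_h_grid}

def min_error_curves(err):
    """Per-factor minima from the swept grid: the best error for each r_L over
    all r_H (the curve of Fig. 6) and for each r_H over all r_L (Fig. 7)."""
    curve_l, curve_h = {}, {}
    for (r_l, r_h), e in err.items():
        curve_l[r_l] = min(e, curve_l.get(r_l, float("inf")))
        curve_h[r_h] = min(e, curve_h.get(r_h, float("inf")))
    return curve_l, curve_h
```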
Specifically, for the gray chroma image the minimum error rate is 0.08%, at rL = 0.4 and rH = 0.8; for the R chroma image it is 0.05%, at rL = 0.6 and rH = 1.7; for the G chroma image it is 0.27%, at rL = 0.7 and rH = 1.4; and for the B chroma image it is 0.64%, at rL = 0.7 and rH = 1.8.

Figure 6. Effects of the low-frequency gain factor on the minimum error rate in beef marbling segmentation
Figure 7. Effects of the high-frequency gain factor on the minimum error rate in beef marbling segmentation

C. Analysis and Comparison of Marbling Segmentation Error Rates Based on Homomorphic Filtering

The above analysis shows that within rL = 0.4-0.8 and rH = 0.8-1.8, the gray, R, G and B chroma images after homomorphic filtering reach their minimum error rates. The four per-channel optima are therefore arithmetically averaged: rL = (0.4 + 0.6 + 0.7 + 0.7)/4 = 0.6 and rH = (0.8 + 1.7 + 1.4 + 1.8)/4 = 1.425. The 10 images are preprocessed as described in Section 2 to segment the beef rib-eye regions; a homomorphic filter with rL = 0.6 and rH = 1.425 is then used to filter the gray, R, G and B chroma images and thereby segment the marbling area. Finally, equation (15) is used to calculate the error rates of the four chroma images for each beef image; the results are listed in Table 1.

TABLE I. ERROR RATE (%) IN BEEF MARBLING SEGMENTATION WITH HOMOMORPHIC FILTERING

Image No.    Gray     R        G        B
1            10.97    16.62     6.91    15.59
2            10.24    21.05     0.40    14.41
3             3.38    13.86     7.44     2.97
4             6.56    17.82     4.46    10.91
5             1.41    13.29    10.27     5.99
6            15.02    25.26     4.95    18.97
7             6.85    22.38     5.82    12.57
8            12.77    17.42     9.25     9.86
9             4.45    16.12     1.71    10.36
10            8.48    18.69     2.56    20.03
Mean          8.01    18.25     5.38    12.17

TABLE II. ERROR RATE (%) IN BEEF MARBLING SEGMENTATION WITHOUT HOMOMORPHIC FILTERING

Image No.    Gray     R        G        B
1            11.71    19.87     8.82    15.29
2            20.53    23.29     7.12    15.13
3            13.30    10.83     5.65    14.27
4            22.47    27.82    14.46    20.91
5            12.39    16.48    12.72    10.63
6            14.11    16.37     9.53    22.12
7            17.41    22.56     6.98    16.94
8            12.99    14.78    14.32    13.57
9             9.67    15.96     6.61    18.59
10           14.82    20.16     4.84    23.79
Mean         14.94    18.81     9.11    17.12

Table 1 shows that after homomorphic filtering the error rates of the four chroma images differ. The lowest average error rate, 5.38%, is obtained in the G chroma image, significantly lower than in the gray (8.01%), R (18.25%) and B (12.17%) chroma images, indicating that the G image yields the optimal segmentation effect. Table 2 shows the error rates of beef marbling extraction without homomorphic filtering (Otsu method only). Without homomorphic filtering, the lowest average error rate is again obtained in the G chroma image (9.11%), significantly lower than the average error rates of the gray, R and B images. However, the error rates without homomorphic filtering are all higher than those with it: the average error rate in the G chroma image is 3.73 percentage points higher than after filtering, indicating that homomorphic filtering significantly reduces the beef marbling segmentation error rate.

IV. CONCLUSIONS

(1) After homomorphic filtering, beef rib-eye images are improved and much tiny marbling is enhanced. Appropriate values of the frequency gain factors should be selected, which is favorable for precise segmentation of beef marbling.
(2) The high- and low-frequency gain factors both significantly affect the error rate of beef marbling segmentation. As either factor increases, the minimum error rate first decreases and then increases. When the high-frequency gain factor rH is within 0.8-1.8 and the low-frequency gain factor rL is within 0.4-0.8, the beef marbling error rate reaches its minimum.

(3) rL = 0.6 and rH = 1.425 are selected to build a homomorphic filter for processing the beef rib-eye images; the minimum average error rate, 5.38%, is obtained in the G chroma image, about 3.73 percentage points lower than without homomorphic filtering. This indicates that with these gain factors, G chroma images after homomorphic filtering can achieve the optimal beef marbling segmentation effect.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation of China under Grant No. 31071565 and the Funding of the Research Program of China Public Industry under Grant No. 201303083.

REFERENCES

[1] P. Jackman, D. W. Sun, et al., "Prediction of beef eating quality from colour, marbling and wavelet texture features," Meat Science, vol. 80, no. 4, pp. 1273-1281, 2008.
[2] Y. N. Shen, S. H. Kim, et al., "Proteome analysis of bovine longissimus dorsi muscle associated with the marbling score," Asian-Australasian Journal of Animal Sciences, vol. 25, no. 8, pp. 1083-1088, 2012.
[3] K. Chen, C. Qin, "Segmentation of beef marbling based on vision threshold," Computers and Electronics in Agriculture, vol. 62, no. 2, pp. 223-230, 2008.
[4] K. Chen, C. Ji, "Research on techniques for automated beef steak grading," Transactions of the Chinese Society of Agricultural Machinery, vol. 37, no. 3, pp. 153-156, 159, 2006.
[5] T. P. McDonald, Y. R. Chen, "Separating connected muscle tissues in images of beef carcass ribeyes," Transactions of the ASAE, vol. 33, no. 6, pp. 2059-2065, 1990.
[6] P. Jackman, D. W. Sun, P. Allen, "Automatic segmentation of beef longissimus dorsi muscle and marbling by an adaptable algorithm," Meat Science, vol. 83, no. 2, pp. 187-194, 2009.
[7] J. Subbiah, N. Ray, G. A. Kranzler, S. T. Acton, "Computer vision segmentation of the longissimus dorsi for beef quality grading," Transactions of the ASAE, vol. 47, no. 4, pp. 1261-1268, 2004.
[8] C. J. Du, D. W. Sun, et al., "Development of a hybrid image processing algorithm for automatic evaluation of intramuscular fat content in beef M-longissimus dorsi," Meat Science, vol. 80, no. 4, pp. 1231-1237, 2004.
[9] J. Qiu, M. Shen, et al., "Beef marbling extraction based on modified fuzzy C-means clustering algorithm," Transactions of the Chinese Society of Agricultural Machinery, vol. 41, no. 8, pp. 184-188, 2010.
[10] J. Zhao, M. Liu and H. Zhang, "Segmentation of longissimus dorsi and marbling in ribeye imaging based on mathematical morphology," Transactions of the Chinese Society of Agricultural Engineering, vol. 20, no. 1, pp. 143-146, 2004.
[11] K. Chen, C. Qin and C. Ji, "Segmentation methods used in rib-eye image of beef carcass," Transactions of the Chinese Society of Agricultural Machinery, vol. 37, no. 6, pp. 155-158, 2006.
[12] K. Chen, X. Sun and Q. Lu, "Automatic color grading of beef lean tissue based on BP neural network and computer vision," Transactions of the Chinese Society for Agricultural Machinery, vol. 40, no. 4, pp. 173-178, 2009.
[13] K. Chen, G. Wu, M. Yu and D. Liu, "Prediction model of beef marbling grades based on fractal dimension and image features," Transactions of the Chinese Society for Agricultural Machinery, vol. 43, no. 5, pp. 147-151, 2012.
[14] B. Pang, X. Sun and D. Liu, "On-line acquisition and real-time segmentation system of beef rib-eye image," Transactions of the Chinese Society of Agricultural Machinery, vol. 44, no. 6, pp. 190-193, 2013.
[15] K. Chen, X. Sun, C. Qin, X. Ting, "Color grading of beef fat by using computer vision and support vector machine," Computers and Electronics in Agriculture, vol. 70, no. 1, pp. 27-32, 2010.
[16] K. Chen, "Determination of the box-counting fractal dimension and information fractal dimension of beef marbling," Transactions of the Chinese Society of Agricultural Engineering, vol. 23, no. 7, pp. 145-149, 2007.
[17] X. Zhang, S. Hu, "Video segmentation algorithm based on homomorphic filtering inhibiting illumination changes," Pattern Recognition and Artificial Intelligence, vol. 26, no. 1, pp. 99-105, 2013.
[18] Z. Jiao, B. Xu, "Color image illumination compensation based on homomorphic filtering," Journal of Optoelectronics Laser, vol. 21, no. 4, pp. 602-605, 2010.
[19] X. Wang, F. Hu and Y. Zhao, "Corner extraction based on homomorphic filter," Computer Engineering, vol. 32, no. 11, pp. 211-212, 264, 2006.
[20] J. Xiao, S. Song, and L. Ding, "Research on the fast algorithm of spatial homomorphic filtering," Journal of Image and Graphics, vol. 13, no. 12, pp. 2302-2306, 2008.
[21] Z. Jiao, B. Xu, "Color image illumination compensation based on HSV transform and homomorphic filtering," Computer Engineering and Applications, vol. 46, no. 30, pp. 142-144, 2010.
[22] J. Xiong, X. Zou, H. Wang, H. Peng, M. Zhu and G. Lin, "Recognition of ripe litchi in different illumination conditions based on Retinex image enhancement," Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 12, pp. 170-178, 2013.
[23] J. Li, X. Rao and Y. Ying, "Detection of navel surface defects based on illumination-reflectance model," Transactions of the Chinese Society of Agricultural Engineering, vol. 27, no. 7, pp. 338-342, 2011.
[24] J. Qian, X. Yang, X. Wu, Chen Meixiang and Wu Baoguo, "Mature apple recognition based on hybrid color space in natural scene," Transactions of the Chinese Society of Agricultural Engineering, vol. 28, no. 17, pp. 137-142, 2012.
[25] J. Tu, C. Liu, Y. Li, J. Zhou and J.
Yuan, "Apple recognition method based on illumination invariant graph," Transactions of the Chinese Society of Agricultural Engineering, vol. 26, no. 2, pp. 26-31, 2010.

Semantic Ontology Method of Learning Resource based on the Approximate Subgraph Isomorphism

Zhang Lili
College English Teaching & Researching Department, Qiqihar University, Qiqihar, Heilongjiang 161006, China

Jinghua Ding
College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
Email: [email protected]

Abstract—Digital learning resource ontologies are often built on different specifications, so it is hard to find resources with linguistic ontology matching methods, and the existing structural matching methods do not solve the calculation of structural similarity well. To address the heterogeneity among learning resource ontologies, an algorithm based on approximate subgraph isomorphism is presented. First, the resources are preprocessed with a clustering algorithm through semantic analysis; then each ontology is described by a directed graph and the similarity is calculated; finally, the semantic relations between the ontologies of different learning resources are judged by calculation and analysis, so as to achieve semantic compatibility or mapping between ontologies. This method is an extension of existing ontology matching methods. By comprehensively applying features such as edit distance and hierarchical relations, the similarity of the graph structures of two ontologies is calculated, and the ontology matching is determined under the condition of approximate subgraph isomorphism, based on alternately mapping the nodes and arcs of the ontologies' describing graphs. An example is used to demonstrate the ontology matching process, and the time complexity is analyzed to explain its effectiveness.
Index Terms—Digital Learning; Ontology Matching; Digital Resource Ontology; Graph Similarity

I. INTRODUCTION

In the 1990s, the development of computer networks and multimedia technology gave education new energy: education modes, methods and scope underwent astonishing change, and the global sharing and exchange of excellent education resources became a reality. The mode of education supported by computer network technology is often referred to as digital learning [1]. However, because the Internet is a highly open, heterogeneous and distributed information space, and the real meaning of resources is hard to capture when URL technology is used to search for learning resources, target learning resources are often submerged in a large amount of useless, redundant information, so digital learning resources cannot be found efficiently.

© 2014 ACADEMY PUBLISHER  doi:10.4304/jmm.9.2.196-206

To strengthen the semantic characteristics of information, the inventor of URL technology, Tim Berners-Lee, proposed representing mutually recognized and shared knowledge through ontology, giving strict definitions of concepts and of the relations between concepts to determine their meaning [2]. Digital learning supported by ontology technology describes learning resources according to learning resource metadata standards, establishes a learning resource ontology, and applies ontology similarity calculation and matching to support digital learning resource discovery, which can prevent learners from losing direction in the network learning environment and improve learning efficiency and accuracy.
Similarity is the basic condition for matching digital learning resource ontologies. In the present digital learning environment, however, learning resource ontologies are created by different builders who apply different data specifications, modeling methods and technologies, so ontologies on the same topic in a field often differ greatly, which directly affects the efficiency of digital learning resource discovery. How to solve the matching problem of heterogeneous learning resource ontologies effectively, that is, ontology matching on the Semantic Web, is a challenge that digital learning faces. At present, scholars at home and abroad have proposed many ontology matching methods, based mainly on linguistics, structure, instances and so on, and have developed various ontology matching tools, such as ONION from Stanford University, GLUE [4] from the University of Washington, and FOAM from the University of Karlsruhe. Among them, PROMPT is based on linguistics, while GLUE and QOM are based on machine learning methods. When the existing ontology matching methods are applied to learning resource ontology matching, however, the following problems remain: (1) It is difficult for linguistic methods to solve learning resource ontology matching, because the current learning resource metadata standards and specifications differ, for example LOM proposed by the Learning Technology Standards Committee (LTSC) of the IEEE, the Dublin Core Metadata Set (DCMS) proposed by the Online Computer Library Center (OCLC), and LRM released by the IMS Global Learning Consortium. Different metadata specifications determine different ontology description languages, so it is hard to define a sound semantic distance, and linguistic methods alone cannot solve learning resource ontology matching.
(2) The existing structural matching methods cannot meet the demand of learning resource ontology matching. Most structure-based methods focus only on the hierarchical structure of the ontology itself and pay little attention to the influence of other relations on matching. Digital learning resource ontology matching should consider the similarity of the overall structure formed by all kinds of relations, so tree-structure similarity matching methods cannot be used directly. (3) Instance-based matching methods are limited by the complexity, computing performance, correctness and optimization problems of machine learning, and their effectiveness in practical ontology matching applications still needs to be tested, so they are not yet an optimal scheme for learning resource ontology matching. (4) The sentences extracted in multi-document summarization usually come from different documents, and the extracted sentences must be sorted to improve the readability of the summary. The available sorting approaches include time-based sorting [2][3], probabilistic sorting [4], machine learning methods [6][7][9], and their improved variants. Most sorting methods derive the order of topics from succession relations, which easily interrupts a sentence's topic; the time information used by time-based sorting is not always accurate; probabilistic sorting tends to unbalance the topics; machine learning methods are comparatively complex to implement and rely heavily on training corpora; and subsequent improved algorithms only partly improve the readability of the abstract. In this type of ontology matching technology, the extraction of structurally similar feature sets and the similarity calculation are key elements.
To extract the information of different structural features, different similarity measures and calculation methods are used. For example, the SF (Similarity Flooding) [8] structure matching method does not consider schema information and judges matching from the transitivity of graph node similarity: if two elements' adjacent nodes in different schemas are similar, the two elements themselves are similar. In the structural matching phase of Cupid [9], the similarities of leaf nodes depend on the similarity of linguistics, data types and neighboring nodes, while the similarities of non-leaf nodes are obtained from the similarity of the subtrees rooted at them. In Anchor-PROMPT [10], the ontology is seen as a directed labeled graph; paths of fixed length between anchors are extracted as structural features by traversing the subgraph bounded by the anchors, and semantic similarity is represented by the similarity values of nodes tagged at the same positions. In ASCO, the adjacency relations of nodes and the concept hierarchy paths are extracted as structural features; structural similarity is measured as the proportion of similar elements in the adjacent structures and paths, and a weighted sum is then taken. In all of the above structure-based methods, the similarity propagation of structural features is an important factor in judging a match, but the present methods rely too heavily on the similarity of adjacent nodes when calculating structural similarity. Similarity propagation usually requires traversing the whole graph, with a large and partly blind amount of calculation, and needs further study in depth. On research into ontology matching: many universities and research institutions at home and abroad have studied this area and built a number of tools. Ontology mapping on the Semantic Web is a key technology of ontology research.
It is the basis of ontology discovery, alignment, learning and capture. Ontology mapping and merging tools such as PROMPT, Cupid, Similarity Flooding and GLUE have been developed abroad; they measure the similarity of concepts from different angles, at the element level, structure level, instance level and so on, but the following problems remain. (1) Low versatility: these tools have clear effects mainly on ontologies of a specific area or on different versions of the same ontology; on ontologies of other areas the effect is much less obvious. (2) It is difficult to ensure both the effectiveness and the efficiency of mapping: obtaining a more accurate similarity requires more calculation methods, which inevitably hurts efficiency, so a balance point between effectiveness and efficiency must be found. (3) The calculation methods are not comprehensive enough: the existing methods can reflect similarity at the physical layer, the semantic network layer, the description logic layer and so on, but there are no similarity calculation standards yet for the presentation layer and the rule layer, because the theory of ontology restrictions and rules is still immature. (4) The level of automation is not high: most methods still run in semi-automatic mode. After the mapping is calculated, the same entity may be involved in several candidate mappings, and owing to the deficiencies of the existing calculation methods, the mapping with the highest similarity is not necessarily correct, which requires users to select and decide the result manually. The innovations of this paper are as follows. (1) Digital learning resource ontologies are often built on different specifications, so resources are hard to find with linguistic ontology matching, and the existing structural matching methods do not solve the calculation of structural similarity well.
After studying and analyzing the existing ontology matching methods, this paper puts forward a method for matching digital learning resource ontologies.

Figure 1. Classification of ontology matching methods (element-level techniques: string-based, language-based, linguistic resources, constraint-based, alignment reuse, upper-level and domain-specific ontologies; structure-level techniques: graph-based, taxonomy-based, model-based, repository of structures; organized by basic technique, term structure, semantic extension and input type).

The method combines the edit distance of concepts with similarity based on the hierarchical and other relations, alternately matches the nodes and edges of the ontologies' directed graphs, and determines the match by approximate subgraph isomorphism. Taking the overall structural similarity as the judgment standard helps strengthen the efficiency of digital learning resource ontology matching, improves the ability of resource discovery, finds similar subgraphs efficiently, and improves the precision and efficiency of ontology matching. (2) In view of the two difficulties that topics are interrupted and the extracted sentences are incoherent, this paper applies a clustering algorithm based on latent semantic analysis to sentence sorting, in order to improve the quality of the generated summaries. The clustering algorithm of latent semantic analysis clusters the extracted sentences into topic sets, solving the topic-interruption problem.
By calculating each document's expressive power, the best document is selected as a template, and the extracted sentences are then sorted a second time according to that template.

II. ONTOLOGY MATCHING METHOD AND FRAMEWORK

Digital learning resource ontology matching is the key technology for finding mapping relations between different learning resources, and it plays an important supporting role in the retrieval, integration and reuse of digital learning resource ontologies. Foreign scholars have researched ontology matching since the 1990s and have produced many well-known matching systems. On methods, document [6] summarizes the classification of ontology matching methods, as shown in Figure 1, according to the granularity of information and the type of input used at matching time. Element level refers to information about a single entity of the ontology, without considering the correlations between entities, while structure level treats the information of all entities of the ontology as a whole structure. The main matching techniques are: (1) String-based matching, which treats the written form of the ontology as character strings and calculates the similarity between ontology texts by string matching; the edit distance measures the similarity between strings S1 and S2:

SimEdit(S1, S2) = (Max(|S1|, |S2|) − Σ_i oper_i) / Max(|S1|, |S2|)    (1)

where |S1| and |S2| are the lengths of S1 and S2, and oper_i counts the insert, delete, replace and character-exchange operations needed to turn one string into the other. (2) Matching based on an upper ontology or a domain ontology: the upper ontology is independent of any field and can be used as commonly recognized external knowledge to discover semantic relations among the ontologies awaiting matching.
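As a concrete reading of formula (1), the following sketch (plain Python, illustrative function names) computes the normalized edit-distance similarity; for brevity it counts only insert, delete and replace operations, omitting the character-exchange operation mentioned in the text:

```python
def edit_distance(s1, s2):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete
                           dp[i][j - 1] + 1,        # insert
                           dp[i - 1][j - 1] + cost)  # replace
    return dp[m][n]

def sim_edit(s1, s2):
    """Normalized edit-distance similarity in the spirit of formula (1)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return (longest - edit_distance(s1, s2)) / longest
```

Identical strings score 1.0 and completely different strings of equal length score 0.0, matching the normalization by Max(|S1|, |S2|).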
Common upper ontologies include Cyc, SUMO and DOLCE. A domain ontology contains common background knowledge of a field and can be used to eliminate polysemy; examples in the biomedical field include FMA, UMLS and OBO. (3) Structure-based matching: the ontology is usually represented as a tree hierarchy or a directed labeled graph, and similarity is calculated with the help of the Tversky model or the structural relations of objects. In general, a similarity-based ontology matching system architecture can be summarized as in Figure 2: the two ontologies are preprocessed and parsed, several similarity calculation methods are run under a matching controller, the similarity scores are combined, and the matches are extracted, tuned and stored as the matching result.

Figure 2. Architecture of a similarity-based ontology matching system.

III. PROPOSED SCHEME

A. The Clustering Algorithm Based on Latent Semantic Analysis

Depending on the size of the corpus, the vectors used for document clustering are often high-dimensional and sparse, and they only estimate word frequencies; sometimes they cannot depict the semantic associations between words, and synonyms easily reduce the clustering accuracy. YuHui [13] put forward a document clustering algorithm based on improved latent semantic analysis. This paper takes that document clustering work as a reference, tries to reduce the clustering granularity by regarding each extracted sentence as a miniature document, and uses the clustering algorithm of latent semantic analysis to cluster the set of extracted sentences into topics.
This article first performs word segmentation and removes the relevant stop words, reducing the dimensionality of the space and the complexity of the calculation. When features are extracted, they are multiplied by a contribution factor of the word distribution so as to describe word characteristics better. If p_i is the probability distribution of the extracted sentences containing feature word i over the document collection, then the entropy I(p_i) of the word's distribution can be calculated as

I(X) = − Σ_{i=1}^{k} P(x_i) · log P(x_i)

and the weight of a feature word can be calculated as

weight(i, j) = (1 + log(tf_{i,j})) · log(N / df_i + 0.5) · log(1.5 / I(p_i) + 0.8)

where tf_{i,j} is the frequency of word i in sentence j, N is the number of sentences, and df_i is the number of sentences containing word i. A word-sentence matrix A = (a_ij)_{m×n} is constructed, where a_ij is the weight of the i-th word in the j-th extracted sentence; words correspond to the rows of the matrix and the extracted sentences to the columns. Each a_ij is turned into log(a_ij + 1) and divided by the entropy of its row, so that context is taken into account, giving the new word-sentence matrix A' with

a'_ij = log(a_ij + 1) / ( − Σ_l (a_il / Σ_{l'} a_{il'}) · log(a_il / Σ_{l'} a_{il'}) )

Latent semantic analysis is then applied to the new word-sentence matrix A': the singular value decomposition algorithm is used for dimension reduction and feature-space transformation, giving the rank-k approximation matrix A_k. Specifically, from the decomposition A' = U · D · V^T, the singular values are sorted in descending order and the largest k are kept; A_k then replaces A' approximately and converts the feature space so as to strengthen the semantic relations between words and the extracted sentences.
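The entropy and the entropy-damped weighting described above can be sketched as follows (plain Python; the function names are illustrative, and the 1 + h term in the denominator is an assumed smoothing to avoid division by zero when a word occurs in only one sentence):

```python
import math

def distribution_entropy(p):
    """I(p) = -sum p_i * log(p_i) over a probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_entropy_weight(a_row):
    """Log-entropy transform of one word's row of the word-sentence
    matrix: a'_ij = log(a_ij + 1) damped by the word's distribution
    entropy, so evenly spread (uninformative) words are down-weighted.
    The 1 + h denominator is a smoothing assumption, not the paper's
    exact formula."""
    total = sum(a_row)
    if total == 0:
        return [0.0] * len(a_row)
    p = [a / total for a in a_row]
    h = distribution_entropy(p)
    return [math.log(a + 1) / (1.0 + h) for a in a_row]
```

A word concentrated in one sentence has entropy 0 and keeps its full log weight; a word spread evenly over many sentences is damped.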
For the set of extracted sentences D = {d_1, d_2, …, d_n}, the word set W = {w_1, w_2, …, w_m} and the rank-k approximation matrix obtained by singular value decomposition, a_ij represents the weight of word w_j in extracted sentence d_i. Behind the probability p(d_i, w_j) = p(d_i) · p(w_j | d_i) lies the latent semantic space Z = {z_1, z_2, …, z_k}. Assuming that words and extracted sentences are conditionally independent given the latent semantics, the conditional probability of a word given an extracted sentence is

p(w_j | d_i) = Σ_{l=1}^{k} p(w_j | z_l) · p(z_l | d_i)

and therefore

p(d_i, w_j) = p(d_i) · Σ_{l=1}^{k} p(w_j | z_l) · p(z_l | d_i)

In these formulas, p(w_j | z_k) is the distribution probability of words given a latent semantic; the latent semantic can be given a visual representation by sorting p(w_j | z_k). p(z_k | d_i) is the distribution probability of the latent semantics in the extracted sentence. The expectation-maximization (EM) algorithm is then adopted to fit the latent semantic model, executing the E step and the M step alternately for iterative calculation. The E step calculates the conditional probability

P(z_k | d_i, w_j) = P(w_j | z_k) · P(z_k | d_i) / Σ_{l=1}^{k} P(w_j | z_l) · P(z_l | d_i)
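Putting the whole fit into code, a minimal, illustrative pLSA-style EM loop (plain Python, hypothetical function name) implements the E step above together with the M-step updates of the derivation that follows:

```python
import random

def plsa_em(A, K, iters=50, seed=0):
    """Fit a pLSA-style model by EM to a word-sentence weight matrix A
    (rows = sentences d_i, columns = words w_j).  Returns the lists
    P(w|z) (K rows over words) and P(z|d) (n rows over topics)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    pw_z = [[rng.random() for _ in range(m)] for _ in range(K)]
    pz_d = [[rng.random() for _ in range(K)] for _ in range(n)]
    for row in pw_z + pz_d:                      # normalize the random init
        s = sum(row)
        row[:] = [v / s for v in row]
    for _ in range(iters):
        # E step: responsibilities P(z_k | d_i, w_j)
        pz_dw = [[[0.0] * K for _ in range(m)] for _ in range(n)]
        for i in range(n):
            for j in range(m):
                denom = sum(pw_z[k][j] * pz_d[i][k] for k in range(K))
                if denom > 0:
                    for k in range(K):
                        pz_dw[i][j][k] = pw_z[k][j] * pz_d[i][k] / denom
        # M step: re-estimate P(w|z) and P(z|d)
        for k in range(K):
            col = [sum(A[i][j] * pz_dw[i][j][k] for i in range(n))
                   for j in range(m)]
            s = sum(col)
            if s > 0:
                pw_z[k] = [v / s for v in col]
        for i in range(n):
            a_i = sum(A[i])
            if a_i > 0:
                pz_d[i] = [sum(A[i][j] * pz_dw[i][j][k] for j in range(m)) / a_i
                           for k in range(K)]
    return pw_z, pz_d
```

This sketch runs a fixed number of iterations instead of testing the likelihood-increase threshold described in the text; both are common stopping rules.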
Figure 3. The directed graph representation of an ontology.

In the M step, the calculation formulas are:

P(w_j | z_k) = Σ_{i=1}^{n} a(d_i, w_j) · P(z_k | d_i, w_j) / Σ_{j=1}^{m} Σ_{i=1}^{n} a(d_i, w_j) · P(z_k | d_i, w_j)

P(z_k | d_i) = Σ_{j=1}^{m} a(d_i, w_j) · P(z_k | d_i, w_j) / a(d_i)

Iterating the E step and the M step, and stopping when the increase of the expected likelihood L falls below a threshold, we obtain an optimal solution with

E(L) = Σ_{i=1}^{n} Σ_{j=1}^{m} a(d_i, w_j) · Σ_{l=1}^{k} P(z_l | d_i, w_j) · log[P(w_j | z_l) · P(z_l | d_i)]

After clustering the extracted sentences, we obtain the topic collection; within each topic, the extracted sentences are closely connected semantically.

B. Graph Representation and Similarity of the Ontology

1) The Representation of the Directed Graph of an Ontology

There are many formalized definitions of ontology; this paper adopts the definition of document [12].

Definition 1: An ontology can be defined as the tuple O = (C, I, P, Hc, R, A0), where C is the concept set, I is the instance set, P is the attribute set of concepts, Hc is the set of hierarchical relations among concepts, R is the set of other relations among concepts, and A0 is the set of ontology axioms. For r ∈ R, the domain of definition and the range of r are recorded respectively as r.dom = {c_i ∈ C | c_i r c} and r.ran = {c_j ∈ C | c r c_j}.

Definition 2: The directed labeled graph of an ontology O = (C, I, P, Hc, R, A0) is represented as G(O) = (V, E, L_V, L_E, φ, ψ), where: 1) the node set is V = C and the edge set is E ⊆ V × V; 2) φ: V → L_V is the mapping function from the node set to the node-label set; 3) ψ: E → L_E is the mapping from the edge set to the edge-label set.
For example, when φ: V → L_V assigns the concepts of the ontology to the nodes, and ψ: E → L_E assigns the hierarchical relations among concepts to solid arcs and the other relations R among concepts to dotted arcs, the resulting graph (Figure 3) can be regarded as a description of the ontology.

2) Similarity

Ontology semantic similarity is an important index for matching, and includes the edit distance of concepts, the node-base distance, the probability similarity of instances and the structural similarity. Scholars in the field have proposed many semantic similarity calculation methods; the edit-distance calculation is given in formula (1) above and needs no further explanation. The node-base distance between nodes is calculated as

Dist(A, B) = 1 − 2m / (n1 + n2)    (2)

where n1 and n2 are, respectively, the numbers of words in the labels of node A in ontology O1 and node B in ontology O2, and m is the number of overlapping words. The probability similarity of instances can be represented as

Sim(A, B) = P(A ∩ B) / P(A ∪ B) = P(A, B) / (P(A, B) + P(Ā, B) + P(A, B̄))    (3)

where P(A, B) is the probability of an instance belonging to concepts A and B at the same time, P(Ā, B) is the probability of an instance belonging to concept B but not concept A, and P(A, B̄) is the probability of an instance belonging to concept A but not concept B.

In structure-based ontology matching, graph matching is an NP-complete problem, so it is difficult to apply graph structure matching directly to ontology matching; this kind of method is therefore usually realized by calculating and matching the similarity of ontology structures. The general guiding idea is to infer the similarity of elements from the similarity of their adjacent elements in the graph; in other words, if the adjacent nodes of two nodes are similar, then the nodes themselves are similar. The core is similarity propagation. The two most typical structure-based matching algorithms, SF and GMO, share the core idea that concepts with similar parent/child concepts, and concepts with similar attributes, may themselves be similar. The similarity propagation of the Similarity Flooding algorithm spreads only to the adjacent nodes of matched concepts, while GMO propagates similarity globally.
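Formulas (2) and (3) can be sketched as follows (plain Python; it is assumed, following the text, that n1 and n2 are label word counts, and the function names are illustrative):

```python
def node_distance(label_a, label_b):
    """Formula (2): Dist = 1 - 2m/(n1 + n2), where n1, n2 are the word
    counts of the two node labels and m the number of shared words."""
    w1, w2 = label_a.lower().split(), label_b.lower().split()
    if not w1 and not w2:
        return 0.0  # two empty labels: treat as zero distance
    m = len(set(w1) & set(w2))
    return 1.0 - 2.0 * m / (len(w1) + len(w2))

def instance_similarity(p_ab, p_not_a_b, p_a_not_b):
    """Formula (3): Sim(A,B) = P(A,B) / (P(A,B) + P(notA,B) + P(A,notB)),
    i.e. P(A and B) over P(A or B)."""
    denom = p_ab + p_not_a_b + p_a_not_b
    return p_ab / denom if denom else 0.0
```

Identical labels give distance 0 and disjoint labels give distance 1; the instance similarity is a Jaccard-style ratio over instance probabilities.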
Figure 4. Ontology matching based on approximate subgraph isomorphism: Step 1, anchor selection and graph extraction (candidate anchors, anchor filtering, anchor-based subgraph extraction); Step 2, similarity computation and propagation (structural similarity propagation graph, structural similarity calculation, extraction of candidate approximately isomorphic subgraphs); Step 3, judgment of approximate subgraph isomorphism; Step 4, ontology matching based on the approximately isomorphic subgraphs (integrated ontology similarity calculation, matching).

C. The Learning Resource Ontology Matching Problem

Ontology matching is an effective way to resolve the ontology heterogeneity of digital learning resources. It judges semantic relations by calculating and analyzing the similarity between different learning resource ontologies, so as to achieve semantic compatibility or ontology mapping. In terms of matching granularity there is concept-concept, attribute-attribute and concept-attribute matching, among others. Given two ontologies A and B, if for every concept in A we can find a concept with the same or similar semantics in B, and vice versa, then A and B are in concept-concept matching. In this paper, the matching of digital learning resource ontologies refers to the process of discovering all semantic correspondences between the entities (concepts, attributes, relations and so on) of different ontologies.
This is described as follows.

Definition 3: An ontology matching of digital learning resources is a semantic correspondence represented as a four-element group (e1, e2, rel, sim), where e1 and e2 are, respectively, entities (concepts, attributes, instances, axioms and so on) of ontologies A and B; rel is the semantic relation between the entities, one of inclusion, non-inclusion, independence and equivalence; and sim ∈ [0, 1] is a measure of the degree of semantic equivalence of the entities.

1) Ontology Matching Method Based on Approximate Subgraph Isomorphism

The overall framework of the e-learning resource ontology matching method based on approximate subgraph isomorphism (SIOM) is shown in Figure 4. As the figure shows, SIOM is a sequential matcher with four main steps: anchor selection and graph extraction, similarity calculation of the graph structure, judgment of approximate subgraph isomorphism, and ontology matching based on the approximately isomorphic subgraphs.

2) Anchor Selection and Graph Extraction

An anchor, in this article, is the first pair of similar concepts whose match can be confirmed between the candidate ontologies A and B, appearing in the directed labeled graphs of the ontologies as the first pair of matched nodes. It is defined as follows.

Definition 4 (Anchor): Given two candidate matching ontologies A and B with corresponding graph structures G(A) and G(B), suppose that for a node x ∈ C_A in G(A) there is a node y ∈ C_B such that OM(x, y), i.e., concept x can match concept y. If, with (1) I(x) ⊆ I_A, P(x) ⊆ P_A, Hc(x) ⊆ Hc_A, R(x) ⊆ R_A, A0_A(x) ⊆ A0_A and (2) I(y) ⊆ I_B, P(y) ⊆ P_B, Hc(y) ⊆ Hc_B, R(y) ⊆ R_B, A0_B(y) ⊆ A0_B, it holds that

OM(I(x), I(y)) ∧ OM(P(x), P(y)) ∧ OM(R(x), R(y)) ∧ OM(Hc(x), Hc(y)) ∧ OM(A0_A(x), A0_B(y))    (4)

then (x, y) is called a pair of anchors of A and B, and x and y are anchor concepts.
According to the locations of the anchors in the hierarchical structures of the ontologies, there are nine situations: x and y are both root nodes of G(A) and G(B); x is the root node of G(A) and y an intermediate node of G(B); x is the root node of G(A) and y a leaf node of G(B); x is an intermediate node of G(A) and y the root node of G(B); x and y are both intermediate nodes; x is an intermediate node of G(A) and y a leaf node of G(B); x is a leaf node of G(A) and y the root node of G(B); x is a leaf node of G(A) and y an intermediate node of G(B); and x and y are both leaf nodes.

Definition 5: Given an ontology O and an anchor concept x of O, the ontology derived from the anchor can be represented as the tuple O^x = (C^x, I^x, P^x, Hc^x, R^x, A0^x), in which:

(1) C^x = {c ∈ C | (c Hc x) ∨ (x Hc c) ∨ (c R x) ∨ (x R c)} is the concept set;
(2) P^x and I^x, the restrictions of P and I to C^x, are the attribute set and the instance set;
(3) Hc^x, the restriction of Hc to C^x, is the set of hierarchical relations between concepts;
(4) R^x, the restriction of R to C^x, is the set of other relations between concepts.

Inference 1: Given an ontology O and the ontology O^x derived from its anchor concept x, with directed graph representations G(O) and G(O^x) respectively, we have

G(O^x) ⊆ G(O)    (5)

Proof: Inference 1 follows from Definitions 1, 2 and 5.
Inference 2: For the ontology O and the ontology O^x derived from its anchor concept x, with directed graph representations G(O) and G(O^x):

(1) if x is the root node of G(O), then G(O^x) = G(O);
(2) if x is not the root node of G(O), then G(O^x) ⊂ G(O); in particular, when x is a leaf node of G(O), G(O^x) degenerates to a single node of G(O).

Proof: Inference 2 follows from the analysis of the anchor concept's location in the hierarchical structure of the ontology and from Inference 1.

3) The Calculation of the Structural Similarity of the Directed Graphs of Ontologies

For candidate matching ontologies A and B and their directed graph representations G(A) and G(B), the similarity calculation of G(A) and G(B) consists of four parts: (1) the edit-distance similarity of nodes; (2) the similarity of the hierarchical relations between nodes; (3) the similarity of the other relations between nodes; (4) the similarity of the graph structure. Details are as follows.

(1) Edit-distance similarity: it is obtained by combining the concept similarity and the attribute similarity represented by a node. Let x and y be nodes of G(A) and G(B) respectively, let Sec(x, y) be the edit-distance similarity of the concepts of x and y, and let

Sep(x, y) = (2 / (|P^A| + |P^B|)) · Σ_{p ∈ P^A ∩ P^B} S(p(x), p(y))

be the edit-distance similarity of the common attributes of x and y, both computed with formula (1). The similarity between nodes x and y is then

Se(x, y) = α · Sec(x, y) + β · Sep(x, y)    (6)

where α and β are weight adjustment coefficients with 0 ≤ α, β ≤ 1 and α + β = 1.

(2) Similarity of the hierarchical relations between nodes: let x_in = {x_j ∈ V(A) | x_j Hc x} be the set of parent nodes and x_out = {x_j ∈ V(A) | x Hc x_j} the set of child nodes of x under the hierarchical relation in G(A), and let y_in and y_out be the corresponding sets of the candidate node y in G(B). Let x_in ⊓ y_in = {x | x ∈ x_in, ∃y ∈ y_in : OM(x, y)} be the set of matchable nodes among the parent nodes in hierarchical relation with x and y, and x_out ⊓ y_out the analogous set of matchable child nodes. The similarity of the hierarchical relations is then

S_Hc(x, y) = (|x_in ⊓ y_in| + |x_out ⊓ y_out|) / (|x_in| + |x_out| + |y_in| + |y_out| − |x_in ⊓ y_in| − |x_out ⊓ y_out|)    (7)

(3) Similarity of the other relations between nodes: the node sets that have relations with x and y are recorded respectively as

x^R = {x' ∈ V(A) | ∃r ∈ R^A : (x' r x) ∨ (x r x')}
y^R = {y' ∈ V(B) | ∃r ∈ R^B : (y' r y) ∨ (y r y')}

If r1 ∈ R^A and r2 ∈ R^B, the corresponding related nodes must match in both directions:

((x' r1 x) ∧ (y' r2 y) → OM(x', y')) ∧ ((x r1 x') ∧ (y r2 y') → OM(x', y'))    (8)

Recording the node set satisfying formula (8) as x^r ⊓ y^r, and introducing weight adjustment coefficients λ_r with 0 ≤ λ_r ≤ 1 and Σ_r λ_r = 1, the similarity of the other relations between nodes is

S_R(x, y) = Σ_{r ∈ R^A ∩ R^B} λ_r · |x^r ⊓ y^r| / (|x^r| + |y^r| − |x^r ⊓ y^r|)    (9)

(4) Similarity of the graph structure: for the candidate ontologies A and B and a pair of anchors (x, y) of A and B, the similarity between the directed graphs G(x) and G(y) of the ontologies derived from x and y is

S(G(x), G(y)) = α · Se(x, y) + β · S_Hc(x, y) + γ · S_R(x, y)    (10)

where α, β and γ are weight adjustment coefficients with α + β + γ = 1.
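The hierarchical-relation similarity admits a Jaccard-style reading: matched parent and child pairs over the remaining neighbourhood. The following sketch works under that assumption (plain Python; `matched` is a hypothetical predicate standing in for the OM judgment):

```python
def hierarchy_similarity(x_in, x_out, y_in, y_out, matched):
    """Jaccard-style reading of the hierarchical-relation similarity:
    (matched parents + matched children) divided by all parents and
    children of x and y minus the matched ones.  `matched(a, b)` decides
    whether two neighbour nodes correspond (stands in for OM)."""
    in_hits = sum(1 for a in x_in if any(matched(a, b) for b in y_in))
    out_hits = sum(1 for a in x_out if any(matched(a, b) for b in y_out))
    denom = (len(x_in) + len(x_out) + len(y_in) + len(y_out)
             - in_hits - out_hits)
    return (in_hits + out_hits) / denom if denom else 0.0
```

Identical neighbourhoods score 1.0 and disjoint neighbourhoods 0.0, so the measure behaves like the other normalized similarities of this section.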
4) Ontology Matching Algorithm Based on Approximate Subgraph Isomorphism

Definition 6: If there is a one-to-one correspondence between the points and between the edges of two directed graphs G and G', and corresponding points and edges keep the same relations, then G and G' are isomorphic, recorded as G ≅ G'. Because a strict one-to-one correspondence is in general difficult to achieve in ontology matching, we judge a match as soon as the similarity of the ontologies satisfies a threshold; this is why this paper proposes the concept of approximate isomorphism of ontology graph structures.

Definition 7: Given the tagged ontology A and a candidate matching ontology B, with directed graph representations G(A) and G(B), if

(1) for the root node a of G(A) there is a node b in G(B) such that (a, b) is a pair of anchors of A and B;
(2) for G(A) and the directed graph G(B^b) derived from the anchor concept b, |V(A)| ≤ |V(B^b)| and |E(A)| ≤ |E(B^b)|; for every x ∈ V(A) there is y ∈ V(B^b) with OM(x, y), and for every e ∈ E(A) there is e' ∈ E(B^b) with OM(e, e');
(3) for the set matching threshold δ, S(G(A), G(B^b)) ≥ δ;

then A and B are approximately graph isomorphic, recorded as G(A) ≈ G(B).

Based on approximate subgraph isomorphism, the main idea of the SIOM algorithm is to traverse the graph first in breadth-first order.

TABLE I. PSEUDO-CODE OF THE MAIN OPERATIONS OF THE ALGORITHM

Algorithm OM(A, B)
Input: A, B, G(A), G(B), root node a, threshold δ
Output: T or F
  anchor(a, b); generate B^b; get G(B^b) from G(B)
  node-add(N^a, N^b); arc-add(E^a, E^b)
  while N^a ≠ ∅ do
    for x ∈ N^a: select y ∈ N^b s.t. Se(x, y) ≥ δ_e
      for each arc e ∈ E^a related to node x
        for each arc e' ∈ E^b related to node y in E(B^b)
          map(x → y)
      calculate S_Hc(x, y), S_R(x, y)
      calculate S(G(x), G(y))
      generate subgraphs G^x(A), G^y(B^b)
      test = DAI(G^x(A), G^y(B^b))
  if N^a = ∅ then OM(A, B) = T else OM(A, B) = F
end
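The Table I loop can be given a simplified executable reading. The sketch below (plain Python, hypothetical names) replaces the full alternating node/arc matching with a greedy choice, mapping nodes breadth-first from an anchor pair; graphs are plain dicts from a node to its successor list:

```python
from collections import deque

def approx_subgraph_match(g_a, g_b, anchor, node_sim, threshold=0.8):
    """Starting from an anchor pair (a, b), walk G(A) breadth-first and
    greedily map each node to its most similar unmapped successor
    neighbour in G(B).  Returns (ok, mapping): ok is True when every
    node of G(A) found a partner with similarity >= threshold, a rough
    stand-in for the approximate-isomorphism test of Definition 7."""
    a0, b0 = anchor
    mapping = {a0: b0}
    queue = deque([a0])
    while queue:
        x = queue.popleft()
        y = mapping[x]
        for nx in g_a.get(x, []):
            if nx in mapping:
                continue
            # candidate partners: unmapped successors of y in G(B)
            cands = [ny for ny in g_b.get(y, []) if ny not in mapping.values()]
            best = max(cands, key=lambda ny: node_sim(nx, ny), default=None)
            if best is not None and node_sim(nx, best) >= threshold:
                mapping[nx] = best
                queue.append(nx)
    all_nodes = set(g_a) | {n for succs in g_a.values() for n in succs}
    return len(mapping) == len(all_nodes), mapping
```

A failed greedy choice simply leaves a node unmapped here; the algorithm described in the text instead iterates the process until its convergence requirements are met.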
After the anchor nodes of the match are decided, the nodes of the two graphs are matched alternately, based on the in-degree and out-degree of each node, to find a subgraph of the candidate matching ontology graph G(B) that is approximately isomorphic to G(A). The key steps are: first, determine the anchor node b in G(B) corresponding to the root node a of G(A); then generate the anchor-derived ontology B^b and its directed graph representation G(B^b); next, judge the approximate graph isomorphism between G(A) and G(B^b). If they satisfy the approximate isomorphism relation, then A and B match; otherwise, iterate the above process until the convergence requirements are met.

IV. THE REPRESENTATION AND ANALYSIS OF LEARNING RESOURCE ONTOLOGY

A. The Ontology of Digital Learning Resources

Taking course ontology construction as an example, we illustrate the constituent elements of a digital learning resource ontology. A course usually contains many elements such as knowledge points, exercises, cases and question answering. A knowledge point is the basic unit obtained by decomposing the course according to the syllabus, and it constitutes a logically independent learning resource. According to practical teaching experience and learning rules, the main relations between knowledge points are:

Pre/suc relation: if knowledge point B must be learned before knowledge point A, then B is the precursor of A and A is the successor of B.

Include-of relation: if knowledge point A is constituted by knowledge points of smaller granularity A1, A2, …, which are themselves logical units that can also be used independently, then there is an include-of relation between A and each of A1, A2, ….
For a knowledge point A, if A contains other knowledge points, we call A a compound knowledge point; if A contains no knowledge point of smaller granularity, we call A a meta-knowledge point. In particular, if A and B have exactly the same precursor/successor knowledge points and their contents are completely consistent, we regard A and B as equivalent.

Related-to relation: if knowledge points A and B both contain the knowledge point C, then A and B have a related-to relation.

Quoted-of relation: if the content of knowledge point A involves the content of knowledge point B, but A and B do not belong to the same field, then there is a quoted-of relation between A and B.

Among the above relations, the pre/suc relation, the include-of relation and the quoted-of relation are transitive; the related-to relation is symmetric and reflexive. In addition, the traditional Instance-of and Attribution-of relations are also adopted in the ontology of knowledge points. According to the above analysis, we define the ontology of knowledge points as follows:
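Since the pre/suc, include-of and quoted-of relations are stated to be transitive, the implied relations can be derived mechanically. A minimal sketch (the pair encoding is an illustrative choice, not the paper's representation):

```python
def transitive_closure(rel):
    """Transitive closure of a transitive knowledge-point relation
    (pre/suc, include-of or quoted-of).

    rel: set of (a, b) pairs meaning "a relates to b".
    """
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        # add (a, d) whenever (a, b) and (b, d) are already present
        new = {(a, d) for (a, b) in closure for (c, d) in closure
               if b == c and (a, d) not in closure}
        if new:
            closure |= new
            changed = True
    return closure
```

For example, if A1 includes A2 and A2 includes A3, the closure also records that A1 includes A3.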
[Figure 5. The ontology of the knowledge point "DNS server configuration": nodes such as "The DNS configuration example", "DNS assigned experiment", "DNS is applied to network management experiment", "DNS common terms", "Distributed DNS", "Centralized DNS", "Private DNS configuration experiment", "DNS domain name resolution", "DNS backup and restore test", "DNS lookups configuration", "DNS resource allocation", "A forward lookup zone configuration", "A reverse lookup zone configuration" and "DNS resource records format", connected by the contains, precursor, subsequent, correlation, reference, property and instance relations. Each knowledge point carries attributes (knowledge point ID, name, theme, property list, content description, subordinate subject, difficulty coefficient, importance, instance, precursor/subsequent/contained/relevant knowledge, reference points) typed as Number, String, Txt, Float, Object or Ontology.]

[Figure 6. The ontology model of the learning resources of commonly used servers (chapter five): the DNS, DHCP, WWW, E-mail, FTP, video and certificate server configurations, connected by contains and correlation relations.]

Definition 8: a knowledge ontology (KO) can be represented as a 7-tuple:

    KO(name) = ⟨id, name, define, function, content, includedKO, RKO⟩   (11)

Figure 6 shows the framework of the corresponding learning resource ontology.

B. The Representation of the Ontology Matching Process of Knowledge Points

For convenience of representation, we simplify the ontology of knowledge points shown in figure 5 and abstract it into the directed graph shown in figure 7, recorded as the tag ontology Q.
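The 7-tuple of Definition 8 maps naturally onto a record type. A minimal sketch; the field types and the example values are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class KO:
    """Knowledge ontology per Definition 8:
    the 7-tuple (id, name, define, function, content, includedKO, RKO)."""
    id: str
    name: str
    define: str = ""
    function: str = ""
    content: str = ""
    includedKO: list = field(default_factory=list)   # include-of relations
    RKO: dict = field(default_factory=dict)          # other relation sets

# Hypothetical instance for the "DNS server configuration" knowledge point
dns = KO(id="5.1", name="DNS server configuration",
         includedKO=["A forward lookup zone configuration",
                     "A reverse lookup zone configuration"],
         RKO={"pre": ["DNS common terms"],
              "related": ["The DHCP server configuration"]})
```

The relation sets in `RKO` are what the matching algorithm turns into the arcs of the directed graph G(Q).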
The mark number in a node represents an attribute; the mark number on a directed arc represents the requirement of the relation between nodes. We give a candidate ontology Q′, as shown in figure 7. The first step of the algorithm is to select and match a pair of anchor concept nodes ⟨c, B⟩, as shown in figure 8.

Here id, name, define, function, content, includedKO and RKO are respectively the number, name, definition, function, content description, included-knowledge set and relation set of the knowledge point's KO. According to definition 8, taking the lesson "network management" as an example, we build the corresponding ontology of knowledge points shown in figure 5. It covers the knowledge point "DNS server configuration" in chapter 5 of "Network Management from Entry to Master" by Cui Beiliang et al. Provided that the ontology of the knowledge points contained in the chapter is built on a sound base, the corresponding learning resource ontology can be organized as in figure 6.

[Figure 7. (1) The directed-graph representation of the tag ontology Q (nodes a-f with attribute marks and weighted arcs); (2) the corresponding representation of the candidate ontology Q′ (nodes A-J).]

Beginning with the anchor concept node, we generate the first subgraphs of Q and Q′ in order, as shown in the figure. We calculate and judge the matching of the nodes, edges and structure of the subgraphs and achieve the first matching of Q and Q′. In the matched ontology subgraph, we then match another pair of anchor concept nodes and repeat this process until the overall graphs match or no match is found, at which point the algorithm stops, as shown in figure 8. In the above process, the graph representation of ontology Q achieves approximate isomorphism with the graph representation of ontology Q′, so we conclude that ontology Q′ can match ontology Q.

[Figure 8. (i) The matching of the first pair of anchor concepts; (ii) the spanning graphs of Q and Q′ based on the anchor ⟨c, B⟩; (iii) the complete matching of Q and Q′.]

C. The Analysis of the Algorithm's Time Complexity

In the pseudo-code description given in table 1, the operation scale of the three layers of nested loops of the approximate subgraph isomorphism decides the time complexity of the algorithm. Suppose that in the graph representation G(Q) of ontology Q, |V(Q)| = n and |E(Q)| = m, and in the graph representation G(Q′) of ontology Q′, |V(Q′)| = N and |E(Q′)| = M. Then, to finish the matching of ontologies Q′ and Q, the unit time needed for the main calculations is respectively:

(1) the time needed for matching the first pair of anchor nodes is n·N;
(2) for a node pair ⟨x, y⟩, the time needed for matching the edges is |E(x)|·|E(y)|;
(3) for the isomorphism judgment of the subgraphs G(x) and G(y), the time needed is C(|V(G(x))|, 2) · C(|V(G(y))|, 2) · |E(x)| · |E(y)|.

When the numbers of nodes and edges are respectively n, N and m, M, the scale of the total time of the main operations is

    [n(n−1)/2] · [N(N−1)/2] · m · M

As a result, the time complexity of the algorithm is of order O(n^6). It is an effective algorithm.

V. CONCLUSION

Digital learning resource ontologies are often built on different specifications, which makes it hard to find resources by linguistic ontology matching methods.
The existing structural matching methods fail to solve the calculation of structural similarity well, so this paper proposes an ontology matching method based on subgraph isomorphism. It alternately matches the points and edges in the directed-graph representation of the ontologies, calculating the overall similarity of the graph structures, and achieves ontology matching through the judgment of subgraph isomorphism. The method aims to find approximate subgraphs efficiently, improving the accuracy and efficiency of ontology matching.

ACKNOWLEDGEMENTS

This work was supported by the fund "Study and Practice of the Targeted Public English Educational Pattern under the Concept of Outstanding Talents Education" (G2012010681).

REFERENCES

[1] Rifaieh R, Benharkat N A. Query-based data warehousing tool. Proc. of the 5th ACM International Workshop on Data Warehousing and OLAP. New York: ACM, 2002, pp. 35-42.
[2] Hashemi Golpayegani S A, Emamizadeh B. Designing work breakdown structures using modular neural networks. Decision Support Systems, 2007, 44(11), pp. 202-222.
[3] Ahmed A K M Z Q, Latif Shahriar, Rokonuzzaman M. Process centric work breakdown structure of software coding for improving accuracy of estimation, resource loading and progress monitoring of code development. Proc. of the 12th International Conference on Computer and Information Technology (ICCIT 2009), Dhaka, Bangladesh, 21-23 December 2009.
[4] Núñez S M, de Andrés Suárez J, Gayo J E L, et al. A semantic based collaborative system for the interoperability of XBRL accounting information. Emerging Technologies and Information Systems for the Knowledge Society. Springer Berlin Heidelberg, 2008, pp. 593-599.
[5] García R, Gil R. Publishing XBRL as linked open data. CEUR Workshop Proceedings, 2009, 538.
[6] Spies M. An ontology modelling perspective on business reporting.
Information Systems, 2010, 35(4), pp. 404-416.
[7] O'Riain S, Curry E, Harth A. XBRL and open data for global financial ecosystems: a linked data approach. International Journal of Accounting Information Systems, 2012, 13(2), pp. 141-162.
[8] Hodge F D, Kennedy J J, Maines L A. Does search-facilitating technology improve the transparency of financial reporting? The Accounting Review, 2004, 79(3), pp. 687-703.
[9] Bartley J, Chen A Y S, Taylor E Z. A comparison of XBRL filings to corporate 10-Ks: evidence from the voluntary filing program. Accounting Horizons, 2011, 25(2), pp. 227-245.
[10] Barzilay R, Elhadad N, McKeown K. Sentence ordering in multidocument summarization. Proc. of the First International Conference on Human Language Technology Research, San Diego, 2001, pp. 79-82.
[11] Barzilay R, Elhadad N, McKeown K. Inferring strategies for sentence ordering in multidocument news summarization. Journal of Artificial Intelligence Research, 2002, 17(2), pp. 35-55.
[12] Lapata M. Probabilistic text structuring: experiments with sentence ordering. Proc. of the Annual Meeting of ACL 2003, 2003, pp. 545-552.
[13] Okazaki N, Matsuo Y, Ishizuka M. Improving chronological sentence ordering by precedence relation. Proc. of the 20th International Conference on Computational Linguistics (COLING 04), Geneva, Switzerland, August 2004, pp. 750-756.
[14] Bollegala D, Okazaki N, Ishizuka M. A bottom-up approach to sentence ordering for multi-document summarization. Proc. of ACL-COLING 2006, 2006, pp. 134-137.
[15] Bollegala D, Okazaki N, Ishizuka M. A machine learning approach to sentence ordering for multi-document summarization. Proc. of the Annual Meeting of the Association for Natural Language Processing, 2005, pp. 1381-1384.
[16] Xie Z, Li X, Di Eugenio B, Xiao W, Tirpak T M, Nelson P C. Using gene expression programming to construct sentence ranking functions for text summarization. Proc. of the 20th International Conference on Computational Linguistics, 2004, pp. 1381-1384.
[17] Zhang J, Ackerman M S, Adamic L. Expertise networks in online communities: structure and algorithms. Proc. of the 16th International World Wide Web Conference, 2007.
[18] Abdul-Rahman A, Hailes S. Supporting trust in virtual communities. Proc. of the Hawaii International Conference on System Sciences, 2000.
[19] Suh B, Pirolli P L, Canini K R. Finding credible information sources in social networks based on content and social structure. Proc. of the 2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing, 2011, pp. 978-985.
[20] Ritter A, Cherry C, Dolan B. Unsupervised modeling of Twitter conversations. Proc. of NAACL, 2010.
[21] Alonso O, Carson C, Gerster D, Ji X, Nabar S. Detecting uninteresting content in text streams. SIGIR Crowdsourcing for Search Evaluation Workshop, 2010.
[22] Girvan M, Newman M. Community structure in social and biological networks. Proc. of the National Academy of Sciences, vol. 99, 2002, pp. 7821-7826.

Zhang Lili was born in Sichuan province, China, on May 2, 1976. He received his bachelor's degree from Southwest Petroleum University, China, in 2000, and his master's degree from the University of Electronic Science and Technology of China in 2008.

Jinghua Ding is a graduating doctoral student at Sungkyunkwan University. He regularly reviews papers for well-known journals and conferences. His research interests are in M2M communications, cloud computing, machine learning and wireless networks.
Trains Trouble Shooting Based on Wavelet Analysis and Joint Selection Feature Classifier

Yu Bo
Beijing Jiaotong University, School of Traffic and Transportation, Beijing, China
Email: [email protected]

Jia Limin*, Ji Changxu, and Lin Shuai
Beijing Jiaotong University, State Key Laboratory of Rail Traffic Control and Safety, Beijing, China
*Corresponding author. Email: [email protected], [email protected], [email protected]

Yun Lifen
Mississippi State University, Civil and Environmental Engineering, Mississippi State, USA
Email: [email protected]

Abstract—Starting from the running status of urban trains, that is, the running status of the constraint, air spring and lateral damper components and the vibration signals of the vertical acceleration of the vehicle body, and combining the characteristics of urban train operation, this paper builds an optimized train operation adjustment model and puts forward a corresponding estimation method for the train state: the wavelet packet energy moment. First, we analyze the characteristics of the body's vertical vibration, conduct wavelet packet decomposition of the signals under different conditions and different speeds, and reconstruct the band signals with larger energy. We introduce the hybrid idea into the particle swarm algorithm, establish a fault diagnosis model, solve it with the improved particle swarm algorithm, and give the specific solution steps. We then calculate the wavelet packet energy moment feature of each band; changes of the wavelet packet energy moment over the frequency bands reflect changes of the train operation state. Finally, the wavelet packet energy moments of the different frequency bands are composed into a feature vector fed to support vector machines for fault identification.

Index Terms—Wavelet Packet Energy Moments; Support Vector Machine; Train Operation Adjustment; Monitoring Data; Urban Trains

I. INTRODUCTION

With increased speed, the running stability and comfort of trains need to be improved.
When trains run at high speed, track irregularity inputs make the train body produce swaying, rolling and yawing motions, which combine into lateral acceleration of the body, affecting lateral stability and reducing ride comfort. Active and semi-active lateral suspensions are often used to reduce lateral vibration. Therefore, studying the relation between track irregularity and lateral vibration has important theoretical and practical value for improving the lateral stability of trains, improving the damping effect of the suspension, and estimating the transformation law of lateral vibration [1].

© 2014 ACADEMY PUBLISHER  doi:10.4304/jmm.9.2.207-215

Train operation adjustment is a non-linear multi-objective combinatorial optimization problem, which is known to be NP-hard [2]. As urban rail transit train speeds and driving density increase, train operation adjustment becomes more complicated. Therefore, studying adjustment methods that fit the characteristics of urban rail transit train operation is significant for optimal operation and for improving the quality of train dispatching. Domestic and foreign experts and scholars have done a lot of research on train operation adjustment; simulation, operations research, fuzzy decision making, expert systems and other methods have been applied in the solution process [3] and have achieved certain results. Urban rail transit train operation adjustment, as the core of vehicle dispatching work, determines the merits of the train running order [4]. Many researchers have also studied the relationship between vehicle acceleration and track irregularity inputs and achieved certain results.
For example, the literature [5] studies the random vibration characteristics of the vehicle-track coupling system based on a vertical-lateral vehicle-track coupling model, proving that lateral vibration signal energy is concentrated in 1-2 Hz. The literature [6] uses the power spectral density to study the effects of track level and alignment irregularity on random vehicle vibration; the results show that the train is mainly influenced by the alignment and level irregularities of the rail and that the response is low-frequency vibration. In order to extract the low-frequency track irregularity signal, the literature [7-9] uses the wavelet transform to analyze the track irregularity signal. The literature [10] uses the wavelet transform to process the track irregularity and vertical acceleration signals collected by a comprehensive test car, and analyzes a certain band to determine the relationship between track irregularity and vertical acceleration. But the lateral vibration has components of swaying, yawing and rolling, while the relative track irregularity inputs include level and alignment, so we need to further explore the relationship between the vibration components and the input irregularities [11]. Meanwhile, the cross-correlation function reflects the relationship between signals, so we can combine the wavelet transform with the cross-correlation function method. Thus, we first use Simulink to build a 17-degree-of-freedom lateral suspension model to produce the vibration signals of swaying, rolling and yawing, and then use the wavelet transform and the cross-correlation function to analyze the relationship between these three kinds of vibration components and the track level, alignment and irregularity inputs [12].
Sensors can monitor a large amount of vibration data while a high-speed train is running, and different running statuses show different data characteristics, so methods based on the characteristics of the monitoring data are important for characterizing the security state of high-speed trains and for state estimation [13]. In recent years, many scholars have proposed optimization algorithms to solve train operation adjustment problems, mainly genetic algorithms and particle swarm optimization (PSO) [14]. Although genetic algorithms are widely applicable, they have shortcomings in finding the optimal solution, such as a complex, time-consuming coding process, slow convergence and poor local search capability; and because train operation has many constraints and a large search space, the standard particle swarm algorithm converges prematurely and has difficulty obtaining the optimal solution [15]. Based on the PSO algorithm, Angeline proposed the hybrid particle swarm algorithm, an improved algorithm that introduces the hybrid (crossover) idea of genetic algorithms into PSO, enhancing the searching capacity of the algorithm and making it less likely to fall into a local optimum. Therefore, it is urgent to propose a fast optimization method based on a hybrid particle swarm algorithm to solve the urban rail transit train operation adjustment problem. Train fault diagnosis simulation includes two key elements: feature selection and classifier design. Besides useful features, redundant and useless features arise during extraction of the train status feature set; they increase the learning time of classifiers and adversely affect the diagnostic results [16]. To this end, a number of feature selection algorithms for train troubleshooting have been put forward, such as association rule selection, genetic algorithms, simulated annealing, particle swarm optimization and rough set algorithms [17].
In addition to feature selection, the simulation results of train fault diagnosis are also associated with the fault classifier. The current analog train fault diagnosis models are mainly Bayesian networks, the K-nearest neighbor method, neural networks and support vector machines [18]. The nonlinear approximation ability of neural networks is superior, but their network structure is complex and they easily fall into local minima [19]. The least squares support vector machine (LSSVM) classifier better overcomes defects such as the over-fitting of neural networks and the slow training of the standard SVM, and is broadly used in simulated fault diagnosis. So we choose LSSVM as the classifier for train fault diagnosis; however, the classification performance of LSSVM is closely related to its parameters, and mainly genetic algorithms, simulated annealing and particle swarm optimization are used to select the LSSVM parameters [20]. When the train is running, its key components [21] may become faulty; the vibration signal [22] monitored by the sensors directly reflects whether the operating state is normal or not. The vibration signal is mainly nonlinear and non-stationary, and wavelet analysis, with its strong local analysis capability, has significant advantages [23] compared to the short-time Fourier analysis and the Fourier transform. Through expansion and translation of the wavelet function, the time-frequency window can be adjusted according to the signal frequency; moreover, wavelet packet decomposition further decomposes the high-frequency bands that the wavelet decomposition leaves undecomposed, improving frequency resolution [24].
The main innovations are the following:

(a) Compared to the normal state, when a critical component of the train fails, the dominant frequency changes: the energy of some bands increases while that of others decreases, so a mapping relationship exists between the band energy and the fault condition. To evaluate, based on the monitoring data, the running status of the high-speed train's air springs, shock absorbers and other key components from the vertical acceleration vibration signal of the body, this paper proposes the wavelet packet energy moment method for estimating the train state. The wavelet packet energy moment is used for feature extraction, and support vector machines are used for state estimation. Experimental results show that this method can extract the initial fault characteristics of the train's key components and that the fault recognition rate is high.

(b) First, we analyze the characteristics of the body's vertical vibration, conduct wavelet packet decomposition of the signals under different conditions and different speeds, and reconstruct the band signals with larger energy; we then calculate the wavelet packet energy moment feature of each band. Changes of the wavelet packet energy moment over the frequency bands reflect changes of the train's running state. The wavelet packet energy moments of the different bands are composed into feature vectors, and simulation analysis of the experimental data shows that the recognition rates for the air spring losing gas and for lateral damper failure are high, which shows that this method can estimate the fault condition of a high-speed train well.

(c) Considering that train operation adjustment has many constraints and that solving the problem is difficult, this paper combines the characteristics of urban rail transit train operation to establish an optimized train operation adjustment model.
In order to improve the accuracy of fault diagnosis, this paper takes the intrinsic link between feature selection and the LSSVM parameters into consideration and proposes a fault diagnosis model with joint selection of features and LSSVM parameters. The simulation results show that the proposed model improves the accuracy and efficiency of fault diagnosis and can meet the requirements of simulative train fault diagnosis.

II. PROPOSED METHOD

A. Adjustment Model for Train Running Status

1) Wavelet Packet Energy Moment

The actual operation of the vehicle is mainly affected by the track irregularity excitation, which is a major source of the various generated vibrations. With increased speed, the vertical acceleration increases and affects the vehicle body through the frame, exciting elastic vibration of the vehicle body at higher frequencies; in turn, the body affects the frame through the springs, affecting the dynamic performance of the train. The air spring and the lateral damper, as key components of the secondary and primary suspension of the train system, generate abnormal vibration when a failure occurs. It is known from practice that the body vibration frequency is mainly concentrated in the low frequency range; the main vertical vibration is generally below 4 Hz. Wavelet packet decomposition has a great advantage for extracting the subtle characteristics under different faults, and this paper also proposes a wavelet packet energy moment algorithm that can reflect the energy changes of different faults in different bands. The so-called wavelet packet is a family of functions that constructs an orthonormal basis library of L²(R); after wavelet packet decomposition, the signal is decomposed into neighbouring frequency bands without leakage or overlapping, and the frequency range of band n is

    [(n−1)·f_s/2^{j+1}, n·f_s/2^{j+1}],  n = 1, 2, …, 2^j (here 2^j = 8)

where f_s is the sampling frequency and j is the decomposition level.
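The band partition above is easy to compute directly. A minimal sketch (the function name is illustrative):

```python
def wp_bands(fs, j):
    """Frequency ranges of the 2**j bands after a j-level wavelet packet
    decomposition of a signal sampled at fs: in the ideal (leakage-free)
    case, band n covers [(n-1)*fs/2**(j+1), n*fs/2**(j+1))."""
    width = fs / 2 ** (j + 1)
    return [((n - 1) * width, n * width) for n in range(1, 2 ** j + 1)]
```

With the sampling frequency of 243 Hz used later in the paper and j = 3, this yields 8 contiguous bands of width 243/16 ≈ 15.19 Hz covering 0 to fs/2.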
Most vertical vibration generated when a high-speed train runs is a combination of typical complex vibrations such as pitching, rolling and bouncing. Acceleration sensors mounted on the train bogie can monitor the energy distribution characteristics of the signals over the different frequency bands under different conditions. The traditional wavelet energy methods do not consider the distribution of the energy of each decomposed band along the timeline, so the extracted feature parameters cannot accurately reflect the nature of the fault; this paper therefore introduces time-weighted energy parameters [8, 9]. The energy moment M_ij of each band signal S_ij is

    M_ij = Σ_{k=1}^{n} (k·Δt)·|S_ij(k·Δt)|²

where Δt is the sampling interval, n is the total number of samples and k is the sampling point. The steps of the energy moment algorithm are:

(1) Conduct wavelet packet decomposition of the body's vertical vibration signal; let S represent the original signal and X_jk be the wavelet packet decomposition coefficient of the signal at scale j and time k.
(2) Reconstruct the wavelet packet decomposition coefficients to obtain the band signals S_jk.
(3) Compute the wavelet packet energy moment M_j of each band signal S_jk.
(4) Construct the feature vector and normalize it:

    T = [M_1, M_2, …, M_n] / (Σ_{j=1}^{n} M_j²)^{1/2}

2) Fault Diagnosis Model

According to changes in the proportion of each frequency band's energy moment, the train running status can be monitored. Train operation adjustment means that when the running train is disturbed and its actual operation deviates from the scheduled chart, the train operation plan is re-adjusted so that the actual train running is as close as possible to the scheduled chart [5].
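Steps (3) and (4) above can be sketched directly from the formulas. A minimal sketch that takes already-reconstructed band signals as plain sequences (obtaining them, step (1)-(2), would require a wavelet library and is omitted):

```python
import math

def energy_moment(band_signal, dt):
    """Wavelet packet energy moment of one reconstructed band signal:
    M = sum over k of (k*dt) * |S(k*dt)|**2, with k = 1..n."""
    return sum(k * dt * s * s for k, s in enumerate(band_signal, start=1))

def feature_vector(band_signals, dt):
    """Normalized feature vector T = [M_1, ..., M_n] / sqrt(sum_j M_j**2)."""
    m = [energy_moment(s, dt) for s in band_signals]
    norm = math.sqrt(sum(mi * mi for mi in m))
    return [mi / norm for mi in m]
```

The time weight k·dt is what distinguishes the energy moment from the plain band energy: two bands with equal total energy but differently timed bursts get different moments.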
Train operation adjustment is a multi-constrained combinatorial optimization problem, usually expressed in the following abstract form [6]:

Equation of state:

    G(j+1) = T(G(j))   (1)

Optimization objective set: Object(1) and Object(2) ... and Object(n)
Constraint set: Restraint(1) and Restraint(2) ... and Restraint(n)

Here G(j) is the train running status at time j, and T is the state transition operator decided by the adjustment strategy of the running train.

For the given feature set of the analog train state S = (s_1, s_2, …, s_n), s_i ∈ {0, 1}, i = 1, 2, …, n, where n is the size of the feature set and 1 and 0 denote whether the corresponding feature is selected or not, the ultimate goal of feature selection is to improve the fault diagnostic accuracy G of the simulated train. The mathematical model of feature selection is therefore:

    max_S G(S)
    s.t. S = (s_1, s_2, …, s_n), s_i ∈ {0, 1}, i = 1, 2, …, n   (2)

We use particle swarm optimization to solve this multi-feature combinatorial selection problem for the simulated train: a particle's bit string represents the selected feature subset S, and the PSO fitness function is the fault diagnostic accuracy of the analog train. When calculating the fitness value of each particle, we first learn the training set according to the selected features S and calculate the fault diagnosis accuracy G of the analog train's classifier, but the classifier (LSSVM) parameters need to be given before calculating G. In designing the train fault classifier based on LSSVM, we must determine the kernel function and its parameters.
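The shape of model (2) is easy to see in code. A toy sketch only: `accuracy` is a hypothetical stand-in for the LSSVM diagnostic accuracy G(S), and the search is plain random sampling rather than the paper's PSO, just to show the bit-string encoding of S:

```python
import random

def select_features(n, accuracy, iters=200, seed=0):
    """Toy search for max_S G(S) over S in {0,1}^n: keep the best bit
    string found.  `accuracy` stands in for the classifier's G(S)."""
    rng = random.Random(seed)
    best_s, best_g = None, -1.0
    for _ in range(iters):
        s = [rng.randint(0, 1) for _ in range(n)]   # candidate subset S
        g = accuracy(s)
        if g > best_g:
            best_s, best_g = s, g
    return best_s, best_g
```

In the paper's actual model, each evaluation of `accuracy(s)` would train and validate an LSSVM on the features selected by `s`, which is why the LSSVM parameters must be fixed first.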
Several kernel functions are available for LSSVM; a large number of studies have shown that, in the absence of prior knowledge of the process, LSSVM with the radial basis function (RBF) kernel generally outperforms the other kernels, so we choose the RBF kernel, defined as follows:

    K_RBF(u, v) = exp(−‖u − v‖² / (2σ²))   (3)

where u and v are two vectors of the input space and σ is the width of the kernel function. Besides the RBF kernel parameter σ, the LSSVM classification performance is also related to the regularization parameter γ. Combining the kernel function and its related parameters, the classifier parameter selection model based on LSSVM is:

    M = ⟨σ, γ⟩   (4)

Taking the analog train fault diagnostic accuracy G as the target for selecting the classification parameters, the mathematical model of LSSVM classifier parameter selection is:

    max_M G(M)
    s.t. M = ⟨σ, γ⟩, σ ∈ (σ_min, σ_max), γ ∈ (γ_min, γ_max)   (5)

As with the combinatorial optimization of the train simulator's troubleshooting features, formula (5) is solved with PSO. A particle's bit string represents the parameters M of the LSSVM, and the fitness function is the fault diagnostic accuracy of the analog train. When calculating a particle's fitness value, the LSSVM learns the training set according to the parameters M, and then the simulated train fault diagnosis accuracy of the classifier is calculated, but the feature subset S needs to be given before calculating G.
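Formula (3) translates directly into code. A minimal sketch (the function name is illustrative):

```python
import math

def k_rbf(u, v, sigma):
    """RBF kernel of formula (3): exp(-||u - v||**2 / (2 * sigma**2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))
```

The width σ controls how fast similarity decays with distance, which is exactly why it must be tuned jointly with the regularization parameter γ rather than fixed in advance.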
In the current modeling process of train simulator fault diagnosis, the intrinsic link between feature selection and the LSSVM parameters has not been considered; the two are chosen independently, which has some drawbacks. Firstly, it cannot be determined whether feature selection or LSSVM parameter selection [11] should go first. Secondly, once one process is carried out, the other is randomly determined, so even taking turns we cannot guarantee that both are optimal.

B. Model Solving Process

1) Particle Swarm Optimization Diagnostic Model

Particle swarm optimization (PSO) finds the optimal solution by having each particle follow its historic best solution p_best and the best solution g_best found by the whole swarm. The PSO algorithm converges fast, has few parameters to regulate, is only mildly affected by changes of dimension, and so on, so it is easy to implement [12]. In each iteration, the particle's velocity and position are updated by:

    v_id(i+1) = w·v_id(i) + c1·rand()·(p_best − x_id(i)) + c2·rand()·(g_best − x_id(i))   (6)
    x_id(i+1) = x_id(i) + v_id(i+1)   (7)

where v_id(i) and v_id(i+1) are respectively the current and updated particle velocities; x_id(i) and x_id(i+1) are respectively the current and updated particle positions; w is the inertia weight; c1 and c2 are the acceleration factors; and rand() is a random number function.

The position of an individual particle is composed of three parts. The first part characterizes the analog train status information using binary coding, in which each bit corresponds to a given feature: "1" indicates that the corresponding feature is in the selected subset, and "0" indicates that it is not. The second and third parts respectively represent σ and γ; their code length can be adjusted according to the required accuracy via formula (8):

    p = p_min + (p_max − p_min)/(2^l − 1) · d   (8)

where p is the converted value of the parameter, l is the length of the bit string of the corresponding parameter, p_min and p_max are the minimum and maximum of the parameter, and d is the decimal value represented by the binary bit string.
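Formulas (6)-(8) can be sketched as follows; `rng` is injectable only to make the update deterministic for testing, which is an implementation convenience rather than part of the paper's algorithm:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random.random):
    """One particle update per formulas (6)-(7)."""
    v = [w * vi + c1 * rng() * (pb - xi) + c2 * rng() * (gb - xi)
         for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    x = [xi + vi for xi, vi in zip(x, v)]
    return x, v

def decode(bits, p_min, p_max):
    """Decode a binary segment to a real parameter per formula (8):
    p = p_min + (p_max - p_min) / (2**l - 1) * d, d = integer value of bits."""
    l = len(bits)
    d = int("".join(map(str, bits)), 2)
    return p_min + (p_max - p_min) / (2 ** l - 1) * d
```

The second and third segments of a particle's bit string would be passed through `decode` with the (σ_min, σ_max) and (γ_min, γ_max) ranges to recover the LSSVM parameters.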
The goal of train fault feature and classifier parameter selection is to improve the fault diagnostic accuracy while keeping the number of fault features as small as possible, so the fitness function is defined as:

fitness = a · Acc + b · (1 − Σ_{i=1..N_f} f_i / N_f)   (9)

where f_i denotes the selection status of feature i; N_f is the total number of features; a is the weight of the validation-set accuracy; b is the weight of the number of selected features relative to the accuracy; and Acc is the diagnostic accuracy on the validation set.

LSSVM is designed for two-class problems, but the train simulator involves a variety of fault types, so train-simulator troubleshooting is essentially a multi-class problem. Multiple classifiers are currently constructed in either a "one against one" or a "one against many" scheme; in this paper we use the "one against one" scheme to build a multi-class classifier for simulated-train faults.

2) Simulated-Train Troubleshooting Steps

(1) Collect the status information of the train simulator and use the wavelet packet to extract candidate features.

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

III. EXPERIMENTS AND ANALYSIS

A. Effectiveness of Wavelet Analysis

In order to verify the effectiveness of the method, a sensor monitoring the vertical acceleration was installed on the pillow beam of the front body on the motor-car test rig floor, and signals were collected under four working conditions: the original EMU car (normal condition), the EMU front air spring losing gas (air-spring failure), the EMU yaw dampers fully removed (yaw-damper failure) and the lateral dampers of the motor car fully removed (lateral-damper failure). The sampling frequency is 243 Hz and the sampling time is 1 minute. Figures 1 and 2 show the time-domain and frequency-domain signals of the four conditions at 200 km/h. Much of the train vibration is low-frequency, so we first use a Butterworth filter to remove signal content above 15 Hz and apply zero-mean processing, which eliminates the low-frequency high-peak interference in the frequency domain and is conducive to feature extraction.
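The fitness function of formula (9) can be sketched as follows (the weights a and b are illustrative, since the paper does not give their numeric values):

```python
def fitness(acc, feature_mask, a=0.85, b=0.15):
    """Formula (9): weight validation accuracy against the fraction of
    selected features, so that smaller feature subsets are rewarded."""
    n_f = len(feature_mask)
    return a * acc + b * (1 - sum(feature_mask) / n_f)

# Two candidates with equal accuracy: the smaller subset scores higher.
f_all = fitness(0.95, [1, 1, 1, 1, 1, 1, 1, 1])
f_few = fitness(0.95, [1, 0, 1, 0, 0, 0, 0, 0])
```

With these weights, f_all = 0.85·0.95 = 0.8075 while f_few adds the feature-count bonus 0.15·(1 − 2/8) and scores 0.92, so the particle using fewer features wins.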
Figures 3 and 4 show the time-frequency domain signals after pre-processing. The figures show that the vertical acceleration is largest under the front air-spring failure, whose vibration energy at the fault characteristic frequency reaches the maximum and which contains many impulsive components; the yaw-dampers-fully-removed fault is not obvious; and the lateral-dampers-fully-removed condition has a sensitive vibration frequency around 1 Hz. In order to further analyze the characteristics of all conditions for fault identification, we select the db14 wavelet according to the main frequency range of the signal characteristics. Taking into account that we analyze the signal within 15 Hz, we only reconstruct the first 8 frequency bands of the wavelet packet coefficients decomposed at layer 6. The corresponding frequency ranges are: 0-1.875 Hz, 1.875-3.75 Hz, 3.75-5.625 Hz, 5.625-7.5 Hz, 7.5-9.375 Hz, 9.375-11.25 Hz, 11.25-13.125 Hz and 13.125-15 Hz. The corresponding frequency-band/energy-moment figures under different speeds are shown in Figures 3 to 6.

Figure 1. Time-domain and frequency-domain signals (vertical acceleration in m/s², amplitude versus time in s and frequency in Hz) of the original EMU car and of the front air-spring loss of air pressure.

(2) To prevent excessively large differences among the characteristic values from adversely affecting the training process, the characteristic values need to be normalized.

(3) Initialize the PSO. Randomly generate m particles to compose the initial particle swarm; each particle is composed of a feature subset and the LSSVM parameters (γ, σ).
(4) According to the coding scheme of the particle, convert the binary representation of each particle into the selected feature subset and the LSSVM parameters γ and σ via formula (8), then calculate the fitness value of the particle according to formula (9).

(5) For each particle, compare its fitness value with its own historical optimum; if the current fitness is better, replace the historical optimal value and record the current particle position as the personal best.

(6) For each particle, compare its fitness value with the group optimum; if the current fitness is better, replace the group optimal value and record the current particle position as the global best.

(7) Update the particle velocity and position according to equations (6) and (7), and adjust the inertia weight.

(8) When the maximum number of iterations is reached, output the feature subset and LSSVM parameters corresponding to the optimal particle; otherwise go to step (4) and continue the iteration.

(9) Reduce the training set and test set to the optimal feature subset, use the optimal LSSVM parameters to learn the training set, build the simulated-train fault diagnosis model, diagnose the test set and output the diagnosis.

Figure 2. Time-domain and frequency-domain signals of the yaw-dampers-fully-removed (anti-hunting) and lateral-dampers-fully-removed conditions.

In the figures, the letters A-D denote, respectively, the original EMU car, the EMU front air-spring loss of gas, the EMU yaw dampers fully removed and the EMU lateral dampers fully removed. The brackets after a letter indicate the failure state when the train runs at the given speed.
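Steps (3)-(9) can be sketched end to end. Since the LSSVM itself is out of scope here, the diagnostic accuracy is replaced by a made-up surrogate, and the bit updates use the standard sigmoid-based binary-PSO rule (equations (6)-(7) as written apply to continuous positions); all ranges and constants are illustrative:

```python
import math
import random

random.seed(1)

N_FEATURES, N_BITS = 8, 8                    # feature mask + bits for gamma, sigma
DIM = N_FEATURES + 2 * N_BITS

def decode(position):
    """Split a 0/1 position into (feature mask, gamma, sigma) via formula (8)."""
    def to_real(bits, lo, hi):
        d = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * d / (2 ** len(bits) - 1)
    mask = position[:N_FEATURES]
    gamma = to_real(position[N_FEATURES:N_FEATURES + N_BITS], 0.1, 100.0)
    sigma = to_real(position[N_FEATURES + N_BITS:], 0.01, 10.0)
    return mask, gamma, sigma

def fitness(position):
    """Step (4) stand-in: decode, then score via formula (9). The surrogate
    'accuracy' rewards selecting features 0-2 (a real run trains an LSSVM)."""
    mask, _gamma, _sigma = decode(position)
    acc = 0.9 * sum(mask[:3]) / 3.0
    return 0.85 * acc + 0.15 * (1 - sum(mask) / N_FEATURES)

def pso(n_particles=20, n_iter=40, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.randint(0, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal bests, step (5)
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]         # global best, step (6)
    for _ in range(n_iter):                          # step (8): iteration budget
        for i in range(n_particles):
            for d in range(DIM):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                # sigmoid(velocity) gives the probability that the bit is 1
                pos[i][d] = 1 if random.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f

(best_mask, best_gamma, best_sigma), best_f = pso()
```

The returned mask and parameters correspond to step (9): they would be used to reduce the training and test sets and retrain the final classifier.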
Instability occurs at 220 km/h when the EMU yaw dampers are fully removed, so this condition was not tested at 250 km/h. The figures show that the same condition follows the same trend at different speeds. For the front air-spring loss-of-gas failure, the energy is concentrated in the second band, i.e. there is a sensitive vibration frequency within 2-4 Hz, together with many impulsive responses. From the train model, the air spring mounted on the bogie is supported by a rubber base body, and the compressed air inside the rubber balloon acts as the elastic restoring force, which reduces vibration and shock. When the air spring fails, damping relies on the rubber buffer alone, and the acceleration vibration of the vehicle increases. The energy of the other two faults and of the normal condition is concentrated in the first and second bands, and as the speed increases the energy-moment difference between the bands grows. When the fully removed lateral dampers excite instability, the energy lies mainly in the first frequency band. The above analysis indicates that, under different conditions and different speeds, the energy distribution over frequency and time changes according to the respective condition.

Figure 3. 160 km/h wavelet packet energy moment over the first 8 bands of the layer-6 decomposition (conditions A, B, C (micro sway), D).

Figure 4. 200 km/h wavelet packet energy moment (conditions A, B, C (shaking), D).

Figure 5. 220 km/h wavelet packet energy moment (conditions A, B, C (buckling), D (vibration and shaking)).

Figure 6. 250 km/h wavelet packet energy moment (conditions A, B, D (turbulence instability); C not tested).

From Table 1 we know that the correct recognition rate rises with speed in each case. For the EMU air-spring failure the recognition rate reaches 100% at 200 km/h. For the yaw-dampers-fully-removed fault, hunting instability occurs when the speed reaches 220 km/h, and the recognition rate is highest at 200 km/h. Excitation instability occurs at 250 km/h when the EMU lateral dampers are fully removed; the vibration frequency is then very small and the correct recognition rate is 100%.

B. Fault Diagnosis Test

SVM is advantageous for small-sample, nonlinear and high-dimensional pattern recognition [10-12], so we use support vector machines for fault identification. We extract 30 groups of samples for each of the four working conditions, 120 groups in total, of which 60 groups are used for training and 60 for testing. Each group is a feature vector composed of the energy moments of the first 8 frequency bands of the last-layer wavelet packet. The experiment covers the vibration signals of four conditions, so three two-class SVMs need to be established. The fault identification topology is shown in Figure 7, and the correct recognition rates of the four conditions are shown in Table 1.

C. Fault Identification

In order to verify the effectiveness of extracting wavelet entropy features from mechanical fault signals of high-speed train bogies, we use a support vector machine to classify and recognize the feature data. The data used are simulation data under single-fault conditions; the responses collected by the 58 sensors at a speed of 200 km/h are selected as one group under each of the four fault states.
In order to achieve classification, the data of each group under the same condition and the same position are cut into 3-second data segments, each segment being one sample; a single fault then has 70 samples and the four faults have 280 samples in total. As described previously, these samples need de-noising pre-processing and wavelet-entropy feature extraction; the distance-related wavelet entropy is discarded, so each sample yields a five-dimensional wavelet-entropy feature vector. The five-dimensional feature vectors extracted from the 280 samples are put into the support vector machine for recognition, with 60% of the samples randomly selected as training samples and the remaining 40% as test samples. Figure 8 shows three-dimensional feature plots of the lateral acceleration signal of the frame center and of the longitudinal acceleration signal of the axle gearbox. It can be seen from the figure that the two-dimensional wavelet entropies of the four conditions overlap a little and the features of the same condition are not very concentrated, but some features of particular conditions have a good degree of differentiation, so when the five wavelet-entropy features are combined into a higher-dimensional feature a satisfactory recognition effect can be obtained.

Figure 7. The fault identification topology: the feature vectors enter a cascade of three two-class SVMs. SVM1 separates the front air-spring loss of air pressure; for the remaining samples SVM2 separates the normal train signal; SVM3 then distinguishes the yaw-dampers-fully-removed condition from the lateral-dampers-fully-removed condition.

TABLE I. CORRECT RECOGNITION RATES OF THE FOUR CONDITIONS AT DIFFERENT SPEEDS

speed (km/h) | original car | front air-spring loss of gas | yaw dampers removed | lateral dampers removed
160 | 20% | 93.3% | 40% | 60%
200 | 60% | 100% | 73.3% | 66.7%
220 | 80% | 100% | 53.3% (instability) | 46.7%
250 | 66.7% | 100% | - | 100% (excitation instability)
280 | 93.3% | 100% | - | -
300 | 93.3% | 100% | - | -
330 | 100% | 100% | - | -
average | 73.3% | 99% | 55.5% | 68.4%
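Reading the garbled Figure 7 as a three-stage cascade (the branch assignment below is a reconstruction from the figure residue, so treat it as an assumption), the decision logic is:

```python
def cascade_classify(x, svm1, svm2, svm3):
    """Cascade of three two-class SVMs per Figure 7: svm1 flags the
    air-spring failure, svm2 the normal signal, and svm3 separates the
    yaw-damper condition from the lateral-damper condition."""
    if svm1(x) == 1:
        return "air spring loss of air pressure"
    if svm2(x) == 1:
        return "normal"
    return "yaw dampers removed" if svm3(x) == 1 else "lateral dampers removed"

# Stand-in decision functions keyed on a toy 1-D feature (a real system
# would use trained SVM decision functions over the 8-D energy moments):
svm1 = lambda x: 1 if x > 3 else -1
svm2 = lambda x: 1 if x < 1 else -1
svm3 = lambda x: 1 if x > 2 else -1
labels = [cascade_classify(x, svm1, svm2, svm3) for x in [0.5, 4.0, 2.5, 1.5]]
```

A cascade like this needs only k − 1 binary classifiers for k classes, which is why four conditions require exactly three SVMs.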
TABLE II. RECOGNITION RATE OF EACH SENSOR CHANNEL (discrimination % for channels 1-58, grouped as 1-10, 11-20, 21-30, 31-40, 41-50 and 51-58)

The recognition results of the data signals of each channel (i.e. each sensor) are shown in Table 2. The experimental data were collected from 58 channels distributed over various parts of the bogie. Theory does not make clear which part's vibration signal is more conducive to identifying a given fault condition, and the experimental results show that recognition performance is uneven across sensors. As can be seen from Table 2, the channels with a high recognition rate are channels 11, 20, 25, 26 and 53, corresponding to the mounting positions of the frame-center lateral acceleration sensor, the axle-1 lateral acceleration sensor, the axle-gearbox longitudinal acceleration sensor, the axle-3 gearbox lateral acceleration sensor and a primary-suspension relative displacement sensor.

In order to verify the validity of the feature extraction, this paper compares the proposed method with the traditional wavelet energy feature extraction method, and calculates the fault recognition rate when the train runs at 40 km/h, 80 km/h, 120 km/h, 140 km/h, 160 km/h and 200 km/h. The experimental results are shown in Table 3.

TABLE III. RECOGNITION RATES OF WAVELET ENERGY AND WAVELET ENTROPY FEATURES AT DIFFERENT SPEEDS

speed | wavelet energy feature | wavelet entropy feature
80 km/h | 41.3% | 38.3%
120 km/h | 70.1% | 69.6%
140 km/h | 79.6% | 81.2%
160 km/h | 81.0% | 85.7%
200 km/h | 84.9% | 96.4%
It can be concluded from Table 3 that, compared with the traditional wavelet energy features, the wavelet entropy features achieve higher fault recognition rates, especially at high speeds. As the train speed increases the recognition rate rises, and at 200 km/h the wavelet entropy features reach a recognition rate above 90%, a satisfactory result. The reason is believed to be that the higher the speed, the greater the difference between the vibration-signal patterns of the bogie caused by different failures: the greater a fault's impact on the train's mechanical system, the more obvious its features, which is consistent with kinetic theory.

IV. CONCLUSION

Focusing on the running status of high-speed trains, this paper analyzes the running status of the constraint, air-spring and lateral-damper components through the vertical-acceleration vibration signals of the vehicle body. Combined with the operation characteristics of urban trains, the paper establishes an optimized train-operation adjustment model and proposes a corresponding train state estimation method based on wavelet packet energy moments. Besides, for the problem that feature selection does not match classifier parameter selection in current simulator fault diagnosis, the paper proposes a simulated-train fault diagnosis model with joint selection of features and classifier parameters. For the vertical-acceleration vibration signals monitored on the front-body pillow beam, we propose a feature extraction method based on wavelet packet energy moments and use SVM to estimate the running state.
Experimental results show that the signal is most sensitive to air-spring failure, with a sensitive vibration frequency within 2-4 Hz; the fully removed lateral dampers are relatively more sensitive when excitation instability occurs around 1 Hz, and the correct recognition rate is then 100%. This indicates that the vertical-acceleration monitoring data are effective for detecting air-spring loss of gas and lateral-damper excitation instability. As for the next step: the transition of a running train from normal to abnormal is a gradual process, and in many cases the signs are fuzzy and random. This article makes only a preliminary discussion; further study of high-speed train safety warning and health maintenance needs to be carried out.

Figure 8. Three-dimensional feature plots (WEE, WTFE and WSE wavelet entropies) of signals at different positions, for the original car, air-spring loss of air pressure, anti-hunting (yaw) damper failure and lateral-damper failure conditions: a) channel 11, lateral acceleration signal of the frame center; b) channel 25, longitudinal acceleration signal of the axle gearbox.

ACKNOWLEDGMENT

This work was supported in part by the National High Technology Research and Development Program of China (Grant No. 2011AA110506).

REFERENCES

[1] Kuihe Yang, Ganlin Shan, Lingling Zhao. Application of Wavelet Packet Analysis and Probabilistic Neural Networks in Fault Diagnosis. Proceedings of the 6th World Congress on Intelligent Control and Automation, 2006, pp. 4378-4381.
[2] Jiang Zhao, Feng Sun, Huapeng Wang. Pipeline leak fault feature extraction based on wavelet packet analysis and application. IEEE 2011 International Conference on Electrical and Control Engineering, 2011, pp. 1148-1151.
[3] Alexios D. Spyronasios, Michael G. Dimopoulos. Wavelet Analysis for the Detection of Parametric and Catastrophic Faults in Mixed-Signal Circuits. IEEE Transactions on Instrumentation and Measurement, 2011, 60(6), pp. 2025-2038.
[4] Shengchun Wang, Qing Zhang. Study on the Fault Diagnosis Based on Wavelet Packet and Support Vector Machine. International Congress on Image and Signal Processing, 2010, (3), pp. 3457-3461.
[5] Urmil B. Parikh, Biswarup Das. Combined Wavelet-SVM Technique for Fault Zone Detection in a Series Compensated Transmission Line. IEEE Transactions on Power Delivery, 2008, 23(4), pp. 1789-1794.
[6] CHO Chan-Ho, CHOI Dong-Hyuk, QUAN Zhong-Hua, et al. Modeling of CBTC Carborne ATO Functions using SCADE. Proc. of the 11th International Conference on Control, Automation and Systems. Korea: IEEE Press, 2011, pp. 1089-1093.
[7] CHENG Yun, LI Xiao-hui, XUE Song, et al. The position and speed detection sensors based on electro-magnetic induction for maglev train. Proc. of the 29th Chinese Control Conference. Beijing: IEEE Press, 2010, pp. 5463-5468.
[8] HOU Ming-xin, NI Feng-lei, JIN Ming-he. The application of real-time operating system QNX in the computer modeling and simulation. Proc. of the 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce. Deng Leng: IEEE Press, 2011, pp. 6808-6811.
[9] ESTEREL Technologies. SCADE Suite. (2012-11-1) [2013-03-07]. http://www.esterel-technologies.com/products/scade-suite/.
[10] WANG Hai-feng, LIU Shuo, GAO Chun-hai. Study on Model-based Safety Verification of Automatic Train Protection System. Proc. of the 2nd Asia-Pacific Conference on Computational Intelligence and Industrial Applications. Wuhan: IEEE Press, 2009, pp. 467-470.
[11] D. H. Wang, W. H. Liao. Semi-Active Suspension Systems for Railway Vehicles Using Magnetorheological Dampers. Vehicle System Dynamics, 2009, 47(11), pp. 1130-1135.
[12] Guangjun Li, Weidong Jin, Cunjun Chen. Fuzzy Control Strategy for Train Lateral Semi-active Suspension Based on Particle Swarm Optimization. System Simulation and Scientific Computing, Communications in Computer and Information Science, 2012, pp. 8-16.
[13] Camacho J, Picó J. Online monitoring of batch processes using multi-phase principal component analysis. Journal of Process Control, 2006, 16(10), pp. 1021-1035.
[14] Hua Kun-lun, Yuan Jing-qi. Multivariate statistical process control based on multiway locality preserving projections. Journal of Process Control, 2008, 18(7-8), pp. 797-807.
[15] Yu Jie, Qin S J. Multiway Gaussian mixture model based multiphase batch process monitoring. Industrial & Engineering Chemistry Research, 2009, 48(18), pp. 8585-8594.
[16] Guo Jin-yu, Li Yuan, Wang Guo-zhu, Zeng Jing. Batch process monitoring based on multilinear principal component analysis. Proc. of the 2010 International Conference on Intelligent Systems Design and Engineering Applications, 2010, 1, pp. 413-416.
[17] Chang Yu-qing, Lu Yun-song, Wang Fu-li, et al. Sub-stage PCA modelling and monitoring method for uneven-length batch processes. The Canadian Journal of Chemical Engineering, 2012, 90(1), pp. 144-152.
[18] Kassidas A, MacGregor J F, Taylor P A. Synchronization of batch trajectories using dynamic time warping. AIChE Journal, 1998, 44(4), pp. 864-875.
[19] Rothwell S G, Martin E B, Morris A J. Comparison of methods for dealing with uneven length batches. Proc. of the 7th International Conference on Computer Application in Biotechnology (CAB7), 1998, pp. 387-392.
[20] Lu Ning-yun, Gao Fu-rong, Yang Yi, et al. PCA-based modeling and on-line monitoring strategy for uneven-length batch processes. Industrial & Engineering Chemistry Research, 2004, 43(13), pp. 3343-3352.
[21] Yao Yuan, Dong Wei-wei, Zhao Lu-ping, Gao Fu-rong. Multivariate statistical monitoring of multiphase batch processes with uneven operation durations. The Canadian Journal of Chemical Engineering, 2012, 90(6), pp. 1383-1392.
[22] Zhao Chunhui, Mo Shengyong, Gao Furong, et al. Statistical analysis and online monitoring for handling multiphase batch processes with varying durations. Journal of Process Control, 2011, 21(6), pp. 817-829.
[23] Wang Jin, Peter He Q. Multivariate statistical process monitoring based on statistics pattern analysis. Industrial & Engineering Chemistry Research, 2010, 49(17), pp. 7858-7869.
[24] Garcia-Alvarez D, Fuente M J, Sainz G I. Fault detection and isolation in transient states using principal component analysis. Journal of Process Control, 2012, 22(3), pp. 551-563.
[25] Wise B M, Gallagher N B, Butler S W, et al. A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. Journal of Chemometrics, 1999, 13(3-4), pp. 379-396.

Yu Bo (1985-) is currently pursuing the Ph.D. degree in traffic and transportation at Beijing Jiaotong University. He is working on real-time monitoring and safety warning technology for urban rail trains. His main research directions are train safety, train fault diagnosis, train networks, etc.

Jia Limin (1963-) received the Ph.D. degree from the China Academy of Railway Sciences in 1991 and an EMBA from Peking University in 2004. He is now a chair professor at the State Key Lab of Rail Traffic Control and Safety, Beijing Jiaotong University. His research interests include intelligent control, system safety, fault diagnosis and their applications in fields such as rail traffic control and safety, and transportation.

Ji Changxu (1960-) received the Ph.D. degree from Jilin University of Technology. He is now a professor at the School of Traffic and Transportation, Beijing Jiaotong University.
His research interests include train safety, train fault diagnosis, traffic planning and management, network optimization of comprehensive passenger transport hub services, etc.

Lin Shuai (1987-) is currently pursuing the Ph.D. degree in traffic and transportation at Beijing Jiaotong University. She is working on reliability assessment of urban rail trains. Her main research directions are train safety, train reliability, etc.

Yun Lifen (1984-) is pursuing the Ph.D. degree in traffic and transportation at Beijing Jiaotong University and is currently an exchange student at Mississippi State University. Her main research directions are traffic planning and management, network optimization of comprehensive passenger transport hub services, etc.

Massive Medical Images Retrieval System Based on Hadoop

YAO Qing-An 1, ZHENG Hong 1, XU Zhong-Yu 1, WU Qiong 2, LI Zi-Wei 2, and Yun Lifen 3
1. College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
2. College of Humanities and Information, Changchun University of Technology, Changchun, China
3. Mississippi State University, Civil and Environmental Engineering, Mississippi State, USA

Abstract—In order to improve the efficiency of massive medical image retrieval and to address the defects of single-node medical image retrieval systems, a massive medical image retrieval system based on Hadoop is put forward. The Brushlet transform and the Local Binary Patterns algorithm are first introduced to extract the characteristics of the example medical image, and the image feature library is stored in HDFS. Map tasks then match the example image features against the features in the feature library, while the Reduce task receives the calculation results of each Map task and ranks them according to similarity.
Finally, the optimal retrieval results for the medical images are found according to the ranking. The experimental results show that, compared with other medical image retrieval systems, the Hadoop-based medical image retrieval system reduces the time of image storage and retrieval and improves the retrieval speed.

Index Terms—Medical Image Retrieval; Feature Library; Brushlet Transform; Local Binary Patterns; Distributed System

I. INTRODUCTION

The development of digital sensor technology and storage devices has led to the rapid expansion of digital image libraries, and all kinds of digital equipment produce vast numbers of images every day, so how to effectively organize the management of and access to these images has become a hot research direction in recent years. The traditional text-based image retrieval system uses key words to retrieve manually annotated images. But owing to the limitations that manual annotation causes a large workload, that the content of an image cannot be completely described by words, and that the understanding of images differs from person to person, the text-based image retrieval system cannot meet the requirements of massive image retrieval. How to effectively manage and organize these medical images to serve clinical diagnosis has likewise become a problem faced by medical workers [1]. Content-based medical image retrieval (CBMIR) has the advantages of high retrieval speed and high precision, and has been widely applied in fields such as medical teaching, aided medical diagnosis and medical information management [2].

© 2014 ACADEMY PUBLISHER doi:10.4304/jmm.9.2.216-222

Content-based image retrieval [3] is a technology that uses the visual features of images to carry out image retrieval.
Given a query image, and according to the information of the image content or the query standard, it searches the image library for images that meet the query requirements. There are three key steps: first, selecting appropriate image characteristics; second, adopting an effective feature extraction method; third, using an effective feature matching algorithm. Features that can be extracted from an image include color, texture, shape, spatial relations, etc. Color can be represented by color moments, histograms, etc.; texture by the Tamura features, Gabor filters and the wavelet transform of the image; shape description can be divided into area-based and edge-based methods; spatial relations can be described through two-dimensional strings [4]. At present, many institutions have studied CBMIR further and developed systems that went into practice, such as the earliest commercial QBIC system [5] developed by IBM, the WebSeek system [6] by Columbia University and the Photobook system [7] by the Massachusetts Institute of Technology. There are also many outstanding recent works on content-based image retrieval: literature [8], based on the clustering of unsupervised learning, is a typical example of CBMIR technology; literature [8][9] uses semi-supervised learning; literature [9] carries out image retrieval with relevance feedback; and many works improve retrieval quality by improving feature extraction, such as literature [11, 12]. The CBMIR algorithm needs to calculate the similarity between the features of the sample medical image and the features in the feature library, which is a typical data-intensive computing process [13].
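Among the texture features mentioned above, the Local Binary Patterns operator used by this system (alongside the Brushlet transform) can be sketched in its basic 3×3 form; this is the textbook operator, not necessarily the paper's exact variant:

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP: threshold the 8 neighbours of each interior pixel
    against the centre and pack the results into an 8-bit code."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),      # clockwise from top-left
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(int) << bit
    return codes

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes, usable as a texture
    feature vector for similarity matching."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256)
    return hist / hist.sum()

feat = lbp_histogram(np.arange(25).reshape(5, 5))   # toy ramp 'image'
```

Because the histogram is normalised, feature vectors from images of different sizes remain directly comparable, which matters when matching a query image against a heterogeneous feature library.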
When the number of features in the library is large, the efficiency of single-node retrieval in the traditional browser/server (B/S) mode struggles to meet the real-time requirements of image retrieval, and such a system has poor stability and extensibility [14]. Cloud computing can assign tasks to work nodes so that they complete the tasks together; with its distributed and parallel processing ability, it provides a new research idea for medical image retrieval [15]. Hadoop is an open-source project of the Apache Software Foundation which provides reliable and scalable software for distributed computing environments. It is a framework that lets users easily use a distributed computing platform, and it can support computing with PB-level data on thousands of nodes [11, 12]. The Hadoop distributed computing platform allows all kinds of resources and data to be deployed on inexpensive machines for distributed storage and management. It offers high reliability, scalability, efficiency and fault tolerance, and can effectively improve the speed of image retrieval. On the basis of the open-source Hadoop framework, this paper analyzes the traditional image retrieval system, combines content-based image retrieval technology with the MapReduce computing framework [13], stores the image features in HDFS and realizes a Hadoop-based massive image retrieval system. The Hadoop Distributed File System (HDFS) is a scalable distributed file system. Because it can run on cheap, ordinary hardware, it is supported by many companies, such as Google, Amazon and Yahoo!. Since the Map/Reduce functions make parallel computing easy to realize without knowledge of the underlying details, they have been widely applied in the field of mass data processing [16].
Making use of the advantages of Hadoop, the problem of low retrieval efficiency in medical image retrieval can be better solved, and there is presently no related research domestically [17]. Content-based image retrieval (CBIR) performs retrieval on the underlying objective content by using global and local features of the image. Global features include color, shape, texture and so on; local features include SIFT, PCA-SIFT, SURF and so on [14]. As an automatic and objective retrieval method that reflects image content, CBIR is suitable for massive image retrieval. Semantic retrieval is the direction of development of CBIR, but image semantics are complex and subjective, and difficulties exist in their extraction, expression and technical application [15]. There are two main directions in the development of parallel image processing systems. One targets specific algorithms, searching for efficient parallel algorithms and developing high-performance parallel computers for specific purposes, but such systems are limited in their scope of application. The other develops general-purpose parallel image processing systems, which are the mainstream [16]. Parallel image computing is generally divided into two kinds: pipelined parallelism and data parallelism. In pipelined parallelism the processing units are sequentially connected in series, that is, the output of one processing unit is connected to the input of the next. In data parallelism a plurality of processing units is arranged in parallel arrays, and each processing unit can perform its task independently [17]. With the increase of image data, massive image retrieval has become a very time-consuming process.
© 2014 ACADEMY PUBLISHER

To improve the efficiency of medical image retrieval and overcome the shortcomings of the B/S single-node architecture, a medical image retrieval system based on distributed Hadoop is put forward. The experimental results show that the Hadoop-based medical image retrieval system not only reduces retrieval time and improves retrieval efficiency, but also shows a clear advantage for massive medical image retrieval. The main innovations of this paper are: (a) With the continuous development of digital technology, the amount of image data is increasing sharply. Aiming at the low efficiency of massive image retrieval and the deficiencies of the B/S single-node system, a medical image retrieval system based on distributed Hadoop is proposed to further improve retrieval efficiency. It is built on the Hadoop cloud computing platform, adopts parallel retrieval technology, and uses the SIFT (Scale-Invariant Feature Transform) algorithm to address the massive image retrieval problem. (b) The Hadoop-based distributed medical image retrieval system improves the efficiency of image storage and retrieval and obtains better search results. This shows mainly in the following aspects: the system can meet the real-time requirements of medical image retrieval, especially when dealing with large-scale medical image collections. Compared with the traditional B/S single-node system it has clear advantages, reducing image retrieval time and improving retrieval efficiency, especially for massive medical image retrieval.

II. HADOOP DISTRIBUTED MEDICAL IMAGE RETRIEVAL

A. Hadoop Platform

The Hadoop platform is the most widely used open-source cloud computing programming platform today.
It is an open-source framework that runs on clusters to handle applications over large data sets, and it supports the MapReduce distributed scheduling model to implement virtualized management, scheduling and sharing of resources [10]. An HDFS cluster consists of one master server (NameNode) and multiple chunk servers (DataNodes), and is accessed by multiple clients. The NameNode is responsible for managing the namespace of the file system and the clients' access to files, while each DataNode manages the storage of its own node, handles clients' read and write requests, and carries out the creation, deletion and replication of data blocks under the unified scheduling of the NameNode [11]. HDFS cuts files into blocks and stores them dispersedly on different DataNodes, and each block can be replicated on several DataNodes; therefore HDFS has high fault tolerance and high read/write throughput. MapReduce is a programming model used for computation over large amounts of data, for which the usual processing technique is parallel computing. First, a logically complete large task is broken into subtasks; then, according to the task information and an appropriate strategy, the system assigns the subtasks to different resource nodes to run. When all the subtasks have finished, the whole large task is finished, and the result is returned to the user [12]. In the Map phase, each Map task processes the data assigned to it and maps the result to the corresponding Reduce task according to the key output by Map. In the Reduce phase, each Reduce task further aggregates the data it receives and produces the output results.
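The map/shuffle/reduce flow just described can be imitated in a single process. The following is a minimal sketch, not Hadoop's API; the function names and the word-count example are illustrative only.

```python
# Minimal single-process sketch of the MapReduce data flow: map emits
# (key, value) pairs, a shuffle groups them by key, reduce aggregates.
from collections import defaultdict

def map_phase(record):
    # Emit one pair per word, as in the classic word-count example.
    for word in record.split():
        yield word, 1

def reduce_phase(key, values):
    return key, sum(values)

def run_job(records):
    groups = defaultdict(list)
    for record in records:              # Map: process each input split
        for key, value in map_phase(record):
            groups[key].append(value)   # Shuffle: group values by key
    return dict(reduce_phase(k, v)      # Reduce: aggregate per key
                for k, v in groups.items())

print(run_job(["a b a", "b c"]))        # {'a': 2, 'b': 2, 'c': 1}
```

In Hadoop proper, the map and reduce steps run on different nodes and the shuffle moves data across the network; the data flow, however, is the same.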
To visualize the data processing cycle of MapReduce, the calculation process of the MapReduce model is shown in Figure 1.

Figure 1. Data processing cycle of MapReduce

B. Feature Extraction in the Brushlet Domain

The Brushlet transform is a multi-scale geometric analysis tool for images that addresses the problem of angular resolution. The two-dimensional Brushlet has a definite directional structure and vibration frequency range and allows perfect reconstruction; the structural size of its basis function is inversely proportional to the size of the analysis window. The two-dimensional Brushlet carries a phase parameter that indicates its direction, so it reflects the directional information of the image well, and the decomposition is carried out in the Fourier domain [13]. Level one of the Brushlet transform divides the Fourier plane into four quadrants, so the coefficients fall into four sub-bands with corresponding directions $\pi/4 + k\pi/2$, $k = 0, 1, 2, 3$. Level two further divides each quadrant into four parts on the basis of level one, and the twelve corresponding directions are $\pi/12 + k\pi/6$, $k = 0, 1, \ldots, 11$; this decomposition yields sixteen sub-bands, among which the four around the center carry the low-frequency component and the rest carry high-frequency components, and so on for deeper levels. Figure 2 shows the decomposition directions of level three.

Figure 2. Level-three decomposition directions of the Brushlet

Each sub-band reflects the directional information of its decomposition direction, and the places where the energy concentrates are exactly where the texture of the image changes abruptly. For each sub-band, this energy information can be represented by the mean value and the standard deviation of the modulus of its coefficients. Because the Brushlet is a complex-valued function, the real and imaginary parts of each sub-band obtained by the decomposition are used together to calculate the modulus. Given an image $f$, a level-$l$ Brushlet decomposition produces a real part $\hat{f}^r$ and an imaginary part $\hat{f}^i$, each with $4^l$ sub-bands, marked $\hat{f}^r_n$ and $\hat{f}^i_n$ for sub-band $n$. The mean value $\mu_n$ and standard deviation $\sigma_n$ of the modulus of sub-band $n$ are:

$$\mu_n = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} |\hat{f}_n(i,j)|, \qquad |\hat{f}_n(i,j)| = \sqrt{[\hat{f}^r_n(i,j)]^2 + [\hat{f}^i_n(i,j)]^2} \qquad (1)$$

$$\sigma_n = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(|\hat{f}_n(i,j)| - \mu_n\right)^2} \qquad (2)$$

In the above equations, $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$, where $M$ and $N$ are the numbers of rows and columns of each sub-band. The feature vector of image $f$ is:

$$F = [\mu_1, \sigma_1, \mu_2, \sigma_2, \ldots] \qquad (3)$$

C. Feature Extraction of LBP

LBP describes the gray-level changes relative to the center pixel within a neighborhood. It attends to the changes of pixel gray level, which accords with human visual perception of images, and its histogram is treated as the spatial-domain feature of the image.

$$LBP_3^{u2} = \begin{cases} \sum\limits_{i=0}^{7} s(g_i - g_c)\,2^i, & U(LBP_3) \le 2 \\ 256, & \text{otherwise} \end{cases} \qquad (4)$$

Among which:
$$s(g_i - g_c) = \begin{cases} 1, & g_i - g_c \ge 0 \\ 0, & g_i - g_c < 0 \end{cases} \qquad (5)$$

$$U(LBP_3) = |s(g_7 - g_c) - s(g_0 - g_c)| + \sum_{i=1}^{7} |s(g_i - g_c) - s(g_{i-1} - g_c)| \qquad (6)$$

In the above equations, $g_c$ is the gray value of the center pixel of a neighborhood, and $g_i$ is the gray value of each pixel of the 3×3 neighborhood centered on $g_c$, taken clockwise.

D. The Similarity Matching

To measure feature similarity in the Brushlet domain, the average distance is used:

$$Sim_{Brushlet}(P, Q) = \frac{1}{6}\sum_{i=1}^{6} |E_{P_i} - E_{Q_i}| \qquad (7)$$

Among which, $P$ is the medical image to be retrieved, and $Q$ is an image of the medical image library. For the LBP features of the images, the feature vectors are first normalized, and then the Euclidean distance is used to calculate the similarity:

$$Sim_{LBP}(P, Q) = \sqrt{\sum_{i=1}^{32} (W_{P_i} - W_{Q_i})^2} \qquad (8)$$

In the above equation, $W$ represents the feature vector after normalization. Because the value ranges of $Sim_{Brushlet}$ and $Sim_{LBP}$ are different, an external normalization is applied to both:

$$Sim'_{Brushlet}(P, Q) = \frac{1}{2} + \frac{Sim_{Brushlet}(P, Q) - \mu_{Brushlet}}{6\,\sigma_{Brushlet}} \qquad (9)$$

$$Sim'_{LBP}(P, Q) = \frac{1}{2} + \frac{Sim_{LBP}(P, Q) - \mu_{LBP}}{6\,\sigma_{LBP}} \qquad (10)$$

In the above equations, $\sigma_{Brushlet}$, $\mu_{Brushlet}$, $\sigma_{LBP}$ and $\mu_{LBP}$ respectively represent the standard deviations and mean values of $Sim_{Brushlet}$ and $Sim_{LBP}$.
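Before the two similarities are fused into a final distance, the feature computations above can be made concrete. The following is a minimal sketch, not the authors' code, of equations (1)-(2) and (4)-(6); the 2×2 complex array is a toy stand-in for a real Brushlet sub-band.

```python
# Sketch of the feature computations: equations (1)-(2) give the modulus
# mean/std of one complex sub-band; equations (4)-(6) give the uniform
# LBP code of a 3x3 neighbourhood (8 clockwise neighbours around g_c).
import math

def subband_features(coeffs):
    """Modulus mean and standard deviation of one complex sub-band."""
    m, n = len(coeffs), len(coeffs[0])
    mods = [abs(c) for row in coeffs for c in row]
    mu = sum(mods) / (m * n)                                       # eq (1)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in mods) / (m * n))  # eq (2)
    return mu, sigma

def lbp_u2(center, neighbors):
    """Uniform LBP code for 8 clockwise neighbour gray values."""
    bits = [1 if g - center >= 0 else 0 for g in neighbors]        # eq (5)
    # Equation (6): 0/1 transitions around the circular bit pattern.
    u = abs(bits[7] - bits[0]) + sum(abs(bits[i] - bits[i - 1])
                                     for i in range(1, 8))
    if u <= 2:
        return sum(b << i for i, b in enumerate(bits))             # eq (4)
    return 256   # all non-uniform patterns share one label

print(subband_features([[3 + 4j, 0], [0, 5j]]))   # (2.5, 2.5)
print(lbp_u2(5, [6, 6, 6, 6, 4, 4, 4, 4]))        # 15 (uniform pattern)
```

A pattern with at most two transitions keeps its own code; anything more irregular collapses into the single bin 256, which is what keeps the LBP histogram compact.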
The distance between the two medical images is then:

$$Sim(P, Q) = w_1\, Sim'_{Brushlet}(P, Q) + w_2\, Sim'_{LBP}(P, Q) \qquad (11)$$

In the equation, $w_1$ and $w_2$ are weights satisfying $w_1 + w_2 = 1$.

E. The Algorithm of Medical Image Retrieval

1) Medical Image Storage with MapReduce

Image storage is the foundation of automatic medical image retrieval, and it is a data-intensive computing process. Putting images into HDFS by the traditional method is very time-consuming, so the distributed MapReduce processing method is applied to upload the images to HDFS: (1) in the Map phase, the Map function reads one medical image at a time and extracts the color and texture features of the image; (2) in the Reduce phase, the extracted feature data of the medical images are stored in HDFS. HBase is a column-oriented distributed database, so its table form is used for the medical images in HDFS. The specific process is shown in Figure 3: upload the medical images to HDFS → take one medical image from HDFS as the Map input → extract the image features → write the image and its features into HBase → complete the image processing in HDFS → collect the output of each Map.

Figure 3. Storage process of medical images

2) Medical Image Retrieval with MapReduce

The medical images and their features are all stored in HBase; when the HBase data set is very large, scanning and searching the entire table takes a relatively long time. To reduce retrieval time and improve retrieval efficiency, the MapReduce computing model is used to carry out the medical image retrieval in parallel. The specific framework is shown in Figure 4.

Figure 4. Work diagram of image retrieval

The steps of the MapReduce-based medical image retrieval are as follows:
(1) Collect the medical images, extract the corresponding features and store the features in HDFS.
(2) When a user submits a search request, extract the Brushlet features and LBP features of the medical image to be retrieved.
(3) In the Map phase, carry out the similarity matching between the features of the medical image to be retrieved and the features of the images in HBase. The output of the Map is the key-value pair <similarity, image ID>.
(4) Sort and partition all the <similarity, image ID> pairs output by the Maps according to the similarity value, and then input them into the Reducer.
(5) In the Reduce phase, collect all the <similarity, image ID> key-value pairs, sort them by similarity, and write the first N into HDFS.
(6) Output the IDs of the images most similar to the medical image to be retrieved; the user obtains the final result of the medical retrieval.

The Map and Reduce functions are as follows:

Map(key, value)
Begin
  // read the features of the medical image to be retrieved
  Csearch = Read(SearchCharact);
  // read the data in the feature library
  Cdatabase = value;
  // read the image path in the image library
  Path = GetFigurePath(value);
  // calculate the similarity of the Brushlet-domain features and the LBP features
  SimByBrushlet = CompareByBrushlet(Csearch, Cdatabase);
  SimByLBP = CompareByLBP(Csearch, Cdatabase);
  // calculate the matching similarity, where w1 and w2 are the similarity
  // weights of the Brushlet-domain features and the LBP features
  Sim = w1 * SimByBrushlet + w2 * SimByLBP;
  Commit(Sim, Path);
End

Reduce(key, value)
Begin
  // rank the medical images; key is the similarity value,
  // value is the path of the similar medical image
  Sort(key, value);
  Commit(key, value);
End

III. THE SIMULATION TEST

A. Experimental Environment

Under the Linux environment, one master node (NameNode) and three worker nodes (DataNodes) form a Hadoop distributed system.
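The Map and Reduce pseudocode above can be rendered as a plain single-process Python sketch. The similarity functions, weights and data below are illustrative stand-ins, not the system's actual implementation: Map scores one library image against the query and emits a <similarity, ID> pair, while Reduce plays the role of the sort-and-take-top-N step (steps (4)-(5) above).

```python
# Plain-Python sketch of the Map/Reduce retrieval pseudocode; in Hadoop
# these would be written against the MapReduce API and run on the cluster.
def map_fn(query_features, library_entry, w1=0.6, w2=0.4):
    feats, path = library_entry
    sim_brushlet = compare_by_brushlet(query_features, feats)
    sim_lbp = compare_by_lbp(query_features, feats)
    return w1 * sim_brushlet + w2 * sim_lbp, path   # emit <similarity, ID>

def reduce_fn(pairs, top_n=3):
    # Rank by similarity (higher is more similar) and keep the first N.
    return sorted(pairs, key=lambda kv: kv[0], reverse=True)[:top_n]

# Toy similarity functions standing in for the real feature comparisons.
def compare_by_brushlet(a, b):
    return -abs(a - b)

def compare_by_lbp(a, b):
    return -abs(a - b) / 2

library = [(1.0, "img1"), (3.0, "img2"), (2.1, "img3"), (9.0, "img4")]
pairs = [map_fn(2.0, entry) for entry in library]   # the Map phase
print(reduce_fn(pairs))                             # the Reduce phase
```

With the toy data above, "img3" (the library entry closest to the query value 2.0) ranks first in the Reduce output.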
The specific configuration is shown in Table 1. In the Hadoop distributed system, medical image retrieval is tested with different numbers of nodes, and the results are compared with those of the traditional image retrieval system of literature [15] and of the image retrieval system under the B/S structure. The evaluation criteria are storage efficiency, retrieval speed, precision ratio (%) and recall ratio (%), which are used to analyze the performance of the Hadoop distributed image retrieval system.

TABLE I. CONFIGURATION OF EACH NODE IN THE DISTRIBUTED SYSTEM

Node      | CPU                         | RAM | IP
NameNode  | Intel Core i7-3770K 4.5GHz  | 8G  | 192.168.0.1
DataNode1 | AMD Athlon II X4 631 2.8GHz | 2G  | 192.168.0.21
DataNode2 | AMD Athlon II X4 631 2.8GHz | 2G  | 192.168.0.22
DataNode3 | AMD Athlon II X4 631 2.8GHz | 2G  | 192.168.0.23

B. Load Performance Testing of the System

For the Hadoop medical image retrieval system, the CPU usage rate of each node when processing 400,000 medical images is shown in Figure 5. Figure 5 shows that, because there are only two Map tasks, the tasks are assigned to DataNode1 and DataNode3. At moments t1 and t2 the Map tasks of the two nodes are executing; at t3 the Map task on DataNode3 has completed and the Reduce task starts on that node, while the Map task on DataNode1 is still running; at t4 the Map task on DataNode1 completes, and DataNode1 transfers the intermediate result generated by its Map task to DataNode3 for Reduce processing; at t5 only DataNode3 is processing the Reduce task, while DataNode1 and DataNode2 are idle; at t6 the whole retrieval task is finished and every node is idle. For 800,000 and one million medical images, the CPU usage rate of each node is shown in Figures 6 and 7; the loading condition of each node is similar to that for 400,000 medical images. C.
Result of the Medical Image Retrieval

After a medical image is uploaded and retrieved with the Hadoop medical image system, the results are as shown in Figure 8, and they are relatively good. The results show that the Hadoop distributed medical image system uses the Map/Reduce method to decompose the tasks, transforming the traditional single-node working mode into teamwork among all the nodes in the cluster; it splits the parallel tasks across the spare nodes for processing and so improves the retrieval efficiency for medical images.

D. Performance Comparison with the Traditional Method

1) Contrast of Storage Performance

For different numbers of medical images and different numbers of nodes, the storage time of the images is shown in Figure 9. Figure 9 shows that when the number of medical images is below 200,000, the difference in storage performance between the two systems is small. But as the number of images increases, the storage time of the B/S single-node system increases sharply, while that of the Hadoop distributed system grows slowly. At the same time, the storage performance of the system of this paper is superior to that of the traditional Hadoop image processing system. This is because the traditional Hadoop image processing system still uploads images in the traditional way and only uses the Map/Reduce method in the retrieval step, while the system of this paper uploads the medical images to HDFS through the Map/Reduce method as well.

Figure 5. CPU usage rate (%) of each DataNode over time t1-t6 when processing 400,000 medical images

Figure 6. CPU usage rate (%) of each DataNode over time t1-t6 when processing 800,000 medical images

Figure 9. Storage time (in seconds) versus number of medical images for the image system (B/S), the traditional Hadoop retrieval system, and the Hadoop retrieval system of this paper

Figure 7.
CPU usage rate (%) of each DataNode when processing one million medical images

Figure 8. Result of the medical image retrieval

2) Contrast of Retrieval Efficiency

For different sizes of the medical image library and different numbers of nodes, the retrieval time of the medical images is shown in Figure 10. Figure 10 shows that when the medical image library is small, the difference in retrieval time between the distributed system and the B/S single-node system is small. As the number of medical images increases, the retrieval time of both systems increases accordingly, but the retrieval time of the B/S single-node system grows with a much larger amplitude, while that of the Hadoop medical image system grows more slowly. This is mainly due to Map/Reduce parallel computing, which assigns the medical image retrieval tasks to multiple nodes and so improves retrieval efficiency; at the same time, the more nodes there are, the faster the retrieval, so the performance of the image retrieval system can be improved by adding nodes to the Hadoop system. Compared with the traditional Hadoop image retrieval system, which adopts the Map/Reduce method only for image matching, the retrieval system of this paper adopts Map/Reduce for parallel processing of both image storage and image matching; it therefore reduces the time to scan and search the whole medical image feature library as well as the image matching time, improving retrieval efficiency.

Figure 10. Retrieval time (in seconds) versus number of medical images for the image system (B/S), the traditional Hadoop retrieval system, and the Hadoop retrieval system of this paper

3) Contrast of Retrieval Results

For different types of medical images, comparison experiments were conducted with the Hadoop and traditional retrieval systems; the precision rates and recall rates are shown in Table 2 and Table 3.

TABLE II. PRECISION RATE (%) COMPARISON FOR MULTIPLE TYPES OF MEDICAL IMAGES

Type of Medical Image | System of This Paper | Traditional Retrieval System | B/S Single-node Retrieval System
Images of Brain CT    | 95.04 | 94.98 | 94.63
Images of Brain MRI   | 91.61 | 91.58 | 91.28
Images of Skin-micro  | 93.67 | 92.93 | 92.26
Images of X-ray Breast| 91.46 | 91.09 | 90.67
HRCT of Lung          | 93.52 | 92.93 | 92.53

TABLE III. RECALL RATE (%) COMPARISON FOR MULTIPLE TYPES OF MEDICAL IMAGES

Type of Medical Image | System of This Paper | Traditional Retrieval System | B/S Single-node Retrieval System
Images of Brain CT    | 92.21 | 91.26 | 91.59
Images of Brain MRI   | 90.32 | 89.84 | 90.94
Images of Skin-micro  | 90.38 | 90.32 | 90.33
Images of X-ray Breast| 90.82 | 90.04 | 89.60
HRCT of Lung          | 91.10 | 90.57 | 89.31

Table 2 and Table 3 show that the precision rate and recall rate of the Hadoop system of this paper are slightly higher than those of the traditional Hadoop image retrieval system and the B/S single-node image retrieval system, although the advantage in precision and recall is not pronounced. For a large-scale medical image retrieval system, however, the merit of the system is mainly measured by retrieval efficiency. As Figure 10 shows, the distributed Hadoop system of this paper effectively reduces the retrieval time of medical images and improves retrieval efficiency, better solving the low efficiency of massive medical image retrieval and obtaining relatively satisfactory retrieval results.

IV. CONCLUSION

Content-based medical image retrieval (CBMIR) is a data-intensive computing process, and the traditional B/S single-node retrieval system suffers from low efficiency, poor reliability and related defects. Therefore, a Hadoop-based medical image retrieval system is put forward. The results of the simulation tests show that the Hadoop medical image retrieval system improves the efficiency of image storage and retrieval, obtains better retrieval results, and can satisfy the real-time requirements of medical image retrieval. Especially when dealing with massive medical images, it has advantages with which the traditional B/S single-node system cannot compare. Future work will focus on improving the data transmission speed between the Map tasks and the Reduce tasks, reducing the time consumed by information transfer, so as to further improve the execution efficiency of the existing image retrieval system.

REFERENCES

[1] Song Zhen, Yan Yongfeng. Integrated retrieval of images based on interest-point features. Computer Applications, 2012, 32(10), pp. 2840-2842.
[2] Zhang Quan, Tai Xiaoying. Relevance feedback in medical image retrieval based on Bayesian methods. Computer Engineering, 2008, 44(17), pp. 158-161.
[3] Yu Sheng, Xie Li, Cheng Yun. Image retrieval based on color and primitive features. Computer Applications, 2013, 33(6), pp. 1674-1708.
[4] FAY C, JEFFREY D, SANJAY G, et al. Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation. Seattle, WA, 2006, pp. 276-290.
[5] KEKRE H B, THEPADE S, SANAS S. Improving performance of multileveled BTC based CBIR using sundry color spaces. International Journal of Image Processing, 2010, 4(6), pp. 620-630.
[6] Liye Da, Lin Weiwei. A Hadoop data replication method. Computer Engineering and Applications, 2012, 48(21), pp. 58-61.
[7] Wang Xianwei, Dai Qingyun, Jiang Wenchao, Cao Jiangzhong. Design of patent image retrieval methods based on MapReduce. Mini-Micro Systems, 2012, 33(3), pp. 626-632.
[8] SANJAY G, HOWARD G, SHUN-TAK L. The Google File System. In: Proceedings of the 19th ACM Symposium on Operating Systems Principles. Bolton Landing: ACM, 2003, pp. 29-43.
[9] Liang Qiushi, Wu Yilei, Feng Lei. A MapReduce-based micro-blog user search ranking algorithm. Computer Applications, 2012, 32(11), pp. 2989-2993.
[10] JEFFREY D, SANJAY G. MapReduce: a flexible data processing tool. Communications of the ACM, 2010, 53(1), pp. 72-77.
[11] KONSTANTIN S, HAIRONG K, SANJAY R, et al. The Hadoop distributed file system for the Grid. In: Proceedings of the Nuclear Science Symposium Conference Record (NSS/MIC). Orlando: IEEE, 2009, pp. 1056-1061.
[12] JEFFREY D, SANJAY G. MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. San Francisco, 2004, pp. 107-113.
[13] Lian Qiusheng, Li Qin, Kong Lingfu. Texture image retrieval combining statistical features of the circular symmetric contourlet and LBP. Chinese Journal of Computers, 2007, 30(12), pp. 2198-2204.
[14] Wang Zhongye, Yang Xiaohui, Niu Hongjuan.
Image retrieval algorithm based on Brushlet-domain complex texture characteristics. Computer Simulation, 2011, 28(5), pp. 263-266, 282.
[15] ZHANG J, LIU X L, LUO J W, BO L T N. DIRS: a distributed image retrieval system based on MapReduce. In: The Network Security and Soft Computing Technologies. Maribor: IEEE, 2010, pp. 93-98.

Kinetic Model for a Spherical Rolling Robot with Soft Shell in a Beeline Motion

Zhang Sheng, Fang Xiang, Zhou Shouqiang, and Du Kai
PLA Uni. of Sci & Tech / Engineering Institute of Battle Engineering, Nanjing, China
Email: [email protected], [email protected]

Abstract—A simplified kinetic model called the spring pendulum is developed for a spherical rolling robot with a soft shell, in order to meet the needs of attitude stabilization and control of the robot. In this model, the elasticity and plasticity of the soft shell are represented by uniform springs connected to the bracket. The expression of the kinetic model is deduced from Newtonian mechanics. Driving-angle data acquired from a prototype built by the authors show that the measured curve accords with the theoretical kinetic characteristic curve, so the kinetic model is validated.

Index Terms—Soft Shell; Spherical Rolling Robot; Kinetic Model

I. INTRODUCTION

The spherical robot is a kind of robot that can roll by itself. More and more researchers are focusing on spherical robots because of their many advantages in locomotion and their hermetic structure. More than ten kinds of spherical robots and their accessories have been proposed [1-4], and these robots have been applied preliminarily in many domains. Almost all of them are constructed with hard shells. Compared with hard-shell robots, a soft-shell spherical robot has many advantages, such as good obstacle-crossing ability, changeable bulk and good impact resistance.
Li Tuanjie and his group studied a light soft-shell spherical robot driven by wind power and derived an equation describing the robot's ability to cross obstacles, without deeply investigating how much the soft shell influences the spherical robot [5]. Sugiyama Y., Hirai S. and others studied a deformation-driven spherical robot: it uses several shape-memory alloys to support and control the deformation by changing the applied voltage, making the robot roll in a crawling fashion. It moves slowly and is still at the proof-of-principle stage [6]. Fang Xiang and Zhou Shouqiang have obtained a patent on an automatically inflating and deflating soft-shell spherical robot [7]. On the modeling of spherical robots, Refs. [8, 9] started from kinematic principles and established the dynamic model of a hard-shell spherical robot rolling along a straight line driven by a pendulum; since they ignored the quadratic terms, the dynamic model shows errors when the robot moves at high speed. To make the robot start and stop smoothly with controllable speed, Ref. [10] studied the kinematic model of a spherical robot moving in a straight line driven by two masses offset from the centre of the sphere. Using the Euler-angle description, Ref. [11] established a kinematic model of the spherical robot. Refs. [12, 13] derived the dynamic model of a hard-shell spherical robot from Newtonian mechanics; they simplified the model of a straight-moving spherical robot to a single pendulum hung at the centre of the ball and connected to the shell through the drive motor. Both carried out simulation experiments on their dynamic models, but neither checked the models against experimental data to prove their correctness.
This paper therefore analyzes in depth the kinematic and dynamic characteristics of the soft-shell spherical robot, establishes its mechanical model, and uses an experimental prototype to check the correctness of the model. In this paper we consider a class of spherical rolling robots actuated by internal rotors. Under a proper placement of the rotors, the center of mass of the composite system is at the geometric center of the sphere and, as a result, gravity does not enter the motion equations. This facilitates the dynamic analysis and the design of the control system. The idea of such a rolling robot and its design was first proposed in [14] and later studied in [15]. Also relevant to our research is the study [16], in which the controllability and motion planning of rolling robots with rotor actuation were analyzed. A spherical robot is a new type of mobile robot that has a ball-shaped outer shell enclosing all its mechanisms, control devices and energy sources. This structural characteristic helps protect the internal mechanisms and the control system from damage. At the same time, the shape of a spherical robot brings a couple of challenging problems in modeling, stabilization and position tracking (path following). Two difficulties hinder progress in the control of spherical robots. One is the highly coupled dynamics between the shell and the inner mechanism. The other is that, although different spherical robots have different inner mechanisms, including rotor type, car type, slider type, etc. (Joshi, Banavar and Hippalgaonka, 2010), most of them are underactuated, meaning they must control more degrees of freedom (DOFs) than they have drive inputs. There are still no proven, generally useful control methodologies for spherical robots, although researchers have attempted to develop them.
Li and Canny (Li and Canny, 1990) proposed a three-step algorithm to solve the motion planning problem of a sphere, in which the position coordinates of the sphere converge to the desired values in three steps. The method is complete in theory, but it can only be applied to spherical robots capable of turning with zero radius, as the configurations are constrained. Mukherjee and Das et al. (Das and Mukherjee, 2004), (Das and Mukherjee, 2006) proposed a feedback stabilization algorithm for four-dimensional reconfiguration of a sphere. Treating the spherical robot as a chained system, Javadi et al. (Javadi and Mojabi, 2002) established its dynamic model with the Newton method and discussed its motion planning with experimental validation; compared with other existing motion planners, this method requires no intensive numerical computation, but it is applicable only to their specific spherical robot. Bhattacharya and Agrawal (Bhattacharya and Agrawal, 2000) deduced a first-order mathematical model of a spherical robot from the non-slip constraint and the conservation of angular momentum, and discussed trajectory planning with minimum energy and minimum time. Halme and Suomela et al. (Halme, Schonberg and Wang, 1996) analyzed the rolling-ahead motion of a spherical robot with a dynamic equation, but they did not consider the steering motion. Bicchi et al. (Antonio B. et al., 1997), (Antonio and Alessia, 2002) established a simplified dynamic model for a spherical robot and discussed its motion planning on a plane with obstacles. Joshi and Banavar et al. (Joshi, Banavar and Hippalgaonka, 2009) proposed a path planning algorithm for a spherical mobile robot. Liu and Sun et al. (Liu, Sun and Jia, 2008) deduced a simplified dynamic model for the driving-ahead motion of a spherical robot through input-state linearization and derived the angular velocity controller and the angle controller in fully feedback-linearized form [17].
It should be noted that, even though the gravitational term is not present, motion planning for the system under consideration is still a very difficult research problem. In fact, no exact motion planning algorithm has yet been reported for the case of actuation by two rotors. In [18], the motion planning problem was posed in optimal control settings using an approximation by the Phillip Hall system [19]. However, since the robot dynamics are not nilpotent, this is not an exact representation of the system and it leads to inaccuracies. An approximate solution to the motion planning problem using Bullo's series expansion was constructed in [19], but only for the case of three rotors. An exact motion planning algorithm is reported only in [6], but as we will see later it is not dynamically realizable. Thus, motion planning in the dynamic formulation for the robot under consideration is still an open problem, and a detailed analysis of the underlying difficulties is necessary. This constitutes the main goal of our paper. The paper is organized as follows. First, in Section II we provide a geometric description and a kinematic model of the system under consideration, and in Section III we derive its dynamic models. A reduced dynamic model is furnished in Section IV, and conditions for the dynamic realizability of kinematically feasible trajectories of the rolling sphere are established in Section V. A case study, dealing with the dynamic realizability of tracing circles on the surface of the sphere, is undertaken in Section VI. Finally, conclusions are drawn in Section VII.

II. DYNAMICS MODEL OF THE SOFT-SHELL SPHERICAL ROBOT

A. Constitution

The soft-shell spherical robot developed by PLA Uni. of Sci & Tech is shown in Fig. 1. There are three electromotors inside the spherical shell to provide input torque.
One steering motor is connected to a bevel gear rolling in a gear circle. The battery and load are connected to the bevel gear as well, in order to control the rotation direction. The other two drive motors rotate in phase; their housings are fixed on the bracket, while their armatures act on the shell of the spherical robot to provide the drive torque. Fig. 1 illustrates the overall internal driving mechanism. The internal driving mechanism is composed of two rotors with their axes perpendicular to each other; the two axes are called the yaw axis and the pitch axis, respectively. An actuator is placed at the bottom of each axis, and a rotor is placed at each end of the pitch axis. The spherical shell is driven by the reaction torque generated by the actuators. The internal driving device is fixed to the spherical shell at a point P, located at the geometric center of the sphere. The center of gravity of the internal driving device, however, does not lie at the center of the sphere. Due to this asymmetry, the robot tends to be stable when the weights are beneath the center and unstable when they are above it. This makes it possible to realize both stand-still stability and quick dynamic behavior with a single mechanism. Figure 1. Planform of the inner mechanism of the spherical soft-shell robot Figure 2. Appearance and planform of the inner mechanism of the spherical soft-shell robot B. Exterior Structure Fig. 2 illustrates the exterior structure. The exterior part is composed of two hemispheres and a circular board that divides the sphere in half. All electronic components, such as sensors, motor drivers, and a microcomputer, are mounted on the circular board. The weight of the electronic components is large enough that it cannot be neglected when constructing the dynamic model. Moreover, the distribution of weight on the circular board is uneven.
Therefore, the center of gravity of the exterior structure does not lie at the center of the sphere. By including this asymmetry in the dynamic model, we can construct a more accurate model and simulate the effects of the weight distribution on the motions of the robot. C. System Models in a Beeline Motion We establish the inertial coordinate system XOY on the ground and decompose the soft-shell spherical robot into two subsystems: the spherical shell, and the "frame + pendulum". The two subsystems are coupled through the bearing force and the bearing counter-moment. The positive direction of every parameter is shown in Fig. 4. When the system rolls purely along a straight line in the horizontal plane, the displacements of the shell and the pendulum in the X and Y directions are Xb, Yb, Xp and Yp, respectively, given by the kinematic relations in (1). Neglecting the viscous friction produced by air resistance, the robot is decomposed into two subsystems: one is the bracket and spherical shell, the other is the single pendulum. We then make the following assumptions: a. The spherical shell is equivalent to a rigid, thin spherical shell of mass mb and radius R. There is no deformation of the spherical shell when it is in contact with the ground; the soft, elastic properties of the shell are reflected in the relative displacements, in different directions, between the shell and the bracket. b. The components inside the ball, apart from the storage battery and load, are equivalent to a solid ball of mass M and radius r. The battery and load are connected through radial light springs, giving the model known as the spring pendulum (Fig. 3). In Fig. 3, the offset of the center caused by the spring force is ΔR, which decomposes into the horizontal and vertical displacements ΔX and ΔY (Fig. 4). Figure 3. The spring-pendulum model c.
The battery and load are equivalent to a particle of mass m, hinged to the solid ball at its center by a massless connecting rod of length L. Writing a prime for differentiation with respect to time, the kinematic relations are

Xb = Rφ, Yb = 0,
Xp = Rφ + ΔX + L sin θ, Yp = ΔY − L cos θ.    (1)

Taking the second derivative of these equations with respect to time gives the accelerations of the shell and the pendulum in the X and Y directions, abx, aby, apx, apy:

abx = Rφ'', aby = 0,
apx = Rφ'' + ΔX'' + Lθ'' cos θ − L(θ')² sin θ,
apy = ΔY'' + Lθ'' sin θ + L(θ')² cos θ.    (2)

The force and moment balances of vector mechanics and the theorem of moment of momentum for the shell and the pendulum give equations (3) and (4):

F0 − FX = (mb + M)abx,
FY − (mb + M)g + FN = 0,
T − F0R + FX·ΔY − FY·ΔX = ((2/3)mbR² + (2/5)Mr²)φ'',
F0 = μ0FN,    (3)

FX = m·apx,
FY − mg = m·apy,
−T = mRLφ'' cos θ + mgL sin θ + mL²θ'',    (4)

where F0 is the static friction from the ground; FX and FY are the orthogonal components of the force exerted by the bracket on the shell in the plane; FN is the supporting force; φ is the angle through which the shell has rotated relative to the ground; θ is the angle of the pendulum relative to the vertical; and μ0 is the coefficient of static friction. Considering the pure-rolling constraint of the ball and assuming the motor rotates at a constant angular velocity ωi, we obtain

φ = ωi·t.    (5)

Figure 4. Mechanics analysis of the bracket & shell and of the pendulum

Substituting equations (1), (2) and (5) into equations (3) and (4), rearranging, omitting the quadratic terms in ΔX and ΔY as higher-order small quantities, and taking sin θ ≈ θ, cos θ ≈ 1, yields

(mL² + mLR)θ'' + ((5/3)mbR² + (7/5)Mr² + mR²)φ'' + mgLθ + mRΔX'' = 0,
(μ0mLR + mL² + mLR)θ'' + ((2/3)mbR² + (2/5)Mr²)φ'' + (μ0mLR + mgL)θ − μ0(mb + M)gR + μ0mRΔY'' = 0.    (6)

III. CALCULATION AND ANALYSIS OF THE DYNAMIC MODEL OF THE SOFT-SHELL SPHERICAL ROBOT The model consists of second-order nonlinear differential equations.
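Such equations are typically integrated numerically. As a hypothetical stand-in for the full model, the sketch below applies a central-difference scheme to the simplified linearized pendulum equation θ'' = −(g/L)θ, using the paper's rod length L = 0.28 m and the 0.05 s step mentioned in Section III; the initial deflection and the time horizon are our own inventions, for illustration only:

```python
import numpy as np

# Central-difference integration of theta'' = -(g/L)*theta, a simplified
# stand-in for the paper's linearized dynamics (our assumption, not the
# paper's full equation (6)).
g, L = 9.8, 0.28
h = 0.05                      # the 0.05 s step used in the paper
N = 80                        # 4 s horizon
w2 = g / L                    # squared natural frequency

theta = np.zeros(N + 1)
theta[0] = 0.1                # small initial deflection (hypothetical)
theta[1] = theta[0]           # theta'(0) = 0: first step unchanged
for n in range(1, N):
    # (theta[n+1] - 2*theta[n] + theta[n-1]) / h^2 = -w2 * theta[n]
    theta[n + 1] = 2 * theta[n] - theta[n - 1] - h * h * w2 * theta[n]

t = h * np.arange(N + 1)
exact = 0.1 * np.cos(np.sqrt(w2) * t)   # analytic solution for comparison
```

The same recurrence applies unchanged when the right-hand side is replaced by the full nonlinear expression evaluated at step n, which is the essence of the difference-approximation method the paper uses.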
In general it is hard to obtain an analytical solution, so this paper uses the method of difference approximation to obtain a numerical solution. The values of ΔX'' and ΔY'' are related to θ. First, omitting the terms in ΔX'' and ΔY'' gives the dynamic equation of a hard-shell spherical robot. The relevant parameters are substituted: the mass of the shell mb = 0.62 kg, radius R = 0.39 m; the mass of the internal mechanism and support M = 3.12 kg, equivalent radius r = 0.07 m; the mass of the battery and load m = 6.29 kg, L = 0.28 m, μ0 = 0.5. The initial conditions at time 0 are θ(0) = θ'(0) = 0. Substituting these parameters and applying difference discretization to the two differential equations with a step of 0.05, we obtain the numerical solution for the driving angle θ of the hard-shell spherical robot, shown by the solid line in Fig. 5. By measurement, the values of ΔX'' and ΔY'' were less than 10⁻², so they may be taken as constants, ΔX'' = ΔY'' = 0.02 m/s². Substituting these into equation (6) gives the dynamic equation of the soft-shell spherical robot; the numerical solution is shown by the dotted line in Fig. 5. To make the spherical rolling robot move with a desired translational velocity, a simple feedback controller is used. Based on the observed state shown in the above subsection, the driving torque τ in (5) is given by a state feedback law. It should be noted that the counter-torque −τ is applied to the inner subsystem composed of the gyro case and the gyro, as shown in (3). Since the gyro has a large angular momentum, nutation of the subsystem may be caused by this angular momentum. However, no nutation was seen in the preliminary experiments; it seems that the nutation is quickly damped by the frictional torque between the outer shell and the gyro case. In this paper, we adopt Strategy A and use the feedback law (6) in the experiments. In Strategy A, the rotational motion of the outer shell around the vertical axis is not controlled by (6).
However, the rotation around the horizontal axes approaches the desired horizontal rotation, and the experimental results in the next section will show that the spherical rolling robot can achieve translational motion under the feedback law (6). The mass, size and other parameters of the experimental prototype are the same as in the previous section; the inflation pressure of the spherical shell is 1.8×10⁵ Pa and the battery voltage is 12 V. A photoelectric encoder is used to keep the speed of the drive motor at π rad/s, with the robot starting from rest. PID control is applied to the steering angle of the steering motor to maintain the lateral stability of the robot, so that it moves along a horizontal straight line. In addition, a three-axis accelerometer and a three-axis gyro sensor are used to measure the three-axis acceleration and angular velocity simultaneously. The sampling frequency is 20 Hz. Through data processing, we obtain the trend of the driving angle θ, shown by the dotted line in Fig. 6. Figure 5. Driving angle curves from the theoretical kinetic model for the robot with hard shell and soft shell IV. PROTOTYPE TEST A. Test Conditions The prototype test demonstrates the feasibility of the new driving mechanism described in Sec. II. As can be seen from Fig. 5, for a constant drive motor speed, the driving angle of the soft-shell spherical robot in pure horizontal rolling follows the same trend as that of a hard-shell spherical robot with the same parameters: the maximum swing angle appears within a relatively short time, then decreases rapidly, and finally settles into oscillation about a certain angle. The oscillation amplitude of the driving angle was larger for the soft-shell robot, but its maximum swing angle was smaller, so the impact was relatively small. Figure 6.
Driving angle curves from the test result and the theoretical result V. CASE STUDY Spherical rolling robots have a unique place within the pantheon of mobile robots in that they blend the efficiency, over smooth and level substrates, of a traditional wheeled vehicle with the maneuverability, in the holonomic sense, of a legged one. This combination of normally exclusive abilities is the greatest potential benefit of this kind of robot propulsion. An arbitrary path containing discontinuities can be followed (unlike with most wheeled vehicles) without the complex balancing methods required of legged robots. However, spherical rolling robots have their own set of challenges, not the least of which is that all of the propulsive force must somehow be generated by components that are all confined within a spherical shape. This general class of robots has of late earned some notoriety as a promising platform for exploratory missions and as an exoskeleton. However, the history of this type of device reveals that it is most common as a toy or novelty. Indeed, the first documented device of this class appears to be a mechanical toy dating to 1909, with many other toy applications following in later years. Many efforts have been made to design and construct spherical rolling robots, producing many methods of actuation for self-locomotion. With a few exceptions, most of these efforts fall into two classes. The first class consists of robots that encapsulate some other wheeled robot or vehicle within a spherical shell. The shell is then rolled by causing the inner device to exert force on the shell. Friction between the shell and its substrate propels the assembly in the direction in which the inner device is driven.
Early examples of this class of spherical robot had a captured assembly whose length was equal to the inner diameter of the spherical shell, such as those of Halme et al. and Martin, while later iterations included what amounts to a small car dwelling at the bottom of the shell, such as that of Bicchi et al. The second major class of spherical robots includes those in which the motion of the sphere is an effect of the motion of an inner pendulum. The center of mass of the sphere is separated from its centroid by rotating the arm of the pendulum. This eccentricity of the center of mass induces a gravitational moment on the sphere, resulting in rolling locomotion. Examples of these efforts are those of Michaud and Caron and Jia et al. Javadi and Mojabi [16] as well as Mukherjee et al. have devised systems that also work using an eccentric center of mass, but each moves four masses on fixed slides within the spherical shell to achieve mass eccentricity instead of tilting an inner mass. Little work has been done outside these two classes. Jearanaisilawong and Laksanacharoen and Phipps and Minor each devised a rendition of a spherical robot; these robots can achieve some rolling motions when spherical but are capable of opening to become a wheeled robot and a legged walking robot, respectively. Sugiyama et al. created a deformable spherical rolling robot using SMA actuators that achieves an eccentric center of mass by altering the shape of the shell. Finally, Bart and Wilkinson and Bhattacharya and Agrawal each developed spherical robots whose outer shell is split into two hemispheres that can rotate relative to each other in order to effect locomotion. The condition of dynamic realizability (4) imposes a constraint on the components of the angular velocity vector ω0, and this constraint needs to be embedded into the motion planning algorithms.
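Embedding a linear constraint of the form n·(Jc ω0) = 0 can be illustrated by projecting a desired angular velocity onto the admissible subspace. The inertia values and vectors in the sketch below are invented for illustration, not parameters of any robot discussed here:

```python
import numpy as np

# Enforce a realizability constraint n . (Jc @ omega) = 0 by projection.
Jc = np.diag([0.12, 0.15, 0.10])     # inertia tensor (made-up values)
n = np.array([0.0, 1.0, 0.0])        # constraint axis (hypothetical n2)

def project(omega):
    # rewrite the constraint as a . omega = 0 with a = Jc^T n,
    # then remove the component of omega along a
    a = Jc.T @ n
    return omega - a * (a @ omega) / (a @ a)

omega = np.array([0.4, 0.3, -0.2])   # desired angular velocity (invented)
omega_ok = project(omega)            # nearest realizable angular velocity
```

The projection gives the admissible angular velocity closest (in the Euclidean sense) to the desired one; a motion planner could apply it at every step of a planned trajectory.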
If the motion planning is based on the direct specification of curves on the sphere or on the plane, as is the case in many conventional algorithms [10], [12], the embedding can be done as follows. Assume that the path of the rolling carrier is specified by spherical curves, and that the structure of the functions u0(t) and v0(t) is known up to certain constant parameters. The kinematic equations (3) can now be cast as

va' = Ru0' sin ψ cos v0 + Rv0' cos ψ,    (7)
ua' = Ru0' cos ψ cos v0 − Rv0' sin ψ.    (8)

In [6] the rotors are mounted on the axes n1 and n3, so the condition of dynamic realizability becomes n2·Jcω0 = 0. However, the motion planning algorithm in [6] is designed under the setting n2·ω0 = 0, which is not equivalent to the condition of dynamic realizability. To guarantee dynamic realizability, we express ωz in the last formula through ωx and ωy. In doing so, we first need to express ωx and ωy, as well as n3x, n3y and n3z, in terms of the contact coordinates. From the definition of the angular velocity, ω0 = R'Rᵀ, one obtains

ωx = u0' cos v0 sin ψ + v0' cos ψ,    (9)
ωy = u0' cos v0 cos ψ − v0' sin ψ,    (10)

while n3 is simply the last column of the orientation matrix R. Therefore,

n3x = −sin u0 cos ψ + cos u0 sin v0 sin ψ,    (11)
n3y = sin u0 sin ψ + cos u0 sin v0 cos ψ,    (12)
n3z = cos u0 cos v0.    (13)

Having expressed everything in terms of the contact coordinates, one can finally replace ψ' by

ψ' = (u0' sin v0 + kv0' tan u0 cos v0)/(1 + k).

If we formally set k = 0 here, the variable ψ is defined as in the pure rolling model. However, in our case k > 1. Consider a maneuver in which one traces a circle of radius a on the spherical surface. This maneuver is a component part of many conventional algorithms (see, for instance, [6], [10], [13], [14]). Tracing the circle results in a nonholonomic shift Δh(a) of the contact point on the plane and in a change of the holonomy (also called the geometric phase), Δφ(a). By concatenating two circles of radii a and b, one defines a spherical figure eight.
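The statement above that n3 is simply the last column of the orientation matrix R can be checked numerically. The factorization of R below is one assumed convention, chosen so that the vertical component of n3 reduces to cos u0 cos v0, and is not necessarily the convention of the cited work:

```python
import numpy as np

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

u0, v0, psi = 0.7, -0.4, 1.3      # arbitrary contact coordinates
R = Rz(psi) @ Ry(u0) @ Rx(v0)     # one possible factorization (assumed)
n3 = R[:, 2]                      # n3 is the last column of R
```

With this factorization the z-component of n3 equals cos u0 cos v0, and n3 remains a unit vector for any choice of the contact coordinates, as a rotation-matrix column must.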
By using the motion planning strategy of [13], based on tracing an asymmetric figure eight n times, one can, in principle, fabricate an exact and dynamically realizable motion planning algorithm. A detailed description of the circle-based motion planning algorithm is not presented in this paper due to page limitations. However, in the remaining part of this section we illustrate in simulation an important feature of this algorithm: the dependence of the nonholonomic shift on the inertia distribution specified by the parameter k. B. Results Analysis The most apparent behavior displayed by the spherical robot prototype was a tendency to wobble, or rock back and forth, with little damping. For example, when the sphere was at rest with the pendulum fixed inside, bumping the sphere would cause it to oscillate back and forth about a spot on the ground. The sphere also wobbled if a constant pendulum drive torque was suddenly applied to the sphere starting from rest. In this case, it would accelerate forward while the angle between the pendulum and the ground oscillated. Since the pendulum was oscillating, the forward linear velocity of the sphere also appeared to oscillate as it accelerated. When traveling forward and then tilting the pendulum a fixed angle to the side to steer, the radius of the turn would oscillate as well. Another behavior that was observed, but not found to be discussed in the literature, was the tendency of the primary drive axis to nutate when the sphere was traveling at a reasonable forward velocity. Specifically, the primary drive axis (the axis of the main drive shaft attached to the spherical shell) would incur some angular misalignment from the axis about which the sphere was actually rolling. When traveling slowly (estimated to be less than 0.5 m/s) this nutating-shaft behavior, which could be initiated by a bump on the ground, would damp out quickly.
When traveling at a moderate speed, the nutation would persist, causing the direction of the sphere to oscillate back and forth. When attempting to travel at high speed (estimated to be above 3 m/s) the angular misalignment between the axes would grow unstable until the primary drive axis was flipping end over end, even during a carefully controlled test on a level, smooth surface. The angle of inclination of the gyro increased rapidly from 0 [deg] to about 10 [deg] by t = 0.25 [s], and kept increasing slowly to about 20 [deg] for 0.25 ≤ t ≤ 3 [s]. It seems that the increase after t = 0.25 [s] was caused by the rolling friction at the contact point between the outer shell and the floor surface, which was covered with carpet tiles. The rolling friction may change the total angular momentum of the robot. The friction torque about the vertical may also decrease the total angular momentum when the initial angular velocity about the vertical axis is not zero. We will examine the behavior of the inclination angle for other types of floor surfaces, and for Strategy B, in future work. Moreover, due to the limited power of the DC motors, the maximum angular speed of the outer shell achieved in the experiments was about 1.5π rad/s. Comparing the measured driving-angle curve (dashed line) with the theoretical calculation (solid line) for the soft-shell spherical robot in Fig. 6, we can see that the measured curve basically agrees with the theoretical results, which supports the correctness of the "spring pendulum" dynamics model of the soft-shell spherical robot proposed in this paper.
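The degree to which a measured curve "basically agrees" with a theoretical one can be quantified, for instance by the RMS deviation between the two sampled curves. The curves in the sketch below are synthetic stand-ins, not the paper's data:

```python
import numpy as np

# RMS deviation between a measured and a theoretical driving-angle curve.
# Both curves here are invented for illustration.
t = np.arange(0.0, 4.0, 0.05)                  # 4 s at the 0.05 s step
theory = 2.0 * np.exp(-0.5 * t) * np.cos(3.0 * t) + 1.0
measured = theory + 0.1 * np.sin(10.0 * t)     # stand-in for model error

rmse = np.sqrt(np.mean((measured - theory) ** 2))
```

A single scalar like this makes statements such as "the measured curve agrees with the theoretical one" reproducible, and it can be reported alongside the maximum pointwise deviation.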
The differences between the theoretical and experimental curves are as follows: (1) the final angular oscillation amplitude is smaller than in the theoretical analysis, probably because the theoretical model does not consider the energy loss of the internal movement; (2) the measured maximum pendulum angle is bigger than the theoretical result, which is probably caused by modeling error. For example, the equivalent radius r of the support is difficult to determine precisely enough, and another source of modeling error is that the eccentric-displacement accelerations of the support, ΔX'' and ΔY'', are approximately regarded as constants. VI. CONCLUSIONS A dynamic model, named the spring pendulum, of the soft-shell spherical robot is advanced in this paper. The theoretical curve of drive angle over time is deduced from this dynamic model under the condition of constant drive motor speed. The test result on a soft-shell prototype is consistent with the theoretical result, which supports the validity of the spring-pendulum model. The rules of drive-angle fluctuation and their influence characteristics will be investigated by means of numerical research on the spring-pendulum model, in order to stabilize and control the attitude of the soft-shell spherical robot. ACKNOWLEDGMENT This work was supported in part by a grant from the Chinese Postdoctoral Fund. REFERENCES [1] Jiang Jie, Wang Hai-sheng, Su Yan-ping. Structural design of the internal and external driven spherical robots. Machinery, 2012, 03, pp. 42-44. [2] Sang Shengju, Zhao Jichao, Wu Hao, Chen Shoujun, and An Qi. Modeling and simulation of a spherical mobile robot. ComSIS, 2010, 7(1), Special Issue, pp. 51-61.
[3] Mattias Seeman, Mathias Broxvall, Alessandro Saffiotti, Peter Wide. An autonomous spherical robot for security tasks. IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety, Alexandria, USA, 2007, pp. 51-55. [4] S. Chaplygin, "On rolling of a ball on a horizontal plane," Mathematical Collection, St. Petersburg University, vol. 24, pp. 139-168, 1903 (in Russian; English transl.: Regular & Chaotic Dynamics, vol. 7, no. 2, 2002, pp. 131-148). [5] Li Tuanjie, Liu Weigang. Dynamics of the wind-driven spherical robot. Acta Aeronautica et Astronautica Sinica, 2010, 31(2), pp. 426-430. [6] Sugiyama Y. Circular/spherical robots for crawling and jumping. Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 2005, pp. 3595-3600. [7] China Patent CN202243763U, 2012-05-30. [8] Zhao Kai-liang, Sun Han-xu, Jia Qing-xuan, et al. Analysis on acceleration characteristics of spherical robot based on ADAMS. Journal of Machine Design, 2009, 26(7), pp. 24-25. [9] Zheng Yi-li, Sun Han-xu. Dynamic modeling and kinematics characteristic analysis of spherical robot. Journal of Machine Design, 2012, 02, pp. 25-29. [10] Zhao Bo, Wang Pengfei, Sun Lining, et al. Linear motion control of two-pendulums-driven spherical robot. Journal of Mechanical Engineering, 2011, 11, pp. 1-6. [11] Feng Jian-Chuang, Zhan Qiang, Liu Zeng-Bo.
The motion control of spherical robot based on sinusoidal input. Development & Innovation of Machinery & Electrical Products, 2012, 04, pp. 7-9. [12] Yue Ming, Liu Rong-qiang, Deng Zong-quan. Research on the effect of the Coulomb friction constraint on the spherical robot. Journal of Harbin Institute of Technology, 2007, 39(7), p. 51. [13] Yue Ming, Deng Zong-quan, Liu Rong-qiang. Quasi-static analysis and trajectory design of spherical robot. Journal of Nanjing University of Science and Technology (Natural Science), 2007, 31(5), pp. 590-594. [14] V. A. Joshi, R. N. Banavar, and R. Hippalgaonkar, "Design and analysis of a spherical mobile robot," Mech. Mach. Theory, vol. 45, no. 2, pp. 130-136, Feb. 2010. [15] E. Kayacan, Z. Y. Bayraktaroglu, and W. Saeys, "Modeling and control of a spherical rolling robot: A decoupled dynamics approach," Robotica, vol. 30, pp. 671-680, 2011. [16] M. Ahmadieh Khanesar, E. Kayacan, M. Teshnehlab, and O. Kaynak, "Extended Kalman filter based learning algorithm for type-2 fuzzy logic systems and its experimental evaluation," IEEE Trans. Ind. Electron., vol. 59, no. 11, pp. 4443-4455, Nov. 2012. [17] Haiyan Hu, Pengfei Wang, Lining Sun. Simulation platform of monitoring and tracking micro system for dangerous chemicals transportation. Journal of Networks, vol. 8, no. 2, pp. 477-484, Feb. 2013. [18] J.-C. Yoon, S.-S. Ahn and Y.-J. Lee, "Spherical robot with new type of two-pendulum driving mechanism," Proc. 15th Int. Conf. on Intelligent Engineering Systems, pp. 275-279, 2011. [19] Q. Zhan, Y. Cai and C. Yan, "Design, analysis and experiments of an omni-directional spherical robot," Proc. IEEE Int. Conf. on Robotics and Automation, pp. 4921-4926, 2011. Zhang Sheng was born in Jiangsu Province, China on Nov. 13, 1979.
He studied at PLA Ordnance College, Shijiazhuang, Hebei Province, China from 1998 to 2005 and earned his bachelor's and master's degrees in ammunition engineering and weapon system application engineering, respectively. The author's major field of study is cannon, auto-weapon and ammunition engineering. He was a Lecturer from 2005 to 2013 at PLA International Relationships University. His current research interests are smart ammunition. Coherence Research of Audio-Visual Cross-Modal Based on HHT Xiaojun Zhu*, Jingxian Hu, and Xiao Ma College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China *Corresponding author, Email: [email protected], {hjxocean, tyut2010cstc}@163.com Abstract—Vision and audition are the two main modes through which humans sense the world. Their relationship is investigated in this work. EEG experiments involving mixed aural and visual modes are designed, utilizing Hilbert-Huang Transform (HHT) and electroencephalogram (EEG) signal processing techniques. During EEG data processing, an I-EEMD method with similar-weighted-average waveform extension is proposed to decompose the EEG signals, specifically addressing the end effects and mode mixing present in the traditional HHT. The main components are obtained after decomposing the signals, including those of mixed modes, with I-EEMD. The correlation coefficients of the consistent and inconsistent mixed signals are calculated and compared. The comparison of the correlation coefficients indicates that there is coherence between the visual and aural modes. Index Terms—EEG; EEMD; audio-visual; coherence; HHT I. INTRODUCTION Humans obtain information from the outside world through different sensory channels such as vision, hearing, touch, smell and taste. However, the roles of different sensory modalities in human memory and learning are not independent of each other.
The encouraging research result "Cross modal learning interaction of Drosophila" by Academician Aizeng Guo and Dr. Jianzeng Guo of the Chinese Academy of Sciences proves that there are mutually reinforcing effects between the visual and olfactory modes in Drosophila's learning and memory [1]. Can human vision and audition, then, produce a similar cross-modal collaborative learning effect? Can we take advantage of this learning effect to strengthen the conveying of information, and thereby produce a synergistic, win-win effect and mutual transfer in memory? Human beings obtain and understand information about the outside world through multiple sensory modes [2] [3]. However, the information from multiple modes may sometimes be consistent and sometimes inconsistent, so the brain must process and integrate the information to form a unified percept. Since vision and hearing are the primary ways humans perceive the outside world [4], coherence research on the information in the visual and audio channels is particularly important, and it also has extraordinary significance for discovering the functional mechanisms of the brain. Therefore, the coherence research of audio-visual information and its function in knowing the world and perceiving the environment will contribute to improving the lives of handicapped people whose visual or audio channel is defective, and make the reconstruction of some functions in their cognitive systems come true [5]. Meanwhile, it will also actively boost improvement of the visual and audio capabilities of machines and further develop human-computer interaction technology. The human brain's integration of visual and auditory stimuli from the outside world is a very short but complicated nonlinear process [6] [7]. In recent years, EEG has been widely used in the visual and audio cognitive domain. © 2014 ACADEMY PUBLISHER doi:10.4304/jmm.9.2.230-237
EEG is a direct reflection of the brain's electrophysiological activity, including transient cerebral physiological activities [8] [9] [10]. Accordingly, some researchers consider that the transient process by which the brain integrates visual and audio information can cause the electric potential on the scalp surface to change [11]. The Event-Related Potential (ERP) is the brain potential extracted from EEG that is related to stimulation activities. It can establish relations between brain responses and events (visual or auditory stimuli), and capture real-time brain information processing. Thus, in recent years, when researchers in the brain science and artificial intelligence fields study the interaction of multiple sensory and crossing modes, ERP analysis technology has received unprecedented attention. In this article, we discuss the coherence between the visual EEG signal and the audio EEG signal from the perspective of signal processing, based on the Hilbert-Huang Transform (HHT) [12], and then investigate the mutual relations between the visual and audio modes. Firstly, this paper designs a visual and auditory correlation test experiment; evoked potential data are collected under a single visual stimulus, a single audio stimulus, an audio-visually consistent stimulus and an audio-visually inconsistent stimulus, respectively. Then the main IMF components of the single visual signal and the single audio signal are decomposed by HHT, and the coherence of the visual and audio modes is analyzed by calculating the correlation coefficients between these components and the ERP signal under the combined visual and audio stimulus. Our paper is organized as follows. Section II describes the experiment and the recording of audio-visual evoked EEG. Section III describes the data treatment method in detail.
We analyze the I-EEMD processing of the experimental data and provide the results in Section IV. In Section V, we conclude this paper. II. EXPERIMENT AND RECORDS OF AUDIO-VISUAL EVOKED EEG A. Design of Experiment The experiment contents include a single visual experiment (experiment A), a single audio experiment (experiment B) and a mixed audio-visual stimulus experiment (experiment C). The audio-visual stimulation experiment is further divided into an experiment with consistent audio-visual modes (experiment C1) and one with inconsistent audio-visual modes (experiment C2). The materials for visual stimulation comprise seven elements in total, namely the Chinese characters "ba", "ga", "a", the letters "ba", "ga", "a" and a red solid circle. The size and lightness of the presented Chinese characters and letters are consistent, and they appear in pseudorandom fashion. The materials for audio stimulation comprise four sound elements, namely the sounds "ba", "ga", "a" and a short pure sound "dong". The sound files are edited with Adobe Audition with unified attributes: two-channel stereo, a sampling rate of 44100 Hz and a resolution of 16 bits. In the visual evoked potential experiment, the red solid circle is the target stimulus and the pictures of the other Chinese characters or letters are the non-target stimuli. In the audio evoked potential experiment, the short pure sound "dong" is the target stimulus and the other sounds are the non-target stimuli. In the audio-visual dual-channel experiment, the visual pictures and the audio sounds are combined randomly; when the picture is the red solid circle and the sound is the short pure sound "dong", it is the target stimulus, and all other combinations are non-target stimuli. The experiment requires the subject to push a button to indicate their reaction to the target stimuli. This experimental model investigates the ERP data under the non-attention mode.
All three groups of experimental stimuli follow the oddball paradigm (OB) with a target stimulus rate of 20%. Each stimulus lasts 350 ms with an inter-stimulus interval of 700 ms. Each group of experiments includes 250 single stimuli (trials), of which 50 are target stimuli. The experiment is implemented with the E-Prime software.

B. Subjects

Twenty healthy enrolled postgraduates with no history of mental illness (10 males and 10 females, right-handed, aged from 22 to 27 years) were selected as subjects. All of them have normal or corrected-to-normal binocular vision and normal hearing. Before the experiment, all of them voluntarily signed an informed consent form, and the scalp of each subject was kept clean; after the experiment, a certain reward was given. Every subject participated in the experiment for about one hour, including preparation and the formal experimental process. During the experiment, every subject was given three minutes of rest, to prevent the ERP data waveforms from being affected by the subjects' overfatigue.

C. Requirements of Experiment and Electrode Selection

The EEG experiment was completed in an independent sound-insulated room. The subject faces the computer monitor and loudspeaker, 80 cm away from the screen, whose background color is black. During the experiment, the subjects were required to stay relaxed, keep a good sitting posture, concentrate, not twist the head, stare at the computer screen, press the space key when the target stimulus appeared, and not react to the non-target stimuli. EEG data were recorded by a 64-lead NEUROSCAN EEG system. Electrode caps of suitable size, equipped with Ag-AgCl electrodes, were worn by the subjects.
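As an aside, the trial structure described above (250 trials per block, a 20% target rate, pseudorandom order) can be sketched as follows; this is a minimal illustration, and the function name and labels are hypothetical, not part of the E-Prime script used in the experiment.

```python
import random

def oddball_sequence(n_trials=250, target_rate=0.2, seed=0):
    """Generate a pseudorandom oddball trial sequence: a list of
    'target'/'standard' labels with the given target proportion."""
    rng = random.Random(seed)
    n_targets = int(n_trials * target_rate)
    trials = ["target"] * n_targets + ["standard"] * (n_trials - n_targets)
    rng.shuffle(trials)  # pseudorandom presentation order
    return trials

seq = oddball_sequence()
print(len(seq), seq.count("target"))  # 250 trials, 50 of them targets
```

Seeding the generator makes the sequence reproducible across subjects, which is convenient when the same block order must be replayed.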
The internationally used 10-20 system was employed for electrode placement; its schematic diagram is shown in Figure 1. Conductive paste is applied between each electrode and the subject's scalp, and the impedance of every lead is required to be lower than 5 kΩ.

Figure 1. The 10-20 electrode lead system (electrodes Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2, with references A1, A2).

In this experiment, the 64-lead NEUROSCAN EEG acquisition system is adopted; however, according to the needs of the experiment, only 12 of the leads are used for analysis. According to the partitions of the human scalp and their functions, visual activity mainly occurs in the occipital region, so leads O1 and O2 are chosen; the auditory region is located in the temporal lobe, so leads T3, T4, F7 and F8, which are related to auditory activity, are chosen; in addition, leads F3, F4, Fp1 and Fp2, related to stimulus classification in the frontal lobe, and leads C3 and C4, related to whole-brain information processing, are chosen for analysis.

D. Recording and Pre-treatment of the EEG Signal

The EEG recording process is safe and harmless to the human body, and its time resolution is extremely high; therefore it plays an increasingly important role in cognitive science. The EEG experimental setup designed in this paper comprises the leads, the electrode cap, a signal amplifier, a stimulus presentation computer, a data recording computer and ERP synchronization software; the details are shown in Figure 2.

Figure 2. Experimental system equipment.

In the experiment, the 64-lead NEUROSCAN EEG equipment is used to collect and store the data. A NEUROSCAN Quik-Cap is employed as the electrode cap, with all electrode positions marked, so electrode placement is simple and fast. The data collection uses AC coupling. The reference electrodes are placed at the nose tip.
The recording electrodes are placed on the bilateral mastoids, and the data sampling rate is set to 1000 Hz. Before performing HHT analysis on the EEG, it is necessary to pre-treat the recorded and stored EEG data. The general procedure includes eliminating the electro-oculogram (EOG), digital filtering, dividing the EEG into epochs, baseline correction, superposition averaging and group averaging [13].

III. DATA TREATMENT METHOD

When EEG signals are processed with the traditional HHT, problems such as the end effect [14] and mode mixing [15] may arise, which greatly affect the experimental results. Therefore, based on a large number of studies of the existing solutions, this paper puts forward an end extension algorithm based on the weighted average of similar waveforms to restrain the end effect; meanwhile, EEMD is used in place of EMD to eliminate mode mixing. The combination of these two methods is named I-EEMD (Improved EEMD). The relevant methods are described in detail below.

A. Extension Algorithm of Similar-Waveform Weighted Average

So far, the EMD method has been widely applied in several fields of signal analysis. Although the method has advantages not possessed by other methods, the end effect brings great obstacles to its practical application. For this problem, researchers have brought forward several solutions, such as the mirror extension method [16], the envelope extension method [17], the cycle extension method [18] and even continuation [19]. These methods can reduce the influence of the end effect to some extent. However, the EEG signal is a typical nonlinear and non-stationary signal, and its analysis places high requirements on the detailed features of the signal [20]; therefore, these methods still need improvement. For end extension, the continued signal must maintain the variation trend inside the original signal.
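The epoching, baseline-correction and superposition-averaging steps of the pre-treatment described above can be sketched as follows. This is a minimal single-channel illustration; EOG removal and digital filtering are omitted, and the function name and parameters are hypothetical, not taken from the NEUROSCAN software.

```python
import numpy as np

def epoch_and_average(eeg, events, fs=1000, tmin=-0.1, tmax=0.6):
    """Cut one channel of continuous EEG into stimulus-locked epochs,
    subtract each epoch's pre-stimulus baseline, and average the
    epochs into an ERP (the 'superposition average')."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for ev in events:
        if ev - pre < 0 or ev + post > len(eeg):
            continue  # skip epochs that run off the ends of the recording
        ep = eeg[ev - pre:ev + post].astype(float)
        ep -= ep[:pre].mean()  # baseline correction with the pre-stimulus mean
        epochs.append(ep)
    return np.mean(epochs, axis=0)

# Synthetic check: an impulse 10 ms after every stimulus onset
eeg = np.zeros(2000)
events = [300, 900, 1500]
for ev in events:
    eeg[ev + 10] = 1.0
erp = epoch_and_average(eeg, events, fs=1000, tmin=-0.01, tmax=0.05)
```

Averaging over trials is what makes the stimulus-locked ERP emerge from the much larger spontaneous EEG background.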
After analyzing all kinds of end extension methods, this paper puts forward a method of similar-waveform weighted matching to extend the ends.

Let S1(t) and S2(t) be two signals on the same time axis, and let P1(t1, S1(t1)) and P2(t2, S2(t2)) be points on S1(t) and S2(t) respectively, with t1 ≠ t2 but S1(t1) = S2(t2); here we assume t1 < t2. The signal S1(t) is moved right horizontally along the time axis t by the length (t2 − t1), so that the points P1 and P2 coincide. Taking the waveform section of length L to the left (or right) of the coincident point P1, the waveform matching degree m of the signals S1(t) and S2(t) at the point P1 (or P2) can be defined as:

m = (1/L) Σ_{i=1}^{L} [S2(i) − S1(i)]^2 .  (1)

Apparently, the better S1(t) and S2(t) match, the smaller the value of m. From signal analysis theory we know that similar waveforms appear repeatedly within the same signal, so we can choose a number of matching waves similar to the waveform at the end, perform a weighted average over them, and use the resulting average wave to extend the signal ends. Extension generally concerns both the left and right ends; in the following, the left end of the signal is taken as the example.

Let the original signal be x(t), with leftmost point x(t0) and rightmost point x(t'), containing n sampling points. Starting from the left end x(t0) of the signal, a curved section of x(t) is taken towards the right and denoted w(t); it needs to contain only one extreme point (either a maximum or a minimum) and one zero-crossing point. The length of w(t) is l. The right end of the curved section w(t) is set at a zero-crossing point, recorded as x(t1).
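Eq. (1) reads directly as the mean squared difference between two equal-length waveform sections; a small sketch (the function name is illustrative only):

```python
import numpy as np

def matching_degree(s1, s2):
    """Waveform matching degree m of Eq. (1): the mean of the squared
    differences between two equal-length waveform sections.
    The better the two sections match, the smaller m is."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    if s1.shape != s2.shape:
        raise ValueError("sections must have the same length L")
    return float(np.mean((s2 - s1) ** 2))

print(matching_degree([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 for a perfect match
```

In the extension algorithm this quantity is evaluated at every candidate position, so the best-matching sub-waves are simply the ones with the smallest m.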
The intermediate point x(tm1) on the horizontal axis of w(t) is taken, where tm1 = (t0 + t1)/2. Taking x(tm1) as the reference point, the sub-wave w(t) is moved right horizontally along the time axis t. When some point x(ti) on the signal x(t) coincides with x(tm1), the sub-wave of the same length as w(t) centered on the point x(ti) is taken and recorded as wi(t). The waveform matching degree mi of wi(t) and w(t) is calculated, and mi is stored together with the small section of data wave immediately in front of wi(t) (the length of this section is set to 0.1l). The sub-wave is moved right repeatedly in the same way, and the adjacent data waves of length 0.1l on the left are successively recorded as v1(t), v2(t), ..., vk(t). Finally, a collection of data pairs is obtained, comprising each waveform matching degree and the corresponding sub-wave adjacent to the left of its matching wave:

[V, m] = {(v(t), m)} = {(v1(t), m1), (v2(t), m2), ..., (vk(t), mk)}.  (2)

If the collection [V, m] is null, the waveform of the original signal is extremely irregular; the similar-waveform theory is then not suitable, this extension is not performed, and the extreme value point method is used instead. If the collection [V, m] is not null, all the obtained waveform matching degrees are ranked in ascending order to obtain [V', m'], and the first j data pairs of [V', m'] are taken, where j = [k/3]. The weighted average vp of all sub-waves in these j data pairs is calculated, and vp is then used to extend the left end point of x(t).

The end extension algorithm of similar-waveform weighted matching is as follows.
Input: signal x(t). Output: matching wave of weighted average vp.
Steps:
(1) For t = t0 to t';
(2) Calculate the waveform matching degree mi according to formula (1), and take the sub-wave vi on the left of the matching wave wi, with L(vi) = 0.1 * L(wi);
(3) End for;
(4) Rank the collection [mi, vi] in ascending order of mi, obtaining a new collection [m'i, v'i] of length k;
(5) Take the first j data pairs of [m'i, v'i], where j = [k/3];
(6) Calculate the weighted average wave vp;
(7) Use vp to extend the left end of the signal x(t).

B. Eliminating the Mode Mixing Problem

The problem of mode mixing often arises when EEG signals are decomposed with the EMD method, and its causes are relatively complex: not only factors of the EEG itself, such as its frequency components and sampling frequency, but also the algorithm and screening process of EMD. Once mode mixing appears, the obtained IMF components lose the physical meaning they should have, which negatively affects the correct analysis of the signals. N. E. Huang did much research on the EMD of white noise [21] and found that the energy spectrum of white noise is uniform over the frequency band, and its scale behaviour in the time-frequency domain is evenly distributed. At the same time, the French scientist Flandrin, after many EMD decompositions of white noise and on a statistical basis, also found that all the frequency components it contains can be isolated regularly; that is to say, for white noise the EMD method acts as a dyadic filter bank, and each IMF component from the decomposition has a band-pass-like characteristic in the power spectrum [22].

Based on these characteristics of white noise under EMD, and in order to better solve the mode mixing problem, Z. Wu and N. E. Huang proposed a noise-assisted empirical mode decomposition method on the basis of the original EMD, called EEMD (ensemble empirical mode decomposition) [23]. The specific steps of the EEMD algorithm are as follows:

1) Add a normally distributed white noise x(t) to the original signal s(t) to obtain the overall signal S(t):

S(t) = s(t) + x(t).  (3)

2) Use the standard EMD method to decompose S(t), the signal with added white noise, into a number of IMF components ci and a surplus component rn:

S(t) = Σ_{i=1}^{n} ci + rn.  (4)

3) Repeat steps 1) and 2), adding different white noises to the signals to be analyzed:

Si(t) = s(t) + xi(t).  (5)

4) Decompose the superposed signals from the previous step with the EMD method to obtain:

Si(t) = Σ_{j=1}^{n} cij + rin.  (6)

5) The added white noises are random and mutually independent, and their statistical mean must be zero. Averaging each component over the ensemble offsets the influence of the Gaussian white noise and yields the final decomposition result:

cj = (1/N) Σ_{i=1}^{N} cij.  (7)

In this formula, N is the number of added white noises.

IV. I-EEMD PROCESSING AND ANALYSIS OF EXPERIMENTAL DATA

A. I-EEMD Processing of the Experimental Data

In the following, the evidence for the coherence of the audio-visual modalities is discussed from the perspective of EEG signal processing, so the main components of the audio-visual evoked potentials need to be extracted and analyzed. We choose the C3 and C4 leads, which are related to whole-brain information processing. Taking the C3 lead as an example, its ERP waveforms under the single visual stimulus, the single auditory stimulus, the consistent audio-visual stimulus and the inconsistent audio-visual stimulus are obtained, as shown in Fig. 3.
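Steps 1)-5) of the EEMD algorithm above condense into a short sketch. The `emd` routine is passed in as a parameter; a real implementation (e.g. from a package such as PyEMD) may return a variable number of IMFs, but here it is assumed to return a fixed-shape array so that the ensemble average of Eq. (7) is a plain mean. All names are illustrative.

```python
import numpy as np

def eemd(s, emd, n_ensemble=100, noise_std=0.2, seed=0):
    """Ensemble EMD: add an independent white-noise realization to the
    signal, decompose each noisy copy with `emd`, and average the IMFs
    over the ensemble so the added Gaussian noise cancels out."""
    rng = np.random.default_rng(seed)
    acc = None
    for _ in range(n_ensemble):
        noisy = s + noise_std * np.std(s) * rng.standard_normal(len(s))  # Eq. (3)/(5)
        imfs = np.asarray(emd(noisy))  # Eq. (4)/(6): rows are the c_ij
        acc = imfs if acc is None else acc + imfs
    return acc / n_ensemble            # Eq. (7): c_j = (1/N) * sum_i c_ij

# Sanity check with a trivial "decomposition": the ensemble average of
# the noisy copies converges back to the clean signal as N grows,
# which is exactly the cancellation property step 5) relies on.
def identity_emd(x):
    return np.vstack([x, np.zeros_like(x)])

s = np.sin(np.linspace(0, 2 * np.pi, 128, endpoint=False))
imfs = eemd(s, identity_emd, n_ensemble=400, noise_std=0.1)
```

The noise amplitude is scaled by the signal's standard deviation, a common convention that keeps the perturbation proportional to the data.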
Figure 3. C3 lead ERP waveforms.

Fig. 3 shows the four kinds of ERP data waveforms over the length of one epoch. Among them, the stimulating material selected for the visual stimulation is the on-screen letter "ba"; the stimulating material for the auditory stimulus is the sound "ba"; the stimulating material for the audio-visual consistent condition is the letter "ba" together with the sound "ba"; and the stimulating material for the audio-visual inconsistent condition is the letter "ba" together with the sound "ga". After decomposing the above four kinds of ERP data with the I-EEMD method, the IMF components shown in Fig. 4 are obtained; every component is arranged in order of frequency from high to low.

Figure 4. I-EEMD decomposition of the ERP data: (a) visual modality; (b) auditory modality; (c) audio-visual consistent modality; (d) audio-visual inconsistent modality.

Regarding the decomposition effect, each component is relatively stable at the ends, there is no flying-wing phenomenon, and no significant mode mixing is produced. Each component has a complete physical meaning, and through these components the quality of the decomposition can be examined. The fact that the surplus component Res is close to zero once again proves the validity of the method proposed in this paper. As the figure shows, the VEP data of the C3 lead are decomposed by I-EEMD into 7 IMF components and a surplus component. Through I-EEMD decomposition the AEP data of the C3 lead likewise yield seven IMF components and a surplus component, as do the audio-visual consistent data; the audio-visual inconsistent data of the C3 lead yield six IMF components and a surplus component.

Among the seven IMF components of the VEP data there are usually some pseudo-components, which should be screened out and not taken into consideration. Through the relevant theory of signals, the validity of the IMF components can be judged: the decision threshold for a pseudo-component is one-tenth of the largest correlation coefficient [24]. The correlation coefficients between the seven IMF components and the original signal are calculated and shown in Table I.

TABLE I. CORRELATION COEFFICIENTS BETWEEN THE IMF COMPONENTS AND THE ORIGINAL SIGNAL (VEP DATA)
IMF1: 0.0153  IMF2: 0.9435  IMF3: 0.5221  IMF4: 0.0275  IMF5: 0.7433  IMF6: 0.7028  IMF7: 0.0649  Res: 0.0032

As Table I shows, the correlation coefficients of IMF1, IMF4 and IMF7 with the original signal are relatively low: 0.0153, 0.0275 and 0.0649 respectively, so these three components can be regarded as pseudo-components of the decomposition. The correlation coefficient between the surplus component and the original signal is 0.0032. These four decomposition components therefore have no real physical meaning and do not deserve deeper analysis. The correlation coefficients of IMF2, IMF3, IMF5 and IMF6 with the original signal are relatively high, so they are the effective components of the decomposition.

Similarly, the same process is applied to the IMF components of the AEP data to obtain Table II.

TABLE II. CORRELATION COEFFICIENTS BETWEEN THE IMF COMPONENTS AND THE ORIGINAL SIGNAL (AEP DATA)
IMF1: 0.0211  IMF2: 0.6541  IMF3: 0.9022  IMF4: 0.5728  IMF5: 0.0325  IMF6: 0.0411  IMF7: 0.0353  Res: 0.0049

It can be seen from Table II that the correlation coefficients of IMF1, IMF5, IMF6 and IMF7 with the original signal are relatively low: 0.0211, 0.0325, 0.0411 and 0.0353 respectively. These four components are pseudo-components of the decomposition and have no real physical meaning.
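The screening rule above (discard any IMF whose correlation with the original signal falls below one-tenth of the largest coefficient [24]) can be sketched as follows; the function name and the synthetic example are illustrative only.

```python
import numpy as np

def screen_imfs(imfs, signal, ratio=0.1):
    """Mark each IMF as valid or pseudo: an IMF is kept only if the
    absolute correlation coefficient between it and the original
    signal exceeds `ratio` times the largest coefficient."""
    coeffs = np.array([abs(np.corrcoef(imf, signal)[0, 1]) for imf in imfs])
    keep = coeffs > ratio * coeffs.max()
    return keep, coeffs

# Synthetic check: a dominant component is kept, a tiny one is screened out
t = np.linspace(0, 20 * np.pi, 4000, endpoint=False)
imfs = [np.sin(t), 0.05 * np.cos(7 * t)]
signal = imfs[0] + imfs[1]
keep, coeffs = screen_imfs(imfs, signal)
```

Making the threshold relative to the largest coefficient, rather than a fixed constant, keeps the rule scale-free across signals of different amplitude.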
The correlation coefficients of IMF2, IMF3 and IMF4 with the original signal are relatively high, which means they are the effective components of the decomposition.

B. Analysis of the Experimental Results

The coherence of the audio-visual modalities can be analyzed by comparing the correlation coefficients of the ERP signal in the single visual or auditory modality with the ERP signal in the combined audio-visual modalities. Because the sound is "ga" in the audio-visual inconsistent condition, the sound "ga" should be chosen when comparing the ERP data of the single auditory modality with those of the audio-visual inconsistent condition; in the other cases the sound "ba" is used. The comparison of the correlation coefficients calculated from the above experimental data is shown in Table III.

TABLE III. COMPARISON OF THE CORRELATION COEFFICIENTS
                           Audio-visual consistent ERP data   Audio-visual inconsistent ERP data
Single visual ERP data                  0.5331                               0.2379
Single auditory ERP data                0.4519                               0.2022

It can be seen from Table III that the correlation coefficient between the ERP data of the single visual modality and those of the audio-visual consistent modality is 0.5331, while the correlation coefficient between the same data and the audio-visual inconsistent condition is 0.2379; the correlation coefficient between the ERP data of the single auditory stimuli and those of the audio-visual consistent condition is 0.4519, while that with the audio-visual inconsistent condition is 0.2022. From this we can say that when the information in the audio-visual modalities is consistent, it can enhance the information in the single modality, while when it is inconsistent, it can have an inhibitory effect on the single-modality information.
In addition, some evidence to support the above points can also be found by considering the correlation between the main components of the ERP signal under a single auditory or visual stimulus and the ERP signal under mixed audio-visual stimuli. From the principle of EMD decomposition, we know that the IMF components obtained from the decomposition have complete physical meaning, so the coherence of the audio-visual modalities can be inspected through the correlation coefficients between the main components of the single visual stimulus or the single auditory stimulus and the ERP data of the audio-visual consistent and inconsistent conditions. Comparing the valid components of the single visual evoked potentials and the single auditory evoked potentials against the ERP data of the audio-visual consistent and inconsistent conditions gives the data in Table IV.

TABLE IV. COMPARISON OF THE CORRELATION COEFFICIENTS FOR THE VISUAL COMPONENTS
                                  IMF2     IMF3     IMF5     IMF6
Audio-visual consistent data      0.5111   0.3853   0.5037   0.4202
Audio-visual inconsistent data    0.2195   0.1528   0.3001   0.2673

From Table IV we can see that the correlation coefficients between the main components of the single visual evoked potentials and the ERP signal in the audio-visual consistent condition are obviously greater than those with the ERP signal in the audio-visual inconsistent condition. This also shows that when the audio-visual information is consistent, it can help people better process information from the outside world. Next, consider the corresponding data in the auditory modality. In the audio-visual inconsistent condition of the experiment, the sound "ga" and the letter "ba" were selected as the data to be analyzed; in order to obtain better comparability of the experimental results, here we choose the sound "ga" in the auditory modality to compare with the audio-visual cross-modal data, and obtain the results shown in Table V.
TABLE V. COMPARISON OF THE CORRELATION COEFFICIENTS FOR THE AUDITORY COMPONENTS
Valid auditory components                             IMF2     IMF3     IMF4
Correlation coefficient, audio-visual consistent      0.6232   0.7869   0.5466
Correlation coefficient, audio-visual inconsistent    0.2752   0.3456   0.0387

Table V shows a comparison similar to that in the visual modality: the correlation coefficients between the main components of the single auditory evoked potentials and the ERP signal in the audio-visual consistent modality are apparently greater than those with the ERP signal in the audio-visual inconsistent modality. This also shows that when the audio-visual information is consistent, the EEG signal in the brain is stronger than with single-modality auditory information.

V. CONCLUSIONS

This paper discusses the theoretical evidence for the coherence of the audio-visual modalities from the perspective of EEG signal processing. An experiment based on audio-visual cross-modal EEG is designed, and the experimental data are collected, processed and analyzed. In processing the EEG signal, EEMD combined with the similar-waveform weighted-average end continuation algorithm, called I-EEMD in this paper, is used to restrain the effects of mode mixing and the end effect. The collected ERP data are then described from two perspectives based on the theory of signal coherence. First, we investigated the correlation between the data of the single visual modality, the single auditory modality and the audio-visual cross-modal data; the calculated correlation coefficients show that when the audio-visual information is consistent, its correlation coefficient with either single modality is relatively large. Second, for the main valid components of the single visual or auditory modality, the comparison of the calculated correlation coefficients is similar.
From these two points we find that when the information in the auditory and visual modalities is consistent, it helps the brain handle the outside environment promptly; that is, the two modalities reinforce each other's information under the audio-visual consistent condition. When the information in the two modalities is inconsistent, they restrain each other's information and the brain obtains a combined result after integration, which is consistent with the famous McGurk effect.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation for Young Scientists of Shanxi Province, China (Grant No. 2013021016-3).

REFERENCES

[1] Guo, Jianzeng and Guo, Aike. Cross modal interactions between olfactory and visual learning in Drosophila. Science, vol. 309, no. 5732, pp. 307-310, 2005.
[2] Suminski, Aaron J. and Tkach, Dennis C. and Hatsopoulos, Nicholas G. Exploiting multiple sensory modalities in brain-machine interfaces. Neural Networks, vol. 22, no. 9, pp. 1224-1234, 2009.
[3] Ohshiro, Tomokazu and Angelaki, Dora E. and DeAngelis, Gregory C. A normalization model of multisensory integration. Nature Neuroscience, vol. 14, no. 5, pp. 775-782, 2011.
[4] Nishibori, Kento and Takeuchi, Yoshinori and Matsumoto, Tetsuya and Kudo, Hiroaki and Ohnishi, Noboru. Finding the correspondence of audio-visual events by object manipulation. IEEJ Transactions on Electronics, Information and Systems, 2008, pp. 242-252.
[5] Mitchell, T. AI and the impending revolution in brain sciences. In Eighteenth National Conference on Artificial Intelligence, July 28-August 1, 2002, Menlo Park, United States.
[6] Ethofer, Thomas and Pourtois, Gilles and Wildgruber, Dirk. Investigating audiovisual integration of emotional signals in the human brain. Progress in Brain Research, vol. 156, pp. 345-361, 2006.
[7] Gonzalo-Fonrodona, Isabel. Functional gradients through the cortex,
multisensory integration and scaling laws in brain dynamics. Neurocomputing, vol. 72, no. 4-6, pp. 831-838, 2009.
[8] Liu, Baolin and Meng, Xianyao and Wang, Zhongning and Wu, Guangning. An ERP study on whether semantic integration exists in processing ecologically unrelated audio-visual information. Neuroscience Letters, vol. 505, no. 2, pp. 119-123, 2011.
[9] Lee, Tien-Wen and Wu, Yu-Te and Yu, Younger W. Y. and Chen, Ming-Chao and Chen, Tai-Jui. The implication of functional connectivity strength in predicting treatment response of major depressive disorder: A resting EEG study. Psychiatry Research: Neuroimaging, vol. 194, no. 3, pp. 372-377, 2011.
[10] Blankertz, Benjamin and Tomioka, Ryota and Lemm, Steven and Kawanabe, Motoaki and Muller, K.-R. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41-56, 2008.
[11] Molholm, Sophie and Ritter, Walter and Murray, Micah M. and Javitt, Daniel C. and Schroeder, Charles E. and Foxe, John J. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cognitive Brain Research, vol. 14, no. 1, pp. 115-128, 2002.
[12] Yuan, Ling and Yang, Banghua and Ma, Shiwei. Discrimination of movement imagery EEG based on HHT and SVM. Chinese Journal of Scientific Instrument, vol. 31, no. 3, pp. 649-654, 2010.
[13] Coyle, Damien and McGinnity, T. Martin and Prasad, Girijesh. Improving the separability of multiple EEG features for a BCI by neural-time-series-prediction preprocessing. Biomedical Signal Processing and Control, vol. 5, no. 3, pp. 649-654, 2010.
[14] He, Zhi and Shen, Yi and Wang, Qiang. Boundary extension for Hilbert-Huang transform inspired by gray prediction model. Signal Processing, vol. 92, no. 3, pp. 685-697, 2012.
[15] Lee, Ray F. and Xue, Rong. A transmit/receive volume strip array and its mode mixing theory in MRI. Magnetic Resonance Imaging, vol. 25, no. 9, pp. 1312-1332, 2007.
[16] Zhao, Jinping and Huang, Daji.
Mirror extending and circular spline function for empirical mode decomposition method. Journal of Zhejiang University, vol. 2, no. 3, pp. 247-252, 2001.
[17] Gai, Q. Research and Application to the Theory of Local Wave Time-Frequency Analysis Method. PhD thesis, Dalian University of Technology, Dalian, 2001.
[18] Hamilton, James Douglas. Time Series Analysis. Cambridge Univ Press, vol. 2, 2001.
[19] Qiao, Shijie. The symmetric extension method for wavelet transform image coding. Journal of Image and Graphics, vol. 5, no. 9, pp. 725-729, 2005.
[20] Pigorini, Andrea and Casali, Adenauer G. and Casarotto, Silvia and Ferrarelli, Fabio and Baselli, Giuseppe and Mariotti, Maurizio and Massimini, Marcello and Rosanova, Mario. Time-frequency spectral analysis of TMS-evoked EEG oscillations by means of Hilbert-Huang transform. Journal of Neuroscience Methods, vol. 192, no. 2, pp. 236-245, 2011.
[21] Huang, Norden E. Review of empirical mode decomposition. In Proceedings of SPIE - The International Society for Optical Engineering, Orlando, FL, United States, March 26; SPIE: Bellingham, WA, United States, 2001.
[22] Bao, Fei and Wang, Xinlong and Tao, Zhiyong and Wang, Qingfu and Du, Shuanping. EMD-based extraction of modulated cavitation noise. Mechanical Systems and Signal Processing, vol. 24, no. 7, pp. 2124-2136, 2010.
[23] Wu, Zhaohua and Huang, Norden E. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1-41, 2009.
[24] Yu, Dejie and Cheng, Junsheng and Yang, Yu. Application of improved Hilbert-Huang transform method in gear fault diagnosis. Journal of Aerospace Power, vol. 41, no. 6, pp. 1899-1903, 2009.

Xiaojun Zhu was born in Jiangsu, China, in 1977. He received the Master's degree in Computer Science in 2001 from Taiyuan University of Technology, Taiyuan, China.
He received the Doctor's degree in 2012 from Taiyuan University of Technology, Taiyuan, China. His research interests include intelligent information processing, cloud computing, and audio-visual computing.

Jingxian Hu is currently a graduate student working towards her M.S. degree at Taiyuan University of Technology, China. Her current research interests include wireless sensor networks and intelligent information processing.

Xiao Ma is currently a graduate student working towards his M.S. degree at Taiyuan University of Technology, China. His current research interests include cloud computing and audio-visual computing.

Object Recognition Algorithm Utilizing Graph Cuts Based Image Segmentation

Zhaofeng Li and Xiaoyan Feng
College of Information Engineering, Henan Institute of Science and Technology, Xinxiang, Henan, China
Email: [email protected], [email protected]

Abstract—This paper concentrates on designing an object recognition algorithm utilizing image segmentation. The main innovations of this paper lie in that we convert the image segmentation problem into a graph cut problem, so that the graph cut results can be obtained by calculating the probability that the intensity of a given pixel belongs to the object or to the background. After the graph cut process, the pixels in the same component are similar, and the pixels in different components are dissimilar. To detect the objects in a test image, the visual similarity between the segments of the test image and the object types deduced from the training images is estimated. Finally, a series of experiments is conducted for performance evaluation. Experimental results illustrate that, compared with existing methods, the proposed scheme can effectively detect salient objects. In particular, we verify that, in our scheme, the precision of object recognition is proportional to the image segmentation accuracy.
Index Terms—Object Recognition; Graph Cut; Image Segmentation; SIFT; Energy Function

I. INTRODUCTION

In the computer vision research field, image segmentation refers to the process of partitioning a digital image into multiple segments, each made up of a set of pixels. The aim of image segmentation is to simplify or change the representation of an image into something more meaningful and easier for users to analyze; that is, image segmentation is typically utilized to locate objects and curves in images [1] [2]. In particular, image segmentation is the process of allocating a tag to each pixel of an image such that pixels with the same tag share specific visual features. The result of the image segmentation process can be represented as a set of segments that totally cover the whole image [3]. The pixels belonging to the same region are similar in some characteristics or computed properties, such as color, intensity, or texture; adjacent regions, on the other hand, differ significantly with respect to the same characteristics. The problems of image segmentation are great challenges for the computer vision research field. Since the time of the Gestalt movement in psychology, it has been known that perceptual grouping plays a powerful role in human visual perception. A wide range of computational vision problems could in principle make good use of segmented images, were such segmentations reliably and efficiently computable. For instance, intermediate-level vision problems such as stereo and motion estimation require an appropriate region of support for correspondence operations, and spatially non-uniform regions of support can be identified using segmentation techniques. Higher-level problems such as recognition and image indexing can also utilize segmentation results in matching, to address problems such as figure-ground separation and recognition by parts [4-6].
As salient objects are important parts of images, if they can be effectively detected, the performance of image segmentation can be improved. Object recognition refers to locating collections of salient line segments in an image [7]. Object recognition systems are designed to correctly identify an object in a scene of objects, in the presence of clutter and occlusion, and to estimate its position and orientation. Such systems can be exploited in robotic applications where robots are required to navigate in crowded environments and use their equipment to recognize and manipulate objects [8]. In this paper, image segmentation is regarded as a graph cut problem, which is a basic problem in computer algorithms and theory. In computer theory, the graph cut problem is defined on data represented in the form of a graph G = (V, E), where V and E represent the vertices and edges of the graph respectively, such that it is possible to cut G into several components under some given constraints. The graph cut method is widely used in many application fields, such as scientific computing, partitioning the stages of a VLSI design circuit, and task scheduling in multi-processor systems [9] [10]. The main innovations of this paper lie in the following aspects: (1) The proposed algorithm converts the image segmentation problem into a graph cut problem, and the graph cut results can be obtained by an optimization process using an energy function. (2) In the proposed algorithm, the objects can be detected by computing the visual similarity between the segments of the testing images and the object types from the training images. (3) A testing image is segmented into several segments, and each image segment is tested to find whether there is a kind of object that can match it. The rest of the paper is organized as follows. Section 2 introduces the related works. Section 3 illustrates the proposed scheme for recognizing objects
from images utilizing the graph cut policy. In Section 4, experiments are implemented for performance evaluation. Finally, we conclude the whole paper in Section 5.

II. RELATED WORKS

In this section, we survey works related to this paper in two aspects: 1) image segmentation and 2) graph cut based image segmentation. Dawoud et al. proposed an algorithm that fuses visual cues of intensity and texture in Markov random field region-growing texture image segmentation. The main idea is to segment the image in a way that takes EdgeFlow edges into consideration, which provides a single framework for identifying object boundaries based on texture and intensity descriptors [11]. Park proposed a novel segmentation method based on a hierarchical Markov random field. The proposed algorithm is composed of local-level MRFs based on adaptive local priors, which model local variations of shape and appearance, and a global-level MRF enforcing consistency of the local-level MRFs. The proposed method can successfully model large object variations and weak boundaries and is readily combined with well-established MRF optimization techniques [12]. Gonzalez-Diaz et al. proposed a novel region-centered latent topic model that introduces two main contributions: first, an improved spatial context model that allows for considering inter-topic inter-region influences; and second, an advanced region-based appearance distribution built on the kernel logistic regressor. Furthermore, the proposed model has been extended to work in both unsupervised and supervised modes [13]. Nie et al. proposed a novel two-dimensional variance thresholding scheme to improve image segmentation performance. In the proposed scheme, the two-dimensional histogram of the original and local average image is first projected to a one-dimensional space, and then a variance-based criterion is constructed for threshold selection.
The experimental results on bi-level and multilevel thresholding for synthetic and real-world images demonstrate the success of the proposed image thresholding scheme, as compared with the Otsu method, the two-dimensional Otsu method and the minimum class variance thresholding method [14]. Chen et al. proposed a new multispectral image texture segmentation algorithm using a multi-resolution fuzzy Markov random field model for a variable scale in the wavelet domain. The algorithm considers multi-scalar information in both vertical and lateral directions. The feature field of the scalable wavelet coefficients is modelled, combining with the fuzzy label field describing the spatially constrained correlations between neighbourhood features, to achieve more accurate parameter estimation [15]. Han et al. presented a novel variational segmentation method within the fuzzy framework, which solves the problem of segmenting multi-region color-scale images of natural scenes. The advantages of the proposed segmentation method are: 1) by introducing the PCA descriptors, the segmentation model can partition color-texture images better than classical variational-based segmentation models; 2) to preserve the geometrical structure of each fuzzy membership function, a non-convex regularization term is proposed in the model; and 3) to solve the segmentation model more efficiently, the authors design a fast iteration algorithm in which the augmented Lagrange multiplier method and iterative reweighting are integrated [16]. Souleymane et al. designed an energy functional based on the fuzzy c-means objective function which incorporates the bias field that accounts for the intensity inhomogeneity of the real-world image. Using the gradient descent method, the authors obtained the corresponding level set equation, from which a fuzzy external force is deduced for the LBM solver based on the model by Zhao.
The method is fast, robust against noise, independent of the position of the initial contour, effective in the presence of intensity inhomogeneity, highly parallelizable, and can detect objects with or without edges [17]. Liu et al. proposed a new variational framework to solve Gaussian mixture model (GMM) based methods for image segmentation by employing the convex relaxation approach. After relaxing the indicator function in the GMM, flexible spatial regularization can be adopted and efficient segmentation can be achieved. To demonstrate the superiority of the proposed framework, the global and local intensity information and the spatial smoothness are integrated into a new model, and it works well on images with inhomogeneous intensity and noise [18]. Wang et al. presented a novel local region-based level set model for image segmentation. In each local region, the authors define a locally weighted least squares energy to fit a linear classifier. With the level set representation, these local energy functions are then integrated over the whole image domain to develop a global segmentation model. The objective function in this model is thereafter minimized via level set evolution [19]. Wang et al. presented an online reinforcement learning framework for medical image segmentation. A general segmentation framework using reinforcement learning is proposed, which can assimilate specific user intention and behavior seamlessly in the background. The method is able to establish an implicit model for a large state-action space and is generalizable to different image contents or segmentation requirements based on learning in situ [20]. In recent years, several researchers have utilized graph cut technology to implement image segmentation, and the related works are illustrated as follows. Zhou et al.
presented four technical components to improve graph cut based algorithms: combining both color and texture information for graph cut, including structure tensors in the graph cut model, incorporating active contours into the segmentation process, and using a "soft-brush" tool to impose soft constraints to refine problematic boundaries. The integration of these components provides an interactive segmentation method that overcomes the difficulties of previous segmentation algorithms in handling images containing textures or low-contrast boundaries, and produces a smooth and accurate segmentation boundary [21]. Chen et al. proposed a novel synergistic combination of the image-based graph cut method with the model-based ASM method to arrive at the graph cut-ASM method for medical image segmentation. A multi-object GC cost function is proposed which effectively integrates the ASM shape information into the graph cut framework. The proposed method consists of two phases: model building and segmentation. In the model building phase, the ASM model is built and the parameters of the GC are estimated. The segmentation phase consists of two main steps: initialization and delineation [22]. Wang et al. presented a novel method to apply shape priors adaptively in graph cut image segmentation. By incorporating shape priors adaptively, the authors provide a flexible way to impose the shape priors selectively at pixels where image labels are difficult to determine during the graph cut segmentation. Further, the proposed method integrates two existing graph cut image segmentation algorithms, one with a shape template and the other with the star shape prior [23]. Yang et al. proposed an unsupervised color-texture image segmentation method. To enhance the effects of segmentation, a new color-texture descriptor is designed by integrating the compact multi-scale structure tensor, total variation flow, and the color information.
To segment the color-texture image in an unsupervised and multi-label way, the multivariate mixed Student's t-distribution (MMST) is chosen for probability distribution modeling, as MMST can describe the distribution of color-texture features accurately. Furthermore, a component-wise expectation-maximization algorithm for MMST is proposed, which can effectively initialize the valid class number. Afterwards, the authors build up the energy functional according to the valid class number, and optimize it by the multilayer graph cuts method [24].

III. THE PROPOSED SCHEME

A. Problem Statement

In this paper, the problem of image segmentation is converted into the problem of graph cut. Let G = (V, E) be an undirected and connected graph, where V = {1, 2, ..., n} and E = {(i, j), 1 ≤ i < j ≤ n}. Let the edge weights w_ij = w_ji be given such that w_ij ≥ 0 for (i, j) ∈ E, and in particular, let w_ii = 0. The graph cut problem is to find a partition (V_1, V_2, ..., V_N) of V such that the condition V_1 ∪ V_2 ∪ ... ∪ V_N = V is satisfied. In the image segmentation problem, the nodes in V denote the pixels of the image, and the edge weight is estimated by computing the distance between two pixels. In particular, the graph cut based image segmentation result can be obtained as a subset of the edges of the edge set E. There are several methods to evaluate the quality of image segmentation results. The main idea is quite simple: we want the pixels in the same component to be similar, and the pixels in different components to be dissimilar. That is to say, an edge between two nodes which belong to the same component should have a lower weight, and edges located between nodes in different components should have higher weights.

Figure 1. Explanation of the graph cut problem (partitions 1, 2 and 3).

B. Graph Cut Based Image Segmentation

In the proposed scheme, the main innovation lies in that we regard the graph cut based image segmentation problem as an energy minimization problem. Therefore, given a set of pixels P and a set of labels L, the objective is to seek a labeling l : P → L which minimizes the following equation:

E(l) = Σ_{p∈P} R_p(l_p) + Σ_{p∈P, q∈N_p} C_pq(l_p, l_q)    (1)

where N_p denotes the set of pixels belonging to the neighborhood of p, and R_p(l_p) refers to the cost of allocating the label l_p to p. Moreover, C_pq(l_p, l_q) denotes the cost of allocating the labels l_p and l_q to p and q respectively. Afterwards, the proposed energy function is defined in Eq. 2:

E = λ_1 Σ_{p∈P} D_p(f_p) + λ_2 Σ_{p∈P} S_p(x_o) + λ_3 Σ_{p∈P, q∈N_p} C_qp(l_p, l_q)    (2)

s.t. λ_1 + λ_2 + λ_3 = 1

In Eq. 2, the parameters λ_1, λ_2 and λ_3 denote the weights of the data term D_p, the shape term S_p, and the boundary term respectively. Furthermore, the above modules can be represented in the following forms:

D_p(l_p) = −log P(I_p | O), if l_p = object label; −log P(I_p | B), if l_p = background label    (3)

C_qp(l_p, l_q) = δ(l_p, l_q) · exp(−(I_p − I_q)² / (2σ²)) / dis(p, q)    (4)

δ(l_p, l_q) = 1 if l_p ≠ l_q; 0 if l_p = l_q    (5)

where I_p denotes the intensity of the pixel p, and P(I_p | O), P(I_p | B) represent the probabilities that the intensity of pixel p belongs to the object and to the background respectively. dis(p, q) refers to the distance between pixels p and q, and σ denotes the standard deviation of the intensity differences of the neighbors. Next, based on the graph cut algorithm, the graph G is represented as G = (V, E), where V and E refer to a set of nodes and a set of weighted edges. The graph cut problem concentrates on seeking a cut C with minimal cost |C|, which is the sum of the weights of all the cut edges. Following the above description, the graph cut process with cost |C| equal to E(l) is implemented with the weight configuration given in Eq. 6 and Eq. 7.
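As a toy illustration of the data and boundary terms of Eqs. 1-5 (a minimal sketch with a made-up intensity model, not the authors' implementation), the energy of any candidate labeling of a one-dimensional image can be evaluated directly; the graph cut then seeks the labeling that minimizes this quantity.

```python
import math

def energy(intensities, labels, p_obj, p_bg, sigma=10.0,
           lam_data=0.5, lam_boundary=0.5):
    """Toy evaluation of an Eq. 2-style energy on a 1-D image.

    Data term (Eq. 3): -log P(I_p | O) for object pixels (label 1) and
    -log P(I_p | B) for background pixels (label 0).  Boundary term
    (Eqs. 4-5): exp(-(I_p - I_q)^2 / (2 sigma^2)) with dis(p, q) = 1,
    charged only where neighbouring labels differ.  The shape term of
    Eq. 2 is omitted in this sketch.
    """
    data = sum(-math.log(p_obj(i) if l == 1 else p_bg(i))
               for i, l in zip(intensities, labels))
    boundary = sum(math.exp(-(a - b) ** 2 / (2 * sigma ** 2))
                   for (a, la), (b, lb) in zip(zip(intensities, labels),
                                               zip(intensities[1:], labels[1:]))
                   if la != lb)
    return lam_data * data + lam_boundary * boundary

# Made-up intensity model: bright pixels are likely object.
p_obj = lambda i: 0.9 if i > 100 else 0.1
p_bg  = lambda i: 0.1 if i > 100 else 0.9
img = [10, 12, 200, 210]
print(energy(img, [0, 0, 1, 1], p_obj, p_bg))   # low: cut sits on the true edge
print(energy(img, [0, 1, 1, 0], p_obj, p_bg))   # high: labels fight the data
```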
W_qp = λ_3 · C_qp    (6)

W_t^p = λ_2 · D_p(t) + λ_3 · S_p(t) + K    (7)

where K is a constant which ensures that the weight W_t^p is positive, and t belongs to the set of labels, whose weight is W_t.

C. Object Recognition Algorithm

From the former section, a testing image is segmented into several segments; next, for each segment we try to match it against a pre-set training image dataset which includes many image segments, where the segments belonging to the same object type are collected together. We use the Corel5K dataset to construct the training dataset, which consists of 5,000 images divided into 50 image classes with 100 images in each class. Each image in the collection is reduced to size 117 × 181 (or 181 × 117). We use all 5,000 images as the training dataset (100 per class). Each image is treated as a collection of 20 × 20 patches obtained by sliding a window with a 20-pixel interval, resulting in 45 patches per image. Moreover, we utilize the 128-dimension SIFT descriptor computed on the 20 × 20 gray-scale patches. Furthermore, we add additional 36-dimension robust color descriptors which have been designed to complement the SIFT descriptors extracted from the gray-scale patches. Afterwards, we run k-means on the collection of 164-dimension features to learn a dictionary of 256 visual words. For a test image I, we partition it into several blocks and map each image block to a visual word through the bag of visual words model. Thus, similar to documents, images can be represented as a set of visual words (denoted as d_I). For an object type O_i, the similarity between image I and the object type tag O_i can be calculated as follows:

Sim(d_I, O_i) = (1 / (N · M)) · Σ_{x=1}^{N} Σ_{y=1}^{M} S(d_I^x, O_i^y)    (8)

Afterwards, the objects in the test image can be detected by the following equation:

Object(I) = arg min_i Sim(d_I, O_i)    (9)

Therefore, the objects with the minimized values in Eq. 9 are regarded as the objects in image I.

IV. EXPERIMENTS

In this section, we make performance evaluation utilizing three image datasets: 1) MIT Vistex [25], 2) BSD 300 [26] and 3) SODF 1000 [27]. As object recognition and image segmentation are quite subjective, the performance measuring metric is very important. In this experiment, PRI and NPR are used as metrics for quantitative evaluation. PRI refers to the probabilistic rand index and NPR denotes the normalized probabilistic rand index. In particular, the values of PRI lie in [0, 1], while NPR is at most 1 and may be negative. Larger values of the two metrics mean that the image segmentation is much closer to the ground truth.

Figure 2. Negative logarithm values of PRI for different methods.

Figure 3. Values of NPR for different methods.

Afterwards, to testify the performance of the proposed graph cut based image segmentation approach, four existing unsupervised color-texture image segmentation methods are compared: the method for unsupervised segmentation of color-texture regions in images or video (JSEG) [28], maximum a posteriori and maximum likelihood estimation (MAP-ML) [29], compression-based texture merging (CTM) [30], and MSNST, which integrates the multi-scale nonlinear structure tensor texture and Lab color adaptively [31]. The mean values and variance values of PRI and NPR under the above approaches are given for the three datasets (shown in Table I).

TABLE I. OVERALL PERFORMANCE COMPARISON FOR DIFFERENT DATASETS.

Dataset     Type      Metric  CTM    MSNST  JSEG   MAP-ML  Proposed
MIT Vistex  Mean      PRI     0.764  0.753  0.742  0.791   0.823
MIT Vistex  Mean      NPR     0.292  0.436  0.347  0.401   0.444
MIT Vistex  Variance  PRI     0.129  0.124  0.147  0.119   0.118
MIT Vistex  Variance  NPR     0.366  0.272  0.383  0.318   0.256
BSD 300     Mean      PRI     0.804  0.848  0.736  0.790   0.873
BSD 300     Mean      NPR     0.293  0.422  0.379  0.442   0.464
BSD 300     Variance  PRI     0.134  0.133  0.153  0.115   0.121
BSD 300     Variance  NPR     0.351  0.287  0.398  0.336   0.243
SODF 1000   Mean      PRI     0.725  0.766  0.726  0.748   0.810
SODF 1000   Mean      NPR     0.278  0.382  0.319  0.430   0.435
SODF 1000   Variance  PRI     0.122  0.132  0.133  0.122   0.118
SODF 1000   Variance  NPR     0.328  0.270  0.377  0.310   0.261

Figure 4. Cumulative percentage of PRI scores for different methods.

Figure 5. Cumulative percentage of NPR values for different methods.

All the experiments are conducted on a PC with an Intel Core i5 CPU with a main frequency of 2.9 GHz. The memory used is 8 GB of 1600 MHz DDR memory, and the hard disk is a 500 GB SSD. Moreover, the graphics chip is the NVIDIA Optimus NVS 5400M. Based on the above hardware settings, the algorithm running times are compared in Table II.

TABLE II. COMPARISON OF TIME COST FOR DIFFERENT APPROACHES.

Approaches        CTM    MSNST  JSEG  MAP-ML  Proposed
Running time (s)  223.7  247.8  35.4  136.2   105.3
Running platform  Java   C++    Java  Matlab  C++

Figure 6. Precision of object recognition for different kinds of objects.

Figure 7. Relationship between precision of object recognition and image segmentation accuracy.

From Table II, it can be seen that the proposed scheme is obviously faster than the other approaches except JSEG. However, the segmentation performance of JSEG is the worst of the five methods.
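For reference, when a single ground-truth segmentation is available, the PRI metric used above reduces to the classical Rand index: the fraction of pixel pairs on which the two partitions agree (same component vs. different components). A minimal sketch:

```python
from itertools import combinations

def rand_index(seg, truth):
    """Rand index between two labelings of the same pixels: the fraction
    of pixel pairs on which the two segmentations agree about whether the
    pair lies in the same component.  With one ground truth, PRI reduces
    to this quantity."""
    pairs = list(combinations(range(len(seg)), 2))
    agree = sum((seg[i] == seg[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)

truth = [0, 0, 1, 1]
print(rand_index([0, 0, 1, 1], truth))   # 1.0: identical partitions
print(rand_index([0, 1, 0, 1], truth))   # low: the pair structure disagrees
```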
Hence, the proposed scheme is very valuable.

Figure 8. Example of the object recognition results of the proposed image segmentation algorithm.

In the following parts, we test the influence of image segmentation accuracy on object recognition. Firstly, experiments are conducted to show the precision of object recognition for different kinds of objects, and the results are shown in Fig. 6. Secondly, the relationship between the precision of object recognition and the image segmentation accuracy is shown in Fig. 7. As is shown in Fig. 7, the precision of object recognition is proportional to the image segmentation accuracy. Therefore, the image segmentation module in the proposed scheme is very powerful in the object recognition process. From the above experimental results, it can be seen that the proposed scheme is superior to the other schemes. The main reasons lie in the following aspects: (1) The proposed scheme converts the image segmentation problem into a graph cut problem, and we obtain the graph cut results by an optimization process. Moreover, the objects can be detected by computing the visual similarity between the segments of the testing images and the object types from the training images. (2) For the JSEG algorithm, there is a major problem caused by the varying shades due to illumination. However, this problem is difficult to handle because in many cases not only the illuminant component but also the chromatic components of a pixel change their values due to the spatially varying illumination. (3) The MAP-ML algorithm should be extended to segment images with the combination of motion information, and to utilize the model for specific object extraction by designing more complex features to describe the objects. (4) The CTM scheme should be extended to supervised scenarios.
It is of great importance to better understand how humans segment natural images from the lossy data compression perspective. Such an understanding would lead to new insights into a wide range of important problems in computer vision, such as salient object detection and segmentation, perceptual organization, and image understanding and annotation. (5) The performance of MSNST is not satisfactory, because the method is a compromise between high segmentation accuracy and moderate computational efficiency. In particular, the parameter setting in this scheme is too complex, and a more discriminative segmentation process should be studied in detail.

V. CONCLUSIONS

In this paper, we proposed an effective object recognition algorithm based on image segmentation. The image segmentation problem is converted into the graph cut problem, and then the graph cut results can be computed by estimating the probability that the intensity of a given pixel belongs to the object or to the background. In order to find the salient objects, we compute the visual similarity between the segments of the testing images and the object types deduced from the Corel5K image dataset.

REFERENCES

[1] Peng Qiangqiang, Long Zhao, A modified segmentation approach for synthetic aperture radar images on level set, Journal of Software, 2013, 8(5) pp. 1168-1173
[2] Grady, Leo, Random walks for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(11) pp. 1768-1783
[3] Noble, J. Alison; Boukerroui, Djamal, Ultrasound image segmentation: A survey, IEEE Transactions on Medical Imaging, 2006, 25(8) pp. 987-1010
[4] Felzenszwalb, P. F.; Huttenlocher, D. P., Efficient graph-based image segmentation, International Journal of Computer Vision, 2004, 59(2) pp. 167-181
[5] Boykov, Yuri; Funka-Lea, Gareth, Graph cuts and efficient N-D image segmentation, International Journal of Computer Vision, 2006, 70(2) pp.
109-131
[6] Lei Zhu, Jing Yang, Fast Multi-Object Image Segmentation Algorithm Based on C-V Model, Journal of Multimedia, 2011, 6(1) pp. 99-106
[7] Kang, Dong Joong; Ha, Jong Eun; Kweon, In So, Fast object recognition using dynamic programming from combination of salient line groups, Pattern Recognition, 2003, 36(1) pp. 79-90
[8] Georgios Kordelas, Petros Daras, Viewpoint independent object recognition in cluttered scenes exploiting ray-triangle intersection and SIFT algorithms, Pattern Recognition, 2010, 43(11) pp. 3833-3845
[9] Andreev Konstantin, Räcke Harald, Balanced Graph Partitioning, Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, 2004, pp. 120-124
[10] Shi, J. B.; Malik, J., Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8) pp. 888-905
[11] Dawoud A., Netchaev A., Fusion of visual cues of intensity and texture in Markov random fields image segmentation, IET Computer Vision, 2013, 6(6) pp. 603-609
[12] Park Sang Hyun, Lee Soochahn, Yun Il Dong, Hierarchical MRF of globally consistent localized classifiers for 3D medical image segmentation, Pattern Recognition, 2013, 46(9) pp. 2408-2419
[13] Gonzalez-Diaz Ivan, Diaz-de-Maria Fernando, A region-centered topic model for object discovery and category-based image segmentation, Pattern Recognition, 2013, 46(9) pp. 2437-2449
[14] Nie Fangyan, Wang Yonglin, Pan Meisen, Two-dimensional extension of variance-based thresholding for image segmentation, Multidimensional Systems and Signal Processing, 2013, 24(3) pp. 485-501
[15] Chen Mi, Strobl Josef, Multispectral textured image segmentation using a multi-resolution fuzzy Markov random field model on variable scales in the wavelet domain, International Journal of Remote Sensing, 2013, 34(13) pp. 4550-4569
[16] Han Yu, Feng Xiang-Chu, Baciu George, Variational and PCA based natural image segmentation, Pattern Recognition, 2013, 46(7) pp.
1971-1984
[17] Balla-Arabe Souleymane, Gao Xinbo, Wang Bin, A Fast and Robust Level Set Method for Image Segmentation Using Fuzzy Clustering and Lattice Boltzmann Method, IEEE Transactions on Cybernetics, 2013, 43(3) pp. 910-920
[18] Liu Jun, Zhang Haili, Image Segmentation Using a Local GMM in a Variational Framework, Journal of Mathematical Imaging and Vision, 2013, 46(2) pp. 161-176
[19] Wang Ying, Xiang Shiming, Pan Chunhong, Level set evolution with locally linear classification for image segmentation, Pattern Recognition, 2013, 46(6) pp. 1734-1746
[20] Wang Lichao, Lekadir Karim, Lee Su-Lin, A General Framework for Context-Specific Image Segmentation Using Reinforcement Learning, IEEE Transactions on Medical Imaging, 2013, 32(5) pp. 943-956
[21] Zhou Hailing, Zheng Jianmin, Wei Lei, Texture aware image segmentation using graph cuts and active contours, Pattern Recognition, 2013, 46(6) pp. 1719-1733
[22] Chen Xinjian, Udupa Jayaram K., Alavi Abass, GC-ASM: Synergistic integration of graph-cut and active shape model strategies for medical image segmentation, Computer Vision and Image Understanding, 2013, 117(5) pp. 513-524
[23] Wang Hui, Zhang Hong, Ray Nilanjan, Adaptive shape prior in graph cut image segmentation, Pattern Recognition, 2013, 46(5) pp. 1409-1414
[24] Yang Yong, Han Shoudong, Wang Tianjiang, Multilayer graph cuts based unsupervised color-texture image segmentation using multivariate mixed Student's t-distribution and regional credibility merging, Pattern Recognition, 2013, 46(4) pp. 1101-1124
[25] MIT VisTex texture database, http://vismod.media.mit.edu/vismod/imagery/VisionTexture/vistex.html
[26] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of IEEE International Conference on Computer Vision, 2001, pp. 416-423.
[27] R. Achanta, S. Hemami, F. Estrada, S.
Susstrunk, Frequency-tuned salient region detection, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1597-1604.
[28] Y. Deng, B. S. Manjunath, Unsupervised segmentation of color-texture regions in images and video, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23 pp. 800-810.
[29] S. F. Chen, L. L. Cao, Y. M. Wang, J. Z. Liu, Image segmentation by MAP-ML estimations, IEEE Transactions on Image Processing, 2010, 19 pp. 2254-2264.
[30] A. Y. Yang, J. Wright, Y. Ma, S. Sastry, Unsupervised segmentation of natural images via lossy data compression, Computer Vision and Image Understanding, 2008, 110 pp. 212-225.
[31] S. D. Han, W. B. Tao, X. L. Wu, Texture segmentation using independent-scale component-wise Riemannian-covariance Gaussian mixture model in KL measure based multi-scale nonlinear structure tensor space, Pattern Recognition, 2011, 44 pp. 503-518.

Semi-Supervised Learning Based Social Image Semantic Mining Algorithm

AO Guangwu
School of Applied Technology, University of Science and Technology Liaoning, Anshan, China

SHEN Minggang
School of Materials and Metallurgy, University of Science and Technology Liaoning, Anshan, China

Abstract—Social image semantic mining is of great importance in social image retrieval, and it can also help solve the problem of the semantic gap. In this paper, a novel social image semantic mining algorithm based on semi-supervised learning is proposed. Firstly, the labels which tag the images in the test image dataset are extracted, and noisy semantic information is pruned. Secondly, the labels are propagated to construct an extended collection. Thirdly, image visual features are extracted from the unlabeled images in three steps: watershed segmentation, region feature extraction and codebook construction. Fourthly, vectors of image visual features are obtained by dimension reduction.
Fifthly, after the process of semi-supervised learning and classifier training, the confidence scores of semantic terms for the unlabeled images are calculated by integrating different types of social image features, and then the heterogeneous feature spaces are divided into several disjoint groups. Finally, experiments are conducted for performance evaluation. Compared with other existing methods, it can be seen that the proposed algorithm can effectively extract the semantic information of social images.

Index Terms—Semi-Supervised Learning; Social Image; Semantic Mining; Semantic Gap; Classification Hyperplane

© 2014 ACADEMY PUBLISHER doi:10.4304/jmm.9.2.245-252

I. INTRODUCTION

In recent years, low-level features of images (such as color, texture, and shape) have been widely used in content-based image retrieval and processing. While low-level features are effective for some specific tasks, such as "query by example", they are quite limited for many multimedia applications, such as efficient browsing and organization of large collections of digital photos and videos, which require advanced content extraction and image semantic mining [1]. Hence, the ability to extract semantic information in addition to low-level features, and to perform fusion of such varied types of features, would be very beneficial for image retrieval applications [2]. Unfortunately, as the famous semantic gap exists, it is hard to effectively extract semantic information from the low-level features of images. The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation [3]. The number of Web photos has been increasing rapidly in recent years, and retrieving them semantically presents a significant challenge. Many original images are constantly uploaded with few meaningful direct annotations of semantic content, limiting their search and discovery.
Although some websites allow users to provide terms or keywords for images, this is far from universal and applies to only a small proportion of images on the Web. The related research on image semantic information mining has reflected the dichotomy inherent in the semantic gap and is divided between two main classes: 1) concept-based image retrieval and 2) content-based image retrieval. The first class concentrates on retrieval by image objects and high-level concepts, while the second one focuses on the low-level visual features of the image [4]. To detect salient objects in images, the image is usually divided into several segments. Segmentation by object is widely regarded as a difficult problem, as it would have to replicate the object recognition function of the human vision system. In particular, semantic information of images combined with a region-based image decomposition is used, which aims to extract semantic properties of images based on the spatial distribution of color and texture properties. All in all, directly extracting high-level semantic content from images automatically is beyond the capability of current multimedia information processing technology. Although there have been some efforts to relate low-level features and regions to higher-level perception, these are limited to isolated words, and the process needs substantial training samples. These approaches have limited effectiveness in finding semantic contents in broad image domains [4-6]. The sources of image semantic information can be classified into two types: 1) the associated texts and 2) the visual features of images. If this information can be integrated together effectively, image semantic information can be mined with high accuracy. For the research of image semantic mining, social image semantic information is quite important.
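The integration of the two information sources can be illustrated with a simple late-fusion sketch (hypothetical per-label scores and mixing weight, not the paper's actual model): each candidate semantic label receives a score from the associated text and a score from the visual features, and the fused score ranks the labels.

```python
def fuse_scores(text_scores, visual_scores, alpha=0.6):
    """Late fusion of the two semantic information sources named above:
    associated text and visual features.  `alpha` is an assumed mixing
    weight, not a value from the paper."""
    return {label: alpha * text_scores[label]
                   + (1 - alpha) * visual_scores.get(label, 0.0)
            for label in text_scores}

# Hypothetical per-label confidence scores for one social image.
text   = {"beach": 0.8, "sunset": 0.6, "dog": 0.1}
visual = {"beach": 0.7, "sunset": 0.9, "dog": 0.2}
fused = fuse_scores(text, visual)
best = max(fused, key=fused.get)
print(best)   # → beach
```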
Currently, social image sharing websites have achieved great success; they allow users to upload personal media data and to annotate it with user-defined tags. With these rich tags, users can more conveniently retrieve image visual content on such websites [7].

246 JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

Figure 1. An example of a social image photo with rich metadata

Online image sharing websites, such as Flickr, Facebook, Photobucket, and Photosig, which are known as social media, allow users to upload their personal photos to the Web. As shown in Fig. 1, social images usually have rich metadata, such as (1) the photo, (2) other people's comments, (3) the author's own description, (4) photo albums, (5) tags, and (6) author information. Regarding these rich tags as index terms, users can conveniently retrieve these images. From the above analysis, we can see that mining the semantic information of social images has brought forth many new research topics. In this paper, the social image website we use is Flickr. As described on Wikipedia, Flickr is an image and video hosting website and web services suite that was created by Ludicorp in 2004 and acquired by Yahoo! in 2005. In addition to being a popular website for users to share and embed personal photographs, and effectively an online community, the service is widely used by photo researchers and by bloggers to host images that they embed in blogs and social media. Yahoo reported in June 2011 that Flickr had a total of 51 million registered members and 80 million unique visitors. In August 2011 the site reported that it was hosting more than 6 billion images, and this number continues to grow steadily according to reporting sources. Photos and videos can be accessed from Flickr without registering an account, but an account must be created in order to upload content to the website.
Registering an account also allows users to create a profile page containing the photos and videos that the user has uploaded, and grants the ability to add other Flickr users as contacts. For mobile users, Flickr has official mobile apps for iOS, Android, PlayStation Vita, and Windows Phone.

The main innovations of this paper lie in the following aspects: (1) Visual features of social images are extracted from the unlabeled images by watershed segmentation, region feature extraction, and codebook construction. (2) Using the semi-supervised learning algorithm, we integrate the median distance and the label changing rate to obtain the class-central samples. (3) The confidence scores of semantic words for the unlabeled images are calculated by combining different types of image features, and the heterogeneous feature spaces are divided into several disjoint groups. (4) The vector representing the contents of an unlabeled image is embedded into a Hilbert space by several mapping functions.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 illustrates the proposed scheme for social image semantic information mining. In Section 4, experiments are conducted to evaluate performance in comparison with other existing methods. Finally, we conclude the paper in Section 5.

II. RELATED WORKS

Liu et al. proposed a region-level semantic mining approach. As it is easier for users to understand image content by region, images are segmented into several parts using an improved segmentation algorithm, each with homogeneous spectral and textural characteristics, and a uniform region-based representation for each image is built. Once the probabilistic relationship among image, region, and hidden semantic is constructed, the Expectation Maximization method can be applied to mine the hidden semantic [8].
Wang et al. tackled the semantic gap by mining decisive feature patterns. Algorithms are developed to mine the decisive feature patterns and construct a rule base to automatically recognize semantic concepts in images. A systematic performance study on large image databases containing many semantic concepts shows that the method is more effective than some previously proposed methods [9]. Zhang et al. proposed an image classification approach in which the semantic context of images and multiple low-level visual features are jointly exploited. The context consists of a set of semantic terms defining the classes to be associated with unclassified images. Initially, a multi-objective optimization technique is used to define a multi-feature fusion model for each semantic class. Then, a Bayesian learning procedure is applied to derive a context model representing relationships among semantic classes. Finally, this context model is used to infer object classes within images. Selected results from a comprehensive experimental evaluation are reported to show the effectiveness of the proposed approaches [10]. Abu et al. utilized the Taxonomic Data Working Group Life Sciences Identifier vocabulary to represent their data and defined a new vocabulary specific to annotating monogenean haptoral bar images, developing the MHBI ontology and a merged MHBI-Fish ontology. These ontologies are evaluated using five criteria: clarity, coherence, extendibility, ontology commitment, and encoding bias [11]. Wang et al. proposed a remote sensing image retrieval scheme using image scene semantic matching. The low-level image visual features are first mapped into multilevel spatial semantics via VF (visual feature) extraction, object-based classification with support vector machines, spatial relationship inference, and SS (scene semantic) modeling.
Furthermore, a spatial SS matching model that involves object area, attribution, topology, and orientation features is proposed for the implementation of sample-scene-based image retrieval [12]. Burdescu et al. presented a system used in the medical domain for three distinct tasks: image annotation, semantic-based image retrieval, and content-based image retrieval. An original image segmentation algorithm based on a hexagonal structure was used to segment medical images. Image regions are described using a vocabulary of blobs generated from image features with the K-means clustering algorithm. The annotation and semantic-based retrieval task is evaluated for two annotation models: the Cross Media Relevance Model and the Continuous-space Relevance Model. Semantic-based image retrieval is performed using the methods provided by the annotation models. The ontology used by the annotation process was created in an original manner starting from the information content provided by the Medical Subject Headings [13]. Liu et al. concentrated on association analysis for image content and presented a Bidirectional-Isomorphic Manifold learning strategy to optimize both the visual feature space and the textual space, in order to achieve more accurate comprehension of image semantics and relationships. To achieve this optimization between two different models, Bidirectional-Isomorphic Manifold Learning utilizes a novel algorithm to unify adjustments in both models into a topological structure, called the reversed Manifold mapping [14]. Wang presented a remote-sensing image retrieval scheme using image visual, object, and spatial relationship semantic features. It includes two main stages, namely offline multi-feature extraction and online query. In the offline stage, remote-sensing images are decomposed into several blocks using the Quin-tree structure.
Image visual features, including textures and colours, are extracted and stored. Further, object-oriented support vector machine classification is carried out to obtain the image object semantic. A spatial relationship semantic is then obtained by a new spatial orientation description method. The online query stage, meanwhile, is a coarse-to-fine process that includes two sub-steps: a rough image retrieval based on the object semantic, and a template-based fine image retrieval involving both visual and semantic features [15]. Peanho et al. presented an efficient solution for extracting the semantic contents of fields in a complex document from a digital image. In order to process the contents of printed documents electronically, information must be extracted from digital images of documents. When dealing with complex documents, in which the contents of different regions and fields can be highly heterogeneous with respect to layout, printing quality, and the use of fonts and typing standards, reconstructing the contents of documents from digital images can be a difficult problem [16]. On the other hand, semi-supervised learning is a powerful computing tool in the field of intelligent computing. In the following parts, we introduce applications of semi-supervised learning algorithms. Wang et al. proposed a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally, an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity [17]. Hassanzadeh et al.
proposed a combined semi-supervised and active learning approach for sequence labeling, which greatly reduces manual annotation cost: only highly uncertain tokens need to be manually labeled, while other sequences and subsequences are labeled automatically. The proposed approach reduces manual annotation cost by around 90% compared with supervised learning and by 30% compared with a similar fully active learning approach [18]. Shang et al. proposed a novel semi-supervised learning (SSL) approach named semi-supervised learning with nuclear norm regularization (SSL-NNR), which can simultaneously handle both sparse labeled data and additional pairwise constraints together with unlabeled data. Specifically, the authors first construct a unified SSL framework that combines the manifold assumption and the pairwise constraints assumption for classification tasks. Then a modified fixed point continuous algorithm that learns a low-rank kernel matrix taking advantage of Laplacian spectral regularization is illustrated [19].

III. PROPOSED SCHEME

A. Framework of the Proposed Scheme

The framework of the proposed algorithm for social image semantic information mining is shown in Fig. 2.

Figure 2. Framework of the proposed algorithm of social image semantic information

The corpus we use is made up of a small amount of manually labeled images and a large number of unlabeled images. For this framework, five modules are designed. In module 1, the labels tagging the images in the given dataset are extracted; then, to improve the accuracy of image semantic mining, noisy terms in the label database are deleted. In module 2, the labels obtained in the former module are propagated to construct an extended collection. Then, in module 3, image visual features are extracted from the unlabeled images in three steps: 1) watershed segmentation, 2) region feature extraction, and 3) codebook construction. In module 4, vectors of image visual features are obtained by dimension reduction. Finally, after semi-supervised learning and classifier training, the confidence scores of semantic terms for the unlabeled images are calculated in module 5.

After collecting the training images, it is of great importance to choose a suitable learning model for social image semantic information mining. As is well known, classification performance is better for supervised learning algorithms than for unsupervised ones; however, when the iteration process is initiated, only a few labeled images are available to train the classifier for social image semantic information mining. Based on the above analysis, a semi-supervised method is utilized to analyze the relationship between the visual features of images and the semantic information by considering both labeled and unlabeled images. To avoid introducing extra manually labeled data when utilizing class-central samples, our semi-supervised learning algorithm combines the median distance and the label changing rate to obtain the class-central samples. For the problem of binary classification, the unlabeled samples U are classified into two classes, the positive class (denoted as P) and the negative class (denoted as N), as follows:

P = {x_i | x_i ∈ U, f(x_i) > 0}  (1)

N = {x_i | x_i ∈ U, f(x_i) ≤ 0}  (2)

Afterwards, for each class the proposed semi-supervised learning algorithm calculates the label changing rate for all the unlabeled images, and then chooses the centroid samples of the given class. The unlabeled samples whose label changing rate is equal to 0 are obtained by the following equations:

U_P = {x_i | x_i ∈ P, δ(x_i) = 0}  (3)

U_N = {x_i | x_i ∈ N, δ(x_i) = 0}  (4)

where δ(x_i) refers to the label changing rate of the sample x_i.
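As a concrete illustration, the selection in Eqs. (1)-(4) can be sketched as a small routine. The names here (f for the current decision function, delta for the label changing rate, U for the unlabeled pool) are illustrative assumptions, not the paper's code:

```python
# Hypothetical sketch of Eqs. (1)-(4): split the unlabeled pool U into a
# positive class P and a negative class N by the sign of the current decision
# function f, then keep only the samples whose label changing rate delta
# is zero (i.e., whose predicted label has stabilized).

def stable_class_members(U, f, delta):
    """Return (U_P, U_N): per-class unlabeled samples with stable labels."""
    P = [x for x in U if f(x) > 0]           # Eq. (1)
    N = [x for x in U if f(x) <= 0]          # Eq. (2)
    U_P = [x for x in P if delta(x) == 0]    # Eq. (3)
    U_N = [x for x in N if delta(x) == 0]    # Eq. (4)
    return U_P, U_N
```

For example, with a toy 1-D decision function f(x) = x and a label changing rate that is zero for samples far from the boundary, the routine returns the stable members of each class.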
Then, using U_P and U_N, the samples with the median distance to the current classification hyperplane separating the positive class from the negative class are obtained as follows:

x_P = arg median { d(x_i) | x_i ∈ U_P }  (5)

x_N = arg median { d(x_i) | x_i ∈ U_N }  (6)

However, an image cluster should not be separated if it contains images which have the same labels, whether the labels are relevant or not. Furthermore, it is not suitable to separate an image cluster which contains only a few images. Therefore, we define a condition to determine whether an image cluster can be separated:

Stop(N_i^k) = true, if d_i^+ / d_i^- ≥ ε_1 or d_i^- / d_i^+ ≥ ε_1 or d_i^+ + d_i^- ≤ ε_2; false, otherwise  (7)

where ε_1 and ε_2 refer to two pre-defined thresholds, N_i^k is the ith node in the kth image cluster, and d_i^+ and d_i^- denote the numbers of images which are labeled and not labeled with the given label in N_i^k, respectively.

Based on the above process, we now introduce how to calculate the confidence scores of semantic terms for an unlabeled image. As social images have rich heterogeneous metadata, different types of image features can be extracted from them; the heterogeneous feature space G is then divided into N disjoint groups {g_1, g_2, ..., g_N}, with G = g_1 ∪ g_2 ∪ ... ∪ g_N. Hence, the feature vector of the ith social image x_i can be represented as follows:

V(x_i) = (x_{i,g_1}^T, x_{i,g_2}^T, ..., x_{i,g_N}^T)^T  (8)

With this grouping structure of the original image feature vectors, V(x_i) is embedded into a Hilbert space by G mapping functions:

φ_m(x): g_m → f_m,  m = 1, 2, ..., G  (9)

The confidence score of semantic terms for an unlabeled image x is then calculated by the following equation:

CS(x) = Σ_{i=1}^{M} α_i k(x, x_i)  (10)

where α is equal to [α_1, α_2, ..., α_{n_a+n_t}] and k(x, x_i) refers to a kernel function, which is obtained by the following equation.
k(x_i, x_j) = φ(x_i)^T φ(x_j) = Σ_{m=1}^{M} φ_m(x_i)^T φ_m(x_j) = Σ_{m=1}^{M} k_m(x_i, x_j)  (11)

Afterwards, the G distinct kernel matrices M = (M_1, M_2, ..., M_G) can be obtained, where M_j refers to the jth kernel matrix. The semantic terms with higher confidence scores are regarded as the semantic information mining results.

IV. EXPERIMENTS

A. Dataset and Performance Evaluation Metric

We choose two well-known social image datasets for performance evaluation: NUS-WIDE and MIR Flickr. The two datasets are described as follows. NUS-WIDE is made up of 269,648 images with 5,018 unique tags collected from Flickr. We downloaded the owner information according to the image IDs and obtained the owner user IDs of 247,849 images. The collected images belong to 50,120 unique users, with each user owning about 5 images. In particular, we choose the users with at least fifty images and keep their images to obtain our experimental dataset, named NUS-WIDE-USER15. Moreover, NUS-WIDE provides ground-truth for 81 tags [20]. The other dataset we use is MIR Flickr, which consists of 25,000 high-quality photographic images from thousands of Flickr users, made available under the Creative Commons license. The database includes all the original user tags and EXIF metadata. In particular, detailed and accurate annotations are provided for topics corresponding to the most prominent visual concepts in the user tag data. The rich metadata allow for a wide variety of image retrieval benchmarking scenarios [21]. In this experiment, we use precision, recall, and F1 as metrics. For each tag t, the precision and recall are defined as follows:

precision(t) = N_c / N_s  (12)

recall(t) = N_c / N_r  (13)
where N_s and N_r refer to the number of retrieved images and the number of truly related images in the test set, respectively, and N_c denotes the number of correctly annotated images. To integrate these two metrics, the F1 measure is defined as follows:

F1(t) = 2 · precision(t) · recall(t) / (precision(t) + recall(t))  (14)

Next, we test the proposed algorithm on the NUS-WIDE and MIR Flickr datasets respectively.

B. Experimental Results and Analysis

To verify the effectiveness of the proposed approach, it is compared with other existing methods, including 1) user-supplied tags (UT), 2) random walk with restart (RWR) [22], 3) tag refinement based on visual and semantic consistency (TRVSC) [23], 4) multi-edge graph (MEG) [24], and 5) low-rank approximation (LR) [25]. F1 values of the different methods for different concepts on the NUS-WIDE and MIR Flickr datasets are shown in Fig. 3 and Fig. 4.

Figure 3. F1 values for different methods for different concepts using the NUS-WIDE dataset

Figure 4. F1 values for different methods for different concepts using the MIR Flickr dataset

Next, we compare the performance of the different methods using precision-recall curves on several specific concepts selected from the NUS-WIDE and MIR Flickr datasets (shown in Fig. 5-Fig. 8).
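The per-tag metric definitions in Eqs. (12)-(14) can be sketched as follows; `retrieved` and `relevant` are illustrative collections of image IDs for one tag, not from the paper's experiments:

```python
# A minimal sketch of Eqs. (12)-(14): per-tag precision, recall, and F1.
# N_c = correctly annotated images, N_s = retrieved images,
# N_r = truly related images in the test set.

def tag_metrics(retrieved, relevant):
    """Compute (precision, recall, F1) for one tag from two ID collections."""
    n_c = len(set(retrieved) & set(relevant))  # correctly annotated
    n_s = len(retrieved)                       # retrieved images
    n_r = len(relevant)                        # truly related images
    precision = n_c / n_s if n_s else 0.0      # Eq. (12)
    recall = n_c / n_r if n_r else 0.0         # Eq. (13)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (14)
    return precision, recall, f1
```

The guards against empty sets avoid division by zero for tags with no retrieved or no relevant images.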
The average F1 values of the different methods on the two datasets are given in Table I, and to further show the effectiveness of the proposed algorithm, some examples of semantic extraction from the MIR Flickr dataset are illustrated in Table II. From the above experimental results, it can be seen that the proposed scheme is superior to the other schemes. The main reasons lie in the following aspects: (1) Using the semi-supervised learning algorithm, we integrate the median distance and the label changing rate to obtain the class-central samples. (2) The proposed semi-supervised learning algorithm can compute the label changing rate for all the unlabeled images. (3) The confidence scores of semantic words for the unlabeled images are calculated by combining different types of image features extracted from social images, with the heterogeneous feature spaces divided into several disjoint groups. (4) The vector of the unlabeled image is embedded into a Hilbert space by several mapping functions. (5) There is a lot of noisy information in user-supplied tags of social images; hence, the performance of UT is the worst among all the methods. (6) The other methods are more suitable for mining the semantic information of ordinary images; their performance on social image semantic mining is not satisfactory, because these methods cannot integrate the rich heterogeneous metadata of social images.

Figure 5. Precision-recall curves on the concept "dog"

V. CONCLUSIONS

In this paper, we propose a novel social image semantic mining algorithm utilizing semi-supervised learning. Before the semantic information mining process, the labels tagging the images in the test image dataset are extracted, and noisy semantic information is deleted. Then, the labels are propagated to construct an extended collection.
Next, image visual features are extracted from the unlabeled images, and vectors of image visual features are obtained by dimension reduction. Finally, semi-supervised learning and classifier training are performed, and the confidence scores of semantic terms for the unlabeled images are calculated. In particular, the semantic terms with higher confidence scores are regarded as the semantic information mining results.

TABLE I. AVERAGE F1 VALUE OF DIFFERENT METHODS UNDER DIFFERENT DATASETS

Method: UT | RWR | TRVSC | MEG | LR | The proposed algorithm
NUS-WIDE: 0.576 | 0.661 | 0.676 | 0.657 | 0.666 | 0.747
MIR Flickr: 0.667 | 0.728 | 0.758 | 0.738 | 0.788 | 0.858

TABLE II. EXAMPLES OF SEMANTIC EXTRACTION OF THE MIR FLICKR DATASET

Semantic information per image: Car, Corners | Pad, Desk, Wire | Woman, Face, Gazing | City, Night, Building, Light | Camera, Girl, Olympus, Len | Sky, Grass, Tree, Water | Flower, White | Dog, Puppy, Pet, Grass

ACKNOWLEDGEMENT

This study was financially supported by The Education Department of Liaoning Province Key Laboratory of China (Techniques Development of Heavy Plate by Unidirectional Solidification with Hollow Lateral Wall Insulation, Grant No. 2008S1222).

Figure 6. Precision-recall curves on the concept "Tree"

Figure 7. Precision-recall curves on the concept "Vehicle"

Figure 8.
Precision-recall curves on the concept "Rainbow"

REFERENCES

[1] Smeulders A. W. M., Worring M., Santini S., "Content-based image retrieval at the end of the early years", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(12), pp. 1349-1380.
[2] Luo J. B., Savakis A. E., Singhal A., "A Bayesian network-based framework for semantic image understanding", Pattern Recognition, 2005, 38(6), pp. 919-934.
[3] Carneiro Gustavo, Chan Antoni B., Moreno Pedro J., "Supervised learning of semantic classes for image annotation and retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(3), pp. 394-410.
[4] Wong R. C. F., Leung C. H. C., "Automatic semantic annotation of real-world web images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11), pp. 1933-1944.
[5] Djordjevic D., Izquierdo E., "An object- and user-driven system for semantic-based image annotation and retrieval", IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17(3), pp. 313-323.
[6] Tezuka Taro, Maeda Akira, "Image retrieval with generative model for typicality", Journal of Networks, 2011, 6(3), pp. 387-399.
[7] Fuming Sun, Haojie Li, Yinghai Zhao, Xueming Wang, Dongxia Wang, "Towards tags ranking for social images", Neurocomputing, in press.
[8] Liu Tingting, Zhang Liangpei, Li Pingxiang, "Remotely sensed image retrieval based on region-level semantic mining", EURASIP Journal on Image and Video Processing, 2012, Article No. 4.
[9] Wang W., Zhang A. D., "Extracting semantic concepts from images: a decisive feature pattern mining approach", Multimedia Systems, 2006, 11(4), pp. 352-366.
[10] Zhang Qianni, Izquierdo Ebroul, "Multifeature analysis and semantic context learning for image classification", ACM Transactions on Multimedia Computing Communications and Applications, 2013, 9(2), Article No.
12.
[11] Abu Arpah, Susan Lim Lee Hong, Sidhu Amandeep Singh, "Semantic representation of monogenean haptoral bar image annotation", BMC Bioinformatics, 2013, 14, Article No. 48.
[12] Wang Min, Song Tengyi, "Remote sensing image retrieval by scene semantic matching", IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(5), pp. 2874-2886.
[13] Burdescu Dumitru Dan, Mihai Cristian Gabriel, Stanescu Liana, "Automatic image annotation and semantic based image retrieval for medical domain", Neurocomputing, 2013, 109, pp. 33-48.
[14] Liu Xianming, Yao Hongxun, Ji Rongrong, "Bidirectional-isomorphic manifold learning at image semantic understanding & representation", Multimedia Tools and Applications, 2013, 64(1), pp. 53-76.
[15] Wang M., Wan Q. M., Gu L. B., "Remote-sensing image retrieval by combining image visual and semantic features", International Journal of Remote Sensing, 2013, 34(12), pp. 4200-4223.
[16] Peanho Claudio Antonio, Stagni Henrique, Correa da Silva Flavio Soares, "Semantic information extraction from images of complex documents", Applied Intelligence, 2012, 37(4), pp. 543-557.
[17] Wang Jun, Jebara Tony, Chang Shih-Fu, "Semi-supervised learning using greedy Max-Cut", Journal of Machine Learning Research, 2013, 14, pp. 771-800.
[18] Hassanzadeh Hamed, Keyvanpour Mohammadreza, "A two-phase hybrid of semi-supervised and active learning approach for sequence labeling", Intelligent Data Analysis, 2013, 17(2), pp. 251-270.
[19] Shang Fanhua, Jiao L. C., Liu Yuanyuan, "Semi-supervised learning with nuclear norm regularization", Pattern Recognition, 2013, 46(8), pp. 2323-2336.
[20] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng, "NUS-WIDE: a real-world web image database from National University of Singapore", Proceedings of the ACM International Conference on Image and Video Retrieval, 2009, pp. 48-55.
[21] Huiskes Mark J., Thomee Bart, Lew Michael S., "New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative", Proceedings of the International Conference on Multimedia Information Retrieval, 2010, pp. 527-536.
[22] Changhu Wang, Feng Jing, Lei Zhang, HongJiang Zhang, "Image annotation refinement using random walk with restarts", Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006, pp. 647-650.
[23] Dong Liu, Xian-Sheng Hua, Meng Wang, Hong-Jiang Zhang, "Image retagging", Proceedings of the International Conference on Multimedia, 2010, pp. 491-500.
[24] Dong Liu, Shuicheng Yan, Yong Rui, Hong-Jiang Zhang, "Unified tag analysis with multi-edge graph", Proceedings of the International Conference on Multimedia, 2010, pp. 25-34.
[25] Guangyu Zhu, Shuicheng Yan, Yi Ma, "Image tag refinement towards low-rank, content-tag prior and error sparsity", Proceedings of the International Conference on Multimedia, 2010, pp. 461-470.

Research on License Plate Recognition Algorithm based on Support Vector Machine

Dong ZhengHao 1 and Feng Xin 2
1. School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing, P. R. China
2. Corresponding Author, School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing, P. R. China

Abstract—Support Vector Machine (SVM), as an important theory of machine learning and pattern recognition, has been well applied to small-sample learning, clustering, nonlinear problems, outlier detection, and so on. The license plate recognition system has received extensive attention as an important application of machine learning and pattern recognition in intelligent transportation. The license plate recognition system is composed of three parts: license plate preprocessing and location, license plate character segmentation, and license plate character recognition.
In this paper, we mainly introduce the flow of license plate recognition, the related technology, and support vector machine theory. Experimental results show the effectiveness of our method.

Index Terms—Support Vector Machine; License Plate Recognition; Intelligent Transportation; Character Segment

I. INTRODUCTION

License plate recognition is an important research field in computer vision, pattern recognition, image processing, and artificial intelligence, and is one of the most important aspects of the intelligent transportation systems of the 21st century. Recently, license plate recognition has been widely used in road traffic security monitoring, open tollbooths, road traffic flow monitoring, accident scene investigation, vehicle-mounted mobile checking, stolen vehicle detection, vehicle-mounted mobile automatic recording of traffic violations, automatic parking lot security management, intelligent park management, access control management, and so on [1-6]. It has a very important position in modern traffic management and control systems and has good application value. Meanwhile, license plate recognition can also be used in other identification fields, so it has become one of the key problems in the modern traffic engineering field [7-8]. With rapid economic development and social progress, the number of cars in cities and urban traffic flow have increased massively, and the difficulty of highway and urban traffic management has grown rapidly. However, as levels of science and technology increase worldwide, a variety of cutting-edge technologies for traffic management continue to emerge, enriching and enhancing traffic management and making modern traffic more and more intelligent. License plate recognition is comprised of four main sections: image preprocessing, license plate location, character segmentation, and character recognition.

© 2014 ACADEMY PUBLISHER doi:10.4304/jmm.9.2.253-260
Image processing based on pattern recognition is one of the most important research directions in the image recognition field. License plate recognition based on images is an important application of computer vision and pattern recognition in the intelligent transportation field, and it is also a core technology in intelligent transportation systems. In the 1990s, a license plate recognition system was designed by A. S. Johnson et al., who used digital image processing technology and pattern recognition to implement license plate recognition [9]. This system uses the histogram method to compute thresholds for plate images, and then uses template matching for license plate character recognition. Its accuracy was a great breakthrough at the time, but the system could not meet real-time requirements. In 1994, M. Fahmy realized license plate recognition with BAM (bidirectional associative memory) neural networks [10]. A BAM network is a single bidirectional associative network of identical neurons in which each matrix corresponds to a unique license plate character template, and template matching recognizes the license plate characters; the big drawback of this method is the unresolved contradiction between recognition speed and system capacity. However, with further development, neural networks gradually replaced the template matching method in license plate recognition, avoiding a large amount of data analysis and mathematical modeling work, and after years of technological development this direction has drawn increasing attention from scholars [11-12]. The aim of this paper is to research a license plate recognition algorithm based on SVM. All the steps for implementing a complete license plate recognition system, including image preprocessing, license plate location, character segmentation, and character recognition, are detailed in this paper.
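Since the classifier at the core of the pipeline studied here is an SVM, a minimal sketch of the linear SVM decision rule sign(w·x + b) may be useful for orientation; the weight vector and bias below are illustrative values, not trained parameters from the paper:

```python
# Hypothetical sketch of a linear SVM decision function: a sample x is
# assigned to class +1 when w.x + b > 0 and to class -1 otherwise.
# In practice w and b come from training (e.g., via SMO); here they are
# placeholders for illustration only.

def svm_decision(w, b, x):
    """Return the predicted class (+1 or -1) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b  # w.x + b
    return 1 if score > 0 else -1
```

A multi-class character recognizer would typically combine many such binary decisions (e.g., one-vs-rest), one per character class.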
SVM theory is then presented in detail, and license plate features are extracted to construct a classifier with SVM. Experimental results show the effectiveness of license plate recognition based on SVM. The license plate recognition system is composed of three parts: license plate preprocessing and positioning, license plate character segmentation, and license plate character recognition. In this paper, with vehicle images obtained from actual scenes, a license plate recognition system is designed based on SVM.

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

This research mainly consists of the following parts:
(1) In the preprocessing and positioning part, grayscale conversion, contrast enhancement, median filtering, Canny edge detection, and threshold binarization are applied. In the positioning stage, line scanning and vertical projection are used to effectively determine the plate borders for the subsequent character segmentation.
(2) In the segmentation stage, the plates are detected and corrected with the Hough transform, which can detect the tilt angle of the plate. Using the inherent characteristics of the characters and their geometry, the character segmentation boundaries are determined by vertical projection and a threshold value.
(3) In the character recognition part, features are extracted from normalized character traits. SVM combined with the Sequential Minimal Optimization algorithm is used for classification and prediction, and the optimal parameters under small samples are obtained by cross-validation.

The rest of this paper is organized as follows. Section 2 concisely introduces license plate location and character segmentation, including image preprocessing technology and license plate location technology. Section 3 introduces SVM theory and the license plate character recognition method based on SVM. Experimental results are given in Section 4, and conclusions are drawn in Section 5.

II.
LICENSE PLATE LOCATION AND CHARACTER SEGMENTATION

A. License Plate Preprocessing Technology

1) Image Gray Processing

A color image contains a large amount of information, but license plate pretreatment uses only part of it, so converting to grayscale improves the speed and efficiency of plate image processing without disrupting subsequent operations. The subsequent plate positioning and segmentation operate on the grayscale image. When the color image is needed again, the coordinates found in the grayscale image can be mapped back to the color image to obtain the corresponding color region; in this way pretreatment both reduces the amount of information and improves processing efficiency. Gray-scale processing combines the three components R, G, and B of the color image. Letting gray denote the gray component of the image, the transform equation is

gray = 0.299 R + 0.587 G + 0.114 B.   (1)

2) Image Enhancement and Denoising

Image enhancement methods can be divided into two categories: direct image enhancement and indirect image enhancement. Histogram stretching and histogram equalization are the two most common indirect contrast enhancement methods. Histogram stretching is the most basic kind of gray-level transformation; its simplest form uses a piecewise linear function. Its main idea is to widen the dynamic range of the processed gray levels, thereby increasing the difference between foreground and background intensity and achieving contrast enhancement; it can be realized in a linear or nonlinear way. Histogram equalization instead adjusts the gray values through the cumulative distribution function to enhance contrast; it is a method that uses the image histogram itself to adjust the contrast.
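The grayscale conversion of Eq. (1) can be sketched as follows; the extracted text does not show the paper's exact coefficients, so the standard 0.299/0.587/0.114 luminance weights are assumed here:

```python
import numpy as np

def to_gray(rgb):
    """Weighted combination of the R, G, B components, as in Eq. (1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

img = np.zeros((2, 2, 3))
img[..., 1] = 1.0            # a pure-green test image
gray = to_gray(img)          # every pixel maps to the green weight, 0.587
```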
The core of histogram equalization is to change the original histogram, concentrated in a narrow gray band, into a uniform distribution over the entire gray range. Its purpose is to stretch the target image and redistribute the pixel values so that each gray range contains approximately the same number of pixels. In a license plate recognition system, contrast enhancement can be applied to plate images captured under poor lighting, making the subsequent processing stages more efficient and faster.

Figure 1. Gray-scale image processing: (a) color image, (b) gray image
Figure 2. Median filtered image denoising: (a) original image, (b) median filtered image

Because of lighting and ambient interference, license plate images may contain much noise, which places high demands on license plate location and identification. Therefore, we need to denoise the image in the processing stage so that the plate positioning stage is not affected. Common image denoising methods include the following categories.

Mean filtering. Also called linear filtering, it uses neighborhood averaging. Its basic principle is to replace the original value of each pixel by a mean: for the current pixel, a template consisting of several of its neighbor pixels is selected, the mean of all pixels in the template is computed, and this mean is assigned to the current pixel as its processed gray value.

Median filtering. Fig. 2 shows the result of median-filter denoising. Median filtering is a nonlinear signal processing method based on the theory of order statistics that can effectively suppress image noise.
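A minimal median-filter sketch showing how an isolated noise pixel is removed; the 3 × 3 window and replicate padding are illustrative choices, not taken from the paper:

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighborhood
    (image edges handled by replicate padding)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# An isolated bright "salt" pixel is removed; the flat background is kept.
img = np.full((9, 9), 10.0)
img[4, 4] = 255.0
den = median_filter(img)
```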
The basic principle of median filtering is to replace each pixel value with the median of the pixel values in a neighborhood of that point, so that a pixel whose gray value differs greatly from its surroundings is pulled toward the values of the surrounding pixels; this eliminates isolated noise points.

The wavelet transform is a time-frequency localization analysis method with a fixed window size but a changeable window shape. Noise is mostly high-frequency information, so after the wavelet transform the noise concentrates in the high-frequency sub-bands, especially the highest-frequency sub-band, which consists almost entirely of noise. Setting this high-frequency sub-band to zero and making certain adjustments to the lower-frequency sub-bands removes or suppresses the noise.

3) Image Edge Detection

Edges are the most basic and most obvious feature of an image. An edge is a discontinuity of gray values between regions: if the gray values in a pixel's neighborhood have a step change, the set of pixels satisfying this property forms an image edge; edges can be detected and judged with first- or second-order derivatives. A step change means that the gray values on the two sides of a point differ significantly, and the second derivative crosses zero in the gradient direction. The main idea of image edge detection is to use an edge detection operator to locate local edge positions, define an "edge strength" for the pixels, and locate the set of edge points by setting a threshold. Common edge detection operators include Roberts, Sobel, Prewitt, and Canny [13-16]. Roberts is the simplest of these gradient operators.
It locates edge positions by partial differences and obtains good detection performance for steep, low-noise edges. The operator is expressed as

G(x, y) = |f(x, y) - f(x+1, y+1)| + |f(x+1, y) - f(x, y+1)|.   (2)

The Sobel operator uses 3 × 3 templates centered on the current point:

Gx = [ -1 0 1; -2 0 2; -1 0 1 ],  Gy = [ -1 -2 -1; 0 0 0; 1 2 1 ].   (3)

The Prewitt operator follows a detection principle similar to Sobel's: each pixel is convolved with two templates, and the maximum response is taken as the output. The two operators differ only in the convolution templates; the Prewitt templates are

Gx = [ -1 0 1; -1 0 1; -1 0 1 ],  Gy = [ -1 -1 -1; 0 0 0; 1 1 1 ].   (4)

Figure 3. Edge detection and license plate location

The Canny operator looks for local maxima of the image gradient, which is computed from the first derivative of a Gaussian:

G(x, y) = (1 / 2πσ²) exp( -(x² + y²) / 2σ² ).   (5)

The gradient magnitude and direction are computed as

M = sqrt(Gx² + Gy²),  θ = arctan(Gy / Gx).   (6)

B. License Plate Location Technology

To locate the license plate accurately, a recognition system generally includes coarse location and fine location. The basic plate location process is shown in Fig. 4. Candidate regions are obtained after image preprocessing, and the real plate location is then obtained by judgment. If the plate is skewed after coarse location, the Hough transform is used to detect the tilt angle of the plate, and the angle is then corrected. The plate image is finally obtained by vertical projection in the fine location stage and is segmented into patches to obtain the characters. Because tilt may occur in coarse location, tilt correction must be considered for precise location.
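The tilt-angle detection can be sketched with a simple voting scheme. The helper below is hypothetical: it uses a rotated form of the line equation rho = x cos θ + y sin θ (Eq. (7) in the next part), specialized so that, for a near-horizontal plate edge tilted by angle t, the quantity y cos t - x sin t is constant along the edge; the correct angle therefore collects the most identical rho values:

```python
import numpy as np

def hough_tilt_angle(points, angles_deg=np.arange(-30, 31)):
    """Vote over candidate tilt angles for a set of (x, y) edge points."""
    best_angle, best_votes = 0, -1
    for a in angles_deg:
        t = np.deg2rad(a)
        # Quantize rho for all points at this angle and count collisions.
        rho = np.round(points[:, 1] * np.cos(t) - points[:, 0] * np.sin(t))
        votes = np.bincount((rho - rho.min()).astype(int)).max()
        if votes > best_votes:
            best_votes, best_angle = int(votes), int(a)
    return best_angle

# Synthetic edge points along a line tilted by 8 degrees.
x = np.arange(200)
y = 40 + x * np.tan(np.deg2rad(8))
angle = hough_tilt_angle(np.stack([x, y], axis=1))
```

Once the angle is known, each row of plate pixels can be shifted (or the image rotated) to correct the tilt, as described below.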
Three methods are commonly used to correct the plate image: one based on the Hough transform, one based on corner detection, and one based on the projection method. In this paper, we use the Hough-transform-based method to correct the plate. The Hough transform is a common detection method. Taking line detection as an example, the Hough transform maps the graphics from image space to parameter space: any point in image space corresponds to a curve in parameter space. For points lying on one line, all the curves they produce intersect at one point in parameter space, and the coordinates of that point are the parameters of the line in image space. The line equation is

ρ = x cos θ + y sin θ.   (7)

Fig. 5 shows the coarse location result, and Fig. 6 shows the correction result obtained by the Hough transform. For plate images tilted in the vertical direction, the Hough transform method effectively obtains the tilt angle. For horizontally tilted plate images, we use the iterative minimum projection method. The basic idea of this approach comes from the illumination model: using the size of the gaps between characters, a projection vector is computed for each candidate angle; the projection is minimal only when the characters are not tilted, so the angle found this way is the plate tilt angle, and each row of pixels in the plate image is then shifted accordingly. This part is coupled with fine plate location, so the related content is introduced further in the fine location part.

Figure 4. The basic flow of plate location (input image → preprocessing → candidate regions → coarse location → tilt correction → fine location → output plate image)
Figure 5. Coarse location
Figure 6. License plate correction

III. LICENSE PLATE CHARACTER RECOGNITION BASED ON SVM

A.
SVM Theory

Support vector machines (SVM) were first proposed by Vapnik et al. [17] in 1995. The SVM classifier has strong generalization ability, showing unique advantages in small-sample, linear, nonlinear, and high-dimensional pattern recognition problems, and it also applies well to function fitting, model prediction, and other machine learning problems. The SVM method is based on the VC-dimension theory of statistical learning and on the structural risk minimization principle: from limited training samples it seeks the best compromise between model complexity and learning ability in order to obtain the best generalization. SVM was proposed mainly for two-class classification problems: it finds a hyperplane in a high-dimensional space that separates the classes with minimal classification error. As a part of statistical learning theory, it is mainly used to solve pattern recognition problems with a limited number of samples. SVM originates from the optimal hyperplane under the linearly separable condition: the optimal classification plane must separate the two classes without error while maximizing the distance between them. Fig. 7 shows the classification process of SVM.

Figure 7. SVM classification

Assume a linearly separable sample set {(x_i, y_i)}, i = 1, ..., n, with y_i ∈ {-1, +1}. The general form of a linear discriminant function is

f(x) = w · x + b,   (8)

and the classification plane equation is

w · x + b = 0.   (9)

We normalize the discriminant function so that all samples of both classes satisfy |f(x)| ≥ 1, with the samples closest to the classification plane satisfying |f(x)| = 1. The classification margin is then 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||². If all samples must be classified correctly, they have to satisfy

y_i (w · x_i + b) - 1 ≥ 0,  i = 1, ..., n.   (10)

A classification plane that satisfies this criterion while minimizing ||w||² is the optimal classification plane. The training samples closest to it, lying on the two planes parallel to the optimal separating hyperplane, make the equality in (10) hold; they are called support vectors. The optimal classification plane problem can therefore be converted into the constrained optimization problem of minimizing

Φ(w) = (1/2) ||w||².   (11)

This is a quadratic programming problem, for which we define the Lagrange function

L(w, b, α) = (1/2) ||w||² - Σ_i α_i [ y_i (w · x_i + b) - 1 ],   (12)

where the α_i ≥ 0 are Lagrange coefficients. Under the constraints Σ_i α_i y_i = 0 and α_i ≥ 0, we solve for the maximum of

W(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)   (13)

with respect to α. The optimal solution must satisfy

α_i [ y_i (w · x_i + b) - 1 ] = 0.   (14)

Obviously, only the coefficients of the support vectors are non-zero, so only the support vectors affect the final classification result, and w can be expressed as

w = Σ_i α_i y_i x_i,   (15)

i.e., the weight vector of the optimal classification plane is a linear combination of the training sample vectors. If α* is the optimal solution, the optimal classification function obtained after solving the above problem is

f(x) = sgn( Σ_i α_i* y_i (x_i · x) + b* ),   (16)

where sgn(·) is the sign function and the classification threshold b* can be obtained from any one support vector.

For samples that are not linearly separable, to keep the number of misclassified points minimal, we add slack variables ξ_i > 0 and relax the constraint to

y_i (w · x_i + b) ≥ 1 - ξ_i.   (17)

Given a penalty constant C, under these constraints we then minimize

(1/2) ||w||² + C Σ_i ξ_i.   (18)

To handle nonlinear data, the kernel function method transforms the data into a high-dimensional space and rewrites the decision function accordingly. The dual problem becomes the maximization of

W(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j),   (19)

and the decision function becomes

f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b ).   (20)

We do not need to find an explicit mapping from the low-dimensional to the high-dimensional space; it is enough to know the kernel K. For linearly inseparable data, SVM can therefore use a known kernel function to map low-dimensional data into a high-dimensional space in which a separating linear hyperplane can be constructed.

Since the original classic SVM algorithm handles two-class classification, multi-class recognition problems are solved by combining several two-class classifiers. To briefly summarize SVM theory, its basic idea is first to map the input space into a high-dimensional space by a nonlinear transformation, and then to solve the optimal linear classification surface in this new space; the nonlinear transformation is realized implicitly by choosing an appropriate inner-product (kernel) function. Fig. 8 shows how the optimal classification plane is obtained by the kernel function method.

Figure 8. Find the optimal classification plane by kernel function method

Common kernel functions are the following:

linear kernel: K(x, x_i) = x · x_i;   (21)

polynomial kernel: K(x, x_i) = (x · x_i + 1)^d;   (22)

radial basis function kernel: K(x, x_i) = exp( -||x - x_i||² / σ² );   (23)

sigmoid kernel: K(x, x_i) = tanh( κ (x · x_i) + c ).   (24)

How should the SVM parameters be chosen in practical applications? Common methods include the grid method [18], the bilinear method [19], and genetic algorithms [20]. This article uses the grid method with cross-validation to obtain locally optimal parameters. The optimal penalty factor C and kernel parameters are needed in practice; therefore, in the experiments, the raw data can be divided into several groups, with training and testing repeated across the groups.

B.
Feature Extraction and Classifier Construction

After license plate character segmentation, character features must be extracted; feature extraction is a key step in character recognition, and the problem it must solve is how to select features that improve recognition efficiency and accuracy. License plate characters have many features, such as character density features, geometry, gray-level features, contour features, and color features. There are also many feature extraction methods; the main ones include skeleton refinement, 13-point features, block statistical features, and pixel location features. After license plate location, we can obtain the size of the license plate: a plate with a larger size can be treated as a close-range plate, and a smaller one as a far-view plate. The license plate characters are then segmented for feature extraction. Pixel-by-pixel features and block statistical features are used to describe the license plate characters in this paper. We normalize the plate image to a fixed size and count the pixels of each row to obtain 48 features; similarly, counting the pixels of each column gives 24 features. The plate image is then segmented into 16 blocks, and the sum of all pixels in each block is taken as one feature. In total, we obtain 88 features for a plate image.
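A sketch combining the feature extraction just described with a toy classifier. The 48 × 24 normalized size is inferred from the feature counts (48 rows + 24 columns + 16 blocks = 88) and is an assumption; the classifier is a linear soft-margin SVM trained by sub-gradient descent on the primal objective of Eq. (18), standing in for the SMO-trained RBF-kernel SVM the paper actually uses:

```python
import numpy as np

def plate_features(binary_img):
    """88-D descriptor: 48 row sums + 24 column sums + 16 block sums
    over a 4x4 grid. Assumes a 48x24 normalized binary character image."""
    assert binary_img.shape == (48, 24)
    rows = binary_img.sum(axis=1)
    cols = binary_img.sum(axis=0)
    blocks = [binary_img[i*12:(i+1)*12, j*6:(j+1)*6].sum()
              for i in range(4) for j in range(4)]
    return np.concatenate([rows, cols, np.array(blocks)])

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sub-gradient descent on (1/2)||w||^2 + C * sum of hinge losses."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:      # margin violated: hinge active
                w += lr * (C * y[i] * X[i] - w / n)
                b += lr * C * y[i]
            else:
                w -= lr * w / n                # regularization step only
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)

# Toy two-class data: "dense" vs "sparse" synthetic character images.
rng = np.random.default_rng(0)
dense = [(rng.random((48, 24)) < 0.7).astype(float) for _ in range(10)]
sparse = [(rng.random((48, 24)) < 0.2).astype(float) for _ in range(10)]
X = np.array([plate_features(img) for img in dense + sparse])
y = np.array([1.0] * 10 + [-1.0] * 10)
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)   # standardize features
w, b = train_linear_svm(X, y)
acc = float(np.mean(predict(X, w, b) == y))
```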
There are two strategies to construct a multi-class classifier: (1) the "one-to-one" strategy, which trains a classifier for every pair of categories; (2) the "one-to-many" strategy, which trains one classifier separating each class from all the remaining classes. This paper uses the "one-to-many" principle, combining the SVM classifier with nearest-neighbor distance separation to achieve the best classification performance.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this paper, a license plate recognition system based on SVM is designed. The system is composed of three parts: license plate preprocessing and positioning, license plate character segmentation, and license plate character recognition. Fig. 9 shows the flow of license plate recognition.

Figure 9. The flow of license plate recognition (input plate image → preprocessing: image gray processing, enhancement and denoising → coarse location → precise location → character segmentation → classifier → character recognition)

In order to evaluate the recognition algorithm, we define the following indicators. The recognition rate is computed as

R = N_r / N_l,   (25)

where N_r is the number of plates correctly recognized and N_l is the number of plates correctly located; that is, the recognition rate is the ratio between the number of correctly recognized plates and the number of correctly located plates. The detection rate is then defined as

D = N_l / N,   (26)

where N_l is the number of plates correctly located and N is the total number of plates.

In the license plate character recognition experiments, we first select 500 images containing a license plate, and then apply the plate location method and the recognition algorithm to all of the plate images. Fig. 10 shows some of the plate images.

Figure 10. Partial of license plate images

Because the plate image is sensitive to lighting, angle, occlusion, and other effects, the detection rate and recognition rate drop sharply under such conditions.
Therefore, some preprocessing of the plate image is necessary to reduce these factors and thus improve the detection rate and license plate recognition rate in future work.

Fig. 11 shows results of license plate location and detection. From these experimental results, we can see that our method can accurately locate and detect the license plate in plate images. Fig. 12 shows plate detection results at different angles: the plates are detected and corrected with the Hough transform, which detects the tilt angle of the plate, and our method can still locate the plate even at large angles.

Figure 11. Results of license plate location and detection
Figure 12. Plate detection results on different angles

TABLE I. LICENSE PLATE RECOGNITION RESULTS
Different operations               | The number of plates
Correctly locate license plate     | 477
Wrongly locate license plate       | 30
Correctly identify license plate   | 461
Wrongly identify license plate     | 60

TABLE II. CORRECT RATE AND WRONG RATE
                 | Correct rate | Wrong rate
Recognition rate | 92.2%        | 7.8%
Detection rate   | 95.4%        | 4.6%

From Table I and Table II, we can see that the detection rate is 95.4% and the recognition rate is 92.2%. The correct rate reflects the effectiveness of the recognition algorithm, and the detection rate reflects the effectiveness of the location algorithm. The experimental results also show some errors: the wrong rate in plate recognition is 7.8%, because the license plate image is sensitive to lighting, angle, occlusion, and other effects.

V. CONCLUSIONS

SVM, as an important theory in machine learning and pattern recognition, has been applied successfully to small-sample learning, nonlinear problems, outlier detection, and so on. Automatic license plate recognition has received extensive attention as an important application of machine learning and pattern recognition in intelligent transportation. It is therefore theoretically and practically significant to research license plate character recognition technology based on SVM. The plate recognition system is composed of three parts: license plate preprocessing and positioning, license plate character segmentation, and license plate character recognition. In this paper, we mainly introduce the flow of license plate recognition, the related technology, and SVM theory. Experimental results show the effectiveness of license plate recognition using SVM.

REFERENCES

[1] Broumandnia A, Fathy M, “Application of pattern recognition for Farsi license plate recognition”, ICGST International Journal on Graphics, Vision and Image Processing, vol. 5, no. 2, pp. 25-31, 2005.
[2] Chang S L, Chen L S, Chung Y C, et al., “Automatic license plate recognition”, IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 1, pp. 42-53, 2004.
[3] Yu M, Kim Y D, “An approach to Korean license plate recognition based on vertical edge matching”, IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 2975-2980, 2000.
[4] Hegt H A, De La Haye R J, Khan N A, “A high performance license plate recognition system”, 1998 IEEE International Conference on Systems, Man, and Cybernetics, pp. 4357-4362, 1998.
[5] Yan D, Hongqing M, Jilin L, et al., “A high performance license plate recognition system based on the web technique”, 2001 Proceedings of Intelligent Transportation Systems, pp. 325-329, 2001.
[6] Ren X, Jiang H, Wu Y, et al., “The Internet of things in the license plate recognition technology application and design”, Business Computing and Global Informatization (BCGIN), 2012 Second International Conference on, IEEE, pp. 969-972, 2012.
[7] Chang C J, Chen L T, Kuo J W, et al., “Applying Artificial Coordinates Auxiliary Techniques and License Plate Recognition System for Automatic Vehicle License Plate Identification in Taiwan”, World Academy of Science, Engineering and Technology, pp.
1121-1126, 2010.
[8] Robert K, “Video-based traffic monitoring at day and night: vehicle features detection and tracking”, 12th International IEEE Conference on Intelligent Transportation Systems (ITSC'09), pp. 1-6, 2009.
[9] Comelli P, Ferragina P, Granieri M N, et al., “Optical recognition of motor vehicle license plates”, IEEE Transactions on Vehicular Technology, vol. 44, no. 4, pp. 790-799, 1995.
[10] Sirithinaphong T, Chamnongthai K, “The recognition of car license plate for automatic parking system”, Proceedings of the Fifth International Symposium on Signal Processing and Its Applications (ISSPA'99), IEEE, pp. 455-457, 1999.
[11] Kim K K, Kim K I, Kim J B, et al., “Learning-based approach for license plate recognition”, Proceedings of the 2000 IEEE Signal Processing Society Workshop, pp. 614-623, 2000.
[12] Wei D, “Application of License Plate Recognition Based on Improved Neural Network”, Computer Simulation, vol. 28, no. 8, 2011.
[13] Maini R, Aggarwal H, “Study and comparison of various image edge detection techniques”, International Journal of Image Processing (IJIP), vol. 3, no. 1, pp. 1-11, 2009.
[14] Vincent O R, Folorunso O, “A descriptive algorithm for Sobel image edge detection”, Proceedings of Informing Science & IT Education Conference (InSITE), pp. 97-107, 2009.
[15] Sen A, “Implementation of Sobel and Prewitt Edge Detection Algorithm”, 2012.
[16] Wang B, Fan S S, “An improved CANNY edge detection algorithm”, Second International Workshop on Computer Science and Engineering (WCSE'09), IEEE, pp. 497-500, 2009.
[17] Vapnik V, “The Nature of Statistical Learning Theory”, Data Mining and Knowledge Discovery, vol. 6, pp. 1-47, 2000.
[18] Osuna E, Freund R, Girosit F, “Training support vector machines: an application to face detection”, Proceedings 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 130-136, 1997.
[19] Kao W C, Chung K M, Sun C L, et al.,
“Decomposition methods for linear support vector machines”, Neural Computation, vol. 16, no. 8, pp. 1689-1704, 2004.
[20] Hsu C W, Lin C J, “A comparison of methods for multiclass support vector machines”, IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, 2002.

Adaptive Super-Resolution Image Reconstruction Algorithm of Neighborhood Embedding Based on Nonlocal Similarity

Junfang Tang
Institute of Information Technology, Zhejiang Shuren University, Hangzhou 310015, Zhejiang, China
Email: [email protected]

Xiandan Xu
New Century Design & Construction (NCDC Inc.), New York 10013, USA
Email: [email protected]

Abstract—Super-resolution technology originates from the field of image restoration. The increasing difficulty of improving resolution in hardware makes super-resolution reconstruction an effective alternative, but general super-resolution reconstruction models cannot complete the image processing quickly. To address this problem, this paper studies an adaptive super-resolution reconstruction algorithm of neighbor embedding based on nonlocal similarity. On the foundation of the traditional neighborhood-embedding super-resolution reconstruction method, a nonlocal-similarity clustering algorithm classifies the image training sets, which reduces the complexity of the matching search and speeds up the algorithm; by introducing a new feature quantity and building a new formula for solving the weights, the reconstruction quality is enhanced. Simulation tests show that the algorithm proposed in this paper is superior to the traditional regularization method and the spline interpolation algorithm, both on objective indices of statistical and structural features and in subjective evaluation.

Index Terms—Neighborhood Embedding; Super Resolution; Image Restoration

I.
INTRODUCTION

Super-resolution reconstruction refers to the technology that constructs high-resolution images from low-resolution ones [1]. It was first proposed with the concept and methods of single-frame image reconstruction, which mainly rely on resampling and interpolation. However, these methods usually introduce smoothing effects, so image edge details cannot be reconstructed well. Multi-frame image reconstruction solves this problem [2]: it enhances image resolution by making full use of the different information offered by low-resolution images of different frames. Image resolution is an important index of an image's ability to express detail; it describes the number of pixels the image contains and, put another way, measures the amount of image information [3]. In many cases, however, due to the limits of the hardware devices in the imaging system (such as the imaging sensor), people cannot observe high-resolution images. Updating the hardware to improve image resolution costs too much, and in the short term it is difficult to overcome the technical problems of some specific imaging systems. Super-resolution image reconstruction instead uses software, on the premise of the existing hardware devices, to improve the resolution, which is applicable in many fields [4]. Super-resolution reconstruction was first proposed in the 1960s. In the following decades, many scholars studied it, but ideal effects were not achieved in practical applications; at the time it was even called the "myth of super-resolution", since super-resolution was considered impossible in the presence of noise. There was no breakthrough until the end of the 1980s, with the efforts of Hunt and others. In the 1980s, researchers in the field of computer vision began to study SR reconstruction techniques [5].

© 2014 ACADEMY PUBLISHER doi:10.4304/jmm.9.2.261-268
Tsai and Huang first proposed a multi-image SR reconstruction algorithm based on the Fourier domain [6]. Researchers then improved this algorithm to extend its application range. However, this kind of SR algorithm is applicable only to degradation models with global translational motion and linear space-invariant blur. SR reconstruction made steady progress from the 1990s onward. In 1995, Hunt first explained theoretically the possibility of super-resolution reconstruction [7]. At the same time, researchers also proposed some classical SR algorithms, for example, iterative back projection (IBP) [8], projection onto convex sets (POCS) [9], maximum likelihood estimation (ML) [10], maximum a posteriori estimation (MAP), and the hybrid ML/MAP/POCS method [11]. In the late 1990s, SR reconstruction became a hot international topic, giving rise to a variety of SR reconstruction algorithms. In 2004, Chang [12] introduced the idea of neighbor embedding from manifold learning into super-resolution reconstruction. Assuming that low-resolution image blocks and high-resolution image blocks have similar local manifold structure, training sets of low-resolution and high-resolution image blocks are obtained through image training; for each low-resolution block to be reconstructed, its K neighbor blocks are searched in the low-resolution training set and their neighbor coefficients are solved, and the linear combination of these coefficients with the K corresponding high-resolution neighbor blocks of the high-resolution training set yields the reconstructed high-resolution image block. The advantage of this method is that the training set is small and the reconstruction time is relatively short, but the reconstruction suffers from over-fitting and under-fitting. In 2008, compressed sensing [13] was introduced into super-resolution reconstruction, and Yang et al.
[14] used linear programming and a low-resolution dictionary to solve for the sparse representation of each low-resolution image block to be reconstructed, then used the sparse representation coefficients and the corresponding high-resolution image blocks to finish the image reconstruction. This algorithm has the advantage that the number of blocks used for the sparse representation of a low-resolution image need not be set in advance, but the construction of the dictionary is random and the method is not widely adopted. Super-resolution reconstruction algorithms can be divided into two kinds: reconstruction-based and learning-based. According to the existing literature, most super-resolution reconstruction algorithms fall into the reconstruction-based category, which can be further divided into frequency domain methods and spatial domain methods. Frequency domain methods improve image quality by eliminating frequency aliasing in the frequency domain. Tsai and Huang proposed an image reconstruction method that approaches the problem in the frequency domain, based on the shifting property of the Fourier transform. Kim and others extended Tsai and Huang's ideas and proposed the theory based on WRLS. In addition, Rhee and Kang adopted the DCT (Discrete Cosine Transform) instead of the DFT (Discrete Fourier Transform) in order to decrease the computational load and increase the efficiency of the algorithm. They also overcame the lack of frames caused by LR sampling, and the ill-conditioned reconstruction due to unknown sub-pixel motion information, by means of a regularization parameter. The frequency domain methods have comparatively simple theory and low computational complexity. However, this kind of method can handle only certain cases of global motion. What is more, the loss of data dependency in the frequency domain makes it difficult to apply prior information when regularizing the ill-conditioned problem. Recent studies are therefore mostly on spatial domain methods.
This paper studies these problems from the perspective of super-resolution methods, proposing an algorithm that super-resolves images using non-local similarity, and then improving it to obtain a high-speed variant. By studying super-resolution image reconstruction methods and analyzing their key problems, this paper shows that an image super-resolution reconstruction algorithm based on non-local similarity can eliminate artifacts such as edge sawtooth in the reconstructed image, while real edges are strengthened with a bilateral filter. The method learns from the low-resolution image through the non-local similarity of natural images and guides the reconstruction using the relationship between pixels with similar structure. Specifically, this paper proposes a neighbor embedding adaptive super-resolution reconstruction algorithm based on non-local similarity, which uses the K-means clustering algorithm to classify the image training sets. This reduces the cost of search matching and speeds up the algorithm; reconstruction quality is then improved by introducing new features and a new formula for solving the weights. Simulation tests show that, compared with the traditional regularization method and the spline interpolation algorithm, the proposed model is better both in the objective indices of statistical and structural features and in subjective evaluation.

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

The basic idea of super-resolution is to combine blurred, noisy low-resolution image sequences to produce a high-resolution image or image sequence [15]. Most super-resolution image reconstruction methods have three components, as shown in Figure 1: motion compensation (including motion estimation and image registration), interpolation, and blur and noise reduction [16]. These steps can be realized separately or simultaneously, according to the reconstruction method.
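The K-means pre-clustering of the training set mentioned above, which lets the matching search scan only one cluster instead of the whole set, can be sketched as follows (a minimal NumPy sketch; the function names and toy data are illustrative, not the paper's Matlab code):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means, used here to pre-cluster LR training patches."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each patch to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def search_pool(query, X, centers, labels):
    """Restrict neighbour search to the query's cluster only."""
    c = int(np.argmin(((centers - query) ** 2).sum(-1)))
    return X[labels == c]

# toy data: two well-separated groups of 2-D "patch features"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(10.0, 0.1, (10, 2))])
centers, labels = kmeans(X, k=2)
pool = search_pool(np.array([0.0, 0.0]), X, centers, labels)
print(len(pool))  # the search now scans 10 patches instead of 20
```

The matching step then runs only over `pool`, which is what cuts the search cost in the proposed method.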
The SR reconstruction method based on the frequency domain contains only two of these links, motion estimation and interpolation: solving the displacement equations in the frequency domain is equivalent to the interpolation process. Spatial domain SR reconstruction methods contain all three links, and most of them, such as IBP, POCS, MAP and the adaptive filtering method, integrate interpolation with blur and noise reduction into one process. Some other spatial domain methods merge motion estimation, interpolation, and blur and noise reduction into a single step.

Figure 1. Super-resolution scheme

II. SUPER-RESOLUTION IMAGE RECONSTRUCTION ALGORITHM

A. Frequency Domain

Frequency domain methods utilize the aliasing present in each low-resolution image to reconstruct a high-resolution image [17]. Tsai and Huang first derived a system equation between the low-resolution images and the desired super-resolution image by using the relative movement between the low-resolution images [18]. It is based on three principles: the shift property of the Fourier transform; the aliasing relationship between the continuous Fourier transform (CFT) of the original high-resolution image and the discrete Fourier transform (DFT) of the low-resolution observed images; and the assumption that the original super-resolution image is band-limited. These properties make it possible to formulate a system equation linking the DFT coefficients of the aliased low-resolution images with samples of the CFT of the unknown image. Suppose f(t_1, t_2) represents the continuous super-resolution image and F(w_1, w_2) is its continuous Fourier transform. The k-th displaced image after global translation, which is the only motion allowed in the frequency method, is f_k(t_1, t_2) = f(t_1 + \delta_{k1}, t_2 + \delta_{k2}), where \delta_{k1} and \delta_{k2} are known arbitrary values and k = 1, 2, ..., p.
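The frequency-domain formulation rests on the Fourier shift property: a spatial shift only multiplies the transform by a phase factor. This can be checked numerically (a sketch in NumPy's 1-D DFT convention, with a circular integer shift standing in for the continuous translation):

```python
import numpy as np

# Shifting f by s samples multiplies its DFT by exp(-j*2*pi*s*w/N)
# (the sign depends on the transform convention; numpy.fft is used here).
N, s = 64, 5
rng = np.random.default_rng(0)
f = rng.standard_normal(N)
F = np.fft.fft(f)
F_shifted = np.fft.fft(np.roll(f, s))     # g[n] = f[n - s], circularly
w = np.arange(N)
phase = np.exp(-2j * np.pi * s * w / N)
print(np.allclose(F_shifted, phase * F))  # True
```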
Then, by the shift property, the CFT F_k(w_1, w_2) of the displaced image is

F_k(w_1, w_2) = \exp[ j 2\pi (\delta_{k1} w_1 + \delta_{k2} w_2) ] \, F(w_1, w_2)    (1)

This expression gives the relationship between the CFT of a displaced image and the CFT of the reference image. The low-resolution observed image g_k(n_1, n_2) is generated by sampling the displaced image f_k(t_1, t_2) with sampling periods T_1 and T_2. Under the band-limited assumption on F(w_1, w_2), namely F(w_1, w_2) = 0 for |w_1| \ge (L_1 \pi / T_1) or |w_2| \ge (L_2 \pi / T_2), the aliasing relationship between the CFT of the super-resolution image and the DFT of the k-th low-resolution image is

G_k(\Omega_1, \Omega_2) = \frac{1}{T_1 T_2} \sum_{n_1=0}^{L_1-1} \sum_{n_2=0}^{L_2-1} F_k\left( \frac{2\pi}{T_1}\left(\frac{\Omega_1}{N_1} + n_1\right), \frac{2\pi}{T_2}\left(\frac{\Omega_2}{N_2} + n_2\right) \right)    (2)

where \Omega_1 and \Omega_2 are the sampling points in the discrete Fourier transform domain of g_k(n_1, n_2), with 0 \le \Omega_1 \le N_1 - 1 and 0 \le \Omega_2 \le N_2 - 1. Applying lexicographical ordering to the indices n_1, n_2 on the right side of the equation and to k on the left side yields the matrix-vector form of (2):

G = \Phi F    (3)

Here, G is a p \times 1 vector whose elements are the DFT coefficients of the g_k(n_1, n_2); F is an L_1 L_2 \times 1 vector whose elements are the unknown CFT samples of the image; and \Phi is a p \times L_1 L_2 matrix linking the DFT of the low-resolution observed images with the samples of the continuous super-resolution image. Reconstructing a super-resolution image therefore requires solving the inverse problem defined by (3). The main advantage of the frequency domain method is its simple theory: the relationship between the low-resolution images and the super-resolution image is made explicit in the frequency domain. It is also useful for parallel computation, reducing implementation complexity. But the observation model is limited to global translational motion and blur with linear space invariance. In addition, the lack of data correlation in the frequency domain makes it difficult to apply spatial-domain prior knowledge for regularization.

B.
Regularized Super-Resolution Reconstruction Method

In the case of an insufficient number of low-resolution images and a non-ideal blur operator, super-resolution image reconstruction problems are usually ill-posed [19]. A method for stabilizing the inverse of an ill-posed problem is known as regularization. Below we introduce the deterministic regularization and the stochastic regularization methods for super-resolution image reconstruction, highlighting the constrained least squares (CLS) and maximum a posteriori (MAP) super-resolution reconstruction methods. Once the registration parameters are estimated, the observation model can be determined completely. The deterministic regularization super-resolution method uses prior information about the solution to solve the inverse problem, which can make the problem well-posed. CLS chooses the f that minimizes the Lagrangian

\sum_{k=1}^{p} \| g_k - A_k f \|^2 + \alpha \| C f \|^2    (4)

Here, the operator C is usually a high-pass filter, and \| \cdot \| denotes the L_2 norm. In this cost function, prior information about a reasonable solution is expressed by a smoothness constraint, reflecting the fact that most images are naturally smooth with limited high-frequency content. Therefore the reconstructed image with minimum high-pass energy is taken as the solution of the deterministic method. The Lagrange multiplier \alpha, usually called the regularization parameter, controls the compromise between data fidelity and the smoothness of the solution \| C f \|^2. A larger \alpha yields a smoother solution, which is useful when few low-resolution images are available or when the precision of the observed data is reduced by registration error and noise; a smaller \alpha is useful in the opposite case. The cost function in (4) is a differentiable convex function, and its square regularization term favors a globally unique solution \hat{f}.
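The CLS cost (4) can be minimized by plain gradient descent. Below is a toy 1-D sketch; the two complementary downsampling operators, the circular first-difference C, and the step size are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def cls_sr(gs, As, alpha=0.01, beta=0.1, iters=500):
    """Gradient descent on sum_k ||g_k - A_k f||^2 + alpha*||C f||^2."""
    n = As[0].shape[1]
    C = np.eye(n) - np.roll(np.eye(n), 1, axis=1)  # circular first difference
    f = np.zeros(n)
    for _ in range(iters):
        grad = sum(A.T @ (g - A @ f) for g, A in zip(gs, As))
        f = f + beta * (grad - alpha * C.T @ C @ f)
    return f

# two LR observations: the even and the odd samples of a smooth signal
f_true = np.full(8, 3.0)
As = [np.eye(8)[0::2], np.eye(8)[1::2]]
gs = [A @ f_true for A in As]
f_hat = cls_sr(gs, As)
print(np.allclose(f_hat, f_true, atol=1e-6))  # True
```

Because the two observations together sample every position once, the fixed point of the iteration is the solution of the regularized normal equations, and for a smooth signal the difference penalty vanishes.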
A basic deterministic iterative method solves the equation

\left( \sum_{k=1}^{p} A_k^T A_k + \alpha C^T C \right) \hat{f} = \sum_{k=1}^{p} A_k^T g_k    (5)

Applying the method of steepest descent, the iteration for \hat{f} is

\hat{f}^{\,n+1} = \hat{f}^{\,n} + \beta \left[ \sum_{k=1}^{p} A_k^T ( g_k - A_k \hat{f}^{\,n} ) - \alpha C^T C \hat{f}^{\,n} \right]    (6)

where \beta is the convergence parameter and A_k^T contains the transposes of the sampling, blur and deformation operators. Katsaggelos et al. proposed a multi-channel regularized super-resolution method in which a regularization functional is used to calculate the regularization parameter at each iteration step without any prior knowledge. Kang described a generalized multi-channel deconvolution method that includes the multi-channel regularized super-resolution method. Hardie et al. proposed a super-resolution reconstruction method minimizing a regularized cost functional, defining an observation model of the optical system and detector array (a kind of sensor point spread function). They used a gradient-based iterative registration algorithm and considered two optimization processes for minimizing the cost functional: gradient descent and conjugate gradient. Bose et al. pointed out the importance of the regularization parameter and put forward a constrained least squares super-resolution reconstruction method that obtains the optimal parameter by the L-curve method.

C. Random Method

Stochastic super-resolution image reconstruction, typically a Bayesian method, provides a convenient tool for modeling prior knowledge about the solution. Bayesian estimation can be used whenever a posterior probability density function (PDF) of the original image can be constructed. The MAP estimator of f maximizes the posterior PDF:

\hat{f} = \arg\max_f P( f \mid g_1, g_2, \ldots, g_p )    (7)

Bayes' theorem and the logarithm are applied to the conditional probability to express the MAP optimization problem.
\hat{f} = \arg\max_f \left[ \ln P( g_1, g_2, \ldots, g_p \mid f ) + \ln P( f ) \right]    (8)

The prior image model P(f) and the conditional density P(g_1, g_2, ..., g_p | f) are determined by the prior knowledge about the high-resolution image f and the statistics of the noise. Because of its prior constraints, this MAP optimization effectively provides a regularized super-resolution estimate. Bayesian estimation usually takes a Markov random field prior, a powerful model for image priors. P(f) can then be described by an equivalent Gibbs prior, with probability density defined as

P( f ) = \frac{1}{Z} \exp\{ -U( f ) \} = \frac{1}{Z} \exp\Big\{ -\sum_{c \in S} \varphi_c( f ) \Big\}    (9)

where Z is a normalizing constant, U(f) is the energy function, \varphi_c(f) is a potential function depending on the pixels in clique c, and S signifies the clique set. U(f) can measure the cost of irregularity of the solution by defining \varphi_c(f) as a function of image derivatives. Usually the image is assumed globally smooth, and this is incorporated into the estimation by a Gaussian prior model. An advantage of the Bayesian framework is that it can also use edge-preserving prior models. With a Gaussian prior, the potential function takes the quadratic form \varphi_c(f) = ( D^{(n)} f )^2, where D^{(n)} is the n-th order difference. Although a quadratic potential leads to a linear algorithm in the derivation, it severely penalizes high-frequency components, so the solution is over-smoothed. If, however, the potential function penalizes large differences in f only weakly, an edge-preserving high-resolution image will be obtained. If the inter-frame errors are independent and the noise is independent and identically distributed zero-mean Gaussian noise, the optimization problem can be compactly represented as

\hat{f} = \arg\min_f \left[ \sum_{k=1}^{p} \| g_k - A_k f \|^2 + \lambda \sum_{c \in S} \varphi_c( f ) \right]    (10)

Here, \lambda is the regularization parameter. If the Gaussian prior model is adopted in (10), then the estimation defined by (4) is a MAP estimation.
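To make the Gibbs prior of Eq. (9) concrete, here is a sketch of the energy U(f) with a quadratic potential over first-difference cliques (the clique system and the toy images are an illustrative choice, not the paper's):

```python
import numpy as np

def gibbs_energy(f):
    """U(f) = sum of squared first differences over horizontal and
    vertical pixel-pair cliques; smooth images get low energy."""
    dh = np.diff(f, axis=1)  # horizontal cliques
    dv = np.diff(f, axis=0)  # vertical cliques
    return float((dh ** 2).sum() + (dv ** 2).sum())

flat = np.full((8, 8), 7.0)
noisy = flat + np.random.default_rng(0).standard_normal((8, 8))
print(gibbs_energy(flat))                        # 0.0
print(gibbs_energy(noisy) > gibbs_energy(flat))  # True: P(f) favors the smooth image
```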
Maximum likelihood (ML) estimation has also been used for super-resolution reconstruction. ML estimation is a special case of MAP estimation with no prior. However, because the inverse problem of super-resolution is ill-posed, MAP estimation is usually better than ML estimation. The stability and flexibility of modeling the noise characteristics and a priori knowledge are the main advantages of the stochastic super-resolution methods. If the noise process is white Gaussian, MAP estimation with a convex energy function in the prior model guarantees the uniqueness of the solution. Therefore gradient descent can be used not only to estimate the high-resolution image, but also to estimate the motion information and the high-resolution image at the same time. Generally speaking, all three kinds of super-resolution image reconstruction algorithms listed above are sensitive to high-frequency information, which is not conducive to edge preservation.

III. NEIGHBORHOOD EMBEDDING SUPER-RESOLUTION RECONSTRUCTION ALGORITHM BASED ON NONLOCAL SIMILARITY

Local linear embedding solves for a linear representation in a high-dimensional space and maps it into a low-dimensional space, while neighborhood embedding solves for the linear relation in the low-dimensional space and then maps it to the high-dimensional space [20]. Neighborhood embedding can therefore be regarded as the inverse process of local linear embedding, with the same steps. The super-resolution reconstruction algorithm based on neighborhood embedding has two main steps: the first is to select some typical images as training images, simulate the degradation process, and extract corresponding high- and low-resolution image blocks to establish the image training sets; the second is to search for matching high-resolution feature blocks and calculate the corresponding coefficients for reconstruction.
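These two steps can be sketched in a few lines: find the K nearest LR blocks, solve the sum-to-one constrained least squares for the weights (the standard LLE-style closed form via the local Gram matrix), and combine the paired HR blocks with the same weights. A minimal NumPy sketch with tiny synthetic data (illustrative, not the paper's code):

```python
import numpy as np

def neighbor_embedding_patch(lr_patch, lr_train, hr_train, k=3):
    """Reconstruct one HR patch from its K LR neighbours and their weights."""
    # K nearest neighbours of the LR patch in the LR training set
    idx = np.argsort(np.linalg.norm(lr_train - lr_patch, axis=1))[:k]
    D = lr_train[idx].astype(float)
    # weights minimizing ||lr_patch - sum_j w_j d_j||^2 with sum_j w_j = 1
    G = (D - lr_patch) @ (D - lr_patch).T      # local Gram matrix
    G += 1e-8 * (np.trace(G) + 1) * np.eye(k)  # regularize for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    # the same weights combine the paired HR blocks
    return w @ hr_train[idx]

lr_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
hr_train = np.array([[0.0, 0.0, 0.0, 0.0],
                     [2.0, 2.0, 2.0, 2.0],
                     [0.0, 0.0, 2.0, 2.0]])
out = neighbor_embedding_patch(np.array([1.0, 0.0]), lr_train, hr_train)
print(np.allclose(out, hr_train[1], atol=1e-3))  # True: query matches a training block
```

When the query coincides with a training block, the weights concentrate on that block and the paired HR block is returned, which is the intended limiting behavior.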
We denote by L_t the low-resolution image to be reconstructed, H_t the output high-resolution image after reconstruction, L_s the low-resolution training set, and H_s the high-resolution training set. The feature quantities extracted from L_t must first agree well with the features selected for the image training sets. Because the reconstruction process operates on image blocks, L_t is divided after feature extraction into low-resolution image blocks f_i to be reconstructed one by one. After feature extraction, matching search follows, which requires the K image blocks closest to f_i. The algorithm is based on the Euclidean distance, that is, it finds the K closest image blocks in the low-resolution training set. By the premise of the algorithm, a low-resolution image block and the corresponding high-resolution feature block lie on similar local manifolds, and the low- and high-resolution training sets are consistent. It is therefore natural to take the corresponding K high-resolution feature blocks for a linear combination. Next comes the calculation of the reconstruction weight coefficients: the K low-resolution neighbor blocks found by matching search are written into a linear expression, which is solved by

W_i = \arg\min \Big\| f_i - \sum_{d_j \in N_i} w_{ij} d_j \Big\|^2, \quad \text{s.t. } \sum_j w_{ij} = 1    (11)

Here, f_i denotes the feature of the i-th low-resolution image block to be reconstructed; d_j is the j-th neighbor block in the low-resolution training set; N_i is the set of its K low-resolution neighbor blocks; and W_i collects the reconstruction weight coefficients. Solving (11) minimizes the reconstruction error subject to the requirements that the w_{ij} sum to 1 and that w_{ij} = 0 for any block not in N_i. Finally, the high-resolution feature block y_i is obtained as the linear combination of the K reconstruction weight coefficients with the corresponding high-resolution blocks:

y_i = \sum_{h_j \in N_i} w_{ij} h_j    (12)

where h_j is a high-resolution feature block and w_{ij} the corresponding reconstruction weight coefficient. The high-resolution blocks, passed through the inverse of the feature extraction, make up the high-resolution image after reconstruction.

The similarity between pixel i and pixel j in an image can be evaluated by choosing a fixed-size square window as the neighbor window: let N_i be the square area centered on i and N_j the area centered on j. Their pixel similarity is determined by the similarity of the gray-level vectors z(N_i) and z(N_j). Considering the structural features of the neighbor windows, the Gaussian-weighted Euclidean distance between the gray-level vectors is chosen as the measure:

d(i, j) = \| z(N_i) - z(N_j) \|_{2,a}^2    (13)

where a is the standard deviation of the Gaussian kernel. The similarity is then obtained from the calculated distance, decreasing as the distance grows:

w(i, j) = \frac{1}{Z(i)} \exp( -d(i, j) / h^2 )    (14)

Z(i) = \sum_j \exp( -d(i, j) / h^2 )    (15)

Here, Z(i) is a normalizing constant, and h determines the degree of attenuation and has a great influence on the similarity. If non-local similarity is used for de-noising, the formula for a pixel i is

NL[v](i) = \sum_{j \in I} w(i, j) \, v(j)    (16)

where NL[v](i) is the value of pixel i after de-noising, I is the set of pixels similar to i, w(i, j) is the weight coefficient measured by the similarity between i and j, and v(j) is the value of pixel j. For a super-resolution reconstruction problem, the low-resolution image usually contains noise, which will influence the extraction of image blocks.
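Eqs. (13)-(16) amount to a weighted average over a search region. A compact sketch follows; for brevity it uses the plain, unweighted Euclidean patch distance rather than the Gaussian-weighted one, and ignores image borders (both simplifications are ours):

```python
import numpy as np

def nl_means_pixel(img, i, j, patch=1, search=3, h=0.5):
    """NL-means estimate of pixel (i, j): Eq. (16) with weights (14)-(15)."""
    def window(r, c):
        return img[r - patch:r + patch + 1, c - patch:c + patch + 1].ravel()
    zi = window(i, j)
    num = den = 0.0
    for r in range(i - search, i + search + 1):
        for c in range(j - search, j + search + 1):
            d = np.sum((zi - window(r, c)) ** 2)  # patch distance, cf. Eq. (13)
            w = np.exp(-d / h ** 2)               # un-normalized weight, Eq. (14)
            num += w * img[r, c]
            den += w                              # accumulates Z(i), Eq. (15)
    return num / den

img = np.full((20, 20), 5.0)
print(nl_means_pixel(img, 10, 10))  # 5.0: a constant image is left unchanged
```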
For example, for an image block p_i to be reconstructed, the extracted feature is formed as

\hat{f}_i = f_i + n    (17)

where \hat{f}_i is the extracted feature, f_i is the real feature of the image block, and n denotes the noise. When the image block is flat, the contribution of f_i is smaller than that of the noise n; \hat{f}_i then mostly reflects the noise, and the selected neighbor blocks will be inaccurate. In order to remove the noise effect, we borrow from the de-noising idea of non-local mean filtering: we find similar blocks, calculate their weights, and combine them to search for the K neighbor blocks. First, blocks similar to the block to be reconstructed are searched for in the low-resolution image, as shown in Figure 2. Let the block be p_1 with size 3×3; a 7×7 matching block m_1 is built centered on it, and similar blocks are searched for within a 21×21 search window. The non-local mean de-noising algorithm uses the Euclidean distance as its measure; differently, the algorithm in this paper uses the sum of absolute differences (SAD) between the search block and the matching block. Taking the two 7×7 blocks with the minimum SAD values, denoted SAD_1 and SAD_2, and the two 3×3 small blocks p_2 and p_3 at their centers, the weight effect of the similar blocks can be solved by

m_1 = 1, \quad m_2 = e^{-SAD_1 / h}, \quad m_3 = e^{-SAD_2 / h}    (18)

In this expression, the parameter h controls the degree of attenuation of the exponential function and is determined by the search window. Normalizing (18), we get the weight coefficients

\omega_1 = \frac{m_1}{m_1 + m_2 + m_3}, \quad \omega_2 = \frac{m_2}{m_1 + m_2 + m_3}, \quad \omega_3 = \frac{m_3}{m_1 + m_2 + m_3}    (19)

where \omega_1, \omega_2 and \omega_3 are the weight coefficients of the similar blocks p_1, p_2 and p_3, respectively.

Figure 2. Non-local similarity search.

The features f_1, f_2 and f_3 are then obtained by feature extraction from the similar low-resolution image blocks. Together with the weight coefficients and the Euclidean distance, the K neighboring blocks are searched out with the expression

W = \arg\min_{l_j \in N} \left( \omega_1 \| f_1 - l_j \|^2 + \omega_2 \| f_2 - l_j \|^2 + \omega_3 \| f_3 - l_j \|^2 \right)    (20)

where N is the low-resolution feature block training set. Through this non-local similarity constraint, a joint search combining the search for similar blocks with the calculation of the weighted coefficients finds the K neighborhood blocks in the low-resolution training set, effectively restraining the effect of noise on the image block. In addition, algorithms applying non-local similarity to image restoration in a sparse model observe that similar blocks use the same dictionary elements in the sparse decomposition, which can be exploited to solve joint sparse representation coefficients. By this reasoning, the training set in our algorithm is analogous to a dictionary, and the similar blocks help find the exact K nearest neighbor coefficients. The weight coefficient calculation formula is updated with the similar-block weights:

W = \arg\min \sum_{k=1}^{3} \omega_k \Big\| f_k - \sum_{d_j \in M} w_j d_j \Big\|^2, \quad \text{s.t. } \sum_j w_j = 1    (21)

where \omega_1, \omega_2 and \omega_3 come from (19), M is the set of K neighborhood blocks searched out by (20), and w_j are the reconstruction weight coefficients. The solution of (21) minimizes the error subject to the requirements that the w_j sum to 1 and that w_j = 0 for blocks outside the set M. Similar blocks have similar neighbor structures in the training set; introducing the constraint of the similar blocks helps estimate the neighbor structure and calculate accurate weight coefficients.

IV. SIMULATION EXPERIMENT

In order to test the proposed non-local-similarity neighborhood-embedding adaptive super-resolution reconstruction algorithm model, the following two experiments were carried out on it.
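Before turning to the experiments, the similar-block weighting of Eqs. (18)-(19) is easy to state compactly (a sketch; the toy blocks are illustrative):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two matching blocks."""
    return float(np.abs(a - b).sum())

def block_weights(m1, m2, m3, h=10.0):
    """Eqs. (18)-(19): weights of the block itself and its two most
    similar blocks, from SAD against the reference matching block m1."""
    m = np.array([1.0, np.exp(-sad(m1, m2) / h), np.exp(-sad(m1, m3) / h)])
    return m / m.sum()  # normalization of Eq. (19)

z = np.zeros((7, 7))
far = np.full((7, 7), 100.0)
print(block_weights(z, z, z))    # identical blocks share the weight equally
print(block_weights(z, z, far))  # a dissimilar block gets weight near zero
```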
The first experiment tested its PSNR and image structural similarity (SSIM). The second experiment compared PSNR and run time on the Matlab platform. The analysis and comparison of the two experiments test in detail the performance of the proposed non-local-similarity neighborhood-embedding adaptive super-resolution reconstruction algorithm model.

Experiment 1: The Number image, with size 256×256, and the Lena image are selected as the original high-resolution images. A sequence of 15 low-resolution images is then generated by translation with shifts in the range 0-3 pixels, blurring with a Gaussian operator in a 3×3 window, and down-sampling by a factor of 2. Gaussian noise with different variances is added to the low-resolution image sequence to obtain the required data. The algorithm is compared with the spline interpolation method and the traditional regularization method, adopting the peak signal-to-noise ratio and structural similarity to measure the quality of the image reconstruction. From Figure 3 and Figure 4 it is obvious that the proposed algorithm improves considerably on the PSNR of the regularization method and the spline interpolation algorithm, by 0.5 dB on average. This is because the algorithm takes the local information of the image into account, reducing the error introduced by regularization. It can also be seen from (b) and (d) that the algorithm yields a larger improvement in SSIM; for the Number images, as the noise variance changes, the curves of the spline interpolation algorithm and the traditional regularization method decline like a negative exponential, while the curve of this algorithm declines roughly like a straight line. Within a certain range, therefore, this algorithm is clearly superior.

Figure 3. Image PSNR curve

Figure 4. Image structural similarity curve

From Figures 3 and 4, the proposed algorithm is superior both in the objective indices of statistical and structural features and in subjective evaluation.

Experiment 2: The following experiments are carried out on the Matlab platform. The simulation degrades the test images (Fig. 5) to obtain low-resolution images, which are then reconstructed into high-resolution images. The degradation from high resolution to low resolution consists of Gaussian filtering of the original image with a 5×5 window and variance 1, followed by down-sampling of the low-pass-filtered image with factor 2. The bilateral filtering uses these parameters: filtering window size 7×7; variance of the spatial distance function \sigma_c = 10 (domain filtering); variance of the pixel similarity function \sigma_s = 30 (range filtering). The parameters of the non-local similarity are: matching window size 5×5, search window size 9×9, and number of similar-structure pixels 7.

Table I gives the objective PSNR evaluation. From the results, the first two methods show barely any difference in PSNR, but the PSNR difference of the method in this paper is relatively large, ranging from 0.3 to 0.5 dB. This is because the algorithm loses some basic image information during the dimensionality reduction of the image block data: it transforms the high-dimensional data into a low-dimensional space at the loss of a small amount of information, which decreases the objective PSNR measurement.

TABLE I. PSNR (dB) RESULTS

LR image   Original algorithm   Edge detection algorithm   Proposed algorithm
Cman       26.36                26.36                      25.64
Bike       26.92                29.86                      26.32
Foreman    32.47                32.45                      31.88
House      24.95                24.88                      24.26

TABLE II. RUN TIME (s) RESULTS

Algorithm                  Cman (128×128)   Bike (256×174)   Foreman (176×144)   House (256×256)
Original algorithm         19.5             58.4             29.7                77.0
Edge detection algorithm   7.2              21.1             10.9                28.2
Proposed algorithm         1.5              7.7              2.4                 9.9

Table II shows the running times of the three methods. It can be seen that the running time of the method with added pixel classification is about one fifth of that of the original algorithm. The running times for different images with the added edge detection differ because the cost of edge detection depends on the image content: the more textured edges an image has, the more time the processing consumes. Our algorithm, incorporating both the dimensionality reduction and the edge detection, runs the fastest because it greatly reduces the dimensionality, from 49-dimensional data to 16-dimensional data.

It can be seen that the PSNR of the proposed non-local-similarity neighborhood-embedding adaptive super-resolution reconstruction algorithm model improves more, as the noise variance changes, than that of the traditional regularization method and the spline interpolation. The curves of the interpolation algorithm and the traditional regularization method decline like a negative exponential, whereas the curve of the proposed algorithm declines roughly like a straight line; thus, within a certain range, the advantage of the proposed algorithm is more obvious, and the dimensionality-reduced variant with edge detection runs fastest.

V.
CONCLUSION

Digital images are the foundation of image processing, and the spatial resolution of the digital imaging sensor is an important factor in image quality. With the progress of information technology and the popularization of image processing, scientific research and practical applications place high demands on the quality of digital images, posing new challenges to the manufacturing technology of the image sensor. One can try hardware schemes to improve the spatial resolution of the image, such as reducing the pixel size or enlarging the photoreceptor chip to increase the number of pixels per unit area; but reducing pixel size and increasing sensor chip size face technical difficulties, and expensive high-precision sensors are not suitable for popularization and application. Super-resolution reconstruction techniques, which use signal processing to improve the resolution of existing low-resolution imaging systems, therefore attract great attention and in-depth study globally, and have important theoretical significance and application value.

REFERENCES

[1] SU Bing-hua, JIN Wei-qi, NIU Li-hong, LIU Guang-rong, "Super resolution image restoration and progress", Optical Technology, vol. 27, no. 1, pp. 6-9, 2001.
[2] PARK S. C., PARK M. K., KANG M. G., "Super-resolution image reconstruction: a technical overview", IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21-36, 2003.
[3] WANG Liang, LIU Rong, ZHANG Li, "The Meteorological Satellite Spectral Image Registration Based on Fourier-Mellin Transform", Spectroscopy and Spectral Analysis, no. 3, pp. 855-858, 2013.
[4] GUO Tong, LAN Ju-long, HUANG Wan-wei, ZHANG Zhen, "Analysis of the self-similarity of network traffic in the fractional Fourier transform domain", Journal on Communications, vol. 34, no. 6, pp. 38-48, 2013.
[5] CHEN Huahua, JIANG Baolin, LIU Chao, "Image super-resolution reconstruction based on residual error", Journal of Image and Graphics, vol. 16, no. 1, pp. 42-48, 2013.
[6] BAI Li-ping, LI Qing-hui, WANG Bing-jian, ZHOU Hui-xin, "High Resolution Infrared Image Reconstruction Based on Image Sequence", Infrared Technology, vol. 24, no. 6, pp. 58-61, 2002.
[7] ZENG Qiangyu, HE Xiaohai, CHEN Weilong, "Compressed video super-resolution reconstruction based on regularization and projection onto convex sets", Computer Engineering and Applications, vol. 48, no. 6, pp. 181-184, 2012.
[8] JIANG Yu-zhong, YING Wen-wei, LIU Yue-liang, "Fast Maximum Likelihood Estimation of Class A Model", Journal of Applied Sciences, vol. 32, no. 2, pp. 165-169, 2013.
[9] XU Zhong-qiang, ZHU Xiu-chang, "Super-resolution Reconstruction Technology for Compressed Video", Journal of Electronics & Information Technology, vol. 29, no. 2, pp. 499-505, 2007.
[10] SU Heng, ZHOU Jie, ZHANG Zhi-hao, "Survey of Super-resolution Image Reconstruction Methods", Acta Automatica Sinica, vol. 39, no. 8, pp. 1202-1213, 2013.
[11] CHANG H., YEUNG D. Y., XIONG Y. M., "Super-resolution through neighbor embedding", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, pp. 275-282.
[12] CANDES E. J., "Compressive sampling", Proceedings of the International Congress of Mathematicians, 2006, pp. 143-145.
[13] CANDES E. J., WAKIN M. B., "An introduction to compressive sampling", IEEE Signal Processing Magazine, 2008, pp. 21-30.
[14] WRIGHT J., HUANG T., MA Y., "Image super-resolution as sparse representation of raw image patches", IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[15] YANG J. C., WRIGHT J., HUANG T., et al., "Image super-resolution via sparse representation", IEEE Transactions on Image Processing, 2010, pp. 2861-2873.
[16] XIE Kai, ZHANG Fen, "Efficient super resolution image reconstruction parameter estimation algorithm", Journal of Chinese Computer Systems, 2013, pp. 2201-2204.
[17] YING Li-li, AN Bo-wen, XUE Bing-bin, "Research on Super-resolution Reconstruction of Sub-pixel Images", Infrared Technology, 2013, pp.
274-278 [18] Zhang Yilun, Gan Zongliang, Zhu Xiuchang, “Video super-resolution method based on similarity constraints”, Journal of Image and Graphics, 2013, pp. 761-767. [19] CAO Ming-ming, GAN Zong-liang, ZHU Xiu-chang, “An Improved Super-resolution Reconstruction Algorithm with Locally Linear Embedding”, Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2013, pp. 10-15. [20] JIANG Jing, ZHANG Xue-song, “A Review of Superresolution Reconstruction Algorithms”, Infrared Technology, 2012, pp. 24-30. Junfang Tang, born January 1977 in Shangyu City, Zhejiang Province, China, majored in management information system during her undergraduate study in Shanghai University of Finance and Economics and got a master degree of software engineering in Hangzhou Dianzi University. She focuses her research mainly on computer graphics and image processing and has published several professional papers on the international journals and been in charge of several projects, from either the Zhejiang Province or Zhejiang provincial education department. JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014 269 An Image Classification Algorithm Based on Bag of Visual Words and Multi-kernel Learning LOU Xiong-wei 1, 3, HUANG De-cai 2, FAN Lu-ming 3, and XU Ai-jun 3 1. College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, 310032, China 2. School of Computer Science & Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, 310032, China 3. College of Information Engineering, Zhejiang A & F University, Linan, Zhejiang, 311300, China Abstract—In this article, we propose an image classification algorithm based on Bag of Visual Words model and multikernel learning. First of all, we extract the D-SIFT (Dense Scale-invariant Feature Transform) features from images in the training set. And then construct visual vocabulary via K-means clustering. 
The local features of the original images are mapped to fixed-length vectors through the visual vocabulary and a spatial pyramid model. Finally, the classification results are given by the generalized multi-kernel classifier proposed in this paper. Experiments performed on the Caltech-101 image dataset show the accuracy and effectiveness of the algorithm.

Index Terms—BOVW; Image Classification; Spatial Pyramid Matching; Kernel

I. INTRODUCTION

The image has always been an important means of conveying information and has penetrated into all aspects of our lives. In particular, with the development of the Internet and multimedia technology, the digital image has become an important medium for modern information, and its growth rate makes the traditional management method of manual labeling increasingly infeasible [1]. Thus, many researchers have started to work on automatic image classification, sorting images into different semantic classes according to human comprehension. Problems in image classification, including scene detection and object detection, are hot and difficult issues in modern computer vision and multimedia research. Because of the wide application of images and videos, accurate image comprehension algorithms are urgently needed to address these problems. Computer vision aimed at image comprehension emphasizes the ability of computers to visually comprehend images. Vision is an essential way for humans to observe and understand the world; according to statistics, a large portion of the information people obtain from the outside world comes from the visual system. Narrowly speaking, the final target of vision is to reasonably explain and describe the image to the observer. Broadly speaking, vision even includes planning actions according to that explanation and description, the environment, and the intent of the observer.
Therefore, computer vision aimed at image comprehension is the realization of human vision by computers, and it is an important step for artificial intelligence to accurately comprehend the world by perceiving, recognizing and understanding 2D scenes. At present, this research area mainly focuses on object detection, object description and scene comprehension. Object detection serves accurate scene description and is the basis of scene description and comprehension; in turn, scene description and comprehension provide prior knowledge for object detection and guide the process with background knowledge and context information. From a computational viewpoint, image comprehension takes an image (mainly a digital image) as input and, through a series of computational analysis and perceptual learning steps, outputs the detected objects in the scene and their relations, together with the overall description and comprehension of the scene and a comprehensive semantic description of the image. In short, image content detection and classification not only capture the overall knowledge of an image but also provide the context in which objects appear, laying the foundation for further comprehension; this is applicable to many areas. In terms of applications, image classification techniques are potentially applicable to a variety of areas, such as image and video retrieval and computer vision in general. Content-based image retrieval [2] is the simplest and most direct application of object detection, and it provides effective aids and evidence for image information retrieval and processing.
With the popularization of digital cameras, the number of digital images is increasing astonishingly; object-based comprehension helps to organize and browse databases efficiently, so the result of object detection is valuable for image retrieval. Image classification and object detection therefore have promising application prospects. Apart from applications in computer science such as image engineering and artificial intelligence, the resulting research products can be applied to studies of the human visual system and its mechanisms, and of the psychology and physiology of the human brain. With the development of interdisciplinary basic research and the improvement of computer performance, image comprehension will be widely used in ever more complicated applications. Image classification needs different kinds of features to describe image content. Classification methods based on such low-level features have been studied for years in the area of image and video retrieval. These works usually perform supervised learning on image features such as colors, textures and boundaries, and thus sort images into different semantic classes. Color [3] is an important image feature and one of the most widely used features in image retrieval, and it has been studied deeply. Compared to geometric features, color is more stable and less sensitive to size and orientation; in many cases it is the simplest feature with which to describe an image. The color histogram is a widely used color feature in many studies on image content detection. The values in a color histogram, obtained by counting, show the numerical properties of the colors in the image and reflect their statistical distribution and the basic hues. The histogram only records the frequency with which each color appears and discards the spatial information of the pixels.
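As a concrete illustration, the global color histogram just described can be computed in a few lines. The sketch below is plain NumPy with an assumed quantization of 8 levels per RGB channel (the function name and parameters are illustrative, not from the paper); as noted in the text, it counts only color frequencies and discards all spatial information.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram: quantize each RGB channel into `bins`
    levels and count joint-color frequencies; pixel positions are
    deliberately ignored."""
    # image: H x W x 3 uint8 array
    q = (image.astype(np.int64) * bins) // 256        # per-channel bin index
    idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()                          # normalize to frequencies
```

Because only frequencies are kept, two images with the same colors in different layouts produce the same histogram, which is exactly the one-to-many relation discussed next.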
Each image corresponds to a unique histogram, but different images may have the same color distribution and therefore the same histogram, so the relation between histograms and images is one-to-many. The traditional color histogram only depicts the ratio of the number of pixels of a certain color to the total number of pixels, which is a purely global statistic. The color correlogram, on the other hand, describes the distribution of colors as a function of distance, reflecting the spatial relations between pairs of pixels and the relation between local and global color distributions. It is easy to calculate, bounded in range and performs well, so some studies use it as the key feature for describing image content. Texture is another important visual feature, describing the homogeneity of images [4]. It depicts the smoothness, coarseness and arrangement of image regions and currently has no uniform definition; essentially it describes the spatial distribution of pixels over neighboring grey levels. Methods of texture description can be divided into four classes: statistical, structural, model-based and spectral. Textures often appear locally irregular but globally regular, such as the highly textured region of a tree or the vertical and horizontal boundary structure of a city. Texture reflects the structural arrangement of an object's surface and its relation to the surrounding environment, and it is also widely applied in content-based image retrieval. In object detection, global features such as colors and textures sometimes cannot effectively detect objects of the same kind: objects with the same semantics may have different colors, such as cars of various colors, and likewise cars of different textures. Therefore, shape has received more and more attention.
Shape is typically captured by local features, which are generally extracted at corners in the image and retain important information about the objects. These features are insensitive to illumination and have important properties such as scale and rotational invariance. Because of the low accuracy of image object detection based on global features, researchers have recently shifted their focus to local features. There are three kinds of local features, based on points, boundaries and regions, but most current research focuses on point-based features. Their extraction is generally divided into two steps: 1) key point detection and 2) generation of the feature descriptor. The Harris corner detector is a widely used key point detection method based on the eigenvalues of a second-order matrix; however, it is not scale invariant. Lindeberg introduced the concept of automatic scale selection to detect key points at the appropriate scale of the image, using the Laplacian together with the determinant and trace of the Hessian matrix to detect blob structures. Mikolajczyk et al. [5] improved this method by proposing robust, scale-invariant key point detectors, Harris-Laplace and Hessian-Laplace, which use the Harris measure or the trace of the Hessian matrix to select locations and the Laplacian to select scales. Lowe [6] employed an approximation of the LoG operator, the Difference of Gaussians (DoG), to improve the detection rate, and Bay et al. employed a fast Hessian matrix for key point detection to improve it further. Moment invariants and phase-based local features are among the early feature descriptors, whose performance is not satisfying. In later studies, Lowe proposed the famous scale-invariant feature transform descriptor.
SIFT has proved to be among the best descriptors in comparative studies in the literature. It has many variants, such as PCA-SIFT [7] and GLOH, but their detection performance is not as good as that of SIFT. Bay et al. proposed the Speeded-Up Robust Features (SURF) descriptor [8], which describes the key point region with Haar wavelet responses. Although the detection performance of SURF is slightly worse than SIFT, it is much faster. SIFT and SURF are the most widely used local features in research on image content detection. The Bag of Visual Words model [9] is the best-known image classification method, derived from the Bag of Words model in text retrieval. It is extensively applied to quantize local features for image description, with good performance. However, it has two main limitations. One is that the model discards the spatial information of images: each block in an image is related to a visual word in the vocabulary, but its location in the image is neglected. The other is the practice of representing an image block by one or a few approximate visual words, which is not accurate enough for image classification. Lazebnik et al. proposed the Spatial Pyramid Matching (SPM) [10] algorithm to address the spatial limitation of the Bag of Visual Words model. This method divides an image into several regions at three scales and combines the Bag of Visual Words model with the local features of each region, which adds a measure of spatial information. The soft-weighting method searches for several nearest words and greatly reduces the increment assigned to each word, which addresses the second limitation. However, problems such as vocabulary generation and feature coding still limit the performance of image classification. In the area of multi-kernel learning [11], many researchers have applied this model to a variety of algorithms, especially image object detection. Bosch et al.
described the shapes of objects in a multi-kernel way under the pyramid framework. Lampert et al. used a multi-kernel method to automatically obtain a sparse dependency graph of related object classes, which realized associative multi-object detection and improved the object detection rate. Considering the strong discriminative ability of a sparse classifier built from a linear combination of multiple kernels, Damoulas et al. achieved fast solutions by combining multiple object descriptors in feature space. With the development of SVM theory, more attention has been paid to kernel methods, which are effective for non-linear pattern analysis. However, a single kernel function often cannot meet complicated application requirements such as image classification and object recognition, and it has been shown that a multi-kernel model performs better than single-kernel models or their simple combination. The multi-kernel model is a more flexible kind of kernel-based learning. This paper proposes a weighted multi-kernel function for image classification. With weighted multi-kernel learning, the kernel function parameters can be better adjusted to images from different classes, and the simple BOVW histogram is replaced by the pyramid histogram of visual words (PHOW), which adds the ability to distinguish spatial distributions. In this article, we review the popular algorithms for image classification and object recognition and present an image classification algorithm based on the BOVW model and multiple kernels. For feature extraction, we employ D-SIFT, which is robust, efficient and faster to extract than traditional methods. For feature coding, we use the Bag of Words model and the spatial pyramid model, the state-of-the-art methods in the field. For the classifier, we are the first to propose the weighted multi-kernel function.
This function outperforms other multi-kernel learning classifiers based on the Support Vector Machine (SVM) in classification performance. The effectiveness of the methods in this article is demonstrated by experiments.

II. RELATED WORKS

A. SIFT Feature

In content-based image classification, the principal basis is the content of the image. Classification results are given based on the similarity of image contents, and image contents are described via image features. The extraction of visual features is the first step of image classification and the basis of image content analysis. It is present in every stage of image analysis and directly influences the ability to describe the image; it therefore makes a huge difference to the quality of further analysis and the effectiveness of application systems. The SIFT operator is an image local feature descriptor proposed by David G. Lowe in 2004. It is one of the most popular local features; it is based on scale space and is invariant to scaling, rotation and even affine transformation. The SIFT algorithm first detects features in scale space and determines the location and scale of each key point; it then sets the dominant gradient direction as the direction of the point, realizing the scale and orientation invariance of the operator. SIFT is a local feature that is invariant to rotation, scaling and illumination changes and stable under a certain range of viewpoint change, affine transformation and noise. It ensures distinctiveness and abundance, so it is suitable for fast and accurate matching in massive feature data. Its abundance ensures that even a few objects can generate many SIFT features, its speed satisfies near-real-time requirements, and its extensibility makes it easy to combine with other feature vectors.
For an image, the general algorithm for calculating SIFT feature vectors has four steps: (1) Detect extreme values in scale space to tentatively determine the locations and scales of key points. During this process, each candidate pixel is compared with 26 pixels: its 8 neighbors at the same scale and the 9×2 neighbors around the corresponding positions at the two adjacent scales. (2) Accurately determine the locations and scales of key points by fitting a three-dimensional quadratic function, and delete low-contrast key points and unstable edge response points (the DoG operator generates strong edge responses). (3) Set the orientation parameter of each key point from the gradient directions of its neighboring pixels to ensure the rotational invariance of the operator. In practice, the algorithm samples in a window centered at the key point and accumulates the gradient directions of the neighborhood in a histogram. A key point may be assigned several directions (one principal and one or more auxiliary), which increases the robustness of matching. At this stage the detection of key points is complete, and each key point has three parameters, location, scale and direction, which determine an SIFT feature region. (4) Generate the SIFT feature vector. First, rotate the axes to the direction of the key point to ensure rotational invariance. In the actual calculation, Lowe suggests describing each key point with 4×4 seed points, each holding an 8-bin orientation histogram, to increase the stability of matching. Thus 128 values, i.e. a 128-dimensional SIFT vector, are generated for each key point. The SIFT vector is then free from geometric transformations such as scale change and rotation; normalizing the length of the feature vector also removes the influence of illumination.
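Step (4) can be illustrated in miniature. The NumPy sketch below (the helper name is illustrative) builds the 4×4 × 8-bin gradient-orientation histogram for a single 16×16 patch; Gaussian weighting, tri-linear interpolation and the rotation to the dominant orientation are omitted, so this shows the 128-dimensional descriptor layout rather than a full SIFT implementation.

```python
import numpy as np

def sift_descriptor_sketch(patch):
    """Minimal 4x4-cell, 8-orientation-bin descriptor (128-D) for a
    16x16 patch; illustrates step (4) only, not the full algorithm."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))         # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):                           # accumulate per 4x4 cell
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    desc = desc.ravel()
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc                # length normalization
```

The final normalization corresponds to the illumination step mentioned above.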
B. Bag of Visual Words Model

With the wide application of local features in computer vision, more attention has been placed on local-feature-based image classification. When extracting local features, the number of key points varies between images, so direct machine learning on them is infeasible. To overcome this difficulty, researchers such as Fei-Fei Li at Stanford University were the first to bring the Bag of Words model into computer image processing as a kind of feature [12]. Using the Bag of Words model in image classification not only solves the problem caused by the varying number of local features but also yields a representation that is easy to work with. The method is now extensively used in image classification and retrieval [13]. The main steps are as follows: (1) Detect key points, e.g. by image partitioning or random sampling. (2) Extract the local features (SIFT) of the image and generate the descriptors. (3) Cluster these descriptors (usually via K-means) to generate the visual vocabulary, in which each cluster center is a visual word. (4) Summarize the frequency of each visual word in a histogram. Images are then represented only by the frequencies of visual words, which avoids complicated matching of local features and shows clear superiority in image classification with many classes and large training sets. Despite the effectiveness of classification based on the Bag of Words model, the quality of the visual vocabulary directly influences the precision of classification, and the size of the vocabulary (i.e. the number of clusters) can only be tuned empirically. In addition, the Bag of Words model discards the spatial relations of local features and loses important information, which leads to an incomplete visual vocabulary and poor results.

C. SVM and Multi-Kernel Learning Method

The Support Vector Machine (SVM) was a major achievement in machine learning proposed by Cortes and Vapnik in 1995 [14].
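Steps (3) and (4) above can be sketched as follows, assuming the local descriptors have already been extracted. A plain K-means is written out rather than calling a library, and the helper names are illustrative.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Step (3): cluster local descriptors with plain K-means;
    each cluster center becomes one visual word."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):                      # move each center to its mean
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Step (4): assign each descriptor to its nearest visual word
    and count the word frequencies."""
    labels = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

An image is thus reduced to one fixed-length frequency vector regardless of how many key points it contained, which is exactly what makes downstream learning feasible.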
It was developed from VC dimension theory and structural risk minimization in statistical learning theory, rather than the empirical risk minimization of traditional statistics. The strength of the SVM is its ability to find the optimal tradeoff between model complexity and learning ability, reaching the best generalization from limited sample information. As research has developed, multi-kernel learning has become a new focus in machine learning. Kernel methods are effective for non-linear pattern analysis; however, in complicated situations a single-kernel machine cannot meet varied and changing application requirements, such as heterogeneous or irregular data, large sample sizes and uneven sample distributions. Combining multiple kernel functions for better results is therefore a natural choice. Moreover, there is as yet no complete theory for the construction and selection of kernel functions, and when facing heterogeneous samples, large samples, irregular high-dimensional data or uneven distributions in a high-dimensional feature space, it is inappropriate to map all samples with one simple kernel. To solve these problems, there has been a large amount of recent research on kernel combination, i.e. multi-kernel learning. The multi-kernel model is a more flexible kind of kernel-based learning. Recently, the interpretability of replacing a single kernel with multiple kernels has been supported by both theory and applications, and the multi-kernel model has been shown to perform better than single-kernel models or their simple combination. When constructing a multi-kernel model, the simplest and most common method is the convex combination of basic kernel functions:

K = Σ_{j=1}^{M} β_j k_j,  with β_j ≥ 0 and Σ_{j=1}^{M} β_j = 1  (1)

In this formula, k_j is a basic kernel function, M is the total number of basic kernels, and β_j is the weighting factor.
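A quick numerical check of why the convex combination in formula (1) is admissible: a convex combination of positive semi-definite Gram matrices remains positive semi-definite, hence is itself a valid kernel. The data and weights below are illustrative.

```python
import numpy as np

# Convex combination of two PSD Gram matrices, as in formula (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                        # toy sample set

K_lin = X @ X.T                                    # linear kernel (PSD)
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K_rbf = np.exp(-d2)                                # Gaussian kernel (PSD)

beta = np.array([0.4, 0.6])                        # beta_j >= 0, sum to 1
K = beta[0] * K_lin + beta[1] * K_rbf              # formula (1)

assert np.linalg.eigvalsh(K).min() > -1e-9         # combined Gram is still PSD
```

This is why the constraints β_j ≥ 0 matter: negative weights could destroy positive semi-definiteness and with it the kernel interpretation.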
Under the multi-kernel framework, the problem of representing samples in the feature space is thus converted into the selection of basic kernels and their weights. In this combined space constructed from multiple feature spaces, the selection of kernels, parameters and models (e.g. via kernel target alignment, KTA) is addressed successfully because the feature mapping ability of every kernel is exploited. Multi-kernel learning overcomes the shortcomings of single kernel functions and has become a focus of machine learning.

III. IMAGE CLASSIFICATION BASED ON MULTI-KERNEL

In this article, images are represented by Dense Scale-Invariant Feature Transform (D-SIFT) features combined with the Bag of Words model. The BOVW vocabulary is a visual word library constructed on the basis of D-SIFT; the vocabulary is trained for each image semantic class to obtain a proper description. The features are then organized via a spatial pyramid, and the results are given by the classifier proposed in this article, which combines a generalized kernel with multi-kernel learning. This method effectively extracts from the features the spatial information contained in the semantics and optimizes the parameter selection of the kernel functions. The experiments are performed on the Caltech-101 image dataset and include comparisons of running speed, Bag of Words size and kernel functions. The final results show that this classification method based on the generalized kernel function is effective for image classification and performs better than existing algorithms of the same kind.

A. Feature Extraction and Organization

The algorithm uses D-SIFT features extracted on a grid. They have the same properties as SIFT features, except for the key point detection method used during extraction.
In SIFT, key point detection first searches a scale space, usually a Gaussian feature space, to determine the location and scale of each key point; the direction of the key point is then set to the principal gradient direction of its neighborhood, realizing scale and orientation invariance. However, this process involves a large amount of computation, and much time is spent on searching and comparison when building the Gaussian difference space and detecting its extrema. These computations are wasteful in settings that require little scale or orientation invariance; for example, the images in the Caltech-101 dataset are preprocessed so that objects are rotated to a canonical orientation. The D-SIFT algorithm has two important properties. First, because features are extracted on a grid, it needs no extremum detection in the Gaussian difference space, so it skips a time-consuming step. Second, since there is no extremum detection, rotational normalization is no longer needed: no rotation is computed during direction extraction, and only operations on the grid cells of the original image are required. Generally, when extracting D-SIFT descriptors, features are computed on a grid with spacing of M pixels (M is typically 5 to 15), possibly at several scales. For each grid point, an SIFT feature is extracted in a circular block centered on the point with a radius of r pixels (r is typically 4 to 16). As with normal SIFT, a 128-dimensional feature is generated. SIFT is a local feature that is invariant to rotation, scaling and illumination changes and stable under a certain range of viewpoint change, affine transformation and noise.
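The grid sampling just described can be sketched as follows; `step` plays the role of M and `radius` of r (both names are illustrative), and a 128-dimensional SIFT descriptor would then be computed at each returned center.

```python
import numpy as np

def dense_grid(h, w, step=8, radius=8):
    """D-SIFT sampling: descriptor centers on a regular grid with
    spacing `step`, keeping a margin of `radius` pixels so every
    circular block fits inside the image; no key point detection
    is performed."""
    ys = np.arange(radius, h - radius, step)
    xs = np.arange(radius, w - radius, step)
    return [(y, x) for y in ys for x in xs]
```

Skipping detection is what makes D-SIFT fast: the feature count per image is fixed by the grid, not by the image content.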
It ensures distinctiveness and abundance, so it is suitable for fast and accurate matching in massive feature data. As a variant of SIFT, D-SIFT greatly increases efficiency while maintaining most of the invariance. In the traditional SIFT algorithm, massive numbers of features are extracted from each image after key point detection in the Gaussian feature space. In D-SIFT, although key point detection is not needed and features are extracted at fixed intervals and scales, there are still a large number of SIFT features per image, often even more than with the traditional algorithm. The organization of these features is critical for the subsequent machine learning and classification. The Bag of Words model first appeared in text retrieval, where it achieved great success. The Probabilistic Latent Semantic Analysis model mines the underlying themes of a text by unsupervised methods, i.e. it can extract semantic features from the bottom up. The Bag of Words model neglects the connections and relative positions of features; although this loses some information, it makes model construction convenient and fast. Traditional neighborhood feature extraction in images and videos mainly focuses on global low-level distributions of colors, textures, etc., such as the color histogram and Gabor filters. For a given image, usually only one feature vector is generated, and the Bag of Words model is unnecessary in such applications. However, recent work has shown that global features alone cannot capture some detailed properties of images or videos, so more and more researchers have proposed various local features such as SIFT. These key point descriptors are effective in local region matching, but when applied to global classification, the weakly coupled features of individual key points cannot effectively represent the entire image or video.
Researchers have therefore carried the Bag of Words model over from text classification to image description. Analyzing the relation between text classification and image classification helps to adapt the mature methods of the former to the latter. By analogy with text classification, we assume that an image contains several visual words, just as a text contains several words. The neighborhoods of key points in an image contain abundant local information, and a visual word plays the role of a word in text retrieval. These local features are clustered into groups so that the differences between groups are large, and the cluster center of each group is a visual word. For other images, the extracted local features are assigned to groups by their distance to the words, and a feature vector for the image is generated over this particular set of words. Such a representation is well suited to linear classifiers such as the SVM. In this method, we first gather the previously extracted D-SIFT features and obtain the Bag of Words centers via K-means; the centers reflect the spatial aggregation of D-SIFT features and serve as the Bag of Words basis for the training and test samples. In our algorithm, the image features are represented as histogram vectors over these visual words.

B. Kernel Function and Classifier Design

With the development of SVM theory, more attention has been paid to kernel methods, which are effective for non-linear pattern analysis. However, a single kernel function often cannot meet complicated application requirements, so more researchers have started to combine multiple kernel functions, and multi-kernel learning has become a new focus in machine learning. The multi-kernel model is a more flexible kind of kernel-based learning. Recently, the interpretability of
replacing a single kernel with multiple kernels has been supported by both theory and applications, and the multi-kernel model has been shown to perform better than single-kernel models or their simple combination. Kernel learning can effectively solve classification, regression and related problems and has greatly improved classifier performance. When constructing a multi-kernel model, the simplest and most common method is the convex combination of basic kernel functions:

k(x, y) = Σ_{m=1}^{F} β_m k_m(x, y)  (2)

In this formula, k_m(x, y) is a basic kernel function, F is the total number of basic kernels, and β_m is the corresponding weighting factor and the object to be optimized; the optimization can be formulated via a Lagrange function. Multi-kernel learning automatically works out the combination of kernel functions during the training stage and can optimize the combination parameters of the kernel function in the SVM. First, features are extracted from the input data. Then the features are transformed by mapping them into the kernel function space, exactly as with a traditional SVM kernel. The third step is to combine all the mapped features with the parameters β_1, β_2, …, β_F, obtaining the combined kernel through a linear combination. Finally, classification or regression is completed by the classifier and the final result is given. In the traditional SVM, the most common kernel is the radial basis function (RBF) kernel, also called the Gaussian kernel:

k(x, y) = exp(−Σ_{i=1}^{n} (x_i − y_i)²)  (3)

The Gaussian kernel treats every dimension of the features x and y equally and often cannot represent the inner structure of the features; multi-kernel learning theory can solve this problem. For multi-kernel learning, suppose the pyramid feature is divided into m blocks, each of length L, so that n = mL.
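The training pipeline just outlined can be sketched as follows. The base kernels are Gaussian kernels of different widths, combined as in formula (2) with fixed illustrative weights; a kernel perceptron stands in for the SVM solver, since both operate only on a precomputed Gram matrix (in the paper's setting the weights β would themselves be optimized during training).

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """Gaussian (RBF) Gram matrix as in formula (3); a width
    parameter gamma is added here for illustration."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combine(grams, betas):
    """Formula (2): linear combination of base Gram matrices."""
    return sum(b * K for b, K in zip(betas, grams))

def kernel_perceptron(K, y, epochs=20):
    """Train a dual-form classifier on a precomputed Gram matrix;
    a stand-in for the SVM solver, not the paper's exact method."""
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            if np.sign((alpha * y) @ K[:, i]) != y[i]:
                alpha[i] += 1.0          # mistake-driven dual update
    return alpha
```

On toy two-class data, `K = combine([rbf_gram(X, X, g) for g in (0.5, 2.0)], [0.5, 0.5])` yields the combined kernel and `np.sign((alpha * y) @ K)` recovers the labels.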
Here each block corresponds to a block in a certain layer of the grid in the pyramid. Assigning initial weights d_1, d_2, ..., d_m to the blocks, the following Gaussian kernel is obtained:

    k(x, y) = \sum_{i=1}^{m} d_i \exp(-\sum_{k=(i-1)L+1}^{iL} (x_k - y_k)^2)    (4)

In Gaussian multi-kernel learning, the sum of RBF kernels and the product of RBF kernels are two other common forms:

    k(x, y) = \sum_{i=1}^{n} d_i \exp(-(x_i - y_i)^2)    (5)

    k(x, y) = \exp(-\sum_{i=1}^{n} d_i (x_i - y_i)^2)    (6)

Due to the introduction of multi-kernel learning, image classification can better adjust the kernel parameters according to the different semantics of images. So on many occasions the simple BOW histogram is replaced by the Pyramid Histogram of Visual Words (PHOW), which adds the ability to distinguish spatial distribution to the formerly spatially disordered histogram features. Meanwhile, the ordinary kernel function is replaced by the corresponding pyramid matching kernel during training, and training and testing are performed by the multi-kernel classifier. The histogram of visual words presents an image as the histogram of a series of visual key words, which are extracted from the D-SIFT features of training images via K-means. Then key words at different resolutions are extracted via the pyramid method to capture the structural features of the images. In the pyramid expression, an image is presented in several layers, each containing some feature blocks. The feature block of the 0th layer is the image itself, and in each later layer up to the Lth layer, every block of the previous layer is divided into four non-overlapping parts. Finally, the features of all blocks are joined together as the final descriptor. In the pyramid model, the feature of the 0th layer is presented by a V-dimensional vector, corresponding to the V bins of the histogram; that of the 1st layer is presented by a 4V-dimensional vector, and so forth.
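The layered organization just described can be sketched as follows. The keypoints are toy (x, y, word) triples with coordinates normalized to [0, 1), and the vocabulary size V is a toy value; real input would come from D-SIFT quantization.

```python
def pyramid_histograms(points, V, L):
    """One bag-of-words histogram per block, for layers 0..L.

    Layer l is a (2^l x 2^l) grid; with L = 2 this yields
    1 + 4 + 16 = 21 blocks in total.
    """
    hists = []
    for l in range(L + 1):
        g = 2 ** l                            # grid is g x g at layer l
        layer = [[0] * V for _ in range(g * g)]
        for x, y, w in points:
            bx = min(int(x * g), g - 1)       # column of the block
            by = min(int(y * g), g - 1)       # row of the block
            layer[by * g + bx][w] += 1
        hists.extend(layer)
    return hists

points = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 1)]
h = pyramid_histograms(points, V=2, L=2)
print(len(h))                                 # 21 blocks in total
```

Joining all 21 block histograms gives the blocked descriptor whose dimension matches the sum over layers discussed next.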
For the PHOW descriptor with L layers, the dimension of the feature vector is \sum_{i=0}^{L} V \cdot 4^i. To better exploit the pyramid features, the sparse blocks at the bottom are assigned larger weights in the pyramid matching kernel, and the dense blocks at the top smaller ones. Let H_x^l and H_y^l be the histograms of x and y in the lth layer, with H_x^l(i) and H_y^l(i) the numbers of points of x and y falling in the ith of the D bins; the total number of matches under the histogram intersection kernel is then:

    I(H_x^l, H_y^l) = \sum_{i=1}^{D} \min(H_x^l(i), H_y^l(i))    (7)

The matches found in the (l+1)th layer are also found in the lth layer, so the number of new matches contributed by layer l is I^l - I^{l+1}, where I(H_x^l, H_y^l) is abbreviated as I^l. Assigning layer l the weight 1/2^{L-l+1}, which is inversely proportional to the width of the layer's blocks, the final pyramid matching kernel is:

    k^L(x, y) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l    (8)

On this basis, this article proposes a generalized Gaussian combinatory (GGC) kernel function, building on the existing kernel functions, the properties of multi-kernel functions, and the ability of pyramid features to distinguish the spatial information of images. The method replaces the fixed weight distribution of the traditional pyramid kernel and obtains the combination parameters of each part automatically via multi-kernel learning. The kernel in Formula (4) has more parameters than a traditional kernel, but it leaves out the inner structure of the features and is determined only by the relations between blocks. In Formulas (5) and (6), the weight of each feature dimension is considered in the kernel, but the block structure is neglected. Integrating the advantages of both kernel forms, we propose a generalized Gaussian combinatory kernel function.
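The pyramid matching of Formulas (7) and (8) can be sketched numerically; the two-layer toy histograms below are illustrative, and the weighting follows the reconstruction above (finer layers weighted more heavily).

```python
def intersection(hx, hy):
    # Formula (7): I(Hx, Hy) = sum_i min(Hx(i), Hy(i))
    return sum(min(a, b) for a, b in zip(hx, hy))

def pyramid_kernel(hist_x, hist_y, L):
    """Formula (8): k = I^0 / 2^L + sum_{l=1..L} I^l / 2^(L-l+1).

    hist_x[l], hist_y[l] are the layer-l histograms (l = 0..L).
    """
    I = [intersection(hist_x[l], hist_y[l]) for l in range(L + 1)]
    k = I[0] / 2 ** L
    for l in range(1, L + 1):
        k += I[l] / 2 ** (L - l + 1)
    return k

hx = [[4], [3, 1]]        # layer 0: whole image; layer 1: two blocks
hy = [[4], [1, 3]]
print(pyramid_kernel(hx, hy, L=1))
```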
The proposed kernel takes both block relations and inner structure into consideration, and the combination parameters are given automatically by the multi-kernel learning classifier. In this function, n + m parameters are optimized: d_1, d_2, ..., d_n are the weights of the feature dimensions within each block, and d_{n+1}, d_{n+2}, ..., d_{n+m} are the weights between different blocks. The function is:

    k(x, y) = \sum_{i=1}^{m} d_{n+i} \exp(-\sum_{k=(i-1)L+1}^{iL} d_k (x_k - y_k)^2)    (9)

As shown above, this kernel essentially combines the Gaussian sum and the Gaussian product. Meanwhile, it takes the inner structure of features into consideration and distinguishes the geometric presentation of images via blocks. The function keeps the calculation simple and satisfies the Mercer condition.

C. Image Classification Algorithm in This Article

In this section we introduce the overall framework of the image classification system. In this framework, we extract D-SIFT features from an image, organize them via the BOW method, and obtain the final blocked histogram descriptor via the Spatial Pyramid model. During the training stage, the generalized Gaussian combinatory kernel function is employed and combined with the Gaussian multi-kernel learning classifier for classification. The procedure of the algorithm is:

1. Divide an image into grids and extract D-SIFT features;
2. Obtain the vocabulary via K-means training;
3. Organize the statistical histogram of the D-SIFT features by the Spatial Pyramid model;
4. Process the features via the generalized Gaussian combinatory kernel function;
5. Use GMKL as the classifier, optimize the kernel function parameters and obtain the final classifier.

In this method, the first step is to extract D-SIFT features. Compared to traditional SIFT, D-SIFT is free from key point detection and uses grid cells as extraction regions, which is more efficient. During D-SIFT extraction in our experiments, the grid sizes are set to 4, 8, 12 and 16 pixels, increasing by 4 pixels each time.
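Under the reconstruction of Formula (9) above, the generalized Gaussian combinatory kernel can be sketched as below. The feature of m = 2 blocks with block length L = 2 and the uniform weights are toy values, not the learned parameters.

```python
import math

def ggc_kernel(x, y, d_dim, d_block, L):
    """Formula (9) as reconstructed above:
    k(x, y) = sum_i d_block[i] * exp(-sum_{k in block i} d_dim[k]*(x_k-y_k)^2)
    d_dim:   per-dimension weights d_1..d_n (inner structure)
    d_block: per-block weights d_{n+1}..d_{n+m} (block relations)
    """
    k = 0.0
    for i in range(len(d_block)):
        s = sum(d_dim[j] * (x[j] - y[j]) ** 2
                for j in range(i * L, (i + 1) * L))
        k += d_block[i] * math.exp(-s)
    return k

x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 2.0, 3.0]
d_dim = [1.0, 1.0, 1.0, 1.0]     # weights inside each exponent
d_block = [0.5, 0.5]             # weights between blocks
print(ggc_kernel(x, y, d_dim, d_block, L=2))
```

With d_block uniform the kernel reduces to a block-wise version of Formula (6), which is the sense in which it combines the Gaussian sum and product.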
Then the Bag of Words method is applied: all the previously extracted image features are clustered via K-means to obtain the center of every cluster. With c = 300 centers, we obtain a feature vocabulary of length 300. After the vocabulary is generated, we organize the features via the Spatial Pyramid model mentioned previously and assign corresponding weights, so that the histograms of the finer blocks receive larger weights and those of the coarser blocks smaller weights, consistent with the pyramid matching kernel. In the experiments, L is set to 2, so there are 3 layers and 21 feature blocks during classification.

Next, the spatial pyramid histograms generated above are processed via the generalized Gaussian combinatory kernel function. Since its parameters are undefined, the calculated kernel needs optimization, which is combined with the GMKL classifier: the selected kernel function is optimized step by step by gradient descent, finally yielding the optimal solution and the corresponding SVM model. This completes the training process. The feature extraction step is the same in testing, and the same vocabulary is used in BOW feature summarization. For each semantic class, the corresponding kernel function parameters and SVM model are used for judgment to obtain the final results.

IV. SIMULATION EXPERIMENTS AND ANALYSIS

The dataset used in these experiments is Caltech-101, collected by Fei-Fei Li and her colleagues in 2003. It contains 101 groups of objects, each consisting of 31 to 800 images, and the resolution of most images is about 300×200 pixels. This dataset features large inter-group differences and is used by many researchers to test the effectiveness of their algorithms. In the experiments, we first analyze the time consumption of our algorithm. Then we test the effect of the vocabulary size and pick a proper size. Next, we compare the combinatory kernel function we have proposed with the original ones.
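Before turning to the experiments, the vocabulary lookup described above (assigning each descriptor to its nearest K-means center and counting) can be sketched with toy 2-D descriptors standing in for D-SIFT vectors; the three-word vocabulary is an illustrative assumption.

```python
def nearest_word(feature, centers):
    """Index of the cluster center closest to a descriptor (squared L2)."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, center))
             for center in centers]
    return dists.index(min(dists))

def bow_histogram(features, centers):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(centers)
    for f in features:
        hist[nearest_word(f, centers)] += 1
    return hist

centers = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]          # toy vocabulary
features = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.8)]
print(bow_histogram(features, centers))   # one bin per visual word
```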
Finally, we test our algorithm on the entire Caltech-101 dataset, selecting respectively 15 and 30 images from each group for training and then conducting the test.

In the experiments, we extract features via the open-source library VLFeat [15], an image processing library established by Andrea Vedaldi and Brian Fulkerson that contains common computer vision algorithms such as SIFT, MSER and K-means. The library is implemented in C and MATLAB; the C implementation is more efficient and the MATLAB one more convenient. VLFeat 0.9.9 is used in the experiments, and we mainly use the SIFT algorithm implemented in MATLAB and the K-means algorithm for clustering. As mentioned before, we select the GMKL (Generalized Multiple Kernel Learning) open-source library written by Manik Varma as the multi-kernel classifier for classification learning. This library is implemented in MATLAB and consists of two files. The most important part of the algorithm obtains the optimal kernel function by a gradient projection descent method. It is called by a top-level file, which contains some kernel functions, such as the Sum of Gaussian kernels, the Product of Gaussian kernels, the Sum of Precomputed kernels, and the Product of Exponential kernels of a Precomputed Distance Matrix. We add our self-designed kernel function for better results. The libraries used in this experiment are coded in MATLAB and provided with interfaces, so we conduct the entire experiment in MATLAB Version 7.12.0.635 (R2011a).

A. Calculation Speed Analysis

We compare our kernel function with the existing ones under the same conditions. In this experiment, the CPU is an Intel(R) Core(TM) i5-2410M with two cores at 2.30-2.80 GHz, the memory is 8.00 GB and the OS is Windows 7 Ultimate. First of all, we measure the training duration for every group of images and average them, obtaining the data in Table I.
TABLE I. TIME CONSUMPTION OF DIFFERENT ALGORITHMS

    Kernel function    Training time (s)
    GGC                63.4
    Sum of RBF         45.5
    Product of RBF     43.7

This table shows that the time consumption of Sum of RBF and Product of RBF is nearly the same, 45.5 s and 43.7 s respectively, while that of GGC, 63.4 s, is slightly higher than the former two but still at the same level. To improve accuracy, the algorithm proposed in this article includes more weighting factors: as shown in Formula (9), it has more weighting factors than the other two algorithms, the extra parameters being those for the feature blocks. The first two algorithms differ merely in exchanging addition and multiplication and require no additional operations when calculating kernel functions and gradients, so our algorithm consumes somewhat more time than they do. Even so, the time consumption of all three algorithms is on the same level and stays stable as more calculations are introduced. Because their complexities are comparable, we compare their effectiveness in terms of classification accuracy.

B. Relationship Between Size of Vocabulary and Accuracy

We randomly select some images from the dataset and calculate the D-SIFT feature vectors of all key points. These vectors are clustered via K-means, and each cluster center is taken as a word; the size of the Bag of Words is determined by the number of K-means clusters. The number of words has a large influence on the accuracy of the final results, so in this section we focus on selecting a proper vocabulary size. In this experiment, we randomly pick two images from each of the first 10 groups in Caltech-101 for feature extraction. We test on this small dataset, summing up the D-SIFT features of these images at all scales as the input for K-means clustering.
Employing the classification framework proposed in this article, we test six different vocabulary sizes (50, 75, 150, 300, 500 and 800) and observe the influence of size on the accuracy of the results.

TABLE II. THE RELATION BETWEEN SIZE OF VOCABULARY AND AVERAGE ACCURACY

    Size of vocabulary    Average accuracy
    50                    84.65%
    75                    84.89%
    150                   85.77%
    300                   88.23%
    500                   87.60%
    800                   87.41%

The table shows that classification accuracy varies with vocabulary size: when the size is small, accuracy increases as the size increases; when the size is large, accuracy decreases as the size grows further. In this first-increasing-then-decreasing trend, the maximum accuracy of 88.23% is reached when the size is 300; all other sizes stay below 88%. Generally speaking, the larger the vocabulary, the longer the histogram becomes, which increases the amount of calculation and slows down the operation. Meanwhile, an overly large vocabulary makes the clustering centers too dense and assigns key points of the same sort into different groups, i.e. different words, so the images are not well represented. In contrast, a vocabulary that is too small causes under-fitting: many distinct features are not separated but grouped into one BOVW bin, which hurts classification accuracy. Therefore, we select 300 as the vocabulary size to trade off efficiency and accuracy, reaching the best classification results without an excessive amount of calculation.

C. Comparison with the Existing Kernel Functions

We compare the GGC kernel proposed in this article with the Sum of Gaussian kernel and the Product of Gaussian kernel using the same overall framework and features.
In this experiment, we select the first 10 groups in Caltech-101 for comparison and focus on the groups where our kernel function yields better optimization and classification results. These 10 groups are: Background Google, Faces, Faces Easy, Leopards, Motorbikes, Accordion, Airplanes, Anchors, Ants and Barrels. The results are shown in Table III:

TABLE III. THE AVERAGE CLASSIFICATION ACCURACIES OF THREE METHODS (%)

    Group                GGC     Sum of RBF    Product of RBF
    Background Google    69.4    74            74.2
    Faces                81.2    81.3          80.3
    Faces_easy           90      86.3          86.7
    Leopards             95      94.3          94.3
    Motorbikes           92.4    87.2          87.4
    Accordion            99.4    98.6          98.6
    Airplanes            95.8    91.7          91.7
    Anchors              85      87.4          87.4
    Ants                 83.5    83.5          83.5
    Barrels              87.6    87.6          87.6

The table shows that the accuracy of the kernel function proposed in this article is maintained in many groups, which proves that this method retains the effectiveness of the traditional methods (Sum of Gaussian and Product of Gaussian). On the other hand, its accuracy is improved in several groups. For Faces Easy, Motorbikes and Airplanes, the experimental data show that our GGC kernel has greatly increased the accuracies: from 86.3% (86.7%) to 90%, from 87.2% (87.4%) to 92.4%, and from 91.7% (91.7%) to 95.8%, respectively. Observing these three groups of images, we find that their common feature is that the objects remain at roughly the same position in the image. In such cases, our kernel function has an advantage in region matching due to its combination with the pyramid model, so the generalized Gaussian combinatory kernel function is particularly effective on such problems. For the other groups, the results of the different kernel functions are basically the same, except for the first group, Background Google, where our method is slightly worse than the other two. Nevertheless, this group is typically included only as a reference and has little classification value.
As to the overall accuracy, GGC reaches 87.79%, higher than the other two: the accuracy of Sum of Gaussian is 86.97% and that of Product of Gaussian is 86.91%.

D. Comparison with Existing Image Classification Algorithms

Many researchers use Caltech-101 as the testing dataset for their algorithms, so we can conveniently compare our algorithm with others. In [16] the authors designed an image representation method with a high detection rate and robustness, which integrated a number of shape, color and texture features. A variety of classification models were compared, including basic methods and some multi-kernel learning methods. This method aimed at searching for combinations of different training data features, among which Boosting reached the best result. In [12] the extraction of middle-layer features is divided into two steps, i.e. coding and pooling. The combinations of several methods were tested for the two parts, for example Hard Vector Quantization, Soft Vector Quantization, Sparse Coding, Average Pooling and Maximum Pooling; the best result was reached with the combination of Sparse Coding and Maximum Pooling. In [13] a method derived from Spatial Pyramid Matching was proposed. It combined the Spatial Pyramid of images with Sparse Coding of the SIFT vectors, which eliminates the basis limitation of vector quantization. We compare our method with the best results of these studies to demonstrate its effectiveness. In this experiment, we select respectively 15 and 30 images from each group of the Caltech-101 dataset, calculate the average accuracy, and compare it with the other methods. The results are shown in Table IV:

TABLE IV. THE COMPARISON OF AVERAGE DETECTION RATES

    Algorithm    15 images per class    30 images per class
    LP-          71%                    78%
    Sparse       73.3%                  75.4%
    ScSPM        70.8%                  73.2%
    GGC          81.9%                  83.6%

V. CONCLUSIONS

In this article we have proposed an image classification algorithm based on the Bag of Visual Words model and multi-kernel learning. It is relatively efficient during classification and can well present the spatial information contained in Spatial Pyramid features. We use the D-SIFT feature as an example to construct the visual word vocabulary and form the Bag of Words describing the images. Experiments prove that our algorithm is not only highly efficient but also more accurate in detection than previous algorithms.

ACKNOWLEDGEMENT

The work was supported by the Foundation of National Natural Science of China (30972361), Zhejiang Province Department of Major Projects (2011C12047), and Zhejiang Province Natural Science Foundation of China (Y5110145).

REFERENCES

[1] Datta R, Joshi D, Li J, et al., "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, no. 2, pp. 1-60, 2008.
[2] Rui Y, Huang T S, Chang S F, "Image retrieval: Current techniques, promising directions, and open issues," Journal of Visual Communication and Image Representation, vol. 10, no. 1, pp. 39-62, 1999.
[3] Van de Sande K E A, Gevers T, Snoek C G M, "Evaluation of color descriptors for object and scene recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[4] Francos J M, Meiri A Z, Porat B, "A unified texture model based on a 2-D Wold-like decomposition," IEEE Transactions on Signal Processing, vol. 41, no. 8, 1993.
[5] Mikolajczyk K, Schmid C, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
[6] Lowe D G, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[7] Ke Y, Sukthankar R, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington DC, USA, pp. 506-513, 2004.
[8] Viola P, Jones M J, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[9] Csurka G, Dance C R, Fan L, Willamowski J, Bray C, "Visual categorization with bags of keypoints," in Workshop on Statistical Learning in Computer Vision, ECCV, pp. 1-22, 2004.
[10] Lazebnik S, Schmid C, Ponce J, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2169-2178, 2006.
[11] Lanckriet G R G, Cristianini N, Bartlett P, et al., "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research, vol. 5, 2004.
[12] Fei-Fei L, Fergus R, Perona P, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," in IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[13] Nowak E, Jurie F, Triggs B, "Sampling strategies for bag-of-features image classification," in Proceedings of the European Conference on Computer Vision, pp. 490-503, 2006.
[14] Vapnik V N, The Nature of Statistical Learning Theory, Springer Verlag, New York, 2000.
[15] Vedaldi A, Fulkerson B, "VLFeat: An open and portable library of computer vision algorithms," http://www.vlfeat.org/, 2010.
[16] Gehler P, Nowozin S, "On feature combination for multiclass object classification," in 2009 IEEE 12th International Conference on Computer Vision, pp. 221-228, 2009.

Clustering Files with Extended File Attributes in Metadata

Lin Han 1, Hao Huang 2*, Changsheng Xie 2, and Wei Wang 1
1. School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, P.
R. China
2. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, Wuhan, P. R. China
*Corresponding Author. Email: [email protected], {thao, cs_xie}@hust.edu.cn, [email protected]
doi:10.4304/jmm.9.2.278-285

Abstract—Classification and searching play an important role in modern file systems, and file clustering is an effective approach to both. This paper presents a new labeling system that makes use of the Extended File Attributes [1] of the file system, and a simple file clustering algorithm based on this labeling system is also introduced. By regarding attributes and attribute-value pairs as labels of files, the features of a file can be represented as a binary vector of labels. Well-known binary vector dissimilarity measures can then be applied in this binary vector space, so clustering based on these measures can be performed as well. The approach is evaluated on several real-life datasets, and the results indicate that precise clustering of files is achieved at an acceptable cost.

Index Terms—File Clustering; Extended File Attributes; File System; Binary Vector; Dissimilarity Measure

I. INTRODUCTION

The cost of storage devices has decreased dramatically in recent years, and highly extendable network storage services, such as cloud storage, are becoming more and more popular. It is common to find a PC with terabytes of local storage and terabytes of network storage attached. An individual can easily access a massive storage space that was only available on mainframe computers 10 years ago, and can have millions of documents, pictures, audio and video files stored in it. This leads to an increasing requirement for classification and searching services in modern file systems, because the traditional directory-based hierarchical file system is not capable of organizing millions of files efficiently.
People easily forget the actual path of a file saved months ago, unless the names of the files and of the directories containing them are carefully designed. So modern file systems do provide classification and searching functions, but they are usually very simple; only basic functions are built in, such as searching by file name, type and modification time. For example, the indexing and searching services in most modern operating systems, such as Windows and Linux, index and search files by their file name, file type suffix and last modification time; some newer versions of these operating systems even index the full text of all text-based files. But for digital media files, which usually occupy most of the space in a file system, they can do nothing more, because it is very hard to extract semantics from digital media data. Some sophisticated indexing and searching systems have been built to solve this problem, but they usually rely on external databases or specific file formats. For example, some popular digital audio player software includes a media library function, which provides an indexing and searching service over all digital audio files in the file system, such as MP3, WMA and OGG files. This audio indexing and searching service usually relies on information extracted from tags in the header of the specific audio file format. These tags enhance the semantics of digital media files and make them easier to index and search. Chong-Jae Yoo and Ok-Ran Jeong proposed a categorizing method for searching multimedia files effectively, applied to the most typical multimedia file, the podcast file [2]. Jiayi Pan and Chimay J. Anumba presented a semantic discovery method for construction projects by adopting semantic web technologies, including extensible markup language (XML), ontology, and logic rules [3].
This is proved to be helpful in managing the tremendous number of documents in a construction project, and it provides a semantics-based searching interface. All these systems need specific file formats and external descriptive files to store and extract semantics. Some recent research tries to improve indexing and searching performance by implementing semantics-aware metadata in new types of file systems. Yu Hua and Hong Jiang proposed a semantic-aware metadata organization paradigm for next-generation file systems [4], and performance evaluation shows that it has a promising future. But as next-generation file systems need years to be adopted by the mainstream market, we still need a better solution that can be applied to currently running file systems.

This paper introduces an extended labeling system (XLABEL) for files, which can be applied in any modern file system that supports Extended File Attributes (XATTR) [1]. Classification and searching functions can be realized in this labeling system by clustering files with the labels in XATTR. XLABEL regards attributes and attribute-value pairs in XATTR as labels of files, so the presence of a certain label in the XATTR of a file is a binary variable, and the features of a file can be represented as a binary vector of labels. Some well-known binary vector dissimilarity measures, such as Jaccard, Dice and Correlation, can be computed in this binary vector space, and clustering based on these measures can be performed as well. This approach is evaluated on some well-known real-life datasets and proves to cluster files precisely, although the algorithm is somewhat time-intensive and future optimization is required.

The rest of the paper is structured as follows: Section 2 introduces the labeling system in extended file attributes.
Section 3 presents a simple approach to clustering files with this labeling system, Section 4 describes the evaluation experiments performed on the approach and presents the results, and Section 5 briefly concludes the paper.

II. LABELING FILES WITH EXTENDED FILE ATTRIBUTES

Classification and searching of data require features extracted from the data in advance. For files in a file system, properties such as file name, format, length, and creation time are all features of files, and they are usually stored in the metadata of files. In most file systems, the metadata of a file is called an "inode". It keeps all the basic properties which the operating system and users have to maintain for a file. It is very useful for the file system and operating system, but not enough for any meaningful classification and searching operation, because it lacks properties of the file content: when a user wants to classify files or search for a file, the query is usually content based. So we need additional content-based features to classify and search for files. These features are highly variable, so it is impossible to store them in the strictly structured inode. Many sophisticated indexing systems rely on an external database or a special file format to store these content-based features. Some modern file systems support a feature called Extended File Attributes (XATTR), which allows user-defined properties to be associated with files. We can create a labeling system using this feature, and all content-based features extracted from files or given by users and user programs can be saved as labels in XATTR.

A. Extended File Attributes

Extended File Attributes is a file system feature that allows users to attach user-defined metadata that is not interpreted by the file system, whereas the regular metadata or "inode" of a computer file has a strictly defined purpose, such as permissions and modification times, and user-defined attributes cannot be added to it.
Extended File Attributes are supported by some mainstream modern file systems of popular operating systems, such as ext3, ext4 and ReiserFS on Linux, HFS+ on Mac OS X and NTFS on Microsoft Windows. Extended File Attributes are usually constructed as records of attribute-value pairs: each attribute name is a null-terminated string, and the associated value can be data of variable length, but is usually also a null-terminated string. For example, an extended attribute recording the author of a file can be expressed as the pair ("author", "John Smith").

B. Labels in XATTR

Using keywords is an efficient way to index a large number of files, and it offers benefits for classification and searching in a large file system. In traditional file systems, there is no space for user-defined keywords except the file name [5]. But using the file name to save keywords has many limitations. First, it misappropriates the function of the file name, which is supposed to be the title of the file. Second, most file systems limit the length of a file name, usually to no more than 256 bytes, which is not enough for a detailed keyword set.

TABLE I. FORMAT OF LABELS

    Keyword       Type          Category    Label (attribute-value pair)
    John Smith    category      author      ("xlabel.author", "John Smith")
    romantic      standalone    tags        ("xlabel.tags", "romantic")

XATTR in most modern file systems offers more than 4 KB of storage space outside the file content. This is enough for a detailed keyword set describing the file in various aspects. We created a new, simple labeling system called "Extended Labels" (XLABEL) in XATTR to keep keywords defined by the user or extracted automatically from the file content. It makes use of the attribute-value pair structure of XATTR and classifies keywords into two types. One is category keywords, which can be classified into categories, such as the keyword "John Smith" in a category "author".
The category name becomes an attribute name in XATTR, and keywords belonging to this category become values associated with this attribute name. The other type is standalone keywords, which cannot be classified into any category; each is just one word describing the content of a file. For example, we can describe the movie "Roman Holiday" with the adjective "romantic". All keywords of this kind are associated with one reserved category: we call them "tags", and they become values of an attribute named "tags". A computer file can have only one instance of each category, but multiple "tags", and they are all in the namespace "xlabel". Each attribute-value pair in the XLABEL system is called a "label". Table I shows the representation of category keywords and standalone keywords in the label format of the XLABEL system.

C. Automatic File Labeling

Although labeling in the metadata of files helps enhance the semantics of files and provides the benefit of accurate indexing and searching, how to obtain proper labels for a file in an easy way is still a key problem in a practical file labeling system, because users are usually lazy and will not take much time to add labels to a file manually. The system must be able to automatically extract features and semantics from a file and create proper labels accordingly. There are several ways to do this. First, most files that need to be indexed and searched in a file system are created for editing or viewing, so there must be certain software that edits or views these files. This software may be able to automatically extract features and semantics from the file being edited or viewed.
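Software that extracts such keywords could emit XLABEL pairs in the format of Table I. The sketch below is a hypothetical helper; on Linux the resulting pairs would be written with os.setxattr, but here a plain dict stands in for a file's XATTR (and, for simplicity, holds a single value per attribute, whereas XLABEL allows multiple tag values) so the example runs anywhere.

```python
def make_label(keyword, category=None):
    """Turn a keyword into an XLABEL attribute-value pair.

    Category keywords go under "xlabel.<category>"; standalone
    keywords all go under the reserved "xlabel.tags" attribute.
    """
    if category is None:                 # standalone keyword -> a "tag"
        return ("xlabel.tags", keyword)
    return ("xlabel." + category, keyword)

xattr = {}                               # stand-in for one file's XATTR
for attr, value in (make_label("John Smith", "author"),
                    make_label("romantic")):
    xattr[attr] = value

print(xattr)
```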
example, a word processor is usually capable of extracting titles and keywords from the text file it is editing, and a picture viewer is usually capable of extracting EXIF information from a digital photo. These extracted features and semantics can be used as labels in the XLABEL system. Second, the booming social network systems of recent years provide a new source for automatic semantic extraction. When content is posted on social networks, the interactions of social network users with this content provide abundant clues about its semantics, and these are mostly text based, which can be analyzed easily and efficiently. Such extracted semantics can also be used as labels in the XLABEL system.

III. APPROACH OF CLUSTERING FILES

Labeling files in XATTR provides the ability to classify files by category and to search files by labels or keywords. But in a file system with millions of files, it is also necessary to cluster files automatically and to list files that are related in content to the files the user is currently accessing. This helps users find a file in a long list of thousands of files without remembering the exact file name or tracing deep into hierarchical directories and subdirectories. Unlike the typical setting of hierarchical clustering and K-means clustering algorithms [6], clustering files in a file system starts without knowing the complete set of vectors or even the dimensionality of the vectors: files are continuously created, modified and deleted while the file system is working. Clustering files in a file system is actually clustering feature vectors in a continual data stream [7], so the number of clusters can hardly be determined before the clustering completes. But a threshold distance can be designated to limit the distance between vectors in the same cluster, and thus indirectly control the total number of clusters generated.
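Before turning to distances, the label format of Section II (Table I) can be modeled concretely. The sketch below is a hypothetical in-memory model; the class and method names are our own illustration, and a real implementation would persist the pairs with extended-attribute calls such as `os.setxattr` on Linux.

```python
# Hypothetical in-memory model of the XLABEL label format (Table I).
# A real implementation would store these pairs in Extended File
# Attributes (e.g. via os.setxattr on Linux); all names here are
# illustrative, not the paper's API.

class XLabelFile:
    """Holds XLABEL attribute-value pairs for one file."""

    NAMESPACE = "xlabel"

    def __init__(self):
        self.labels = {}  # attribute name -> value (str) or set of tags

    def set_category(self, category, keyword):
        # A file has at most one instance of each category.
        self.labels[f"{self.NAMESPACE}.{category}"] = keyword

    def add_tag(self, keyword):
        # Standalone keywords all go to the special "tags" attribute.
        self.labels.setdefault(f"{self.NAMESPACE}.tags", set()).add(keyword)

    def pairs(self):
        # Normalized, sorted view of all labels of this file.
        return sorted((k, v if isinstance(v, str) else sorted(v))
                      for k, v in self.labels.items())

movie = XLabelFile()
movie.set_category("author", "John Smith")
movie.add_tag("romantic")
print(movie.pairs())
# [('xlabel.author', 'John Smith'), ('xlabel.tags', ['romantic'])]
```

The two example rows of Table I map directly onto the two calls above.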
To insert a feature vector into an existing cluster, we require that its distance to every other vector in the cluster be less than a threshold diameter Dth. But directly measuring the distance between the new vector and all existing vectors in the cluster costs too much computation: if the cluster size is n, the time complexity of inserting a new vector into an existing cluster is O(n), not to mention that multiple clusters may be tried before the right cluster is found, or that no existing cluster is suitable and a new cluster has to be created. The cost of inserting a new vector becomes unacceptable when the cluster size and the file system are very large. To reduce the time and space complexity of the clustering operation, an approximate alternative is used: we find a suitable centroid to represent each cluster, together with a proper distance measure on the vector space. We can then decide whether a new vector can be inserted into a cluster by measuring only the distance between the new vector and the centroid of the cluster. The time complexity of this operation is O(1), so a very large file system can be handled efficiently. With this approach we cannot guarantee that the distance between every two vectors is less than Dth, but by carefully choosing the distance measure we can obtain a clustering close enough to the strict clustering under Dth, while maintaining the efficiency of the algorithm.

A. Labels of Files as Binary Vectors

Clustering files relies on features extracted from files, and the labels in Extended File Attributes are very useful for this purpose. If we take every label as a feature of the file, we can describe and represent a file by its set of labels, which is a subset of the complete set of all labels. Let M be the complete set of all possible labels in the XLABEL system; each file in the file system then has a subset of M in its Extended File Attributes.
Let N_A be the subset of M for file A. We define the feature vector of file A as a binary vector Z_A as in (1) and (2):

$$Z_A = (f(z_1), f(z_2), f(z_3), \ldots, f(z_n)), \quad z_n \in M \qquad (1)$$

$$f(z) = \begin{cases} 1, & z \in N_A \\ 0, & z \in M \setminus N_A \end{cases} \qquad (2)$$

B. Centroid of Cluster

The centroid X_c of a finite set of k vectors x_i (i ∈ {1, 2, 3, ..., k}) is defined as the mean of all the points in the set, as in (3):

$$X_c = \frac{x_1 + x_2 + \cdots + x_k}{k} \qquad (3)$$

It minimizes the sum of squared Euclidean distances between itself and each point in the set. We can use this definition in a binary vector space as well, but the original definition produces fractional components in the centroid vector. So, for convenience in computing distances between the centroid and the other vectors of the cluster, we use an approximate (rounded) centroid Z_c as in (4), (5) and (6). Let Z_i be a vector of a cluster C with k vectors in the n-dimensional binary vector space, and let I_j be the unit vector of dimension j:

$$w_j = \frac{1}{k}\sum_{i=1}^{k} Z_i \cdot I_j, \quad j \in \{1, 2, 3, \ldots, n\} \qquad (4)$$

$$Z_c = (g(w_1), g(w_2), g(w_3), \ldots, g(w_n)) \qquad (5)$$

$$g(w) = \begin{cases} 1, & w \geq \tfrac{1}{2} \\ 0, & w < \tfrac{1}{2} \end{cases} \qquad (6)$$

The centroid must lie in the binary vector space, but it need not be an actual vector in the XLABEL system; it can be a phantom vector used only for calculation and for representing the cluster.

C. Measures of Similarity & Dissimilarity

Measures of similarity and dissimilarity of binary vectors have been studied for decades; a number of measures have been defined on binary vector spaces [8], and comprehensive research has been done on the properties of these measures [9]. Here we briefly introduce some of the most popular measures on binary vector space.

TABLE II.
MEASURES OF BINARY VECTORS

Measure            | S(X,Y)                                    | D(X,Y)
Jaccard            | S11/(S11+S10+S01)                         | (S10+S01)/(S11+S10+S01)
Dice               | 2S11/(2S11+S10+S01)                       | (S10+S01)/(2S11+S10+S01)
Correlation        | (S11·S00 − S10·S01)/√((S10+S11)(S01+S00)(S11+S01)(S00+S10)) | 1/2 − (S11·S00 − S10·S01)/(2√((S10+S11)(S01+S00)(S11+S01)(S00+S10)))
Yule               | (S11·S00 − S10·S01)/(S11·S00 + S10·S01)   | 2·S10·S01/(S11·S00 + S10·S01)
Russell-Rao        | S11/N                                     | (N − S11)/N
Sokal-Michener     | (S11+S00)/N                               | (2S10+2S01)/(S11+S00+2S10+2S01)
Rogers-Tanimoto    | (S11+S00)/(S11+S00+2S10+2S01)             | (2S10+2S01)/(S11+S00+2S10+2S01)
Rogers-Tanimoto-a  | (S11+S00)/(2N − S11 − S00)                | 2(N − S11 − S00)/(2N − S11 − S00)
Kulzinsky          | S11/(S10+S01)                             | (S10+S01−S11+N)/(S10+S01+N)

Let Ω be the set of all N-dimensional binary vectors. Given two vectors X, Y ∈ Ω, let S_ij (i, j ∈ {0,1}) be the number of positions at which X has value i and Y has value j. These four basic counts can be defined on the vector space as in (7) and (8):

$$S_{11}(X,Y) = |X \wedge Y|, \qquad S_{00}(X,Y) = |\bar{X} \wedge \bar{Y}| \qquad (7)$$

$$S_{10}(X,Y) = |X \wedge \bar{Y}|, \qquad S_{01}(X,Y) = |\bar{X} \wedge Y| \qquad (8)$$

Based on these counts, with the similarity of two feature vectors denoted by S(X,Y) and their dissimilarity by D(X,Y), some well-known measures [8] can be defined as in Table II.

New labels are generated in the XLABEL system at any time, and every newly generated label changes the S00 value and the dimension number N of all existing feature vectors. To avoid recalculating the similarity and dissimilarity of every pair of feature vectors each time a new label is generated, we must use a measure that is independent of S00 and of the dimension number N. Among the measures in Table II, only Jaccard and Dice are independent of S00 and N.

The Jaccard and Dice distance measures are very similar in form; in fact they differ only in the sum of cardinalities: Jaccard uses the size of the union of the two vectors, while Dice uses the sum of their sizes. Unlike the Jaccard distance, the Dice distance is not a proper metric on the binary vector space [10]. Both distances lie in the normalized range [0, 1] and have relatively low computational complexity. In fact, the Jaccard and Dice distances of the same two vectors can be transformed into each other by (9):

$$D_{Jaccard} = \frac{2\,D_{Dice}}{1 + D_{Dice}}, \qquad D_{Dice} = \frac{D_{Jaccard}}{2 - D_{Jaccard}} \qquad (9)$$

From these equations we see that the Jaccard distance is more sensitive to the dissimilarity of two vectors than the Dice distance: it always outputs a greater distance value than Dice, and the disparity grows as the similarity of the two vectors grows. To illustrate the difference, consider the three example 4-dimensional label vectors X, Y and Z in Table III:

TABLE III. EXAMPLE LABEL VECTORS X, Y AND Z

Vector Name  | tag:started | tag:important | leader:James | leader:John
X (project1) | 1           | 1             | 0            | 0
Y (project2) | 1           | 1             | 1            | 0
Z (project3) | 1           | 1             | 0            | 1

Here the leader attribute is a categorical attribute: vectors Y and Z differ on the attribute leader, while vector X simply misses this attribute; all other labels are the same in X, Y and Z. We can easily compute the distances as DJaccard(X,Y) = 0.3333, DJaccard(Y,Z) = 0.5, DDice(X,Y) = 0.2 and DDice(Y,Z) = 0.3333. Since the difference in the attribute xlabel.leader between Y and Z is definite, while X merely misses this attribute, the difference between X and Y is not definite, so the distance between X and Y should be smaller than the distance between Y and Z. Since DDice(X,Y)/DDice(Y,Z) < DJaccard(X,Y)/DJaccard(Y,Z), the Dice distance is the better measure for our application.

D. Clustering Files with Dice Distance

As in the K-means clustering algorithm, the centroids of the clusters are not known before clustering a data stream starts, so random centroids are designated at the initialization of clustering.
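The distance values quoted for Table III can be reproduced from the S_ij counts. The sketch below is our own code (not the paper's implementation) for the Jaccard and Dice distances of Table II, the rounded centroid of (4)-(6), and the conversion formula (9):

```python
def counts(x, y):
    """S11, S10, S01 of (7)-(8) for two equal-length binary vectors."""
    s11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    s10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    return s11, s10, s01

def d_jaccard(x, y):
    s11, s10, s01 = counts(x, y)
    return (s10 + s01) / (s11 + s10 + s01)

def d_dice(x, y):
    s11, s10, s01 = counts(x, y)
    return (s10 + s01) / (2 * s11 + s10 + s01)

def centroid(cluster):
    """Rounded centroid Zc of (4)-(6): majority vote per dimension."""
    k = len(cluster)
    return [1 if sum(col) / k >= 0.5 else 0 for col in zip(*cluster)]

# Table III over (tag:started, tag:important, leader:James, leader:John)
X, Y, Z = [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]

assert round(d_jaccard(X, Y), 4) == 0.3333
assert d_jaccard(Y, Z) == 0.5
assert d_dice(X, Y) == 0.2
assert round(d_dice(Y, Z), 4) == 0.3333
# Conversion of (9): D_Jaccard = 2*D_Dice / (1 + D_Dice)
d = d_dice(X, Y)
assert abs(d_jaccard(X, Y) - 2 * d / (1 + d)) < 1e-12
print(centroid([X, Y, Z]))  # majority vote -> [1, 1, 0, 0]
```

The assertions confirm the four distance values given in the text, and the centroid call shows how the two minority leader dimensions are rounded away.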
K-means can optimize the centroids over several iterations and finally obtain an approximately optimal cluster set. But clustering the file system operation stream allows only one pass, so iteration and optimization have to happen at runtime. When clustering the file system operation stream, the centroid of a cluster is recalculated every time a vector is inserted into or removed from the cluster. And every time a centroid changes, its distances to the other centroids are also recalculated. If the distance between two centroids is less than a designated threshold radius Rth, the vectors of the two clusters are re-clustered until the distance between the two centroids is greater than Rth, or until the iteration count limit is reached. The detailed clustering algorithm is described by the pseudocode in Fig. 1, Fig. 2 and Fig. 3.

Figure 1. Xlabel_clustering() algorithm for XLABEL system
Figure 2. Recluster() sub-algorithm for XLABEL system
Figure 3. Insert_vector() sub-algorithm for XLABEL system

IV. EVALUATION EXPERIMENTS

We evaluated the XLABEL system with three real-life datasets: the Zoo dataset, the Mushroom dataset and the Congressional Votes dataset. All were obtained from the UCI Machine Learning Repository [11] and are briefly introduced here.

The Zoo dataset: a simple database with 101 instances of animals and 18 attributes. The first attribute is the animal name, which we use here as the file name, and a "type" attribute divides the dataset into 7 classes. Fifteen of the remaining 16 attributes are Boolean-valued, and the last is a numeric attribute over the value set {0, 2, 4, 5, 6, 8}, the number of legs of the animal. We use all 16 attributes except "animal name" and "type" as the attributes of files, and labels were generated accordingly for each file. The "type" attribute was reserved for evaluating the clustering result.

The Mushroom dataset: a database of mushroom records drawn from the Audubon Society Field Guide to North American Mushrooms (1981), G. H. Lincoff (Pres.), New York: Alfred A. Knopf. It has 8124 instances of mushrooms with 22 categorical attributes. The dataset is divided into 2 classes by the edibility of the mushroom: 4208 (51.8%) of the 8124 samples are edible, and 3916 (48.2%) are poisonous. This information was used for evaluating the clustering result.

The Congressional Votes dataset: this dataset records the votes of each U.S. House of Representatives Congressman on 16 key votes. It has 435 instances, with 16 Boolean-valued attributes for the votes of each congressman. The dataset is divided into 2 classes by the party affiliation of the congressmen: 267 of the 435 are Democrats and 168 are Republicans. This was used for evaluating the clustering result.

Unlike with other clustering algorithms, these datasets are not clustered separately but mixed together, to simulate the actual usage of XLABEL in a file system. They were mixed both in the sequence of the original order in the datasets and in five other pseudo-random sequences. This is intended to evaluate whether the XLABEL system successfully clusters data from completely different datasets into different classes, and whether different initial samples affect the clustering result dramatically.

A. Experiment Design

The samples of the datasets are fed into the XLABEL system one by one for one pass. After all the data is fed and the clustering completes, the clustering results are read out and evaluated against the class information of the original datasets. Let m denote the number of clusters, n the number of all records in a dataset, and ai the number of records of the class that dominates cluster i.
The accuracy V and the corresponding error rate E of the clustering result [12] are defined as in (10):

$$V = \frac{1}{n}\sum_{i=1}^{m} a_i, \qquad E = 1 - V \qquad (10)$$

A different threshold radius Rth is designated for each run of the experiment, and all 6 datasets (one in the original order and five in different pseudo-random orders) are fed into the XLABEL system for each Rth value. The Rth value ranges over [0.30, 0.85] with a step of 0.05, so in total 72 runs of the experiment are performed. Besides the accuracy of each run, the final number of clusters of each run is also recorded. The relationships between Rth, the number of clusters, and the clustering accuracy are revealed by analyzing these data. The number of clusters and the accuracy of clustering at the same Rth but with differently ordered datasets are also compared, to determine whether the XLABEL system produces a stable clustering result when the initial vectors differ.

B. Evaluation Results

The experiment results show that the Zoo, Mushroom and Congressional Votes data in the mixed datasets are completely clustered into different classes in all cases; the results are the same as when clustering the three datasets separately. Fig. 4, Fig. 5 and Fig. 6 show the error rate at different Rth; Fig. 7, Fig. 8 and Fig. 9 show the number of clusters at different Rth.

Figure 6. Error rate of clustering Congressional Votes dataset
Figure 7. Number of clusters of clustering Zoo dataset
Figure 8.
Number of clusters of clustering Mushroom dataset
Figure 9. Number of clusters of clustering Congressional Votes dataset
Figure 4. Error rate of clustering Zoo dataset
Figure 5. Error rate of clustering Mushroom dataset

TABLE IV. DATA OF ERROR RATE AND NUMBER OF CLUSTERS ON DIFFERENT RTH

Dataset / Rth       | 0.3         | 0.35        | 0.4         | 0.45        | 0.5         | 0.55        | 0.6
Zoo                 | (0.116, 12) | (0.142, 10) | (0.155, 9)  | (0.170, 7)  | (0.241, 6)  | (0.295, 5)  | (0.365, 4)
Mushroom            | (0.018, 27) | (0.044, 19) | (0.078, 15) | (0.102, 12) | (0.113, 7)  | (0.124, 7)  | (0.130, 4)
Congressional Votes | (0.072, 45) | (0.078, 28) | (0.083, 19) | (0.107, 11) | (0.111, 8)  | (0.132, 5)  | (0.146, 4)

Each entry is (error rate, number of clusters).

From these figures we can conclude that the error rate of clustering increases as Rth increases, while the number of clusters decreases as Rth increases, and that our clustering approach has a stable output when Rth < 0.6. Table IV shows the detailed error rate and number of clusters at different Rth. With the new labeling system, XLABEL is capable of clustering vectors that are not uniform in dimension. For a balanced trade-off between the number of clusters and the error rate, 0.4 < Rth < 0.5 is recommended for practical use. We found that the performance of our clustering algorithm is similar to that of the Squeezer algorithm [13], which is also based on the Dice measure, as illustrated in Fig. 10, Fig. 11 and Fig. 12.
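The accuracy measure of (10) used throughout these comparisons can be computed directly from the cluster assignments. The helper function and the toy data below are our own illustration, assuming each record carries its true class label:

```python
from collections import Counter

def clustering_accuracy(clusters):
    """Accuracy V of (10): for each cluster i take the count a_i of its
    dominant true class, sum over clusters, and divide by the total n."""
    n = sum(len(c) for c in clusters)
    dominant = sum(Counter(c).most_common(1)[0][1] for c in clusters if c)
    return dominant / n

# Toy example: two clusters of true class labels.
clusters = [["edible"] * 9 + ["poisonous"],          # a_1 = 9 of 10
            ["poisonous"] * 8 + ["edible"] * 2]       # a_2 = 8 of 10
V = clustering_accuracy(clusters)
E = 1 - V
print(round(V, 3), round(E, 3))  # 0.85 0.15
```

Applied to the real runs, the per-cluster dominant-class counts a_i would come from the reserved "type", edibility, or party attribute of each dataset.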
Figure 10. Performance comparison of XLABEL and Squeezer on the Zoo dataset
Figure 11. Performance comparison of XLABEL and Squeezer on the Mushroom dataset
Figure 12. Performance comparison of XLABEL and Squeezer on the Congressional Votes dataset
Figure 13. Execution time of clustering at different Rth

Generally, our algorithm has a slightly higher error rate than the Squeezer algorithm, because our algorithm is designed to cluster a continuous feature vector stream rather than a completely prepared dataset: the XLABEL algorithm cannot perform multiple clustering iterations over the whole dataset to optimize the clustering result. But when the number of clusters is very small, we get a better result than Squeezer, especially on datasets with many categorical attributes. This too is because we cluster a vector stream, so there is a better chance to obtain a good centroid before it is moved by many other vectors to a position that is mathematically optimal but not practically optimal. However, both our algorithm and Squeezer perform poorly when the number of clusters is less than 5, so this advantage is of little practical value.

As mentioned in Subsection D of Section III, the distance between each newly inserted label vector and the centroid of every existing cluster has to be calculated before the label vector can be inserted into any cluster. So the execution time of inserting a label vector increases as the number of existing clusters increases.
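The insertion procedure that this cost analysis refers to (the pseudocode of Figs. 1-3 is not reproduced in this transcription) can be sketched roughly as follows, assuming the Dice distance and the majority-vote centroid of (4)-(6). The function names, the threshold value, and the toy stream are our own illustration, and the Recluster() merging step driven by Rth is omitted for brevity:

```python
def dice_distance(x, y):
    """Dice dissimilarity of two equal-length binary vectors (Table II)."""
    s11 = sum(1 for a, b in zip(x, y) if a and b)
    s10 = sum(1 for a, b in zip(x, y) if a and not b)
    s01 = sum(1 for a, b in zip(x, y) if not a and b)
    denom = 2 * s11 + s10 + s01
    return (s10 + s01) / denom if denom else 0.0

def centroid(cluster):
    """Rounded (majority-vote) centroid of (4)-(6)."""
    k = len(cluster)
    return [1 if sum(col) / k >= 0.5 else 0 for col in zip(*cluster)]

def insert_vector(clusters, v, d_th=0.45):
    """Insert v into the nearest cluster whose centroid is within d_th,
    else open a new cluster: one O(#clusters) scan per insertion."""
    best, best_d = None, d_th
    for c in clusters:
        d = dice_distance(v, centroid(c))
        if d < best_d:
            best, best_d = c, d
    if best is None:
        clusters.append([v])   # no suitable cluster: start a new one
    else:
        best.append(v)         # centroid is implicitly recomputed next scan
    return clusters

# Toy stream of 4-dimensional label vectors.
stream = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
clusters = []
for v in stream:
    insert_vector(clusters, v)
print(len(clusters))  # two clusters emerge from this toy stream
```

Because only centroid distances are compared, each insertion scans the cluster list once, which is the linear-in-cluster-count cost described above.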
As discussed above, Rth controls both the clustering accuracy and the final number of clusters: the greater the value of Rth, the fewer the clusters. So Rth also controls the computational cost of the XLABEL algorithm. Fig. 13 shows that the total execution time decreases as Rth increases. The execution times were recorded on a platform with one Intel(R) Core(TM) i3-2100 3.1GHz dual-core CPU and 2GB of DDR3-1600 DRAM, running CentOS 5.6 Linux.

V. CONCLUSION

We discussed the problem of clustering files in a file system at runtime, and proposed a labeling system that stores features of files as labels in Extended File Attributes. A clustering approach based on this labeling system was also introduced, and its performance was evaluated on several well-known real-life datasets. The evaluation results show that our approach has a stable output when a proper threshold radius is set, and that precise clustering of files is achieved at an acceptable cost.

ACKNOWLEDGMENT

Lin Han would like to extend sincere gratitude to the corresponding author, Hao Huang, for his instructive advice and useful suggestions on this research. We also thank the anonymous reviewers for their valuable feedback and suggestions. This work is supported in part by the National Basic Research Program of China under Grant No. 2011CB302303, the NSF of China under Grant No. 60933002, and the National High Technology Research and Development Program of China (863 Program) under Grant No. 2013AA013203.

REFERENCES

[1] J. Morris, "Filesystem labeling in SELinux," Linux Journal, Red Hat, Inc., 2004, pp. 3-4.
[2] C. J. Yoo, O. R. Jeong, "Category Extraction for Multimedia File Search," Information Science and Applications (ICISA), 2013 International Conference on. IEEE, 2013, pp. 1-3.
[3] J. Pan, C. J. Anumba, "Semantic-Discovery of Construction Project Files," Tsinghua Science & Technology. 13, 2008, pp. 305-310.
[4] Y. Hua, H.
Jiang, Y. Zhu, D. Feng, L. Tian, "Semantic-aware metadata organization paradigm in next-generation file systems," Parallel and Distributed Systems, IEEE Transactions on. 23(2), 2012, pp. 337-344.
[5] N. Anquetil, T. Lethbridge, "Extracting concepts from file names: a new file clustering criterion," Proc. ICSE '98. IEEE Computer Society, Washington, DC, 1998, pp. 84-93.
[6] Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery. 2, 1998, pp. 283-304.
[7] C. Ordonez, "Clustering Binary Data Streams with K-means," ACM DMKD'03. San Diego, CA, 2003, pp. 12-19.
[8] S. S. Choi, S. H. Cha, C. C. Tappert, "A Survey of Binary Similarity and Distance Measures," Journal of Systemics, Cybernetics and Informatics. 8(1), 2010, pp. 43-48.
[9] B. Zhang, S. N. Srihari, "Properties of Binary Vector Dissimilarity Measures," Proc. JCIS Int'l Conf. Computer Vision, Pattern Recognition, and Image Processing, 2003, pp. 26-30.
[10] A. H. Lipkus, "A proof of the triangle inequality for the Tanimoto distance," Journal of Mathematical Chemistry. 26(1-3), Springer, 1999, pp. 263-265.
[11] A. Frank, A. Asuncion, "UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]," University of California, School of Information and Computer Science. Irvine, CA, 2010.
[12] Z. Y. He, X. F. Xu, S. C. Deng, "A cluster ensemble method for clustering categorical data," Information Fusion. 6(2), 2005, pp. 143-151.
[13] Z. Y. He, X. F. Xu, S. C. Deng, "Squeezer: an efficient algorithm for clustering categorical data," Journal of Computer Science and Technology. 17(5), 2002, pp. 611-624.
[14] Z. Y. He, X. F. Xu, S. C. Deng, "Improving Categorical Data Clustering Algorithm by Weighting Uncommon Attribute Value Matches," Computer Science and Information Systems. 3(1), 2006, pp. 23-32.
[15] H. Finch, "Comparison of Distance Measures in Cluster Analysis with Dichotomous Data," Journal of Data Science. vol. 3, 2005, pp. 85-100.
[16] O. Fujita, "Metrics based on average distance between sets," Japan Journal of Industrial and Applied Mathematics. Springer, 2011.

Lin Han received the BS and MS degrees in computer science from Huazhong University of Science and Technology (HUST), China, in 2005 and 2007, respectively. He is currently working toward the PhD degree in computer science at HUST. His research interests include computer architecture, storage systems and embedded digital media systems. He is a student member of the IEEE and the IEEE Computer Society.

Hao Huang received the PhD degree in computer science from Huazhong University of Science and Technology (HUST), China, in 1999. Presently, he is an associate professor in the Wuhan National Laboratory for Optoelectronics and the School of Computer Science and Technology, HUST. He is also a member of the Technical Committee of Multimedia Technology of the China Computer Federation, and a member of the Technical Committee of Optical Storage of the Chinese Institute of Electronics. His research interests include computer architecture, optical storage systems, embedded digital media systems and multimedia network technology.

Changsheng Xie received the BS and MS degrees in computer science from Huazhong University of Science and Technology (HUST), China, in 1982 and 1988, respectively. Presently, he is a professor and doctoral supervisor in the Wuhan National Laboratory for Optoelectronics and the School of Computer Science and Technology at HUST. He is also the director of the Data Storage Systems Laboratory of HUST and the deputy director of the Wuhan National Laboratory for Optoelectronics. His research interests include computer architecture, disk I/O systems, networked data storage systems, and digital media technology. He is the vice chair of the expert committee of the Storage Networking Industry Association (SNIA), China.
Wei Wang received the BS and MS degrees in computer science from Huazhong University of Science and Technology (HUST), China, in 2005 and 2007, respectively. He is currently working toward the PhD degree in computer science at HUST. His research interests include computer architecture, embedded digital media systems and digital copyright protection systems. He is a student member of the IEEE and the IEEE Computer Society.

Method of Batik Simulation Based on Interpolation Subdivisions

Jian Lv, Weijie Pan, and Zhenghong Liu
Guizhou University, Guiyang, China
Email: [email protected], {290008933, 328597789}@qq.com

Abstract—To realize Batik renderings, we present an algorithm for creating the ice-crack effects found in Batik wax painting and Batik techniques. The method is based on an Interpolation Subdivisions algorithm, which can produce crackle effects similar to the natural texture generated in the Batik handcraft. In this method, the natural distribution of ice cracks is created by a random distribution function; the growth of the ice cracks is then controlled by the Interpolation Subdivisions algorithm and a modified DLA algorithm, with the detail governed by parameters such as the number of growth points, noise, direction, and attenuation. We then blend the Batik vector graphics with the ice cracks and mix the colors between them; this post-processing realizes the final visual effect. The simulation results show that this method can create different forms of ice-crack effects and can be used in the dyeing industry.

Index Terms—Ice Crack; Interpolation Subdivisions; Segments Substitution; Batik

I. INTRODUCTION

Batik craft has a history of more than 3000 years, and it is now one of the world's intangible cultural heritages. Batik craft is famous for its long history and civilization, and occupies an important position in the history of modern textiles in the world.
Because of unique regional cultural and process characteristics, different styles of Batik have formed that are sought after by people all over the world. As shown in Figure 1, they come from such representative places as Bali and Java in Indonesia, Guizhou in China, Japan, and India.

Figure 1. a. Batik in Indonesia, b. Batik in China, c. Batik in Japan, d. Batik in India

With the speeding up of industrialization and urbanization, the ancient Batik craft is dying. Owing to the protection of world intangible cultural heritage, the old craft is blooming with renascent vitality. Traditional Batik has created an abundance of cultural elements and symbols, which provide vast resources for the modern printing and dyeing industry. Now, with the development of digital art and design technology, the protection and creation of traditional Batik has taken a new path. There is a win-win situation between traditional Batik and the modern printing and dyeing industry, and Batik is entering modern life again.

doi:10.4304/jmm.9.2.286-293

Computer simulation of Batik involves image recognition, graphics vectorization and ice-crack simulation. The most important part of the simulation is expressing the aesthetics of Batik. The graphics and symbols of Batik have unique aesthetic value, carrying history, culture, folk customs, myths and legends. The ice crack is a texture generated by the natural cracking of the wax coat during the process, and that is exactly what people like: the ice crack is born with abstraction, contingency and uniqueness, which is the key feature distinguishing Batik from other printing and dyeing technologies. So, computer simulation of Batik has a profound impact on modern Batik art creation and the Batik industry. With the development of intangible cultural heritage protection all over the world, more and more people are interested in researching this ancient art form by computer.
According to the visual characteristics and aesthetic value of Batik, there are two research hotspots: first, vectorization of the dermatoglyphic patterns in Batik, which can generate a large number of basic shapes; second, the creation of ice cracks, which can simulate the real texture of Batik. So far, some research results have been widely used in the modern printing and dyeing industry. Currently, simulation of ice cracks is a research hotspot in 3D animation, covering cracks in ice, glass, ceramic, and soil. Wyvill [1] first proposed an algorithm to generate Batik ice patterns based on the Euclidean Distance Transform algorithm [2, 3]; the method obtains a pixel gradient image running from the skeleton to the edge of the original pattern. Tang Ying [4] presented an improved Voronoi algorithm [5], which produces results similar to craquelure. Besides, FEA [6, 7, 8, 9, 10] is another way to simulate ice cracks by setting up a mechanical model. Fractal theory [11] is another active field, including DLA [12, 13] and L-systems [14, 15], both of which suit growth models. Lightning simulation [16] proposed a multiple-subdivision scheme representing a fission model. Generally, most of these algorithms can be used for 2D and 3D graphics, and the crack simulations of different objects achieve a high level of realism. We present a method based on an Interpolation Subdivisions algorithm.

II. VISUAL CHARACTERISTICS ANALYSIS OF ICE CRACK

In order to analyze the visual characteristics of the ice crack, it is necessary to analyze the traditional Batik handicraft. Taking the Batik of Guizhou in China as an example, the technological process includes: refining the cloth, pattern design and drawing on cloth, drawing the pattern again with liquid wax, staining, dewaxing, and cleaning.
The ice crack is mainly generated during waxing and dyeing, when the liquid wax cools on the fabric and the pigment dip-dyes into the cracks. In the end, we get ice cracks that are born with abstraction, contingency and uniqueness. In the history of Batik, some folk viewpoints once held that the ice crack is defective workmanship, but precisely because of this beauty of defect, Batik is loved by people all over the world. Several factors affect the formation of ice cracks, such as the cloth material, the wax, the wax temperature, and the dyeing time. Under natural conditions, the visual distribution of ice cracks is random, and the number of cracks is also random, affected by many factors. The curves of the texture are complex and changeable: generally, the curve of an ice crack is succinct where the Batik pattern is linear, and complicated where the Batik pattern covers an area. The direction of the cracks is also random; most of the textures grow irregularly and interweave. The curve of an ice crack varies in width and stroke: usually the lines are thicker at cross points, especially after repeated dip dyeing, where the lines are thick and full of tension; but with the attenuation of growth, the end of an ice crack tends to become thinner and thinner. Figure 2 shows the detail of one Batik work.

Figure 2. The main visual characteristics of ice crack

According to the visual characteristics analysis of Batik ice cracks above, we present an algorithm based on Interpolation Subdivisions. Combined with vectorization to extract the dermatoglyphic patterns of Batik, line-type transformation of the ice cracks, and color mixing, we can realize the simulation of Batik. The process of the algorithm is as follows: 1) distribution of initial points and number control; 2) creation of the initial texture; 3) Interpolation Subdivisions, including control of factors such as creation number, noise, direction, width and attenuation degree; 4) image fusion and image post-processing.

III. INTERPOLATION SUBDIVISIONS

A. Creating Fission Point Set

Through process analysis and visual feature analysis of Batik, we find that there are many fission points in the ice cracks. Usually, one fission point grows one or more ice cracks. So, first we should create the fission point set. To simulate the distribution of the fission point set, we introduce D(Uc, a) as the density function of the main fission point set:

$$u_c = D(U_c, a)$$

Here, Uc is the standard density of the ice crack trunks, uc is the density of the fission point set, and a is the vibration coefficient. We define the density function as uc = D(2×10^3, 1×10^6). The fission point set is thus controlled by the density function D(). Figure 3(a) shows the initial point set.

Figure 3. Image a. initial points set, b. initial segments set

B. Creating Initial Segments

After creating the fission point set, the next step is creating the initial segments. In terms of visual features, the initial segment of one ice crack is a line segment controlled by three factors: length, width and angle. Having obtained the initial point set, we can obtain the segments by connecting the end point set with the initial point set in sequence. The initial segments reflect the standard distribution form of the ice cracks. The end point set is created as follows:

$$L = C(1 + e), \qquad C = R \cdot G(u_c, f)$$

Here, L is the length of an initial segment; C is the standard length of an initial segment; R is the length of the canvas; f is the standard stress degree coefficient, which reflects the stress at the initial point; and G() is the growing-length coefficient function of an initial segment. C is controlled by three factors: R, uc and f. We get L by adding the vibration "e" to the standard length "C".
In this paper, we define L as having the standard distribution form (0.5~1.5)C. We define θ as the direction angle of one initial segment. Usually, in a Batik, there is a high global probability that an ice crack runs along the normal direction of the Batik pattern. Define E as the initial point and F as the end point; the relation between them is:

F = E + L·e^(jθ)

Here we take no special consideration for the width of the ice crack and just define a basic form, W = W_s(1 + b), where W_s is the standard width of one ice crack and b is the width coefficient. This definition of width gives only a temporary effect; we will later use a brush-replacement method to realize the final width effect. Based on the definitions above, we can create the initial segments on a canvas. Figure 3(b) shows the initial segment set.

C. Interpolation Subdivision Algorithm

Usually, an ice crack grows abundant detail characteristics, including bifurcations, crossings, and so on. Ice cracks usually include the characteristic forms shown in Figure 4: one-way cracks, mesh cracks, cluster cracks, and fission cracks. Among them, cluster cracks and fission cracks have a higher probability than the others. In particular, the fission crack is born with abstraction, contingency, and uniqueness, and has a high aesthetic value. We mainly research the simulation of fission cracks by the Interpolation Subdivision algorithm. There are some similar algorithms, such as L-systems [14], DLA [12], the Finite Element method [6], Voronoi diagrams [5], and lightning simulation [16]. In the following, we compare the Interpolation Subdivision algorithm with these similar algorithms.
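As a concrete sketch of sections A and B, the point-set and initial-segment constructions above can be written as follows. The closed forms of the density function D(U_c, a) and of G(u_c, f) are not given in the text, so the uniform jitter used here for the vibration terms and the angle choice are illustrative assumptions:

```python
import cmath
import random

def fission_point_set(width, height, U_c, a, seed=None):
    """Scatter the main fission (trunk) point set on a canvas.
    U_c is the standard trunk density (points per unit area) and `a` the
    vibration coefficient; D(U_c, a) is modeled here, as an assumption,
    by a uniform perturbation u_c = U_c * (1 + e), e ~ U(-a, a)."""
    rng = random.Random(seed)
    u_c = U_c * (1 + rng.uniform(-a, a))     # realized density u_c = D(U_c, a)
    n = max(1, round(u_c * width * height))
    return [complex(rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n)]

def initial_segment(E, C, theta, rng):
    """End point of one initial segment: F = E + L * exp(j * theta),
    with L drawn from the paper's standard form (0.5..1.5) * C."""
    L = C * rng.uniform(0.5, 1.5)
    return E + L * cmath.exp(1j * theta)

rng = random.Random(42)
points = fission_point_set(100, 100, U_c=0.005, a=0.2, seed=42)
segments = [(E, initial_segment(E, 5.0, rng.uniform(0, 2 * cmath.pi), rng))
            for E in points]
```

Representing points as complex numbers keeps the F = E + L·e^(jθ) formula literal; a real implementation would also apply the normal-direction bias to θ near pattern boundaries.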
The main method of Interpolation Subdivision is as follows: define the initial vertex and the end vertex, and insert one or two vertices by linear interpolation; then take each pair of adjacent vertices as the new initial and end vertices, and repeat the operation until the branch details achieve the visual aim of the ice crack. Finally, connect all adjacent vertices.

According to the creation of growth cracks, we present the Interpolation Subdivision algorithm. In Figure 5, first define the two initial vertices E(a₁, b₁, c₁) and F(a₂, b₂, c₂), and let the coordinate of the inserted point be P(x, y, z). Point P′ is obtained from E and F by linear interpolation. Establish a coordinate system as in Figure 6, based on points E and F. In this coordinate system, define u, v, and w as the unit vectors in the U, V, and W directions in turn. The expression of point P is as follows, and all the variable definitions are listed in Table I:

P = P′ + D_VP·v + D_UP·u
D_VP = e^(−D_A·n) · R(D_PV·x, D_PV·y)
D_UP = e^(−D_A·n) · R(D_PU·x, D_PU·y)

Figure 5. Initial vertex defining
Figure 4. (a) One-way crack, (b) Mesh crack, (c) Clusters crack, (d) Fission crack

Since we have the initial vertex and the end vertex, connecting them gives the initial segment. The Interpolation Subdivision algorithm is different from the DLA and L-system algorithms: as generally accepted, the DLA [12] and L-system [14] algorithms follow the method of a plant growth model, while the Interpolation Subdivision algorithm follows the method of successive subdivision, and the two are inverse processes in graphic visual terms.

Figure 6. Coordinate system based on EF

In Figure 5, segment EF is the initial segment decided by the two initial points. When we obtain the linear interpolation point P′ and create point P, we connect EP and PF and obtain the initial curve. In the process of creating an ice crack, it is important to generate abundant branch details. So, in addition to segments EP and PF, another segment PQ is required, as in Figure 7, generated from P to Q.
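The insertion-and-offset step can be sketched in 2-D as a midpoint-displacement loop; the exponential factor `decay ** level` plays the role of the e^(−D_A·n) attenuation, and a uniform range stands in for R(). The function name and the perpendicular-offset choice are illustrative assumptions, not the paper's exact 3-D geometry:

```python
import math
import random

def interpolation_subdivision(E, F, n, decay=0.6, seed=None):
    """Repeatedly insert a linearly interpolated point P' on every segment,
    then displace it perpendicular to the segment by a random amount that
    decays exponentially with the subdivision level, and return the
    ordered polyline of vertices."""
    rng = random.Random(seed)
    pts = [E, F]
    for level in range(1, n + 1):
        amp = decay ** level                        # stands in for e^(-D_A * n)
        nxt = [pts[0]]
        for a, b in zip(pts, pts[1:]):
            px, py = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2   # P' by interpolation
            dx, dy = b[0] - a[0], b[1] - a[1]
            seg_len = math.hypot(dx, dy) or 1.0
            ux, uy = -dy / seg_len, dx / seg_len            # unit normal to EF
            off = rng.uniform(-amp, amp) * seg_len          # random offset, R()
            nxt.append((px + ux * off, py + uy * off))      # displaced point P
            nxt.append(b)
        pts = nxt
    return pts

crack = interpolation_subdivision((0.0, 0.0), (10.0, 0.0), n=4, seed=7)
```

After n levels a single segment yields 2^n + 1 ordered vertices; connecting adjacent vertices gives the trunk curve before branches are added.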
The following is the solving process for point Q; all variable definitions are listed in Table I:

Q = P + D_PQ · r(D_WQ·w + D_VQ·v + D_UQ·u)
D_WQ = e^(−D_B·m) · R(D_QW·x, D_QW·y)
D_VQ = e^(−D_B·m) · R(D_QV·x, D_QV·y)
D_UQ = e^(−D_B·m) · R(D_QU·x, D_QU·y)
D_PQ = e^(−D_B·m) · R(D_L·x, D_L·y)

Figure 7. The first fission of Q
Figure 8. Tagging rules

When points P and Q are created, we connect PE, PF, and PQ. These segments form the initial trunk and branch of the ice crack. In the following fission processes, we define the trunk fission points as P_ij and the branch fission points as Q_ij, where n is the number of fission processes, i is the sequence number of the fission process, and j is the sequence number of the fission point (1 ≤ i ≤ n, 1 ≤ j ≤ n). Figure 8 shows the tagging rules of the fission points. The fission process is an iteration whose steps are based on the fission vertexes of the last step.

When the number of fission processes reaches n, define the numbers of points P and Q as NUM(P) and NUM(Q), the number of total fission vertexes as NUM(V), and the number of segments as NUM(Seg); their relationship is:

NUM(V) = NUM(P) + NUM(Q) = 3^n − 1,  NUM(P) = NUM(Q) = (3^n − 1)/2
NUM(Seg) = NUM(V) + 1

Define the vertex density as u_v and the area of the fission process as S_v; then u_v is given by:

u_v = NUM(V) / S_v

The symbols' meanings are listed in Table I.

TABLE I. PARAMETERS
Symbol — Meaning
D_UP, D_VP, D_WP — decay degree of P deviating from the initial segment in the U, V, and W directions
D_PU, D_PV, D_PW — limits of P deviating from the initial segment in the U, V, and W directions
D_WQ, D_VQ, D_UQ — decay degree of Q deviating from the initial segment in the U, V, and W directions
D_QW, D_QV, D_QU — limits of Q deviating from the initial segment in the U, V, and W directions
D_PQ — length of segment PQ after attenuation
D_A, D_B — basic decay degree
D_L — limits of segment PQ
n, m — number of subdivisions
R() — a random value between two parameters
r() — unit vector

Figure 9. Images (a), (b), (c)

Given the standard vertex density, we impose the constraint condition u_v ≥ U_v and end the fission process when it is reached. In order to achieve abundant details, we add a fission probability q to the vertexes Q_ij.

IV. SIMULATION AND ANALYSIS

A. Simulation of Ice Crack

The interpolation subdivision algorithm is realized in Matlab. The process mainly includes: creating the initial point set, creating the initial segment set, interpolation subdivision, and post-processing. Figure 10 shows the relationship of these processes.

Figure 10. Simulation processes (number control, frequency factor, and angle/length control drive the generation of initial point E, initial segment EF, interpolation point P′, insertion point P, and branch point Q; the fission process repeats until the constraint condition holds, after which the point sequences are connected and post-processed)

In order to realize a realistic simulation effect of Batik, it is necessary to assign the parameters related to the Interpolation Subdivision algorithm.
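The stopping rule u_v ≥ U_v and the extra fission probability q on the Q vertexes can be sketched as a counting loop. The specific growth rule per round (every trunk point spawning one new point, branch points re-fissioning with probability q) is a simplifying assumption for illustration, not the paper's exact geometry:

```python
import random

def fission_until_dense(S_v, U_v, q, seed=None, max_rounds=50):
    """Iterate fission rounds until the vertex density u_v = NUM(V) / S_v
    reaches the standard density U_v (or max_rounds is hit)."""
    rng = random.Random(seed)
    num_p, num_q = 1, 1     # the first P and Q grown from the initial segment
    rounds = 0
    while (num_p + num_q) / S_v < U_v and rounds < max_rounds:
        num_p *= 2                                                # trunk points fission
        num_q += sum(1 for _ in range(num_q) if rng.random() < q)  # branches, prob. q
        rounds += 1
    return num_p, num_q, rounds

num_p, num_q, rounds = fission_until_dense(S_v=1000.0, U_v=0.05, q=0.6, seed=3)
```

Raising q thickens the branch detail before the density cap is hit, which matches the role the text assigns to the fission probability on Q_ij.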
The main parameters include: U_c, the standard density of trunks; U_v, the standard density of the fission point set; q, the fission probability of a branch; n, the fission number; and W, the width of the fission segments. Table II gives the main parameter assignments.

TABLE II. THE MAIN PARAMETER ASSIGNMENTS

Figure | U_c | U_v | q   | n | w
(a)    | 50  | 5   | 0.4 | 3 | (0.5, 1.5)
(b)    | 50  | 5   | 0.6 | 5 | (0.5, 1)
(c)    | 100 | 10  | 0.8 | 7 | (1, 1.5)

Comparing the three results by visual features, we chose Figure 11(b), whose created trunks and branches realize the simulation of growth cracks naturally.

B. Post-Processing of Batik

1) Creating Batik vector graphics. Currently, vector graphics are widely used for creating Batik graphics in the printing and dyeing industry. We can obtain vector graphics with CAD software. In order to coordinate with post-processing, we extract vector graphics through Adobe Illustrator, assign the colors RGB(255, 255, 255) and RGB(0, 0, 0) to the vector graphics, and assign the color RGB(29, 32, 136) to the background.

2) Graphics blending. Taking the vector graphics boundary as the growth boundary of the ice cracks, we assign appropriate values to U_c and U_v and adjust the other parameters to perfect the effect of the ice cracks. Once the ice cracks are created inside the Batik vector graphics, we have the elementary simulation effect. Figure 13(b) shows the graphics blending effect.

3) Segment substitution. Width is a distinct feature of an ice crack segment. Distance Transforms [1, 2, 3] use a multiplicative color model to generate the width effect at the intersections of ice cracks. We present a method of segment substitution to simulate the width and the color of dip dyeing. From a visual perspective, each section of an ice crack is not a single segment; it has multiple changes in width, linetype, and color, so we define a brush that has those features. Figure 12 shows the brush we defined.
We take segment MN as the trunk; points M and N are the inside endpoints rather than the outside endpoints, so the intersections of ice cracks can be coherent and thickened. We reduce the brush opacity gradually in the direction away from the trunk, then assign the color RGB(29, 32, 136) to the brush. In order to obtain more variation in the brush, we add Perlin noise to it. Finally, we replace the segments generated above with the brush.

C. Result Analysis and Comparison

From Figure 10 we can see that many steps and parameters affect the simulation result. First, we create the initial point set and the initial segment set, and we control the distribution of the fission trunks with the density function u_c = D(U_c, a). When creating the initial segment EF, we control the parameter L to avoid too many intersections and to realize a discrete effect. Interpolation Subdivision is the key step in the simulation: we realize various details by modifying the parameters in Table I, and as the number of fissions increases, the details become more and more abundant. In post-processing, compared with the multiplicative color model, the segment substitution method reduces the complexity of the algorithm and completes the thickness changes and the color dyeing effect at the same time, so the process is simpler and more efficient than the other methods.

The following compares some classic algorithms. The DLA model is a stochastic, dynamic growth model; with its characteristics of dynamics and growth, it is well suited to expressing the growth of plants and other growth processes. Similarly, the L-system is a fractal art: it uses a string rewriting mechanism for iteration, and the continuously produced strings guide the drawing of the graphics. Figure 14 shows the simulation effect comparison of these algorithms.
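The opacity falloff away from the trunk MN, with a noise perturbation standing in for the Perlin noise the text adds, can be sketched as:

```python
def brush_alpha(dist, half_width, noise=0.0):
    """Brush opacity at perpendicular distance `dist` from the trunk MN:
    fully opaque on the trunk, fading linearly to zero at the brush edge,
    perturbed by a caller-supplied noise value (a stand-in for Perlin
    noise) and clamped to [0, 1]."""
    if half_width <= 0:
        return 0.0
    alpha = 1.0 - abs(dist) / half_width + noise
    return min(1.0, max(0.0, alpha))

# Opacity profile across the brush, no noise:
profile = [brush_alpha(d, half_width=4.0) for d in range(-5, 6)]
```

A real brush would sample a 2-D Perlin field per pixel and multiply the result into the RGB(29, 32, 136) stroke; the clamped linear ramp here captures only the falloff behavior.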
Through analysis of the visual characteristics, DLA is suitable for simulating cluster cracks; the L-system is suitable for simulating growth cracks; and Interpolation Subdivision is suitable for simulating fission cracks.

Figure 11. Creation results for the different parameter assignments in Table II: (a), (b), (c)
Figure 12. Defining the crack brush
Figure 13. (a) Batik vector graphics, (b) graphics blending effect, (c) segment substitution effect
Figure 14. (a) Interpolation Subdivision, (b) DLA, (c) Voronoi, (d) L-system

TABLE III. CRACK TYPES, ALGORITHMS, AND PROBABILITIES

Crack Form     | Algorithm                  | Probability
Clusters Crack | DLA                        | P1
Growth Crack   | L-System                   | P2
Fission Crack  | Interpolation Subdivisions | P3
One-Way Crack  | Linear Interpolation       | P4
Mesh Crack     | Voronoi                    | P5
…              | …                          | …

Figure 15. A batik which has blended five types of ice crack

Usually, one Batik contains various types of ice cracks, and different types of ice cracks have different probabilities. In Table III, we give the corresponding algorithm and probability for each crack type. Figure 15 shows a batik which has blended five types of ice crack.

V. CONCLUSIONS AND FUTURE WORK

The significance of researching the simulation of ice cracks is that we can reproduce the aesthetic characteristics of Batik, apply the method in the printing and dyeing industry, and realize both mass production and personalized production of batik. From the analysis of the traditional batik craft and the visual characteristics of batik, Batik simulation mainly concentrates on graphics vector quantization and ice crack generation. We research the growth mechanism and visual features of ice cracks and present the Interpolation Subdivision algorithm. The method realizes the visual features of ice cracks such as abstraction, contingency, and uniqueness.
In the printing and dyeing industry, this method has an obvious advantage in discreteness and growth efficiency, and it accords with the features of fission cracks. In that industry it is usually required to create large-scale ice cracks over a batik work with a huge area, and in the simulation the types of ice crack are usually multiple, so it is difficult to complete the effect with only one method. Our next research objective is therefore the large-scale growth efficiency of ice cracks and the blending of multiple algorithms. We will develop a plug-in for Adobe Illustrator, so that it will be easy to design and produce Batik in the printing and dyeing industry.

ACKNOWLEDGMENT

This work was supported by the National Science & Technology Pillar Program of China (2012BAH62F01, 2012BAH62F03); the Science and Technology Foundation of Guizhou Province of China (No. [2013]2108); the Scientific Research Program for Introduced Talents of Guizhou University of China (No. [2012]009); and the Development and Reform Commission Program of Guizhou Province of China (No. [2012]2747).

REFERENCES

[1] Wyvill B, van Overveld K, Carpendale S. Rendering cracks in Batik. Proceedings of the 3rd International Symposium on Non-photorealistic Animation and Rendering, 2004: 61-149.
[2] Ricardo Fabbri, Luciano Da F. Costa, Julio C. Torelli, Odemir M. Bruno. 2D Euclidean distance transform algorithms: a comparative survey. ACM Computing Surveys (CSUR), v. 40, n. 1, p. 1-44, February 2008.
[3] R. A. Lotufo, A. X. Falcão, F. A. Zampirolli. Fast Euclidean distance transform using a graph-search algorithm. Proc. XIII Brazilian Symp. Computer Graphics and Image Processing, pp. 269-275, 2000.
[4] Tang Ying, Fang Kuanjun, Shen Lei, Fu Shaohai, Zhang Lianbing. Rendering cracks in wax printing designs using Voronoi diagram. Journal of Textile Research, 2012, 33(2): 125-130.
[5] Franz Aurenhammer. Voronoi diagrams: a survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR), v. 23, n. 3, p. 345-405, Sept. 1991.
[6] Hirota K, Tanoue Y, Kaneko T. Generation of crack patterns with a physical model. The Visual Computer, 1998, 14(3): 126-137.
[7] Iben H N, O'Brien J F. Generating surface crack patterns. Graphical Models, 2009, 12(1): 1-33.
[8] James F. O'Brien, Adam W. Bargteil, Jessica K. Hodgins. Graphical modeling and animation of ductile fracture. Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, July 23-26, 2002.
[9] Gary D. Yngve, James F. O'Brien, Jessica K. Hodgins. Animating explosions. Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, p. 29-36, July 2000.
[10] Alan Norton, Greg Turk, Bob Bacon, John Gerth, Paula Sweeney. Animation of fracture by physical modeling. The Visual Computer: International Journal of Computer Graphics, v. 7, n. 4, p. 210-219, July 1991.
[11] A.-L. Barabási, H. E. Stanley. Fractal Concepts in Surface Growth. Cambridge University Press, 1995.
[12] Witten T A, Sander L M. Diffusion-limited aggregation, a kinetic critical phenomenon. Phys. Rev. Lett., 1981, 47: 1400-1403.
[13] Argoul F. Self-similarity of diffusion-limited aggregates and electrodeposition clusters. Phys. Rev. Lett., 1988, 61: 2558.
[14] G. Rozenberg, A. Salomaa. Visual models of plant development. Handbook of Formal Languages, Springer-Verlag, 1996.
[15] Przemyslaw Prusinkiewicz. Modeling of spatial structure and development of plants: a review. Scientia Horticulturae, 1998, 74: 113-149.
[16] Kou Yong, Liu Zhi-fang. Method of lightning simulation based on multiple subdivisions. Computer Engineering and Design, 2011, (10): 3522-3525, 3569.

Lv Jian was born in Hebei Province, China, on November 28, 1983. He received his Ph.D. in automation and machinery manufacturing from Guizhou University, majoring in advanced manufacturing modes and manufacturing information systems.
He works at Guizhou University of China, in the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, and has held the position of Director Assistant since 2010. He has attended IEEE conferences such as the 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer and the 2013 IEEE International Conference on Big Data.

Weijie Pan, from Henan Province, China. Guizhou University, automation and machinery manufacturing. Associate professor, Dr., majoring in advanced manufacturing modes and manufacturing information systems.

Zhenghong Liu, from Hunan Province, China. Guizhou University, automation and machinery manufacturing, majoring in advanced manufacturing modes and manufacturing information systems.

Research on Saliency Prior Based Image Processing Algorithm

Yin Zhouping and Zhang Hongmei
Anqing Normal University, Anqing 246011, Anhui, China
doi:10.4304/jmm.9.2.294-301

Abstract: With the rapid development of digital technologies, image processing is increasingly important in various fields, such as robot navigation and image classification. Current image processing models still need a large amount of training data to tune the processing model and cannot process large images effectively, and the recognition success rate is still not satisfactory. Therefore, this paper studies a saliency prior based image processing model, presents the Gaussian mixture process, designs a feature point based classifier, and then evaluates the model by a supervised learning process. Finally, a set of experiments is designed to demonstrate the effectiveness of the proposed saliency prior based image processing model. The results show that the model works well, with better performance in classification accuracy and lower time consumption.

Index Terms: Image Processing; Saliency Prior; Gaussian Mixture

I. INTRODUCTION

Recognition based on computer vision uses the theory of learning and discrimination to classify and judge images and video captured by cameras. Classic computer vision is divided into three levels: low-level vision, middle-level vision, and high-level vision. So-called low-level vision refers to research on local metric information of the input image (edges, surfaces), such as SIFT descriptors and Sobel contour detection; middle-level vision includes object segmentation, target tracking, and so on; high-level vision tends to mean rebuilding the low-level and middle-level visual information step by step and integrating it into decision-making processes of ever-increasing complexity. With the improvement of computers' ability to process large-scale data, the related technologies at all levels of computer vision have been more widely applied in areas such as industrial production and security monitoring. Among these, high-level vision is closest to the requirements of intelligence and has the most promising practical and theoretical significance.

Visual perception mainly lies in the identification of objects and scenes. A scene here refers to a real-world environment composed of a variety of objects and their background in a meaningful way. Scene recognition studies the expression of the scene. The definition of a scene corresponds to objects and textures: when the distance from the observer to the target is about 1 to 2 meters, the image content is an "object", and when there is a larger space between the observer and the fixation point (usually larger than 5 m), we begin to speak of scenes rather than a field of view. That is to say, most objects are at hand distance, while a scene usually means a space in which we can move. Research on scene recognition is similar to object classification and recognition research.
The scene is unlike the object: an object is often compact and we act on it, while a scene extends in space and we act within it. However, visual recognition faces enormous challenges in its range of applications and processing efficiency; illumination, occlusion, scale, intra-class variation, and other problems all hinder the wide application of visual perception technology in practical settings.

Throughout the theory and practice of object and scene recognition over the past decade, the basic framework revolves around two core topics: image expression and classifier design. Since an image is essentially a two-dimensional matrix or a high-dimensional vector, the number of original pixels is so huge that handling the data encounters enormous difficulties even as the capability of current computers grows; meanwhile, the original pixels contain a large amount of invalid information, so the purpose of image expression is to obtain a low-dimensional image vector representation with strong discriminative power. The most classic image representation model is the Bag-of-Features model, also named the codebook model: it encodes local descriptors of the image, quantifies the training samples, and obtains projections. Its principle is simple and easy to implement, and it has achieved good results in scene and object recognition in recent years. Commonly used classifiers comprise generative models and discriminative models. Li Fei-Fei of Stanford University brought the LDA technique, which first appeared in text semantics, into the visual field, achieving self-taught classification of objects; it has been one of the most famous applications of unsupervised learning in computer vision. Since a discriminative model uses the tag information in the training process, it usually obtains better classification results than a generative model. In this section, we introduce the most important background knowledge and theory in the object and scene recognition field.
As mentioned above, low-level visual information such as edges, surfaces, and details plays an important role in identification. Description based on local structure is the low-level visual content, and it is the most commonly used image description method in high-level vision. In this section, we introduce the classic descriptor SIFT and give a short brief of SURF, DAISY, and others. Before describing SIFT, we first introduce the Gaussian distribution and the directed acyclic graph. Figure 1 shows the diagram of the Gaussian function used in image processing.

Figure 1. Key deduction function in image processing of the Gaussian model (a Gaussian pyramid per octave, with the Difference of Gaussians computed between adjacent scales)

A long time ago, people made use of the directed acyclic graph (DAG) to represent causal relationships between events. The geneticist Sewall Wright proposed a graphical approach indicating causal paths, called path analysis; later it became a fixed causal model representation in economics, sociology, and psychology. Good once used a directed acyclic graph to represent a causal relationship composed of distributed binary cause variables. Influence diagrams represent another application of directed acyclic graphs, in decision analysis; they include event nodes and decision nodes. In these applications, the main role of the directed acyclic graph is to provide an efficient description of a probability function: once the network configuration is complete, all subsequent calculations are completed by operating on probabilistic expression symbols.

Pearl began to notice that the DAG structure can be used as a structure for calculation methods and as a cognitive behavioral model. He updated the concept of distributed programs with a tree network, the purpose being to model distributed processing for reading comprehension, combining top-down and bottom-up reasoning to form a consistent explanation. This dual reasoning model is the core of the updated Bayesian network model and also the central idea of Bayesian networks.

SIFT, the Scale Invariant Feature Transform, was proposed by David Lowe in 1999 and further improved in 2004, for detecting and describing local features. The SIFT descriptor looks for local extrema among adjacent scales using the DoG (Difference-of-Gaussian) in scale space to determine the positions and scales of salient points in the image, then extracts a regional gradient histogram around each salient point to obtain the final SIFT local descriptor. This method has the following features:

1) Strong invariance to rotation, scale, brightness, etc., and good robustness to viewpoint changes, affine transformation, and noise;
2) Good discrimination: local areas with big differences are well separated after quantification and can be matched quickly and accurately;
3) Abundance: SIFT can produce sufficient feature vectors by adjusting the parameters;
4) Speed: an optimized SIFT matching algorithm can even achieve real-time requirements;
5) Scalability: it can be conveniently joined with other forms of feature vectors.

Figure 2. The calculation of the SIFT descriptor

Because of the characteristics of SIFT local features and descriptors, SIFT has become a typical descriptor of standard meaning in computer vision, as shown in Figure 2. It has been widely applied in object recognition, robot path planning and navigation, behavior recognition, video target tracking, and so on. In this study, we take the SIFT descriptor as the basic contrast characterization to make the test results more descriptive and comparable. Li Fei-Fei proposed a generative model based on LDA and applied it to scene classification tasks.
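The DoG operator SIFT builds on can be sketched in one dimension (an image applies the same idea at each scale level); the kernel radius of about 3σ and the scale ratio k = 1.6 follow common practice and are assumptions here:

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ~3*sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    """Convolve with the Gaussian kernel, clamping indices at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma, k=1.6):
    """DoG response: blur at two nearby scales and subtract; SIFT locates
    its salient points as extrema of this response across adjacent scales."""
    fine, coarse = smooth(signal, sigma), smooth(signal, k * sigma)
    return [a - b for a, b in zip(fine, coarse)]

impulse = [0.0] * 10 + [1.0] + [0.0] * 10
response = difference_of_gaussians(impulse, sigma=1.0)
```

An isolated bright point yields a strong positive DoG peak at its location, which is exactly the band-pass behavior the scale-space search exploits.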
This model does not require labeling of images, which can greatly improve classification efficiency. The framework is based on the codebook model, obtaining the distribution of code words and scene themes through unsupervised training. The method derives from an improvement of the LDA model proposed by Blei. The probability distributions of local areas and intermediate topics are obtained through an automatic learning approach; the training set needs no labels other than the category labels.

Figure 3. Algorithm working procedure (images are input, features are extracted and detected, a code is generated to express the images, and pattern detection with model learning selects the best model for the final decision)

The literature mainly introduces the basic theory of dynamic Bayesian network classifiers: it discusses active Bayesian network classifiers based on genetic algorithms, applies dynamic Bayesian network classifiers to speech recognition and speaker recognition, and researches dynamic Bayesian network methods for time-sequence recognition, graphic tracing, and macroeconomic modeling. The algorithm working procedure is shown in Figure 3.

To explain the model structure in plain language: after the class of the image is selected, if the category is known to be "mountains", we can obtain a probability vector; this vector indicates which intermediate topics each image block may have. In order to generate an image sub-block, first a particular topic is drawn from the topic mixture. For example, if "rock" is chosen as the topic, the code words associated with rock will appear with higher frequency (slanted lines). So, after selecting a topic more inclined to horizontal edges, a code word for a possible horizontal partition is selected. Repeating this process of selecting a theme and code words ultimately generates an image patch of the scene, creating a complete mountain range. The chart is a graphical illustration of this generative model.
This model is called the topic model, as shown in Figure 4.

Figure 4. Topic model diagram (plate notation over class C, topic Z, parameters θ, τ, β, and observation X, repeated over n patches)

As previously mentioned, image recognition based on classic machine learning theory is divided into two parts: feature description and feature judgment. This framework also applies to self-organized video object recognition. However, video objects bring their own unique challenges. A target's characteristics in video often go through a long-term, gradual process, so the features inevitably change along the way; the analysis of feature effectiveness must therefore itself be a progressive process. Moreover, the target in video often appears together with a scene; that is, the target and the background have a strong correlation. How to take advantage of this correlation, for instance by mixing the global characterization of the scene with the local features of the target, to improve recognition performance is one of the challenges.

As Figure 5 shows, the conventional saliency processing model mainly consists of two steps, and it also cannot handle the current challenges: first take the log of the amplitude spectrum, then smooth the log spectrum and take the spectral residual; finally, a saliency image is produced, as the last two images in the figure show.

Current research indicates there is no evidence that human pattern recognition is algorithmically superior to standard machine learning algorithms, and humans do not depend too heavily on the amount of training data; the key to human cognitive accuracy may therefore lie in the choice of characteristics. In fact, relative to discriminative learning methods, feature description plays the more important role in object recognition performance. For this reason, our research focuses on how to effectively describe target features in video.
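The two-step saliency pipeline in Figure 5 (log amplitude spectrum, smoothing, residual, inverse transform) can be sketched in one dimension with a naive DFT. This follows the spectral-residual idea generally; the window size and the O(N²) transform are chosen for clarity, not performance:

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def spectral_residual_saliency(signal, win=3):
    """Log the amplitude spectrum, subtract its local average (the smoothed
    log spectrum), and back-transform the residual with the original phase;
    the squared magnitude of the result is the saliency map."""
    X = dft(signal)
    log_amp = [math.log(abs(v) + 1e-9) for v in X]
    phase = [cmath.phase(v) for v in X]
    r = win // 2
    smoothed = []
    for i in range(len(log_amp)):
        window = log_amp[max(0, i - r):i + r + 1]
        smoothed.append(sum(window) / len(window))
    residual = [la - s for la, s in zip(log_amp, smoothed)]
    back = idft([cmath.exp(res + 1j * p) for res, p in zip(residual, phase)])
    return [abs(v) ** 2 for v in back]

saliency = spectral_residual_saliency([0.0] * 8 + [5.0] + [0.0] * 7)
```

For images, the same steps are applied to the 2-D FFT of a downscaled frame, followed by Gaussian smoothing of the saliency map.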
On the one hand, the gradual change of target characteristics requires establishing an online evaluation mechanism for target features, since specific features may be valid only in a specific period of time; on the other hand, the relevance of the target and the scene must be exploited.

Figure 5. Saliency image processing model

II. SALIENCY PRIOR BASED IMAGE PROCESSING MODEL

A. Feature Point Selection

One aim of this study is to analyze the effectiveness of different object characteristics in the course of recursive cognition. This study analyzes the soundness and the change in effectiveness of target object characteristics across spatial scale and practice in people's perception of objects, simulates how intensity changes of the clustering characteristics in local descriptors track the characteristic changes of the target object perceived by the human eye, and continues to screen the target object features, obtaining robustness through dimension reduction and feature augmentation. In computer vision, the statistical characteristics of an object can be approximated through a number of local descriptions. Compared with an overall description of the image, this method has better robustness and adaptability; however, a single local descriptor is only a collection of characteristics in a small area around the point of interest, and the local structure cannot express the general characteristics of the target object.

First, we extract descriptors from the samples of the clustering library; second, we cluster them to generate several code words; then we extract feature descriptions of the test samples and training samples in the same way and project them onto the code words. Treating each code word as a channel, we obtain the changes of each channel from the characteristics projected along the timeline, giving the distribution curve of the feature projection on each channel.
Experiments then reduce the dimension of the feature channels under the framework of two different but related criteria, information entropy and mutual information. Finally, the effect of the dimensionality reduction is analyzed on an identification system based on the codebook and a support vector machine, in order to verify the robustness and effectiveness of the retained channels.

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

In this report, the bag-of-words expression of the image is established first; the analysis of the recursive cognitive process of the features is then divided, from the perspectives of information entropy and mutual information, into robustness analysis and decision-impact analysis; and the dimension reduction of the feature channels is illustrated through the following two-step implementation. At the pre-processing stage of the target image, this study uses a codebook to express the image (Bag-of-Features). Features are extracted from the test library, the training library and the clustering library; clustering generates M code words, and the projections of the test library images and the clustering library samples onto the code words are computed. Bag-of-Features object recognition and classification is divided into the following steps:

1. Extract local features of the image. Common local features include SIFT, SURF, DAISY, Opponent-SIFT, texture features and so on.
2. Learn the visual vocabulary. Learning is mainly achieved through a clustering algorithm: the cluster centers obtained from classical K-means or the improved K-means++ are the code words, and the collection of code words is called the codebook.
3. Quantize. Project the local features of the training samples onto the code words to obtain code word frequencies; each image can then be expressed by a histogram composed of the code words.
4. Classify and identify the image.
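Steps 1-3 above can be sketched as follows. This is a toy numpy sketch using plain K-means on synthetic 2-D "descriptors"; a real system would use K-means++ initialization and SIFT-like descriptors, and the function names are illustrative, not the paper's implementation.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Toy K-means codebook learning: the cluster centers are the code
    words (sketch only; K-means++ initialization is not implemented)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest code word
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bof_histogram(descriptors, codebook):
    """Project local descriptors onto the codebook and return the
    normalized code-word frequency histogram (the quantization step)."""
    d = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    labels = d.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Each image is then represented by such a histogram, one bin (channel) per code word, ready for the classification step.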
After the Bag-of-Features expression is obtained, a discriminant is learned to distinguish different types of targets. Commonly used classifiers include the nearest neighbor classifier, the K-nearest neighbor classifier, the linear classifier, the linear SVM and the nonlinear SVM.

B. Gaussian Mixture Model

A Gaussian mixture model (GMM) is a linear combination of several single Gaussian distributions and is a generalization of the single Gaussian probability density function. A GMM can smoothly approximate a probability density distribution of almost any shape; it has therefore often been used for background modeling and speech recognition in recent years, with good results. In this research, a Gaussian mixture model is fitted to the distribution of the projection vectors of a series of video frames in each dimension. The resulting model describes the distribution of the projection vectors along that dimension more precisely. Furthermore, by fitting the projection vector sequences from training and from testing and computing the symmetric KL divergence between the two fitted distributions in the same dimension, we can determine how validly that dimension expresses the target characteristics and exclude dimensions of low validity, which reduces the amount of data to process without damaging, and possibly even improving, the system's ability to correctly identify the target.

For a single sample $x_i$ in the observed data set $X = \{x_1, x_2, \dots, x_N\}$, the density function of the Gaussian mixture distribution is:

$$P(x_i \mid \Theta) = \sum_{k=1}^{K} w_k\, p_k(x_i \mid \theta_k)$$

In this formula, $w_k$ is the mixing coefficient, regarded as the weight of each Gaussian component, with $\sum_{k=1}^{K} w_k = 1$; $\Theta = \{\theta_1, \theta_2, \dots, \theta_K\}$ is the parameter space of the Gaussian components, and $\theta_k = (\mu_k, \sigma_k^2)$ represents the mean and variance of the $k$-th Gaussian component.
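The mixture density above, and the symmetric KL distance used to compare two fitted mixtures, can be sketched as follows for the 1-D case. The integration grid and the function names are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """P(x | Theta) = sum_k w_k N(x; mu_k, sigma_k^2) for a 1-D Gaussian
    mixture, evaluated at every point of the array x."""
    x = np.asarray(x, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    comp = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return (comp * w).sum(axis=1)

def symmetric_kl(wp, mup, varp, wq, muq, varq, lo=-20.0, hi=20.0, n=20001):
    """Symmetric KL distance D(p||q) + D(q||p) between two 1-D Gaussian
    mixtures, approximated by numerical integration on a grid."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    p = gmm_density(x, wp, mup, varp) + 1e-12   # floor avoids log(0)
    q = gmm_density(x, wq, muq, varq) + 1e-12
    return float(np.sum(p * np.log(p / q)) * dx +
                 np.sum(q * np.log(q / p)) * dx)
```

For two unit-variance Gaussians whose means differ by 2, the symmetric KL distance is 4 analytically, which the numerical sketch reproduces closely.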
The maximum likelihood estimation method is generally used to determine the parameters of the model:

$$L(\Theta \mid X) = \prod_{i=1}^{N} P(x_i \mid \Theta), \qquad \hat{\Theta} = \arg\max_{\Theta} L(\Theta \mid X)$$

For a Gaussian mixture model it is not feasible to seek the maximum by directly taking partial derivatives, so an online EM algorithm is consulted for estimating the Gaussian mixture model parameters. The related formulas of the algorithm are as follows (E-step responsibility and M-step weight update):

$$p(k \mid x_i, \Theta') = \frac{w_k'\, p_k(x_i \mid \theta_k')}{\sum_{j=1}^{K} w_j'\, p_j(x_i \mid \theta_j')}, \qquad w_k = \frac{1}{N}\sum_{i=1}^{N} p(k \mid x_i, \Theta')$$

After clustering and characterizing the targets, a feature vector of rather high dimension is obtained. To ensure the correctness of the matching results, we should retain the dimensions that represent the target information stably and remove the unstable ones. Analyzing along the timeline, the probability density function of the distribution in each dimension is estimated, and the dimensions whose distributions are more stable are chosen. The K-L distance is a statistical measure of the similarity between a distribution $p = \{p_i\}$ and a known distribution $q = \{q_i\}$, defined as

$$D_{KL}(p \,\|\, q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$$

Here $D_{KL} \ge 0$, and the K-L distance equals 0 only when the two distributions are identical. The K-L distance is not symmetric in $p$ and $q$: in general the distance from $p$ to $q$ is not equal to the distance from $q$ to $p$, and the value of the K-L distance grows larger as the two distributions differ more. The details are shown in Table I. We use the K-L distance to calculate the similarity of two mixed Gaussian distributions. After describing the features of the measured image, the data of each dimension of each feature are fitted to obtain a Gaussian mixture model per dimension, and the K-L distance to the corresponding dimension of the library is calculated. Selecting a certain number of dimensions whose K-L distance is small achieves the purpose of reducing the feature dimension, improving the stability of the characterization and improving the matching accuracy.

TABLE I.
GAUSSIAN MIXTURE MODEL FITTING RESULTS AT A CERTAIN DIMENSION

Dimension   Mean       Variance    Weight
1           0.007923   0.045144    1.000000
2           0.000000   12.500000   0.000000
3           0.000000   12.531255   0.000000
4           0.053632   0.2342323   1.000000
5           0.087521   0.3431683   1.000000
6           0.142327   0.5217323   1.000000

In the course of this research, we examined the characteristics of the same channel; the table lists the fitted values of three-component Gaussian mixture models in a certain dimension, in which rows 2 and 3 retain their initial values but end with a weight of 0. By comparing the characteristics of all dimensions, we conclude that the fit follows a single Gaussian distribution, whether for the test sample statistics or for the training sample statistics. The projection values, after normalization within the same dimension, also follow a single Gaussian distribution, which corroborates the conclusion from another angle.

C. Evaluation Model Design

The effectiveness of a characteristic in the cognitive process, i.e. its validity for the object class, is described as follows. As shown by the previous evidence, the effectiveness of a feature for the categories is measured by the size of its mutual information with the category labels. Because of the complexity of the conditional probability, the following experiment is designed as a simulation:

1. Estimate the distribution of the training samples on each channel with Gaussian mixture models.
2. Estimate the distribution of the test sample frames on each channel with Gaussian mixture models.
3. Compute the KL divergence between the training and testing distributions and compare it with a predetermined threshold T; channels whose divergence is greater than the threshold are defined as channels with no, or only a small, effect on describing the target.

This process can be expressed by the following flowchart.
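Consistent with the single-Gaussian observation above, the three simulation steps can be sketched with the closed-form KL divergence between two Gaussians. The helper names, the column-per-channel data layout, and the threshold value are assumptions of this sketch.

```python
import numpy as np

def gaussian_sym_kl(a, b):
    """Symmetric KL divergence between two 1-D Gaussians fitted (by mean
    and variance) to the sample arrays a (training) and b (testing)."""
    m0, v0 = a.mean(), a.var() + 1e-12
    m1, v1 = b.mean(), b.var() + 1e-12
    kl01 = 0.5 * (np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
    kl10 = 0.5 * (np.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)
    return kl01 + kl10

def effective_channels(train, test, threshold):
    """Steps 1-3 above: fit each channel (column) on training and test
    data, then keep only channels whose divergence is below the
    threshold T; channels at or above T are judged ineffective."""
    return [j for j in range(train.shape[1])
            if gaussian_sym_kl(train[:, j], test[:, j]) < threshold]
```

A channel whose test distribution is shifted far from its training distribution is dropped, while a matching channel survives the screening.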
As shown in the flowchart, in the second dimension reduction step the training and test sets are fitted with Gaussian mixture models respectively, and the KL distances of all dimensions are gathered into a scatter plot. In the figure, for the first one hundred dimensions (the SIFT feature) the distance between the two distributions is small, while a large part of the distributions from the texture and color histogram features lie far away as outliers. This is mainly due to the relatively large difference in image scale between the training and testing databases, combined with the color histogram's lack of scale invariance. Likewise, as described earlier, the channels and features are mutually independent, so the KL distances between the distributions can be compared against a threshold T2. Channels whose distance is greater than the threshold T2 are considered to characterize category information badly, and channels whose distance is smaller than T2 are considered to characterize category information well and to contribute to separating different types of objects in feature space. Through this feature screening we obtain the features most effective for category decision, achieve an effective dimensionality reduction, and reduce the complexity of the subsequent model parameter estimation.

Figure 6. Schematic considering the difference between test sample and training sample (flow: extract features of the target and training video frames and map them to an N-dimensional space; compute 2N Gaussian mixture models; compute N KL divergences between the distributions; filter the dimensional data according to the size of the divergence; adjust the feature vector)

Classification decision refers to classifying the identified objects into categories using statistical methods.
The basic approach is to set decision rules based on the training samples so as to lower the error recognition rate of the identified objects and the loss caused by the rules. The decision rule of the Bayesian-network pattern recognition model is the Bayesian network classifier, obtained by learning the structure and parameters of a Bayesian network. Since the parameters are usually determined by the structure and the data sets, structure learning is the core of Bayesian network learning.

In this research, we simulate the recursive process by which the human eye perceives objects through targeted movement in scale space, and analyze the robustness of the characterization and the effectiveness of object cognition from the angles of minimum information entropy and maximum mutual information. An experiment is designed that models the samples with Gaussian mixture models and cross-entropy, and feature evaluation criteria are presented for both cases. Through online self-organizing recognition experiments, we verify that the dimensionality reduction method is effective.

D. Supervised Learning Process

In order to generate high-quality code words, the clustering process uses a K-means method based on the histogram intersection kernel (HIK). This is mainly because HIK can effectively compute the number of points falling into the same bin at a given level; in visual perception, local descriptors such as SIFT and DAISY are themselves histogram-based descriptions, and when comparing the similarity of two local descriptors the histogram intersection is more appropriate than the classical Euclidean distance.

Let $h = (h_1, h_2, \dots, h_d) \in \mathbb{R}^d$ be a histogram, where $h_i$ is the frequency of the $i$-th code word in the codebook model.
The histogram intersection kernel $K_{HI}$ is defined as:

$$K_{HI}(h^1, h^2) = \sum_{i=1}^{d} \min(h^1_i, h^2_i)$$

The initial centers can be obtained by the K-means++ method, and each local feature is assigned to the corresponding center according to the following equation (the kernel-space distance to a center, with $\pi_i$ the set of histograms currently assigned to center $m_i$):

$$\|\phi(h_x) - m_i\|^2 = K_{HI}(h_x, h_x) - \frac{2}{|\pi_i|}\sum_{h_j \in \pi_i} K_{HI}(h_x, h_j) + \frac{1}{|\pi_i|^2}\sum_{h_j, h_k \in \pi_i} K_{HI}(h_j, h_k)$$

If $h_x$ is an arbitrary local descriptor and $m_i$ is the current cluster center, then when calculating the similarity between the local feature and the current cluster center with the histogram intersection kernel, the first term does not affect the result, the second term must be computed for each feature, and each time a new element is added the bulk of the computation is spent on the last term.

Figure 7. Supervised learning express code image

It is critical to select the appropriate classifier for a specific problem. The linear support vector machine achieves good results in the visual field thanks to its efficiency and high accuracy. In fact, the pyramid GIST description can be seen as a multi-scale, multi-orientation histogram description of the image, and the codebook model can be viewed as a frequency histogram of locally significant structures; from this point of view, the Euclidean distance is not the best metric for describing the similarity of two such descriptors. As previously mentioned, the histogram intersection kernel is better than the Euclidean distance as a histogram metric, so it can be predicted that in visual cognition a histogram intersection kernel support vector machine can achieve better results under this framework.

Given a labeled training set $D = \{(y_i, x_i)\}_{i=1}^{N}$, $x_i$ is the training data and $y_i \in \{1, 2, \dots, n\}$ is the category corresponding to $x_i$.
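The kernel $K_{HI}$ above, and the Gram matrix that a precomputed-kernel SVM would consume, can be sketched directly in numpy; `hik_matrix` is an illustrative helper name, not part of any library API.

```python
import numpy as np

def hik(h1, h2):
    """Histogram intersection kernel K_HI(h1, h2) = sum_i min(h1_i, h2_i)."""
    return float(np.minimum(np.asarray(h1, float), np.asarray(h2, float)).sum())

def hik_matrix(H):
    """Symmetric Gram matrix of the histogram intersection kernel for the
    rows of H (the form a precomputed-kernel SVM solver would take)."""
    H = np.asarray(H, dtype=float)
    n = len(H)
    K = np.empty((n, n))
    for a in range(n):
        for b in range(a, n):
            K[a, b] = K[b, a] = np.minimum(H[a], H[b]).sum()
    return K
```

For normalized histograms the diagonal entries equal 1 and off-diagonal entries measure the overlap of the two histograms.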
The dual form of the SVM reduces to the optimization problem of maximizing

$$W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$$

The commonly used linear kernel is defined as $k(x_i, x_j) = x_i \cdot x_j$.

In the framework of this topic, as mentioned previously, the codebook model and the pyramid global description are both essentially histogram expressions; in this case the histogram intersection kernel may therefore express the similarity between the two characterizations better. However, the histogram intersection kernel is a non-linear kernel, requiring more memory and computing time than the linear kernel. Maji and others have studied this problem, decomposing the formula in a similar manner to accelerate the calculation, ultimately requiring O(n) time complexity and memory. In the experiment, a modified LIBSVM is used to achieve multi-class discrimination.

III. EXPERIMENT AND VALIDATION

A. Experiment Environment

In the test, following the experimental framework set by Quattoni, we test on a standard outdoor image database. For each category, eighty images are selected for training and twenty for testing; for convenience and standardization, we directly use the file names they provide, so this experiment uses the same experimental data as Quattoni. In order to train one-versus-many classifiers, N positive samples and 3N negative samples are drawn in the same way. A three-tier pyramid expression of the image is created as previously described, each layer processed by Gabor filters at three scales and eight orientations; all the resulting images are concatenated to obtain the final expression vector, 24192-dimensional in total. In order to obtain descriptions of the locally salient regions, dense SIFT descriptors are extracted at the three pyramid levels and projected onto 500 cluster centers; the frequencies of all block descriptors are concatenated similarly, so in the codebook model each image is marked by a 10500-dimensional vector.
Finally, as Table II shows, the two descriptions are concatenated to obtain the final composite image expression. In the decision phase, a histogram intersection kernel support vector machine is trained.

TABLE II. COMPARISON VECTORS OF DIFFERENT METHODS

Method           After dimensionality reduction   Normal dimensionality
HIK SVM          100%                             100%
Linear SVM       25.173%                          100%
Polynomial SVM   92.053%                          94.325%
RBF SVM          73.249%                          92.971%

In this model, the image is modeled as a collection of local blocks; these regions are parts of "topics", each block is expressed by the codebook, and through training the topic and code word distribution of each scene class can be obtained. For a test sample, the code words are identified first, and then the class model whose code word distribution best matches is found.

B. Test Results

The chart compares the present method with the other references. We repeated the experimental results of Quattoni, including the RBF-kernel GIST, Quattoni's prototype representation, and the spatial pyramid matching of Lazebnik. In that experiment, two pyramid layers are used, the dictionary size is 400, the number of prototype images is 50, and the two sides are matched by the histogram intersection kernel; for the support vector machine approach of this paper, based on the hybrid expression and the histogram intersection kernel, the numbers of code words selected are 200 and 500. It is obvious that the proposed method achieves the best results, reaching an accuracy rate of 40%.

IV. CONCLUSION

This part proposed a hybrid expression of images; to test the effectiveness of the hybrid expression, the goal is extended to indoor scenes. In fact, general object recognition and indoor scene recognition are essentially the same problem, but indoor scenes have greater within-class variance and greater between-class similarity, so many classical object identification and scene understanding methods show their weaknesses when processing indoor scenes.
Inspired by the study of Devi Parikh, the first step of the design focuses on the discriminative significance of the image expression, considering both the overall expression of the image and the locally significant structures; by further mining the relationship between the overall expression and the regions, and adopting the more suitable histogram intersection distance in the classical codebook model, the mixed image expression is finally obtained after concatenation. After acquiring the image expression, a hyperplane distinguishing the different classes of points in the high-dimensional space is obtained through training. At this point, comparing the similarity between image expressions becomes one of the key issues; using the histogram intersection kernel support vector machine, experimental comparison shows that the recognition rate under this framework improves the accuracy to a large extent.

Figure 8. Comparison of different methods

Even the pure pyramid-GIST histogram intersection SVM reaches 30%, which already exceeds Quattoni's highest accuracy rate by 4%. Finally, this method surpasses the spatial pyramid matching of Lazebnik by about 4 percentage points. The figure also shows that using more code words can significantly improve the accuracy.

Figure 9. Recognition result of the same hybrid expression with different SVM kernels

Figure 9 compares and verifies support vector machines with different kernels using the pyramid GIST expression: 50 pictures per category are selected randomly for training and another 20 images for testing, with each image expressed by the GIST pyramid. In this simplified framework, the results of the histogram intersection kernel easily surpass the others'. It is noteworthy that the RBF kernel, which usually performs well in support vector machines, was particularly bad here; this may be because that metric is not suited to the histogram-based GIST characterization.
ACKNOWLEDGEMENT

This research is funded by the Youth Research Fund of Anqing Normal University, 2011 project "Domain Decomposition Algorithm for Compact Difference Scheme of the Heat Equation" (Grant No. KJ201108).

REFERENCES

[1] Chikkerur, S., T. Serre, and T. Poggio, Attentive processing improves object recognition, Journal of Neuroscience, vol. 20, no. 4, 2000.
[2] Fergus, R., P. Perona, and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 264-267, 2003.
[3] Jiang, Y., C. Ngo, and J. Yang, Towards optimal bag-of-features for object categorization and semantic video retrieval, Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pp. 494-501, 2007.
[4] Niebles, J., H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, International Journal of Computer Vision, vol. 79, no. 3, pp. 299-316, 2008.
[5] Grauman, K. and T. Darrell, The pyramid match kernel: Efficient learning with sets of features, Journal of Machine Learning Research, pp. 725-760, 2007.
[6] Cristianini, N. and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[7] Gambetta, D., Can we trust trust? In: Gambetta, D., ed., Trust: Making and Breaking Cooperative Relations, Basil Blackwell: Oxford Press, pp. 213-237, 1990.
[8] Bouhafs, F., M. Merabti, and H. Mokhtar, A semantic clustering routing protocol for wireless sensor networks, IEEE Consumer Communications and Networking Conference, pp. 351-355, 2006.
[9] Avcibas, I., et al., Image steganalysis with binary similarity measures, EURASIP Journal on Applied Signal Processing, pp. 2749-2757, 2005.
[10] Maji, S., A. Berg, and J. Malik.
Classification using intersection kernel support vector machines is efficient, IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[11] Lazebnik, S., C. Schmid, and J. Ponce, A sparse texture representation using local affine regions, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1264-1276, 2005.
[12] Manjunath, B. and W. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 836-841, 2002.
[13] Bay, H., T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, Computer Vision - ECCV 2006, pp. 402-415, 2006.
[14] Nowak, E., F. Jurie, and B. Triggs, Sampling strategies for bag-of-features image classification, Computer Vision - ECCV 2006, pp. 491-502, 2006.
[15] Fischler, M. and R. Elschlager, The representation and matching of pictorial structures, IEEE Transactions on Computers, vol. 100, no. 1, pp. 68-93, 2006.
[16] Joubert, O., Processing scene context: Fast categorization and object interference, Vision Research, vol. 47, no. 26, pp. 3285-3295, 2007.
[17] Biederman, J., J. Newcorn, and S. Sprich, Comorbidity of attention deficit hyperactivity disorder with conduct, depressive, anxiety, and other disorders, American Journal of Psychiatry, vol. 145, no. 5, pp. 563-577, 1991.
[18] Oliva, A. and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol. 42, no. 3, pp. 144-174, 2001.
[19] Odone, F., A. Barla, and A. Verri, Building kernels from binary strings for image matching, IEEE Transactions on Image Processing, vol. 14, no. 2, pp. 168-180, 2005.
[20] Maji, S., A. Berg, and J. Malik, Classification using intersection kernel support vector machines is efficient, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1-8, 2008.
[21] Wu, J. and J. Rehg.
Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel, 2009 IEEE 12th International Conference on Computer Vision, pp. 630-637, 2009.

A Novel Target-Objected Visual Saliency Detection Model in Optical Satellite Images

Xiaoguang Cui, Yanqing Wang, and Yuan Tian
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Email: {xiaoguang.cui, yanqing.wang, yuan.tian}@ia.ac.cn

Abstract—A target-oriented visual saliency detection model for optical satellite images is proposed in this paper. This model simulates the structure of the human vision system and provides a feasible way to integrate top-down and bottom-up mechanisms in visual saliency detection. Firstly, low-level visual features are extracted to generate a low-level visual saliency map. After that, an attention shift and selection process is conducted on the low-level saliency map to find the current attention region. Lastly, the original version of the hierarchical temporal memory (HTM) model is optimized to calculate the target probability of the attention region. The probability is then fed back to the low-level saliency map in order to obtain the final target-oriented high-level saliency map. An experiment for detecting harbor targets was performed on real optical satellite images. Experimental results demonstrate that, compared with the purely bottom-up saliency model and the VOCUS top-down saliency model, our model significantly improves the detection accuracy.

Index Terms—Visual Salience; Target-Oriented; Hierarchical Temporal Memory

I. INTRODUCTION

With the development of remote sensing technology, optical satellite images have been widely used for target detection, for example of harbors and airports. In recent years, high spatial resolution satellite images have provided more detail on shape, texture and context [1].
However, the data explosion of high-resolution remote sensing images brings more difficulties and challenges for fast image processing. Visual saliency detection aims at quickly identifying the most significant regions of interest in images by imitating the mechanism of the human vision system (HVS). In this way, significant regions of interest can be processed with priority by the limited computing resources, thus substantially improving the efficiency of image processing [2]-[3]. There are two models of HVS information processing, namely the bottom-up data-driven model and the top-down task-driven model. The bottom-up model often acts as the unconscious visual processing in early vision and is mainly driven by low-level cues such as color, intensity and oriented filter responses. Currently, many bottom-up saliency models have been proposed for computing bottom-up saliency maps, by which human fixations can be predicted effectively. Several bottom-up models are based on the well-known biologically inspired saliency model of Itti et al. [4].

doi:10.4304/jmm.9.2.302-309

In this model, an image is decomposed into low-level feature maps across several spatial scales, and a master saliency map is then formed by linearly or non-linearly normalizing and combining these maps. Different from the biological saliency models, some bottom-up models are based on mathematical methods. For instance, Graph-Based Visual Saliency (GBVS) [5] forms a bottom-up saliency map based on graph computations; Hou and Zhang [6] proposed a Spectral Residual Model (SRM) that extracts the spectral residual of an image in the spectral domain; and the Pulsed Cosine Transform (PCT) based model [7] extended pulsed principal component analysis to a pulsed cosine transform to generate spatial and motional saliency. Although bottom-up saliency models are shown to be effective for highlighting the informative regions of images, they are not reliable in target-oriented computer vision tasks.
When bottom-up saliency models are applied to optical satellite images, due to the lack of top-down prior knowledge and the highly cluttered backgrounds, these models usually respond to numerous unrelated low-level visual stimuli and miss the objects of interest. In contrast, top-down saliency models learn from training samples to generate probability maps for localizing the objects of interest, and thus produce more meaningful results than bottom-up saliency models. A well-known top-down visual saliency model is Visual Object detection with a CompUtational attention System (VOCUS) [8], which takes the ratio between an object and its background as the weight of the feature maps. The performance of VOCUS is influenced by the object background: although it performs well on natural images, it does not work reliably on complicated optical satellite images. Recently, several top-down methods have been proposed that learn mappings from image features to eye fixations using machine learning techniques. Zhao and Koch [9]-[10] combined saliency channels with optimal weights learned from an eye-tracking dataset. Peters and Itti [11], Kienzle et al. [12] and Judd et al. [13] learned saliency using scene gist, image patches, and a vector of features at each pixel, respectively. It is established that top-down models achieve higher accuracy than bottom-up models; however, bottom-up models often have much lower computational complexity because they take only low-level visual stimuli into account. In this case, an integrated method combining bottom-up and top-down driven mechanisms is needed to obtain the benefits of both. How to effectively integrate the bottom-up and top-down driven mechanisms is still an unsolved problem in visual saliency detection. According to the mechanism of the HVS, this paper proposes a target-oriented visual saliency detection model based on the integration of the two driven mechanisms.
The proposed model consists of three parts, namely the pre-attention phase module, the attention phase module and the post-attention module. Firstly, a low-level saliency map is quickly generated by the pre-attention phase module to highlight the regions with low-level visual stimuli. Then the attention phase conducts an attention shift and selection process on the low-level saliency map to find the current attention region. After obtaining the attention region, a target probability of the region, evaluated by the post-attention module, is fed back to the low-level saliency map to generate a high-level saliency map in which the suspected target regions are emphasized while the background interference regions are suppressed.

The main contributions of this paper are: (1) a new method for combining top-down and bottom-up mechanisms, i.e. revising the low-level saliency map with a target probability evaluation so that the attention regions containing suspected targets are enhanced while the non-target regions are inhibited; (2) an effective method for focus shift and attention region selection that focuses on the suspected target regions rapidly and accurately; (3) improvements to the original HTM model in several respects, including the input layer, the spatial module and the temporal module, leading to a robust estimation of the target probability.

This paper is structured as follows: Section II describes the framework of the proposed model. The details of the three parts, i.e. the pre-attention phase module, the attention phase module and the post-attention module, are presented in Sections III, IV and V, respectively. Experimental results are shown in Section VI. Finally, we give the concluding remarks in Section VII.

II. FRAMEWORK OF THE PROPOSED MODEL

A new model is presented to simulate the HVS attention mechanism. It is composed of three functional modules, namely the pre-attention phase module, the attention phase module and the post-attention phase module, as shown in Fig. 1.
The pre-attention phase is a bottom-up data-driven process. It is employed to extract the low-level features that form the low-level saliency map. According to the principles of winner-takes-all, adjacent proximity and inhibition of return [4], the attention phase module carries out the focus-of-attention shift on the low-level saliency map and uses a self-adaptive region-growing method to rationally select the attention regions. The post-attention phase is a top-down task-driven process whose major function is to apply the HTM model [14]-[15] to evaluate the target probability of the selected attention regions. The probability is then multiplied with the corresponding attention region on the low-level saliency map, and thus a high-level saliency map, more meaningful for locating objects of interest, is generated.

III. PRE-ATTENTION PHASE

In this phase, we first extract several low-level visual features to produce feature maps, and then compute a saliency map for each feature map using the PCT-based attention model. Finally, the saliency maps are integrated to generate the low-level saliency map. The block diagram of the pre-attention phase is shown in Fig. 2.

A. Feature Extraction

If a region of the image is salient, it should contain at least one distinctive feature that differs from its neighborhood; therefore, the visual features of the image should be extracted first. For this, we extract three traditional low-level visual features, i.e. color, intensity and orientation.

1) Color and intensity: The HSI color space describes a color in terms of hue, saturation and intensity, which is more consistent with human visual perception than the RGB color space.
Hence, we transform the original image from RGB to HSI to obtain the color feature maps H, S and the intensity feature map I:

\[ H = \frac{1}{360}\left[90 - \frac{180}{\pi}\arctan\frac{2R - G - B}{\sqrt{3}\,(G - B)} + \begin{cases} 0, & G > B \\ 180, & G \le B \end{cases}\right], \quad S = 1 - \frac{\min(R, G, B)}{I}, \quad I = \frac{R + G + B}{3} \tag{1} \]

2) Orientation: Artificial targets in optical satellite images generally possess obvious geometrical characteristics. Therefore, the orientation feature is crucial for identifying artificial targets. Here we adopt Gabor filters (θ_k = 0°, 45°, 90°, 135°) to extract the orientation feature. The kernel function of a 2-D Gabor wavelet is defined as:

\[ \psi_k(z) = \frac{\|v_k\|^2}{\sigma^2}\, e^{-\frac{\|v_k\|^2 \|z\|^2}{2\sigma^2}} \left( e^{i\, v_k \cdot z} - e^{-\frac{\sigma^2}{2}} \right), \quad v_k = v\,(\cos\theta_k, \sin\theta_k) \tag{2} \]

where z = (x, y) denotes the pixel position, and the parameter σ determines the ratio between the width of the Gaussian window and the length of the wave vector. We set σ = 7π/4 in the experiment. Four orientation feature maps can be obtained by convolving the intensity feature map I with ψ_k:

\[ O_k(z) = I(z) * \psi_k(z) \tag{3} \]

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014
Figure 1. The framework of the proposed model (training images → feature extraction → HTM training; test image → pre-attention phase: feature extraction and low-level saliency map generation → attention phase: focus shift and attention region selection → post-attention phase: HTM probability estimation and high-level saliency map generation).
B. The Generation of the Low-Level Saliency Map Recently, many effective approaches for saliency detection have been proposed. Here we employ the PCT-based attention model because of its good saliency detection performance and fast computation [7]. According to the PCT model, the feature saliency map S_F of a given feature map F is calculated as:

\[ P = \operatorname{sign}(C(F)), \quad A = \left| C^{-1}(P) \right|, \quad S_F = G * A^2 \tag{4} \]

where C(·) is the 2-D discrete cosine transform, C^{-1}(·) is its inverse transform, and G is a 2-D low-pass filter. We apply a linear weighting method to integrate the feature saliency maps.
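The PCT computation of Eq. (4) maps directly onto standard DCT routines. A minimal sketch (using SciPy; the Gaussian filter width `sigma` is an assumed value, not specified here):

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def pct_saliency(feature_map, sigma=3.0):
    """Feature saliency per Eq. (4): P = sign(C(F)), A = |C^-1(P)|,
    S_F = G * A^2, with C the 2-D DCT and G a Gaussian low-pass filter."""
    p = np.sign(dctn(feature_map, norm='ortho'))   # keep only DCT sign pattern
    a = np.abs(idctn(p, norm='ortho'))             # back-transform magnitude
    return gaussian_filter(a ** 2, sigma=sigma)    # smoothed squared response
```

Discarding DCT magnitudes and keeping only their signs is what makes the transform highlight "unusual" structure cheaply, which is the property the paper cites for choosing PCT.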
Due to the lack of prior information, the weight of each feature saliency map is set to 1/N (N is the number of feature saliency maps; here N = 7), and the low-level saliency map S_low is obtained as:

\[ S_{low} = \frac{1}{N}\Big( S_H + S_S + S_I + \sum_{k=1}^{4} S_{O_k} \Big) \tag{5} \]

IV. ATTENTION PHASE
The attention phase provides a set of attention regions so that significant areas of interest can be processed with priority in the post-attention phase. This phase includes two parts, namely, the focus of attention shift and the attention region selection.
A. Focus of Attention Shift
According to the principles of winner-take-all, adjacent proximity and inhibition of return, an unattended pixel on the low-level saliency map with the highest salience and closest to the last focus of attention is chosen as the next focus of attention, based on the following formula:

\[ (px^{t+1}, py^{t+1}) = \arg\max_{x,y} \frac{S_{low}(x, y)\, B(x, y)}{D(x, y)}, \quad D(x, y) = \left[(x - px^{t})^{2} + (y - py^{t})^{2}\right]^{\frac{1}{2}}, \quad B(x, y) = \begin{cases} 0, & (x, y)\ \text{has been focused} \\ 1, & \text{otherwise} \end{cases} \tag{6} \]

where (px^t, py^t) is the location of the current focus of attention, (px^{t+1}, py^{t+1}) is the location of the next focus of attention, D(·) serves as the adjacent proximity term, i.e., areas close to the current focus of attention are noticed with priority, and B(·) serves as the inhibition-of-return term, i.e., the already noticed areas do not participate in the focus shift.
Figure 2. Block diagram of the pre-attention phase (input image → color feature maps H, S; intensity feature map I; orientation feature maps O_k → PCT-based attention model → color saliency maps S_H, S_S; intensity saliency map S_I; orientation saliency maps S_{O_k} → feature integration → low-level saliency map S_low).
B. Attention Region Selection
Different from the fixed-size attention region selection in Itti's model [4], the attention region in this research is identified by a self-adaptive region growing method: taking the focus of attention as the seed point, region growing is conducted by computing the saliency difference between the current growing area and its surrounding areas according to a given step-size sequence. Once the difference tends to decrease, the growth is terminated. Finally, the minimum area-enclosing rectangle of the grown area is taken as the attention region. Here we define R_i as the growing area obtained in the i-th growth, n_i as the number of pixels in R_i, and A_i as the saliency difference between R_i and its surrounding area. Given a step-size sequence N_i (i ∈ [0, T]), where T denotes the maximum number of growth steps, the self-adaptive region growing procedure is given as Algorithm 1.

Algorithm 1 Self-adaptive region growing
Input: N_i (i ∈ [0, T]); R_0 = {f}, where f is the present focus of attention; n_0 = 1; i = 1.
Iteration:
while the maximum number of growth steps is not reached do
  Initialize R_i and n_i: n_i = n_{i−1}; R_i = R_{i−1}.
  while n_i < n_{i−1} + N_i do
    produce a new growing point p: p = argmax_{p_j} S(p_j), where p_j ∈ A, A is the adjacent pixel set of R_i, and S(p_j) is the saliency of p_j.
    update R_i and n_i: R_i = {R_i, p}; n_i = n_i + 1.
  end while
  Calculate A_{i+1}: A_{i+1} = (1/N_i) Σ_{p_j ∈ R_i} S(p_j) − (1/N_{i−1}) Σ_{p_j ∈ R_{i−1}} S(p_j).
  if A_{i+1} tends to decrease then
    the growth is terminated.
  else
    i = i + 1; growth continues.
  end if
end while
Output: the minimum area-enclosing rectangle of R_i.

V. POST-ATTENTION PHASE
In the post-attention phase, we optimize the original version of the HTM model [14] to estimate the target probability of attention regions. The probability is then fed back to the low-level saliency map, and finally the target-oriented high-level saliency map is generated.
A. The Optimization of HTM
The HTM model is a recent hierarchical network model that imitates the structure of the human neocortex [14].
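The self-adaptive region growing of Algorithm 1 can be sketched as follows. Two simplifying assumptions are made: the per-stage "saliency difference" is taken as the mean saliency of the pixels added in that stage, and the enclosing rectangle is axis-aligned. The default step sizes are the squared values from the experiment in Section VI:

```python
import numpy as np

def grow_attention_region(saliency, seed, steps=(1, 100, 225, 400, 625,
                                                 900, 1225, 1600, 2025, 2500)):
    """Grow a region from `seed` stage by stage; at each stage the region
    absorbs the most salient 4-neighbour pixel until it reaches the next
    step size. Growth stops (and the last stage is discarded) once the mean
    saliency of the newly added pixels starts to decrease."""
    h, w = saliency.shape
    region = {seed}
    prev_mean = None
    for target in steps[1:]:
        prev_region = set(region)
        added = []
        while len(region) < target:
            # candidate pixels: 4-neighbours of the current region (naive scan)
            cand = set()
            for (y, x) in region:
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                        cand.add((ny, nx))
            if not cand:
                break
            best = max(cand, key=lambda p: saliency[p])
            region.add(best)
            added.append(float(saliency[best]))
        mean_added = sum(added) / len(added) if added else 0.0
        if prev_mean is not None and mean_added < prev_mean:
            region = prev_region   # saliency difference declines: terminate
            break
        prev_mean = mean_added
    ys = [p[0] for p in region]
    xs = [p[1] for p in region]
    # axis-aligned bounding rectangle of the grown region: (y0, x0, y1, x1)
    return (min(ys), min(xs), max(ys), max(xs))
```

On a saliency map with a bright compact blob, the growth stops once it spills into the dark background, so the returned rectangle hugs the blob.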
The HTM model takes into account the time and space factors that characterize the samples in order to handle ambiguous inference rules, and thus presents strong generalization ability. It has therefore gradually gained attention in the field of pattern recognition [16]-[19]. Different from most HTM-based applications [15]-[18], which use the pixel grayscale as the input layer of HTM, in this research the low-level visual features extracted in the pre-attention phase are taken as the input layer for the purpose of improving the precision of the model. Fig. 3 shows the structure of our HTM model, where the nodes in the second layer conduct the learning and reasoning of the low-level visual features, while the nodes in the third layer and above conduct the learning and reasoning of the spatial position relationships. Nodes in different layers use the same mechanism for learning and reasoning, and they share the same node structure, formed by a spatial module and a temporal module. 1) Spatial module: The main function of the spatial module is to choose the quantization centers of the input samples, that is, to select a few representative samples in the sample space. These centers should be carefully selected so that the spatial module can learn a finite quantization space from an infinite sample space. Assume that the quantization space learned in the spatial module of a node is Q = [q_1, q_2, ..., q_N], where q_i is a quantization center and N is the number of existing centers. All the Euclidean distances d between these centers are calculated, and their sum S is considered as a distance metric of the quantization space:

\[ S = \sum_{i}^{N} \sum_{j}^{N} d(q_i, q_j) \tag{7} \]

When a new input sample q_c appears in the node, we first add q_c to Q, and the distance increment inc caused by q_c is calculated as follows:

\[ inc = \sum_{i}^{N} d(q_i, q_c) \tag{8} \]

The change rate of the distance increment, inc/S, is then examined against a given threshold.
If the ratio inc/S exceeds the threshold, q_c is retained in Q; otherwise, q_c is removed from Q. This ensures that input samples carrying substantial information are adopted as new quantization centers, whereas those without representative information are discarded. The learning of the spatial module stops when the added quantization centers are sufficient to describe the sample space. In practice, the learning is completed when the rate of adding new centers falls below a predefined threshold. 2) Temporal module: The temporal module proposed in [14] is suitable for applications where the input samples have obvious temporal proximity, such as video frames. However, in our research the images used to train the HTM model rarely share any temporal correlation. Therefore, instead of the time adjacency matrix proposed in [14], we use a correlation coefficient matrix C to describe the correlation between different samples, adopting Pearson's coefficient as the measure of correlation. The N × N correlation matrix, which contains the Pearson correlation coefficients between all pairs of centers, is calculated as follows:

\[ C(q_i, q_j) = \frac{E\left[(q_i - \mu_{q_i})(q_j - \mu_{q_j})\right]}{\sigma_{q_i}\, \sigma_{q_j}} \tag{9} \]

where E is the expected value operator, and μ_q and σ_q denote the mean and the standard deviation of the respective quantization center. The larger the absolute value of the correlation, the stronger the association between the two centers. A temporal grouping procedure is then utilized to separate the quantization space Q into highly correlated coherent subgroups.
Figure 3. The proposed HTM network structure (level 1: feature extraction of H, S, I, O_1–O_4; level 2: input space; levels 3 and 4: intermediate nodes; level 5: class-label probability estimation).
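The spatial-module bookkeeping of Eqs. (7)–(8) and the correlation matrix of Eq. (9) can be sketched as follows. Here S sums each pair of centers once (a simplifying assumption about Eq. (7)'s double sum), and 0.08 is the inc/S threshold reported in the experiments of Section VI:

```python
import numpy as np

def update_quantization_centers(centers, sample, threshold=0.08):
    """Keep `sample` as a new quantization center only if its relative
    distance increment inc/S exceeds `threshold` (Eqs. (7)-(8))."""
    centers = [np.asarray(c, dtype=float) for c in centers]
    sample = np.asarray(sample, dtype=float)
    if len(centers) < 2:
        return centers + [sample]          # too few centers to form S yet
    # S: sum of Euclidean distances between existing centers (each pair once)
    s = sum(np.linalg.norm(a - b)
            for i, a in enumerate(centers) for b in centers[i + 1:])
    # inc: distance added by linking the candidate sample to every center
    inc = sum(np.linalg.norm(c - sample) for c in centers)
    if inc / s > threshold:
        centers.append(sample)             # informative sample: retain
    return centers                         # otherwise: discard

def center_correlation_matrix(centers):
    """Pearson correlation coefficients between all pairs of quantization
    centers, as in Eq. (9); each row of `centers` is one center vector."""
    return np.corrcoef(np.asarray(centers, dtype=float))
```

A sample sitting in the middle of well-spread centers yields a small inc/S and is discarded at a strict threshold, while the default 0.08 keeps most genuinely new samples.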
The major advantage of replacing the time adjacency matrix with the correlation coefficient matrix is that it makes the grouping procedure independent of the temporal order of the sample images, which improves the precision of the model. In [14], a computationally efficient greedy algorithm is introduced for the temporal grouping procedure. The algorithm is briefly described as follows: (1) select the quantization center with the greatest connectivity; (2) find the M quantization centers with the greatest connectivity to the selected center, and create a new group for these M centers; (3) repeat steps (1) and (2) until all quantization centers have been assigned. The greedy algorithm requires the groups to be disjoint, i.e., no quantization center can be part of more than one group. However, in real applications, groups can rarely be separated so clearly: some quantization centers usually lie near the boundaries of two or more groups. As a result, the greedy algorithm can lead to ambiguity, because such quantization centers are forced to be members of only one group. To overcome these shortcomings, we propose a fuzzy grouping algorithm that allows quantization centers to be members of different groups according to their correlation. We define an n_q × n_g matrix PQG (n_q and n_g are the numbers of quantization centers and groups, respectively), in which the element PQG[i, j] = p(q_i | g_j) denotes the conditional probability of quantization center q_i given the group g_j. PQG[i, j] is obtained as follows:

\[ PQG[i, j] = \frac{\sum_{q_k \in g_j} C(q_k, q_i)\, p(q_k)}{\sum_{q_l \in g_j} p(q_l)} \tag{10} \]

where p(·) is the prior probability of the quantization centers. PQG[i, j] gives the relative probability of occurrence of q_i in the context of group
g_j, by which we design the fuzzy grouping algorithm, described below. We first use the greedy algorithm to generate an initial grouping solution; then the groups with fewer than a given threshold n_t of centers are removed, because they often bring limited generalization. The quantization centers grouped by the greedy algorithm are expected to be the most representative for each group; however, other centers not belonging to a group g_j may have high correlation to centers in the group, so we allow a center q_i to be added to g_j if PQG[i, j] is high. The fuzzy grouping procedure is given as Algorithm 2.

Algorithm 2 The fuzzy grouping algorithm
1. Create initial groups using the greedy algorithm.
2. Remove groups with fewer than n_t (a given threshold) quantization centers.
3. Compute the matrix PQG, each element PQG[i, j] calculated according to Equation (10).
4. for each q_i do
     for each g_j do
       if PQG[i, j] exceeds a given threshold (set to 0.8 in the experiment) then
         g_j = g_j ∪ {q_i}
       end if
     end for
   end for

B. The Generation of the High-Level Saliency Map
The low-level saliency map predicts interesting locations based on the bottom-up mechanism only. To obtain more meaningful results by introducing the top-down mechanism, and inspired by [14], we multiply the probability estimated by the HTM model with the corresponding attention region on the low-level saliency map to generate a high-level saliency map. In this way, the suspected target regions are emphasized in the high-level saliency map while the background interference regions are suppressed. Assuming R^t is the present attention region and P^t is the estimated probability of R^t, and letting S^{0}_{high} = S_{low}, the current high-level saliency map S^{t}_{high} is obtained as follows:

\[ S^{t}_{high}(x, y) = \begin{cases} P^{t}\, S^{t-1}_{high}(x, y), & \text{if } (x, y) \in R^{t} \\ S^{t-1}_{high}(x, y), & \text{otherwise} \end{cases} \tag{11} \]

where S^{t-1}_{high} is the high-level saliency map after processing the last attention region.
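The per-region feedback of Eq. (11) amounts to a masked multiplication of the saliency map; a minimal sketch:

```python
import numpy as np

def update_high_level_map(s_high, region_mask, prob):
    """One feedback step of Eq. (11): scale the current attention region
    by its estimated target probability and leave the rest unchanged.
    `s_high` starts as a copy of the low-level map (S^0_high = S_low)."""
    out = s_high.copy()
    out[region_mask] *= prob   # emphasize targets (prob near 1), suppress clutter
    return out
```

Applied repeatedly over the sequence of attention regions, background regions with low estimated probability are progressively attenuated while suspected target regions keep their salience.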
The accuracy of the HTM can be further increased by using a stronger classifier in the top layer [15]. Therefore, we applied a Support Vector Machine (SVM) to estimate the probability in the top layer to obtain more accurate results. To further verify the effectiveness of the optimized HTM, a single SVM classifier with a dimensionality reduction step via Principal Component Analysis (PCA) was used as a reference. TABLE II shows the detection accuracy of the original HTM+SVM, the optimized HTM+SVM and SVM+PCA. By using a stronger classifier in the top layer, both the original HTM and the optimized HTM achieve higher accuracy than SVM+PCA.

VI. EXPERIMENT AND DISCUSSION
To verify the effectiveness of our model, an experiment on detecting harbor targets was performed on real optical satellite images. There are 50 images in the experiment, all from Google Earth, each containing 1 to 5 harbor targets. A total of 187 targets are involved, of which 30 are chosen as the training samples of the HTM model. The related parameters are set as follows. The step-size sequence is set according to the size range of the targets as N = {1, 10×10, 15×15, 20×20, 25×25, 30×30, 35×35, 40×40, 45×45, 50×50}. The threshold value of inc/S is set to 0.08 empirically. The learning of the spatial module is completed when the rate of adding new centers falls below 0.2, i.e., for every 10 new input vectors, when fewer than 2 new centers are added, the learning procedure stops. The focus of attention shift is stopped when the number of shifts reaches 20.
A. Accuracy Evaluation of the Optimized HTM
The original version of HTM [14] was implemented for benchmarking against the optimized HTM.
Both versions used a 5-level network structure with input images of size 128 × 128 pixels. First, the accuracy of the original HTM and the optimized HTM was examined. Then the input layer, the spatial module and the temporal module of the original HTM were each replaced individually by the optimized version, and the resulting accuracy was examined. The results are shown in TABLE I. The optimized HTM clearly performs much better than the original HTM, and each individual improvement in the input layer, the spatial module and the temporal module results in higher accuracy than the original version.

TABLE I. DETECTION ACCURACY OF THE ORIGINAL HTM AND THE OPTIMIZED HTM
Method                                           Detection rate, test set (%)   Detection rate, train set (%)
Original HTM                                     72.51                          81.63
Original HTM with feature maps                   77.42                          85.17
Original HTM with the proposed spatial module    75.12                          83.42
Original HTM with the proposed temporal module   79.74                          87.94
Optimized HTM                                    81.34                          89.28

TABLE II. DETECTION ACCURACY OF THE ORIGINAL HTM+SVM, THE OPTIMIZED HTM+SVM AND SVM+PCA
Method               Detection rate, test set (%)   Detection rate, train set (%)
Original HTM+SVM     76.73                          84.67
Optimized HTM+SVM    85.81                          92.48
SVM+PCA              71.57                          82.79

B. Saliency Detection Performance
Three methods are compared for accuracy evaluation: the low-level saliency map (bottom-up mechanism only), VOCUS, and the proposed model. Fig. 4 shows an experimental result, from which it can be seen that: 1) the locations of most harbors are salient on the low-level saliency map; however, the most salient regions are not harbors but other ground objects; 2) the focus of attention shifts in order of declining salience, and the selection of attention regions is self-adaptive (see Fig. 5 for an example), which is more consistent with the HVS mechanism than a fixed-size scheme; 3) in the post-attention phase, the suspected target attention regions on the low-level saliency map are enhanced while the non-target regions are inhibited; 4) our model performs better than VOCUS, as it hits target regions more efficiently. Fig. 6 shows the performance curves of the three methods.
The proposed model achieves higher detection precision than the other two methods, and hits more than 75% of the targets at a saliency ratio below 25%.
Figure 4. Experimental results of the low-level saliency map, VOCUS and the high-level saliency map. (a) ground truth image; (b) feature maps (H, S, I, O_1–O_4); (c) low-level saliency map with the first 5 focus shifts (the target is hit at the 2nd shift); (d) VOCUS saliency map with the first 5 focus shifts (targets are hit at the 2nd, 4th and 5th shifts); (e) high-level saliency map with the first 5 focus shifts (all targets are hit in the first 4 shifts; the probabilities of the 5 attention regions are, in sequence, 0.77, 0.86, 0.73, 0.69 and 0.21).
Figure 5. The self-adaptive region growing of the first focus in Fig. 4(c). The growth is terminated at the downward inflection point (marked as a red triangle in the figure).
To further assess the precision of our model, we introduce three definitions: 1) hit number: the rank of the focus that hits the target, in order of saliency; 2) average hit number: the arithmetic mean of the hit numbers of all targets; 3) detection rate: the ratio between the number of targets hit in the first 10 focus shifts and the total target number. The accuracy analysis of the three approaches is given in TABLE III and Fig. 7. The experimental results show that, owing to the introduction of the top-down mechanism, VOCUS and our method are better than the low-level saliency map with the bottom-up mechanism only. At the same time, our approach is superior to VOCUS. This is mainly because the top-down procedure of VOCUS only takes the weights of the low-level features into consideration, whereas our approach applies the HTM model to account comprehensively for both the low-level features and the spatial position relationships, giving it more effective target orientation.
Figure 6.
The performance curves of the low-level saliency map, VOCUS and the high-level saliency map. The saliency ratio is the ratio between the size of the salient area and that of the total image.

TABLE III. AVERAGE HIT NUMBER AND DETECTION RATE OF THE THREE METHODS
Method                   Average hit number   Detection rate (%)
Low-level saliency map   11.67                18.82
VOCUS                    8.46                 37.1
The proposed model       3.75                 73.12

Figure 7. The number of targets hit in a single focus shift, plotted against the number of focus shifts, for (a) the low-level saliency map, (b) VOCUS and (c) the proposed model. The total numbers of targets hit in the first 10 focus shifts by the three methods are 35, 69 and 136, respectively. Our model clearly hits more targets in the first few focus shifts.

VII. CONCLUSION
In this paper we propose a novel target-oriented visual saliency detection model. Inspired by the structure of the human vision system, we build the model with three functional modules, i.e., the pre-attention phase module, the attention phase module and the post-attention phase module. In the pre-attention phase module, a low-level bottom-up saliency map is generated to locate attention regions with low-level visual stimuli. In the attention phase module, we propose an effective method for focus shift and attention region selection to focus on the suspected target regions rapidly and accurately. In the post-attention phase, the original HTM is optimized in several respects, including the input layer, the spatial module and the temporal module, leading to a robust probability estimation. Experimental results demonstrate that our model achieves higher detection precision than both the low-level bottom-up saliency map and the VOCUS model.
The results show that the proposed model provides a feasible way to integrate top-down and bottom-up mechanisms in visual saliency detection.

ACKNOWLEDGMENT
This work was supported by the National Science Foundation of China under Grants No. 61203239, No. 61005067 and No. 61101222.

REFERENCES
[1] M. Li, L. Xu, and M. Tang, “An extraction method for water body of remote sensing image based on oscillatory network,” Journal of Multimedia, vol. 6, no. 3, pp. 252–260, 2011.
[2] Q. Zhang, G. Gu, and H. Xiao, “Image segmentation based on visual attention mechanism,” Journal of Multimedia, vol. 4, no. 6, pp. 363–369, 2009.
[3] B. Yang, Z. Zhang, and X. Wang, “Visual important-driven interactive rendering of 3D geometry model over lossy WLAN,” Journal of Networks, vol. 6, no. 11, pp. 1594–1601, 2011.
[4] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[5] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Advances in Neural Information Processing Systems, 2007, pp. 542–552.
[6] X. Hou and L. Zhang, “Saliency detection: a spectral residual approach,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[7] Y. Yu, B. Wang, and L. Zhang, “Bottom-up attention: pulsed PCA transform and pulsed cosine transform,” Cognitive Neurodynamics, vol. 5, no. 4, pp. 321–332, 2011.
[8] S. Frintrop, “VOCUS: a visual attention system for object detection and goal-directed search,” Lecture Notes in Artificial Intelligence, Berlin Heidelberg, 2006.
[9] Q. Zhao and C. Koch, “Learning a saliency map using fixated locations in natural scenes,” Journal of Vision, vol. 11, no. 3, pp. 1–15, 2011.
[10] Q. Zhao and C. Koch, “Learning visual saliency,” in Information Sciences and Systems Conference, 2011, pp. 1–6.
[11] R. Peters and L.
Itti, “Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[12] B. Scholkopf, J. Platt, and T. Hofmann, “A nonparametric approach to bottom-up visual saliency,” in Advances in Neural Information Processing Systems, 2007, pp. 689–696.
[13] T. Judd, K. Ehinger, F. Durand, and A. Torralba, “Learning to predict where humans look,” in International Conference on Computer Vision, 2009, pp. 2106–2113.
[14] J. Hawkins and D. George, “Hierarchical temporal memory: concepts, theory and terminology,” Whitepaper, Numenta Inc., 2006.
[15] I. Kostavelis and A. Gasteratos, “On the optimization of hierarchical temporal memory,” Pattern Recognition Letters, vol. 33, no. 5, pp. 670–676, 2012.
[16] A. Csapo, P. Baranyi, and D. Tikk, “Object categorization using VFA-generated nodemaps and hierarchical temporal memories,” in IEEE International Conference on Computational Cybernetics, 2007, pp. 257–262.
[17] W. Melis and M. Kameyama, “A study of the different uses of colour channels for traffic sign recognition on hierarchical temporal memory,” in Conference on Innovative Computing, Information and Control, 2009, pp. 111–114.
[18] T. Kapuscinski, “Using hierarchical temporal memory for vision-based hand shape recognition under large variations in hand rotation,” in Artificial Intelligence and Soft Computing, 2010, pp. 272–279.
[19] D. Rozado, F. B. Rodriguez, and P. Varona, “Extending the bioinspired hierarchical temporal memory paradigm for sign language recognition,” Neurocomputing, vol. 79, pp. 75–86, 2012.
A Unified and Flexible Framework of Imperfect Debugging Dependent SRGMs with Testing-Effort

Ce Zhang*
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
School of Computer Science and Technology, Harbin Institute of Technology at Weihai, Weihai, China
*Corresponding author. Email: [email protected]

Gang Cui and Hongwei Liu
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Email: [email protected], [email protected]

Fanchao Meng and Shixiong Wu
School of Computer Science and Technology, Harbin Institute of Technology at Weihai, Weihai, China
Email: [email protected], [email protected]

Abstract—To overcome the limitations of existing software reliability modeling and analysis, namely the insufficient consideration of imperfect debugging and testing-effort (TE) in the debugging process, a software reliability growth model (SRGM) explicitly incorporating imperfect debugging and TE is developed. From the viewpoint of incomplete debugging and the introduction of new faults, the software testing process is described and a relatively unified SRGM framework considering TE is presented. The proposed framework models are fairly general and cover a variety of previous works on SRGMs with imperfect debugging (ID) and TE. Furthermore, a special SRGM incorporating an improved Logistic testing-effort function (TEF) into imperfect debugging modeling is proposed. The effectiveness and reasonableness of the proposed model are verified on a published failure data set. Being closer to real software testing, the proposed model has better descriptive and predictive power than other models.

Index Terms—Software Reliability; Software Reliability Growth Model (SRGM); Imperfect Debugging; Testing-Effort

I. INTRODUCTION
Software reliability is an important attribute of software and can be measured and predicted by software reliability growth models (SRGMs), which have been extensively studied and applied [1-2].
An SRGM usually views software testing as the union of several stochastic processes. Once a failure occurs, testing-effort (TE) is expended to carry out fault detection, isolation and correction. In general, as faults are removed from the software, software reliability continues to grow. SRGMs have become a main approach to measure, predict and ensure software reliability during the testing and operational stages.
doi:10.4304/jmm.9.2.310-317
As software reliability is closely related to TE, incorporating TE into software reliability models becomes natural and imperative, especially in an imperfect debugging environment. As an important means of sketching the testing resource expenditure in software testing, TE can be represented by the number of test cases, CPU hours, man power, etc. In software testing, when a failure occurs, TE is used to support fault detection and correction. A considerable amount of research on TE applied to software reliability modeling has been done during the last decade [3-8]. TE, which has different functional expressions, can be used to describe the testing resource expenditure [4]. The available TEFs include the constant, Weibull (further divided into Exponential, Rayleigh, Weibull, and so on) [4], log-logistic [5] and Cobb-Douglas [7] functions, etc. Moreover, against the deficiencies of the existing TEFs, Huang presented the Logistic TEF [3] and the general Logistic TEF [6] to describe testing-effort expenditure. Finally, TE can also help software engineers conduct optimal allocation of testing resources in component-based software [9]. In fact, software testing is a very complicated stochastic process. Compared with perfect debugging, imperfect debugging can describe the testing process in more detail, so in recent years it has drawn more and more attention [10-16].
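For reference, the Logistic TEF presented by Huang [3], mentioned above, is commonly written as W(t) = N / (1 + A e^{-αt}). A minimal sketch of this standard form follows, with illustrative (not fitted) parameter values; the improved non-zero-initialization variant proposed later in this paper would modify it:

```python
import math

def logistic_tef(t, n=100.0, a=5.0, alpha=0.3):
    """Cumulative testing-effort W(t) = N / (1 + A * exp(-alpha * t)).
    N is the total effort eventually consumed; A and alpha shape the
    S-shaped consumption curve (values here are illustrative)."""
    return n / (1.0 + a * math.exp(-alpha * t))

def logistic_tef_rate(t, n=100.0, a=5.0, alpha=0.3, h=1e-6):
    """Current effort consumption rate w(t) = dW/dt (central difference)."""
    return (logistic_tef(t + h, n, a, alpha)
            - logistic_tef(t - h, n, a, alpha)) / (2.0 * h)
```

W(0) = N/(1 + A) is nonzero, W(t) increases monotonically, and W(t) → N as t grows, which is the S-shaped expenditure profile the TEF is meant to capture.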
Imperfect debugging is an abstraction and approximation of the real testing process, covering both incomplete debugging [12] and the introduction of new faults [10, 11]. It can also be studied through the total number of faults in the software [3, 4]. Reference [4] combined the Exponentiated Weibull (EW) TEF with the Inflection S-shaped SRGM to present an SRGM incorporating imperfect debugging, described by setting the fault detection rate to b(t) = b[r + (1 − r)m(t)/a]. Obviously, when r = 1, the proposed model reduces to the exponential SRGM. Likewise, Ahmad [13] also proposed an inflection S-shaped SRGM considering imperfect debugging, employing the Log-logistic TEF. Besides, there is also research that incorporates imperfect debugging and TE into SRGMs to describe the software testing process from the viewpoint of the variation of a(t). For example, reference [14] presented a(t) = a·e^{αW(t)}, and [3] employed a(t) = a + α·m(t). Considering the fact that the so-called “peak phenomenon” occurring when m > 3 in the EW TEF does not conform to real software testing [15], Huang introduced the imperfect debugging environment into the analysis by combining the Logistic TEF with exponential and S-shaped SRGMs to establish a reliability model, finally obtaining a better effect. Kapur [16] proposed a unified SRGM framework considering TE and imperfect debugging, in which the real testing process is divided into failure detection and fault correction, and the convolution of probability distribution functions is employed to represent the delay between the fault detection and correction processes. The imperfect debugging above is described by a complete-debugging probability p and by the introduction of new faults: a(W_t) = a + α·m(W_t). Compared to the others, the imperfect debugging treatment proposed in [16] is relatively thorough. Still, these research efforts, conducted from different viewpoints and with different scopes, lack a thorough and accurate description.
On this basis, some studies in the literature have involved imperfect debugging and TE. However, little research has fully incorporated ID and TE into SRGMs, and so fails to describe real software testing. This shows how important and imperative it is to incorporate ID and TE into software reliability modeling. Obviously, the more real factors an SRGM considers, the more accurately it describes the software testing process. In this paper, an SRGM framework incorporating imperfect debugging and TE is presented, which can describe the software testing process more accurately on the basis of existing research. Unlike earlier techniques, the proposed SRGM covers two types of imperfect debugging, namely incomplete debugging and the introduction of new faults, and it unifies contemporary approaches to describing the fault detection and correction processes. Moreover, an improved Logistic TEF with non-zero initialization is presented and verified to illustrate testing resource consumption. Finally, a special SRGM, SRGM-GTEFID, is established. The effectiveness of SRGM-GTEFID is demonstrated on a real failure data set. The results confirm that the proposed framework of imperfect debugging dependent SRGMs with TE is flexible and enables efficient reliability analysis, achieving a desired level of software reliability. The paper is structured as follows. Sec. 2 presents a unified and flexible SRGM framework considering imperfect debugging and TE. Next, an improved Logistic TEF is illustrated to build a special SRGM in Sec. 3. Sec. 4 shows experimental studies verifying the proposed model. Sec. 5 contains conclusions and some ideas for future work.

II. THE UNIFIED SRGM FRAMEWORK CONSIDERING IMPERFECT DEBUGGING AND TE
A. Basic Assumptions
In the subsequent analysis, the proposed model is formulated based on the following assumptions [3, 4, 17-21].
(1) The fault removal process follows a non-homogeneous Poisson process (NHPP);
(2) Let {N(t), t≥0} denote a counting process representing the cumulative number of software failures detected by time t; N(t) is an NHPP with mean value function m(t) and failure intensity function λ(t):

Pr{N(t) = k} = [m(t)]^k e^{−m(t)} / k!,  k = 0, 1, 2, ...  (1)

m(t) = ∫₀ᵗ λ(τ) dτ.  (2)

(3) The number of faults detected in the time interval (t, t+Δt] is proportional, through the current TE expenditures, to the number of faults not yet detected; the proportionality function b(t) is hereinafter referred to as the FDR;
(4) Fault removal is not complete; the fault correction rate function is p(t);
(5) New faults can be introduced during debugging; the fault introduction probability is proportional to the number of faults corrected, with probability function r(t) (r(t) << p(t)).

B. General Imperfect Debugging Dependent Framework Model Considering TE

Based on the above assumptions, the following differential equations can be derived:

(1/w(t)) · dm(t)/dt = b(t)[a(t) − c(t)],
dc(t)/dt = p(t) · dm(t)/dt,
da(t)/dt = r(t) · dc(t)/dt,  (3)

where a(t) denotes the total number of faults in the software, c(t) the cumulative number of faults corrected in [0, t], and w(t) the TE consumption rate at time t, i.e. W(t) = ∫₀ᵗ w(x)dx. Solving the differential equations above with the boundary conditions m(0) = 0, a(0) = a, c(0) = 0 yields

c(t) = a ∫₀ᵗ w(u)b(u)p(u) e^{−∫₀ᵘ w(τ)b(τ)p(τ)[1−r(τ)]dτ} du.  (4)
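The coupled system (3) can also be integrated numerically. A minimal forward-Euler sketch (function names and parameter values are illustrative, not from the paper):

```python
import math

# Forward-Euler integration of the framework equations (3):
#   dm/dt = w(t) * b(t) * [a(t) - c(t)]   (failure detection)
#   dc/dt = p(t) * dm/dt                  (imperfect correction)
#   da/dt = r(t) * dc/dt                  (fault introduction)
def solve_framework(w, b, p, r, a0, t_end, n=20000):
    dt = t_end / n
    m, c, a = 0.0, 0.0, a0   # boundary conditions m(0)=0, c(0)=0, a(0)=a
    for i in range(n):
        t = i * dt
        dm = w(t) * b(t) * (a - c)
        dc = p(t) * dm
        da = r(t) * dc
        m += dm * dt
        c += dc * dt
        a += da * dt
    return m, c, a

# Special case (p=1, r=0, w*b = 0.1 constant): the framework collapses to
# the Goel-Okumoto model, m(t) = a0 * (1 - exp(-0.1 * t)).
m10, c10, a10 = solve_framework(lambda t: 1.0, lambda t: 0.1,
                                lambda t: 1.0, lambda t: 0.0,
                                a0=100.0, t_end=10.0)
```

With p(t) < 1 and r(t) > 0 the same routine reproduces incomplete debugging (c lags m) and fault introduction (a grows), in line with (4) and (5).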
a(t) = a [1 + ∫₀ᵗ w(u)b(u)p(u)r(u) e^{−∫₀ᵘ w(τ)b(τ)p(τ)[1−r(τ)]dτ} du],  (5)

m(t) = a ∫₀ᵗ w(v)b(v) [1 − ∫₀ᵛ w(u)b(u)p(u)[1−r(u)] e^{−∫₀ᵘ w(τ)b(τ)p(τ)[1−r(τ)]dτ} du] dv.  (6)

Then the failure intensity function λ(t) can be derived as

λ(t) = dm(t)/dt = a·w(t)b(t) [1 − ∫₀ᵗ w(u)b(u)p(u)[1−r(u)] e^{−∫₀ᵘ w(τ)b(τ)p(τ)[1−r(τ)]dτ} du].  (7)

Obviously, by setting different values for b(t), p(t), r(t) and w(t), several available models can be obtained.
(1) If p(t)=1, r(t)=0 and TE is disregarded, the proposed model reduces to the classical G-O model [17];
(2) If p(t)=1, r(t)=0 and the TEF is Yamada Weibull, Burr type X, Logistic, generalized Logistic or Log-Logistic respectively, the proposed model reduces to the models in [5, 22];
(3) If p(t)=1, r(t)=0, b(t) = b[r + (1−r)m(t)/a] and the TEF is Weibull, the proposed model reduces to the model in [4];
(4) If p(t)=1, r(t)=1, b(t) = b²t/(1+bt) and the TEF is a framework function, the proposed model reduces to the model in [3];
(5) If p(t)=1, r(t)=0, a(t) is an increasing function of time t, and the TEF is a framework function, the proposed model reduces to the model in [14];
(6) If p(t)=1, r(t)=0 and the TEF and b(t) are framework functions, the proposed model reduces to the framework model in [15].
Thus, the proposed framework model generalizes the previous works on imperfect debugging and TEF and is a more flexible imperfect debugging framework model incorporating TE. In practical applications, w(t), b(t), p(t) and r(t) can be set to proper functional forms as needed to accurately describe the real debugging environment. The proposed model, which incorporates imperfect debugging through the current TE expenditures, is more flexible and is referred to as the SRGM considering Generalized Testing-Effort and Imperfect Debugging (SRGM-GTEFID).

III.
THE IMPERFECT DEBUGGING DEPENDENT SRGM WITH IMPROVED LOGISTIC TEF

Generally speaking, the most important factors affecting reliability are the total number of faults a(t), the fault detection rate (FDR) b(t) [21], and the TE expenditure rate w(t). The expression for a(t) has been obtained above; w(t) and b(t) are discussed below. We present an improved Logistic TEF based on the Logistic TEF [6, 15, 23, 24]:

W(t) = W̄ (1 + l·e^{−αt}) / (1 + k·e^{−αt}),  (8)

where W̄ represents the total expected TE, k and l denote adjustment coefficients (k > l), and α is the consumption rate of TE expenditure. The TE expenditure rate w(t) is then

w(t) = dW(t)/dt = W̄ α (k − l) e^{−αt} / (1 + k·e^{−αt})².  (9)

Obviously, W(0) = W̄(1+l)/(1+k) > 0 indicates that a certain amount of TE has to be expended before testing begins. As w(t) > 0, W(t) is an increasing function of testing time t, which corresponds to the growing trend of TE expenditure. When t_max = ln(k)/α, w(t) achieves its maximum w_max = W̄α(k−l)/(4k); that is, w(t) first rises and then falls.

Many studies suggest that b(t) is a constant [17], or an increasing or decreasing function of time t, for example b(t) = b·t^k [20], b(t) = b(0) + k·m(t)/a [15], b(t) = b/(1+e^{−bt}), and b(t) = b(0)[1 − m(t)/a] [15]. Actually, each of these b(t) functions can only describe the variation of the FDR at some stage of software testing. Here we present a relatively flexible b(t) to comprehensively describe the FDR:

b(t) = b·e^{βt} / (1 + e^{βt}).  (10)

In our previous study, (10) was verified to describe the various changing trends of the FDR. For simplicity and tractability, let p(t) = p and r(t) = r, a constant fault introduction rate, since r(t) << p(t). If p ≠ 1 and r ≠ 0 are obtained in the experiment, the fault removal process is imperfect; that is, both incomplete debugging and the introduction of new faults occur. Below we elaborate the SRGM obtained when W(t) and b(t) are set to expressions (8) and (10) respectively.
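The properties of the improved Logistic TEF are easy to check numerically. A small sketch, assuming the reconstructed forms W(t) = W̄(1+l·e^{−αt})/(1+k·e^{−αt}) and w(t) = W̄α(k−l)e^{−αt}/(1+k·e^{−αt})², with illustrative (not fitted) parameter values:

```python
import math

# Illustrative improved-Logistic-TEF parameters (not the paper's estimates);
# k > l is required so that w(t) > 0.
W_BAR, ALPHA, K, L = 60.0, 0.3, 4.0, 0.5

def W(t):
    """Cumulative testing effort, eq. (8); note W(0) = W_BAR*(1+L)/(1+K) > 0."""
    e = math.exp(-ALPHA * t)
    return W_BAR * (1 + L * e) / (1 + K * e)

def w(t):
    """Testing-effort consumption rate, eq. (9), the derivative of W(t)."""
    e = math.exp(-ALPHA * t)
    return W_BAR * ALPHA * (K - L) * e / (1 + K * e) ** 2

t_peak = math.log(K) / ALPHA                 # where w(t) is maximal
w_peak = W_BAR * ALPHA * (K - L) / (4 * K)   # maximal consumption rate
```

Here W(0) = 60·1.5/5 = 18 > 0 models the effort spent before testing starts, W(t) approaches W̄ as t grows, and w(t) rises to its peak at t_peak and then falls.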
For convenience of exposition, let g(t) = w(t)b(t) and

f(v) = p(1−r) ∫₀ᵛ g(u) e^{−p(1−r)∫₀ᵘ g(x)dx} du.  (11)

By an integral transform, (11) can be converted to the following form:

f(v) = 1 − e^{−p(1−r)∫₀ᵛ g(x)dx}.  (12)

Substituting (12) into (6), we get

m(t) = a ∫₀ᵗ g(v) e^{−p(1−r)∫₀ᵛ g(x)dx} dv.  (13)

By a similar integral transform, we obtain

m(t) = (a / [p(1−r)]) [1 − e^{−p(1−r)∫₀ᵗ w(x)b(x)dx}].  (14)

With (8) and (10), G(t) = ∫₀ᵗ w(τ)b(τ)dτ can be expanded as the double series

G(t) = W̄·b·α(k−l) Σ_{n1=0}^∞ Σ_{n2=0}^∞ (−1)^{n1} (−k)^{n2} (n2+1) [1 − e^{−(n1β+(n2+1)α)t}] / (n1β+(n2+1)α).  (15)

Substituting G(t) of (15) into (14), m(t) is finally derived as

m(t) = (a / [p(1−r)]) [1 − e^{−p(1−r)G(t)}].  (16)

Accordingly, c(t) and a(t) can also be solved as

c(t) = (a / (1−r)) [1 − e^{−p(1−r)G(t)}],  (17)

a(t) = a [1 + (r/(1−r)) (1 − e^{−p(1−r)G(t)})].  (18)

IV. EXPERIMENTAL STUDIES AND PERFORMANCE COMPARISONS

A. Criteria for Model Comparisons

To assess the models, MSE, Variance, RMS-PE, BMMRE and R-square are used to measure the curve-fitting effects, and RE to measure the predictive ability:

MSE = (1/k) Σ_{i=1}^k [y_i − m(t_i)]²,  (19)

Bias = (1/k) Σ_{i=1}^k [m(t_i) − y_i],  (20)

Variance = sqrt( (1/(k−1)) Σ_{i=1}^k [m(t_i) − y_i − Bias]² ),  (21)

R-square = 1 − Σ_{i=1}^k [m(t_i) − y_i]² / Σ_{i=1}^k [y_i − ȳ]²,  with ȳ = (1/k) Σ_{i=1}^k y_i,  (22)

RE = [m(t_q) − y_q] / y_q,  (23)

RMS-PE = sqrt(Bias² + Variance²),  (24)

BMMRE = (1/k) Σ_{i=1}^k |m(t_i) − y_i| / min(m(t_i), y_i),  (25)

where y_i represents the cumulative number of faults detected by time t_i, m(t_i) denotes the estimated number of faults by time t_i, and k is the sample size of the real failure data set. The smaller the values of MSE, Variance, RMS-PE and BMMRE, the closer R-square is to 1, and the more quickly RE approaches 0, the better the model.
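The fit and prediction criteria can be written down directly. A sketch assuming the standard definitions of MSE, Bias, Variance, R-square, RMS-PE, BMMRE and RE (the paper's exact variants may differ slightly; its tables, for instance, report R-square values just above 1):

```python
import math

def fit_criteria(y, m):
    """Goodness-of-fit criteria for observed counts y_i and fitted m(t_i)."""
    k = len(y)
    e = [mi - yi for yi, mi in zip(y, m)]          # residuals m(t_i) - y_i
    bias = sum(e) / k
    variance = math.sqrt(sum((ei - bias) ** 2 for ei in e) / (k - 1))
    ybar = sum(y) / k
    return {
        "MSE": sum(ei * ei for ei in e) / k,
        "Bias": bias,
        "Variance": variance,
        "R-square": 1 - sum(ei * ei for ei in e)
                      / sum((yi - ybar) ** 2 for yi in y),
        "RMS-PE": math.sqrt(bias ** 2 + variance ** 2),
        "BMMRE": sum(abs(ei) / min(yi, mi)
                     for yi, mi, ei in zip(y, m, e)) / k,
    }

def relative_error(m_tq, y_q):
    """Predictive relative error RE for the q-th observation."""
    return (m_tq - y_q) / y_q
```

A model fits better when MSE, Variance, RMS-PE and BMMRE are smaller and R-square is closer to 1; it predicts better when RE approaches 0 quickly.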
TABLE I. THE SELECTED MODELS FOR COMPARISON

Model: SSRGM-EWTEFID [4] (S-shaped SRGM considering the Exponentiated Weibull TEF and Imperfect Debugging)
m(t): m(t) = a(1 − e^{−bW(t)}) / (1 + ((1−β)/β)e^{−bW(t)}), with Exponentiated Weibull TEF W(t)

Model: DSSRGM-LTEFID [3] (Delayed S-shaped SRGM considering Logistic TEF and Imperfect Debugging)
m(t): m(t) = (a/(1−r)) [1 − (1 + bW(t)) e^{−b(1−r)W(t)}], with Logistic TEF W(t) = W̄/(1 + A·e^{−αt})

Model: SRGM-GTEFID (the proposed model)
m(t): m(t) = (a/[p(1−r)]) [1 − e^{−p(1−r)G(t)}], with G(t) given by (15)

B. Failure Data Set and the Selected Models for Comparison

To demonstrate the effectiveness and validity of the proposed model, we use a failure data set that has been studied extensively to illustrate the performance of SRGMs [25]. Meanwhile, two pre-eminent models considering imperfect debugging and TE are selected for comparison with SRGM-GTEFID.

C. Experimental Results and Comparative Studies

First, to verify the effectiveness of the improved Logistic TEF, we compared the proposed W(t) with those of the models in Table I, the Generalized Logistic TEF [6], the Rayleigh TEF [4], and the Weibull TEF [4]. The fitting of the TEFs is illustrated in Fig. 1 (observed/estimated cumulative testing-effort of the failure data set vs. time; panels: a. Logistic TEF, b. Generalized Logistic TEF, c. Yamada Rayleigh TEF, d. Yamada Weibull TEF, e. Generalized Exponential TEF, f. Improved Logistic TEF). From Fig. 1, we can see that the models fit the real TE well, except the Generalized Logistic TEF and the Yamada Rayleigh TEF.

Furthermore, the criteria values for W(t) are given in Table II. As indicated in Table II, the values of MSE, Variance, RMS-PE and BMMRE for the W(t) of SRGM-GTEFID are the smallest, and its value of R-square is closest to 1. The results show a better goodness of fit for the failure data, and the proposed improved Logistic TEF is more suitable for modeling the testing-resource expenditure than the others.
TABLE II. COMPARISON RESULTS FOR DIFFERENT TEFS

TEF Model | MSE | R-square | Variance | RMS-PE | BMMRE
Logistic TEF | 1.62719973 | 0.96803004 | 1.32218031 | 1.31036772 | 0.10669336
Generalized Logistic TEF | 1.33612585 | 0.97847165 | 1.19150482 | 1.18751480 | 0.08577016
Yamada Rayleigh TEF | 5.14769334 | 1.17570817 | 2.75990107 | 2.32279389 | 0.63741841
Yamada Weibull TEF | 0.90224491 | 1.01263088 | 0.98450631 | 0.97574250 | 0.08423921
Generalized Exponential TEF | 0.85028680 | 1.00715706 | 0.95120432 | 0.94731067 | 0.07207974
Improved Logistic TEF | 0.805117071 | 0.99452474 | 0.94793279 | 0.94786913 | 0.04966124

By calculation, n1 = 5 and n2 = 2 in (15) satisfy the requirements. The parameters of the models are estimated on the failure data set, and the estimation results are shown in Table III.

TABLE III. PARAMETER ESTIMATION RESULTS OF THE MODELS

SSRGM-EWTEFID: â = 392.41819765, b̂ = 0.05845694, β̂ = 0.39793805, Ŵ = 67.3168, and remaining TEF parameter estimates 0.00000017, 4.8380, 0.231527
DSSRGM-LTEFID: â = 181.415525, b̂ = 0.1393933, r̂ = 0.5076305, Ŵ = 120.4042, Â = 3.1658, 0.090
SRGM-GTEFID: â = 265.81098261, b̂ = 0.00002672, p̂ = 0.8304480, r̂ = 0.03087796, Ŵ = 67.2513, û = 5.0814, v̂ = 0.8969, and remaining parameter estimates −0.00000895, 0.5364128, −0.57446398, 0.1425

As can be seen from Table III, the estimated values of p and r for SRGM-GTEFID are not equal to zero (p = 0.8304480, r = 0.03087796, and r << p). Therefore we can conclude that the fault removal process is imperfect. Next, the fitted curve of the estimated cumulative number of failures m(t) is illustrated graphically in Fig. 2. As seen from Fig. 2, the proposed model (SRGM-GTEFID) is very close to the real failure data and fits it excellently. Furthermore, we calculate the comparison criteria results for all the models, as presented in Table IV.
It is clear from Table IV that the values of MSE, Variance, RMS-PE and BMMRE of SRGM-GTEFID are the lowest of the compared models, followed by SSRGM-EWTEFID and then DSSRGM-LTEFID. In the R-square comparison, SRGM-GTEFID and SSRGM-EWTEFID are the best, differing only in the fourth decimal place and both close to 1; thus the R-square value of SRGM-GTEFID is excellent. Moreover, the values of MSE, Variance and BMMRE for SSRGM-EWTEFID are not very close to those of the proposed model. Therefore, SRGM-GTEFID provides a better goodness of fit for the failure data set than the other models and can be considered the best. The result can be explained as follows. DSSRGM-LTEFID not only ignores incomplete debugging but also sets the FDR to the form b(t) = b²t/(1+bt), which is hard to adapt to different situations. Likewise, SSRGM-EWTEFID also assumes debugging is complete and sets b(t) = b[β + (1−β)m(t)/a], which cannot accurately capture the variation trend of the FDR. In describing the TE function W(t), SSRGM-EWTEFID employs the complicated Exponentiated Weibull TEF, while DSSRGM-LTEFID employs the Logistic TEF; these TEFs diverge from the real testing-resource expenditure. Due to these insufficiencies, the descriptive power of these two models is inferior to that of the proposed one.

Figure 2. Observed/estimated cumulative number of failures vs. time (a. SSRGM-EWTEFID, b. DSSRGM-LTEFID, c. SRGM-GTEFID)

TABLE IV. COMPARISON CRITERIA RESULTS OF THE MODELS

Model | MSE | R-square | Variance | RMS-PE | BMMRE
SSRGM-EWTEFID | 85.96338226 | 1.01778405 | 9.60154938 | 9.52437274 | 0.06421603
DSSRGM-LTEFID | 477.39889056 | 1.23378026 | 25.56967952 | 26.47907462 | 0.59382104
SRGM-GTEFID | 70.01893565 | 1.01811856 | 8.67278321 | 8.595692178 | 0.06404033

In terms of predictive capability, the relative error (RE) in prediction is calculated and the results are shown graphically in Fig. 3. It is noted that the REs of the models approach zero quickly.
Furthermore, we can see that SRGM-GTEFID is not the model whose RE approaches zero most quickly at the beginning. We therefore compute the REs in prediction for the models in Table I at the end of testing; the results are shown in Table V. As indicated in Table V, SRGM-GTEFID attains the minimum RE at each of the final four testing times (0.0625893076791, 0.02181151519274, 0.00866202271253 and 0.00502853969464, respectively), indicating better prediction ability than the others. Thus, the predictive capability of SRGM-GTEFID presents a gradually rising tendency. The reason is that, owing to its larger number of parameters, the predictive performance of SRGM-GTEFID is modest when the failure data set is small, but improves and becomes superior to the other models when a larger failure data set is employed.

Figure 3. RE curves of the models

TABLE V. COMPARISON OF PREDICTIVE POWER (RE) OF THE MODELS AT THE END OF TEST

Model | 16th week | 17th week | 18th week | 19th week
SSRGM-EWTEFID | 0.1085438509796 | 0.07106910995250 | 0.04478078433134 | 0.02710283884195
DSSRGM-LTEFID | −0.084183653131 | −0.06199676798341 | −0.04903280103292 | −0.0361325818587
SRGM-GTEFID | 0.0625893076791 | 0.02181151519274 | 0.00866202271253 | 0.00502853969464

Altogether, from Figs. 1-3 and Tables II, IV and V, we conclude that the proposed model (SRGM-GTEFID) fits the observed failure data better than the others and gives a reasonable prediction capability in estimating the number of software failures. Moreover, from Table II, it can be concluded that incorporating the improved Logistic TEF into SRGM-GTEFID yields a better fit and can be used to describe the real testing-effort expenditure.

V. CONCLUSIONS

A relatively unified and flexible SRGM framework considering TE and ID is presented in this paper.
By incorporating the improved Logistic TEF into software reliability models, the modified SRGM becomes more powerful and more informative in the software reliability engineering process. From the experiments, we can conclude that the proposed model is more flexible, fits the observed failure data better, and predicts future behavior better. Developing SRGMs tailored to diverse testing environments is a main research direction in view of the imperfections of real testing. Thus, the change point (CP) problem, the delay between the fault detection process (FDP) and the fault correction process (FCP), and the dependence of faults should be incorporated to enlarge the research scope of imperfect debugging. Further research on these topics would be worthwhile.

ACKNOWLEDGMENT

This research was supported in part by the National Key R&D Program of China (No. 2013BA17F02), the National Natural Science Foundation of China (No. 60503015), and the Shandong Province Science and Technology Program of China (No. 2011GGX10108, 2010GGX10104).

REFERENCES

[1] E. A. Elsayed, “Overview of reliability testing,” IEEE Trans. on Reliability, vol. 61(2), pp. 282-291, 2012.
[2] Y. J. Long, J. Q. Ouyang, “Research on multicast reliability in distributed virtual environment,” Journal of Networks, vol. 8(5), 2013.
[3] C. Y. Huang, S. Y. Kuo, and M. R. Lyu, “An assessment of testing-effort dependent software reliability growth models,” IEEE Trans. on Reliability, vol. 56, pp. 198-211, 2007.
[4] N. Ahmad, M. G. Khan, and L. S. Rafi, “A study of testing-effort dependent inflection S-shaped software reliability growth models with imperfect debugging,” International Journal of Quality & Reliability Management, vol. 27, pp. 89-110, 2010.
[5] M. U. Bokhari, N. Ahmad, “Analysis of software reliability growth models: the case of log-logistic test-effort function,” in Proc. 17th IASTED International Conference on Modelling and Simulation, Montreal, Canada, pp. 540-545, 2006.
[6] C. Y.
Huang and M. R. Lyu, “Optimal release time for software systems considering cost, testing-effort, and test efficiency,” IEEE Trans. on Reliability, vol. 54, pp. 583-591, 2005.
[7] S. N. Umar, “Software testing effort estimation with Cobb-Douglas function: a practical application,” International Journal of Research in Engineering and Technology (IJRET), vol. 2(5), pp. 750-754, 2013.
[8] H. F. Li, S. Q. Wang, C. Liu, J. Zheng, Z. Li, “Software reliability model considering both testing effort and testing coverage,” Ruanjian Xuebao/Journal of Software, vol. 24(4), pp. 749-760, 2013.
[9] L. Fiondella, S. S. Gokhale, “Optimal allocation of testing effort considering software architecture,” IEEE Trans. on Reliability, vol. 61(2), pp. 580-589, 2012.
[10] P. K. Kapur, H. Pham, S. Anand, and K. Yadav, “A unified approach for developing software reliability growth models in the presence of imperfect debugging and error generation,” IEEE Trans. on Reliability, vol. 60(1), pp. 331-340, 2011.
[11] O. Singh, R. Kapur, and J. Singh, “Considering the effect of learning with two types of imperfect debugging in software reliability growth modeling,” Communications in Dependability and Quality Management, vol. 13, pp. 29-39, 2010.
[12] P. K. Kapur, O. Shatnawi, A. G. Aggarwal, and R. Kumar, “Unified framework for developing testing effort dependent software reliability growth models,” WSEAS Transactions on Systems, vol. 8, pp. 521-531, 2009.
[13] N. Ahmad, M. G. Khan, and L. S. Rafi, “Analysis of an inflection S-shaped software reliability model considering log-logistic testing-effort and imperfect debugging,” International Journal of Computer Science and Network Security, vol. 11, pp. 161-171, 2011.
[14] R. Peng, Q. P. Hu, S. H. Ng, and M. Xie, “Testing effort dependent software FDP and FCP models with consideration of imperfect debugging,” in Proc. 4th International Conference on Secure Software Integration and Reliability Improvement, IEEE, pp. 141-146, 2010.
[15] S. Y. Kuo, C. Y. Huang, and M. R.
Lyu, “Framework for modeling software reliability, using various testing-efforts and fault-detection rates,” IEEE Trans. on Reliability, vol. 50, pp. 310-320, 2001.
[16] P. K. Kapur, O. Shatnawi, A. G. Aggarwal, and R. Kumar, “Unified framework for developing testing effort dependent software reliability growth models,” WSEAS Transactions on Systems, vol. 4, pp. 521-531, 2009.
[17] A. L. Goel, K. Okumoto, “Time-dependent error-detection rate model for software reliability and other performance measures,” IEEE Trans. on Reliability, vol. R-28, pp. 206-211, 1979.
[18] M. Xie, B. Yang, “A study of the effect of imperfect debugging on software development cost,” IEEE Trans. on Software Engineering, vol. 29, pp. 471-473, 2003.
[19] C. T. Lin, C. Y. Huang, “Enhancing and measuring the predictive capabilities of testing-effort dependent software reliability models,” The Journal of Systems and Software, vol. 81, pp. 1025-1038, 2008.
[20] P. K. Kapur, V. B. Singh, S. Anand, and V. S. S. Yadavalli, “Software reliability growth model with change-point and effort control using a power function of the testing time,” International Journal of Production Research, vol. 46, pp. 771-787, 2008.
[21] C. Y. Huang, “Performance analysis of software reliability growth models with testing-effort and change-point,” The Journal of Systems and Software, vol. 76, pp. 181-194, 2005.
[22] N. Ahmad, M. U. Bokhari, S. M. K. Quadri, and M. G. Khan, “The exponentiated Weibull software reliability growth model with various testing-efforts and optimal release policy,” International Journal of Quality & Reliability Management, vol. 25, pp. 211-235, 2008.
[23] H. F. Li, Q. Y. Li, M. Y. Lu, “A software reliability growth model considering an S-shaped testing effort function under imperfect debugging,” Journal of Harbin Engineering University, vol. 32, pp. 1460-1467, 2011.
[24] Q. Y. Li, H. F. Li, M. Y. Lu, X. C.
Wang, “Software reliability growth model with S-shaped testing effort function,” Journal of Beijing University of Aeronautics and Astronautics, vol. 37(2), pp. 149-154, 2011.
[25] M. Ohba, “Software reliability analysis models,” IBM Journal of Research and Development, vol. 28, pp. 428-443, 1984.

Ce Zhang, born in 1978, received his Bachelor and Master degrees in computer science and technology from Harbin Institute of Technology (HIT) and Northeastern University (NEU), China, in 2002 and 2005, respectively. He has been a Ph.D. candidate at HIT, majoring in computer system structure, since 2010. His research interests include software reliability modeling, Fault-Tolerant Computing (FTC) and Trusted Computing (TC).

Gang Cui was born in 1949 in China. He earned his M.S. degree in 1989 and B.S. degree in 1976, both in Computer Science and Technology from Harbin Institute of Technology. He is currently a professor and Ph.D. supervisor in the School of Computer Science and Technology at Harbin Institute of Technology. He is a member of the technical committee on fault-tolerant computing of the China Computer Federation. His main research interests include fault-tolerant computing, wearable computing, software testing, and software reliability evaluation. Prof. Cui has carried out several projects under the National 863 High-Tech Program and has won one First Prize, two Second Prizes and three Third Prizes for Ministry Science and Technology Progress. He has published over 50 papers and one book.

HongWei Liu, born in 1971 in China, holds a doctorate and is a professor and doctoral supervisor at HIT. His research interests include software reliability modeling, FTC and mobile computing.

FanChao Meng, born in 1974 in China, holds a doctorate and is an associate professor at HIT. His research interests include model-driven software architecture, software reliability modeling, software reconstruction and reuse, and Enterprise Resource Planning (ERP).
A Web-based Virtual Reality Simulation of a Mounting Machine

Lan Li*
School of Mathematics and Computer Science, ShanXi Normal University, Linfen, China
*Corresponding author. Email: [email protected]

Abstract—The mounting machine is the most critical piece of equipment in SMT (Surface Mount Technology): its production efficiency dramatically affects the entire assembly line's productivity, and it can become the bottleneck of the line if poorly designed. To enhance the virtual manufacturing (VM) of mounting simulation for PCB (Printed Circuit Board) circuit modules, a web-based virtual reality simulation of the mounting machine is implemented with a Java Applet as the controlling core and VRML (Virtual Reality Modeling Language) scenes as the 3D display platform. The system is data-driven and manufacturing-oriented. It can dynamically generate the 3D static mounting scene and allows the dynamic process to be observed interactively from all angles. Simulation results show that the system has high fidelity, which brings practical significance to manufacturing analysis and optimization for process design. It offers a new approach to establishing a practical PCB circuit-module VM system in unit production.

Index Terms—Virtual Reality; Virtual Manufacturing; VRML; Mounting; Simulation

I. INTRODUCTION

To accommodate the requirements of electronic products with more varieties, variable batches, short cycles and fast renewal, the SMT assembly line has been widely used. VM is the application of virtual reality technology in the manufacturing field. The combination of SMT and VM is ideal for raising the design level of PCB circuit modules and guiding products to be assembled correctly for rapid manufacturing; it is therefore a hot research topic [1]. PCB virtual manufacturing technology has only just begun both in China and abroad.
At present, research mainly focuses on the VM system of Electronic Design and Manufacturing Integration (EDMI) established in Beijing by the Military Electronic Research Institute of the former Ministry of Electronics Industry, which has produced substantial findings: it established the architecture of EDMI's VM system, developed data-driven animation simulation technology, and built a virtual manufacturing system oriented to the bottlenecks and efficiency of the production line. Huazhong University of Science and Technology and the Wuhan Research Institute of Posts and Telecommunications mainly engage in the research and development of a Computer Aided Process Planning (CAPP) system for PCB assembly in the Computer Integrated Manufacturing System (CIMS) [2].
doi: 10.4304/jmm.9.2.318-324
Optimization problems have also been studied in the relevant literature, and various solution algorithms have been given. For instance, Guo et al. proposed optimizing component allocation between placement machines in an SMT assembly line [3], while Peng et al. proposed a scatter-search-based optimization of the placement sequence for the mounting machine [4]. However, the research mentioned above mainly concentrates on simulation of the production line and optimal allocation of manufacturing technology. In addition, despite being quick and effective, existing simulation software is difficult to adapt within its development environment, which brings some disadvantages: the virtual manufacturing technology of PCB fails to reach some of its objectives due to the limitations of the existing simulation software. For instance, when SIMAN/CINEMA is taken as the virtual manufacturing development environment of EDMI, it cannot simulate the manufacturing process of specific manufacturing units such as the mounting machine and the reflow machine.
The mounting machine is the most essential piece of equipment in SMT: its production efficiency dramatically affects the entire assembly line's productivity, and it can become the bottleneck of the entire line if poorly designed. However, there has been little simulation of the working process of the mounting machine itself. Hu et al. proposed five categories of mounting machines based on their specifications and operational methods [5]. By combining software with programming, a 3D simulation system for an SMT production line was designed and implemented in [6]. Ma et al. [7-9] proposed a way of transferring models onto an OpenGL platform to create scenes, designing an interface program to directly import 3DS model files, and thus achieved scene simulation of mounting. The simulation environments mentioned above use a high-level language (such as VC++ 6.0) combined with a 3D graphics library (such as OpenGL), but the programming is complex and it is difficult for such systems to satisfy the requirements of virtual reality simulation. Virtual reality simulation is the highest level of simulation, with "dynamic, interactive, immersive" characteristics. VRML is a standard modeling language that is easier to use than other high-level languages, and modeling with it is more convenient [2]. Also, VRML is a web-based interactive 3D modeling language with good rendering effects. It uses different nodes to construct the virtual reality world, and 3D actions can be simulated in an ordinary browser by simply installing the proper plug-in. A Java Applet is a Java execution mode mainly used in web pages. Java programs have the advantages of platform independence and security in the network, and they interact more freely with complex scenes [10-11].
Therefore, in this paper, by combining VRML and Java Applet, we establish a data-driven and manufacturing-oriented virtual reality simulation system of a web-based mounting machine. The system can dynamically generate the 3D static placement scene and allows the dynamic process to be observed interactively from all angles. The rest of this paper is organized as follows: Section II illustrates the overall design of the simulation system and its structure chart. In Section III, taking the first domestic visual mounting machine SMT2505 as an example, a detailed construction of the static mounting scene is carried out by utilizing a full range of modeling tools on the basis of VRML. The working process and motion forms of the mounting machine are analyzed in Section IV, and, combined with the animation mechanism of VRML, simulation of the mounting process is studied by adopting keyframe animation and kinematics-algorithm animation respectively. A concrete realization of the system's interaction is described in Section V. Finally, Section VI concludes the paper by summarizing the key aspects of our scheme and pointing out its shortcomings.

II. SIMULATION SYSTEM STRUCTURE

As shown in Fig. 1, the browser/server-mode simulation system is composed of three tiers: the Client, the Web Server, and the Database used to record all the technical mounting parameters. The entire operation of the model is driven by data, with all parameters coming from the actual design phase; literature [12] gives the corresponding elaboration. The Database must be able to receive and update data quickly. The Client system runs in the Web browser, which needs the required VRML plug-in installed.
Program and data used in the operation process are first downloaded from the Web Server; a Java Applet then acts as the simulation control engine, which establishes a connection with the Database through JDBC (Java Database Connectivity) and transfers the Database data to the scene. It also uses EAI (External Authoring Interface) technology to interface with the VRML scene, driving dynamic scene generation, the placement process, user interaction, etc.

III. VRML MODEL OF THE STATIC MOUNTING SCENE

A. VRML Geometric Modeling

VRML is a very powerful language for describing 3D scenarios. The virtual scenarios are built from objects; the objects and their attributes can be abstracted into nodes, which are the basic units of a VRML file. There are 54 nodes in VRML 2.0 [13]; each node has different fields and events. A field describes an attribute of the node; by giving the same node different field values, certain functionality can be achieved. An event is the connection between different nodes; nodes that communicate with each other constitute the event system. Dynamic interaction between the user, the virtual world and virtual objects is achieved through the event system [14-15]. A single geometric model uses the Shape node. Cuboids, cylinders, cones, spheres and other basic shapes are created directly with the corresponding Box, Cylinder, Cone and Sphere nodes. For more complex spatial modeling, the point-line-surface modeling nodes, i.e., PointSet (point), IndexedLineSet (line) and IndexedFaceSet (surface), as well as the ElevationGrid and Extrusion nodes, can be used [2].

Figure 1. Simulation system structure

Based on hierarchical structure model theory, complex objects can be assembled from multiple simple geometries, and multiple objects can form the scenery by coordinate positioning.
The scene graph is built from coordinate transformations: with grouping nodes such as the Group node and the Transform node, all kinds of complex virtual scenes can be created.

B. Cooperative Modeling and Data Optimization

VRML is a descriptive text language based on node objects. In theory, any 3D object can be constructed exactly or approximately, but since VRML is not a modeling language, it is very difficult to describe complex models using VRML nodes alone. To improve modeling efficiency and fidelity, for the complex parts of the scene we first use mature modeling software such as AutoCAD and export the result to a *.wrl file by means of VRML Export (an ARX application). The models derived this way are normally IndexedFaceSet nodes, which are unfavorable for file transfer, so we use VRML optimization tools such as Vizup to improve the conversion result. Relatively simple parts are then built with visual tools such as V-Realm Builder 2.0 [17]. Finally, we use VrmlPad as a text editor to modify and polish the model. Practice has shown that taking the VRML modeling language as the base and using multiple modeling tools collaboratively improves modeling efficiency dramatically [2].

JOURNAL OF MULTIMEDIA, VOL. 9, NO. 2, FEBRUARY 2014

Figure 2. Internal hierarchical structure

Figure 4. Gripper: (a) physical, (b) model

C. Model Establishment

Current mounting machines can be divided into four types: boom, composite, turret and large parallel systems. A boom machine works at medium speed with high precision, supports many different types of feeders and is inexpensive, so it is especially suitable for multi-variety, small-batch production; this paper therefore uses the boom type for the simulation research.
Take the first domestic full-vision mounting machine SMT2505 [16] as an example: its vision system can identify different components, and it places chips, ICs and SOICs rapidly and accurately; its placement accuracy, placement velocity and identification capability have reached the international level.

1) Internal Model

Without loss of generality, we analyze and abstract the internal hierarchical structure of the machine by referencing the relevant documents, as Fig. 2 shows. The basic elements of the mounting machine can be divided into three parts: the robot parts, the X/Z positioning system and other ancillary parts. Through the gripper attached to the robot head, the boom machine conducts a series of actions (suction, shift, positioning, placing) that mount the components quickly and accurately at their PCB positions. The modeling of the different parts is mainly based on the basic modeling nodes and on stretched-solid modeling under AutoCAD. The generation of the main parts is discussed in detail next.

(1) Robot parts: the key component of the mounting machine, including the base, the robot head and other parts. The base is modeled under AutoCAD by a Boolean subtraction of two cuboids [2]; the remaining parts are assembled from Box and Cylinder nodes. The resulting model is shown in Fig. 3.

Figure 3. Robot parts model

(2) Gripper: used for gripping components; its regular shape is assembled from Cylinder nodes, as shown in Fig. 4.

(3) Gripper location: used for storing the grippers. It has no corresponding VRML node, so rectangles and circles are combined by Boolean calculation under AutoCAD and converted into a stretched solid, as shown in Fig. 5.

(4) Feeder: the components to be assembled are kept in various component feeders around the PCB. The shapes are relatively complex and usually include tape, waffle and bulk feeders, etc. The feeder models are mainly assembled from Box and Cylinder nodes, as shown in Fig. 6.

Figure 5. Gripper location model

Figure 6. Feeder model

The X and Z positioning system models are obtained by stretching elliptic regions under AutoCAD. The PCB transmission mechanism is modeled with Cylinder nodes, and the PCB board and the rack are modeled in simplified form with Box nodes [18]. The whole scene is generated from the spatial position relationships of the components: a Cartesian coordinate system is imposed on the work area, with the center of z-track 1 of the Z positioning system as the origin, and the other parts are assembled by translation, rotation, scaling and other geometric transformations with Transform nodes. The resulting model is shown in Fig. 7(b).

Figure 7. SMT2505: (a) interior, (b) internal model

2) External Model

As shown in Fig. 8(a), the case of the mounting machine can be grouped, based on its structural features, into three parts: the operational control, the shell itself and the display monitor. Each of these parts is made up of its corresponding components according to the structure modeling principles. The geometric model of the operational control part is implemented by shifting, rotating, shrinking and expanding the models of the keyboard and the electrical switch; the same applies to the shell body and the display monitor. The whole external model of the mounting machine is shown in Fig. 8(b).

Figure 8. SMT2505: (a) exterior, (b) external model

IV. DYNAMIC SIMULATION OF MOUNTING PROCESS

There are many methods for realizing the dynamic simulation, such as key-frame technology and kinematics algorithms. Key-frame technology achieves the animation effect by continuously playing a constant, ordered sequence of images of the object's movement along a path specified by key frames; in VRML, a TimeSensor node outputs a clock that drives the various interpolators and routes changes into fields of Transform nodes. A kinematics algorithm determines the trajectory and rate of the object from kinematic equations, without knowing its physical properties, and can be realized efficiently; in VRML, more complex animation can be completed by means of JavaScript scripts embedded in Script nodes. Thus, according to the defined work process of the mounting machine, the key frames or the derived kinematics equations are the basis and essential part of the dynamic simulation [19]. The key steps are analyzed below.

A. Assembly Operation

The sequence of operations performed by such a pick-and-place robot can be described as follows: the robot head starts from its designated home location, moves to a pickup slot location, grabs a component, moves to the desired placement location on the PCB being assembled, and places the component there. After placement, the robot head moves to another pickup slot location to grab another component and repeats the prior sequence of operations. If the components are not of the same size, the robot also changes its gripper by moving to the gripper location during the assembly operation. In addition, fine-pitch components are tested for proper pin alignment against an upward-looking camera during their assembly. After completing the assembly operation, the robot returns to its home location and waits for the next raw PCB to arrive [20].

B. Key Frame Technology

Assumption: three components, numbered 1, 2 and 3, are to be mounted on the PCB, each stored in its own feeder (each feeder holds one type of component). Two grippers, numbered 1 and 2, are placed in the gripper location; gripper 1 mounts components 1 and 3, and gripper 2 mounts component 2, in the order 1-3-2. The robot head starts from the home location; the movement path is shown in Fig. 9:

Figure 9. Movement path

Path 1: the robot head moves to the gripper location and grabs gripper 1;
Path 2: the robot head moves to the feeder and gets component 1;
Path 3: the robot head moves to the desired placement location on the PCB and places the component there;
Path 4: the robot head moves to the feeder and gets component 3;
Path 5: the robot head moves to the desired placement location on the PCB and places the component there;
Path 6: the robot head moves to the gripper location and unloads gripper 1;
Path 7: the robot head moves to the gripper location and grabs gripper 2;
Path 8: the robot head moves to the feeder and gets component 2;
Path 9: the robot head moves to the desired placement location on the PCB and places the component there;
Path 10: the robot head moves to the gripper location and unloads gripper 2;
Path 11: the robot returns to its home location.

Therefore, the actions of the robot head during the pick-and-place operation can be described as follows. Grabbing a gripper: the head moves down in the Y-direction close to the gripper, then moves up to grab it. Unloading a gripper: the head moves down in the Y-direction to put the gripper down, then moves up. Getting a component: the head moves down in the Y-direction close to the component, then moves up to get it. Placing a component: the head rotates around the Y-axis, moves down in the Y-direction to place the component on the PCB, then moves up. Based on the paths and head movements described above, the key frames shown in Fig. 10 (for simplicity, the simple model is taken as the example) can be set to determine the coordinates of every location throughout the process.

Figure 10. Key frames of mounting process

C.
Kinematics Algorithm

First, let us analyze the movement forms of the mounting machine. The robot starts from its home location, installs a gripper, gets a component, places the component, unloads the gripper, and finally returns to the home location. Each move is actually a static-acceleration-constant-deceleration-static linearly varying motion. For simplicity, and without loss of generality, we assume the robot head moves at a constant speed (i.e., we set aside the acceleration and deceleration phases), so the motion problem can be formulated as:

s = \int_{t_1}^{t_2} v \, dt = vt, \quad t = s/v, \quad (1)

v_x = s_x / t, \quad v_z = s_z / t, \quad (2)

s_x = \int_{t_1}^{t_2} v_x \, dt, \quad s_z = \int_{t_1}^{t_2} v_z \, dt. \quad (3)

Then we establish the mathematical model. Suppose a component is assembled on the PCB at a speed of 1 unit/s, and the robot head moves down and up by 0.2 units at the gripper location and by 0.05 units at the feeder and the PCB; the movement path and coordinates are shown in Fig. 11.

Figure 11. Movement path and coordinate

From these known conditions, the translation and duration of each straight-line segment and the component velocities in the X- and Z-directions are acquired. For example, for segment AB:

|AB| = \sqrt{(1-3)^2 + (2-0)^2} = 2\sqrt{2} \approx 2.828, \quad (4)

t_{AB} = |AB| / v = 2.828, \quad (5)

s_x = 1 - 3 = v_x t_{AB}, \quad s_z = 2 - 0 = v_z t_{AB}, \quad v_x = -2/(2\sqrt{2}) = -\sqrt{2}/2, \quad v_z = 2/(2\sqrt{2}) = \sqrt{2}/2. \quad (6)

At point B, the translation in the Y-direction is:

s_y = 2 \times 0.2 = 0.4, \quad (7)

t_B = s_y / 1 = 0.4. \quad (8)

Thus t_{AB+B} = 2.828 + 0.4 = 3.228; similar methods are used for the rest of the segments. The continuous displacements produced under program control change the translation field of the X-carriage node in VRML, the translation field of the base node, and the translation and rotation fields of the gripper. Thus the carriage moves horizontally on its tracks in, say, the z-direction, the base moves horizontally on the carriage in, say, the x-direction, and the gripper on the head moves in the vertical y-direction and rotates around the vertical axis to perform the proper alignment of components. As shown in Fig. 12, the simulation results further validate the correctness of the above algorithm.

Figure 12. Key frames of mounting process

V. THE REALIZATION OF SYSTEM INTERACTIONS

In VRML, all sorts of sensor nodes such as TouchSensor can be used together with an external program, allowing users to interact directly and develop a strong sense of immersion in the 3D world. EAI allows the Java Applet and the VRML scene to communicate with external operations directly, so that objects can be controlled and modified and further connected to the database. The Java Applet runs in the Web page; Java programs have the advantages of platform independence and security on the network and can interact freely with complex scenes, so we use Java as the programming language in this paper [21].

A. Implementation of Main Interface

The VRML scene and the Java Applet must be embedded in the same Web page: the Java Applet acts as the simulation control engine, and VRML provides the 3D virtual reality scene. The resulting main interface is shown in Fig. 13. In the main interface of the system, the user first clicks the "connect database" button to initialize the database operation and complete the connection; the system then returns an available result set (including all the static and dynamic movement parameters of the scene) to the Java program and dynamically generates the static scene. Once the data have been read into the system, the user can click the "start"/"pause"/"stop" buttons to observe the dynamic process interactively from any angle; the dynamic coordinates of the components are displayed concurrently in the text box [22].

Figure 13. System main interface

B.
Dynamic Scene Generation

The EAI interface defines a set of Java classes for the VRML browser, composed of three packages: vrml.external.*, vrml.external.field.* and vrml.external.exception.*; vrml.external.Browser is thus the basis of EAI access. For example, the 3D scene node named tiezhuangji defined previously is an empty node. By obtaining the browser with the Browser class's getBrowser() method and the tiezhuangji instance with the getNode() method, we can access the node's events with the getEventIn() and getEventOut() methods, and thereby realize the interactive simulation design of the scene. The related code is as follows:

Browser browser = Browser.getBrowser();
Node tiezhuangji = browser.getNode("tiezhuangji");
// send a new translation into the scene
EventInSFVec3f translation =
    (EventInSFVec3f) tiezhuangji.getEventIn("set_translation");
float position[] = new float[3];
position[0] = x; position[1] = y; position[2] = z;
translation.setValue(position);
// read the current translation back from the scene
position = ((EventOutSFVec3f)
    tiezhuangji.getEventOut("translation_changed")).getValue();
…

VI. CONCLUSION

In this paper, after researching the structural characteristics and working principle of the mounting machine, we have established a networked, data-driven, manufacturing-oriented visualization simulation system that can interactively represent the whole mounting process. In further research, adding detailed product design information would enable manufacturability analysis and process optimization, providing a reference for practical production.

ACKNOWLEDGMENT

This paper is supported in part by the technical support of military electronic pre-research project, No. 415011005.

REFERENCES

[1] H. Koriyama and Y. Yazaki, "Virtual manufacturing system", International Symposium on Semiconductor Manufacturing, 2010, pp. 5-8.
[2] Lan Li, "Modeling and simulation of mounting machine based on VRML", 2012 Fourth International Conference on Computational and Information Sciences, Chongqing.
[3] Shujuan Guo, "Optimization on component allocation between placement machines in surface mount technology assembly line", Computer Integrated Manufacturing Systems, 2009, 15(4), pp. 817-821.
[4] Peng Yuan, "Scatter searching algorithm for multi-headed surface mounter", Electronics Process Technology, 2007, 28(6), pp. 316-320.
[5] Yijing Hu, "Mounting optimization approaches of high-speed and high-precision surface mounting machines", Electronics Process Technology, 2006, 27(4), pp. 191-194.
[6] Nanni Zhang, "3D simulation system for key devices in surface mounting technology production line", Computer Applications and Software, 2009, 26(2), pp. 55-57.
[7] Min Ma, "Visible simulation of the key equipment in PCB fabrication", Master's degree thesis, 2007.
[8] Xiao Guo, "Visual modeling and simulation of electronic circuit manufacturing equipment of PCB board level", Master's degree thesis, 2007.
[9] Bingheng Lai, "Study of paste to pack machine simulation based on OpenGL", Master's degree thesis, 2007.
[10] D. B. Kotak, M. Fleetwood, H. Tamoto, and W. A. Gruver, "Operational scheduling for rough mills using a virtual manufacturing environment", 2011 IEEE International Conference on Systems, Man, and Cybernetics.
[11] Zhe Xu, "VRML modeling and simulation of 6DOF AUV based on MATLAB", Journal of System Simulation, 2007, 19(10), pp. 2241-2243.
[12] Hong Chang, Qusheng Li, Xinzhi Zhu, and Liang Chen, "Study of PCB recovered for the SMT module of electronic product VM", Computer Simulation, 2009, 20(1), pp. 109-111.
[13] Haifan Zhang, "According to the Simulink imitate with realistic and dynamic system of VR Toolbox conjecture really", Control & Automation, 2007, 23(28), pp. 212-214.
[14] Xiangping Liu, "Visual running simulation of railway vehicles based on Simulink and VRML", Railway Computer Application, 2009, 18(11), pp. 1-3.
[15] W. Kurmicz, "Internet-based virtual manufacturing: a verification tool for IC designs", Quality Electronic Design, 2000.
ISQED 2000, Proceedings of the IEEE 2000 First International Symposium, March 2000.
[16] A. L. Ames, D. R. Nadeau, and J. L. Moreland, The VRML 2.0 Sourcebook, John Wiley & Sons, Inc., 1997.
[17] M. Sadiq, T. L. Landers, and G. D. Taylor, "A heuristic algorithm for minimizing total production time for a sequence of jobs on a surface mount placement machine", Int. J. Production Res., vol. 31, 1998, pp. 1327-1341.
[18] Swee M. Mok, Chi-haur Wu, and D. T. Lee, "Modeling automatic assembly and disassembly operations for virtual manufacturing", IEEE Transactions on Systems, 2004.
[19] Sihai Zheng, Layuan Li, and Yong Li, "A QoS routing protocol for mobile ad hoc networks based on multipath", Journal of Networks, 2012, 7(4), pp. 691-698.
[20] Ratnesh Kumar and Haomin Li, "Assembly time optimization for PCB assembly", Proceedings of the American Control Conference, Baltimore, 1994, pp. 306-310.
[21] Xiaobo Wang, Xianwei Zhou, and Junde Song, "Hypergraph based model and architecture for planet surface networks and orbit access", Journal of Networks, 2012, 7(4), pp. 723-729.
[22] F. Larue, M. D. Benedetto, M. Dellepiane, and R. Scopigno, "From the digitization of cultural artifacts to the Web publishing of digital 3D collections: an automatic pipeline for knowledge sharing", Journal of Multimedia, 2012, 7(2), pp. 132-144.

Lan Li is a lecturer at Shanxi Normal University, China. She received her B.S. degree in Computer Science from Southwest Jiaotong University and her M.S. degree in Computer Science from Xidian University in 2003. Her current research interests include virtual reality and multimedia technology.
Improved Extraction Algorithm of Outside Dividing Lines in Watershed Segmentation Based on PSO Algorithm for Froth Image of Coal Flotation

Mu-ling Tian
Institute of Mechatronics Engineering, College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan, China
Email: [email protected]

Jie-ming Yang
Institute of Mechatronics Engineering, Taiyuan University of Technology, Taiyuan, China
Email: [email protected]

Abstract—It is difficult to extract accurate bubble sizes and to make image recognition reliable for froth images of coal flotation, because such images have low contrast and blurry edges. An improved method of obtaining the outside dividing lines in watershed segmentation is proposed. In the binarization step, the threshold is optimized by applying the particle swarm optimization (PSO) algorithm combined with 2-D maximum entropy based on the gray level co-occurrence matrix. After a distance transform, the outside dividing lines are extracted by watershed segmentation. Compared with the Otsu method, the segmentation results show that the external watershed markers obtained are relatively accurate and reasonable. More importantly, under-segmentation and over-segmentation are avoided by the improved method. This shows that the extraction algorithm of outside dividing lines based on PSO is effective for image segmentation.

Index Terms—Froth Image in Coal Flotation; Threshold Optimization; Particle Swarm Optimization Algorithm; Between-Class Variance Maximum Otsu Method; Distance Transform; Image Segmentation

I. INTRODUCTION

Because the flotation indexes are extremely strongly correlated with the froth characteristics, accurate extraction of the bubbles in the flotation image becomes the key. In general, the size characteristics of the bubbles are obtained by watershed segmentation of the froth image.
Watershed segmentation is a region-based image segmentation method. With its fast, efficient and accurate segmentation results, it has attracted more and more attention. Because traditional watershed segmentation is easily affected by noise and by image texture details, small basins formed by noise and small details are segmented in error [1], which leads to over-segmentation. Conversely, for low-contrast images, under-segmentation can occur because the image edges are not clear [2]. To solve this problem, two methods are mainly used: the first is to preprocess the image with a filter; the second is to use a watershed segmentation algorithm based on marker extraction. In addition, the fuzzy C-means clustering algorithm has been applied to resolve over-segmentation by merging segmentation results [3][4].

doi:10.4304/jmm.9.2.325-332

Froth images of coal flotation are collected in the flotation factory; their gray distribution is concentrated, the contrast between background and foreground is low, and the bubble edges are blurry [5]. As a consequence, it is difficult to segment the bubbles. To solve this problem, the marked (marker-based) watershed is often adopted: in addition to the internal markers, the outside dividing lines should be extracted. There are several kinds of extraction algorithms for outside dividing lines; in this paper, the extraction of outside dividing lines is based on binary image processing. When a gray-level image is converted into a binary image, the traditional threshold selection method based on the one-dimensional histogram is often used. This kind of method is simple and effective to implement: its concrete step is to compute the one-dimensional histogram of the gray image, namely the gray-level statistics of the image, and then to find the lowest valley between the two peaks, which is often taken as the image segmentation threshold.
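The valley-between-two-peaks rule just described can be sketched as follows. This is an illustrative reimplementation, not the authors' code, and the toy histogram is invented for the example:

```python
# Sketch of 1-D histogram valley thresholding: find the two tallest peaks in
# a gray-level histogram and take the lowest bin between them as threshold.

def valley_threshold(hist):
    # local maxima: bins strictly higher than both neighbours
    peaks = [i for i in range(1, len(hist) - 1)
             if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]]
    if len(peaks) < 2:
        return None  # no double peak: the rule is not applicable
    # keep the two tallest peaks, in positional order
    p1, p2 = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])
    # threshold = lowest valley strictly between the two peaks
    return min(range(p1 + 1, p2), key=lambda i: hist[i])

# bimodal toy histogram: peaks at bins 2 and 7, valley at bin 4
hist = [1, 4, 9, 5, 2, 3, 6, 10, 7, 2]
print(valley_threshold(hist))  # → 4
```

As the text notes next, the rule fails when no clear crests and troughs appear, which is exactly the case the sketch signals by returning None.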
The principle of this method is that the two gray-level "mountains" are formed by the foreground and background gray values of the image, so target and background can be separated by splitting the image at the low point between the two peaks. However, due to the impact of lighting and other factors, obvious crests and troughs sometimes do not appear in the one-dimensional histogram, so the threshold cannot be obtained from the gray distribution alone. In addition, there are many other methods for threshold selection in binary image segmentation, such as the between-class variance maximum Otsu method, the minimum error method, the maximum entropy method, and so on. Otsu threshold segmentation is a segmentation method based on the maximum between-class variance of the histogram, in which a threshold is chosen that maximizes the between-class variance. In the maximum entropy method, a threshold is chosen that maximizes the information entropy of the two distributions of target and background. The maximum entropy method includes one-dimensional and two-dimensional variants. The one-dimensional maximum entropy method depends only on the gray-level histogram of the image: it considers only the statistical information of the gray levels themselves and ignores other pertinent information, so the threshold it produces does not segment noisy images accurately. The two-dimensional maximum entropy method uses not only the gray information of the image pixels but also fully considers each pixel's spatial correlation within its neighborhood; it is suitable for images of all SNRs and produces a better segmentation effect, so it is a threshold selection method of very high practical value. Segmenting an image by the threshold obtained with the maximum entropy of the 2-D gray histogram as the objective function produces good results [6][7][8][9].
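For comparison, the one-dimensional maximum entropy criterion mentioned above (not the 2-D variant the paper actually adopts) can be sketched like this; the toy histogram is hypothetical:

```python
# Sketch of the 1-D maximum entropy (Kapur-style) criterion: choose the
# threshold t that maximizes the summed entropies of the background
# (bins 0..t) and target (bins t+1..L-1) distributions.
import math

def kapur_threshold(hist):
    total = sum(hist)
    p = [h / total for h in hist]

    def entropy(lo, hi):  # entropy of bins lo..hi (inclusive), renormalized
        w = sum(p[lo:hi + 1])
        if w == 0:
            return 0.0
        return -sum(q / w * math.log(q / w) for q in p[lo:hi + 1] if q > 0)

    L = len(hist)
    return max(range(L - 1), key=lambda t: entropy(0, t) + entropy(t + 1, L - 1))

hist = [8, 9, 1, 0, 1, 9, 8, 0]
print(kapur_threshold(hist))
```

Because this criterion sees only the 1-D histogram, two images with the same histogram but different spatial structure get the same threshold, which is the limitation the 2-D method addresses.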
Different from the commonly used methods of extracting outside dividing lines in watershed segmentation, an improved algorithm is proposed in this paper.

1) Transform the gray image into a binary image using a threshold optimized by the particle swarm optimization algorithm. Multi-threshold segmentation keeps the froth image undistorted and gives a satisfactory segmentation effect. Since the particle swarm optimization algorithm not only has strong search ability and good convergence but is also real-coded, it is efficient at finding the threshold pair (s, t) that maximizes the two-dimensional entropy. The linearly decreasing weight (LDW) strategy is adopted in the PSO algorithm.

2) Use the two-dimensional maximum entropy based on the gray level co-occurrence matrix as the fitness function of the optimization algorithm. In binarization, threshold selection is one of the key problems: it plays a decisive role in preserving the quality and integrity of the image segmentation. Because the froth image has low contrast and the bubble edges are blurry, neither histogram-based thresholding nor the Otsu method is suitable for threshold selection here. The double-threshold approach based on two-dimensional maximum entropy can keep the original appearance of the bubbles and make feature extraction more accurate. The gray level co-occurrence matrix has a clear physical meaning and is simpler and less time-consuming than the two-dimensional histogram matrix based on the gray mean; therefore, the two-dimensional maximum entropy based on the gray level co-occurrence matrix is proposed as the fitness function of the optimization algorithm in this paper.

3) Segment the distance image transformed from the binary image using the watershed segmentation algorithm and obtain the outside dividing lines. In general, there are several kinds of extraction algorithms for outside dividing lines.
But in view of the particularity of froth images, in which bubble adhesion easily causes under-segmentation, the method of segmenting the distance image formed from the binary image is applied to extract the outside dividing lines. Combined with the internal markers, the gradient image can then be segmented accurately by the watershed.

II. PROPOSED ALGORITHM OF EXTRACTING OUTSIDE SEGMENTATION LINES

A. The Segmentation Method of Image Based on 2-D Maximum Entropy

1. Two-dimensional histogram

Definition: the two-dimensional histogram Num(G_1, G_2) is the frequency with which a pixel has gray level G_1 while the gray mean of its neighborhood, or the gray level of its adjacent point in a given direction, is G_2. Suppose f(x, y) is an image with 256 gray levels and g(x, y) is the neighborhood-average image of (x, y), or the left (right) neighbor image of (x, y). The two-dimensional histogram Num(G_1, G_2) can then be expressed as

Num(G_1, G_2) = Num\{ f(x, y) = G_1 \ \text{and} \ g(x, y) = G_2 \}. \quad (1)

2. Two-dimensional histogram based on gray level co-occurrence matrix

The two-dimensional histogram usually takes one of two forms: the gray level of the current pixel of the image as the abscissa, and either the gray mean of its neighborhood or the gray level of an adjacent point in some direction (left, right, up or down) as the ordinate. In general, the information contained in the left and upper neighbors is less clear and less important than that of the right and lower neighbors [10][11]. Here, the joint frequency of the gray level of a pixel and the gray level of its right neighbor is selected as the two-dimensional histogram. Suppose F = [f(x, y)]_{M \times N} is the original image matrix, f(x, y) is the gray value at coordinate (x, y), and M \times N is the size of the image. A transfer matrix W = [n_{ij}] of dimension L \times L is defined to represent the two-dimensional histogram.
Here n_{ij} is the number of times the gray level of the current pixel is i while the gray level of its right neighbor is j. It can be expressed as [12]:

n_{ij} = \sum_{l=1}^{M} \sum_{k=1}^{N} \delta(l, k), \quad \delta(l, k) = \begin{cases} 1, & f(l, k) = i \ \text{and} \ f(l, k+1) = j \\ 0, & \text{otherwise} \end{cases} \quad (2)

The joint frequency p_{ij} is expressed as

p_{ij} = n_{ij} / (M \times N). \quad (3)

From the above, the two-dimensional histogram based on the right-neighbor pixel has the same meaning as the gray level co-occurrence matrix, so it can be calculated directly from the gray level co-occurrence matrix. By contrast, it is more troublesome and time-consuming to compute the two-dimensional histogram based on the gray mean of the neighborhood: for a froth image of size 512×512, computing the histogram matrix based on the gray mean takes 201 s, while computing the histogram matrix based on the gray level co-occurrence matrix takes 0.172 s. Moreover, because it is built from adjacent points, the two-dimensional histogram based on the right-neighbor pixel has a clear physical meaning and represents the gray-level transitions and changes of the image. It is obvious that the two-dimensional histogram matrix based on the gray level co-occurrence matrix is simpler and less time-consuming.

3. The physical significance of the two-dimensional histogram

The definition and constraint domain of the two-dimensional histogram are shown below. The abscissa f(x, y) is the gray value at (x, y), the ordinate g(x, y) is the gray value of its right neighbor, and the vector (s, t) is the segmentation threshold of the image, by which the graph is divided into four districts, namely A, B, C and D, respectively.

Figure 1. The two-dimensional histogram matrix of image

From the components of the histogram matrix, compared with the elements in quadrants B and D, the elements in quadrants A and C correspond to pixels whose gray value in the original image differs little from the gray value of their right neighbor. This characteristic is close to the properties of the interior elements of the target or the background: if the object is dark, A is the object area and C is the background, and vice versa for a bright object. For general images, most pixels fall within the object region and the background region and are concentrated on the diagonals of these two areas, because the gray-level changes there are relatively flat [13]. That is, the entries of the two-dimensional histogram matrix along the diagonal are obviously large, as shown in Figure 2. Compared with the elements in quadrants A and C, the elements in quadrants B and D correspond to pixels whose gray value in the original image differs strongly from the gray value of their right neighbor. This characteristic is close to the formation characteristics of edges and noise, so B and D can be taken as the edge and noise areas.

Figure 2. The sketch map of two-dimensional histogram

4. 2-D entropy function

In the image histogram matrix, supposing the threshold vector is (s, t), the region entropies of A, B, C and D are obtained from the definition of two-dimensional entropy as follows:

H(A) = -\sum_{i=0}^{s} \sum_{j=0}^{t} (p_{ij}/P_A) \log(p_{ij}/P_A) = \log P_A + H_A/P_A, \quad (4)

H(B) = -\sum_{i=s+1}^{L-1} \sum_{j=0}^{t} (p_{ij}/P_B) \log(p_{ij}/P_B) = \log P_B + H_B/P_B, \quad (5)

H(C) = -\sum_{i=s+1}^{L-1} \sum_{j=t+1}^{L-1} (p_{ij}/P_C) \log(p_{ij}/P_C) = \log P_C + H_C/P_C, \quad (6)

H(D) = -\sum_{i=0}^{s} \sum_{j=t+1}^{L-1} (p_{ij}/P_D) \log(p_{ij}/P_D) = \log P_D + H_D/P_D, \quad (7)

where

P_A = \sum_{i=0}^{s} \sum_{j=0}^{t} p_{ij}, \quad P_B = \sum_{i=s+1}^{L-1} \sum_{j=0}^{t} p_{ij}, \quad P_C = \sum_{i=s+1}^{L-1} \sum_{j=t+1}^{L-1} p_{ij}, \quad P_D = \sum_{i=0}^{s} \sum_{j=t+1}^{L-1} p_{ij},

H_A = -\sum_{i=0}^{s} \sum_{j=0}^{t} p_{ij} \log p_{ij}, \quad H_B = -\sum_{i=s+1}^{L-1} \sum_{j=0}^{t} p_{ij} \log p_{ij}, \quad H_C = -\sum_{i=s+1}^{L-1} \sum_{j=t+1}^{L-1} p_{ij} \log p_{ij}, \quad H_D = -\sum_{i=0}^{s} \sum_{j=t+1}^{L-1} p_{ij} \log p_{ij}.

The determination functions of the entropy include the local entropy, the joint entropy and the global entropy, which are respectively:

Local entropy: H_{LE} = H(A) + H(C), \quad (8)

Joint entropy: H_{JE} = H(B) + H(D), \quad (9)

Global entropy: H_{GE} = H_{LE} + H_{JE}. \quad (10)

B. Particle Swarm Algorithm

Particle swarm optimization (PSO) is an evolutionary computation technique [14][15] put forward by Dr. Eberhart and Dr. Kennedy in 1995. The PSO algorithm came from the study of the foraging behavior of bird flocks and is an optimization tool based on iteration. Its basic purpose is to find the optimal solution through group collaboration between individuals and social information sharing. In the PSO algorithm, a bird is abstracted as a particle without mass or volume (a point). Each particle searches a D-dimensional space, with position vector x_i = (x_{i1}, x_{i2}, ..., x_{iD}) and velocity vector v_i = (v_{i1}, v_{i2}, ..., v_{iD}); the velocity determines the direction and distance of the particle's flight. Each particle has a fitness decided by the objective function; the fitness value is the standard used to measure the pros and cons of each particle within the whole group. In addition, each particle knows the best position it has found so far (pbest) and the best location experienced by the whole particle group (gbest); gbest is the optimal value among the pbest values. Whether for pbest or gbest, the evaluation is based on the fitness value.
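The pbest/gbest bookkeeping described above, together with the standard PSO velocity and position update, can be sketched in a minimal one-dimensional form. The objective function, bounds and coefficient values here are illustrative only, not the paper's settings:

```python
# Minimal 1-D PSO sketch of the pbest/gbest scheme described above.
import random

def pso_minimize(f, lo, hi, n_particles=20, iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                        # best position found by each particle
    pbest_val = [f(p) for p in pbest]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]   # best position of the group
    for _ in range(iters):
        for i in range(n_particles):
            # velocity update: inertia + cognitive + social terms
            v[i] = (w * v[i]
                    + c1 * rng.random() * (pbest[i] - x[i])
                    + c2 * rng.random() * (gbest - x[i]))
            x[i] = min(hi, max(lo, x[i] + v[i]))   # position update, clamped
            fx = f(x[i])
            if fx < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i], fx
                if fx < gbest_val:
                    gbest, gbest_val = x[i], fx
    return gbest

best = pso_minimize(lambda x: (x - 3.0) ** 2, -10.0, 10.0)
print(best)
```

In the paper's setting the search space is the two-dimensional threshold pair (s, t) and the fitness to maximize is the 2-D entropy; the update scheme itself is unchanged.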
The PSO algorithm is a process in which the particles follow the two extreme values, pbest and gbest, and update themselves constantly in order to find the optimal solution of the problem. The algorithm has been widely used in the domains of function optimization, image processing, mechanical design, communication and robot path planning, and has achieved good results.

v_i^{k+1} = \omega v_i^k + c_1 \, rand() \, (pbest - x_i^k) + c_2 \, rand() \, (gbest - x_i^k)  (11)

x_i^{k+1} = x_i^k + v_i^{k+1}  (12)

Among them, i = 1, 2, ..., M, where M is the total number of particles in the group; v_i is the speed of the particle; pbest and gbest are as defined earlier; \omega is called the inertia factor; x_i^k is the current position of the particle; c_1 and c_2 are the learning factors; rand() is a random number between 0 and 1.

From a sociological perspective, the first part of (11) is called inertia: it refers to the ability to maintain the original velocity and direction. The second part is called cognition: it concerns the particle's "learning" from its own experience and means that part of the particle's movement originates from its own experience. The third part is called social cognition: it is the vector pointing from the current position toward the best point of the population, and it reflects the collaboration and knowledge sharing between particles. The particle thus decides its next movement through its own experience together with the best experience of its peers.

1. The advantages of the particle swarm algorithm in threshold optimization of image segmentation

The essence of segmenting an image by a two-dimensional threshold is to search for the optimal solution (s, t) that maximizes the two-dimensional entropy in the two-dimensional attribute space formed by the gray values of pixels and the gray values of their neighborhoods. As the dimension increases, the amount of calculation of this threshold algorithm becomes larger and more time-consuming.
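Equations (11) and (12) are the core of the search loop, together with the speed limit v_max and the linearly decreasing inertia weight of Eq. (14) introduced below. A self-contained sketch of such a PSO maximizer follows; the vectorized shapes, the clamping of positions to [0, L-1] and the function name are illustrative assumptions, not the authors' Matlab code.

```python
import numpy as np

rng = np.random.default_rng(7)

def pso_maximize(fitness, M=20, D=2, L=256, vmax=4.0, c1=2.0, c2=2.0,
                 w_max=0.95, w_min=0.4, iter_max=50):
    """Maximize `fitness(s, t)` over integer positions in [0, L-1]^D
    using the updates of Eqs. (11)-(12) and the LDW inertia of Eq. (14)."""
    x = rng.uniform(0, L - 1, (M, D))            # initial positions
    v = rng.uniform(-vmax, vmax, (M, D))         # initial velocities

    def evaluate(pos):
        return fitness(*np.clip(pos, 0, L - 1).astype(int))

    pbest = x.copy()
    pfit = np.array([evaluate(p) for p in x])
    gbest = pbest[pfit.argmax()].copy()

    for it in range(iter_max):
        w = w_max - (w_max - w_min) / iter_max * it                # Eq. (14)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (11)
        v = np.clip(v, -vmax, vmax)              # speed limit [-vmax, vmax]
        x = np.clip(x + v, 0, L - 1)             # Eq. (12)
        fit = np.array([evaluate(p) for p in x])
        better = fit > pfit                      # update personal bests
        pbest[better], pfit[better] = x[better], fit[better]
        gbest = pbest[pfit.argmax()].copy()      # update global best
    return tuple(np.clip(gbest, 0, L - 1).astype(int))
```

With the settings used in this paper (M = 20, iter_max = 50) and the local entropy H_LE passed in as `fitness`, the returned pair plays the role of the optimal threshold (s, t).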
Applying the particle swarm algorithm to search for the optimal threshold in the two-dimensional threshold algorithm [16] not only reduces the complexity of the algorithm but also meets real-time requirements. In image processing, including image segmentation, many researchers use genetic, immune and other stochastic optimization algorithms to find the target value, with good effect [17][18]. However, these algorithms have many parameters to be set, and the settings differ considerably from image to image, which leads to considerable differences in the processing results. Compared with genetic and immune optimization algorithms, the particle swarm algorithm can use real-valued coding directly, with fewer parameters and fast convergence speed. Accordingly, the algorithm is not only simple and easy to implement but can also reduce the dimension of the population. Unlike genetic and immune algorithms, there are no crossover or mutation operations in PSO; the particle is updated through its internal velocity. Especially in threshold selection for image segmentation, PSO can realize threshold optimization effectively when combined with the 2-D maximum entropy algorithm, because the coding is real-valued rather than binary. With its fast convergence speed, PSO has advantages that other algorithms cannot match and makes threshold selection simpler and more efficient.

2. Threshold optimization process of image segmentation based on the particle swarm algorithm

1) Initializing the population. Set the population size N and the dimension D of each particle. Randomly form N particles, with positions in (0, L-1) and velocities v_i (i = 1, 2, ..., N) in the interval [-v_max, v_max]. Among them, L is the gray level of the image.

2) According to the 2-D entropy formula (8), calculate the fitness value of each particle.
3) For each particle, determine its best position pbest and the current global best position gbest. The initial pbest of each particle is its initial position, and the initial gbest is the pbest with the maximum fitness among all particles.

4) According to (11) and (12), adjust the particle velocities and positions.

5) Calculate the new fitness of each particle and update the fitness values.

6) For each particle, compare its current fitness with that of its best position pbest; if better, set pbest to the current position. Then find the maximum fitness among all pbest and update gbest.

7) Check whether the end condition (a good enough fitness value or reaching iter_max) has been met. If not, return to step 4); if it has, gbest is the optimal solution.

3. Fitness function selection in the particle swarm algorithm

Because the PSO algorithm obtains the optimal solution by searching for the best fitness value through continuous iteration, the selection of the fitness function is the soul of the PSO algorithm. According to the maximum entropy principle, the threshold (s, t) is the value that maximizes the two-dimensional entropy. Here, the local entropy H_{LE} = H(A) + H(C) is adopted as the criterion for threshold selection, namely (s, t) = Arg max(H_{LE}). The binary image segmented using the two-dimensional vector (s, t) is f_{s,t}(x, y), expressed as formula (13):

f_{s,t}(x, y) = 0, when f(x, y) <= s and g(x, y) <= t;  1, when f(x, y) > s or g(x, y) > t  (13)

4. The parameter selection of the particle swarm algorithm

1) Population size M: The larger the population scale, the higher the searching capability of the algorithm, but at the cost of a large amount of calculation. For a specific problem a suitable size should be found, generally from 20 to 40; if the problem is more complex, the population size may be increased appropriately. Here, M = 20.
2) Particle dimension D: In the binary image process, threshold optimization means finding the threshold (s, t) that maximizes the two-dimensional entropy, so D = 2.

3) Maximum speed: The maximum speed limit of each particle reflects the particle search accuracy, namely the resolution between the current position and the best position. If it is too large, the particle may fly past the extreme point; if too small, the particle cannot search beyond a local extreme point and falls into the local extreme area. If the velocity in some dimension exceeds the set value, it is clamped to v_max (v_max > 0). Here the maximum speed v_max is 4, namely the speed range is [-4, 4].

4) Inertia factor \omega: \omega keeps the particle's motion inertia, gives it the tendency to extend the search space, and gives it the ability to explore new areas. If \omega is larger, the global searching ability is strong but the local search ability is weak; if \omega is smaller, the local search ability is strong. At present, the linearly decreasing weight (LDW) strategy is mostly adopted, that is

\omega = \omega_{max} - [(\omega_{max} - \omega_{min}) / iter_{max}] \cdot iter  (14)

Among them, \omega_{max} and \omega_{min} are the maximum and minimum values of \omega; iter and iter_{max} are the current iteration number and the maximum iteration number. Typical values are \omega_{max} = 0.9 and \omega_{min} = 0.4. Here, \omega_{max} = 0.95, \omega_{min} = 0.4.

5) Acceleration coefficients c_1 and c_2: c_1 and c_2 are the weights that adjust each particle's movement toward pbest and gbest [19]. Lower values allow particles to wander outside the target region before being drawn back; higher values make particles suddenly rush toward or overshoot the target area. The learning factors adjust the role and weight of the particle's own experience and the social (group) experience in its movement. If c_1 = 0, the particle has no experience of its own, only the social experience (social-only). As a consequence, its convergence speed may be faster, but it can fall into local optima when dealing with complex problems.
If c_2 = 0, the particle has no group information, only its own experience (cognition-only), because there is no interaction between individuals. In general, c_1 = 2 and c_2 = 2 are set. When c_1 and c_2 are constants, a good solution can be obtained, but they need not equal 2. Here we choose c_1 = 2, c_2 = 2.

6) The iteration termination condition: According to the specific problem, the termination condition is set as reaching the maximum number of iterations iter_{max}, or the optimal position of the particle swarm meeting a predetermined expectation. Here iter_{max} = 50, and the termination condition is that the maximum number of iterations iter_{max} has been reached or the average fitness of two successive generations differs by no more than 0.0001.

C. Distance Transformation of the Binary Image

The distance transformation is an operation that transforms a binary image into a grayscale image: each pixel is replaced by the distance between it and the nearest nonzero pixel, so the transformed target image is a grayscale image in which gray levels represent distances. For a binary image A = [f_{ij}] with size M x N, the set of target pixels is M = {(x, y) | f_{xy} = 1} and the set of background pixels is B = {(x, y) | f_{xy} = 0}. The distance transform obtains the shortest distance between each pixel in the background region and the target points. The image obtained after the distance transformation is D = [d_{ij}], with

d_{ij} = \min_{(x, y) \in M} D[(i, j), (x, y)]  (15)

1. Euclidean Distance

The distance transformation includes many kinds of transforms, such as the Euclidean distance transform, the chessboard distance transform and the block distance transform. The Euclidean distance transform is one of the most commonly used. It is an accurate two-norm nonlinear distance transform, which has been applied in different fields of image processing.
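The binarization rule of Eq. (13) and the distance transform of Eq. (15) can be mirrored directly in code. The brute-force loop below only illustrates the definition (for real images a linear-time routine such as scipy.ndimage.distance_transform_edt would be used), and the function names are illustrative assumptions.

```python
import numpy as np

def binarize(f, g, s, t):
    """Eq. (13): 0 (object, for a dark target) where f <= s and g <= t,
    1 elsewhere; f holds the pixel gray values and g the gray values of
    the right neighborhoods."""
    return np.where((f <= s) & (g <= t), 0, 1)

def distance_transform(binary):
    """Euclidean distance transform per Eq. (15): every background
    pixel (value 0) gets the distance to the nearest target pixel
    (value 1); target pixels get 0."""
    targets = np.argwhere(binary == 1)
    out = np.zeros(binary.shape, dtype=float)
    for i, j in np.argwhere(binary == 0):
        d2 = ((targets - (i, j)) ** 2).sum(axis=1)  # (i-x)^2 + (j-y)^2
        out[i, j] = np.sqrt(d2.min())               # minimum of Eq. (15)
    return out
```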
The Euclidean distance between (i, j) and (x, y) can be expressed as

D[(i, j), (x, y)] = \sqrt{(i - x)^2 + (j - y)^2}  (16)

III. EXPERIMENTAL RESULTS AND ANALYSIS

A. The Simulation Results of External Segmentation Lines Based on Watershed Segmentation

In this paper, the experimental platform was a Microsoft Windows XP Professional system with an Intel Core CPU, a main frequency of 1.86 GHz and 1 GB of RAM; Matlab R2007 was used as the processing software. The test image, of size 512x512, was acquired by a CCD industrial camera in a coal flotation factory. To make the image clearer and more realistic, it was processed by morphological de-noising and enhancement, and the processed image is shown in figure 3. The image was then segmented using two kinds of threshold selection, namely Otsu and PSO.

1. Otsu method

After the gray image was segmented by the automatic single-threshold segmentation of the between-class variance maximum method (Otsu), the obtained threshold was 119. Firstly, the binarized image was obtained through threshold segmentation; the binary image is shown in figure 4. Secondly, the binary image was transformed by the Euclidean distance transform. Finally, the external tags were obtained using watershed segmentation, as shown in figure 5, and the image superimposed with the outside dividing lines is shown in figure 6.

2. PSO method

The gray image was segmented by double thresholds using the particle swarm optimization algorithm, which took the local entropy H_{LE} = H(A) + H(C) of the 2-D maximum entropy segmentation method as the fitness function. Over 20 runs, the algorithm converged on average at the 35th generation; the functional relation between the average fitness value and the iteration number is shown in figure 10. The average optimal threshold obtained over the 20 runs was (113, 112). Firstly, the binarized image was obtained through threshold segmentation; the binary image is shown in figure 7. Secondly, the binary image was transformed by the Euclidean distance transform.
Finally, the external tags were obtained using watershed segmentation, as shown in figure 8, and the image superimposed with the outside dividing lines is shown in figure 9.

Figure 3. Original froth image of coal flotation

Figure 4. The binary image by Otsu

Figure 5. External tag obtained by Otsu using watershed segmentation

Figure 6. The image superimposed with the outside dividing lines of Otsu

Figure 7. The binary image by PSO

Figure 8. External tag obtained by PSO using watershed segmentation

Figure 9. The image superimposed with the outside dividing lines of PSO

Figure 10. Functional relation between the average fitness value and the iteration

B. Analysis and Conclusion

The experiment indicated that the watershed ridge lines seriously deviated from the bubble edge lines when the binary image obtained by the automatic single-threshold segmentation based on the between-class variance maximum (Otsu) method was segmented by watershed after the distance transform. By comparison, when the binary image obtained by the optimal double thresholds of the particle swarm algorithm was segmented by watershed after the distance transform, the image segmentation not only had the most ideal effect but also greatly reduced the computation time. More importantly, the obtained external watershed ridge markers are relatively accurate and reasonable, and can accurately distinguish each bubble in the froth image. As a result this avoids under-segmentation and over-segmentation and, at the same time, creates favorable conditions for the extraction of bubble-size features from the flotation image. In particular, feature extraction based on the binary image produced by PSO can greatly improve the accuracy of image recognition. It can therefore be concluded that the PSO algorithm for threshold selection in image binarization is an effective method.

ACKNOWLEDGEMENTS

Thanks for the support of the Special Research Fund of Doctoral Tutor Categories of the Doctoral Program in Higher Education (20111402110010) and the Shanxi Science and Technology Programs (20120321004-03 and 20110321005-07).

REFERENCES

[1] Zhang Guoying, Zhu Hong, Xu Ning, "Flotation bubble image segmentation based on seed region boundary growing", Mining Science and Technology, 21(12), pp. 239–242, 2011.
[2] Shao Jianbin, Chen Gang, "Bubble segmentation of image based on watershed algorithm", Journal of Xi'an University of Technology, 27(2), pp. 185–189, 2011.
[3] Gong May, Yao Yumin, "Improved fuzzy clustering image segmentation based on watershed", Application Research of Computers, 28(12), pp. 4773–4775, Dec. 2011.
[4] Gao Jinyong, Tang Hongmei, "An image segmentation algorithm based on improved PSO and FCM", Journal of Hebei University of Technology, 40(6), pp. 6–10, Dec. 2011.
[5] Yang Jieming, Yang Dandan, "A segmentation method of flotation froth image based on an improved watershed algorithm", Coal Preparation Technology, No. 5, pp. 82–85, Oct. 2012.
[6] Chen Guo, Zuo Hongfu, "Genetic algorithm image segmentation of the two-dimensional maximum entropy", Journal of Computer Aided Design and Graphics, 16(4), pp. 530–534, 2002.
[7] Pun T., "A new method for gray level picture thresholding using the entropy of the histogram", Signal Processing, 2(3), pp. 223–237, 1980.
[8] Kapur J. N., Sahoo P. K., Wong A. K. C., "A new method for gray-level picture thresholding using the entropy of the histogram", Computer Vision, Graphics and Image Processing, 29(3), pp. 273–285, 1985.
[9] Yang Haifeng, Hou Zhaozhen, "Image segmentation using ant colony based on 2D gray histogram", Laser and Infrared, 35(8), pp. 614–617, 2005.
[10] Nikhil R. Pal, Sankar K. Pal, "Entropic thresholding", Signal Processing, 16, pp. 97–108, 1989.
[11] Li Na, Li Yuanxiang, "Image segmentation by two-dimensional threshold based on adaptive particle swarm algorithm and data field", Journal of Computer Aided Design & Computer Graphics, 24(5), pp. 628–635, May 2012.
[12] Gu Peng, Zhang Yu, "Improved segmentation algorithm for infrared image by two-dimensional Otsu", Journal of Image and Graphics, 16(8), pp. 1425–1428, Aug. 2011.
[13] Wang Dong, Zhu Ming, "The improved threshold segmentation method based on 2D entropy in low contrast image", Chinese Journal of Scientific Instrument, 25(4, Suppl.), pp. 356–357, 2004.
[14] Kennedy J., Eberhart R., "Particle swarm optimization", Proceedings of the IEEE International Conference on Neural Networks, pp. 1942–1948, 1995.
[15] Eberhart R., Kennedy J., "A new optimizer using particle swarm theory", Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, 1995.
[16] Huang Hong, Li Jun, Pan Jingui, "The two-dimensional Otsu fast image segmentation algorithm based on particle swarm optimization method", Journal of Image and Graphics, 16(3), pp. 377–381, 2011.
[17] Yue Zhenjun, Qiu Wangcheng, Liu Chunlin, "An adaptive image segmentation method for targets", Chinese Journal of Image and Graphics, 9(6), pp. 674–678, 2004.
[18] Yin Chunfang, Li Zhengming, "Application of a hybrid genetic algorithm in image segmentation", Computer Simulation, 21(8), pp. 158–160, 2004.
[19] Peiyi Zhu, Weili Xiong, et al., "D-S theory based on an improved PSO for data fusion", Journal of Networks, 7(2), pp. 370–376, Feb. 2012.

Muling Tian, female, was born in 1969 in Taiyuan, Shanxi Province, China. She is a Ph.D. candidate in the Institute of Mechatronics Engineering, Taiyuan University of Technology. She received a bachelor's degree in electronics and a master's degree in mechatronic engineering from Taiyuan University of Technology.
She is a teacher in the College of Electrical and Power Engineering, Taiyuan University of Technology. Her main interests focus on image processing and automatic control.

Jieming Yang, female, was born in 1956 in Taiyuan, Shanxi Province, China. She received the Ph.D. degree in mechatronics engineering from Taiyuan University of Technology. She is a professor at Taiyuan University of Technology. Her research interests cover image processing, automatic monitoring and control, and fault diagnosis. She has hosted several provincial projects. She has published more than 30 academic articles, more than ten of which are indexed by EI.

Instructions for Authors

Manuscript Submission

All paper submissions will be handled electronically in EDAS via the JMM Submission Page (URL: http://edas.info/newPaper.php?c=7325). After logging in to EDAS, you will first register the paper; afterwards, you will be able to add authors and submit the manuscript (file). If you do not have an EDAS account, you can obtain one. Along with the submission, authors should select up to 3 topics from the EDICS (URL: http://www.academypublisher.com/jmm/jmmedics.html) and clearly state them during the registration of the submission.

JMM invites original, previously unpublished research papers; review, survey and tutorial papers; application papers; plus case studies, short research notes and letters, on both applied and theoretical aspects. Submission implies that the manuscript has not been published previously and is not currently submitted for publication elsewhere. Submission also implies that the corresponding author has the consent of all authors. Upon acceptance for publication, transfer of copyright will be made to Academy Publisher; article submission implies the author's agreement with this policy. Manuscripts should be written in English. Paper submissions are accepted only in PDF; other formats will not be accepted.
Papers should be formatted into A4-size (8.27" x 11.69") pages, with main text in 10-point Times New Roman, in single-spaced two-column format. Authors are advised to follow the format of the final version at this stage. All papers, except surveys, should ideally not exceed 12,000 words (14 pages) in length. Whenever applicable, submissions must include the following elements: title, authors, affiliations, contacts, abstract, index terms, introduction, main text, conclusions, appendixes, acknowledgement, references, and biographies.

Conference Version

Submissions previously published in conference proceedings are eligible for consideration provided that the author informs the Editors at the time of submission and that the submission has undergone substantial revision. In the new submission, authors are required to cite the previous publication and very clearly indicate how the new submission offers substantively novel or different contributions beyond those of the previously published work. The appropriate way to indicate that your paper has been revised substantially is for the new paper to have a new title. Authors should supply a copy of the previous version to the Editor and provide a brief description of the differences between the submitted manuscript and the previous version.

If the authors provide a previously published conference submission, the Editors will check the submission to determine whether there has been sufficient new material added to warrant publication in the Journal. The Academy Publisher's guidelines are that the submission should contain a significant amount of new material, that is, material that has not been published elsewhere. New results are not required; however, the submission should contain expansions of key ideas, examples, elaborations, and so on, of the conference submission. The paper submitted to the journal should differ from the previously published material by at least 30 percent.
Review Process

Submissions are accepted for review with the understanding that the same work has been neither submitted to, nor published in, another publication. Concurrent submission to other publications will result in immediate rejection of the submission. All manuscripts are subject to a well-established, fair, unbiased peer review and refereeing procedure and are considered on the basis of their significance, novelty and usefulness to the Journal's readership. The reviewing structure will always ensure the anonymity of the referees. The review output will be one of the following decisions: Accept; Accept with minor changes; Accept with major changes; or Reject. The review process may take approximately three months to be completed. Should authors be requested by the editor to revise the text, the revised version should be submitted within three months for a major revision or one month for a minor revision. Authors who need more time are kindly requested to contact the Editor. The Editor reserves the right to reject a paper if it does not meet the aims and scope of the journal, is not technically sound, is not revised satisfactorily, or is inadequate in presentation.

Revised and Final Version Submission

The revised version should follow the same formatting requirements as the final version, plus a short summary of the modifications the authors have made and the authors' responses to the reviewers' comments. Authors are requested to use the Academy Publisher Journal Style for preparing the final camera-ready version. A template in PDF and an MS Word template can be downloaded from the web site. Authors are requested to strictly follow the guidelines specified in the templates. Only PDF format is acceptable; the PDF document should be sent as an open file, i.e. without any data protection. Authors should submit their paper electronically through email to the Journal's submission address.
Please always refer to the paper ID in the submissions and any further enquiries. Please do not use the Adobe Acrobat PDFWriter to generate the PDF file. Use the Adobe Acrobat Distiller instead, which is contained in the same package as the Acrobat PDFWriter. Make sure that you have used Type 1 or True Type Fonts (check with the Acrobat Reader or Acrobat Writer by clicking on File>Document Properties>Fonts to see the list of fonts and their type used in the PDF document). Copyright Submission of your paper to this journal implies that the paper is not under submission for publication elsewhere. Material which has been previously copyrighted, published, or accepted for publication will not be considered for publication in this journal. Submission of a manuscript is interpreted as a statement of certification that no part of the manuscript is copyrighted by any other publisher nor is under review by any other formal publication. Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of the Academy Publisher or its editorial staff. The main author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to the submission. More information about permission request can be found at the web site. Authors are asked to sign a warranty and copyright agreement upon acceptance of their manuscript, before the manuscript can be published. The Copyright Transfer Agreement can be downloaded from the web site. 
Publication Charges and Re-print

The author's company or institution will be requested to pay a flat publication fee of EUR 360 for an accepted manuscript, regardless of the length of the paper. The page charges are mandatory. Authors are entitled to a 30% discount on the journal, which is EUR 100 per copy. Reprints of the paper can be ordered at a price of EUR 100 per 20 copies. An allowance of a 50% discount may be granted for individuals without a host institution or from less developed countries, upon application; such applications will be handled case by case. More information is available on the web site at http://www.academypublisher.com/jmm/authorguide.html.

(Contents Continued from Back Cover)

A Unified and Flexible Framework of Imperfect Debugging Dependent SRGMs with Testing-Effort
Ce Zhang, Gang Cui, Hongwei Liu, Fanchao Meng, and Shixiong Wu 310

A Web-based Virtual Reality Simulation of Mounting Machine
Lan Li 318

Improved Extraction Algorithm of Outside Dividing Lines in Watershed Segmentation Based on PSO Algorithm for Froth Image of Coal Flotation
Mu-ling TIAN and Jie-ming Yang 325

© 2014 ACADEMY PUBLISHER