Journal of Multimedia
ISSN 1796-2048
Volume 7, Number 4, August 2012

Contents

Special Issue: Multimedia Contents Security in Social Networks Applications
Guest Editors: Zhiyong Zhang and Muthucumaru Maheswaran

Guest Editorial
Zhiyong Zhang and Muthucumaru Maheswaran 277

SPECIAL ISSUE PAPERS

DRTEMBB: Dynamic Recommendation Trust Evaluation Model Based on Bidding
Gang Wang and Xiao-lin Gui 279

Block-Based Parallel Intra Prediction Scheme for HEVC
Jie Jiang, Baolong, Wei Mo, and Kefeng Fan 289

Optimized LSB Matching Steganography Based on Fisher Information
Yi-feng Sun, Dan-mei Niu, Guang-ming Tang, and Zhan-zhan Gao 295

A Novel Robust Zero-Watermarking Scheme Based on Discrete Wavelet Transform
Yu Yang, Min Lei, Huaqun Liu, Yajian Zhou, and Qun Luo 303

Stego Key Estimation in LSB Steganography
Jing Liu and Guangming Tang 309

REGULAR PAPERS

Facial Expression Spacial Charts for Describing Dynamic Diversity of Facial Expressions
H. Madokoro and K. Sato 314

JOURNAL OF MULTIMEDIA, VOL. 7, NO. 4, AUGUST 2012

Special Issue on Multimedia Contents Security in Social Networks Applications

Guest Editorial

Recent years have witnessed rapid development of information and communication technologies, and the Next-Generation Internet together with 3G and 4G wireless mobile networks is moving toward large-scale deployment and application. The emerging network services, especially social network services, tools and applications, enable much more convenient access to multimedia contents anytime, anywhere, for anyone. Unfortunately, in such an environment, copyright infringement behaviors, such as illicit copying, malicious distribution, unauthorized usage, and free sharing of copyright-protected digital contents, also become much more common.
Research frontiers on multimedia contents security in social network applications are advancing, including enhanced security mechanisms, methods and algorithms; trust assessment and risk management in social network applications; and social factors and soft computing in social media distribution. This special issue brings together researchers, content-industry engineers and administrators who apply state-of-the-art technologies and ideas to protect valuable multimedia contents and services against attacks and IP piracy in emerging social networks. It includes five selected papers, as follows.

The first paper focuses on an interesting recommendation trust issue in large-scale distributed computing, including social networks. Dr. Gang Wang proposes a dynamic recommendation trust evaluation model based on bidding in an E-Commerce environment. By greatly increasing the "criminal cost" for malicious recommendation nodes and applying a recommendation optimization algorithm based on Markov chains, the model ensures that the recommendation service is truthful and more objective. This stimulates nodes to recommend objectively and leads them to abandon malicious recommendation and cooperative cheating on their own initiative, thereby restraining both effectively. A series of simulation experiments also shows that the model restrains malicious recommendation well.

The second paper addresses the continuous emergence of multimedia video coding standards. The existing challenges are twofold: finding efficient coding algorithms with high performance, and speeding up the coding process.
With recent advances in VLSI (Very Large Scale Integration) semiconductor technology contributing to the emerging digital multimedia world, this paper investigates an efficient parallel architecture for the emerging High Efficiency Video Coding (HEVC) standard to speed up the intra coding process without ignoring any prediction modes. Dr. Jie Jiang's experimental implementations of the proposed algorithm are demonstrated on a set of widely used, freely available video test sequences. The results show that the proposed algorithm achieves satisfying intra parallelism without significant performance loss.

In the third paper, LSB matching steganography is explored and discussed in detail. The authors propose a novel optimized LSB matching steganography scheme based on Fisher information. The embedding algorithm solves an optimization problem in which Fisher information is the objective function and the embedding transition probabilities are the variables to be optimized. By modeling groups of elements in a cover image as a Gaussian mixture model, the joint probability distribution of cover elements is obtained for each cover image by estimating the parameters of the Gaussian mixture distribution. Finally, to embed message bits, pixels are chosen to add or subtract one according to the optimized transition probabilities of their category. Experiments show that the security performance of the new algorithm is better than that of existing LSB matching.

Research on watermarking algorithms, one of the key technologies for digital rights management, follows. In this paper, the authors point out unsolved issues in traditional watermarking algorithms. For instance, inserting a watermark into the original signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness.
Some existing zero-watermarking algorithms for audio and image cannot resist certain signal processing manipulations or malicious attacks. In this paper, a novel, more efficient and robust audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed. Experiments show that the algorithm is robust against common audio signal processing operations such as MP3 compression, re-sampling, low-pass filtering, cutting-replacement, and additive white Gaussian noise. These results demonstrate that the proposed watermarking method is a suitable candidate for audio copyright protection.

The last paper of the special issue addresses a burning issue: harmful information can be hidden in multimedia all too conveniently in emerging multimedia social networks. Dr. Jing Liu considers the problem of estimating the stego key used for hiding under the least significant bit (LSB) paradigm, which has been proved much more difficult than detecting the hidden message. A previous framework for stego key search was provided by the theory of hypothesis testing; however, its test threshold is hard to determine. This paper proposes a new method in which the correct key is identified by inspecting the difference between the stego sequence and its shifted sequence along the embedding path. The new technique is shown to be much simpler and quicker than those previously known.

The above five papers were selected through two strict rounds of peer review, by at least two anonymous reviewers each, based on their originality, relevance, technical clarity and presentation. Here, I express my gratitude to Prof. Maheswaran for his collaboration on this special issue on the very interesting and challenging topic of multimedia contents security in social networks.

© 2012 ACADEMY PUBLISHER doi:10.4304/jmm.7.4.277-278
In addition, all invited reviewers are appreciated for their reviews, comments and suggestions to the authors. Finally, I give special thanks to Prof. Jiebo Luo and Dr. George Sun for their great help and effort in bringing this special issue to publication on time and successfully.

Guest Editors

Zhiyong Zhang (Email: [email protected])
Department of Computer Science, Henan University of Science & Technology, Luoyang, P. R. of China

Muthucumaru Maheswaran (Email: [email protected])
School of Computer Science, McGill University, Montreal, Canada

Zhiyong Zhang received his Master and Ph.D. degrees in Computer Science from Dalian University of Technology and Xidian University, China, respectively, and held a post-doctoral fellowship at Xi'an Jiaotong University, China. He is currently an associate professor in the College of Electronics Information Engineering, Henan University of Science & Technology; his research interests include digital rights management and soft computing, trusted computing and access control, and security risk management. In recent years he has published over 50 scientific papers and 2 monographs in these fields and holds 3 authorized patents. Dr. Zhang is an IEEE Senior Member; a member of the IEEE Systems, Man, and Cybernetics Society Technical Committee on Soft Computing and of the World Federation on Soft Computing Young Researchers Committee; a member of the Digital Rights Management Technical Specialist Workgroup attached to the China National Audio, Video, Multimedia System and Device Standardization Technologies Committee; Topic Editor-in-Chief of the International Journal of Digital Content Technology and Its Applications; and Guest Editor of the Journal of Multimedia. He is also Chair/Co-Chair and TPC member for numerous international workshops and sessions on digital rights management and contents security.

Muthucumaru Maheswaran received a BSc degree in Electrical and Electronic Engineering from the University of Peradeniya, Sri Lanka in 1990.
In 1994, he received an MSEE degree from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana, USA, from which he also received a PhD in Computer Engineering in December 1998. Prof. Maheswaran is now an associate professor in the School of Computer Science and the Department of Electrical and Computer Engineering of McGill University, Canada. He directs the research activities of the Advanced Networking Research Lab, and his research interests are in the general areas of computer networking, information security, and distributed systems. He has published hundreds of scientific papers in these research areas in numerous international journals and conferences.

DRTEMBB: Dynamic Recommendation Trust Evaluation Model Based on Bidding

Gang Wang 1,2,3
1 Department of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China
2 Shanxi Province Key Laboratory of Computer Network, Xi'an Jiaotong University, Xi'an, China
3 School of Information, Xi'an University of Finance and Economics, Xi'an, China
[email protected]

Xiao-lin Gui 1,2
1 Department of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China
2 Shanxi Province Key Laboratory of Computer Network, Xi'an Jiaotong University, Xi'an, China
[email protected]

Abstract—Past trust models restrained the fraud and cheating of malicious nodes by reducing their trust values, but few punished recommendation nodes for cooperative cheating, so malicious recommendation and fraudulent actions were only partly curbed and recurred again and again. Due to this lack of punishment for malicious recommendation, many malicious recommendation nodes remain in the network and search for the next criminal opportunity.
Borrowing the idea of bidding and introducing competition and a mechanism of rewards and penalties, this paper proposes a dynamic recommendation trust evaluation model based on bidding for the E-Commerce environment. By greatly increasing the "criminal cost" of malicious recommendation nodes, and using a recommendation optimization algorithm based on Markov chains, the model ensures that the recommendation service is truthful and more objective. This stimulates nodes to recommend objectively and leads them to give up malicious recommendation and cooperative cheating on their own initiative, restraining both effectively. In simulation experiments, the model also shows good restraint of malicious recommendation.

Index Terms—Bidding, Recommendation Trust, Evaluation Model, Recommendation Ability, Recommendation Oligarch

I. INTRODUCTION

With the rapid growth of computing and communications technology, the past decade has witnessed a proliferation of powerful e-commerce. However, problems follow it; among them, trust is a very important problem in mobile e-commerce. Thus, how to build a secure, reliable and trustworthy trust evaluation system for mobile e-commerce is a research hot spot nowadays. In large-scale distributed computing, because the two parties to a service do not know each other and lack the necessary basis for trust, problems such as malicious node fraud and false trading may arise.

Manuscript received April 17, 2012; revised May 8, 2011; accepted August 7, 2012. This work was supported by a grant from the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2012ZX03002001-004) and the National Natural Science Foundation of China (No. 60873071, No. 61172090). Email: [email protected]

© 2012 ACADEMY PUBLISHER doi:10.4304/jmm.7.4.279-288
A better way is to interact with high-trust nodes, chosen from a number of candidate interaction objects by a good trust security model. M. Blaze et al. first proposed the concept of trust management in Literature [1] to solve the issue of trust in distributed environments, and on this basis developed the corresponding trust management systems, PolicyMaker and KeyNote, in Literature [2]. Literature [3] proposes using bidding to improve the efficiency of trust queries to recommended nodes and the confidence updating mechanism; it mainly uses the subjective logic trust model proposed by A. Jøsang in Literature [4], which decides the trust degree of the evaluated object by introducing an uncertainty factor into the evaluation and using the three factors of belief, disbelief and uncertainty. However, for lack of effective incentives, that model cannot effectively promote positive bidding by nodes. Literature [5] proposes PeerTrust, an electronic-community trust model based on local reputation in a P2P environment. PeerTrust gives a comprehensive trust assessment based on the feedback acquired from nodes, the magnitude of the feedback, the trust degree of the feedback, the critical and non-critical factors that differentiate the interaction context, and the environmental factors of the electronic community. Kamvar et al. of Stanford University propose the EigenRep model in Literature [6], a typical global reputation model specifying trust calculation in a P2P environment. Literature [7] proposes a global trust model intended to overcome the lack of security considerations in EigenRep, such as feigning, slandering and the lack of punitive measures, and solves the trust recommendation problem effectively.
Tang Wen and others propose a subjective trust management model based on fuzzy set theory in Literature [8]; the model uses fuzzy mathematics to capture the ambiguity of subjective trust. Literature [9] classifies the trust values of nodes into resource trust value, node contribution value and node assessment value, but it ignores the fact that the trust value of a node is related to its performance under the influence of various factors. Literature [10] proposes a new trust model for the P2P E-Commerce environment; the model adjusts direct trust by voting in order to stimulate nodes to vote positively and increase the reliability of trust. Tian Chunqi and others propose a novel super-peer based trust model for peer-to-peer networks in Literature [11]. Though it solves some trust problems, the quality of recommendation service (abbreviated as QoRS) cannot be judged effectively, because the uncertainty of recommendation information is not considered. Literatures [12,13] comprehensively summarize P2P trust mechanisms in distributed computing environments, but for lack of research and analysis of collaborative cheating, they cannot effectively reduce the growing problem of malicious recommendations in the network. Literature [14] proposes a trust model based on transaction content similarity; the model represents the trust degree of a recommendation by means of service content similarity and improves the reliability of trust recommendation. However, due to the absence of appropriate incentive and punishment mechanisms, it still has limitations in reducing malicious recommendation. Literature [15] proposes a new trust model for P2P networks based on advanced D-S evidence theory.
Though that model improves the credibility of node evaluation, it cannot effectively mobilize node positivity and thus can hardly ensure the healthy development of the network, for lack of incentive and punishment mechanisms for nodes. Literature [16] proposes a dynamic trust evaluation model for distributed computing environments, based on Dempster-Shafer theory and Shapley entropy, referring to the human trust relationship model of sociology. The trust evaluation model uses historical interaction information to compute direct trust. Shapley entropy evaluates the quantity of information of each node's direct trust function, and the direct trust is then revised in consideration of each node's reliability and its quantity of information. Afterwards, the integrative trust is computed from the revised direct trust of all nodes according to the Dempster rules. In addition, some researchers analyze trust security models in terms of social networks; for example, Sabater and others propose the reputation-based trust system REGRET [17], which uses social network analysis and a hierarchical ontology structure to integrate reputations of different types and calculate the final node trust value.

From the above analysis, although past trust models have reduced some malicious services and fraudulent actions, they do not restrain the malicious recommendation actions of network nodes well. We find that current trust models have several problems:

(1) Evaluation in current trust models is aimed at the two parties of a transaction, with little evaluation of recommendation nodes, so it is difficult for these models to find malicious recommendation nodes.
(2) QoRS is positively correlated not only with the recommendation success ratio and the reliability of recommendation nodes, but also with the degree of improvement of recommendation ability. Current evaluation methods for QoRS depend only on whether a transaction succeeds, so they are monotonous and inaccurate, and their evaluation effect is poor.

(3) The incentive and penalty mechanisms of current models depend only on raising or lowering the trust values of transaction nodes; these models lack the means to encourage active recommendation by recommendation nodes.

To solve these issues, this paper proposes a dynamic recommendation trust evaluation model based on bidding for the E-Commerce environment. Its novel contributions are as follows:

(1) We propose a dynamic recommendation trust evaluation model based on the bidding mechanism of e-commerce. Through bidding, many recommendation nodes can gain corresponding rewards from transactions, so the model encourages a large number of recommendation nodes to recommend actively.

(2) We propose a novel Markov-based evaluation method for QoRS. The method integrates the recommendation success ratio and the degree of improvement of recommendation service ability. It effectively overcomes the monotony and inaccuracy of existing evaluation, reflects that a recommendation from a higher-trust node is more trustworthy than one from a lower-trust node, and effectively avoids the rise of a "recommendation oligarch" in the e-commerce network.

(3) We distinguish recommendation nodes into acquaintance recommendation nodes and stranger recommendation nodes, and further distinguish acquaintance recommendation nodes into direct and indirect acquaintance recommendation nodes. We can thus effectively ensure the trustworthiness of a recommendation according to the degree of acquaintance between the recommendation node and the evaluator.
This paper builds a dynamic trust evaluation model based on the bidding idea for E-Commerce under the social network environment. Section I summarizes related work on current trust models. Section II introduces related key concepts and characteristics. Section III puts forward the novel dynamic recommendation trust evaluation model based on bidding. Section IV introduces the evaluation method for recommendation service quality. Section V verifies the effectiveness of the proposed model through simulation experiments. Section VI concludes the paper and outlines future work.

II. RELATED CONCEPTS AND ATTRIBUTES

Definition 2.1. Trust is a type of faith and reliance, which reflects the subject's degree of knowledge of the identity and behavioral intention of the object in a specific time and context.

Definition 2.2. Reputation, also known as prestige, is the collection of all recommendation values for an entity and a reflection of the trust that subjects have in the object.

Definition 2.3. Direct Trust, also known as local trust, refers to the judgment the subject makes of the object's ability, trustworthiness and reliability, based on experience the subject acquires directly from contact and transactions with the object.

Definition 2.4. Indirect Trust, also called recommendation trust, refers to the judgment of the subject on the object's ability, trustworthiness and reliability through the recommendation of a third party. The relationship between direct trust and recommendation trust is shown in Figure 1.

Figure 1. Local Trust and Recommendation Trust (direct-trust and recommendation-trust relations among nodes A, B, C and D)

Definition 2.5. Service Provider, also known as the target node, refers to the node which provides the source service and whose trust is evaluated by the service requestor in the e-commerce trust network; denoted by SP.

Definition 2.6. Service Recommendation node refers to the node which provides a recommendation of a service source for the service requestor in the e-commerce trust network, in order to gain related economic interest and trust degree; denoted by SR.

Definition 2.7. Service Requestor, also known as the Evaluator, refers to the node which evaluates the trust of a Service Provider in the e-commerce trust network; denoted by E.

Definition 2.8. Recommendation Oligarch refers to a recommendation node whose recommendations play the key role in the network's recommendations and who monopolizes the right to recommend for the whole network.

Trust is a dualistic relationship, which can be one-to-one, one-to-many (individual to group), many-to-one (group to individual) or many-to-many (group to group). Trust is a complicated concept that bears both subjectivity and objectivity. This paper summarizes the following key attributes of trust:

(1) Subjectivity. Subjective trust is a cognitive phenomenon of human beings about the real world: the subjective judgment of the subject on particular traits or behaviors of the object at a specific level. Different subjects make different subjective judgments about the same object.

(2) Conditional Transitivity. Transitivity only takes place under specific conditions. For instance, if entity A trusts entity B and entity B trusts entity C, it cannot be inferred that entity A trusts entity C.

(3) Asymmetry. Entity A trusting entity B does not mean that entity B necessarily trusts entity A.

(4) Context. Trust is closely related to the context and environment the subject is in. Under different contexts or backgrounds, the trust can be different.

(5) Domain Correlation. Trust can differ as the domain and scope change. For example, a medical doctor enjoys a high degree of trust in the medical field, but the degree of trust in him may be very low in the domain of music.

(6) Content Correlation. That entity A trusts entity B does not mean A trusts all behaviors of B. As a matter of fact, entity A trusts entity B only in a specific context for a particular type of behavior. For instance, A trusts B and is willing to lend him a bicycle, because A believes that B will surely return the bicycle; on the other hand, A does not lend B ten thousand dollars, because A doubts whether B will be able to return ten thousand dollars.

III. DYNAMIC RECOMMENDATION TRUST EVALUATION MODEL

To effectively solve the problem of collaborative cheating by malicious nodes, we need effective assessment and management of network nodes. From the viewpoint of social psychology and organizational behavioral science, network nodes participating in a service, whether good or malicious, always act with a certain purpose: to obtain corresponding benefits. Based on this analysis, and borrowing the idea of bidding, the long-term gains of a normal service node are made higher than those of an abnormal one. (Service here includes two cases: providing a resource service and providing a recommendation service.) The benefits include two aspects: good nodes profit from their excellent recommendations, and the credibility of good recommendation nodes improves.

Definition 3.1. The trust evaluation management model, which contains the three entities of service requestor (also known as evaluator), service provider (also known as resource entity) and service recommendation nodes, is a new network security model based on social psychology, built according to the historical interaction relations of the three entities.

Definition 3.2. Direct acquaintance nodes are those nodes that have already had direct interaction with the resource providers.

Definition 3.3.
Indirect acquaintance nodes are those nodes that have had direct interaction with a direct acquaintance node but no direct interaction with the evaluator. The remaining recommendation nodes, apart from direct and indirect acquaintance nodes, are stranger nodes.

From social psychology, recommendation reliability and trustworthiness depend to a great extent on the familiarity between recommendation nodes and the evaluator. If there is direct interaction experience between a recommendation node and the evaluator, its recommendation is more reliable than that of a recommendation node without direct interaction experience; the reliability of an acquaintance's recommendation is higher than that of a stranger's, and the reliability of a direct acquaintance's recommendation is higher than that of an indirect acquaintance's. In the e-commerce network environment, the recommender is the recommendation node, the evaluator is the evaluation node, and the evaluated object is the service-providing node, also known as the target node. Recommendation nodes are classified into direct acquaintance nodes, indirect acquaintance nodes and stranger recommendation nodes. Their relations are shown in Figure 2.

Figure 2. Relationships between Network Nodes (the service requestor/trust evaluator holds direct trust in direct acquaintance nodes and indirect trust in indirect acquaintance nodes, and must decide whether to trust the service provider/target node; the remaining nodes are strangers)

Here, the service requestor is the requestor of the service, the service provider is the provider of the service, and the service recommendation nodes are entities that recommend services to the service requestor. When there are direct transaction experiences between service requestor and service provider, we say the two parties have a direct trust relation; otherwise, they have an indirect trust relation.
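The classification above can be sketched in code. This is an illustrative sketch, not the paper's implementation: it assumes acquaintance is judged relative to the evaluator from simple sets of past interaction partners, and all function and variable names are mine.

```python
# Illustrative sketch (not the paper's code): classifying recommendation
# nodes as direct acquaintance, indirect acquaintance, or stranger,
# given a map from each node to the set of its direct interaction partners.

def classify_recommenders(evaluator, recommenders, interacted_with):
    direct = interacted_with.get(evaluator, set())
    # nodes reachable through exactly one direct acquaintance
    indirect = set()
    for d in direct:
        indirect |= interacted_with.get(d, set())
    indirect -= direct | {evaluator}
    labels = {}
    for r in recommenders:
        if r in direct:
            labels[r] = "direct_acquaintance"
        elif r in indirect:
            labels[r] = "indirect_acquaintance"
        else:
            labels[r] = "stranger"
    return labels

interactions = {
    "E": {"n1", "n4"},
    "n1": {"E", "n3"},
    "n4": {"E"},
}
print(classify_recommenders("E", ["n1", "n3", "n6"], interactions))
```

Here n1 is a direct acquaintance of E, n3 is reachable only through n1 and so is an indirect acquaintance, and n6 is a stranger.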
A. The Basic Process of Selecting Bidding Nodes

In the dynamic recommendation trust model, the basic process of selecting bidding nodes is as follows. First, the evaluator sends out bidding information. Second, service-providing nodes choose whether to bid according to the bidding information; bidding nodes fall into two categories, resource-service providing nodes and recommendation bidding nodes. After the evaluator receives bidding applications from service-providing nodes, it queries and evaluates the trust values of the bidding service nodes. Third, trusted bidding nodes are selected for contract and service.

Definition 3.4. A service bidding message is defined as the following tetrad:

SBM = (Evaluator_ID, ServiceContents, TimeStrap, Reward)   (1)

Here, Evaluator_ID is the evaluator identity, that is, the service requestor identity; ServiceContents is the service content requested by the service requestor; TimeStrap is the bidding time window; Reward is what the service provider and the service recommendation nodes obtain from the evaluator, and it includes two sides: corresponding funds from the evaluator, and a trust evaluation from the evaluator. Correspondingly, Reward represents the corresponding punishment for false recommendation nodes and malicious service nodes.

Definition 3.5. A resource service application is defined as the following triple:

RSA = (Service_Provider_ID, ServiceContents, Response_Time)   (2)

Here, Service_Provider_ID refers to the identity of the resource-service bidding node; ServiceContents refers to the service content provided by Service_Provider_ID; Response_Time refers to the feedback time of the bidding node.

Definition 3.6.
A recommendation service application is defined as the following tetrad:

RSA' = (Recommendator_ID, Service_Provider_ID, Recommendation_Trust, Recommendation_Response_Time)   (3)

Here, Recommendator_ID refers to the identity of the recommendation service node; Service_Provider_ID refers to the identity of the resource provider; Recommendation_Trust refers to the recommendation trust; Recommendation_Response_Time refers to the feedback time of the recommendation node.

B. Trust Computing

Trust computing consists of direct trust computing and recommendation trust computing (also known as indirect trust computing). By integrating direct trust and recommendation trust, the service requestor (or evaluator) obtains the global trust value of the service provider. From the research of Literature [14], we find two factors closely related to global trust. One is the objectivity and accuracy of the evaluations of the service provider coming from the recommendation service nodes and the service requestor, which depends on the similarity of service contents between each evaluator and the service provider, and between each recommendation service node and the service provider, over a period of time. The other is the familiarity between the service requestor and the recommendation nodes. This also conforms to social psychology: the trustworthiness of an acquaintance node's evaluation is generally higher than that of a stranger node, and the evaluation of a direct acquaintance node is comparatively more reliable than that of an indirect acquaintance node.

(1) Direct trust computing

The so-called direct trust DT^E_SP refers to the direct trust value determined by the history of interactions between evaluator E and the service-providing node SP.
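The three message formats of Definitions 3.4-3.6 can be sketched as plain data records. This is an illustrative sketch, not the paper's implementation; the field names simply mirror the tetrads and triple above, and the types are assumptions.

```python
# Illustrative sketch of the bidding messages in Definitions 3.4-3.6.
# Field names mirror the paper's tetrads/triple; types are assumptions.
from dataclasses import dataclass

@dataclass
class ServiceBiddingMessage:             # SBM, Eq. (1)
    evaluator_id: str                    # service requestor identity
    service_contents: str                # requested service content
    time_strap: float                    # bidding time window
    reward: float                        # funds / trust evaluation offered

@dataclass
class ResourceServiceApplication:        # RSA, Eq. (2)
    service_provider_id: str
    service_contents: str
    response_time: float                 # feedback time of the bidding node

@dataclass
class RecommendationServiceApplication:  # RSA', Eq. (3)
    recommendator_id: str
    service_provider_id: str
    recommendation_trust: float
    recommendation_response_time: float

sbm = ServiceBiddingMessage("E", "video hosting", 30.0, 5.0)
print(sbm.evaluator_id, sbm.reward)
```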
The direct trust is computed as:

DT^E_SP = (S^E_SP + 1) / (G^E_SP + 2)   (4)

G^E_SP = S^E_SP + F^E_SP   (5)

Here DT^E_SP is the direct trust value; S^E_SP is the number of interactions with which E and SP have been happy, i.e., the successes; G^E_SP is the overall number of transactions between E and SP; F^E_SP is the number of interactions with which E and SP have been unhappy, i.e., the failures. When G^E_SP = 0, DT^E_SP = 0.5. Note that DT^E_SP is not equal to DT^SP_E.

Since trust is a dynamic variable with the characteristic of time decay, this paper introduces an influence function of time:

f(k) = 1 / (n - k + 1)   (6)

Interaction information from different times has different influence on trust computing, so a time impact factor is applied at each time so that the node trust computation is more accurate than plain historical trust computation. In a given period of time t_k, the success number S^k_{E,SP} and failure number F^k_{E,SP} weighted by the influence function of time f(k) give formula (7):

S^E_SP = sum_{k=1..n} S^k_{E,SP} * f(k),   F^E_SP = sum_{k=1..n} F^k_{E,SP} * f(k)   (7)

Substituting these into formula (4), the direct trust of node E in node SP at the k-th time is computed as

DT^k_{E,SP} = (s^k_{E,SP} + 1) / (s^k_{E,SP} + f^k_{E,SP} + 2)   (8)

(2) General recommendation trust computing

From Literature [14], the general recommendation trust is computed as

RT_SP = sum_{i,j=1, i != j}^{n} Sim_{i,j} * (alpha^i_SP * DT^i_SP + beta^j_SP * DT^j_SP) / n   (9)

where RT_SP is the total recommendation trust evaluation for node SP; Sim_{i,j} refers to the similarity of the service contents C^i_SP and C^j_SP between node i and SP and between node j and SP; and alpha^i_SP and beta^j_SP respectively refer to the acquaintance recommendation weight and the stranger recommendation weight. The acquaintance recommendation weight, which represents the trust degree of the recommending node for the recommended node, is set as

alpha^i_SP = DT^i_SP, if SR_i = DASR_i (a direct acquaintance node);
alpha^i_SP = (1/n) * sum_{i=1..n} DT^i_SP, if SR_i = IASR_i (an indirect acquaintance node)   (10)

The stranger recommendation weight beta^j_SP is set at 0.5. If DT^i_SP = 0, the node is a newly joining node or a dormant node. Because recommendation paths of acquaintance nodes may cross, we set a threshold epsilon for alpha: when alpha < epsilon, the recommendation path is abandoned. In the process of trust recommendation, because recommendation paths may cross or be independent, the recommendation path with the highest trust value is selected. Among the recommendation paths shown in Figure 3, path A-C-E1-B will be selected.

Figure 3.
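The direct trust computation with time decay can be illustrated in code. This is a sketch under the assumptions that the smoothed ratio is DT = (S + 1) / (S + F + 2), that the decay weights are f(k) = 1 / (n - k + 1), and that per-period counts are weighted before the ratio is taken; the function and variable names are mine, not the paper's.

```python
# Illustrative sketch of direct trust with time decay: per-period
# success/failure counts are weighted by a time-influence function,
# then a Laplace-smoothed success ratio is taken.
# The exact decay form and all names are assumptions, not the paper's code.

def time_influence(k, n):
    # assumed decay: recent periods (k close to n) weigh more
    return 1.0 / (n - k + 1)

def direct_trust(successes, failures):
    """successes[k], failures[k]: interaction counts in period k+1 of n."""
    n = len(successes)
    if n == 0:
        return 0.5          # no history: neutral trust
    s = sum(successes[k] * time_influence(k + 1, n) for k in range(n))
    f = sum(failures[k] * time_influence(k + 1, n) for k in range(n))
    # smoothed success ratio
    return (s + 1.0) / (s + f + 2.0)

# A node with mostly successful, recent interactions scores high
print(direct_trust([2, 3, 5], [1, 1, 0]))
print(direct_trust([], []))  # unknown node -> 0.5
```

The smoothing term makes an unknown node start at the neutral value 0.5, matching the G = 0 case above, and recent periods dominate the weighted counts.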
Because there are node crossings in the recommendation paths of acquaintance nodes, we set a threshold ε on α: when α < ε, the recommendation path is abandoned.

(3) Service content similarity computing

This paper computes the similarity of service contents by the vector included-angle cosine method, which improves recommendation credibility. The computing formula is:

cos(a, b) = (a · b) / (|a| * |b|)    (11)

in which a denotes the service vector that the service provider has provided to the recommendation nodes during the period, and b denotes the service vector that the service provider will provide to the evaluator this time; their similarity is the credibility of the evaluator towards the recommendation node. Here a_i = (x_i1, x_i2, x_i3, x_i4) and b_j = (x_j1, x_j2, x_j3, x_j4), 0 <= i, j <= n, whose components respectively refer to Service Contents, Response Time, Service Completion, and Service Cost. The index-set weights of the service vector differ across distributed computing environments. Following the above analysis, the service similarity coefficient of this paper is denoted as:

Sim_{i,j} = Σ_{k=1}^4 (ω_k x_ik)(ω_k x_jk) / sqrt( Σ_{k=1}^4 (ω_k x_ik)^2 * Σ_{k=1}^4 (ω_k x_jk)^2 )    (12)

in which ω_k denotes the weight of the k-th index.

(4) Global trust

The global trust GT^E_O is the trust value of the service provider obtained through the fusion of recommendation trust and direct trust. Its computation equation is as follows:
GT^E_O = α * DT^E_O + β * RT_O    (13)

in which α + β = 1 and α, β ∈ [0, 1]; α is the weight factor of direct trust and β is the weight factor of recommendation trust. Because trust is a multi-attribute object and attenuates gradually with time, α and β change dynamically with the number of interactions and other factors. As the number of interactions between evaluator and service provider increases step by step, α becomes larger and β smaller, so the proportion of direct trust grows while the proportion of recommendation trust shrinks:

α = n_k / (n_k + 1),  n_k >= 0    (14)

C. Trust Update

The update of trust values includes two aspects: one is the update of the trust value of the service provider, and the other is the update of the trust values of the recommendation nodes. According to the final interaction outcome and the global trust value, the trust value of the service provider is updated positively (or negatively), and the recommendation trust of each recommendation node is updated according to equation (15):

Δ^E = | RT_SP − R^E_SP |    (15)

in which RT_SP represents the general recommendation trust value over all recommendation nodes, R^E_SP represents the recommendation trust value of recommendation node E towards service provider SP, and Δ^E is the difference between them. We set an update threshold θ. When Δ^E is bigger than θ, the recommendation trust of recommendation node i is far from the general recommendation trust, and node i is an untrustworthy node; when Δ^E is smaller than or equal to θ, the two are close, the recommendation trust of the node is more accurate, and node i is a trusted node. θ takes the average of all recommendation trust values:

θ = (1/n) * Σ_{i=1}^n r_i    (16)
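The fusion of formula (13) can be sketched as below. The saturating form used for the dynamic weight is our own assumption standing in for formula (14), whose exact closed form is unclear in the source; it only reproduces the stated behaviour that direct trust dominates as interactions accumulate:

```python
def global_trust(direct, recommended, alpha):
    """Global trust fusion of formula (13): GT = alpha*DT + beta*RT,
    with beta = 1 - alpha and alpha in [0, 1]."""
    return alpha * direct + (1 - alpha) * recommended

def direct_weight(n_interactions):
    """Assumed saturating weight illustrating formula (14): the share of
    direct trust grows, and that of recommendation trust shrinks, as the
    number of direct interactions increases."""
    return n_interactions / (n_interactions + 1)

# After 9 direct interactions, direct experience dominates the fused value.
gt = global_trust(0.9, 0.4, direct_weight(9))   # 0.9*0.9 + 0.1*0.4 = 0.85
```

With no interaction history direct_weight(0) is 0, so the evaluator relies entirely on recommendations, exactly the newcomer situation described above.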
D. Model Algorithm Description

Input: initialized network nodes
Output: a list of trusted nodes and a list of excellent recommendation nodes

Step (1): The service requestor (that is, the evaluator) publishes service bidding messages to the network; resource service providers receive the messages and feed back their resource service bidding applications.
Step (2): The resource service bidding nodes are fed back to the network and publicized; each recommendation service node evaluates the resource service bidding nodes and sends recommendation service bidding messages to the network.
Step (3): According to the demand conditions of the recommendation service bidding, the recommendation service bidding applications that fall short of the bidding request are filtered out.
Step (4): The recommendation trust values of each resource service bidding node are fused in order to compute its general recommendation trust value.
Step (5): According to the direct interaction history between the resource service bidding node and the evaluator, the global trust value is computed through the fusion of direct trust and recommendation trust.
Step (6): When an interaction is successful, Δ^E is compared with the threshold value θ: if Δ^E <= θ, the trust value is updated positively; otherwise it is updated negatively. According to the quality of recommendation service, a list of excellent recommendation nodes is produced automatically by the system, and the system notifies the evaluator to pay fees to them.

IV. QORS EVALUATION METHOD

It is difficult to measure QoRS in past historical trust models because the recommendation service ability of nodes keeps developing and changing. Based on the analysis results, we find that QoRS depends not only on the accuracy of the recommendation trust service, but also on the degree of improvement of the recommendation service ability of the recommendation node.
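Returning to the model algorithm description above, Steps (4)-(6) can be sketched in code. The data layout, names, and the use of a simple mean for fusing recommendation values are our own illustrative assumptions, not the paper's implementation:

```python
def run_bidding_round(recommendations, direct_trust, alpha):
    """Sketch of Steps (4)-(6): fuse the (already filtered, Step (3))
    recommendation values per bidding provider, combine them with direct
    trust, and keep the recommenders within the update threshold of
    formulas (15)-(16), taken as the average of all recommendation values."""
    # Step (4): general recommendation trust per provider
    rt = {p: sum(v) / len(v) for p, v in recommendations.items()}
    # Step (5): global trust by fusing direct and recommendation trust
    gt = {p: alpha * direct_trust[p] + (1 - alpha) * rt[p] for p in rt}
    # Formula (16): threshold = average of all recommendation values
    all_vals = [v for vals in recommendations.values() for v in vals]
    theta = sum(all_vals) / len(all_vals)
    # Step (6): recommenders close to the consensus remain candidates
    # for the excellent-recommendation-node list.
    excellent = {p: [v for v in recommendations[p] if abs(v - rt[p]) <= theta]
                 for p in rt}
    return rt, gt, excellent

rt, gt, good = run_bidding_round({"SP1": [0.9, 0.1, 0.1, 0.1]}, {"SP1": 0.8},
                                 alpha=0.6)
```

In this example the outlier recommendation 0.9 deviates from the consensus 0.3 by more than the threshold and is excluded, which is the "increased criminal cost" for malicious recommenders described in the editorial.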
Past historical recommendation trust models often ignore the improvement degree of recommendation service ability. We propose a dynamic recommendation evaluation method based on Markov chains, abbreviated as DREM; the method considers not only the accuracy of the recommendation service but also the improvement degree of the recommendation service ability. The basic process for the evaluation of QoRS is as follows:

Step 1: Compute the recommendation accuracy of each trust recommendation node according to formula (15);
Step 2: Compute the improvement degree of the recommendation ability of each recommendation node by DRTEMBB;
Step 3: Modify the recommendation service quality value according to Step 2, and obtain the list of excellent recommendation service nodes and the list of bad recommendation service nodes.

A Markov process is one in which, given that the state of the process (or system) is known at time t0, the conditional distribution of the state at any time t > t0 is unrelated to the states of the process before t0; that is, given the "present" of the process, its "future" does not depend on its "past" [19]. The mathematical definition is as follows:

Definition 4.1: Suppose the state space of the random process {X(t), t ∈ T} is I. If for any n >= 3 values of time t1 < t2 < ... < tn, ti ∈ T, under the conditions X(ti) = xi, xi ∈ I, i = 1, 2, ..., n−1, the conditional distribution function of X(tn) equals exactly its conditional distribution function under the single condition X(t_{n−1}) = x_{n−1}, that is,

P{X(tn) <= xn | X(t1) = x1, X(t2) = x2, ..., X(t_{n−1}) = x_{n−1}} = P{X(tn) <= xn | X(t_{n−1}) = x_{n−1}}    (17)

then we say the process {X(t), t ∈ T} has the Markov property, or that the process is a Markov process; a Markov process in which both time and state are discrete is called a Markov chain.
The steps of DREM are as follows:

Step 1: Evaluation rating. To evaluate the recommendation ability of nodes, this article first builds an evaluation rating and represents the levels with an evaluation index set V = {v1, v2, v3, ..., vn} of recommendation ability.

Step 2: Rating and the proportion of each rating. Count the number of evaluations of the recommendation node at each level: Mi refers to the number of ratings at level i, and S'_i is the proportion of ratings currently at level i:

Σ_{i=1}^n M_i = N,  S'_i = M_i / N    (18)

Step 3: State and state transition. This article uses the vector S^t_w = (S^t_1, S^t_2, ..., S^t_n) to express the evaluation status of the recommendation ability of node w, referred to as the state. The parameter t (a discrete magnitude) represents time, so the k-th evaluation status of node w can be written S^k_w. As t changes, the evaluation status of w changes with the node's behavior, and state transitions occur.

Step 4: Transition probability matrix. If S^k_w is the evaluation status of w at the k-th time and the previous evaluation state vector is S^{k−1}_w, then S^k_w = S^{k−1}_w * P_w, where

P_w = [ P_11 P_12 ... P_1n ; P_21 P_22 ... P_2n ; ... ; P_n1 P_n2 ... P_nn ]    (19)

in which P_ij expresses the transition probability from level i to level j, with 0 <= P_ij <= 1 (i, j = 1, 2, ..., n) and Σ_j P_ij = 1 (i = 1, 2, ..., n).

Because a Markov process has ergodicity [21], as t tends to ∞, S^{t+1}_w = S^t_w = S_w. As long as the steady state S_w = (S1, S2, ..., Sn) is calculated, the degree of improvement of node w can be computed from S_w, which is obtained from equation (20), the Kolmogorov equation [20], in which P_w is the one-step transition probability matrix:

S_w = S_w * P_w,  Σ_{i=1}^n S_i = 1    (20)

F_w = Σ_{i=1}^n S_i * x_i    (21)

If the scores x1, x2, ..., xn are given, the progress results can be quantified by formula (21) as the recommendation ability score F_w. Although we obtain the value of F_w, it does not represent the actual recommendation ability of the node; it only shows the node's degree of improvement during this time. A large F_w only indicates that the degree of improvement is larger compared with the previous period.

Example 4.1. Suppose there are recommendation nodes w1 and w2, and the evaluation status is divided into 5 levels (high trust, trust, basic trust, distrust, high distrust), written (HT, T, BT, DT, HDT). The numbers and state transitions of the evaluations of nodes w1 and w2 are shown in Table I and Table II.

Table I.
Evaluation of Node w1

Previous     Current evaluation              Total
evaluation   HT    T    BT   DT   HDT
HT            4    1    0    0    0           5*
T             2   18    1    0    0          21
BT            0    2    2    0    0           4
DT            0    0    0    0    0           0
HDT           0    0    0    0    0           0
Total         6   21    3    0    0          30

Table II. Evaluation of Node w2

Previous     Current evaluation              Total
evaluation   HT    T    BT   DT   HDT
HT            1    1    0    0    0           2*
T             2   24    0    0    0          26
BT            1    0    1    0    0           2
DT            0    0    0    0    0           0
HDT           0    0    0    0    0           0
Total         4   25    1    0    0          30

(In Table I and Table II, the figure marked "*" means, for example, that five evaluators rated the node in the previous evaluation, all giving "HT"; in the current evaluation, four of those five evaluators give "HT" and one gives "T". The other rows are read the same way.)
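The row-normalisation that turns the transition counts of Tables I and II into the one-step transition probability matrix of formula (19), together with the steady-state scores of formulas (20)-(21), can be sketched as follows. The counts are those reconstructed from the tables, restricted to the three levels that actually occur; a simple power iteration stands in for solving the Kolmogorov equation directly:

```python
def transition_matrix(counts):
    """Row-normalise a table of transition counts into the one-step
    transition probability matrix P of formula (19); rows with no
    observations are left all-zero."""
    P = []
    for row in counts:
        total = sum(row)
        P.append([c / total if total else 0.0 for c in row])
    return P

def steady_state(P, iters=200):
    """Approximate the stationary vector S = S * P of equation (20)
    by power iteration from a uniform start."""
    n = len(P)
    S = [1.0 / n] * n
    for _ in range(iters):
        S = [sum(S[i] * P[i][j] for i in range(n)) for j in range(n)]
        norm = sum(S)
        S = [x / norm for x in S]
    return S

# Counts from Table I (w1) and Table II (w2): rows HT, T, BT (previous
# level), columns HT, T, BT (current level); DT and HDT never occur.
P_w1 = transition_matrix([[4, 1, 0], [2, 18, 1], [0, 2, 2]])
P_w2 = transition_matrix([[1, 1, 0], [2, 24, 0], [1, 0, 1]])

scores = [90, 80, 60]   # rating scores for HT, T, BT
F_w1 = sum(s * p for s, p in zip(scores, steady_state(P_w1)))
F_w2 = sum(s * p for s, p in zip(scores, steady_state(P_w2)))
```

The iteration reproduces the values computed in the text below, F_w1 about 81.8 and F_w2 about 81.3.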
The actual statuses of the last evaluation and of the current evaluation are:

S_w1 = (5/30, 21/30, 4/30, 0, 0),  S'_w1 = (6/30, 21/30, 3/30, 0, 0)
S_w2 = (2/30, 26/30, 2/30, 0, 0),  S'_w2 = (4/30, 25/30, 1/30, 0, 0)

The transition probability matrices P_w1 and P_w2 are as follows:

P_w1 =
[ 4/5    1/5    0     0  0 ]
[ 2/21   18/21  1/21  0  0 ]
[ 0      1/2    1/2   0  0 ]
[ 0      0      0     0  0 ]
[ 0      0      0     0  0 ]

P_w2 =
[ 1/2    1/2    0     0  0 ]
[ 1/13   12/13  0     0  0 ]
[ 1/2    0      1/2   0  0 ]
[ 0      0      0     0  0 ]
[ 0      0      0     0  0 ]

From (I − P_w)^T S_w^T = 0 we obtain the steady state of w1, S_w1 = (10/33, 7/11, 2/33, 0, 0), and by the same argument S_w2 = (2/15, 13/15, 0, 0, 0). If the specific scores of the evaluation ratings are HT = 90, T = 80, BT = 60, DT = 50, HDT = 30, the recommendation ability scores of nodes w1 and w2 are F_w1 = 81.8 and F_w2 = 81.3, respectively; the improvement of node w1 is faster than that of node w2.

(5) Comprehensive evaluation of QoRS. After the recommendation service accuracy of each node is calculated from formula (15), Δ^i = |RT_SP − R^i_SP|, the recommendation nodes whose recommendation accuracy is higher than the others are selected, and then the list of excellent recommendation nodes is modified by adding the nodes whose recommendation service is improving faster than the others. This improves the comprehensive quality of the evaluation of recommendation service, and finally we obtain the list of excellent recommendation nodes.

V. SIMULATION EXPERIMENT AND RESULTS ANALYSIS

Our simulation experiments show that the proposed trust evaluation model can effectively restrain strategic deception and malicious recommendation by nodes, and can adapt to them dynamically. We simulate the operation of the DRTEMBB model and verify the experimental results by operating the model in a small-scale network. The experiment involves 100 nodes and 100 services.
These services are randomly assigned to the nodes, making sure that each node is assigned at least one service. Each simulation comprises several cycles, and there must be at least one interaction at each node within each cycle. The initial trust value of each node is set at 0.5, indicating that it is equally likely to be trustworthy or untrustworthy. The simulation is conducted in a Java environment on a machine with a 3.0 GHz CPU and 2 GB of memory.

Experiment 5.1: Analysis of strategic deception by malicious nodes. This experiment addresses the issue of strategic deception by malicious nodes over time; it inspects the sensitivity of the DRTEMBB model to strategic deception and analyzes the differences between the DRTEMBB model, the EigenRep model, and the general model under strategic deception. The experiment assumes that malicious nodes suddenly adopt cheating service actions after accumulating higher trust through successful interactions over a period of time.

We find that when the rate of malicious nodes is set at 40% and the number of cycles exceeds 50, the transaction success ratio of EigenRep begins to decrease while DRTEMBB remains stable. As the cycles increase, the effect of DRTEMBB is always better than that of the EigenRep model; the DRTEMBB model demonstrates a very good success ratio as the transaction cycles increase.

[Figure 4: trust value versus time cycle (5 to 50) for the DRTEMBB, EigenRep, and general models.]

Figure 4. Trust Curve of Malicious Peers

From Figure 4, we find that the DRTEMBB model is more sensitive to sudden changes in the service behavior of malicious nodes as the transaction time cycle increases.
When malicious nodes adopt intermittent cheating service actions, their trust declines rapidly, and as time goes by a node's trust declines faster than it can be restored. From the experimental results we find that the DRTEMBB model can effectively identify strategically cheating nodes, and its effect is better than that of the EigenRep model and the general model.

Experiment 5.2: Analysis of the scale of malicious recommendation nodes.

[Figure 5: transaction success ratio versus the percentage of malicious nodes (0 to 1) for the DRTEMBB and EigenRep models.]

Figure 5. Success Ratio when Malicious Nodes Are Changed

From Figure 5, we find that the success ratio of transactions decreases rapidly as the proportion of malicious nodes continues to rise. Even when the proportion of malicious nodes exceeds 60%, the DRTEMBB model maintains a quite high transaction success ratio. Whether the malicious nodes act individually or collaboratively, the DRTEMBB model demonstrates a good inhibitory effect in the experimental results.

Experiment 5.3: Analysis at a fixed rate of malicious nodes. We fix the rate of malicious nodes in this experiment in order to observe the effect of DRTEMBB over the cycles, shown in Figure 6.

[Figure 6: transaction success ratio versus cycles (10 to 50) for the DRTEMBB and EigenRep models at a fixed malicious node rate.]

Figure 6. Transaction Success Ratio over Cycles at a Fixed Malicious Node Rate

Experiment 5.4: Analysis of malicious recommendation attacks.

[Figure 7: transaction success ratio versus the percentage of malicious recommendation nodes (10 to 50%) for the DRTEMBB and Hassan models.]

Figure 7. Impact of the Malicious Recommendation Node Percentage on the Transaction Success Rate

Figure 7 shows that, as the dishonest node ratio increases, the accuracy of trust computing declines gradually, which leads to an increasing number of transaction failures and a decreasing success ratio. Since each dishonest recommendation also influences the trust value of the recommender itself, self trust values decrease as the dishonest node ratio increases, and the decline in the transaction success ratio grows larger.
The Hassan model has some resistance, but because it assumes that recommendation nodes have higher trust values and lacks a punishment mechanism, it cannot effectively weaken the influence of dishonest recommendations on trust, and its transaction success ratio decreases rapidly. The DRTEMBB model shows that dishonest nodes can be shielded by the other, honest nodes on the recommendation path, so its decline in transaction success ratio is slower than that of the Hassan model. In contrast to the Hassan model, the DRTEMBB model can filter malicious recommendation nodes more effectively and make recommendation information more accurate by introducing trade-goods content similarity and dynamic adjustment of trust values.

VI. CONCLUSION AND FURTHER WORKS

Trust security is an important research domain in the e-commerce environment. Although many researchers have proposed trust security models, many questions remain in these models. We therefore propose a dynamic recommendation trust evaluation model, together with a novel evaluation method of QoRS within the model. The experimental results show that our model is effective at restraining malicious recommendation and fraudulent service actions. At the same time, the novel evaluation method of QoRS can be effectively used to
ascertain the improvement degree of recommendation service ability; the evaluation of QoRS becomes accurate and objective by integrating the recommendation success ratio and the improvement degree of recommendation service ability. The method effectively solves the problem of assuming that the recommendation of a higher-trust node is always better than the others, and at the same time avoids the rise of recommendation oligarchs. In addition, we propose a bidding method for recommendation nodes, which can effectively encourage recommendation nodes to actively make correct recommendations. Our future work will analyze other methods for the recommendation evaluation model, such as multi-attribute analysis evaluation and fuzzy mathematical analysis of the weights of multiple attributes.

ACKNOWLEDGMENT

This work was supported by a grant from the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2012ZX03002001-004) and the National Natural Science Foundation of China (No. 60873071, No. 61172090). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

REFERENCES

[1] M. Blaze, J. Feigenbaum, J. Lacy. Decentralized trust management. In Proc. 17th IEEE Symposium on Security and Privacy. Los Alamitos: IEEE Computer Society Press, 1996, pp. 164-173.
[2] M. Blaze, J. Feigenbaum, A. D. Keromytis. KeyNote: Trust management for public-key infrastructures. In: B. Christianson, B. Crispo, S. William, et al., eds. Cambridge 1998 Security Protocols International Workshop. Berlin: Springer-Verlag, 1999, pp. 59-63.
[3] Chen Chao, Wang Ru-Chuan, Zhang Lin, Deng Yong. A kind of trust appraisal model based on bids in grid. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2009, 29(5): 59-64.
[4] A. Jøsang. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 9, no.
3, June 2001, pp. 279-311.
[5] L. Xiong, L. Liu. PeerTrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Trans. on Knowledge and Data Engineering, 2004, 16(7): 843-857.
[6] S. D. Kamvar, M. T. Schlosser. EigenRep: Reputation management in P2P networks. In Proc. of the 12th Int'l World Wide Web Conf. Budapest: ACM Press, 2003, pp. 123-134.
[7] Dou Wen, Wang Huai-Min, Jia Yan, Zhou Peng. A recommendation-based peer-to-peer trust model. Journal of Software, 2004, 15(4): 571-583.
[8] Tang Wen, Chen Zhong. Research of subjective trust management model based on the fuzzy set theory. Journal of Software, 2003, 14(8): 1401-1408 (in Chinese).
[9] M. Iguchi, M. Terada, K. Fujimura. Managing resource and service reputation in P2P networks. In Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
[10] Wang Yu, Zhao Yue-long, Hou Fang. A new security trust model for peer-to-peer e-commerce. In Proceedings of the International Conference on Management of e-Commerce and e-Government (ICMECG '08), 2008.
[11] Tian ChunQi, Jiang JianHui, Hu ZhiGuo, et al. A novel super-peer based trust model for peer-to-peer networks. Chinese Journal of Computers, 2010, 33(2): 345-355 (in Chinese).
[12] Li Xiao-Yong, Gui XiaoLin. Research on dynamic trust model for large scale distributed environment. Journal of Software, 2007, 18(6): 1510-1521 (in Chinese).
[13] Li YongJun, Dai YaFei. Research on trust mechanism for peer-to-peer network. Chinese Journal of Computers, 2010, 33(3): 390-405.
[14] Wang Gang, Gui XiaoLin, Wei GuangFu. A recommendation trust model based on e-commerce transactions content-similarity. In Proc. 2010 International Conference on Machine Vision and Human-machine Interface, 2010.
[15] Tian ChunQi, Zhou ShiHong, Wang WenDong, Cheng ShiRui. A new trust model based on advanced D-S evidence theory for P2P networks.
Journal of Electronics & Information Technology, 2008, 30(6): 1480-1484.
[16] Zhu YouWen, Huang LiuSheng, Chen GuoLiang, Yang Wei. Dynamic trust evaluation model under distributed computing environment. Chinese Journal of Computers, 2011, 34(1): 55-64.
[17] J. Sabater, C. Sierra. REGRET: A reputation model for gregarious societies. In Fourth Workshop on Deception, Fraud and Trust in Agent Societies, 2001, pp. 61-70.
[18] (America) Paul A. Samuelson, William D. Nordhaus; translated by Gao Hongye. Economics (12th Edition). China Development Press, 1992.
[19] Zhang Heng. An application of Markov chains. Journal of Changchun Institute of Optics and Fine Mechanics, 1994, 17(3): 44-49.
[20] Zhang XiaoCheng, Li Jing, Gen Wei. Economic Applied Mathematics. Nankai University Press, 2008.
[21] Wang RongXin. Random Processes. Xi'an JiaoTong University Press, 1987.

Gang Wang, born in 1974, Ph.D. candidate. His current research interests mainly include trust security, service computing, the Internet of Things, and cloud computing.

XiaoLin Gui, born in 1966, Ph.D., Professor, Ph.D. supervisor; chair professor of the Department of Computer Science, Xi'an Jiaotong University. His research interests include service computing, the Internet of Things, dynamic trust management, and cloud computing.

Block-Based Parallel Intra Prediction Scheme for HEVC

Jie Jiang
Institute of Intelligence Control and Image Engineering, Xidian University, Xi'an, China
Email: [email protected]

Baolong Guo and Wei Mo
Institute of Intelligence Control and Image Engineering, Xidian University, Xi'an, China
Email: {blguo, wmo}@xidian.edu.cn

Kefeng Fan
Guilin University of Electronic Technology, Guilin 541004, China
Email: [email protected]

Abstract — Advanced video coding standards have become widely deployed in numerous products, such as multimedia services, broadcasting, mobile television, video conferencing, surveillance systems, and so on.
New compression techniques are gradually included in video coding standards, so that a 50% compression rate reduction is achievable every ten years. However, dramatically increased computational complexity is one of the many problems brought by this trend. With the recent advancement of VLSI (Very Large Scale Integration) semiconductor technology contributing to the emerging digital multimedia world, this paper investigates an efficient parallel architecture for the emerging high efficiency video coding (HEVC) standard to speed up the intra coding process, without ignoring any prediction modes. Parallelism is achieved by limiting the reference pixels of the 4 × 4 subblocks, allowing the subblocks to use different direction modes to predict the residuals. Experimental implementations of the proposed algorithm are demonstrated on a set of widely used and freely available video test sequences. The results show that the proposed algorithm can achieve satisfying intra parallelism without any significant performance loss.

Index Terms — HEVC, intra coding, parallel architecture, multiple directions.

I. INTRODUCTION

The continuous emergence of video coding standards and the growth in their development and implementation technology have undoubtedly created a completely new world of multimedia. So far, contributions to video coding technology have mainly focused on improving coding efficiency. The challenges remain not only to find efficient coding algorithms with high performance but also to speed up the coding process. The ongoing video coding standard, High Efficiency Video Coding (HEVC) [1], is attracting increasing attention due to its high compression efficiency. However, the computational complexity of HEVC can be 2-10 times higher than that of its predecessor, which is considered an obstacle to implementing it in real time. Therefore, many research works focus on how to reduce the computational complexity. The purpose of these works is to design and evaluate new methods that reduce encoder complexity while keeping the quality of reconstructed video sequences for intra coding. They generally fall into two categories:
1. Fast mode decision approaches with early termination using adaptive thresholds or an optimized Lagrangian rate-distortion optimization (RDO) function [2-4].
2. Parallel architectures to speed up the intra prediction process [5-14].
With the recent advancement of VLSI semiconductor technology, research on parallel architectures is receiving more attention. In this paper, we focus on the second category and present a block-based parallel architecture to speed up intra prediction for HEVC.

The remainder of this paper is organized as follows: Section II reviews the state of the art in parallel architectures. Section III introduces spatial prediction in HEVC. Section IV presents the proposed scheme, including 2X parallel intra prediction and its expansion to 4X parallelism. Experimental results are presented in Section V. Finally, we conclude this paper in Section VI.

© 2012 ACADEMY PUBLISHER doi:10.4304/jmm.7.4.289-294

II. RELATED WORK

Mainstream image compression and the intra frames of video compression extensively adopt a block-based structure from prediction and transform to entropy coding, where the coding of one block depends on the availability of its left, upper-left, and upper-right blocks. Such a highly dependent structure is not well suited for parallelization, especially for ASIC (Application Specific Integrated Circuit) solutions. Even so, now that dual-core and quad-core computers are available, there are still many efforts to parallelize encoding and decoding from different aspects, as described below.
1.
GOP (Group of Pictures) approaches: Barbosa [5] and Vander [6] propose partitioning a sequence into GOPs. The correlation between GOPs is low, so this not only limits error propagation but also supports parallel coding. However, the data of all the pictures in a GOP must be available before parallel processing can start. When the GOP contains too many pictures, this leads to serious delay, which is inconvenient for real-time video coding applications.
2. Frame approaches: Chen [7] proposes realizing parallel coding at the frame level. This approach is limited by the correlation between frames; therefore, the speedup cannot increase linearly with the number of processor cores.
3. Pipeline approaches: Gulati [8] and Klaus [9] propose organizing the prediction, transform, and entropy coding of macroblocks (MBs) as a pipeline and assigning the stages to multiple cores for parallel computing. This class of approaches achieves only limited parallelism if workloads are unbalanced across cores. A two-times speedup is reported for high-definition sequences on general-purpose quad-core computers in [9].
4. Slice partitioning (SP) approaches: Rodriguez [10] proposes partitioning an image into regions referred to as slices. The coding of slices can be carried out independently by different cores. Slices provide good parallelism but result in a significant loss of coding performance if there are too many of them.
5. MB-reordering (MR) approaches: Despite these efforts to parallelize existing coding algorithms, the strong dependency among blocks in intra prediction and the filtering of top/left reconstructed samples make the intra luma prediction of small blocks a challenge for parallelization. The original process handles blocks serially, which is not efficient.
Efficient architectures have been reported in [11-14], which process MBs in wavefront order so that the MBs on each diagonal line can be coded concurrently once their neighboring MBs are available. Owing to the fine-granularity parallelism at the MB level, these approaches achieve good parallelism and are the most widely used at present. Huang's work [11] has bubbles between Intra 4 × 4 predictions because of the low throughput of the reconstruction process, so that prediction has to wait for the completion of reconstruction. Lee's work [12] perfectly pipelines the intra prediction and reconstruction processes; however, it requires that intra prediction and reconstruction take exactly equal numbers of processing cycles. It also removes some prediction modes in some blocks in order to enforce pipelining; hence, the video quality is degraded. Jin's work [13] proposes both partially and fully pipelined architectures for intra 4 × 4 prediction and has the same drawback as the approach in [12]. Moreover, these architectures add a dependency-graph process in order to improve gains, which increases hardware overhead; it takes 25 cycles to process each block, too long for high-throughput reconstruction. Similar to Huang's work, Suh's work [14] proposes an efficient parallel architecture followed by a redundancy-reduction algorithm to speed up intra 4 × 4 prediction. However, all the approaches above have drawbacks either in the pipelining architecture or in compression gains. In this paper, we propose a block-based intra prediction parallel algorithm to solve these problems.

III. SPATIAL PREDICTION IN HEVC

ISO-IEC/MPEG and ITU-T/VCEG recently formed the joint collaborative team on video coding (JCT-VC). The JCT-VC aims to develop the next-generation video coding standard, called high efficiency video coding (HEVC). A joint proposal to the HEVC standardization effort has been partially adopted into the HEVC Test Model (HM). The major improvements for intra coding come from the two tools described below.

A. Intra Prediction

The nine intra prediction modes supported in H.264/AVC, with their limited directionalities, are not flexible enough to represent complex structures or image segments with arbitrary directionalities. HEVC extends the set of directional prediction modes of H.264/AVC, providing increased flexibility and more accurate prediction of the sample values. The increased prediction accuracy provides significant reductions in the residual energy of intra-coded blocks and improvements in coding efficiency [15]. HEVC supports from 16 to 34 prediction directions depending on the prediction size, and the prediction can achieve 1/8-pixel accuracy. Table I shows the sub-block sizes contained in a 64 × 64 prediction unit and their corresponding numbers of prediction modes. The 34 directions of the intra prediction modes are illustrated in Fig. 1.

TABLE I
A 64 × 64 PREDICTION UNIT: SUB-BLOCK SIZES AND THEIR CORRESPONDING NUMBERS OF PREDICTION MODES

Amount   Size      Modes
256      4×4       17
64       8×8       34
16       16×16     34
4        32×32     34
1        64×64     5

[Figure 1: the 34 directional intra prediction modes of HEVC.]

Figure 1. 34 Prediction modes for HEVC

This improves coding efficiency significantly by permitting block prediction in an arbitrary direction indicated by the prediction angle.

B. Quadtree-based coding structure

Coding efficiency can be significantly improved by utilizing macroblock structures with sizes larger than 16 × 16 pixels, especially for sequences with high resolutions [16]. To achieve a more flexible coding scheme, HEVC utilizes a quadtree-based coding structure [17] with support for macroblocks of size 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4 pixels.
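The decomposition in Table I can be captured as a small lookup, assuming the HM configuration described above:

```python
# Number of intra prediction modes per square block size, as listed in
# Table I for the decomposition of a 64 x 64 prediction unit.
MODES_BY_SIZE = {4: 17, 8: 34, 16: 34, 32: 34, 64: 5}

def subblock_count(pu_size: int, block_size: int) -> int:
    """How many block_size x block_size sub-blocks tile a square
    pu_size x pu_size prediction unit."""
    return (pu_size // block_size) ** 2
```

For example, a 64 × 64 unit contains 256 4 × 4 sub-blocks, each with 17 candidate modes, which is why the 4 × 4 case dominates the intra-prediction workload discussed later.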
HEVC separately defines three block concepts: coding unit (CU), prediction unit (PU) and transform unit (TU). After the size of the largest coding unit (LCU) and the hierarchical depth of the CU have been defined, the overall structure of the codec is characterized by the various sizes of CU, PU and TU in a recursive manner. This allows the codec to be readily adapted to various kinds of content, applications, or devices with different capabilities. The CU splitting process is described in Fig. 2, where a Split flag indicates whether the current CU needs to split into a smaller size.

Figure 2. The CU splitting process (each CU of size 2N at depth d either terminates with Split flag = 0 or, with Split flag = 1, splits into four sub-CUs numbered 0-3; CU sizes 64, 32, 16 and 8 correspond to depths 0, 1, 2 and 3 with N0 = 32, N1 = 16, N2 = 8, N3 = 4, and no Split flag is sent at the last depth)

The coding efficiency improvements are more visible at higher resolutions, where bit rate reductions reach around 50% and 35% for the low-delay and random-access experiments, respectively.

C. Observation and Motivation

Intra prediction is currently achieved by partitioning a largest coding unit (LCU) into one or more blocks through a recursive splitting process. Each block is predicted spatially and subsequently refined, and the prediction is performed sequentially using neighboring reconstructed blocks. The prediction process requires that the causal neighbors of the current block be completely reconstructed before the current block is processed. This creates a set of serial dependencies, which introduce significant complexity into both the encoder and the decoder.

Figure 3. An 8 × 8 block with four 4 × 4 blocks (numbered 0, 1, 2, 3 in raster order)

Take the 8 × 8 block with four 4 × 4 blocks in Figure 3 as an example, where block0 is the first 4 × 4 block in decoding order, block1 is the second one, and so forth. Intra predictors for block0 are computed from the top and left neighboring MBs.
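The Split-flag-driven recursion of Fig. 2 can be sketched in a few lines. This is an illustrative model only, assuming a 64 × 64 LCU and the depths 0-3 shown in the figure; the `split` callback stands in for the encoder's rate-distortion decision:

```python
# Minimal sketch of quadtree CU splitting driven by split flags (Fig. 2).
def split_cu(size, depth, split, max_depth=3):
    """Return the list of leaf-CU sizes produced by the quadtree."""
    if depth == max_depth or not split(size, depth):
        return [size]                      # Split flag = 0 (or last depth)
    leaves = []
    for _ in range(4):                     # Split flag = 1: four sub-CUs
        leaves += split_cu(size // 2, depth + 1, split, max_depth)
    return leaves

# Splitting everything down to the last depth yields 64 CUs of size 8x8:
leaves = split_cu(64, 0, lambda size, depth: True)
```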
As one can see, there is a strong dependency (serialization) in the computation of intra predictors for the general 4 × 4 intra mode. Indeed, block1 depends on the reconstructed samples of block0. Block2 in turn depends on the reconstructed samples of block0 and block1 and on its left neighbor block. Block3 depends on the reconstructed samples of block0, block1 and block2. This dependency makes parallelization of intra prediction challenging, especially for ASIC implementations. Therefore, it is necessary to minimize dependencies and improve performance.

IV. BLOCK-BASED PARALLEL INTRA PREDICTION

In HEVC, although inter prediction has greater complexity, its motion estimation can be performed without referring to neighboring blocks, so it is easily parallelized. The intra prediction process, by contrast, requires that the causal neighbors of the current block be completely reconstructed before the current block is processed. This produces a set of serial dependencies, which makes parallelization of intra prediction a challenge, especially for 4 × 4 intra prediction.

A. 2X parallel intra prediction

The parallel intra-prediction approach divides a coding unit into two partitions: all blocks in the first partition are predicted without reference to each other; similarly, all blocks in the second partition are predicted with reference to the first partition but also without reference to each other. It consists of three steps:
1. Partition the coding unit's blocks into two sets.
2. Predict the first-set blocks (shaded blocks in Fig. 3) in parallel, using the already reconstructed block neighbors.
3. Predict the second-set blocks (white blocks in Fig. 3) in parallel, using the already reconstructed block neighbors and also the neighbors from the first-set blocks.
Take the 8 × 8 block with four 4 × 4 blocks in Fig.
3 as an example: block0 and block1 are in the first set, while block2 and block3 are in the second set. Given this partition, all blocks in a single set are processed in parallel. That is, the blocks in the first partition are predicted without reference to each other (or to blocks in the second partition), and blocks in the second partition are likewise predicted without reference to other blocks in the second partition. In all cases, a block may use pixel values from neighboring coding units for intra prediction. Fig. 4 shows the difference between HEVC and the proposed scheme. To predict the first set of blocks, we use pixel values from the upper and left coding unit boundaries. For example, to predict block1 in Fig. 4 B, we use the reconstructed values of the dark gray pixels. Its left reference pixels differ from those in HEVC, shown in Fig. 4 A: it uses the already reconstructed pixel values of the left coding unit's boundary. Therefore, block1 and block0 can be predicted in parallel. Following the prediction of the first set of blocks, we add any transmitted residual to the predicted values to generate a reconstructed first set of blocks. After reconstructing the first set of blocks, we proceed to predict the second set of blocks. As mentioned before, these blocks are predicted in parallel and use the reconstructed pixel values from blocks in the first partition and from the left coding unit's boundary, as shown in Fig. 4 D. In the proposed scheme, none of the prediction modes is dropped; the scheme supports all the prediction directions mentioned in Part A, Section III. Therefore, it achieves the parallelism without significant performance loss. In the 4X extension described below, all four blocks are simply predicted using their 8 × 8 block neighbors, as shown in Fig. 6; however, each 4 × 4 block can have its own predict mode, residual and transform.
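The two-phase process just described (predict the first set, reconstruct it, then predict the second set from it) can be walked through on a toy model. Everything here is a deliberately trivial stand-in, not HEVC's directional interpolation: `toy_predict` returns the coding-unit boundary value for first-set blocks and the reconstructed value of block0 for second-set blocks.

```python
# Toy two-phase reconstruction of the four 4x4 blocks (0 1 / 2 3) of one
# 8x8 coding unit: first set {0, 1} from boundary pixels only, second set
# {2, 3} additionally from the reconstructed first set.
def reconstruct_cu(boundary, residuals, predict):
    recon = {}
    for b in (0, 1):                        # first set: boundary pixels only
        recon[b] = predict(b, boundary, None) + residuals[b]
    first = {k: recon[k] for k in (0, 1)}
    for b in (2, 3):                        # second set: may also use first set
        recon[b] = predict(b, boundary, first) + residuals[b]
    return recon

def toy_predict(block, boundary, first_set):
    return boundary if first_set is None else first_set[0]

recon = reconstruct_cu(10, {0: 1, 1: 2, 2: 3, 3: 4}, toy_predict)
# recon == {0: 11, 1: 12, 2: 14, 3: 15}: two sequential phases instead of four.
```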
In this case, the distances between the reference pixels and the pixels being predicted increase, which leads to a bit rate increase, as will be seen in the results later.

Figure 4. The blocks' reference pixels: A. Block 1's reference pixels in HEVC; B. Block 0 and Block 1's reference pixels in 2X parallelism; C. Block 3's reference pixels in HEVC; D. Block 2 and Block 3's reference pixels in 2X parallelism

By employing the partitioning strategy, we achieve a direct increase in parallelism, shown graphically in Fig. 5. The 4 × 4 intra prediction within an 8 × 8 coding unit requires 4 sequential steps, while our partitioning approach requires only two sequential steps. This is an increase in parallelism by a factor of 2X. In general, the increase in parallelism is N/2, where N is the number of blocks within the macroblock.

Figure 5. Processing order for the 4 × 4 blocks within an 8 × 8 intra-predicted unit: (a) traditional prediction structure; (b) 2X parallelism prediction structure (the legend distinguishes predicted and reconstructed blocks)

B. 4X parallel intra prediction

The method can also extend the parallelism to 4X. In this extension, all four blocks are treated as first-set blocks. They are restricted to have the same prediction mode and are simply predicted using their 8 × 8 block neighbors, as shown in Fig. 6.

Figure 6. The blocks' reference pixels for 4X parallelism

V. EXPERIMENTAL RESULTS

The parallel intra prediction technique has been integrated into the HEVC reference software HM3.0 [18] and tested with the recommended configuration on the 18 sequences listed in Table II. The test sequences include two 4K sequences, five 1080p sequences, four WVGA sequences, four WQVGA sequences, and three 720p sequences, which are widely used in various applications. We use the two common test conditions described in [19]: Intra (high-efficiency condition) and Intra LoCo (low-complexity condition). The common test conditions are set as follows. 1) For each sequence, 10 seconds are encoded as intra frames. 2) The QP is set at 22, 27, 32, and 37. 3) The in-loop deblocking filter and RDO are enabled.

TABLE II
TEST SEQUENCES

Class A (4K): Traffic, PeopleOnStreet
Class B (1080p): Kimono, ParkScene, Cactus, BasketballDrive, BQTerrace
Class C (WVGA): BasketballDrill, BQMall, PartyScene, RaceHorses
Class D (WQVGA): BasketballPass, BQSquare, BlowingBubbles, RaceHorses
Class E (720p): Vidyo1, Vidyo3, Vidyo4

Table III below shows the summary results of 2X parallel intra prediction vs. HM3.0, and Table IV shows the summary results of 4X parallel intra prediction vs. HM3.0. Y BD-rate denotes the BD-rate increase for the luma component, while U BD-rate and V BD-rate denote the increases for the chroma components. Comparing Table III with Table IV, we find that the coding performance of 4X parallelism is not as good as that of 2X parallelism: in the 4X case, the distances between the reference pixels and the predicted pixels increase, which reduces the prediction accuracy and results in the performance loss.

TABLE III
RD RESULTS OF 2X PARALLEL INTRA PREDICTION VS. HM3.0 (BD-RATE, %)

          Intra               Intra LoCo
          Y     U     V       Y     U     V
Class A   0.6   0.1   0       0.5   0.4   0.4
Class B   0.7   0.6   -0.3    0.7   1     0.5
Class C   1.5   1.1   0.8     1.5   1.5   1.3
Class D   1.8   1.4   0.6     1.9   0.6   2.3
Class E   0.7   1.2   -0.5    1     0.8   0.9
All       1.1   0.8   0.2     1.1   0.9   1.1

TABLE IV
RD RESULTS OF 4X PARALLEL INTRA PREDICTION VS. HM3.0 (BD-RATE, %)

          Intra               Intra LoCo
          Y     U     V       Y     U     V
Class A   1.1   -0.6  -0.7    1.1   0.3   0.9
Class B   1.2   0.6   0       1.3   0.8   1.1
Class C   2.7   1.9   1       3.3   2.5   2.3
Class D   3.5   1.4   1.3     3.5   2.5   2.1
Class E   1.6   1     0.7     2.1   1.8   0.5
All       2     0.9   0.4     2.2   1.5   1.4

Two examples of RD curves are given in Fig. 7 (BasketballPass, LoCo) and Fig. 8 (RaceHorses, HE), showing the performance of the proposed method at four different bit rates. The proposed method achieves the parallelism without significant RD performance loss at both low and high bit rates.

Figure 7. The comparison of RD performance (BasketballPass, LoCo)

Figure 8. The comparison of RD performance (RaceHorses, HE)

VI. CONCLUSIONS

Because intra-coded frames serve special roles such as random access points and refresh synchronization frames, intra coding is an important part of video coding. Since intra-coded blocks are predicted from the available adjacent blocks in the preceding rows and columns, coding and decoding cannot proceed in parallel among intra-coded blocks, which complicates real-time implementation for high-definition and ultra-high-definition applications. Therefore, effectively increasing the capacity for parallel computation is of considerable value. In this paper, a parallel prediction scheme for HEVC is proposed. The parallel prediction unit supports the parallelization of intra-coded blocks within the current coding unit in a two-step parallel process. Parallelism is achieved by limiting the reference pixels of the 4 × 4 sub-blocks, without dropping any prediction modes. Experimental results show that the increased parallelization comes with small losses in coding efficiency: about 1% for HD sequences with 2X parallelism. We consider this loss negligible and well justified by the parallelization capability. We described 2X and 4X parallelism in detail; the scheme can also be extended to higher degrees of parallelism, so it can flexibly fit various applications.
ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (No. 60802077, No. 61003196, No. 61172053).

REFERENCES

[1] K. Ugur, K. Andersson, A. Fuldseth et al., "High Performance, Low Complexity Video Coding and the Emerging HEVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(12), pp. 1688-1697.
[2] G. Sullivan, P. Topiwala, and A. Luthra, "Fast 4x4 Intra Prediction Based on the Most Probable Mode in H.264/AVC," IEICE Electronics Express, 2008, 5(19), pp. 782-788.
[3] Yinji Piao, Junghye Min, Jianle Chen et al., "Encoder improvement of unified intra prediction," JCTVC-C207, Guangzhou, Oct. 2010.
[4] Liang Zhao, Li Zhang, and Xin Zhao, "Further Encoder Improvement of intra mode decision," JCTVC-D283, Daegu, Jan. 2011.
[5] Denilson M. Barbosa, Joao Paulo Kitajima and Wagner Meira Jr., "Parallelizing MPEG Video Encoding using Multiprocessors," Proceedings of the XII Brazilian Symposium on Computer Graphics and Image Processing, 1999, pp. 215-222.
[6] E. B. van der Tol, E. G. T. Jaspers, R. H. Gelderblom, "Mapping of H.264 Decoding on a Multiprocessor Architecture," SPIE Conf. on Image and Video Communications and Processing, 2003, 5(7), pp. 707-718.
[7] Y. Chen, E. Li and X. Zhou, "Implementation of H.264 encoder and decoder on personal computers," J. Vis. Commun. Image Representation, 2006, 17(2), pp. 509-532.
[8] A. Gulati and G. Campbell, "Efficient mapping of the H.264 encoding algorithm onto multi-processor DSPs," in Proc. SPIE - IS&T Electronic Imaging, 2005, pp. 94-103.
[9] K. Schoffmann, M. Fauster, O. Lampl, and L. Boszormenyi, "An evaluation of parallelization concepts for baseline-profile compliant H.264/AVC decoders," in Proc. Euro-Par Parallel Processing, Lecture Notes in Computer Science, 2007, pp. 782-791.
[10] A. Rodriguez, A. Gonzalez and M. P.
Malumbres, "Hierarchical parallelization of an H.264/AVC video encoder," in Proc. Int. Symp. Parallel Comput. Elect. Eng., 2006, pp. 363-368.
[11] Y. W. Huang, B. Y. Hsieh, and T. C. Chen, "Analysis, Fast Algorithm, and VLSI Architecture Design for H.264/AVC Intra Frame Coder," IEEE Trans. Circuits and Systems for Video Technology, 15(3), pp. 378-401, Mar. 2005.
[12] W. Lee, S. Lee, and J. Kim, "Pipelined Intra Prediction Using Shuffled Encoding Order for H.264/AVC," TENCON 2006, pp. 14-17, Nov. 2006.
[13] G. Jin and H. J. Lee, "A Parallel and Pipelined Execution of H.264/AVC Intra Prediction," IEEE International Conference on Computer and Information Technology (CIT'06), pp. 246-250, Sep. 2006.
[14] K. Suh, S. Park and H. Cho, "An Efficient Hardware Architecture of Intra Prediction and TQ/IQIT Module for H.264 Encoder," ETRI Journal, vol. 27, no. 5, pp. 511-524, Oct. 2005.
[15] K. McCann, W.-J. Han, and I.-K. Kim et al., "Video Coding Technology Proposal by Samsung (and BBC)," JCTVC-A124, Dresden, Germany, Apr. 2010.
[16] S. Ma and C.-C. J. Kuo, "High-definition video coding with super-macroblocks," Proc. SPIE, vol. 6508, part 1, 650816, Jan. 2007.
[17] F. Bossen, V. Drugeon and E. Francois, "Video Coding Using a Simplified Block Structure and Advanced Coding Techniques," IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(12), pp. 1667-1675.
[18] HEVC Reference Software HM-3.0. https://hevc.hhi.fraunhofer.de/svn/svn_TMuCSoftware/tags/HM-3.0.
[19] F. Bossen, "Common test conditions and software reference configurations," JCTVC-E700, Geneva, Switzerland, Mar. 2011.

Jie Jiang received the B.S. and M.S. degrees, both in signal and information processing, from Xidian University, Xi'an, China, in 2005 and 2008, respectively. She is currently working toward the Ph.D. degree with the Institute of Intelligent Control and Image Engineering (ICIE), Xidian University. Her research interests include video coding and communication.
Baolong Guo received his B.S., M.S. and Ph.D. degrees from Xidian University in 1984, 1988 and 1995, respectively, all in communication and electronic systems. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently a full professor with the Institute of Intelligent Control and Image Engineering (ICIE) at Xidian University. His research interests include intelligent information processing, image processing and communication.

Wei Mo was born in Guangxi Province, China, in 1956. He is a professor and Ph.D. supervisor at Xidian University. His research interests include multimedia coding and intelligent information processing.

Kefeng Fan completed his postdoctoral research in the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include 3D TV, Smart TV, DRM, and digital interfaces.

Optimized LSB Matching Steganography Based on Fisher Information

Yi-feng Sun
Zhengzhou Information Science and Technology Institute, Zhengzhou, China
Email: [email protected]

Dan-mei Niu
Electronic Information Engineering College, Henan University of Science and Technology, Luoyang, China
Email: [email protected]

Guang-ming Tang and Zhan-zhan Gao
Zhengzhou Information Science and Technology Institute, Zhengzhou, China
Email: [email protected], [email protected]

Abstract—This paper proposes an optimized LSB matching steganography based on Fisher information. The embedding algorithm is designed to solve an optimization problem in which Fisher information is the objective function and the embedding transferring probabilities are the variables to be optimized.
Fisher information is a quadratic function of the embedding transferring probabilities, and the coefficients of the quadratic terms are determined by the joint probability distribution of the cover elements. By modeling the groups of elements in a cover image as a Gaussian mixture model, the joint probability distribution of the cover elements is obtained for each cover image by estimating the parameters of the Gaussian mixture distribution. For each sub-Gaussian distribution in the mixture, the quadratic-term coefficients of the Fisher information are calculated, and the optimized embedding transferring probabilities are solved for by quadratic programming. By the maximum a posteriori probability principle, cover pixels are classified into the categories corresponding to the sub-Gaussian distributions. Finally, to embed the message bits, pixels are chosen to add or subtract one according to the optimized transferring probabilities of their category. Experiments show that the security performance of the new algorithm is better than that of the existing LSB matching.

Index Terms—steganography, security, information hiding, quadratic programming, Fisher information

I. INTRODUCTION

The aim of steganography is to hide a secret message imperceptibly in a cover, so that the presence of the hidden data cannot be diagnosed. But steganography faces the threat of steganalysis, which aims to expose the presence of hidden data. How to improve the security of steganography is therefore an important problem.

Manuscript received January 1, 2012; revised June 1, 2012; accepted July 1, 2012. National Natural Science Foundation of China (Grant No. 60970141, 60903220). Corresponding author: Yi-feng Sun, [email protected].

doi:10.4304/jmm.7.4.295-302

In the Least Significant Bit (LSB) replacement algorithm, the LSBs of cover elements are replaced with the message bits. This introduces a structural asymmetry (even pixels are never decreased and odd pixels are never increased when hiding a message bit).
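That asymmetry can be seen in two lines of code. A minimal sketch (overwriting the LSB, as the replacement algorithm does):

```python
# LSB replacement overwrites the least significant bit, so an even pixel can
# only stay or grow by one, and an odd pixel can only stay or shrink by one.
def lsb_replace(pixel, bit):
    return (pixel & ~1) | bit

even_targets = {lsb_replace(10, b) for b in (0, 1)}   # {10, 11}: never 9
odd_targets = {lsb_replace(11, b) for b in (0, 1)}    # {10, 11}: never 12
```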
It is easy to detect the existence of a hidden message. A trivial modification of LSB replacement is LSB matching, which randomly increases or decreases pixel values by one to match the LSBs with the message bits. LSB matching is much harder to detect than LSB replacement. The embedding in LSB matching is similar to adding an independent and identically distributed (IID) noise sequence that is independent of the cover. It is well known that the values of neighboring pixels in natural images are not independent. Furthermore, there are complex dependences in the noise component of neighboring pixels [1]. These dependences are violated by LSB matching, a fact that many steganalysis methods exploit [1][2]. LSB matching therefore also needs to be improved.

There are two kinds of approaches to improving steganography. The first is to preserve a chosen cover model in the steganographic method, as in model-based (MB) steganography [3][4], OutGuess [5], and the statistical-restoration-based steganographic algorithms [6][7], which preserve the first- and second-order statistics. The other strategy is to minimize a heuristically defined embedding distortion. The steganography using tri-way pixel-value differencing [8] reduces the quality distortion of the stego image. Matrix embedding methods [9][10][11] minimize the number of changed cover elements using linear codes, the number of changes being the predefined embedding distortion. Many other syndrome-coding steganographic methods, such as steganography using wet paper codes [12] and steganographic algorithms using syndrome trellis codes [13][14][15][16], minimize more complex embedding distortions. Experimentally, these steganographic methods achieve better performance.

In this paper, we propose an optimized LSB matching steganography based on Fisher information. It can demonstrate the security improvement in the sense of Kullback-Leibler (KL) divergence.
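For reference, the plain LSB matching embedding described above can be sketched as follows. The handling of the boundary values 0 and 255 (which can only move inward) follows the embedding matrix given later in Section IV.B:

```python
import random

# Plain LSB matching: leave the pixel if its LSB already equals the message
# bit; otherwise add or subtract 1 with equal probability.
def lsb_match(pixel, bit, rng=random):
    if pixel & 1 == bit:
        return pixel
    if pixel == 0:
        return 1                # boundary: 0 can only go to 1
    if pixel == 255:
        return 254              # boundary: 255 can only go to 254
    return pixel + rng.choice((-1, 1))
```

Either way, the stego pixel carries the bit in its LSB and differs from the cover pixel by at most 1.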
Firstly, we introduce the relation between KL divergence and Fisher information, and explain why Fisher information can be used to improve steganographic security. Then we present an embedding optimization framework based on Fisher information, in which an embedding algorithm is designed to solve the optimization problem whose objective is minimizing the Fisher information. The optimized LSB matching algorithm is an instance of this framework. We assume that the groups of pixels in the cover image follow Gaussian mixture distributions. After obtaining the Gaussian mixture distribution of the pixel groups for each cover image, the optimized embedding transferring probabilities are solved for by quadratic programming for each sub-Gaussian distribution. Cover pixels are classified into the categories corresponding to the sub-Gaussian distributions, and the embedding is operated with the optimized embedding transferring probabilities. The experimental results show that the optimized LSB matching steganography is better than the existing LSB matching in terms of security performance.

II. STEGANOGRAPHIC SECURITY AND FISHER INFORMATION

The KL divergence between the cover and stego distributions can be used to measure steganographic security [17]. Let the distribution of cover objects be P_0, and the distribution of stego objects be P_λ, where the parameter λ denotes the relative payload, 0 ≤ λ ≤ 1. The KL divergence between P_0 and P_λ is denoted D(P_0 || P_λ). The smaller D(P_0 || P_λ) is, the more secure the steganography; if D(P_0 || P_λ) equals 0, the stego-system is perfectly secure. The physical interpretation of cover objects is not discussed in [17]. We consider that each statistic derived from a cover can represent the cover object. Here the element sequence of the cover, such as the sequence of pixel values or of Discrete Cosine Transform (DCT) coefficients, is selected as the representation of the cover object. The element sequence is denoted by C = (C_1, C_2, ..., C_N), where N is the number of elements that can be used to embed secret information in a multimedia cover. In the following, we use P_0^(N) and P_λ^(N) instead of P_0 and P_λ, to emphasize that they are N-dimensional joint probability distributions. Assuming that P_0^(N) is known and the embedding function is fixed, the KL divergence D(P_0^(N) || P_λ^(N)) is a function of λ, also denoted d_N(λ). The second derivative of d_N(λ) with respect to λ at λ = 0 is known as the steganographic Fisher information (FI) [18], denoted d_N''(0). When λ → 0,

    D(P_0^(N) || P_λ^(N)) ≈ (1/2) d_N''(0) λ².    (1)

When the relative payload λ is fixed, a smaller Fisher information implies a smaller KL divergence; thus Fisher information can be used to evaluate steganographic security. Assuming that embedding operations are mutually independent (MI) [19], the Fisher information can be calculated. The probability of a cover element being x and the corresponding stego element being y is given by the conditional probability P(S_k = y | C_k = x) = b_xy, x ∈ χ, y ∈ χ, where χ is the set of cover and stego element values. The matrix B = (b_xy)_{x∈χ, y∈χ}, also named the embedding matrix, corresponds to an embedding method, and the probability b_xy is also called the embedding transferring probability. For an embedding matrix B, the Fisher information is calculated as [20]:

    d_N''(0) = Σ_{y∈χ^N} A(y)² / P_0^(N)(C = y) − N²,    (2)

    A(y) = Σ_{i=1}^{N} Σ_{u∈χ} P_0^(N)(C = y[u/y_i]) · b_{u,y_i}.    (3)

Here y = (y_1, y_2, ..., y_N) ∈ χ^N denotes the value of C = (C_1, C_2, ..., C_N), where χ^N = χ × χ × ... × χ, and y[u/y_i] denotes the sequence (y_1, ..., y_{i−1}, u, y_{i+1}, ..., y_N), i.e., u replaces the item y_i in y. The computational complexity of the Fisher information is determined by N; we call d_N''(0) the N-dimensional Fisher information.
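Relations (1)-(3) can be checked numerically in the simplest case N = 1, where (2) reduces to d''(0) = Σ_y A(y)²/P_0(y) − 1. The alphabet, cover distribution and embedding matrix below are toy values of our own choosing:

```python
import math

# Toy check of Eqs. (1)-(3) for N = 1.  With payload lam, the stego law is
# P_lam(y) = (1 - lam) P0(y) + lam * A(y), and Eq. (1) predicts
# KL(P0 || P_lam) ~ 0.5 * d''(0) * lam^2 for small lam.
chi = [0, 1, 2, 3]
P0 = [0.4, 0.3, 0.2, 0.1]
# B[x][y]: embedding transferring probabilities (each row sums to 1);
# an LSB-matching-style tridiagonal matrix with inward moves at the edges.
B = [[0.5, 0.5, 0.0, 0.0],
     [0.25, 0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5, 0.25],
     [0.0, 0.0, 0.5, 0.5]]

A = [sum(P0[u] * B[u][y] for u in chi) for y in chi]     # Eq. (3), N = 1
fisher = sum(A[y] ** 2 / P0[y] for y in chi) - 1         # Eq. (2), N = 1

lam = 1e-3
P_lam = [(1 - lam) * P0[y] + lam * A[y] for y in chi]
kl = sum(P0[y] * math.log(P0[y] / P_lam[y]) for y in chi)
# kl is very close to 0.5 * fisher * lam**2, as Eq. (1) predicts.
```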
We often simplify the model of the cover in order to calculate the KL divergence. For example, one may assume the elements of the cover are IID, but this ignores the correlation among them. Hence, we suppose instead that the cover is composed of IID element groups: the elements within a group are correlated, and the number of elements in a group is n. If a group is composed of two elements, we can take the impact of second-order correlation into account; if it includes more elements, we can analyze the influence of higher-order correlation. For a cover C = (C_1, C_2, ..., C_N), starting from the first element C_1, every n adjacent elements are divided into a group, whose n-dimensional joint distribution is denoted by P_0^(n). If P_0^(n) replaces P_0^(N) and n replaces N in (2) and (3), we obtain the formula for the n-dimensional Fisher information corresponding to the element groups, denoted d_n''(0). In the following, we focus on decreasing the value of d_n''(0) to improve the security of the embedding method.

III. THE EMBEDDING OPTIMIZATION FRAMEWORK BASED ON FISHER INFORMATION

To improve steganographic security, we need to find the embedding algorithm whose B matrix corresponds to a small Fisher information. In general,
Here d n′′(0) is determined by the n-dimensional joint probability distribution P0( n ) and embedding matrix B = (bxy ) x∈χ , y∈χ . If bxy , x ∈ χ , y ∈ χ is viewed as variables, Fisher information is the function of bxy . To improve the steganographic security, we need solve the optimization problem in which the Fisher information d n′′(0) is the objective function. Here n is constant, and the optimization problem is B optimized = arg min B = ( bxy ) A(y ) 2 , (n) 0 (C = y ) ∑P y∈χ n (4) with the constraints ⎧∑ bxy = 1, x ∈ χ ⎪ y∈χ . ⎨ ⎪⎩bxy ≥ 0 We expand the formula as follows: ∑P y∈χ n (5) IV. OPTIMIZED LSB MATCHING EMBEDDING EALGORITHM IN SPATIAL DOMAIN In this paper, we optimize the LSB matching algorithm in image spatial domain. To get the optimized bxy , we must know the n-dimensional joint distribution P0( n ) of the pixel group made of adjacent pixels. The distribution of pixel group in spatial domain is complex. But Gaussian mixture model (GMM) can describe many complex distributions approximately [21]. We take GMM as the distribution model of the n-dimensional pixel group in spatial domain. GMM is the sum of weighted sub-Gaussian distributions. It can fit very complex distribution by adjusting the parameters of GMM. The small local areas of an image often correspond to a particular object or belong to the same background. Therefore, we can think that the pixel groups in a local area have the same distribution characteristics. Sub-Gaussian distributions in GMM can reflect these local characteristics. The pixel groups in an image are instances of these sub-Gaussian distributions in GMM. For GMM composed of K sub-Gaussian distributions, pixel groups can be divided into K categories, and each sub-Gaussian distribution associates to a separate category. Different pixel groups belonging to different categories have different distribution characteristics. 
The embedding matrix B should be optimized according to the local distribution properties of the cover image. Denote the optimized embedding matrix corresponding to k k sub-Gaussian distribution as B optimized , where k = 1, 2,L , K . For a pixel which is selected to embedding message bit, if it belongs to the category of k sub- A(y ) 2 (C = y ) ( n) 0 n n = ∑∑∑∑ ∑ P0( n ) (C = y[u / yi ]) P0( n ) (C = y[v / y j ]) u∈χ v∈χ i =1 j =1 y∈χ n P0n (C = y ) buyi bvy j ⎛ ⎞ (6) ⎜ n n P0( n ) (C = y[u / yi , b / y j ]) P0( n ) (C = y[a / yi , v / y j ]) ⎟ ⎟ bua bvb = ∑∑∑∑ ⎜ ∑∑ ∑ P0( n ) (C = y[a / yi , b / y j ]) ⎟ u∈χ v∈χ a∈χ b∈χ ⎜ i =1 j =1, yh ∈χ b≠a ⎜ j ≠ i (h =1,⋅⋅⋅, n ) ⎟ ( h≠i, j ) ⎝ ⎠ ⎫ ⎧ P0( n ) (C = y[u / yi , a / y j ]) P0( n ) (C = y[ a / yi , v / y j ]) n P0( n ) (C = y[u / yi ]) P0( n ) (C = y[v / yi ]) ⎪⎪ ⎪⎪ n n + ∑∑∑ ⎨∑∑ ∑ +∑ ∑ ⎬ bua bva P0( n ) (C = y[ a / yi , a / y j ]) P0( n ) (C = y[a / yi ]) u∈χ v∈χ a∈χ ⎪ i =1 j =1 yh ∈χ i =1 yh ∈χ ⎪ ≠ j i (h =1,⋅⋅⋅, n ) (h =1,⋅⋅⋅, n ) ⎪⎭ ⎪⎩ ( h≠i ) (h ≠i, j ) Here y[a / yi , b / y j ] denotes the sequence ( y1 ,L , yi −1 , a, yi +1 ,L y j −1 , b, y j +1 ,L , yn ) . From (6), we can find that Fisher information d n′′(0) is a quadratic function about the probability bxy . Searching bxy which minimizes Fisher information is a quadratic programming problem. © 2012 ACADEMY PUBLISHER Gaussian distribution, we should modify it according k to B optimized . The whole framework of the optimized embedding algorithm is shown in Figure 1. Firstly, we estimate the GMM parameters for the pixel groups in a cover image. Secondly, we obtain the optimized embedding matrix for each sub-Gaussian distribution in GMM. Then we should judge the sub-Gaussian distribution category of the pixels which is selected to embedding message bit. At last, if the LSBs of those pixels don’t equal to message bits, the 298 JOURNAL OF MULTIMEDIA, VOL. 7, NO. 
4, AUGUST 2012 pixels should add or subtract 1 according to the transferring probabilities indicated by the optimized embedding matrix corresponding to each category. i = 1, ⋅⋅⋅, N1 − 1 and j = 1, ⋅⋅⋅, N 2 − 1 , there are ( N1 − 1)( N 2 − 1) samples in total; at last, every two adjacent pixels in the vice diagonal direction Estimate the GMM of pixel group According to the 1th subGaussian distribution, obtain the optimized embedding matrix of the first category Cover image Stego-key K1 Select pixels to embed message bits According to the 2th subGaussian distribution, obtain the optimized embedding matrix of the second category Judge the category of the pixel … According to the kth subGaussian distribution, obtain the optimized embedding matrix of the kth category Stego image Add or subtract 1 according to transition probabilities indicated by optimized embedding matrix corresponding to the category 10100100110100 Secret message Figure 1. The framework of the optimized LSB matching algorithm (Ci +1, j , Ci , j +1 ) also constitute ( N1 − 1)( N 2 − 1) samples. A. Estimating GMM for each cover image For pixel groups y = ( y1 , y2 ,L , yn ) in a cover image, we assume that they submit to the following Gaussian mixture distribution K P0 (y ) = ∑ π k P0, k (y ) , (7) k =1 P0, k (y ) = 1 Rk (2π ) n 2 −1 2 ⎧ 1 ⎫ exp ⎨− (y − mk ) Rk−1 (y − mk )T ⎬ (8) ⎩ 2 ⎭ where K is the amount of sub-Gaussian distributions in GMM. Here mk and Rk represent the mean vector and covariance matrix of sub-Gaussian distribution respectively, and π k denote the weights of the subGaussian distributions, where k = 1, 2, ⋅⋅⋅, K . In the following, we discuss how to estimate the parameters {mk , Rk , π k }k =1,2,L, K for each cover image. The pixel group is made up of two adjacent pixels in order to reduce computational complexity in our scheme. 
But in a cover image, the horizontal adjacent pixels, the vertically adjacent pixels, and the adjacent pixels in main diagonal direction or in vice diagonal direction reflect different second-order statistical properties. In order to embody these second-order properties in GMM, we get pixel group samples as follows: Assume that an image C has N1 × N 2 pixels. Firstly, from the pixel in the first line and the first column of the image, every two horizontal adjacent pixels (Ci , j , Ci , j +1 ) constitute a pixel group sample, where i = 1, ⋅⋅⋅, N1 and j = 1, ⋅⋅⋅, N 2 − 1 , and there are a total of N1 × ( N 2 − 1) samples; secondly, every two vertical adjacent pixels (Ci , j , Ci +1, j ) constitute a sample, and the number of these samples is ( N1 − 1) × N 2 ; Then, every two adjacent pixels in the main diagonal direction (Ci , j , Ci +1, j +1 ) constitute a sample, where © 2012 ACADEMY PUBLISHER With the above 4( N1 − 1)( N 2 − 1) + ( N1 − 1) + ( N 2 − 1) samples, we use the MDL algorithm of Purdue University [21] to estimate the parameters of GMM. The MDL algorithm is an extension of Expectation Maximum (EM) algorithm. Please refer to the literature [22]. Note that GMM is the continuous distribution. We need the discrete distribution in optimization problem (4). In order to reduce computation complexity, the values of the sub-Gaussian probability density p0, k (y ) replace the discrete values directly. B. Obtaining the optimized LSB matching embedding matrix for each sub-Gaussian distribution Here we embed information into spatial pixel groups in grayscale image, then χ = {0,1,L , 255} . The embedding matrix of LSB matching is adjusted as follows: Use x , x = 1, 2,L , 254 , to represent the value of the selected cover pixel. If the secret message bit is different with the LSB of x , and x is not equal to 0 or 255, we add 1 with the probability bx , x +1 , or subtract 1 with the probability bx , x −1 . 
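To make the sampling procedure concrete, here is a small illustrative sketch (our own, not the authors' code; the helper name is hypothetical) that collects the four kinds of adjacent-pixel pairs from an N1 × N2 image. Its total sample count matches the paper's figure of 4(N1 − 1)(N2 − 1) + (N1 − 1) + (N2 − 1).

```python
def pixel_pair_samples(img):
    """Collect two-pixel group samples from a 2-D image (list of lists).

    Four orientations are sampled, matching the paper:
      horizontal (C[i][j],   C[i][j+1])   -> N1*(N2-1) pairs
      vertical   (C[i][j],   C[i+1][j])   -> (N1-1)*N2 pairs
      main diag. (C[i][j],   C[i+1][j+1]) -> (N1-1)*(N2-1) pairs
      anti diag. (C[i+1][j], C[i][j+1])   -> (N1-1)*(N2-1) pairs
    """
    n1, n2 = len(img), len(img[0])
    pairs = []
    pairs += [(img[i][j], img[i][j + 1]) for i in range(n1) for j in range(n2 - 1)]
    pairs += [(img[i][j], img[i + 1][j]) for i in range(n1 - 1) for j in range(n2)]
    pairs += [(img[i][j], img[i + 1][j + 1]) for i in range(n1 - 1) for j in range(n2 - 1)]
    pairs += [(img[i + 1][j], img[i][j + 1]) for i in range(n1 - 1) for j in range(n2 - 1)]
    return pairs
```

These pairs would then be fed to a mixture-model estimator; the paper uses Purdue's MDL/CLUSTER implementation, but any EM-based two-dimensional GMM fit could stand in for illustration.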
The probabilities b_{x,x+1} and b_{x,x−1} are the variables of the optimization problem; there are 254 × 2 unknown variables in all. We suppose that the secret message bits have been encrypted, so they form a pseudorandom bit stream and half of the selected pixels' LSBs already equal the message bits. Hence b_{x,x} = 0.5, and the new LSB matching embedding matrix becomes

\[
B=\begin{pmatrix}
0.5 & 0.5 & 0 & \cdots & 0 & 0 & 0\\
b_{1,0} & 0.5 & b_{1,2} & \cdots & 0 & 0 & 0\\
0 & b_{2,1} & 0.5 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 0.5 & b_{253,254} & 0\\
0 & 0 & 0 & \cdots & b_{254,253} & 0.5 & b_{254,255}\\
0 & 0 & 0 & \cdots & 0 & 0.5 & 0.5
\end{pmatrix}.\tag{9}
\]

For each sub-Gaussian distribution of the GMM, we need to obtain the corresponding optimized LSB matching embedding matrix, denoted B^k_optimized, k = 1, …, K. Here the pixel group consists of two adjacent pixels, and the optimization problem simplifies to

\[
B^{k}_{\mathrm{optimized}}=\arg\min_{B} f_k(B),\qquad B=(b_{xy}),\tag{10}
\]

with

\[
f_k(B)=\sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}\sum_{\substack{b\in\chi\\ b\ne a}}
\Biggl(\sum_{i=1}^{2}\sum_{\substack{j=1\\ j\ne i}}^{2}
\frac{P_{0,k}(y_i=u,\,y_j=b)\,P_{0,k}(y_i=a,\,y_j=v)}{P_{0,k}(y_i=a,\,y_j=b)}\Biggr) b_{ua}b_{vb}
\]
\[
+\sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}
\Biggl\{\sum_{i=1}^{2}\sum_{\substack{j=1\\ j\ne i}}^{2}
\frac{P_{0,k}(y_i=u,\,y_j=a)\,P_{0,k}(y_i=a,\,y_j=v)}{P_{0,k}(y_i=a,\,y_j=a)}
+\sum_{i=1}^{2}\sum_{\substack{y_j\in\chi\\ j\ne i}}
\frac{P_{0,k}(y_i=u,\,y_j)\,P_{0,k}(y_i=v,\,y_j)}{P_{0,k}(y_i=a,\,y_j)}\Biggr\} b_{ua}b_{va}.\tag{11}
\]

The constraints are

\[
\begin{cases}
b_{x,x+1}+b_{x,x-1}=0.5,\; b_{x,x+1}\ge 0,\; b_{x,x-1}\ge 0, & x\in\chi,\ x\ne 0,\ x\ne 255,\\
b_{x,x}=0.5, & x\in\chi,\\
b_{0,1}=0.5,\quad b_{255,254}=0.5,\\
b_{x,y}=0 & \text{otherwise}.
\end{cases}\tag{12}
\]

This optimization problem is a quadratic program. The standard form of a quadratic program is min_x (1/2) x H x^T, where x = {x_1, x_2, …, x_L} is a row vector and H is the quadratic coefficient matrix. By arranging the matrix B into a row vector, we obtain the standard form; the elements of H are the corresponding coefficients in (11). The complexity of the quadratic program depends on the number of elements in x, and the number of elements of H is the square of that number. If the computer memory cannot hold H, it must be computed iteratively, which is slow. Here, although B has 256 × 256 elements, most of them are 0 in our algorithm, because b_{xy} = 0 whenever |x − y| ≥ 2, and a zero b_{xy} contributes nothing to the quadratic function. The 766 possibly nonzero elements of B are b_{x,x−1} and b_{x,x+1} for x = 1, 2, …, 254, b_{x,x} = 0.5 for x = 0, 1, …, 255, and b_{0,1} = 0.5, b_{255,254} = 0.5; thus only a 766 × 766 block of H needs to be stored, which fits easily in a PC's memory. So we use the quadratic programming function of Matlab to solve for the optimized transition probabilities b_{x,x−1} and b_{x,x+1}.

C. Judging the category of pixels

In the existing LSB matching algorithm, if the message bit does not match the LSB of the cover pixel, one is randomly added to or subtracted from the pixel value. In our optimized LSB matching algorithm, if the message bit does not match the LSB of the cover pixel, we must first judge the category to which the pixel belongs. The embedding matrix of the kth category is B^k_optimized, and we add or subtract 1 with the probability b_{x,x+1} or b_{x,x−1} in B^k_optimized.

Now we explain how to judge the category to which a pixel C_{i,j} belongs, where i and j are the row and column indices of the pixel. First, we form a pixel group from C_{i,j} and an adjacent pixel, for example the pixel group (C_{i,j}, C_{i,j+1}) with the right horizontal neighbor C_{i,j+1}. Note that if C_{i,j} is at the right edge of the image, the left horizontal neighbor C_{i,j−1} can be used instead. Then, based on the value of the pixel group y = (C_{i,j}, C_{i,j+1}) or y = (C_{i,j−1}, C_{i,j}), the posterior probability of each sub-Gaussian distribution is calculated as

\[
P(k\mid y)=\frac{\pi_k P_{0,k}(C=y)}{\sum_{k'=1}^{K}\pi_{k'}P_{0,k'}(C=y)},\qquad k=1,2,\dots,K.\tag{13}
\]

The pixel C_{i,j} is categorized according to the maximum a posteriori principle,

\[
k_{\mathrm{optimal}}=\arg\max_{k=1,2,\dots,K}\pi_k P_{0,k}(C=y),\tag{14}
\]

where k_optimal denotes the category of the pixel C_{i,j}.

D. Embedding and Extraction Algorithm

The embedding process of the optimized LSB matching steganography is summarized as follows.

Input: cover image {C_{i,j}} with N = N1 × N2 pixels, M bits of encrypted secret message, stego-key k1.
Output: stego image {S_{i,j}}.
Step (i): Model the two-dimensional probability distribution of two spatially adjacent pixels as a GMM, and estimate its parameters from the pixel-group samples of the cover image {C_{i,j}}.
Step (ii): For each sub-Gaussian distribution P_{0,k}(y) of the GMM, work out the matrix H by calculating the coefficients of the nonzero b_{x,x−1} and b_{x,x+1} in the objective function f_k(B) according to (11), solve the quadratic programming problem (10), and obtain the optimized embedding matrix B^k_optimized for each sub-Gaussian distribution.
Step (iii): Calculate the rate of pixels used for embedding, λ = M/N, which equals the relative payload in the usual sense. According to λ and the stego-key k1, select the set of image pixels {C_{i,j}} that will carry message bits.
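As an illustration of the structure in (9) and the constraints in (12), the following sketch (our own, not from the paper; names are hypothetical) builds a 256 × 256 LSB-matching embedding matrix from a vector of "add one" probabilities b_{x,x+1}, x = 1, …, 254, and lets us check that every row is a probability distribution.

```python
def build_embedding_matrix(p_up):
    """Build the 256x256 LSB-matching embedding matrix of (9).

    p_up[x] = b_{x,x+1} for x = 1..254; then b_{x,x-1} = 0.5 - b_{x,x+1},
    b_{x,x} = 0.5 for all x, b_{0,1} = b_{255,254} = 0.5, and all other
    entries are 0, as required by the constraints (12).
    """
    B = [[0.0] * 256 for _ in range(256)]
    for x in range(256):
        B[x][x] = 0.5            # half of the LSBs already match
    B[0][1] = 0.5                # boundary rows: forced LSB replacement
    B[255][254] = 0.5
    for x in range(1, 255):
        assert 0.0 <= p_up[x] <= 0.5
        B[x][x + 1] = p_up[x]
        B[x][x - 1] = 0.5 - p_up[x]
    return B
```

Plain LSB matching corresponds to p_up[x] = 0.25 for every x; the optimized scheme instead obtains each category's probabilities by solving the quadratic program (10)-(12), as the authors do with Matlab's quadratic programming routine.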
Step (iv): For each selected pixel C_{i,j}, if its LSB equals the message bit, leave it unchanged; otherwise, go to the next step.
Step (v): If the value of C_{i,j} is 0 or 255, replace the LSB of C_{i,j} with the message bit; otherwise, go to the next step.
Step (vi): Form the pixel group y with the right horizontal neighbor C_{i,j+1} or the left horizontal neighbor C_{i,j−1}, and determine the category k_optimal of the pixel group according to (14).
Step (vii): Obtain b^{k_optimal}_{C_{i,j},C_{i,j}+1}, and produce a pseudorandom number n_{i,j} within the range (0, 1). If n_{i,j} ∈ (0, 2b^{k_optimal}_{C_{i,j},C_{i,j}+1}), then S_{i,j} = C_{i,j} + 1; otherwise S_{i,j} = C_{i,j} − 1. The reason for selecting the threshold 2b^{k_optimal}_{C_{i,j},C_{i,j}+1} is that b^{k_optimal}_{C_{i,j},C_{i,j}+1} + b^{k_optimal}_{C_{i,j},C_{i,j}−1} = 0.5.

The extraction process of the optimized LSB matching steganography is summarized as follows.

Input: stego image {S_{i,j}} with N = N1 × N2 pixels, the rate λ of pixels used for embedding, and stego-key k1.
Output: secret message.
Step (i): According to λ and the stego-key k1, select the set of stego pixels {S_{i,j}} that carry the message.
Step (ii): Take the LSB of each selected pixel S_{i,j}, and concatenate them into the secret message.

In summary, for a cover image we calculate K optimized embedding matrices, one per sub-Gaussian distribution. The message bits are then embedded by adding or subtracting 1 according to the probabilities indicated by the embedding matrix of the category to which each pixel belongs.

V. EXPERIMENT AND ANALYSIS

In this section, we present experimental results to demonstrate the effectiveness of the proposed optimized LSB matching steganography compared with the existing LSB matching method.
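The per-pixel decision of Steps (iv)-(vii) can be sketched as follows (illustrative code, our own naming; `b_up` plays the role of b^{k_optimal}_{C_{i,j},C_{i,j}+1} after the pixel's category has been judged).

```python
import random

def embed_bit(c, bit, b_up, rng):
    """Embed one message bit into cover pixel value c (Steps (iv)-(vii)).

    b_up is the optimized "add one" probability for the pixel's category.
    By constraint (12), b_up + b_down = 0.5, so comparing a uniform (0,1)
    draw against the threshold 2*b_up selects +1 with probability
    b_up / 0.5, exactly as in Step (vii).
    """
    if (c & 1) == bit:           # Step (iv): LSB already matches
        return c
    if c in (0, 255):            # Step (v): boundary pixels, force the LSB
        return c ^ 1
    # Step (vii): add or subtract 1 with the optimized probabilities
    return c + 1 if rng.random() < 2.0 * b_up else c - 1
```

Note that for interior pixels a ±1 change always flips the LSB, so the extractor only ever needs to read LSBs back.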
A common way of testing steganographic schemes is to report the detection metric of some steganalysis methods, empirically estimated from a database of cover and stego images in which each stego image carries a fixed relative payload [16]. Here the cover image database consists of 1000 images downloaded from the USDA NRCS Photo Gallery [23]. The images are very high resolution TIF files (mostly 2100×1500) and appear to be scanned from a variety of film sources. For testing, the images were resampled to 614×418 and converted to grayscale (using Advanced Batch Converter 3.8.20 with a bilinear interpolation filter). First, the cover images were used to generate 3 groups of 1000 stego images of the existing LSB matching with relative payloads λ = 1, λ = 0.75, and λ = 0.5, respectively. Then another 3 groups of 1000 stego images of the optimized LSB matching were generated with relative payloads λ = 1, λ = 0.75, and λ = 0.5, respectively. For each relative payload, the 2000 stego images of the two kinds of steganography and the corresponding cover images were used to build a training set and a test set. The training and test sets were built randomly, each containing 50% cover and 50% stego images. The training sets are used to train the steganalysis detectors, and the security of the two kinds of steganography at the same relative payload is compared by the detection performance on the test sets. A steganalysis detector makes two types of errors: detecting a cover image as stego (false alarm, or false positive) or recognizing a stego image as cover (missed detection, or false negative). The corresponding probabilities are denoted P_FA and P_MD. The receiver operating characteristic (ROC) curve is obtained by plotting the detection probability 1 − P_MD(P_FA) as a function of P_FA.
The ROC curve can be reduced to a scalar detection measure called the minimum error probability:

\[
P_E=\min_{P_{FA}}\tfrac12\bigl(P_{FA}+P_{MD}(P_{FA})\bigr).\tag{15}
\]

Both the ROC curve and the minimum error probability P_E can be used as detection performance metrics of a steganalysis method against some steganography at a fixed relative payload.

A. ROC curves metric

In the experiments, the first steganalytic detector [2] adopts the difference characteristic function moments as features and a Fisher linear discriminant (FLD) as the classifier; its detection performance against the existing LSB matching is excellent [2]. We adopt ROC curves to evaluate the security of the two steganography methods. The ROC curves of a detector show how the detection probability (true positive rate, the fraction of stego images correctly classified) and the false alarm probability (false positive rate, the fraction of cover images misclassified as stego) vary as the detection threshold is varied. The lower the curve, the harder the detection, and hence the more secure the corresponding steganography. Figure 2 shows the ROC curves of the existing LSB matching and the optimized LSB matching at relative payload λ = 1, Figure 3 at λ = 0.75, and Figure 4 at λ = 0.5. In the figures, the ROC curves of the optimized LSB matching lie below those of the existing LSB matching, so the optimized LSB matching is more secure than the existing LSB matching. Furthermore, the gap between the ROC curves of the two steganography methods shrinks as the relative payload decreases; that is, the security improvement is more pronounced when λ is larger.

B. Minimum error probability metric

We also use the steganalysis detector of the literature [1] to compare the security of the two steganography methods.
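Equation (15) can be evaluated directly from detector scores. The sketch below (our own, not from the paper) sweeps every threshold over cover and stego scores and returns the minimum average error under equal priors.

```python
def min_error_probability(cover_scores, stego_scores):
    """P_E = min over thresholds of (P_FA + P_MD)/2, as in (15).

    Higher scores are assumed to indicate 'stego'. The blind detector
    that always answers 'cover' achieves P_E = 0.5, so that is the
    starting value.
    """
    best = 0.5
    thresholds = sorted(set(cover_scores) | set(stego_scores))
    for t in thresholds:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        best = min(best, 0.5 * (p_fa + p_md))
    return best
```

A perfectly separating detector yields P_E = 0, while complete overlap of the score distributions yields P_E = 0.5.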
This steganalysis detector is well known in the steganalysis and steganography community. We use second-order SPAM features with T = 3 and first-order SPAM features with T = 4, respectively; there are 686 dimensions in the second-order features and 162 dimensions in the first-order features. For machine learning, we use soft-margin SVMs with a Gaussian kernel of width γ; soft-margin SVMs penalize the error on the training set through a hyper-parameter C. In this section, the minimum error probability of (15) is used to evaluate security: the steganography with the larger P_E is the more secure.

In the experiment, the parameters were C = 300 and γ = 5. Note that we did not use a grid search with five-fold cross-validation on the training set to obtain the best C and γ as in [1]. The reason is that we only need to determine which steganography has the larger P_E; we do not need the best detector with the best C and γ, and the grid search is time consuming. As a result, the values of P_E in our experiments are larger than those in the literature [1], but this does not matter for comparing the security of the two steganographic methods.

Table I reports the security performance of the existing LSB matching steganography and the optimized LSB matching steganography. From Table I, at every relative payload the minimum error probability P_E of the optimized LSB matching is larger than that of the existing LSB matching, so the optimized LSB matching is more secure than the existing one. On the other hand, the difference between the minimum error probabilities of the two steganography methods decreases as the relative payload becomes small; thus the larger λ is, the greater the security improvement of the optimized LSB matching. These results are consistent with those of Section V-A.

[Figure 2. The ROC curves (true positive rate vs. false positive rate) for the two kinds of LSB matching steganography when the relative payload is 1.]

TABLE I. MINIMUM ERRORS P_E OF SPAM STEGANALYZERS

Relative payload λ | SPAM features       | Existing LSB matching | Our optimized LSB matching
0.5                | Second-order, T = 3 | 0.3955                | 0.4098
0.5                | First-order, T = 4  | 0.2962                | 0.3105
0.75               | Second-order, T = 3 | 0.3636                | 0.3972
0.75               | First-order, T = 4  | 0.2556                | 0.2893
1                  | Second-order, T = 3 | 0.3359                | 0.3808
1                  | First-order, T = 4  | 0.2263                | 0.2643

[Figure 3. The ROC curves for the two kinds of LSB matching steganography when the relative payload is 0.75.]

[Figure 4. The ROC curves for the two kinds of LSB matching steganography when the relative payload is 0.5.]

VI. SUMMARY

We present an optimized LSB matching algorithm. The probabilities of adding or subtracting one during embedding, also called the embedding transition probabilities, are determined by solving an optimization problem. We demonstrate that the Fisher information is a quadratic function of the embedding transition probabilities. Assuming that the groups of pixels in the cover image can be modeled as a GMM, we obtain the Gaussian mixture distribution of the pixel groups for each cover image, and the optimized embedding transition probabilities are then solved by quadratic programming for each sub-Gaussian distribution. The experiments show that the security performance of this new algorithm is better than that of the existing LSB matching. Furthermore, the principle by which our optimized LSB matching improves security differs from that of steganography using syndrome coding.
The optimized embedding method presented here can be combined with them: many steganographic algorithms based on syndrome coding [12][13] do not define the specific modification operation (adding or subtracting 1) on the cover elements, and our optimized LSB matching algorithm can be used to decide which operation to select. The combination of the two kinds of algorithms may further improve the security.

REFERENCES

[1] T. Pevny, P. Bas, and J. Fridrich, "Steganalysis by subtractive pixel adjacency matrix," IEEE Transactions on Information Forensics and Security, vol. 5, no. 2, 2010, pp. 215–224.
[2] Y. Sun, F. Liu, and B. Liu, "Steganalysis based on difference image," in IWDW 08, LNCS 5450. Heidelberg: Springer-Verlag, 2009, pp. 184–198.
[3] P. Sallee, "Model-based steganography," in Proceedings of IWDW 2003, LNCS 2939, 2004, pp. 154–167.
[4] P. Sallee, "Model-based methods for steganography and steganalysis," International Journal of Image and Graphics, vol. 5, no. 1, 2005, pp. 167–189.
[5] N. Provos, "Defending against statistical steganalysis," in Proceedings of the 10th USENIX Security Symposium. Washington, DC, 2001, pp. 323–335.
[6] K. Solanki, K. Sullivan, U. Madhow, et al., "Provably secure steganography: Achieving zero K-L divergence using statistical restoration," in Proceedings of ICIP'06. Piscataway: IEEE Signal Processing Society, 2007, pp. 125–128.
[7] A. Sarkar, K. Solanki, U. Madhow, et al., "Secure steganography: statistical restoration of the second order dependencies for improved security," in Proceedings of ICASSP'07. Piscataway: IEEE Signal Processing Society, 2007, pp. II277–II280.
[8] K. Changa, C. Changa, P. S. Huangb, et al., "A novel image steganographic method using tri-way pixel-value differencing," Journal of Multimedia, vol. 3, no. 2, 2008, pp. 37–44.
[9] A. Westfeld, "F5—A steganographic algorithm: high capacity despite better steganalysis," in Proceedings of IH 2001, LNCS 2137, 2001, pp. 289–302.
[10] Y. Kim, Z. Duric, and D. Richards, "Modified matrix encoding technique for minimal distortion steganography," in Proceedings of IH 06, LNCS 4437, 2006, pp. 314–327.
[11] J. Fridrich and D. Soukal, "Matrix embedding for large payloads," IEEE Transactions on Information Forensics and Security, vol. 1, no. 3, 2006, pp. 390–394.
[12] J. Fridrich, M. Goljan, D. Soukal, et al., "Wet paper codes with improved embedding efficiency," IEEE Transactions on Information Forensics and Security, vol. 1, no. 1, 2006, pp. 102–110.
[13] T. Filler, J. Judas, and J. Fridrich, "Minimizing embedding impact in steganography using Trellis-Coded quantization," in Proceedings of Media Forensics and Security III, SPIE 7451, 2010, pp. 715405-1–715405-14.
[14] T. Filler, J. Judas, and J. Fridrich, "Gibbs construction in steganography," IEEE Transactions on Information Forensics and Security, vol. 5, no. 4, 2010, pp. 705–720.
[15] T. Pevny, T. Filler, and P. Bas, "Using high-dimensional image models to perform highly undetectable steganography," in Proceedings of IH'10, LNCS 6387, 2010, pp. 161–177.
[16] T. Filler and J. Fridrich, "Design of adaptive steganographic schemes for digital images," in Proceedings of Media Watermarking, Security and Forensics III, SPIE 7880, 2011, pp. 78800F-1–78800F-13.
[17] C. Cachin, "An information-theoretic model for steganography," in Proceedings of IH'98, LNCS 1525. Heidelberg: Springer-Verlag, 1998, pp. 306–318.
[18] T. Filler and J. Fridrich, "Fisher information determines capacity of secure steganography," in Proceedings of IH'09, LNCS 5806. Heidelberg: Springer-Verlag, 2009, pp. 31–47.
[19] T. Filler, A. D. Ker, and J. Fridrich, "The square root law of steganographic capacity for Markov covers," in Proceedings of Media Forensics and Security 09, SPIE 7254. Bellingham: SPIE Press, 2009, pp. 08-1–08-11.
[20] A. D. Ker, "Estimating the information theoretic optimal stego noise," in Proceedings of IWDW'09, LNCS 5703. Heidelberg: Springer-Verlag, 2009, pp. 184–198.
[21] G. McLachlan and D. Peel, Finite Mixture Models. New York: Wiley, 2001.
[22] C. A. Bouman, "CLUSTER: An unsupervised algorithm for modeling Gaussian mixtures," http://www.ece.purdue.edu/~bouman.
[23] USDA NRCS Photo Gallery, http://photogallery.nrcs.usda.gov.

Yi-feng Sun was born in Xinxiang, China, in 1976. He received the B.S., M.S., and Ph.D. degrees in computer science from the Zhengzhou Information Science and Technology Institute, Zhengzhou, China, in 1999, 2002, and 2010, respectively. He is presently an instructor with the Department of Information Security, Zhengzhou Information Science and Technology Institute, Zhengzhou, China. His research interests are image steganography, image steganalysis, and computer vision.

Dan-mei Niu was born in Luoyang, China, in 1979. She received the B.S. degree in 2001 and the M.S. degree from the Henan University of Science and Technology, Luoyang, China, in 2006. She is presently an instructor in the Electronic Information Engineering College, Henan University of Science and Technology, Luoyang, China. Her research interests are information security and digital rights management.

Guang-ming Tang was born in Wuhan, China, in 1963. She received the B.S., M.S., and Ph.D. degrees in information security from the Zhengzhou Information Science and Technology Institute, Zhengzhou, China, in 1983, 1990, and 2008, respectively. She is presently a professor with the Department of Information Security, Zhengzhou Information Science and Technology Institute, Zhengzhou, China. Her fields of professional interest are information hiding, watermarking, and software reliability. She has published 51 research articles and 3 books in these areas. Ms. Tang is a recipient of a Provincial Prize for Progress in Science and Technology.

Zhan-zhan Gao was born in Shijiazhuang, China, in 1988. He received the B.S. degree in electronic science and technology from the Zhengzhou Information Science and Technology Institute, Zhengzhou, China, in 2011.
He is pursuing the M.S. degree in information security at the Zhengzhou Information Science and Technology Institute. His research interests include information hiding and information security.

JOURNAL OF MULTIMEDIA, VOL. 7, NO. 4, AUGUST 2012, 303

A Novel Robust Zero-Watermarking Scheme Based on Discrete Wavelet Transform

Yu Yang1,2, Min Lei1,2
1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing, China
2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, China
Email: [email protected], [email protected]

Huaqun Liu
School of Information and Mechanical Engineering, Beijing Institute of Graphic Communication, Beijing, China

Yajian Zhou1,2, Qun Luo1
1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing, China
2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, China

Abstract—In traditional watermarking algorithms, the insertion of the watermark into the original signal inevitably introduces some perceptible quality degradation; another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking techniques can solve these problems successfully, but most existing zero-watermarking algorithms for audio and images cannot resist some signal processing manipulations or malicious attacks. In this paper, a novel audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed, which is more efficient and robust. The experiments show that the algorithm is robust against common audio signal processing operations such as MP3 compression, re-quantization, re-sampling, low-pass filtering, cutting-replacement, additive white Gaussian noise, and so on. These results demonstrate that the proposed watermarking method can be a suitable candidate for audio copyright protection.
Index Terms—zero-watermarking, discrete wavelet transform, copyright protection

doi:10.4304/jmm.7.4.303-308

I. INTRODUCTION

Recently, the rapid development of the Internet has multiplied multimedia services such as electronic commerce, pay-per-view, video-on-demand, electronic newspapers, and peer-to-peer media sharing. As a result, multimedia data can be obtained quickly over high-speed network connections. However, the authors, publishers, owners, and providers of multimedia data are reluctant to permit the distribution of their documents in a networked environment, because the ease of intercepting, copying, and redistributing electronic data in its exact original form encourages copyright violation. It is therefore crucial for the future development of networked multimedia systems that robust methods be developed to protect the intellectual property rights of data owners against unauthorized copying and redistribution of the material made available on the network. Classical encryption systems do not completely solve the problem of unauthorized copying, because once encryption is removed from a document there is no further control over its dissemination.

Digital watermarking is a good approach to providing copyright protection of digital contents (for example text, images, audio, and video) [1-14]. This technique directly embeds the watermarking information into the digital content. Ideally, the embedding operation should not introduce any perceptible distortion: there must be no perceptible difference between the watermarked and the original version; that is, the watermark data should be embedded imperceptibly into the audio media. Apart from imperceptibility, capacity and robustness are two fundamental properties of audio watermarking schemes. To trade off imperceptibility against robustness, it is common to incorporate the human auditory system (HAS) into a watermarking system. The watermark should be extractable after various intentional and unintentional attacks, which may include additive noise, re-sampling, MP3 compression, low-pass filtering, re-quantization, and any other attack that removes the watermark or confuses the watermark extraction system. Balancing capacity, transparency, and robustness is the main challenge for audio watermarking applications. We focus on an audio watermarking scheme in this paper.

In traditional audio watermarking techniques, whether in the spatial domain, the transform domain, or a dual domain [15-16], the embedding of the watermark into the original audio inevitably introduces some audible quality degradation; another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking techniques were therefore proposed by some researchers to solve these problems [17–25]. Instead of embedding the watermark into the original signal, the zero-watermarking approach constructs a binary pattern based on the essential characteristics of the original signal and uses it for watermark recovery. In [21], the property of natural images that the vector quantization indices of neighboring blocks tend to be very similar was used to generate the binary pattern. A scheme combining zero-watermarking with spatial-domain-based neural networks was proposed in [22], in which the differences between the intensity values of selected pixels and the corresponding outputs of the neural network model were calculated to generate the binary pattern. In [23], some low-frequency wavelet coefficients were randomly selected from the original image by chaotic modulation and used for feature extraction. And in [24], two zero-watermarks were constructed from the original image.
One was robust to signal processing and central cropping, constructed from the low-frequency coefficients in the discrete wavelet transform domain; the other was robust to general geometric distortions as well as signal processing, constructed from the DWT coefficients of a log-polar mapping of the host image. There are now zero-watermarking algorithms for audio as well. An efficient and robust zero-watermarking technique for audio signals is presented in [25]: the multi-resolution characteristic of the discrete wavelet transform (DWT), the energy compression characteristic of the discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulants are combined to extract essential features from the original audio signal, which are then used for watermark recovery. Simulation results demonstrate the effectiveness of that scheme in terms of inaudibility, detection reliability, and robustness. However, most of these zero-watermarking techniques are designed for still images, and their robustness against some signal processing manipulations or malicious attacks is not satisfying. In this paper, a novel robust zero-watermarking technique for audio signals is proposed.

The paper is organized as follows. Section II introduces basic concepts of the discrete wavelet transform. Section III describes the novel robust zero-watermarking scheme. Section IV exhibits the experimental results, illustrating that the proposed method achieves good robustness. A brief conclusion is given in Section V.

II. DISCRETE WAVELET TRANSFORM

The wavelet transform is a time-domain localized analysis method whose window size is fixed but whose shape is adjustable. It supplies high temporal resolution in the high-frequency part of a signal and high frequency resolution in the low-frequency part, so it can distill information from a signal effectively. For audio, we need to convert the 1-dimensional audio into a 2-dimensional form. The wavelet transform decomposes the audio into a set of band-limited components which can be reassembled to reconstruct the original audio without error. Since the bandwidth of the resulting coefficient sets is smaller than that of the original audio, the coefficient sets can be downsampled without loss of information. Reconstruction of the original signal is accomplished by upsampling, filtering, and summing the individual sub-bands.

For 1-D audio, we convert the 1-D audio into a 2-D signal and then apply the DWT, which corresponds to processing the audio with 2-D filters in each dimension. The filters divide the input into four non-overlapping multi-resolution coefficient sets: a lower-resolution approximation (LL1) together with horizontal (HL1), vertical (LH1), and diagonal (HH1) detail components. The sub-band LL1 represents the coarse-scale DWT coefficients, while the coefficient sets LH1, HL1, and HH1 represent the fine-scale DWT coefficients. To obtain the next coarser scale of wavelet coefficients, the sub-band LL1 is processed further, until some final scale N is reached. At that point we have 3N + 1 coefficient sets, consisting of the multi-resolution coefficient set LLN together with LHx, HLx, and HHx, where x ranges from 1 to N. Figure 1 shows the representation of 2 levels of the wavelet transform.

[Figure 1: Representation of 2 levels of the wavelet transform, decomposing the 2-D original audio into LL2, HL2, LH2, HH2, HL1, LH1, and HH1.]

Due to its excellent spatio-frequency localization properties, the DWT is very suitable for identifying the areas of the original audio where a watermark can be embedded effectively. In particular, this property allows the exploitation of the masking effect of the human visual and auditory system: if a DWT coefficient is modified, only the region corresponding to that coefficient will be modified.
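The scheme below applies a 3-level decomposition to 512-sample audio blocks, yielding 64 approximation coefficients. As a self-contained stand-in for a wavelet library, here is a minimal Haar analysis step (our own simplification; the paper does not fix the wavelet family, so Haar is an assumption): each level halves the signal, splitting it into approximation and detail parts.

```python
import math

def haar_level(x):
    """One Haar analysis level: returns (approximation, detail) halves."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_approx(x, levels):
    """Iterate the analysis 'levels' times, keeping only the approximation."""
    for _ in range(levels):
        x, _ = haar_level(x)
    return x
```

With 3 levels, a 512-sample block reduces to 512 / 2³ = 64 approximation coefficients, matching the C_i = {c_i(1), …, c_i(64)} used by the algorithm in Section III.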
In general, most of the audio energy is concentrated in the low-frequency coefficient sets LLx, so embedding watermarks in these coefficient sets may degrade the audio significantly; embedding in the low-frequency coefficient sets, however, can increase robustness significantly.

III. AUDIO WATERMARKING ALGORITHM

A. Embedding Process

Let A be the original audio signal and let W = {w(i, j) | i = 1, 2, ..., M1, j = 1, 2, ..., M2} be the binary-valued image watermark to be embedded. The watermark embedding procedure is as follows (Figure 2: Embedding process).

Step 1: Dimension reduction of the watermark image. The audio signal is one-dimensional while the watermark W is two-dimensional, so the watermark is first rearranged into a one-dimensional sequence w = {w(i) = w(m1, m2) | 1 ≤ m1 ≤ M1, 1 ≤ m2 ≤ M2, i = 1, 2, ..., M1 × M2}.

Step 2: The original audio A = (a1, a2, ..., aN) with N PCM (pulse-code modulation) samples is segmented into M = ⌊N/512⌋ blocks Bi, where M = M1 × M2, i = 1, 2, ..., M. Each block contains 512 samples.

Step 3: For i = 1, ..., M:
3.1) Apply the 3-level discrete wavelet packet decomposition to the audio block Bi and obtain the approximation sub-band coefficients Ci = {ci(1), ci(2), ..., ci(64)}, which contain 64 samples.
3.2) From the approximation coefficients Ci, compute
t(i) = Σ_{j=1..64} ci(j) · ci(j) · j,  t'(i) = Σ_{j=1..64} ci(j) · ci(j),
and get s'(i) = ⌊t(i) / t'(i)⌋.
3.3) Compute s(i) = s'(i) mod 2.
3.4) End.

Step 4: Obtain the watermark extraction secret key k(i) = w(i) ⊕ s(i) by computing the exclusive-or of s(i) and the watermark bit w(i).

B. Extraction Process

The watermark recovery procedure is carried out as follows (Figure 3: Extraction process).
Step 1: The watermarked audio A' = (a'1, a'2, ..., a'N) is segmented into M = ⌊N/512⌋ blocks B'i, where M = M1 × M2, i = 1, 2, ..., M. Each block contains 512 samples.

Step 2: For i = 1, ..., M:
2.1) Apply the 3-level discrete wavelet packet decomposition to the audio block B'i and obtain the approximation sub-band coefficients C'i = {c'i(1), c'i(2), ..., c'i(64)}, which contain 64 samples.
2.2) From the approximation coefficients C'i, compute
T(i) = Σ_{j=1..64} c'i(j) · c'i(j) · j,  T'(i) = Σ_{j=1..64} c'i(j) · c'i(j),
and get S'(i) = ⌊T(i) / T'(i)⌋.
2.3) Compute S(i) = S'(i) mod 2.
2.4) End.

Step 3: Recover the watermark information w'(i) = k(i) ⊕ S(i) by computing the exclusive-or of S(i) and the watermark extraction secret key k(i).

Step 4: Reassemble the binary-valued image watermark from the extracted watermark sequence.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In the experiments, two binary stamp images of size 32 × 32 (i.e., M1 = M2 = 32), displayed in Fig. 5, are taken as the original watermarks. We evaluate the performance of the proposed watermarking method on different types of 16-bit mono audio signals sampled at 44.1 kHz, shown in Fig. 4. The sound files are: (a) Pop, (b) Speech, (c) Classic. Each audio file contains 524,288 samples, with a duration of about 11 seconds.

Figure 4: The original audio signals
Figure 5: Watermarks of size 32 × 32

To evaluate the quality of the watermarked audio, the following signal-to-noise ratio (SNR) is used:

SNR = 10 log10 [ Σ_{n=1..N} Y(n)² / Σ_{n=1..N} (Y(n) − Y'(n))² ]

where Y(n) and Y'(n) are the original and the watermarked audio signals, respectively.
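The embedding (key generation) and extraction procedures of Section III can be sketched as follows. This is an illustrative sketch, not the authors' code: a repeated Haar averaging step stands in for the 3-level wavelet packet decomposition, and a small random signal stands in for the PCM audio. Because the scheme never modifies the audio, extraction from the unattacked signal recovers the watermark exactly.

```python
# Illustrative sketch of the zero-watermarking key generation and extraction.
import random

def approx_coeffs(block, levels=3):
    """Approximation sub-band after `levels` Haar averaging steps: 512 -> 64 samples."""
    c = list(block)
    for _ in range(levels):
        c = [(c[i] + c[i + 1]) / 2 for i in range(0, len(c), 2)]
    return c

def feature_bit(block):
    """s(i) = floor(t / t') mod 2, with t = sum c_j^2 * j and t' = sum c_j^2."""
    c = approx_coeffs(block)
    t = sum(cj * cj * j for j, cj in enumerate(c, start=1))
    tp = sum(cj * cj for cj in c)
    return int(t // tp) % 2 if tp else 0

def make_key(audio, wbits):
    """Embedding: k(i) = w(i) XOR s(i); the audio itself is left untouched."""
    blocks = [audio[i * 512:(i + 1) * 512] for i in range(len(wbits))]
    return [w ^ feature_bit(b) for w, b in zip(wbits, blocks)]

def extract(audio, key):
    """Extraction: w'(i) = k(i) XOR S(i), with S(i) recomputed from the audio."""
    blocks = [audio[i * 512:(i + 1) * 512] for i in range(len(key))]
    return [k ^ feature_bit(b) for k, b in zip(key, blocks)]

rng = random.Random(1)
audio = [rng.randint(-32768, 32767) for _ in range(4 * 512)]  # 4 blocks of PCM-like data
wbits = [1, 0, 1, 1]                                          # toy watermark bits
key = make_key(audio, wbits)
recovered = extract(audio, key)   # identical audio -> exact recovery
```

Robustness against attacks comes from the stability of the weighted-centroid parity s(i) under mild distortions of the approximation coefficients, not from any change to the signal.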
To evaluate the robustness of the watermarking algorithm, the normalized correlation coefficient (NC) and the bit error rate (BER) are employed:

NC(W, W*) = [ Σ_{i=1..M1} Σ_{j=1..M2} W(i, j) · W*(i, j) ] / [ √(Σ_{i=1..M1} Σ_{j=1..M2} W(i, j)²) × √(Σ_{i=1..M1} Σ_{j=1..M2} W*(i, j)²) ]

BER = B / (M1 × M2) × 100%,  with B = Σ_{i=1..M1} Σ_{j=1..M2} |W(i, j) − W*(i, j)|

where W(i, j) and W*(i, j) are the original and the extracted watermark, respectively, B is the number of erroneously extracted bits, and M1 × M2 is the size of the watermark.

A. Imperceptibility Analysis

One of the main requirements of audio watermarking techniques is inaudibility of the embedded watermark. For the proposed scheme, this requirement is achieved naturally because the watermark is embedded into the secret key rather than into the original audio signal itself. The watermarked audio is in fact identical to the original.

B. Robustness Test and Analysis

To test the robustness of the proposed method, nine different types of attacks are applied:
(1) Noise addition: 20 dB additive white Gaussian noise (AWGN) is added to the watermarked audio signal.
(2) Re-sampling: the watermarked signal, originally sampled at 44.1 kHz, is re-sampled at 22.05 kHz and then restored by sampling again at 44.1 kHz.
(3) Low-pass filtering: the cut-off frequency is 11.025 kHz.
(4) Re-quantization: the 16-bit watermarked audio signal is quantized down to 8 bits/sample and re-quantized back to 16 bits/sample.
(5) MP3 compression: MPEG-1 Layer 3 compression at 64 kbps is applied to the watermarked signal.
(6) MP3 compression: MPEG-1 Layer 3 compression at 32 kbps is applied to the watermarked signal.
(7) MP3 compression: MPEG-1 Layer 3 compression at 128 kbps is applied to the watermarked signal.
(8) Noise reduction: the preset shape is hiss removal.
(9) Equalization.
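The three evaluation measures can be sketched as follows (our illustration, not the authors' code); note that ber below returns a fraction, which the paper reports multiplied by 100%.

```python
# Illustrative sketch of the SNR, NC, and BER evaluation measures.
import math

def snr_db(y, y_hat):
    """SNR = 10 log10( sum y^2 / sum (y - y')^2 ), in decibels."""
    signal = sum(v * v for v in y)
    noise = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

def nc(w, w_star):
    """Normalized correlation between original and extracted watermarks."""
    num = sum(a * b for a, b in zip(w, w_star))
    den = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_star))
    return num / den

def ber(w, w_star):
    """Fraction of erroneously extracted bits (multiply by 100 for percent)."""
    wrong = sum(1 for a, b in zip(w, w_star) if a != b)
    return wrong / len(w)

w = [1, 0, 1, 1, 0, 0, 1, 0]        # toy original watermark bits
w_star = [1, 0, 1, 1, 0, 0, 0, 0]   # extracted copy with one bit flipped
```

An exact extraction gives NC = 1 and BER = 0, matching the all-ones / all-zeros entries of Table 1.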
Table 1 shows the NC and BER of the proposed watermarking method against the nine attacks applied to the three types of watermarked audio signal, "Speech", "Classic", and "Pop".

Table 1. Results of the attacks (NC / BER of the extracted image)
Attack | Speech NC | Speech BER | Classic NC | Classic BER | Pop NC | Pop BER
1 | 1 | 0 | 1 | 0 | 1 | 0
2 | 1 | 0 | 1 | 0 | 1 | 0
3 | 1 | 0 | 1 | 0 | 1 | 0
4 | 1 | 0 | 1 | 0 | 1 | 0
5 | 1 | 0 | 1 | 0 | 1 | 0
6 | 1 | 0 | 1 | 0 | 1 | 0
7 | 1 | 0 | 1 | 0 | 1 | 0
8 | 0.9991 | 0.001 | 1 | 0 | 1 | 0
9 | 1 | 0 | 1 | 0 | 1 | 0

Table 2. Comparison between our algorithm and other algorithms
Attack | NC (ours) | NC ([25]) | NC ([26]) | NC ([27])
Equalization | 1 | 0.97 | - | 0.998
MP3 compression (32 kbps) | 1 | 0.99 | - | 0.999
Low-pass filtering | 1 | 1 | 0.9971 | 0.824
Noise addition | 1 | 1 | 0.9668 | 0.999
Re-sampling | 1 | 1 | - | 0.981

Table 2 compares our algorithm with other algorithms; the robustness of our algorithm is clearly stronger.

V. CONCLUSION

A novel, efficient, and robust audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed in this paper. The experiments show that the proposed algorithm has good robustness against common audio signal processing operations such as MP3 compression, re-quantization, re-sampling, low-pass filtering, cutting-replacement, and additive white Gaussian noise. These results demonstrate that our algorithm is a suitable candidate for audio copyright protection.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China under Grants No. 60973146, No. 61170272, No. 61170269, and No. 61003285, and by an Institute-Level Project funded by the Beijing Institute of Graphic Communication under Grant No. E-6-2012-23.
REFERENCES
[1] Min Lei, "Research on the Digital Watermarking and Steganalysis Algorithms of Audio Information Hiding," Ph.D. dissertation, Beijing University of Posts and Telecommunications, 2011, pp. 28-44.
[2] Yu Yang, Min Lei, Xiayun Jia, et al., "A New Digital Audio Watermarking Based on DCT-DWT-SVD," in Proceedings of the 2011 IEEE International Conference on Information Theory and Information Security, Hangzhou, China, August 7-9, 2011, pp. 906-909.
[3] Lei Min and Yang Yu, "Recent Advances in Audio-Based Steganalysis Research," in Proceedings of the 2nd Asia-Pacific Conference on Information Network and Digital Content Security, Hangzhou, China, August 7-9, 2011, Atlantis Press, pp. 208-213.
[4] Lei Min, Yang Yu, Luo Shou-shan, et al., "Semi-fragile Audio Watermarking Algorithm in DWT Domain," China Communications, vol. 7, no. 4, 2010, pp. 71-75.
[5] S. C. Liu and S. D. Lin, "BCH code based robust audio watermarking in the cepstrum domain," Journal of Information Science and Engineering, vol. 22, pp. 535-543, 2006.
[6] M. Pooyan and A. Delforouzi, "Adaptive and Robust Audio Watermarking in Wavelet Domain," in Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), vol. 2, pp. 287-290, 2007.
[7] G. Zeng and Z. Qiu, "Audio Watermarking in DCT: Embedding Strategy and Algorithm," in Proceedings of the 9th International Conference on Signal Processing (ICSP), pp. 2193-2196, 2008.
[8] Samira Lagzian, Mohsen Soryani, and Mahmood Fathy, "A new robust watermarking scheme based on RDWT-SVD," International Journal of Intelligent Information Processing, vol. 2, no. 1, 2011.
[9] Pranab Kumar Dhar, Mohammad Ibrahim Khan, and Saif Ahmad, "A new DCT-based watermarking method for copyright protection of digital audio," International Journal of Computer Science & Information Technology, vol. 2, no. 5, 2010, pp. 91-101.
[10] Chen Yongqiang, Zhang Yanqing, Hu Hanping, and Ling Hefei, "A novel gray image watermarking scheme," Journal of Software, vol. 6, no. 5, 2011, pp. 849-856.
[11] Kilari Veera Swamy, B. Chandra Mohan, Y. V. Bhaskar Reddy, and S. Srinivas Kumar, "Image compression and watermarking scheme using scalar quantization," The International Journal of Next Generation Networks, vol. 2, no. 1, 2010, pp. 37-47.
[12] Hasan M. Meral, Emre Sevinc, Ersin Unkar, Bulent Sankur, A. Sumru Ozsoy, and Tunga Gungor, "Syntactic tool for text watermarking," in Proceedings of the International Society for Optical Engineering, 2007.
[13] Pranab Kumar Dhar, Mohammad Ibrahim Khan, and Jong-Myon Kim, "A new audio watermarking system using discrete Fourier transform for copyright protection," International Journal of Computer Science and Network Security, 6(10), 2010, pp. 35-40.
[14] Vivekananda B. K., Indranil S., and Abhijit D., "An Audio Watermarking Scheme Using Singular Value Decomposition and Dither-Modulation Quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, 2010, pp. 369-383.
[15] X.-Y. Wang and H. Zhao, "A novel synchronization invariant audio watermarking scheme based on DWT and DCT," IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4835-4840, 2006.
[16] X.-Y. Wang, W. Qi, and P. Niu, "A new adaptive digital audio watermarking based on support vector regression," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2270-2277, 2007.
[17] T. Sun, W. Quan, and S.-X. Wang, "Zero-watermark watermarking for image authentication," in Proceedings of Signal and Image Processing, pp. 503-508, Kauai, Hawaii, USA, August 2002.
[18] G. Horng, C. Chen, B. Ceng, and T. Chen, "Neural network based robust lossless copyright protection technique," http://www.csie.cyut.edu.tw/TAAI2002/TAAI2002PDF/Parallel.
[19] Q. Wen, T.-F. Sun, and S.-X. Wang, "Concept and application of zero-watermark," Tien Tzu Hsueh Pao/Acta Electronica Sinica, vol.
31, no. 2, pp. 214-216, 2003.
[20] S. Yang, C. Li, F. Sun, and Y. Sun, "Study on the method of image non-watermark in DWT domain," Chinese Journal of Image and Graphics, vol. 8A, no. 6, pp. 664-669, 2003.
[21] D. Charalampidis, "Improved robust VQ-based watermarking," Electronics Letters, vol. 41, no. 23, pp. 21-22, 2005.
[22] J. Sang, X. Liao, and M. S. Alam, "Neural network based zero-watermark scheme for digital images," Optical Engineering, vol. 45, no. 9, 2006.
[23] C. Hanqiang, X. Hua, L. Xutao, L. Miao, Y. Sheng, and W. Fang, "A zero-watermarking algorithm based on DWT and chaotic modulation," in Independent Component Analyses, vol. 6247 of Proceedings of SPIE, pp. 1-9, Orlando, Fla., USA, 2006.
[24] L. Jing and F. Liu, "Double zero-watermarks scheme utilizing scale invariant feature transform and log-polar mapping," in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 2118-2121, Las Vegas, Nev., USA, February 2007.
[25] Ning Chen and Jie Zhu, "A robust zero-watermarking algorithm for audio," EURASIP Journal on Advances in Signal Processing, 2008.
[26] W. Song, J. J. Hou, Z. H. Li, and L. Huang, "A novel zero-bit watermarking algorithm based on logistic chaotic system and singular value decomposition," Acta Physica Sinica, vol. 58, no. 7, 2009, pp. 4449-4456.
[27] L. Tan, B. Wu, Z. Liu, and M. T. Zhou, "An audio information hiding algorithm with high capacity based on chaos and wavelet transform," Acta Electronica Sinica, vol. 38, no. 8, 2010, pp. 1812-1818.

Stego Key Estimation in LSB Steganography

Jing LIU
Zhengzhou Information Science and Technology Institute, Zhengzhou, China
Email: [email protected]

Guangming TANG
Zhengzhou Information Science and Technology Institute, Zhengzhou, China
Email: [email protected]

Abstract—Many kinds of multimedia content can be accessed conveniently in social networks, and some of this multimedia may be used to hide harmful information.
We consider the problem of estimating the stego key used for hiding with the least significant bit (LSB) paradigm, which has been shown to be much more difficult than detecting the hidden message. A previous framework for stego key search was provided by the theory of hypothesis testing; however, the test threshold is hard to determine. In this paper, we propose a new method in which the correct key can be identified by inspecting the difference between the stego sequence and its shifted sequence on the embedding path. The new technique is shown to be much simpler and quicker than those previously known.

Index Terms—social networks, multimedia contents security, steganography, stego key estimation

Manuscript received January 1, 2012; revised June 1, 2011; accepted July 1, 2011.

I. INTRODUCTION

The aim of steganography is to hide information imperceptibly in cover objects, such as digital images. To hide a message, some features of the original image (also called the cover image), chosen by a stego key, are slightly modified by the embedding technique to obtain the stego image. Before embedding, the message is usually encrypted [1]. Steganography is significant to information security; however, steganographic techniques can also be used unlawfully by criminals and terrorists, especially in social networks. Steganalysis aims to break steganography. Steganalysis can be classified into two categories: active and passive [2]. The goal of active steganalysis is to estimate some parameters (stego key, message length, etc.) of the embedding algorithm or the hidden message [3][4], while passive steganalysis deals with identifying the presence or absence of a hidden message or the embedding algorithm used [5][6]. In this paper, we investigate active steganalysis, since the aim is to estimate the stego key under the assumption that we already know the
steganographic algorithm. Trivedi et al. [7][8] presented a method for secret key estimation in sequential steganography. The authors used a sequential probability ratio test to determine the embedding key, which was, in their interpretation, the beginning and the ending of the subsequence modulated during embedding. In fact, sequential embedding is typically used for watermarking. Fridrich et al. [9][10] considered a situation more typical of steganographic applications, in which the key determines a pseudorandomly ordered subset of all indices in the cover image to be used for embedding. They performed a chi-square test to estimate the correct key. Unfortunately, the chi-square test suffers from the drawback that there is no simple way to choose the test threshold to attain a desired target performance. In this paper, we propose a novel approach to searching for the stego key used in LSB steganography: the key can be determined by a simple count. In Section II, we describe the LSB embedding algorithm. In Section III, we give the definitions used in our method and then describe the method for identifying the correct key. In Section IV, we present two kinds of experiments: the first verifies our method, and the second shows its effectiveness by attacking steganographic software. Finally, we outline some directions for further research.

This work was supported in part by the National Natural Science Foundation of China (No. 61101112) and the Postdoctoral Science Foundation of China (No. 2011M500775). Corresponding author: Liu Jing, email: [email protected]. doi:10.4304/jmm.7.4.309-313

II. LSB STEGANOGRAPHY

For an image I of size M × N, let the pixel value at (i, j) be I(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N, I(i, j) ∈ {0, 1, ..., 255}. Let K be the space of all possible stego keys. For each key K ∈ K, let Path(K) denote the ordered set of element indices visited along the path generated from the key K.
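Key-driven LSB embedding along a pseudorandom path can be sketched as follows. This is our illustration, not any real tool's generator: the key simply seeds Python's PRNG, which selects an ordered pseudorandom subset of pixel indices, whereas actual stego software (e.g., the Borland C++ generator discussed later) uses its own generator.

```python
# Illustrative sketch: LSB replacement along a key-generated pseudorandom
# path.  The "image" is a flat list of pixel values.
import random

def path_from_key(key, n_pixels, n_bits):
    """Ordered pseudorandom subset of pixel indices visited for embedding."""
    rng = random.Random(key)
    return rng.sample(range(n_pixels), n_bits)

def lsb_embed(pixels, bits, key):
    """Table I operation: cover pixel 2x or 2x+1 becomes 2x + message bit."""
    stego = list(pixels)
    for idx, bit in zip(path_from_key(key, len(pixels), len(bits)), bits):
        stego[idx] = (stego[idx] & ~1) | bit
    return stego

def lsb_extract(pixels, n_bits, key):
    """Read LSBs back along the same key-generated path."""
    return [pixels[idx] & 1 for idx in path_from_key(key, len(pixels), n_bits)]

cover = [37, 200, 14, 91, 58, 123, 6, 250]
msg = [1, 0, 1, 1]
stego = lsb_embed(cover, msg, key=42)
```

With the correct key the message is read back exactly; a wrong key visits different positions and in general returns noise, which is why key estimation is the interesting problem.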
When embedding message bits, the elements of the sequence {I(i, j), (i, j) ∈ Path(K0)}, K0 ∈ K, are selected in order and modified by the embedding operation shown in Table I. After embedding, the stego image I' with embedding ratio q (the number of embedded bits divided by MN) is obtained. Our task is to identify the stego key K0 under the condition that we have complete knowledge of the embedding algorithm and one stego image.

TABLE I. LSB EMBEDDING OPERATION
Cover pixel | Message bit | Stego pixel
2x | 0 | 2x
2x | 1 | 2x+1
2x+1 | 0 | 2x
2x+1 | 1 | 2x+1

III. PROPOSED METHOD

A. Definitions

To explain the details of the new technique, we first give the definitions used in our method. A non-boundary pixel I(i, j) has eight adjacent pixels in the image: I(i−1, j−1), I(i−1, j), ..., I(i+1, j+1). The differences between the center pixel and its adjacent pixels are denoted by D1(i, j) = I(i−1, j−1) − I(i, j), D2(i, j) = I(i−1, j) − I(i, j), ..., D8(i, j) = I(i+1, j+1) − I(i, j).

According to Dk, k = 1, 2, ..., 8, we define three kinds of pixel sets, H, L, and M:
High pixel set: I(i, j) ∈ H ⟺ max Dk < 0
Low pixel set: I(i, j) ∈ L ⟺ min Dk > 0
Middle pixel set: I(i, j) ∈ M ⟺ min Dk ≤ 0 and max Dk ≥ 0

According to the parity of the center pixel value, the three sets can be divided further into six pixel subsets: HE, HO, LE, LO, ME, and MO (even and odd center values, respectively). When the center pixel is modified by LSB embedding, there are ten movements among the six subsets, which are shown in Fig. 1 (Figure 1: Movements of each kind of pixel after LSB embedding). But only four movements, which span different pixel sets, change the pixel frequencies of the sets H, L, and M. These four movements are:

HO{max Dk = −1} ↔ ME{max Dk = 0},  LE{min Dk = 1} ↔ MO{min Dk = 0}

where Xy denotes pixels belonging to set X (called X pixels) that meet requirement y. To simplify the notation, we use simple symbols to denote the corresponding pixel sets, as shown in Table II.

TABLE II. SYMBOLS FOR PIXEL SETS
Symbol | Pixel subset
HM | HO{max Dk = −1}
MH | ME{max Dk = 0}
LM | LE{min Dk = 1}
ML | MO{min Dk = 0}

Finally, we define a flipping operation on the sequence {I(i, j), (i, j) ∈ Path(K)}, I(i, j) ∈ I':

Flipping operation: 0 ↔ 1, 2 ↔ 3, ..., 254 ↔ 255.

A shifted sequence {Î(i, j), (i, j) ∈ Path(K)} is then obtained.

B. Method for Identifying the Correct Key

Let FX(K) and F̂X(K) be the frequencies of X pixels in {I(i, j), (i, j) ∈ Path(K)} and {Î(i, j), (i, j) ∈ Path(K)}, respectively. Let α and β be the frequencies of "1" and "0" in the message bits, and let λ be the embedding ratio on the path generated from K, defined as λ = n/|Path(K)|, where n denotes the number of embedded message bits along path K and |Path(K)| denotes the number of pixels in path K. Obviously:

FH(K) + FL(K) + FM(K) = F̂H(K) + F̂L(K) + F̂M(K)   (1)

The frequency change of H is independent of that of L. Therefore, we consider only the frequency of M pixels. It is apparent that:

F̂M(K) = FM(K) + FHM(K) + FLM(K) − FMH(K) − FML(K)   (2)

Let S(K) denote the frequency difference of M pixels between {Î(i, j), (i, j) ∈ Path(K)} and {I(i, j), (i, j) ∈ Path(K)}, namely S(K) = F̂M(K) − FM(K). So:

S(K) = FHM(K) + FLM(K) − FMH(K) − FML(K)   (3)

Usually the message is encrypted, so the message bits are i.i.d. realizations of a binary random variable uniformly distributed on {0, 1}; therefore α = β = 1/2 [11]. On the fraction λ of path pixels that carry message bits, the LSBs are already random, so these pixels contribute equally to the four movement classes in expectation; only the unmodified fraction (1 − λ) contributes to S(K). Hence:

S(K) = [FHM(K) + FLM(K) − FMH(K) − FML(K)](1 − λ)   (4)

where the frequencies on the right-hand side refer to the unmodified pixels. We use λ0 and λj to denote the embedding ratios along the correct path and an incorrect path, respectively. Because the number of embedded message bits along the correct path equals the number of pixels along the correct path, the embedding ratio along the path generated from the correct key K0 ∈ K is λ0 = 1. We have:

S(K0) = [FHM(K) + FLM(K) − FMH(K) − FML(K)](1 − λ0) = 0   (5)
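The statistic S(K) can be computed directly from its definition as F̂M(K) − FM(K). The following minimal sketch (ours, not the authors' code) classifies pixels into the sets H, L, and M from the eight neighbour differences and measures the change in the number of M pixels after flipping the LSBs along a candidate path.

```python
# Illustrative sketch of the statistic S(K): count M pixels on the path,
# flip the path pixels' LSBs, and count again.

def pixel_class(img, i, j):
    """H: all neighbour differences negative; L: all positive; M: otherwise."""
    diffs = [img[i + di][j + dj] - img[i][j]
             for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    if max(diffs) < 0:
        return "H"
    if min(diffs) > 0:
        return "L"
    return "M"

def s_statistic(img, path):
    """S(K) = F^_M(K) - F_M(K): change in the M-pixel count after flipping."""
    f_m = sum(1 for (i, j) in path if pixel_class(img, i, j) == "M")
    flipped = [row[:] for row in img]
    for (i, j) in path:
        flipped[i][j] ^= 1          # flipping operation: 2x <-> 2x+1
    f_m_hat = sum(1 for (i, j) in path if pixel_class(flipped, i, j) == "M")
    return f_m_hat - f_m

# A flat neighbourhood: the centre (value 5, odd) is an M_O pixel with
# min Dk = 0, i.e. an ML pixel, so flipping sends it to L and S = -1.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
s = s_statistic(flat, [(1, 1)])
```

The key search then reduces to evaluating s_statistic over each candidate path and keeping the key with the maximal value, as described in the text.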
Despite of the two factors, we find S K 0 S K j holds in all testing images. We apply the method in [10] to the stego images above, the searching ratio of which is 37 keys per second on a Pentium IV machine running at 3.0GHZ, 512MB, while the searching rate of our method is 115 keys per second. 0 (5) ZHAI et al. have proved that Dk follows Generalized Gaussian distribution with mean “zero” in [11], therefore: FHM K , FML K FMH K FLM K If the stego image is not fully embedded, the number of embedded message bits along the incorrect path is smaller than , so the embedding ratio along the path generated from incorrect key K j K is smaller than 1, namely S Kj j 1 . Thus: FMH K FHM K FML K FLM K j 1 0 (7) From (5) and (7), we get: S K0 S Kj (8) Therefore, the difference between correct key and incorrect key is obtained. By calculating S K , K K, the key K with maximal value of S K determined as the correct stego key. can be IV. EXPERIMENTS AND ANALYSIS A. Experiments on Typical Images We perform some experiments to verify (8). We test 100 typical images sized in 512×512, downloaded from http://sipi.usc.edu/services/databased-/Database.html. We simulate the course of LSB embedding using a Matlab routine, in which the key space is 220 . For embedding ratio q 0.60 , we embed encrypted message into each S Kj 220 . B. Experiment on Typical LSB Steganographic Software We conduct this experiment by using “hide and seek 4.1”[12], which is a popular LSB steganographic software. It uses GIF image containing 320×480 pixels as the cover image and adopts random (num) of Borland C++ 3.1 as the generator. The maximal embedding message length is limited to 19000 byte The initialized state of random (num) is a seed with 16 bits, and “num” is used to control maximal migration step, which is related to message and pixel number that haven’t been embedded. So the stego key of Hide and Seek 4.1 consists of the seed and message length. 
Therefore, we should recover the seed and the exact length of message. We convert 1000 images, downloaded from http://www.cs.washington.edu/research/magedatabse/gro undtruth/, into GIF format with 320×480 pixels and insert different length of message using Hide and Seek 4.1. We perform the experiment with our stego key recovery algorithm described as follows. 0. Estimate the embedding ratio using the method presented in [13] and calculate the possible length of secret message min , max . 1. For each 16 0,2 Ki k min , max , test each seed of the seed generator used in Hide and 1 Seek 4.1 with K ik Ki , pixel set I i, j , i , j image, and calculate the value of S K .We find that S K0 for q 0.60 , K Figure 2. S K (6) k , generate the embedding Path K ik . 2. Calculate S K ik on sequence I i, j , i, j holds in all images. Fig .2. shows the value of S K in “lena.bmp”. We can see that (8) is verified. However, the value of S K 0 is not equal to zero, which is inconsistent with (5). There are two factors resulting to this. One is that we get (5) and (7) under important assumption that all neighboring pixels are unmodified. However, this is not © 2012 ACADEMY PUBLISHER 3. Let S max & k min , max S ik max & S K ik and T Ki , k Ki Path Kik . 0,216 1 S max . 4. If T 1 , then the seed and the message length in T are declared as the correct seed and length. Otherwise, the algorithm couldn’t find any correct key. Because our algorithm uses the method of [13] to 312 JOURNAL OF MULTIMEDIA, VOL. 7, NO. 4, AUGUST 2012 estimate the message length, where the possible deviation for the estimation of embedding ratio is 0.02, 0.02 , the effective length of stego key we should test actually is 16 log 2 0.04 . Table III shows the results of our method, where Pr(succ) means the probability of success, defined as: Pr(succ)=number of images whose stego key is recovered/ total number of images. TABLE III. 
estimate the message length, and the possible deviation of the embedding ratio estimate is [−0.02, 0.02], the effective length of the stego key we actually have to test is 16 + log2(0.04·MN) bits. Table III shows the results of our method, where Pr(succ) denotes the probability of success, defined as the number of images whose stego key is recovered divided by the total number of images.

TABLE III. EXPERIMENT RESULTS ON HIDE AND SEEK 4.1
Embedding ratio q | Pr(succ)
0.001 | 0
0.005 | 12.4%
0.06 | 89.3%
0.08 | 100%
0.50 | 100%
0.91 | 100%
0.95 | 84.4%
0.97 | 44.2%
0.99 | 0

Table III shows the performance of our method in recovering the stego key of LSB steganographic software. When 0.08 ≤ q ≤ 0.91, our method recovers the stego key of all the images; that is, the probability of S(K0) > S(Kj) is 1. But when the embedding ratio approaches 0 or 1, our method fails: when q → 0 the amount of data is not sufficient to recover the stego key, while when q → 1 the difference in embedding ratio between the correct path and incorrect paths is so small that the correct key cannot be detected. In addition, the embedded message length influences our stego key search rate most, because all the elements along the path must be processed: the larger the message length, the slower the search. The testing speed is about 4,314,500 keys per second.

V. CONCLUSIONS AND FUTURE WORK

We have proposed a novel and simple method for stego key recovery in LSB steganography which produces satisfying performance on images. In future work, we will consider how to reduce the computational time and improve the success ratio when searching for the stego key of images with larger or smaller embedding rates. Finally, we will transfer the method to spatial sequential LSB steganography, whose stego key can be considered as the beginning and the ending of the message during embedding.

REFERENCES
[1] Joyshree Nath and Asoke Nath, "Advanced Steganography Algorithm Using Encrypted Secret Message," International Journal of Advanced Computer Science and Applications, vol. 2, no. 3, 2011, pp. 19-24.
[2] R. Chandramouli, "A mathematical framework for active steganalysis," Multimedia Systems (special issue on multimedia watermarking), vol. 9, no. 3, 2003, pp. 303-311.
[3] T. Pevny, J. Fridrich, and A. Ker, "From blind to quantitative steganalysis," SPIE Electronic Imaging, San Jose, CA, vol. 7254, 2009, pp. 0C 1-0C 14.
[4] Kang Leng Chiew and Josef Pieprzyk, "Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography," in 2010 International Conference on Availability, Reliability and Security, Krakow, Poland, 2010, pp. 683-688.
[5] J. Fridrich, J. Kodovsky, V. Holub, and M. Goljan, "Steganalysis of Content-Adaptive Steganography in Spatial Domain," in 13th Information Hiding Conference, Prague, Czech Republic, May 2011.
[6] J. Kodovsky, T. Pevny, and J. Fridrich, "Modern Steganalysis Can Detect YASS," in SPIE Electronic Imaging, Media Forensics and Security XII, San Jose, CA, Jan. 2010, pp. 02-01 - 02-11.
[7] S. Trivedi and R. Chandramouli, "Locally Most Powerful Detector for Secret Key Estimation in Spread Spectrum Data Hiding," in E. Delp (ed.): Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents VI, San Jose, vol. 5306, 2004, pp. 1-12.
[8] S. Trivedi and R. Chandramouli, "Secret Key Estimation in Sequential Steganography," IEEE Transactions on Signal Processing, Supplement on Secure Media, 2005.
[9] J. Fridrich, M. Goljan, D. Soukal, and T. Holotyak, "Searching for the stego key," in Security, Steganography, and Watermarking of Multimedia Contents of EI SPIE, San Jose, vol. 5306, 2004, pp. 70-82.
[10] J. Fridrich, M. Goljan, and D. Soukal, "Forensic Steganalysis: Determining the Stego Key in Spatial Domain Steganography," in Proc. EI SPIE, San Jose, CA, 2005, pp. 631-642.
[11] ZHAI Wei-dong, LV Shu-wang, and LIU Zhen-hua, "Spatial stego-detecting algorithm in color images based on GGD," Journal of China Institute of Communications, vol. 25, no. 2, 2004, pp. 33-42.
[12] C. Moroney, Hide and Seek 4.1. Available at: http://www.netlink.co.uk/users/hassop/pgp/hdsk41b.zip, 2010.
[13] J. Fridrich and M. Goljan, "On Estimation of Secret Message Length in LSB Steganography in Spatial Domain," in Proc. EI SPIE, San Jose, CA, vol. 5306, Jan. 2004, pp. 23-34.

Jing LIU was born in Xuancheng, China, in 1985. She received the B.S. and M.S. degrees in information security from Zhengzhou Information Science and Technology Institute, Henan, China, in 2007 and 2010, respectively. Currently, she is pursuing the Ph.D. degree in information security at Zhengzhou Information Science and Technology Institute. Her research interests include information hiding and information security.

Guangming TANG was born in Wuhan, China, in 1963. She received the B.S., M.S., and Ph.D. degrees in information security from Zhengzhou Information Science and Technology Institute, Henan, China, in 1983, 1990, and 2008, respectively. Her fields of professional interest are information hiding, watermarking, and software reliability. She is presently a professor with the Department of Information Security, Zhengzhou Information Science and Technology Institute, Henan, China. She has published 51 research articles and 3 books in these areas. Ms. Tang is a recipient of a Provincial Prize for Progress in Science and Technology.

Facial Expression Spacial Charts for Describing Dynamic Diversity of Facial Expressions

H. Madokoro and K. Sato
Department of Machine Intelligence and Systems Engineering, Faculty of Systems Science and Technology, Akita Prefectural University, Yurihonjo, Japan
Email: {madokoro, ksato}@akita-pu.ac.jp

Abstract—This paper presents a new framework to describe individual facial expression spaces, particularly addressing the dynamic diversity of facial expressions that appear as an exclamation or emotion, to create a unique space for each person. We name this framework Facial Expression Spatial Charts (FESCs).
The FESCs are created using Self-Organizing Maps (SOMs) and Fuzzy Adaptive Resonance Theory (ART), both unsupervised neural networks. For facial images with emphasized sparse representations obtained using Gabor wavelet filters, SOMs extract topological information from facial expression images and classify them into categories in a fixed space determined by the number of units on the mapping layer. Subsequently, Fuzzy ART integrates the categories classified by the SOMs using adaptive learning functions under a fixed granularity controlled by the vigilance parameter. The categories integrated by Fuzzy ART are matched to Expression Levels (ELs) that quantify facial expression intensity based on the arrangement of facial expressions on Russell's circumplex model. We designate the category that contains the neutral facial expression as the basis category. FESCs can thus visualize and represent the dynamic diversity of facial expressions in terms of the ELs extracted from facial expressions. In the experiment, we created an original facial expression dataset consisting of three facial expressions (happiness, anger, and sadness) obtained from 10 subjects during 7-20 weeks at one-week intervals. Results show that the method can adequately display the dynamic diversity of facial expressions between subjects, in addition to temporal changes in each subject. Moreover, we used stress measurement sheets to obtain temporal changes of stress for analyzing the psychological effects of the stress that subjects feel. We estimated stress levels of four grades using Support Vector Machines (SVMs). The mean estimation rates for all 10 subjects and for the 5 subjects measured over more than 10 weeks were, respectively, 68.6% and 77.4%.

Index Terms—Facial Expression Spatial Charts, SOMs, Fuzzy ART, SVMs, SRS-18

I. INTRODUCTION

A face conveys information of various types. Humans can recognize intentions and emotions from the diverse information exhibited through facial expressions.
[Footnote: This paper is based on "Estimation of Psychological Stress Levels Using Facial Expression Spatial Charts," by H. Madokoro and K. Sato, which appeared in the Proceedings of the 2010 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2010), Istanbul, Turkey, October 2010. © 2010 IEEE. This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant-in-Aid for Young Scientists (B), No. 21700257. doi:10.4304/jmm.7.4.314-324]

Especially for people with whom we share a close relation, we can feel and understand health conditions or moods directly from facial expressions. Given the role of facial expressions in human communication, it is desirable to develop advanced interfaces between humans and machines in the future [1]. In the 1970s, in a study to determine how emotions are expressed through facial expressions, Ekman and Friesen defined six facial expressions shown by people feeling six basic emotions (happiness, disgust, surprise, sadness, anger, and fear) that are apparently universal among cultures. They described these as basic facial expressions because their associated emotions are distinguishable with high accuracy. However, real expressions are blended intermediate facial expressions that often show mixtures of two or three emotions. Human beings often express various facial expressions simultaneously. For example, the eyes can express crying while the mouth expresses a smile when someone is moved by an extremely kind deed. Moreover, the processes of expressing facial expressions contain individual differences, such as differences of face shapes among people. Regarding this difference, Akamatsu described that human faces present diversity of two types: static diversity and dynamic diversity [2]. Static diversity is individual diversity configured by the position, size, and arrangement of facial components such as the eyes, nose, mouth, and ears.
We can identify a person and determine their gender and impressions using static diversity. We are also able to move facial muscles to express internal emotions unconsciously and sequentially, or to express emotions as a message. This is called dynamic diversity. Facial expressions are expressed as a shift from a neutral facial expression to changed shapes of parts and overall configurations of the face. For facial expression analysis, we must consider and understand not only static diversity but also dynamic diversity. When dealing with the dynamic diversity of facial expressions, it is necessary to describe relations between physical parameters of facial pattern changes accompanying expressions and psychological parameters of recognized emotion. Physical parameters of facial expressions describe the facial deformations created by expression under consistent measurements of facial patterns, which differ according to the size and shape of each person's face. Ekman and Friesen proposed the Facial Action Coding System (FACS) as a method to describe facial expression changes from movements on the facial surface. Viewed comprehensively, the FACS is the most popular and standard method for objective description of facial expressions. It is often used in the behavioral sciences and psychology. The FACS was developed originally as a tool to measure facial expressions consisting of anatomical stand-alone Action Units (AUs). The FACS is useful for realizing natural and flexible man-machine interfaces in the fields of human cognition and behavioral science. However, special training is necessary for an observer to describe AUs. Moreover, movements of facial expressions that cannot be described as AUs exist in practice because AUs are classified subjectively by an observer examining numerous facial expressions.
Parameter description methods other than the AUs of FACS have been examined because FACS systematizes the appearance of images as facial expressions, which does not always yield suitable parameters for describing shape changes of the facial surface produced by expressions. In contrast, psychological parameters can be acquired from tasks of cognitive judgment related to emotions, in which facial expression images are presented to a subject as visual stimulation. Although physical parameters of facial expression patterns are unique to each person, psychological parameters of emotion are universal among humans. The emotion to be recognized shows various attributes according to the degree of physical changes of facial expression patterns, such as open or closed eyes and mouth. For estimating the degree of emotion, it is necessary to relate physical variations of individual patterns of facial expressions and psychological variations according to their levels. Moreover, because emotion is universal, the feature space describing this amount of change must be based on a scale common to each person. The fact that expressed facial expressions differ among people is associated with the fact that the shapes of faces differ among people. For example, the range within which expressions on the facial surface change according to an emotion differs among people. Akamatsu described that adaptive learning mechanisms are necessary to modify models according to the characteristics of expression in each person as a platform for a classification mechanism of emotion [2]. For organizing and visualizing facial expression spaces, this paper presents a novel framework to describe the dynamic diversity of facial expressions. The framework accommodates dynamic changes of facial expressions as topological changes of facial patterns driven by the facial muscles of expression. For that reason, the framework is suitable to describe the richness of facial expressions using Expression Levels (ELs).
The target facial expressions are happiness, anger, and sadness from the six basic facial expressions, represented quantitatively and visually as expression levels on a chart with an axis for each expression. From temporal facial expression images, we use Self-Organizing Maps (SOMs) [3], which have self-mapping characteristics, to extract facial expression categories according to expressions. We also use Adaptive Resonance Theory (ART), whose stability and plasticity enable classification that integrates categories adaptively under constant granularity. We infer relations between categories created by Fuzzy ART and ELs based on Russell's circumplex model. This paper presents Facial Expression Spatial Charts (FESCs) to represent the dynamic diversity of facial expressions as a dynamic and spatial chart. For the experiment, we created original facial expression datasets including images obtained during 7–20 weeks at one-week intervals from 10 subjects with three facial expressions: happiness, anger, and sadness. Experimental results show that our method can visualize and quantify facial expressions between subjects and temporal changes when creating FESCs for each subject. We use stress measurement sheets to assess temporal changes of stress in each subject for analyzing the psychological stress that affects the subjects' facial expressions. We analyze relations between FESCs and psychological stress values. Moreover, we estimate stress levels from FESCs. This paper is organized as follows. We review related work to clarify the position of this study in Section II. In Section III, we define two terms: ELs, which quantitatively represent individual facial expression spaces, and FESCs, which we propose in this paper.
In Section IV, we explain our proposed method: capture of facial expression images, preprocessing, classification of facial expression patterns with SOMs, integration of facial expression categories with Fuzzy ART, and creation of FESCs based on ELs. We explain our originally developed facial expression datasets in Section V. We show results of FESCs for 10 subjects in Section VI. In Section VII, we estimate stress levels using FESCs as an application of our method. Finally, we present conclusions and future work in Section VIII.

II. RELATED WORK

Approaches dealing with facial expressions have shifted from emphasizing spatial factors as an extension of pattern classification for recognition of the six basic facial expressions. Increasingly, temporal changes are analyzed to solve the problems posed by the dynamic diversity of facial expressions [4]. In this section, we review related work, particularly addressing the latter approaches. As a study addressing facial expression changes and their timing factors, Bassili [5] classified facial expressions using motion feature points captured by markers applied to a human face. However, the influence of motion components is not clear because the method cannot control stimulations for dynamic changes of facial expressions in the visual psychological experiment. In recent studies specifically examining the dynamic diversity of facial expressions, Ohta et al. [6] proposed a method based on a model of facial structure elements. They described facial expression patterns using parameter values of the construction of facial muscles matched with facial video images. They created a variable model of facial structural elements of the eyebrows, eyes, and mouth. Through comparison of a target facial pattern with pre-defined standard patterns, this model can extract expression levels to facilitate recognition of facial expressions.
Using expression levels, the intervals and temporal changes can be extracted to reveal patterns in the motion of the face that are related to facial expressions. Moreover, temporal changes of expressions, their duration, and termination processes can be detected from increasing and decreasing expression levels. However, standard facial expression patterns must be set for calculating expression levels. Therefore, this model can only describe levels up to the maximum of the standard facial expression patterns. Moreover, this model has the drawback that the feature points must be set manually on the first frame of facial images. Nishiyama et al. [7] proposed facial scores as a framework to interpret dynamic aspects of detailed facial expressions based on timing structures of movements of facial parts. They used modes, which are facial expressions segmented according to expression levels, as an index to describe fine differences in temporal factors of facial expression changes. After determining the static or dynamic status of movements of facial parts traced using Active Appearance Models (AAMs), facial expression images are segmented against indicators of movements from temporal subtraction norms of feature vectors. The modes are extracted as four patterns by repeated integration of intervals based on hierarchical clustering defined by segmented sectional distances. The capability of describing facial expression degrees is low, although the temporal resolution is high for representation of timing structures from modes. Moreover, various methods have been proposed to extract facial expression intensity [8]–[16]. Similarly, the spatial resolution is as high as four levels, although the temporal resolution is also high. The level of facial expressions differs according to mental state, context, situation, etc. We actively use this difference by taking facial expression images over a long term for each subject.
We are aiming at describing and quantifying individual facial expression spaces from temporal changes of long-term facial expression datasets. The degrees of expression levels differ among people and their feelings at any particular time. We consider creating facial expression spaces at constant granularity and adaptively, thereby avoiding quantification against a fixed maximum expression level. Moreover, we seek to obtain facial expression images under consideration of the effects of mental status.

III. AROUSAL LEVELS AND FACIAL EXPRESSION SPATIAL CHARTS

In this chapter, we explain ELs, which are used to quantify individual facial expression spaces, and FESCs, a framework to integrate ELs.

A. Expression Levels

As described in this paper, we introduce Expression Levels (ELs) for quantifying facial expression intensity based on the arrangement of facial expressions on Russell's circumplex model [17], as portrayed in Fig. 1(a). In this model, all emotions are constellated on a two-dimensional space: the pleasure dimension of pleasure–displeasure and the arousal dimension of arousal–sleepiness. The ELs include features of both the pleasure and arousal dimensions. We extract the dynamics of facial parts such as the eyes, eyebrows, and mouth as topological changes of facial expressions. Input images are categorized using extracted features of topological changes. The ELs are obtained as categories sorted according to their differences in intensity from expressions that are regarded as neutral facial expressions.

Figure 1. Correspondence relations between Russell's circumplex model and FESC: (a) Russell's circumplex model; (b) FESC.

B. Facial Expression Spatial Charts

Facial Expression Spatial Charts (FESCs) are a new framework to describe facial expression spaces and patterns of ELs constituting each facial expression. Facial expression spaces are spatial configurations of each facial expression that are used to analyze semantic and polar characteristics of various emotions portrayed by facial expressions [2]. They represent a correspondence relation between the physical parameters that present facial changes expressed by facial expressions and the psychological parameters that are recognized as emotions. Psychological parameters can be extracted from psychological experiments involving cognitive decisions related to emotions. Physical parameters must be described based on a certain standard of types and on the facial deformation that invariably arises from expressions on facial patterns that differ in each person, as represented by FACS. Our target facial expressions are happiness in the first quadrant, anger in the second quadrant, and sadness in the third quadrant of Russell's circumplex model. Fig. 1 shows the correspondence relation between Russell's circumplex model and an FESC. The value of each axis on the FESC shows the maximum value of the ELs. The FESC is created by connecting the maximum values of the ELs.

IV. PROPOSED METHOD

A. Whole architecture

Akamatsu described the adaptive learning mechanisms necessary for modification according to individual characteristic features of facial expressions because the processes of expression differ among individuals. For example, a subject expresses facial surface changes of a certain size; the expressions and their sizes differ among individuals because the shapes of faces differ among people. Therefore, in this study, our target is the intentional facial expressions of a person. We use SOMs for extracting topological changes of expressions and for normalization with compression in the direction of the temporal axis. In fact, SOMs perform unsupervised classification of input data into a mapping space that is defined preliminarily. After classification by SOMs, facial images are integrated using Fuzzy ART, which is an adaptive learning algorithm with stability and plasticity. Fuzzy ART performs unsupervised classification at a constant granularity that is controlled by the vigilance parameter. Therefore, using SOMs and Fuzzy ART, time-series datasets showing changes over a long term are classified against a certain standard. Moreover, as an application of FESCs, we estimate psychological stress levels using Support Vector Machines (SVMs) [18]. Fig. 2 depicts an overview of the procedures used for our proposed method. Detailed procedures of preprocessing, category classification with SOMs, category integration with Fuzzy ART, and stress estimation with SVMs are explained below.

Figure 2. Procedure of the proposed method from acquisition of facial images to generation of FESCs.

B. Preprocessing

For this study, we use view-based feature representation of holistic images, not feature-based representation such as AUs. Actually, feature-based representation is superior to view-based representation for detailed description of local feature changes related to expressions. In contrast, feature-based representation demands high calculation costs for the process of extracting and tracing feature points. Moreover, feature-based representation presents problems of precision and stability when numerous samples are processed automatically. For our method, we use view-based representation after converting images with Gabor wavelet filters, which show characteristics similar to those of the human primary visual cortex. Our processing target is to extract ELs from pattern changes of one facial expression relative to the neutral facial expression. Therefore, we consider that the changed parts are apparent in the feature space after Gabor wavelet conversion, without tracking feature points based on AUs.

The period during which images were obtained extended from several weeks to several months. We were unable to constrain external factors completely, e.g., lighting variations, although we took facial expression images in constant conditions. Therefore, in the first step, brightness values are preprocessed with normalization of the histogram of the target images. In the next step, features are extracted using Gabor wavelet filtering. In the field of computer vision and image processing, information representation with Gabor wavelets is a popular information-processing model based on human visual characteristics. The information representation of Gabor wavelets, which can emphasize an arbitrary feature with inner parameters, shows the same characteristic of response selectivity as a receptive field. At the final step, we applied downsampling for noise reduction and compression of the data size. In this method, we set the initial position of the template to contain facial parts for capturing facial images. We use template-matching methods to trace the region of interest of a face in real time. However, the trace results of the region of interest yield errors caused by body motion. These errors can be removed through the procedure of downsampling. The downsampling window that we set is 10 × 10 pixels. The dimension of the target images is compressed from 80 × 90 pixels to 8 × 9 pixels.

C. Classification of facial patterns with SOMs

For classification according to ELs, 200-frame images are normalized in a constant range. In this method, we used SOMs, which are unsupervised neural networks with competitive learning in neighborhood regions. Fig. 3(a) depicts the network architecture of a SOM. The network architecture of SOMs typically includes two layers: the input layer and the mapping layer. All units on the mapping layer are connected to all units of the input layer while maintaining weights between both layers. When a set of input data is propagated, the unit whose weights are most similar to the input data fires. Weights on the firing unit and its neighboring units are updated to be close to the input data; this is the learning of SOMs. Through the topology-preserving property, similarity among input data is reflected in the distance between firing units on the one-dimensional or two-dimensional unit array. As learning progresses, similar features are mapped to neighboring units; dissimilar features are mapped to separate units.

Figure 3. SOMs and Fuzzy ART: (a) SOM (input layer, mapping layer, and weights); (b) Fuzzy ART (fields F0, F1, and F2, with the vigilance parameter).

The SOM training algorithm is the following.
1) Let w_{i,j}(t) be the weight from the input unit i to the Kohonen unit (n, m) at time t. The weights are initialized with random numbers.
2) Let x_i(t) be the input data to the input unit i at time t. The Euclidean distance d_j between x_i(t) and w_{i,j}(t) is calculated as

d_j = sqrt( Σ_{i=1}^{I} (x_i(t) − w_{i,j}(t))² ).  (1)

3) The winning unit c is defined as that for which d_j becomes a minimum:

c = argmin_j (d_j).  (2)

4) Let N_c(t) be the units in the neighborhood of the unit c. The weight w_{i,j}(t) inside N_c(t) is updated using the Kohonen training algorithm as

w_{i,j}(t + 1) = w_{i,j}(t) + α(t)(x_i(t) − w_{i,j}(t)),  (3)

where α(t) is the training coefficient, which decreases with time.
5) Training is finished when the iterations reach the maximum number.
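The SOM procedure above (Eqs. (1)–(3)) can be sketched concretely. The following NumPy version is a minimal illustration; the map size, epoch count, learning-rate decay, and Gaussian neighborhood shape are our assumptions, not the authors' settings.

```python
import numpy as np

def train_som(data, map_rows=6, map_cols=6, epochs=20,
              alpha0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM sketch of Eqs. (1)-(3): Euclidean winner search and
    Kohonen updates, with a decaying learning rate alpha(t) and an
    assumed Gaussian neighborhood N_c(t)."""
    rng = np.random.default_rng(seed)
    n_units, dim = map_rows * map_cols, data.shape[1]
    weights = rng.random((n_units, dim))           # step 1: random init
    # grid coordinates of each mapping-layer unit (for the neighborhood)
    coords = np.array([(r, c) for r in range(map_rows)
                       for c in range(map_cols)], dtype=float)
    for t in range(epochs):
        alpha = alpha0 * (1.0 - t / epochs)        # decreasing alpha(t)
        sigma = max(sigma0 * (1.0 - t / epochs), 0.5)
        for x in data:
            d = np.linalg.norm(weights - x, axis=1)          # Eq. (1)
            c = int(np.argmin(d))                            # Eq. (2)
            h = np.exp(-np.sum((coords - coords[c]) ** 2, axis=1)
                       / (2.0 * sigma ** 2))
            weights += (alpha * h)[:, None] * (x - weights)  # Eq. (3)
    return weights

# toy usage: two well-separated clusters of 72-dimensional vectors,
# mimicking flattened 8 x 9 downsampled facial images
X = np.vstack([np.zeros((10, 72)), np.ones((10, 72))])
W = train_som(X)
winners = np.argmin(np.linalg.norm(W[None, :, :] - X[:, None, :], axis=2), axis=1)
```

After training, each frame's winning unit serves as its category in the fixed mapping space, which is the classification that Fuzzy ART subsequently integrates under fixed granularity.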
D. Integration of facial patterns with Fuzzy ART

The input data are classified into the fixed number of units of the mapping layer. Therefore, classification results are relative. In contrast, classification under fixed granularity is required for long-term datasets in each subject. In our method, facial expression categories are integrated by applying Fuzzy ART to the learned weights of the SOMs. ART, which was proposed by Grossberg et al., is a theoretical model of unsupervised and self-organizing neural networks that forms categories adaptively in real time while maintaining stability and plasticity. Actually, ART has many variations [19]. We use Fuzzy ART [20], into which analog values can be input. Fig. 3(b) depicts the network architecture of Fuzzy ART, which consists of three fields: Field 0 (F0) for receiving input data, Field 1 (F1) for feature representation, and Field 2 (F2) for category representation.

Figure 4. Mean images in each EL of the three facial expressions: (a) happiness; (b) anger; (c) sadness.

The Fuzzy ART algorithm is the following. The quantities of neurons of F1 and F2 are, respectively, m and n. Actually, w_i represents the weights between each F2 neuron i and the corresponding F1 neurons. All w_i are initialized as one. Fuzzy ART dynamics are determined using a choice parameter a (a > 0), a learning rate parameter r (0 ≤ r ≤ 1), and a vigilance parameter p (0 ≤ p ≤ 1). For each input x_i to each unit i on F2, the choice function T_i is defined as

T_i = |x_i ∧ w_i| / (a + |w_i|).  (4)

In addition, i_c, which gives the maximum value of T_i, is selected as the winning category. The category with the smallest index is chosen if more than one unit is maximal. When T_c is selected as a category, the c-th neuron on F2 is set to 1; other neurons are set to zero. Resonance or resetting is judged by the following equation if the selected category matches x_i.
For the activation value x_i ∧ w_c propagated from the signal of the c-th unit on F2 to F1, if the match function satisfies

|x_i ∧ w_c| / |x_i| ≥ p,  (5)

then resonance occurs at x_i and c. The chosen category is decided and the weights are updated as

w_c' = r(x_i ∧ w_c) + (1 − r)w_c.  (6)

Otherwise, c is reset if resonance does not occur. The unit containing the next maximum value of T_i is chosen again, and resonance or reset is judged again. Fuzzy ART creates a new unit on F2 if all units are reset.

E. Allocation of ELs to FESCs

The facial expression categories classified by SOMs and integrated by Fuzzy ART are sorted in the order of ELs from the neutral facial expression category. For this dataset, the number of images of the neutral facial expression is the maximum; the category containing the maximum number of images is therefore selected as the neutral facial expression category. The ELs are sorted by correlation values in each category. Fig. 4 presents an example of categories corresponding to ELs. The numbers of ELs of happiness, anger, and sadness are, respectively, eight steps, nine steps, and six steps. This number is the same as the number of categories integrated using Fuzzy ART. The maximum values of ELs are depicted on an FESC. The center of an FESC is EL = 0, which represents a neutral facial expression. With increasing ELs, facial expression categories are assigned toward the outside of the triangle.

F. Stress Estimation with SVMs

In this study, we specifically examine the effects of psychological stress on facial expressions. We measured psychological stress values using a stress checking sheet while also taking facial expression images. As an application of our study, we estimate stress levels using FESCs. We used SVMs [18], which have high recognition capability, mapping input data to a high-dimensional space using kernel tricks. Typical choices for the kernel function K include the polynomial kernel, the Radial Basis Function (RBF), and the sigmoid kernel.
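For concreteness, the three kernels just mentioned can be written as follows; the parameter names (`degree`, `c`, `gamma`, `lam`) are generic illustrations rather than the paper's settings, and the RBF form matches Eq. (7) with λ as the variance parameter.

```python
import numpy as np

def polynomial_kernel(x, y, degree=3, c=1.0):
    """K(x, y) = (x . y + c)^degree  (illustrative parameters)"""
    return (np.dot(x, y) + c) ** degree

def rbf_kernel(x, y, lam=1.0):
    """K(x, y) = exp(-||x - y||^2 / lam), the form of Eq. (7)"""
    return np.exp(-np.sum((x - y) ** 2) / lam)

def sigmoid_kernel(x, y, gamma=0.01, c=0.0):
    """K(x, y) = tanh(gamma * x . y + c)  (illustrative parameters)"""
    return np.tanh(gamma * np.dot(x, y) + c)

# toy usage on two orthogonal vectors
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
k = rbf_kernel(x, y, lam=2.0)    # exp(-2/2) = exp(-1)
```

Only the RBF is used in this study; smaller λ makes the kernel more local, which is why the evaluation sweeps λ over a range.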
For this study, we used the RBF, defined as

K(x, x_i) = exp(−‖x − x_i‖² / λ),  (7)

where λ is the variance of the RBF. The property of the kernel differs with the setting of λ. Therefore, we evaluate our method using results obtained by changing λ within a certain range.

V. DATASETS

For this study, we created an original dataset dealing with long-term facial expression changes. Additionally, for analyzing the psychological effects of temporal changes of ELs, we measured psychological stress levels using a stress sheet after taking facial expression images each time.

A. Facial expression dataset

Humans show facial expressions of two types: spontaneous facial expressions and intentional facial expressions. Taking a steady and long-term dataset without regard to a camera and a situation is a challenging task, although spontaneous facial expressions present the advantage of corresponding directly to affection or emotion. Moreover, the cause-and-effect relation of facial expressions arising from emotions is uncertain. In contrast, intentional facial expressions are used as a communication method to convey something positively to another person, especially in social communication. We therefore set a target of creating an original intentional facial expression dataset, obtained over a long term from selected subjects. Moreover, intentional facial expression datasets are suitable for keeping a sufficient number of subjects as a horizontal dataset. Open datasets of facial expression images have been released by some universities and research institutes and are used generally in many conventional studies for performance comparisons of facial expression recognition or automatic analysis of facial expressions [1]. However, the specifications vary in each dataset, and among datasets.
As static facial images, the dataset presented by Ekman and Friesen is a popular dataset comprising various collected facial expressions used for visual stimulation in psychological examinations of facial expression cognition. As dynamic facial images, the Cohn–Kanade dataset and the Ekman–Hager dataset are widely used, especially in experimental applications [21]. In recent years, the MMI Facial Expression Database presented by Pantic et al. [22] has become a widely used open dataset containing both static and dynamic images. These dynamic datasets contain a sufficient number of subjects as a horizontal dataset. However, images are taken only once for each person. No dataset exists in which the same subject has been traced over a long term. According to the mental status, circumstances, and context of a subject, facial expression patterns differ each week, even for the same subject. We set a long term for obtaining facial expression images, thereby creating individual models of FESCs and using them to estimate stress levels in each subject. Existing methods created an integrated model for classification and recognition of facial expressions, although expression patterns are affected by individual differences. We create individual models for each subject to extract ELs for creating FESCs and to estimate stress levels using each FESC. For development of individual models, we create long-term facial expression datasets compiled with information gathered during periods as long as 20 weeks.

B. Target facial expressions

In our daily life, we frequently observe mixed facial expressions rather than a single facial expression. We believe that the estimation accuracy can be improved if mixed facial expression datasets are used. Considering the load on subjects, it is a challenging task to collect numerous samples of mixed facial expressions. Moreover, the mixture ratio of facial expressions remains unclear.
Therefore, the target of this study is particular facial expressions of three types. To reduce the load on subjects when taking long-term facial expression images, we selected target facial expressions from the six basic facial expressions described by Ekman. As a cultural factor of expression, Ekman reported that Japanese people express a smile even at times when they feel disgust [23]. We considered that disgust is therefore a difficult facial expression for Japanese people. Regarding fear, we received opinions that it is extremely difficult to pose fear because it arises only from rare situations that are not usually encountered in daily life. For surprise, many subjects, especially men, feel embarrassed, although they do not feel difficulty in expressing surprise. Moreover, mixed facial expressions with happiness are apparent. Considering these restrictions, opinions from subjects, and the assignment of facial expressions on Russell's circumplex model, we selected happiness, anger, and sadness as target facial expressions.

C. Acquisition of facial expression images

We took images of three facial expressions with 10 subjects over a long term. The terms of taking images differed among subjects, but images were taken during 7–20 weeks at one-week intervals. The subjects were five female (Subjects A, B, C, and D were 19; Subject E was 21) and five male (Subjects F and J were 19; Subjects G, H, and I were 22) university students. We began to take facial expression images when a subject became accustomed to the experimental environment after some trials. Considering generality and usability, we used a USB camera (Qcam, Logicool; Logitec Corp.). We set the environment to simulate a normal indoor condition (lighting by fluorescent lamps). We took frontal facial images to include the region containing facial components such as the eyebrows, eyes, nose, and mouth. We asked subjects in advance to restrain head movement as much as possible. The images were fit to a constant range including the facial region. We used a method based on Haar-like features and boosting for tracking the face region to adjust the centers of images [24]. Fig. 5 depicts captured images. We set the region of interest to 80 × 90 pixels, including the eyebrows, eyes, nose, mouth, cheeks, and jaw, which all contribute to the impression of a whole face as facial feature components.

Figure 5. Set of target images and regions of interest: (a) acquired images (200 frames); (b) ROI (80 × 90 pixels).

One set of datasets consisted of facial expression image sequences with a neutral facial expression and each facial expression to be indicated. We instructed subjects to express an emotion 3–4 times during the image-taking time of 20 s. One set of data consisted of 200 frames at a sampling rate of 10 frames per second. As conditions of this experiment, we explained to all subjects in advance the following preferred procedures:
• show posed facial expressions to the camera;
• avoid movement of the head as much as possible;
• repeat expressions three times during the acquisition time of 20 s;
• make a maximum facial expression;
• return to the neutral facial expression in each interval; and
• relax.

Subjects were not trained using FACS. We agreed with participants that the facial expression images shall not be used except for this research. We did not explain details of this study or the datasets used for stress level estimation.

D. Stress measurements

For this study, we used the stress measurement sheet known as the Stress Response Scale 18 (SRS-18) by Suzuki et al. [25]. The SRS-18 comprises question sheets that can easily measure, in a short time, responses related to the psychological stress that we often encounter in daily life.
Specific psychological stress responses caused by stressors include gloom, anxiety, and anger (emotional responses), lethargy and difficulty concentrating (cognitive responses), and decreased work efficiency (behavioral responses). The sheet measures stress responses along three factors: dysphoria/anxiety, displeasure/anger, and lassitude. The SRS-18 has 18 questions, each answered on a four-point scale: strongly no, no, yes, and strongly yes, scored respectively from zero to three points, so the total ranges over 0-54 points. A high total score indicates a high level of psychological stress. Moreover, the total is classified into four grades: Level 1 (weak), Level 2 (normal), Level 3 (slightly high), and Level 4 (high). Subjects completed this sheet before the facial expression images were taken, to reduce any effect of the stress check results on the images. Because the SRS-18 is based on subjective assessment, we consider that, depending on the subject, subjective factors affect estimation accuracy.

In this study, we also used a salivary amylase monitor produced by Nipro Corp. to measure stress values inferred from activated salivary amylase. Kashino et al. [26] describe the usability of stress measurement and evaluation using a salivary amylase monitor. For this measurement, subjects must refrain from eating for two hours beforehand; during the measurement phase, subjects hold a chip in the mouth for about 30 s to collect saliva. The load on subjects in this study is therefore very high. Furthermore, the salivary amylase monitor responds sensitively to momentary stress and shows wide variation in measurement results. In contrast, the SRS-18, which consists of only 18 question items, can be completed in a short time and can be used to assess various bodily responses. We consider the SRS-18 a suitable check sheet for our experiment.
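The SRS-18 scoring described above (18 items, four answer choices worth 0-3 points each, totals in 0-54, mapped to four stress levels) can be sketched as follows. The answer-to-point mapping and the 0-54 range are from the text; the numeric cut-offs between the four levels are not given in the paper, so the `thresholds` below are illustrative placeholders only.

```python
# Sketch of SRS-18 scoring: 18 items, each answered on a four-point
# scale; totals range over 0-54 and map to one of four stress levels.
ANSWER_POINTS = {"strongly no": 0, "no": 1, "yes": 2, "strongly yes": 3}

def srs18_total(answers):
    """Sum the points of 18 answers; the total ranges over 0-54."""
    if len(answers) != 18:
        raise ValueError("SRS-18 requires exactly 18 answers")
    return sum(ANSWER_POINTS[a] for a in answers)

def stress_level(total, thresholds=(8, 20, 32)):
    """Map a total score to Level 1-4 (weak / normal / slightly high / high).
    NOTE: these thresholds are illustrative placeholders, not the
    published SRS-18 cut-offs."""
    for level, upper in enumerate(thresholds, start=1):
        if total <= upper:
            return level
    return 4

total = srs18_total(["no"] * 9 + ["yes"] * 9)  # 9*1 + 9*2 = 27
```

A higher total maps to a higher level, matching the text's statement that a high total score indicates a high level of psychological stress.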
VI. EXPERIMENTAL RESULTS

In this section, we present the results of each step of creating an FESC, applied to one subject. Subsequently, we present the results of FESCs for all 10 subjects.

A. Results of an FESC (Subject A)

For correspondence to ELs, we extracted the categories classified with SOMs and integrated with Fuzzy ART. The category that contains the maximum number of images is selected as the neutral facial expression category. For example, in Fig. ??, the sixth category, which corresponds to 60 images, is the base category. Each category is put in correspondence with an EL by calculating the correlations among categories and sorting toward small correlation values.

Figure 6. Transition of ELs in each facial expression (Subject A at the ninth week): (a) happiness; (b) anger; (c) sadness. Each panel plots the Expression Level (1-10) over the 200 frames.

Figure 7. FESC (Subject A): (a) 1st week; (b) 2nd week; (c) 3rd week; (d) 4th week; (e) 9th week; (f) 20th week.

We evaluated the temporal changes of ELs to verify the correspondence between ELs and facial expressions. Fig. 6 presents the temporal changes of the ELs of happiness, anger, and sadness. The horizontal axis depicts the 200 frames of each dataset; the vertical axis depicts the ELs. Dashed vertical lines mark the start and terminal positions of each expression. The subject showed each expression three or four times during one dataset.
In this dataset of Subject A at the ninth week, happiness is expressed three times; anger and sadness are expressed four times. The start and terminal timings of each expression are represented as changes of ELs. The ELs change according to the expressions, although the result contains slight variation. Fig. 7 depicts some examples of FESCs. The FESCs show that the facial expression patterns of the same subject changed from week to week. We consider that these changes are attributable to psychological effects. In the next section, we analyze these results against stress as measured using the SRS-18.

B. Results of FESCs (10 subjects)

We created FESCs for all 10 subjects. The image-acquisition periods differed among subjects, ranging from 7 to 20 weeks. Fig. 8 portrays the average FESCs of the 10 subjects. For the facial expression of happiness, among the average ELs of all subjects, the maximum value is 9.1, in Subject J, and the minimum value is 5.6, in Subject G. For anger, Subjects E and F respectively show the maximum and minimum ELs; for sadness, Subjects C and J do. The triangle of the FESC of Subject J is smaller than those of the other subjects. Regarding sex differences, Kring et al. [27] report that women are richer than men in emotional expression through facial expressions. In our experience, most female subjects were not averse to showing intense facial expressions in front of a camera, whereas we inferred that male subjects were shy or not good at making facial expressions. This trend appears in the FESCs as the size of the triangles.

VII. ESTIMATION OF STRESS LEVELS

Although spontaneous and intentional facial expressions are triggered by emotional changes and intentional social restrictions, such as when one makes a fake smile, the degree of an actual facial expression is modified by various psychological effects, the situation, the atmosphere, and so on.
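As an aside on the "size of triangles" compared above: an FESC is a triangle whose vertices lie at the three EL values on axes spaced 120 degrees apart, so its area offers one simple scalar summary of expression richness. The formula below is our own illustration, not a metric defined in the paper, and the sample EL values are arbitrary.

```python
import math

def fesc_area(happiness, anger, sadness):
    """Area of the FESC triangle whose vertices lie at the three EL
    values on radar axes 120 degrees apart (illustrative metric only)."""
    r = [happiness, anger, sadness]
    s = math.sin(2 * math.pi / 3)  # angle between adjacent axes: 120 degrees
    # Sum the three sub-triangles spanned by adjacent axis pairs.
    return 0.5 * sum(r[i] * r[(i + 1) % 3] * s for i in range(3))

big = fesc_area(9.1, 8.0, 7.5)    # rich expression pattern -> large triangle
small = fesc_area(5.6, 4.0, 3.0)  # restrained pattern -> small triangle
```

With all three ELs equal to r, the area reduces to (3*sqrt(3)/4)*r^2, the area of an equilateral triangle whose vertices sit at radius r.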
In this study, we acquired facial expression images continually over a long period under identical conditions. The FESCs show various distributions in each week; accordingly, the ELs, which show the degrees of expression in our method, differ each week. In this experiment, we specifically examine the psychological relation between expressions and stress in order to estimate stress levels from FESCs.

Figure 8. Average FESCs of the 10 subjects (Subjects A-E, female; Subjects F-J, male): (a) Subject A (20 weeks); (b) Subject B (12 weeks); (c) Subject C (8 weeks); (d) Subject D (7 weeks); (e) Subject E (7 weeks); (f) Subject F (11 weeks); (g) Subject G (13 weeks); (h) Subject H (13 weeks); (i) Subject I (10 weeks); (j) Subject J (7 weeks).

We used SVMs [18], which have high recognition capability through mapping input data to a high-dimensional space using kernel tricks. We evaluated estimation rates using Leave-One-Out Cross-Validation (LOOCV). The estimation targets are stress evaluation values of four steps: Level 1 (weak), Level 2 (normal), Level 3 (slightly high), and Level 4 (high). Herein, the distribution of the target datasets is 48.6% Level 2, 33.6% Level 1, 13.1% Level 3, and 4.7% Level 4. We used Radial Basis Functions (RBFs) as the kernel function for the SVMs. We set the gamma parameter, which controls the spread of the Gaussian functions, to the inverse of the number of input vectors. Fig. 9 portrays the stress estimation results of the 10 subjects. The mean estimation rate is 68.6%. The highest estimation rate is 90.9%, for Subject B.
Subsequently, the estimation rates of Subjects G and F are, respectively, 84.6% and 81.8%. The estimation rate of Subject A, whose facial expression images were taken the most times, is 80.0%. In contrast, the estimation rates of Subjects C and I are both 50.0%. The lowest estimation rate is 42.9%, for Subject J. The data lengths of Subjects J and C are, respectively, only seven and eight weeks; we consider it a difficult problem for SVMs to create a four-category classifier from such few data. Therefore, we selected the subjects whose datasets span at least 10 weeks: the mean estimation rate of Subjects A, B, F, G, H, and I is 77.4%. We consider that the estimation performance would improve if long-term datasets of more than 10 weeks were obtained by continuing to collect longitudinal datasets.

Figure 9. Stress estimation results with SVM (estimation accuracy [%] per subject).

Using our method, we achieved efficient estimation of stress levels, although we used SVMs under a disproportionate training data distribution. We evaluated all datasets using LOOCV. The number of datasets differs among stress levels: Level 2 is the largest, at about 50%, and this rate reaches about 80% when the Level 1 datasets are included. Moreover, five subject-and-stress-level patterns, corresponding to four subjects, consisted of only one set of data; six patterns, from five subjects, produced only two samples. We used all of these few-sample datasets without exception, although it is difficult to learn and estimate from such samples with conventional generalization capability. Collecting these data evenly is a challenging task because stress distributions vary among individuals. We consider that estimation performance would improve by extending the image-acquisition period, enabling us to address seasonal transformations.
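The evaluation protocol of this section (an RBF-kernel SVM assessed with LOOCV) can be sketched with scikit-learn as below. The feature vectors and stress-level labels are synthetic stand-ins, since the original FESC dataset is not public, and the paper's phrase "inverse of the number of input vectors" for gamma is read here as the inverse of the input dimension (an assumption); if it instead means the inverse of the sample count, adjust `gamma` accordingly.

```python
# Sketch of the evaluation protocol: an RBF-kernel SVM with
# leave-one-out cross-validation (LOOCV). The FESC feature vectors
# (three ELs) and stress-level labels below are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 3))              # stand-in ELs: happiness, anger, sadness
y = (X.sum(axis=1) // 8).astype(int).clip(1, 4)   # stand-in stress levels 1-4

# gamma = 1 / input dimension (assumed reading of the paper's setting;
# scikit-learn calls this gamma="auto").
clf = SVC(kernel="rbf", gamma=1.0 / X.shape[1])

scores = cross_val_score(clf, X, y, cv=LeaveOneOut())  # one 0/1 score per sample
mean_rate = scores.mean()                              # overall estimation rate
```

LOOCV trains on all samples but one and tests on the held-out sample, so with 40 samples the classifier is fit 40 times and `mean_rate` is the fraction of correctly estimated samples.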
VIII. CONCLUSION

This paper presented FESCs as a framework for describing individual facial expression spaces, based on the view that the facial expressions created by emotion form an individual space for each person. The ELs are created from categories that are classified by SOMs and integrated with Fuzzy ART. The FESCs are created with axes corresponding to the ELs of three facial expressions (happiness, anger, and sadness), based on Russell's circumplex model. We created an original facial expression dataset of 10 subjects (five female and five male) over at least seven weeks. Using this dataset, our method can express individual facial expression spaces as FESCs. Moreover, we used the SRS-18 to measure the stress level of each subject before taking images, and we analyzed the effects of psychological stress using FESCs. The results show that happiness and sadness are affected by stress in most subjects.

Future studies must evaluate intentional and spontaneous facial expressions for discrimination using the symmetry properties of the horizontal direction, so as to represent the facial expression rhythms created by individual patterns of temporal changes of ELs. Moreover, we will seek to increase the number of subjects for horizontal studies between subjects, and to capture long-term datasets for vertical studies of each subject, to analyze and elucidate the relations between facial expressions and stress.

ACKNOWLEDGMENT

The authors would like to thank fellow engineers at SmartDesign Co. Ltd. for providing much appreciated advice and for developing a system for taking facial images for these experiments. We also thank the 10 students at our university who participated as subjects by letting us take facial images over such a long period.

REFERENCES

[1] M. Pantic and L.J.M. Rothkrantz, "Automatic Analysis of Facial Expressions: The State of the Art," IEEE Trans. on Pattern Anal. and Mach. Intell., vol. 22, no. 12, pp. 1424-1445, Dec. 2000.
[2] S.
Akamatsu, "Recognition of Facial Expressions by Human and Computer [I]: Facial Expressions in Communications and Their Automatic Analysis by Computer," The Journal of the Institute of Electronics, Information, and Communication Engineers, vol. 85, no. 9, pp. 680-685, Sep. 2002.
[3] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, no. 1, pp. 59-69, 1982.
[4] M. Pantic and I. Patras, "Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments From Face Profile Image Sequences," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 2, pp. 433-449, April 2006.
[5] J.N. Bassili, "Facial motion in the perception of faces and of emotional expression," Journal of Experimental Psychology: Human Perception and Performance, vol. 4, no. 3, pp. 373-379, 1978.
[6] H. Ohta, H. Saji, and H. Nakatani, "Recognition of Facial Expressions Using Muscle-Based Feature Models," Trans. of the Institute of Electronics, Information and Communication Engineers, D-II, vol. J82-D-II, no. 7, pp. 1129-1139, July 1999.
[7] M. Nishiyama, H. Kawashima, T. Hirayama, and T. Matsuyama, "Facial Expression Representation based on Timing Structures in Faces," IEEE International Workshop on Analysis and Modeling of Faces and Gestures, pp. 140-154, 2005.
[8] M. Oda and K. Isono, "Effects of time function and expression speed on the intensity and realism of facial expressions," Proc. IEEE International Conference on Systems, Man and Cybernetics, pp. 1103-1109, 2008.
[9] P. Yang, Q. Liu, and D.N. Metaxas, "RankBoost with l1 regularization for facial expression recognition and intensity estimation," Proc. IEEE International Conference on Computer Vision, pp. 1018-1025, 2009.
[10] K.K. Lee and Y. Xu, "Real-time estimation of facial expression intensity," Proc. IEEE International Conference on Robotics and Automation, vol. 2, pp. 2567-2572, 2003.
[11] M.A. Amin and H.
Yan, "Expression intensity measurement from facial images by self organizing maps," Proc. International Conference on Machine Learning and Cybernetics, vol. 6, pp. 3490-3496, 2008.
[12] S. Chin and K.Y. Kim, "Emotional Intensity-based Facial Expression Cloning for Low Polygonal Applications," IEEE Transactions on Systems, Man, and Cybernetics (Part C), vol. 39, no. 3, pp. 315-330, 2009.
[13] J.J. Lien, T. Kanade, J.F. Cohn, and C.C. Li, "Subtly different facial expression recognition and expression intensity estimation," Proc. IEEE Computer Vision and Pattern Recognition, pp. 853-859, 1998.
[14] J.R. Delannoy and J. McDonald, "Automatic estimation of the dynamics of facial expression using a three-level model of intensity," Proc. IEEE International Conference on Automatic Face & Gesture Recognition, pp. 1-6, 2008.
[15] L. Yin, X. Wei, P. Longo, and A. Bhuvanesh, "Analyzing Facial Expressions Using Intensity-Variant 3D Data For Human Computer Interaction," Proc. International Conference on Pattern Recognition, vol. 1, pp. 1248-1251, 2006.
[16] J. Reilly, J. Ghent, and J. McDonald, "Non-Linear Approaches for the Classification of Facial Expressions at Varying Degrees of Intensity," Proc. Machine Vision and Image Processing Conference, pp. 125-132, 2007.
[17] J.A. Russell and M. Bullock, "Multidimensional Scaling of Emotional Facial Expressions: Similarity From Preschoolers to Adults," Journal of Personality and Social Psychology, vol. 48, pp. 1290-1298, 1985.
[18] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[19] G.A. Carpenter and S. Grossberg, Pattern Recognition by Self-Organizing Neural Networks, The MIT Press, 1991.
[20] G.A. Carpenter, S. Grossberg, and D.B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Networks, vol. 4, pp. 759-771, 1991.
[21] T. Kanade, J.F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," Proc.
Face and Gesture, pp. 46-53, 2000.
[22] M. Pantic, M.F. Valstar, R. Rademaker, and L. Maat, "Web-based Database for Facial Expression Analysis," Proc. IEEE Int'l. Conf. Multimedia and Expo, July 2005.
[23] P. Ekman, Emotions Revealed: Understanding Faces and Feelings, Orion Publishing, 2004.
[24] P. Viola and M.J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Proc. Computer Vision and Pattern Recognition, pp. 511-518, 2001.
[25] S. Suzuki, Stress Response Scale-18, Kokoronet, Jul. 2007.
[26] A. Kashino, H. Kashiwa, and K. Suemori, "Stress evaluation of clinical laboratory technologists by the salivary amylase monitor on working time," The Medical Journal of Kawasaki Hospital, vol. 5, pp. 25-27, 2010.
[27] A.M. Kring and A.H. Gordon, "Sex Differences in Emotion: Expression, Experience, and Physiology," Journal of Personality and Social Psychology, vol. 74, pp. 686-703, 1998.

Hirokazu Madokoro received the ME degree in information engineering from Akita University in 2000 and joined Matsushita Systems Engineering Corporation. He moved to the Akita Prefectural Industrial Technology Center in 2002 and to the Akita Research Institute of Advanced Technology in 2006. He received the PhD degree from the Nara Institute of Science and Technology in 2010. He is currently an assistant professor in the Department of Machine Intelligence and Systems Engineering, Akita Prefectural University. His research interests include machine learning and robot vision. He is a member of the Robotics Society of Japan, the Japan Society for Welfare Engineering, the Institute of Electronics, Information and Communication Engineers, and the IEEE.

Kazuhito Sato received the ME degree in electrical engineering from Akita University in 1975 and joined Hitachi Engineering Corporation. He moved to the Akita Prefectural Industrial Technology Center in 1979 and to the Akita Research Institute of Advanced Technology in 2005.
He received the PhD degree from Akita University in 1997. He is currently an associate professor in the Department of Machine Intelligence and Systems Engineering, Akita Prefectural University. He is engaged in the development of equipment for noninvasive inspection of electronic parts, various kinds of expert systems, and MRI brain image diagnostic algorithms. His current research interests include biometrics, medical image processing, facial expression analysis, and computer vision. He is a member of the Medical Information Society, the Medical Imaging Technology Society, the Japan Society for Welfare Engineering, the Institute of Electronics, Information and Communication Engineers, and the IEEE.

Call for Papers and Special Issue Proposals

Aims and Scope. Journal of Multimedia (JMM, ISSN 1796-2048) is a scholarly peer-reviewed international scientific journal published bimonthly, focusing on theories, methods, algorithms, and applications in multimedia. It provides a high-profile, leading-edge forum for academic researchers, industrial professionals, engineers, consultants, managers, educators, and policy makers working in the field to contribute and disseminate innovative new work on multimedia. The Journal of Multimedia covers the breadth of research in multimedia technology and applications. JMM invites original, previously unpublished research, survey, and tutorial papers, plus case studies and short research notes, on both applied and theoretical aspects of multimedia.
These areas include, but are not limited to, the following topics:

- Multimedia Signal Processing
- Multimedia Content Understanding
- Multimedia Interface and Interaction
- Multimedia Databases and File Systems
- Multimedia Communication and Networking
- Multimedia Systems and Devices
- Multimedia Applications

JMM EDICS (Editors Information Classification Scheme) can be found at http://www.academypublisher.com/jmm/jmmedics.html.

Special Issue Guidelines

Special issues feature specifically aimed and targeted topics of interest contributed by authors responding to a particular Call for Papers or by invitation, edited by guest editor(s). We encourage you to submit proposals for creating special issues in areas that are of interest to the Journal. Preference will be given to proposals that cover some unique aspect of the technology and ones that include subjects that are timely and useful to the readers of the Journal. A Special Issue is typically made up of 10 to 15 papers, with each paper 8 to 12 pages in length. The following information should be included as part of the proposal:

- Proposed title for the Special Issue
- Description of the topic area to be focused upon and justification
- Review process for the selection and rejection of papers
- Name, contact, position, affiliation, and biography of the Guest Editor(s)
- List of potential reviewers
- Potential authors for the issue
- Tentative timetable for the call for papers and reviews

If a proposal is accepted, the guest editor will be responsible for:

- Preparing the "Call for Papers" to be included on the Journal's Web site.
- Distributing the Call for Papers broadly to various mailing lists and sites.
- Getting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Instructions for Authors.
- Providing us the completed and approved final versions of the papers, formatted in the Journal's style, together with all authors' contact information.
- Writing a one- or two-page introductory editorial to be published in the Special Issue.

Special Issue for a Conference/Workshop

A special issue for a conference/workshop is usually released in association with the committee members of the conference/workshop, such as the general chairs and/or program chairs, who are appointed as the Guest Editors of the Special Issue. A Special Issue for a conference/workshop is typically made up of 10 to 15 papers, with each paper 8 to 12 pages in length. Guest Editors are involved in the following steps in guest-editing a Special Issue based on a conference/workshop:

- Selecting a title for the Special Issue, e.g. "Special Issue: Selected Best Papers of XYZ Conference".
- Sending us a formal "Letter of Intent" for the Special Issue.
- Creating a "Call for Papers" for the Special Issue, posting it on the conference web site, and publicizing it to the conference attendees. Information about the Journal and Academy Publisher can be included in the Call for Papers.
- Establishing criteria for paper selection/rejection. Papers can be nominated based on multiple criteria, e.g. rank in the review process plus the evaluation from the Session Chairs and the feedback from the conference attendees.
- Selecting and inviting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Author Instructions. Usually, the Proceedings manuscripts should be expanded and enhanced.
- Providing us the completed and approved final versions of the papers, formatted in the Journal's style, together with all authors' contact information.
- Writing a one- or two-page introductory editorial to be published in the Special Issue.

More information is available on the web site at http://www.academypublisher.com/jmm/.