Journal of Multimedia
ISSN 1796-2048
Volume 7, Number 4, August 2012
Contents
Special Issue: Multimedia Contents Security in Social Networks Applications
Guest Editors: Zhiyong Zhang and Muthucumaru Maheswaran
Guest Editorial
Zhiyong Zhang and Muthucumaru Maheswaran
277
SPECIAL ISSUE PAPERS
DRTEMBB: Dynamic Recommendation Trust Evaluation Model Based on Bidding
Gang Wang and Xiao-lin Gui
279
Block-Based Parallel Intra Prediction Scheme for HEVC
Jie Jiang, Baolong Guo, Wei Mo, and Kefeng Fan
289
Optimized LSB Matching Steganography Based on Fisher Information
Yi-feng Sun, Dan-mei Niu, Guang-ming Tang, and Zhan-zhan Gao
295
A Novel Robust Zero-Watermarking Scheme Based on Discrete Wavelet Transform
Yu Yang, Min Lei, Huaqun Liu, Yajian Zhou, and Qun Luo
303
Stego Key Estimation in LSB Steganography
Jing Liu and Guangming Tang
309
REGULAR PAPERS
Facial Expression Spatial Charts for Describing Dynamic Diversity of Facial Expressions
H. Madokoro and K. Sato
314
Special Issue on Multimedia Contents Security in Social Networks Applications
Guest Editorial
Recent years have witnessed rapid development of information and communication technologies, and the Next-Generation Internet and 3G/4G wireless mobile networks are moving toward large-scale deployment and application. The emerging versatile network services, especially social network services, tools and applications, enable much more convenient access to multimedia contents anytime, anywhere, for anyone. Unfortunately, in such environments, copyright infringement behaviors, such as illicit copying, malicious distribution, unauthorized usage and free sharing of copyright-protected digital contents, will also become much more common. Research frontiers on multimedia contents security in social networks applications are in progress, including enhanced security mechanisms, methods and algorithms, trust assessment and risk management in social network applications, as well as social factors and soft computing in social media distribution.
This special issue attempts to bring together researchers, content industry engineers and administrators who resort to state-of-the-art technologies and ideas to protect valuable multimedia contents and services against attacks and IP piracy in the emerging social networks. It includes 5 selected papers, as follows:
The first paper focuses on an interesting recommendation trust issue in large-scale distributed computing, including social networks. Dr. Gang Wang proposes a dynamic recommendation trust evaluation model based on bidding in the E-Commerce environment. By greatly increasing the "criminal cost" of malicious recommendation nodes, and with a recommendation optimization algorithm based on Markov chains, the model ensures that the recommendation service is truthful and more objective. This stimulates nodes to recommend objectively and leads them to give up malicious recommendation and cooperative cheating on their own initiative, so that malicious recommendation and cooperative cheating are restrained effectively. A series of simulation experiments also shows that the model restrains malicious recommendation well.
The continuous emergence of multimedia video coding standards is addressed in the second paper. The existing challenges are twofold: one is to find efficient coding algorithms that deliver high performance, and the other is to speed up the coding process. With recent advances in VLSI (Very Large Scale Integration) semiconductor technology contributing to the emerging digital multimedia world, this paper investigates an efficient parallel architecture for the emerging High Efficiency Video Coding (HEVC) standard to speed up the intra coding process without ignoring any prediction modes. Dr. Jie Jiang's experimental implementations of the proposed algorithm are demonstrated using a set of widely used, freely available video test sequences. The results show that the proposed algorithm can achieve satisfying intra parallelism without any significant performance loss.
In the third paper, LSB matching steganography technologies are explored and discussed in detail. The authors propose a novel optimized LSB matching steganography scheme based on Fisher information. The embedding algorithm is designed by solving an optimization problem in which Fisher information is the objective function and the embedding transition probabilities are the variables to be optimized. By modeling the groups of elements in a cover image as a Gaussian mixture model, the joint probability distribution of cover elements for each cover image is obtained by estimating the parameters of the Gaussian mixture distribution. Finally, to embed the message bits, pixels are chosen to be incremented or decremented by one according to the optimized transition probabilities of their category. The experiments show that the security performance of the new algorithm is better than that of existing LSB matching.
Research on watermarking algorithms, one of the key technologies for digital rights management, follows. In this paper, the authors point out unsolved issues of traditional watermarking algorithms. For instance, the insertion of a watermark into the original signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Some existing zero-watermarking algorithms for audio and image cannot resist certain signal processing manipulations or malicious attacks. In this paper, a novel audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed, which is more efficient and robust. The experiments show that the algorithm is robust against common audio signal processing operations such as MP3 compression, re-sampling, low-pass filtering, cutting-replacement, additive white Gaussian noise and so on. These results demonstrate that the proposed watermarking method is a suitable candidate for audio copyright protection.
The last paper of the special issue addresses the burning issue of conveniently hiding harmful information in multimedia in the emerging multimedia social networks. Dr. Jing Liu considers the problem of estimating the stego key used for hiding with the least significant bit (LSB) paradigm, which has been proved to be much more difficult than detecting the hidden message. A previous framework for stego key search was provided by the theory of hypothesis testing; however, the test threshold is hard to determine. In this paper, a new method is proposed in which the correct key can be identified by inspecting the difference between the stego sequence and its shifted sequence on the embedding path. It is shown that the new technique is much simpler and quicker than those previously known.
The above five papers were selected through two strict rounds of peer review, based on their originality, relevance, technical clarity and presentation, by at least two anonymous reviewers each. Here, I express my gratitude to Prof. Maheswaran for
his collaboration on this successful special issue on the very interesting and challenging topic of multimedia contents security in social networks. All invited reviewers are also appreciated for their reviews, comments and suggestions for the authors. Finally, I give special thanks to Prof. Jiebo Luo and Dr. George Sun for their great help and effort in getting the special issue published on time and successfully.
Guest Editors
Zhiyong Zhang (Email: [email protected])
Department of Computer Science, Henan University of Science & Technology, Luoyang, P. R. of China
Muthucumaru Maheswaran (Email: [email protected])
School of Computer Science, McGill University, Montreal, Canada
Zhiyong Zhang received his Master's and Ph.D. degrees in Computer Science from Dalian University of Technology and Xidian University, China, respectively, and completed a post-doctoral fellowship at Xi'an Jiaotong University, China.
He is currently an associate professor with the College of Electronics Information Engineering, Henan University of Science & Technology. His research interests include digital rights management and soft computing, trusted computing and access control, as well as security risk management. In recent years, he has published over 50 scientific papers and 2 monographs in the above research fields, and holds 3 authorized patents.
Dr. Zhang is an IEEE Senior Member, a member of the IEEE Systems, Man, and Cybernetics Society Technical Committee on Soft Computing and of the World Federation on Soft Computing Young Researchers Committee, a member of the Digital Rights Management Technical Specialist Workgroup attached to the China National Audio, Video, Multimedia System and Device Standardization Technologies Committee, Topic Editor-in-Chief of the International Journal of Digital Content Technology and Its Applications, and Guest Editor of the Journal of Multimedia. Besides, he is Chair/Co-Chair and TPC Member of numerous international workshops/sessions on digital rights management and contents security.
Muthucumaru Maheswaran received the BSc degree in Electrical and Electronic Engineering from the University of Peradeniya, Sri Lanka, in 1990. In 1994, he received an MSEE degree from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana, USA. From the same school he also received a PhD in Computer Engineering in December 1998.
Prof. Maheswaran is currently an associate professor at the School of Computer Science and the Department of Electrical and Computer Engineering of McGill University, Canada. He directs the research activities of the Advanced Networking Research Lab, and his research interests are in the general areas of computer networking, information security, and distributed systems. He has published hundreds of scientific papers in these research areas in numerous international journals and conferences.
DRTEMBB: Dynamic Recommendation Trust
Evaluation Model Based on Bidding
Gang Wang 1,2,3
1 Department of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China
2 Shanxi Province Key Laboratory of Computer Network, Xi'an Jiaotong University, Xi'an, China
3 School of Information, Xi'an University of Finance and Economics, Xi'an, China
[email protected]
Xiao-lin Gui 1,2
1 Department of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China
2 Shanxi Province Key Laboratory of Computer Network, Xi'an Jiaotong University, Xi'an, China
[email protected]
Abstract—In past trust models, fraud or cheating by malicious nodes was restrained by reducing the trust value of the malicious nodes, but few recommendation nodes engaged in cooperative cheating were punished, so malicious recommendation and fraudulent actions were only partly limited and kept recurring. Due to the lack of punishment for malicious recommendation, a large number of malicious recommendation nodes remain and look for their next criminal opportunity. By borrowing the idea of bidding and introducing competition and a mechanism of rewards and penalties, this paper proposes a dynamic recommendation trust evaluation model based on bidding in the E-Commerce environment. By greatly increasing the "criminal cost" of malicious recommendation nodes, and with a recommendation optimization algorithm based on Markov chains, the model ensures that the recommendation service is truthful and more objective. This stimulates nodes to recommend objectively and leads them to give up malicious recommendation and cooperative cheating on their own initiative, so that malicious recommendation and cooperative cheating are restrained effectively. In simulation experiments, the model also shows good restraint of malicious recommendation.
Index Terms—Bidding, Recommendation Trust, Evaluation Model, Recommendation Ability, Recommendation Oligarch
I. INTRODUCTION
With the rapid growth of computing and communications technology, the past decade has witnessed a proliferation of powerful e-commerce. However, problems follow it as well; among them, trust is a very important problem in mobile e-commerce. Thus,
Manuscript received April 17, 2012; revised May 8, 2011; accepted
August 7, 2012. This work was supported by a grant from National
Science and Technology Major Project of the Ministry of Science and
Technology of China (No.2012ZX03002001 -004) and the National
Natural Science Foundation of China (No. 60873071, No.61172090). Email: [email protected]
how to build a secure, reliable and trustworthy trust
evaluation system for mobile e-commerce is a research
hot spot nowadays.
In large-scale distributed computing, because the two parties to a service do not know each other and lack the necessary basis of trust, there may be malicious node fraud, false trading and other problems. A better way is to interact with high-trust nodes, selected from a number of candidate interaction objects by a good trust security model. M. Blaze et al. first proposed the trust management concept in Literature [1] to solve the issue of trust in distributed environments, and on this basis developed the corresponding trust management systems PolicyMaker and KeyNote in Literature [2]. Literature [3] proposes the idea of using bids to improve the efficiency of trust queries of recommendation nodes and the confidence updating mechanism, and mainly uses the subjective logic trust model proposed by A. Jøsang in Literature [4], which decides the trust degree of the evaluated object through the introduction of an uncertainty factor and by using the three factors of credibility, incredibility and uncertainty. However, for lack of an effective incentive, that model cannot effectively promote positive bidding by nodes. Literature [5]
proposes an electronic community PeerTrust Model
based on local reputation under P2P environment. This
PeerTrust Model gives a comprehensive trust assessment
based on the feedbacks acquired from nodes, the
magnitude of the feedback, trust degree of the feedback,
the critical and non-critical factors that differentiate the
interactive context and the environmental factors related
to the electronic community. Kamvar et al. at Stanford University propose the EigenRep model in Literature [6], a typical global reputation model specifying trust calculation in the P2P environment. Literature [7] proposes a global trust model intended to overcome the lack of security considerations of the EigenRep model, for instance feigning, slandering and the lack of punitive measures, and solves the problem of trust
recommendation effectively. Tang Wen and others
proposed a subjective trust management model that is
based on fuzzy set theory in Literature [8], and the model
builds a trust model by using fuzzy mathematics method
for the ambiguity of subjective trust. Literature [9]
classifies the trust values of nodes into resource trust
value, node contribution value and node assessment value,
but it ignores the fact that the trust value of a node is
related to its performance with the influence of various
factors. Literature [10] proposed a new trust model under
the P2P E-Commerce environment, and the model
accelerates direct trust by vote in order to stimulate nodes
vote positively to increase the reliability of trust. Tian
Chunqi and others propose a novel super-peer based trust model for peer-to-peer networks in Literature [11]. Though it solves some trust problems, the quality of recommendation service (abbreviated as QoRS) cannot be judged effectively, because they did not consider the uncertainty of recommendation information. Literature [12,13] comprehensively summarizes P2P trust mechanisms in the distributed computing environment, but because of the lack of research and analysis on collaborative cheating problems, it cannot effectively reduce the emerging malicious recommendation issues in the network. Literature [14]
proposed a trust model based on transaction content
similarity, and the model represents the trust degree of
recommendation by taking advantage of service contents
similarity, and improves the reliability of trust
recommendation. Due to the absence of appropriate
incentive and punishment mechanism, there are still some
limitations in reducing the malicious recommendation.
Literature [15] proposes a new trust model based on advanced D-S evidence theory for P2P networks. Though the model improves the credibility of node evaluation, it cannot effectively mobilize node positivity and thus can hardly ensure the healthy development of the network, because it lacks an incentive and punishment mechanism for nodes. Literature [16]
proposed dynamic trust evaluation model under
distributed computing environment, which is based on the
Dempster-Shafer Theory and Shapley entropy, and refers
to the people’s trust relationship model of Sociology. The
trust evaluation model uses historical interactive
information to compute the direct trust. Shapley entropy
evaluates the quantity of information of each node’s
direct trust function, and then the direct trust is revised in
consideration of each node’s reliability and its quantity of
information. Afterwards, they compute the integrative
trust derived from the revised direct trust of all nodes
according to the Dempster rules. In addition to above,
some researchers analyze the trust security model in
terms of social network, such as Sabater and others, who
proposed a reputation-based trust system REGRET [17],
which uses social network analysis and a hierarchical
ontology structure to integrate reputation of different
types so as to calculate the final node trust value.
Through the above analysis, we see that although past trust models have reduced some malicious services and fraudulent actions, they are not good at restraining malicious recommendation actions of network nodes. We find that the current trust models have the following problems:
(1) Evaluation in current trust models is aimed at the two parties of a transaction, while evaluation of recommendation nodes is rare. It is therefore difficult for trust models to identify malicious recommendation nodes.
(2) QoRS is positively correlated not only with the recommendation success ratio and the reliability of recommendation nodes, but also with the degree of improvement of recommendation ability. However, the current evaluation method for QoRS depends only on whether the transaction succeeds, so the evaluation is monotonous and inaccurate, and the evaluation effect is poor.
(3) Because the incentive and penalty mechanism of current models depends only on increasing or reducing the trust values of transaction nodes, these models lack the power to encourage recommendation nodes to recommend actively.
To solve these issues, we propose a dynamic recommendation trust evaluation model based on bidding for the E-Commerce environment in this paper, and its novel contributions are as follows.
(1) We propose a dynamic recommendation trust evaluation model based on the bidding mechanism of e-commerce. Through the bidding of recommendation nodes, many recommendation nodes can gain corresponding rewards from transactions, so our model can encourage a lot of recommendation nodes to recommend actively.
(2) We propose a novel evaluation method for QoRS based on a Markov chain. The evaluation method for QoRS integrates the recommendation success ratio and the degree of improvement of recommendation service ability. The method effectively overcomes the monotony and inaccuracy of evaluation, solves the problem of assuming that the recommendation of a higher-trust node is always more trustworthy than that of a lower-trust node, and effectively avoids the emergence of a "recommendation oligarch" in the e-commerce network.
(3) We distinguish recommendation nodes into acquaintance recommendation nodes and stranger recommendation nodes, and further distinguish acquaintance recommendation nodes into direct and indirect acquaintance recommendation nodes. We can therefore effectively ensure the trustworthiness of a recommendation according to the degree of acquaintance between the recommendation node and the evaluator.
This paper aims to build a dynamic trust evaluation model based on the bidding idea for E-Commerce in the social network environment. We summarize the related research work on current trust models in Section I. Section II introduces related key concepts and characteristics. Section III puts forward a novel dynamic recommendation trust evaluation model based on bidding. Section IV introduces the evaluation method for recommendation service quality. Section V simulates the model by experiment and verifies the effectiveness of the proposed model. Section VI summarizes the paper and the next research efforts.
II. RELATED CONCEPTS AND ATTRIBUTES
Definition 2.1. Trust is a type of faith and reliance, which reflects the subject's degree of knowledge of the identity and behavioural intention of the object in a specific time and context.
Definition 2.2. Reputation, also known as prestige, is the collection of all recommendation values for an entity and a reflection of the trust that a subject has for the object.
Definition 2.3. Direct trust, also known as local trust, refers to the judgment that the subject makes of the object's ability, trustworthiness and reliability, based on the experience the subject acquires directly from contact and transactions with the object.
Definition 2.4. Indirect trust, also called recommendation trust, refers to the judgment of the subject on the object's ability, trustworthiness and reliability through the recommendation of a third party.
The relationship between direct trust and recommendation trust is shown in Figure 1.
Definition 2.5. Service provider, also known as the target node, refers to the node that provides the source service and whose trust will be evaluated by the service requestor in the e-commerce trust network, denoted by SP.
Definition 2.6. Service recommendation node refers to the node that provides a recommendation of a service source for the service requestor in the e-commerce trust network, in order to gain the related economic interest and trust degree, denoted by SR.
Definition 2.7. Service requestor, also known as the evaluator, refers to the node that evaluates the trust of a service provider in the e-commerce trust network, denoted by E.
Definition 2.8. Recommendation oligarch refers to a recommendation node whose recommendation plays the key role for the recommendations of the network and which monopolizes the right to recommend in the whole network.
Figure 1. Local Trust and Recommendation Trust
Trust is a dualistic relationship, which can be one-to-one, one-to-many (individual to group), many-to-one (group to individual) or many-to-many (group to group). Trust is a complicated concept that bears both subjectivity and objectivity. This paper summarizes the following key attributes of trust:
(1) Subjectivity
Subjective trust is a cognitive phenomenon of human beings about the real world and the subjective judgment of the subject on the special traits or behaviors of the object at a specific level. Different subjects have different subjective judgments of a specific object.
(2) Conditional Transitivity
Transitivity only takes place under specific conditions.
For instance, entity A trusts entity B and entity B trusts
entity C, but it cannot be inferred that entity A trusts
entity C.
(3) Asymmetry
Entity A trusts entity B, but entity B does not necessarily trust entity A.
(4) Context
Trust is closely related to the context and the
environment the subject is in. Under different contexts or
backgrounds, the trust can be different.
(5) Domain Correlation
Trust can be represented differently depending on the domain and scope. For example, a medical doctor enjoys a high degree of trust in the medical field, but the degree of trust in him may be very low in the domain of music.
(6) Content Correlation
That entity A trusts entity B does not mean A trusts all behaviors of B. As a matter of fact, entity A trusts entity B only under a specific context for a particular type of behavior. For instance, A trusts B and is willing to lend the latter his/her bicycle, because A believes that B will surely return the bicycle. On the other hand, A does not lend B ten thousand dollars, because A doubts whether B will be able to return ten thousand dollars.
III. DYNAMIC RECOMMENDATION TRUST EVALUATION
MODEL.
To effectively solve the problem of collaborative cheating by malicious nodes, we need effective assessment and management of the network nodes. From the point of view of social psychology and organizational behavioral science, network nodes that participate in a service, whether good or malicious, always act with a certain purpose, namely to obtain corresponding benefits. Based on the above analysis, by borrowing the idea of bidding, the gain of a normal service node is higher than that of an abnormal one in terms of long-term benefits. (The service here includes two cases: one is providing a resource service, and the other is providing a recommendation service.) The benefits include two aspects: one is that good nodes make a profit from their excellent recommendations, and the other is the improvement of the good recommendation nodes' credibility.
Definition 3.1. The trust evaluation management model contains three entities: the service requestor (also known as the evaluator), the service provider (also known as the resource entity) and the service recommendation nodes. It is a new network security model based on social psychology, built according to the historical interaction relations of the three entities.
Definition 3.2. Direct acquaintance nodes are those nodes that have already had direct interactions with the resource providers.
Definition 3.3. Indirect acquaintance nodes are those nodes that have already had direct interactions with direct acquaintance nodes, but have not had direct interactions with the evaluator.
The other recommendation nodes, apart from direct acquaintance nodes and indirect acquaintance nodes, are stranger nodes.
From social psychology, recommendation reliability and trustworthiness are, to a great extent, related to the familiarity between the recommendation nodes and the evaluator. If there is experience of direct interaction between
the recommendation nodes and the evaluator, their recommendation reliability is higher than that of recommendation nodes without direct interaction experience; the reliability of acquaintance recommendation is higher than that of stranger recommendation, and the reliability of direct acquaintance recommendation is higher than that of indirect acquaintance recommendation. In the e-commerce network environment, the recommender is the recommendation node, the evaluator is the evaluation node, and the evaluated object is the service providing node, also known as the target node. Recommendation nodes are classified into direct acquaintance nodes, indirect acquaintance nodes and stranger recommendation nodes. Their relationship is shown in Figure 2:
Figure 2. Description of the Relationship Between Network Nodes
Here, the service requestor is the requestor of a service, the service provider is the provider of the service, and a service recommendation node is an entity that recommends a service to the service requestor. When there are direct transaction experiences between the service requestor and the service provider, we say that the two parties have a direct trust relation; otherwise, the relation is an indirect trust relation.
A. The Basic Process for Selecting Bidding Nodes
In the dynamic recommendation trust model, the basic process of selecting bidding nodes is as follows: first, the evaluator sends bidding information; second, service providing nodes choose whether to bid according to the bidding information, and bidding nodes are classified into two categories, resource service providing nodes and recommendation bidding nodes; after the evaluator receives bidding applications from service providing nodes, the evaluator queries the trust values of the bidding service nodes and evaluates them; third, trusted bidding nodes are selected for contract and service.
Definition 3.4. The service bidding message is defined as the following tetrad:

SBM = (Evaluator_ID, ServiceContents, TimeStamp, Reward)   (1)
In which, Evaluator_ID is the evaluator identity, that is, the service requestor identity; ServiceContents is the service content requested by the service requestor; TimeStamp is the window of bidding time; and Reward is what the service provider and the service recommendation nodes obtain from the evaluator. Reward includes two sides: one is the corresponding funds obtained from the evaluator, and the other is the trust evaluation obtained from the evaluator. Correspondingly, Reward also represents the corresponding punishment given to false recommendation nodes and malicious service nodes.
Definition 3.5. The resource service application is defined as the following triple:
RSA = (Service_Provider_ID, ServiceContents, Response_Time)   (2)

In which, Service_Provider_ID refers to the identity of the resource service bidding node; ServiceContents refers to the service contents provided by Service_Provider_ID; and Response_Time refers to the feedback time of the bidding node.
Definition 3.6. The recommendation service application is defined as the following tetrad:

RSA' = (Recommendator_ID, Service_Provider_ID, Recommendation_Trust, Recommendation_Response_Time)   (3)

In which, Recommendator_ID refers to the identity of the recommendation service node; Service_Provider_ID refers to the identity of the resource provider; Recommendation_Trust refers to the recommendation trust; and Recommendation_Response_Time refers to the feedback time of the recommendation node.
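To make the message formats concrete, the sketch below models tuples (1)-(3) as simple data classes. The field names follow the definitions above; the Python types and class names are illustrative assumptions, not part of the model.

```python
from dataclasses import dataclass

@dataclass
class ServiceBiddingMessage:             # tuple (1), published by the evaluator
    evaluator_id: str                    # Evaluator_ID: identity of the service requestor
    service_contents: str                # ServiceContents: requested service contents
    time_stamp: float                    # TimeStamp: window of bidding time
    reward: float                        # Reward: funds / trust evaluation offered

@dataclass
class ResourceServiceApplication:        # tuple (2), sent by a resource service bidder
    service_provider_id: str             # Service_Provider_ID
    service_contents: str                # ServiceContents offered by the provider
    response_time: float                 # Response_Time: feedback time of the bidding node

@dataclass
class RecommendationServiceApplication:  # tuple (3), sent by a recommendation bidder
    recommendator_id: str                # Recommendator_ID
    service_provider_id: str             # Service_Provider_ID of the recommended provider
    recommendation_trust: float          # Recommendation_Trust
    recommendation_response_time: float  # Recommendation_Response_Time
```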
B. Trust Computing
Trust computing consists of direct trust computing and
recommendation trust computing. (Also known as
indirect trust computing). Through integration of direct
trust and recommendation trust, service requestor (or
evaluator) can gain the global trust value of service
provider. From research of literature[14], we find that
there is more closely relationship in two factors to the
global trust: one is the objectivity and accuracy of
evaluation of coming from recommendation service
nodes and service requestor to service provider, which
depends on similarity of service contents between each
evaluator and service provider and each recommendation
service node and service provider in a period of time; the
other is the global trust which depend on familiarity
between service requestor and recommendation nodes.
This relationship conform also social psychology, that is
to say, the trustworthiness of acquaintance node
evaluation is generally higher than that of stranger nodes,
whereas the evaluation of direct acquaintance nodes is
comparatively more reliable than indirect acquaintance
nodes.
(1) Direct trust computing
The so-called direct trust $DT_{SP}^{E}$ refers to the direct trust value determined by the history of interactions between the evaluator E and the service providing node SP:

$DT_{SP}^{E} = \dfrac{S_{SP}^{E} + 1}{S_{SP}^{E} + F_{SP}^{E} + 2}$   (4)

$G_{SP}^{E} = S_{SP}^{E} + F_{SP}^{E}$   (5)

In which, $DT_{SP}^{E}$ represents the direct trust value; $S_{SP}^{E}$ represents the number of times that E and SP have been satisfied with their interactions, i.e., the successes; $G_{SP}^{E}$ specifies the overall number of transactions between E and SP, $G_{SP}^{E} = S_{SP}^{E} + F_{SP}^{E}$; and $F_{SP}^{E}$ is the number of times that E and SP have been unsatisfied with their interactions, i.e., the failures. When $G_{SP}^{E} = 0$, $DT_{SP}^{E} = 0.5$. Note that $DT_{SP}^{E}$ is not equal to $DT_{E}^{SP}$.
Since trust is a dynamic variable and has the characteristic of time decay, this paper introduces an influence function of time:

$f(k) = \dfrac{1}{k}$   (6)

Interaction information from different times has a different influence on trust computing, so a corresponding time impact factor is applied to every period, making the trust computation more accurate than a computation over raw history. In a given period of time $t_k$, the success number $S_{E,SP}^{k}$ and the failure number $F_{E,SP}^{k}$ are weighted by the influence function of time $f(k)$ of formula (6):

$S_{SP}^{E} = \sum_{k=1}^{n} S_{E,SP}^{k} \cdot f(k)$,   $F_{SP}^{E} = \sum_{k=1}^{n} F_{E,SP}^{k} \cdot f(k)$   (7)

Finally, $S_{SP}^{E}$ and $F_{SP}^{E}$ are brought into formula (4) to compute the direct trust of node E in node SP at the k-th time:

$DT_{E,SP}^{k} = (s_{E,SP}^{k} + 1)\,/\,(s_{E,SP}^{k} + f_{E,SP}^{k} + 2)$   (8)
(2) General recommendation trust computing
From Literature [14], the general recommendation trust is computed as:

$RT_{SP} = \dfrac{1}{n}\sum_{i,j=1} Sim_{i,j}\,(\omega_{SP}^{i}\,DT_{SP}^{i} + \omega_{SP}^{j}\,DT_{SP}^{j})$   (9)

$RT_{SP}$ is the total recommendation trust evaluation for node SP; $Sim_{i,j}$ refers to the similarity of the service contents $C_{SP}^{i}$ and $C_{SP}^{j}$ between node i and SP and between node j and SP; $\omega_{SP}^{i}$ and $\omega_{SP}^{j}$ respectively refer to the acquaintance recommendation weight and the stranger recommendation weight, where $\omega_{SP}^{i} = \alpha_{o_i}\,DT_{o_i}$ and $\omega_{SP}^{j} = 0.5\,\beta$, with $\beta$ the stranger recommendation weight factor, $0 \le \beta \le 1$. Here $\alpha$ refers to the recommendation weight of a direct or indirect acquaintance node and represents the trust degree of the recommending node for the recommended node; $\alpha$ is set as:

$\alpha = \dfrac{1}{n}\sum_{i=1}^{n} DT_{SP}^{i}$, if $SR_i = DASR_i$;   $\alpha = \dfrac{1}{n}\sum PT_{SP}^{i}\,PT_{SP}^{j}\cdots PT_{SP}^{l}$ $(i \neq j \neq \cdots \neq l)$, if $SR_i = IASR_i$   (10)

If $\sum_{i=1}^{n} DT_{SP}^{i} = 0$, then the node is a newly joining node or a dormant node. Because recommendation paths of acquaintance nodes may cross, a threshold $\varepsilon$ is set for $\alpha$; when $\alpha < \varepsilon$, the recommendation path is abandoned.
In the process of trust recommendation, recommendation paths may either cross or be independent, and the recommendation path with the highest trust value is selected. Among the recommendation paths shown in Figure 3, A-C-E1-B will be selected.
Figure 3. Recommendation Trust Path
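The path-selection rule just described (discard acquaintance paths whose weight falls below the threshold ε, keep the crossing path with the highest trust) can be sketched as follows. Treating a path's trust as the product of the edge trust values along it is an assumption consistent with the PT-product form of formula (10); the edge values, path names and ε below are purely illustrative.

```python
def path_trust(edge_trusts):
    """Trust of one recommendation path, assumed multiplicative along its edges."""
    t = 1.0
    for e in edge_trusts:
        t *= e
    return t

def select_recommendation_path(paths, epsilon=0.3):
    """paths: mapping path name -> list of edge trust values along that path.
    Paths whose trust falls below epsilon are abandoned; the best remaining wins."""
    scored = {name: path_trust(edges) for name, edges in paths.items()}
    kept = {name: t for name, t in scored.items() if t >= epsilon}
    return max(kept, key=kept.get) if kept else None

# Two crossing paths from evaluator A to provider B, in the spirit of Figure 3.
paths = {"A-C-E1-B": [0.88, 0.9, 0.9], "A-D-E2-B": [0.66, 0.75, 0.54]}
print(select_recommendation_path(paths))   # -> "A-C-E1-B"
```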
(3) Service content similarity computing
This paper computes the similarity of service contents by the vector included-angle cosine method, which improves recommendation credibility; the computing formula is as follows:

$\cos\theta = \dfrac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}$   (11)
In which, a denotes the service vector that the service provider has provided to the recommendation nodes over the period of time, and b denotes the service vector that the service provider will provide to the evaluator this time; their similarity is the credibility that the evaluator assigns to the recommendation node.
Here $a_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4})$ and $b_i = (x_{j1}, x_{j2}, x_{j3}, x_{j4})$, $0 \le i, j \le n$, and the four components respectively refer to Service Contents, Response Time, Service Completion and Service Cost. The index weights of the service vector differ across distributed computing environments.
After the above analysis, the service similarity coefficient in this paper is defined as follows:

$Sim_{i,j} = \dfrac{\sum_{k=1}^{4} \omega_k x_{ik} \cdot \omega_k x_{jk}}{\sqrt{\sum_{k=1}^{4} (\omega_k x_{ik})^{2}}\,\sqrt{\sum_{k=1}^{4} (\omega_k x_{jk})^{2}}}$   (12)

In which, $\omega_k$ denotes the weight of the k-th index.
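A small sketch of formulas (11)-(12), the cosine similarity between weighted four-component service vectors; the vectors and weights below are illustrative values only.

```python
import math

def service_similarity(a, b, w):
    """Formula (12): cosine of the angle between the weighted service vectors.
    a, b: 4-component vectors (Service Contents, Response Time, Service Completion,
    Service Cost); w: the index weights omega_k."""
    wa = [wk * ak for wk, ak in zip(w, a)]
    wb = [wk * bk for wk, bk in zip(w, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm = math.sqrt(sum(x * x for x in wa)) * math.sqrt(sum(y * y for y in wb))
    return dot / norm if norm else 0.0

weights = [0.4, 0.2, 0.2, 0.2]
print(service_similarity([1, 2, 3, 4], [1, 2, 3, 4], weights))  # identical profiles -> 1.0
print(service_similarity([1, 0, 3, 4], [0, 2, 3, 1], weights))  # partially similar profiles
```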
(4) Global trust
The global trust $GT_{O}^{E}$ is the trust value of the service provider obtained through the fusion of recommendation trust and direct trust. Its computation equation is as follows:
$GT_{O}^{E} = \alpha \cdot DT_{O}^{E} + \beta \cdot RT_{O}$   (13)

In which, $\alpha + \beta = 1$, $\alpha, \beta \in [0,1]$; $\alpha$ is the weight factor of direct trust and $\beta$ is the weight factor of recommendation trust. Because trust is a multi-attribute object and gradually attenuates with time, $\alpha$ and $\beta$ change dynamically with the number of interactions and other factors. As the number of interactions between the evaluator and the service provider increases step by step, $\alpha$ grows and $\beta$ shrinks, so the proportion of direct trust becomes larger and larger, and the proportion of recommendation trust becomes smaller and smaller.

$\alpha = 1 - \left(\dfrac{1}{2}\right)^{n-k},\quad n \ge k \ge 0$   (14)
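A sketch of the fusion of formulas (13)-(14), with the direct-trust weight α growing with the number of interactions and β = 1 − α. The exact meaning of n and k in (14) is an assumption in this example, and the values are illustrative.

```python
def direct_weight(n, k=0):
    """Formula (14) as reconstructed above: alpha = 1 - (1/2) ** (n - k), n >= k >= 0.
    More interactions (larger n - k) push alpha toward 1."""
    return 1.0 - 0.5 ** (n - k)

def global_trust(direct, recommended, n, k=0):
    """Formula (13): GT = alpha * DT + beta * RT with alpha + beta = 1."""
    alpha = direct_weight(n, k)
    return alpha * direct + (1.0 - alpha) * recommended

print(global_trust(0.8, 0.6, n=0))  # no interaction history: only recommendation trust counts
print(global_trust(0.8, 0.6, n=6))  # many interactions: direct trust dominates
```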
C. Trust Update
The update of trust values includes two aspects: one is the update of the trust value of the service provider, and the other is the update of the trust values of the recommendation nodes. According to the final interaction outcome and the global trust value, the trust value of the service provider is updated positively (or negatively), and the recommendation trust of each recommendation node is updated according to the deviation θ of equation (15).
$\theta = \left|RT_{SP} - R_{SP}^{E}\right|$   (15)

In which, $RT_{SP}$ represents the general recommendation trust value of all recommendation nodes, and $R_{SP}^{E}$ represents the recommendation trust value given by recommendation node E to service provider SP; $\theta$ is the difference between $RT_{SP}$ and $R_{SP}^{E}$. We set an update threshold $\delta$. When $\theta$ is bigger than $\delta$, the distance between the general recommendation trust and the recommendation trust of recommendation node i is large, and node i is an untrustworthy node; when $\theta$ is smaller than or equal to $\delta$, the distance is small, the recommendation trust of node i is more accurate, and node i is a trusted node. $\delta$ takes the average of all recommendation trust values:

$\delta = \dfrac{1}{n}\sum_{i=1}^{n} r_i$   (16)
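A sketch of the update check of formulas (15)-(16): a recommender whose recommendation deviates from the general recommendation trust by more than the average-based threshold is treated as untrustworthy. The symbols θ and δ follow the notation above; the sample values are illustrative.

```python
def update_recommenders(general_rt, recommendations):
    """recommendations: recommender id -> its recommendation trust R for provider SP.
    delta, formula (16): mean of all recommendation trust values (the threshold).
    theta, formula (15): per-recommender deviation from the general trust RT."""
    values = list(recommendations.values())
    delta = sum(values) / len(values)
    verdicts = {}
    for node, r in recommendations.items():
        theta = abs(general_rt - r)
        verdicts[node] = "trusted" if theta <= delta else "untrustworthy"
    return verdicts

recs = {"n1": 0.82, "n2": 0.78, "n3": 0.15}
print(update_recommenders(general_rt=0.80, recommendations=recs))
# n1 and n2 stay trusted; n3 deviates by 0.65 > delta (about 0.58) and is flagged.
```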
D. Model Algorithm Description
Input: initialize the network and generate the network nodes.
Output: a list of trusted nodes and a list of excellent recommendation nodes.
Step (1): The service requestor (that is, the evaluator) publishes service bidding messages to the network; resource service providers then receive the messages and feed back their resource service bidding applications.
Step (2): The resource service bidding nodes are fed back to the network and publicized, and the recommendation service nodes evaluate the resource service bidding nodes and send recommendation service bidding messages to the network.
Step (3): According to the demand conditions of the recommendation service bidding, a few recommendation service bidding applications are filtered out because they fall short of the bidding request.
Step (4): The recommendation trust values for each resource service bidding node are fused in order to compute the general recommendation trust value.
Step (5): According to the direct interaction history between the resource service bidding node and the evaluator, the global trust value is computed through the fusion of direct trust and recommendation trust.
Step (6): When an interaction is successful, θ is compared with the threshold value δ using formulas (15) and (16); when θ ≤ δ, the trust value is updated positively, otherwise it is updated negatively. According to the quality of the recommendation service, a list of excellent recommendation nodes is produced automatically by the system, and at the same time the system notifies the evaluator to pay the fee to them.
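Steps (4)-(6) can be strung together roughly as in the self-contained sketch below. For brevity, the similarity-weighted fusion of formula (9) is replaced by a plain mean, and all names, default weights and sample values are illustrative assumptions.

```python
def run_bidding_round(direct_trust, recommendations, accept_threshold=0.5, alpha=0.7):
    """Simplified Steps (4)-(6) for a single resource service bidding node.
    recommendations: recommender id -> recommendation trust in the bidding provider."""
    general_rt = sum(recommendations.values()) / len(recommendations)   # Step (4), simplified
    global_t = alpha * direct_trust + (1 - alpha) * general_rt          # Step (5), formula (13)
    accept = global_t >= accept_threshold
    delta = general_rt   # threshold taken as the mean recommendation trust, as in (16)
    verdicts = {n: ("excellent" if abs(general_rt - r) <= delta else "penalized")
                for n, r in recommendations.items()}                    # Step (6)
    return accept, global_t, verdicts

print(run_bidding_round(0.8, {"n1": 0.1, "n2": 0.1, "n3": 0.9}))
# n3's recommendation deviates from the mean by more than the threshold and is penalized.
```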
IV. QORS EVALUATION METHOD
It is difficult to measure QoRS in past historical trust models because the recommendation service ability of nodes keeps developing and changing. Based on our analysis, we find that QoRS depends not only on the accuracy of the recommended trust service, but also on the degree of improvement of the recommendation service ability of the recommendation node. Past recommendation trust models often ignore the degree of improvement of recommendation service ability. We propose a dynamic recommendation evaluation method based on a Markov chain, abbreviated as DREM; the method considers not only the accuracy of the recommendation service but also the degree of improvement of recommendation service ability. The basic process for the evaluation of QoRS is as follows:
Step 1: Compute the recommendation accuracy of each trusted recommendation node according to formula (15);
Step 2: Compute the degree of improvement of the recommendation ability of each recommendation node by DREM;
Step 3: Modify the recommendation service quality value according to Step 2, and obtain the list of excellent recommendation service nodes and the list of bad recommendation service nodes.
A Markov chain is characterized by the condition that, given the state of the process (or system) at time t0, the conditional distribution of the state of the process at any time t > t0 is unrelated to the states of the process before t0; that is, given the "present" of the process, its "future" does not depend on its "past". Such a process is called a Markov process [19]. The mathematical concept is as follows:
Definition 4.1: Suppose the state space of the random process $\{X(t), t \in T\}$ is I. If for any n time instants $t_1 < t_2 < \cdots < t_n$, $n \ge 3$, $t_i \in T$, under the conditions $X(t_i) = x_i$, $x_i \in I$, $i = 1, 2, \ldots, n-1$, the conditional distribution function of $X(t_n)$ equals exactly the conditional distribution function of $X(t_n)$ under the single condition $X(t_{n-1}) = x_{n-1}$, that is,

$P\{X(t_n) \le x_n \mid X(t_1) = x_1, X(t_2) = x_2, \ldots, X(t_{n-1}) = x_{n-1}\} = P\{X(t_n) \le x_n \mid X(t_{n-1}) = x_{n-1}\}$   (17)

then we say the process $\{X(t), t \in T\}$ has the Markov property, or the process is called a Markov process; a Markov process in which both time and state are discrete is called a Markov chain. The steps of DREM are as follows:
Step 1: Evaluation rating
To evaluate the recommendation ability of nodes, this article first builds an evaluation rating and represents the levels with the evaluation index set V = {v1, v2, v3, ..., vn} of recommendation ability.
Step 2: Rating and the proportion of the number of ratings
Count the number of ratings of the evaluated recommendation node at each level; $M_i$ refers to the number of ratings at level i, and $S_i'$ is the proportion of ratings currently at level i:

$\sum_{i=1}^{n} M_i = N$,   $S_i' = \dfrac{M_i}{N}$   (18)
Step 3: State and state transition
This article uses the vector $S_w^t = (S_1^t, S_2^t, \ldots, S_n^t)$ to express the evaluation status of the recommendation ability of node w, referred to as the state. The parameter t (a discrete quantity) represents time, so that the t-th evaluation status of node w can be written as $S_w^t$. When t changes, the evaluation status of w changes with its behavior, and a state transition occurs.
Step 4: Transition probability matrix
If $S_w^k$ is the evaluation status of w at the k-th time and the previous evaluation state vector is $S_w^{k-1}$, then, as is well known, $S_w^{k+1} = P_w \cdot S_w^k$ by the Kolmogorov equation [20], in which $P_w$ is the one-step transition probability matrix:

$P_w = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{pmatrix}$   (19)

In which, $P_{ij}$ expresses the transition probability from level i to level j, with $0 \le P_{ij} \le 1$ $(i, j = 1, 2, \ldots, n)$ and $\sum_{j} P_{ij} = 1$ $(i = 1, 2, \ldots, n)$.
Because the Markov process has ergodicity [21], when t tends to ∞, $S_w^{t+1} = S_w^t = S_w$. As long as the steady state $S_w = (S_1, S_2, \ldots, S_n)$ is calculated, we can calculate the degree of improvement of node w by using $S_w$; $S_w$ is obtained from equation (20):

$S_w = S_w \cdot P_w$,   $\sum_{i=1}^{n} S_i = 1$   (20)

If $x_1, x_2, \ldots, x_n$ are given, the progress can be quantified and we can obtain the score of recommendation ability, denoted $F_w$:

$F_w = \sum_{i=1}^{n} S_i x_i$   (21)

Though we obtain the value of $F_w$, $F_w$ does not represent the actual recommendation ability of the node; it only shows the degree of improvement of the node during this period. A large $F_w$ only indicates that the node has improved more compared with the previous evaluation.
Example 4.1. Suppose there are recommendation nodes w1 and w2, and the evaluation status is divided into 5 levels (high degree of trust, trust, basic trust, distrust, high degree of distrust), denoted (HT, T, BT, DT, HDT). The numbers and state transitions of nodes w1 and w2 are shown in Table I and Table II.
Table I.
Evaluation of Node w1 (number of people)

Previous evaluation grade | Current evaluation: HT  T   BT  DT  HDT | Total
HT                        |                     4   1   0   0   0   | 5*
T                         |                     2   18  1   0   0   | 21
BT                        |                     0   2   2   0   0   | 4
DT                        |                     0   0   0   0   0   | 0
HDT                       |                     0   0   0   0   0   | 0
Total                     |                     6   21  3   0   0   | 30
Table II.
Evaluation of Node w2 (number of people)

Previous evaluation grade | Current evaluation: HT  T   BT  DT  HDT | Total
HT                        |                     1   1   0   0   0   | 2*
T                         |                     2   24  0   0   0   | 26
BT                        |                     1   0   1   0   0   | 2
DT                        |                     0   0   0   0   0   | 0
HDT                       |                     0   0   0   0   0   | 0
Total                     |                     4   25  1   0   0   | 30
(In Table I and Table II, the "*" means, for example, that five evaluators rated the node in the previous evaluation, all with result "HT"; in the current evaluation, four of those five give "HT" and one gives "T". The other entries are read in the same way.)
The actual status of the last evaluation: $S_{w1} = (5/30, 21/30, 4/30, 0, 0)$, $S_{w2} = (2/30, 26/30, 2/30, 0, 0)$. The actual state of the current evaluation: $S'_{w1} = (6/30, 21/30, 3/30, 0, 0)$, $S'_{w2} = (4/30, 25/30, 1/30, 0, 0)$. The transition probability matrices $P_{w1}$ and $P_{w2}$ are as follows:
$P_{w1} = \begin{pmatrix} 4/5 & 1/5 & 0 & 0 & 0 \\ 2/21 & 18/21 & 1/21 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$

$P_{w2} = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 1/13 & 12/13 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$

From $(I - P_w)^{T} S_w^{T} = 0$ we can get the steady states of w1 and w2. For w1,

$\begin{pmatrix} 1/5 & -2/21 & 0 & 0 & 0 \\ -1/5 & 1/7 & -1/2 & 0 & 0 \\ 0 & -1/21 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \end{pmatrix} = 0$

so we can get $S_{w1} = (10/33, 7/11, 2/33, 0, 0)$; by the same argument, $S_{w2} = (2/15, 13/15, 0, 0, 0)$.
If the specific scores of the evaluation levels are HT = 90, T = 80, BT = 60, DT = 50, HDT = 30, the recommendation ability scores of nodes w1 and w2 are respectively $F_{w1} = 81.8$ and $F_{w2} = 81.3$; the improvement of node w1 is faster than that of node w2.
(5) Comprehensive evaluation of QoRS
After the recommendation service accuracy of each node is calculated from formula (15), $\theta = |RT_{SP} - R_{SP}^{i}|$, the recommendation nodes whose recommendation accuracy is higher than the others are selected; the list of excellent recommendation nodes is then modified according to the nodes whose recommendation service improves faster than the others. This improves the comprehensive quality of the evaluation of the recommendation service, and finally we obtain the list of excellent recommendation nodes.
V. SIMULATION EXPERIMENT AND RESULTS ANALYSIS
Our simulation experiments show that the proposed trust evaluation model can effectively restrain strategic deception by nodes and malicious recommendation by nodes, and that nodes dynamically adapt to the model.
We simulate the operation of the DRTEMBB model through experiments and verify the results by operating the model in a small-scale network. The experiment involves 100 nodes and 100 services. The services are randomly assigned to the nodes, making sure that each node is assigned at least one service; each simulation comprises several cycles, and there must be one interaction at each node within each cycle. The initial trust value of a node is set at 0.5, indicating that it is initially neither trusted nor distrusted. The simulation experiment is conducted in a Java environment, on a machine with a 3.0 GHz CPU and 2 GB of memory.
Experiment 5.1: Analysis aiming at strategic deception by malicious nodes.
This experiment aims at the issue of strategic deception by malicious nodes as time changes, inspects the ability of the DRTEMBB model to perceive strategic deception by nodes, and analyses the difference between the DRTEMBB model, the EigenRep model and the general model with respect to strategic deception. The experiment assumes that malicious nodes suddenly adopt a cheating service behavior after they have accumulated high trust through successful interactions over a period of time.
We find that, when the rate of malicious nodes is set at 40% and the number of cycles exceeds 50, the transaction success ratio of EigenRep begins to decrease, while DRTEMBB remains stable. As the cycles increase, the effect of DRTEMBB is always better than that of the EigenRep model. The DRTEMBB model demonstrates a very good success ratio with the increase of transaction cycles.
Figure 4. Trust Curve of Malicious Peers (trust value over the time cycle for DRTEMBB, EigenRep and the general model)
From Figure 4, we find that the DRTEMBB model is more sensitive to a sudden change in the service behavior of malicious nodes as the transaction time cycle increases. When malicious nodes adopt intermittent cheating behavior, their trust declines rapidly, and as time goes by a node's trust declines faster than it is restored. From the experimental results we find that the DRTEMBB model can effectively identify strategically cheating nodes, and its effect is better than that of the EigenRep model and the general model.
Experiment 5.2: Analysis aiming at the scale of
malicious recommendation nodes.
Figure 5. Success Ratio when the Rate of Malicious Nodes Is Changed (transaction success ratio versus the percentage of malicious nodes, for DRTEMBB and EigenRep)
From Figure 5, we find that the success ratio of transactions decreases rapidly as the proportion of malicious nodes continues to rise. Even when the proportion of malicious nodes rises above 60%, the DRTEMBB model maintains a quite high transaction success ratio. Whether malicious nodes act individually or collaboratively, the experimental results show that the DRTEMBB model demonstrates a good inhibitory effect.
Experiment 5.3: Analysis aiming at fixed rate of
malicious nodes
Figure 6. Transaction Success Ratio over Cycles at a Fixed Malicious Node Rate (transaction success ratio versus cycles, for DRTEMBB and EigenRep)
We set the rate of malicious nodes in this experiment in order to observe the effect of DRTEMBB, as shown in Figure 6.
Experiment 5.4: Analysis aiming at malicious recommendation attacks.
Figure 7. Impact of the Percentage of Malicious Recommendation Nodes on the Transaction Success Rate (transaction success ratio versus the percentage of malicious recommendation nodes, for DRTEMBB and Hassan)
Figure 7 shows that, as the ratio of dishonest nodes keeps increasing, the accuracy of trust computation declines gradually, which leads to an increasing number of transaction failures and a decreasing success ratio. Since each dishonest recommendation influences the trust value of the recommender itself, and self trust values decrease as the dishonest node ratio increases, the decline in the transaction success ratio becomes larger. The Hassan model has some resistance, but because it assumes that recommendation nodes have high trust values and lacks a punishment mechanism, it cannot effectively weaken the influence of dishonest recommendations on trust, and its transaction success ratio decreases rapidly.
The DRTEMBB model shows that dishonest nodes can be shielded by the other honest nodes of the recommendation path, so the decline of the transaction success ratio is slower than in the Hassan model. In contrast to the Hassan model, the DRTEMBB model can filter malicious recommendation nodes more effectively and make recommendation information more accurate by introducing trade goods concept similarity and dynamic adjustment of trust values.
VI. CONCLUSION AND FURTHER WORKS
Trust security is an important research domain in the e-commerce environment. Though many researchers have proposed trust security models, many questions remain in these models. We therefore propose a dynamic recommendation trust evaluation model, together with a novel evaluation method for QoRS within the model. From the experimental results we can see that our model is effective and good at restraining malicious recommendation and fraudulent service actions. At the same time, the novel evaluation method for QoRS can be effectively used to
ascertain the degree of improvement of recommendation service ability; by integrating the recommendation success ratio and the degree of improvement of recommendation service ability, the evaluation of QoRS becomes accurate and objective. The method effectively solves the problem of assuming that the recommendation of a higher-trust node is always better than the others, and at the same time avoids the emergence of a recommendation oligarch. In addition, we propose a bidding method for recommendation nodes, and the bidding method can effectively encourage recommendation nodes to recommend correctly and actively.
Our future work will analyze other methods for the recommendation evaluation model, such as multi-attribute analysis evaluation methods and fuzzy mathematical analysis of the weights of multiple attributes.
ACKNOWLEDGMENT
This work was supported by a grant from National
Science and Technology Major Project of the Ministry of
Science and Technology of China (No.2012ZX03002001
-004) and the National Natural Science Foundation of
China (No. 60873071, No.61172090). The authors also
gratefully acknowledge the helpful comments and
suggestions of the reviewers, which have improved the
presentation.
REFERENCES
[1] M. Blaze, J. Feigenbaum, J. Lacy. Decentralized trust management. In: Proc. 17th IEEE Symposium on Security and Privacy. Los Alamitos: IEEE Computer Society Press, 1996, pp. 164-173.
[2] M. Blaze, J. Feigenbaum, A. D. Keromytis. KeyNote: Trust management for public-key infrastructures. In: Christianson B, Crispo B, William S, et al., eds. Cambridge 1998 Security Protocols International Workshop. Berlin: Springer-Verlag, 1999, pp. 59-63.
[3] Chen Chao, Wang Ru-Chuan, Zhang Lin, Deng Yong. A kind of trust appraisal model based on bids in grid. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2009, 29(5): 59-64.
[4] A. Jøsang. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2001, 9(3): 279-311.
[5] Xiong L, Liu L. PeerTrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(7): 843-857.
[6] Kamvar S D, Schlosser M T. EigenRep: Reputation management in P2P networks. In: Proc. of the 12th Int'l World Wide Web Conference. Budapest: ACM Press, 2003, pp. 123-134.
[7] Dou Wen, Wang Huai-Min, Jia Yan, Zhou Peng. A recommendation-based peer-to-peer trust model. Journal of Software, 2004, 15(4): 571-583.
[8] Tang Wen, Chen Zhong. Research of subjective trust management model based on the fuzzy set theory. Journal of Software, 2003, 14(8): 1401-1408 (in Chinese).
[9] Iguchi M, Terada M, Fujimura K. Managing resource and service reputation in P2P networks. In: Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
[10] Wang Yu, Zhao Yue-long, Hou Fang. A new security trust model for peer-to-peer e-commerce. In: International Conference on Management of e-Commerce and e-Government (ICMECG '08), 2008.
[11] Tian ChunQi, Jiang JianHui, Hu ZhiGuo, et al. A novel super-peer based trust model for peer-to-peer networks. Chinese Journal of Computers, 2010, 33(2): 345-355 (in Chinese).
[12] Li Xiao-Yong, Gui Xiao-Lin. Research on dynamic trust model for large scale distributed environment. Journal of Software, 2007, 18(6): 1510-1521 (in Chinese).
[13] Li YongJun, Dai YaFei. Research on trust mechanism for peer-to-peer network. Chinese Journal of Computers, 2010, 33(3): 390-405.
[14] Wang Gang, Gui XiaoLin, Wei GuangFu. A recommendation trust model based on e-commerce transactions content-similarity. In: 2010 International Conference on Machine Vision and Human-machine Interface, 2010.
[15] Tian ChunQi, Zhou ShiHong, Wang WenDong, Cheng ShiRui. A new trust model based on advanced D-S evidence theory for P2P networks. Journal of Electronics & Information Technology, 2008, 30(6): 1480-1484.
[16] Zhu YouWen, Huang LiuSheng, Chen GuoLiang, Yang Wei. Dynamic trust evaluation model under distributed computing environment. Chinese Journal of Computers, 2011, 34(1): 55-64.
[17] J. Sabater, C. Sierra. REGRET: A reputation model for gregarious societies. In: Fourth Workshop on Deception, Fraud and Trust in Agent Societies, 2001, pp. 61-70.
[18] Paul A. Samuelson, William D. Nordhaus. Economics (12th Edition). Translated by Hongye Gao. China Development Press, 1992.
[19] Zhang Heng. An application of Markov chain. Changchun Institute of Optics and Fine Mechanics, 1994, 17(3): 44-49.
[20] Zhang XiaoCheng, Li Jing, Gen Wei. Economic Applied Mathematics. Nankai University Press, 2008.
[21] Wang RongXin. Random Process. Xi'an Jiaotong University Press, 1987.
Gang Wang was born in 1974 and is a Ph.D. candidate. His current research interests include trust security, service computing, the Internet of Things, and cloud computing.

XiaoLin Gui was born in 1966. He is a Ph.D., professor, and Ph.D. supervisor, and the chair professor of the Department of Computer Science, Xi'an Jiaotong University. His research interests include service computing, the Internet of Things, dynamic trust management, and cloud computing.
Block-Based Parallel Intra Prediction Scheme for
HEVC
Jie Jiang
Institute of Intelligence Control and Image Engineering, Xidian University, Xi’an, China
Email: [email protected]
Baolong Guo and Wei Mo
Institute of Intelligence Control and Image Engineering, Xidian University, Xi’an, China
Email: {blguo, wmo}@xidian.edu.cn
Kefeng Fan
Guilin University of Electronic Technology, Guilin 541004, China
Email:[email protected]
Abstract — Advanced video coding standards have become
widely deployed in numerous products, such as multimedia
service, broadcasting, mobile television, video conferences,
surveillance systems and so on. New compression techniques
are gradually included in video coding standards so that a 50%
compression rate reduction is achievable every ten years.
However, dramatically increased computational complexity is one of the many problems brought by this trend. With the recent advancement of VLSI (Very Large Scale Integration) semiconductor technology contributing to the emerging digital multimedia world, this paper investigates an efficient parallel architecture for the emerging High Efficiency Video Coding (HEVC) standard to speed up the intra coding process without ignoring any prediction modes. Parallelism is achieved by limiting the reference pixels of the 4 × 4 subblocks, allowing the subblocks to use different direction modes to predict the residuals. Experimental implementations of the proposed algorithm are demonstrated on a set of widely used, freely available video test sequences. The results show that the proposed algorithm achieves satisfying intra parallelism without any significant performance loss.
Index Terms — HEVC, intra coding, parallel architecture,
multiple directions.
I. INTRODUCTION
Continuous emergence of video coding standards and the
growth in development and implementation technology for
them have undoubtedly created a completely new world of
multimedia. So far, contributions to video coding
technology have mainly focused on improving coding
efficiency. The challenges remain: not only to find efficient
coding algorithms which require high performance but also
to speed up the coding process.
The ongoing video coding standard, High Efficiency Video Coding (HEVC) [1], is attracting increasing attention due to its high compression efficiency. However, the computational complexity of HEVC is expected to be 2-10 times higher than that of its predecessor, H.264/AVC, which is considered an obstacle to real-time implementation. Therefore, many research works focus on how to reduce this computational complexity.
The purpose of these works is to design and evaluate new methods that reduce encoder complexity while preserving the quality of the reconstructed video sequences for intra coding. The works generally fall into two categories:
1. Fast mode decision approaches with early termination using adaptive thresholds or an optimized Lagrangian rate-distortion optimization (RDO) function [2-4].
2. Parallel architectures to speed up the intra prediction process [5-14].
With the recent advancement of VLSI (Very Large Scale Integration) semiconductor technology contributing to the emerging digital multimedia world, research on parallel architectures is receiving more attention. In this paper, we focus on the second category and present a block-based parallel architecture to speed up intra prediction for HEVC.
The remaining parts of this paper are organized as follows: Section II reviews the state of the art in parallel architectures. Section III introduces spatial prediction in HEVC. Section IV presents the proposed scheme, including 2X parallel intra prediction and its expansion to 4X parallelism. Experimental results are presented in Section V. Finally, we conclude the paper in Section VI.
II. RELATED WORK
Mainstream image and intra-frame video compression extensively adopts a block-based structure from prediction and transform to entropy coding, where the coding of one block depends on the availability of its left, upper-left, and upper-right blocks. Such a highly dependent structure is not well suited to parallelization, especially for ASIC (Application Specific Integrated Circuit) solutions. Even so, now that dual-core and quad-core computers are available, there have been many efforts to parallelize encoding and decoding from different aspects, as described below.
1. GOP (Group of Pictures) approaches: Barbosa [5] and Vander [6] propose to partition a sequence into GOPs. The correlation between GOPs is low, which not only limits error propagation but also supports parallel coding. However, the data of all pictures in a GOP must be available before parallel processing can start; when a GOP contains too many pictures, this leads to serious delay, which is inconvenient for real-time video coding applications.
2. Frame approaches: Chen [7] proposes to realize parallel coding at the frame level. This approach is limited by the correlation between frames; therefore, the speedup does not increase linearly with the number of processing cores.
3. Pipeline approaches: Gulati [8] and Klaus [9] propose
to organize the prediction, transform, and entropy coding of
macroblocks (MBs) as a pipeline and assign them to
multiple cores for parallel computing. This class of approaches achieves only limited parallelism when workloads are unbalanced across cores. A two-times speedup is reported for high-definition sequences on general-purpose quad-core computers in [9].
4. Slice partitioning (SP) approaches: Rodriguez [10]
proposes to partition an image into some regions that are
referred to as slices. The coding of slices can be carried out
independently by different cores. Slices provide good parallelism but result in a significant loss in coding performance if there are too many slices.
5. MB-reordering (MR) approaches: Despite these efforts on parallelizing existing coding algorithms, the intra luma prediction of small blocks remains a challenge for parallelization, due to the strong dependency among blocks in intra prediction and the filtering of top/left reconstructed samples. The original process handles blocks serially, which is not efficient. Efficient architectures have been reported in [11-14], which propose to process MBs in wavefront order so that MBs on each diagonal line can be coded concurrently once their neighboring MBs are available. Owing to the fine-granularity parallelism at the MB level, these approaches achieve good parallelism and are the most widely used at present.
Huang's work [11] has bubbles between Intra 4 × 4 predictions because the low throughput of the reconstruction process forces the prediction to wait for reconstruction to complete. Lee's work [12] perfectly pipelines the intra prediction and reconstruction processes; however, it requires that intra prediction and reconstruction take exactly the same number of processing cycles. It also drops some prediction modes in some blocks in order to enforce pipelining; hence, the video quality is degraded. Jin's work [13] proposes both partially and fully pipelined architectures for intra 4 × 4 prediction and has the same drawback as the approach in [12]. Moreover, these architectures add a dependency-graph process in order to improve gains, which increases hardware overhead. It takes 25 cycles to process each block, which is too long for high-throughput reconstruction. Similar to Huang's work, Suh's work [14] proposes an efficient parallel architecture followed by a redundancy reduction algorithm to speed up intra 4 × 4 prediction. However, the approaches above all have drawbacks, either in the pipelining architecture or in compression gains.
In this paper, we propose a block-based intra prediction
parallel algorithm to solve the problem above.
III. SPATIAL PREDICTION IN HEVC
ISO-IEC/MPEG and ITU-T/VCEG recently formed the Joint Collaborative Team on Video Coding (JCT-VC). The JCT-VC aims to develop the next-generation video coding standard, called High Efficiency Video Coding (HEVC). A joint proposal to the HEVC standardization effort has been partially adopted into the HEVC Test Model (HM). The major improvements for intra coding come from the two tools described below.
A. Intra Prediction
The nine intra prediction modes supported in H.264/AVC are not flexible enough to represent complex structures or image segments with different directionalities. HEVC extends the set of
directional prediction modes of H.264/AVC, providing
increased flexibility and more accurate predictions for the
sample values. The increased prediction accuracy provides
significant reductions in residual energy of the intra coded
blocks and improvements in coding efficiency [15].
HEVC supports from 16 to 34 prediction directions, depending on the prediction size, and the prediction can achieve 1/8-pixel accuracy. Table I shows how a 64 × 64 prediction unit is partitioned into subblocks of different sizes and the corresponding number of prediction modes. The 34 directions of the intra prediction modes are illustrated in Fig. 1.
TABLE I
A 64 × 64 PREDICTION UNIT: SUB-BLOCK SIZES AND CORRESPONDING PREDICTION MODES
Amount   Size     Modes
256      4×4      17
64       8×8      34
16       16×16    34
4        32×32    34
1        64×64    5
Figure 1. 34 prediction modes for HEVC
It improves the coding efficiency significantly through
permitting block prediction in an arbitrary direction by
indicating the prediction angle.
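To make the idea of direction-plus-fractional-accuracy prediction concrete, the following minimal Python sketch predicts a 4 × 4 block from a row of reconstructed reference samples by interpolating between neighboring samples at 1/8-pel positions. The function name, the angle parameter, and the rounding are assumptions for this illustration; it is not the HEVC reference implementation.

```python
import numpy as np

def predict_angular_4x4(top_ref, angle_eighths):
    """Illustrative angular prediction of a 4x4 block from a top reference row.

    top_ref: 1-D array of reconstructed samples above the block.
    angle_eighths: horizontal displacement per row, in 1/8-pel units
                   (a stand-in for an HEVC-style prediction angle).
    """
    pred = np.zeros((4, 4), dtype=np.int32)
    for y in range(4):
        disp = (y + 1) * angle_eighths        # total fractional displacement for this row
        idx, frac = disp >> 3, disp & 7       # integer and 1/8-pel parts
        for x in range(4):
            a = int(top_ref[x + idx])
            b = int(top_ref[x + idx + 1])
            # Linear interpolation between the two nearest reference samples.
            pred[y, x] = ((8 - frac) * a + frac * b + 4) >> 3
    return pred

# Toy usage: a smooth reference row and a mild positive angle.
top = np.arange(100, 100 + 16)
print(predict_angular_4x4(top, angle_eighths=5))
```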
B. Quadtree-based coding structure
Coding efficiency can be significantly improved by
utilizing macroblock structures with sizes larger than 16 ×
16 pixels, especially at sequences with high resolutions [16].
To achieve a more flexible coding scheme, HEVC utilizes a
quadtree-based coding structure [17] with support for
macroblocks of size 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4
× 4 pixels. HEVC separately defines three block concepts:
coding unit (CU), prediction unit (PU) and transform unit
(TU). After the size of largest coding unit (LCU) and the
hierarchical depth of CU have been defined, the overall
structure of codec is characterized by the various sizes of
CU, PU and TU in a recursive manner. This allows the
codec to be readily adapted for various kinds of content,
applications, or devices that have different capabilities.
The CU splitting process is described in Fig. 2, where a split flag indicates whether the current CU needs to be split into smaller CUs or not.
Figure 2. The CU splitting process: at each depth a split flag indicates whether a CU is split into four smaller CUs (CU size 64, 32, 16, and 8 at depths 0-3, with N = 32, 16, 8, and 4, respectively); at the last depth no split flag is signaled.
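The recursive, split-flag-driven partitioning summarized in Fig. 2 can be sketched as follows. The `want_split` predicate is a caller-supplied stand-in for the encoder's actual rate-distortion decision and is an assumption of this illustration.

```python
def split_cu(x, y, size, depth, want_split, max_depth=3, leaves=None):
    """Recursively partition a CU according to split flags (cf. Fig. 2).

    want_split(x, y, size, depth) stands in for the encoder's RD decision;
    at max_depth no split flag is signaled.
    """
    if leaves is None:
        leaves = []
    if depth < max_depth and want_split(x, y, size, depth):   # split flag = 1
        half = size // 2                                       # N at this depth
        for dy in (0, half):
            for dx in (0, half):
                split_cu(x + dx, y + dy, half, depth + 1, want_split,
                         max_depth, leaves)
    else:                                                      # split flag = 0 (or last depth)
        leaves.append((x, y, size, depth))
    return leaves

# Toy usage: split everything down to 16x16 CUs inside a 64x64 LCU.
cus = split_cu(0, 0, 64, 0, lambda x, y, s, d: s > 16)
print(len(cus), cus[:3])   # 16 CUs of size 16
```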
The coding efficiency improvements are more visible at
higher resolutions, where bit rate reductions reach around
50% and 35% for the low delay and random access
experiments, respectively.
C. Observation and Motivation
Intra prediction is currently achieved by partitioning a
largest coding unit (LCU) into one or more blocks through a
recursive splitting process. Each block is predicted spatially
and subsequently refined, and the prediction is performed
sequentially using neighboring reconstructed blocks. The
prediction process requires that the causal neighbors of the
current block must be completely reconstructed before
processing the current block. It results in a set of serial
dependencies, and these serial dependencies result in
significant complexity for both the encoder and decoder
processes.
Figure 3. An 8 × 8 block with four 4 × 4 blocks (block0-block3 in decoding order)
Take an 8 × 8 block with four 4 × 4 blocks as an example, as in Figure 3, where block0 is the first 4 × 4 block in decoding order, block1 is the second one, and so forth. Intra predictors for block0 are computed from the top and left neighboring MBs. As one can see, there is a strong dependency (serialization) in the computation of intra predictors for the general 4 × 4 intra mode. Indeed, block1 depends on the reconstructed samples of block0; block2 in turn depends on the reconstructed samples of block0 and block1 and of its left neighbor block; and block3 depends on the reconstructed samples of block0, block1, and block2. This dependency makes parallelization of intra prediction challenging, especially for ASIC implementations. Therefore, it is necessary to minimize dependencies and thereby improve performance.
IV. BLOCK BASED PARALLEL INTRA PREDICTION
In HEVC, although inter prediction has greater complexity, motion estimation can be performed without referring to neighboring blocks, so its parallelism is easily realized. The intra prediction process, in contrast, requires that the causal neighbors of the current block be completely reconstructed before the current block is processed. This results in a set of serial dependencies, which makes parallelization of intra prediction challenging, especially for 4 × 4 intra prediction.
A. 2X parallel intra prediction
The parallel intra-prediction approach divides a coding
unit into two partitions: all blocks in the first partition are
predicted without reference to each other; similarly, all
blocks in the second partition are predicted with reference to
the first partition but also without reference to each other. It
consists of three steps:
1. Partitioning the coding units into two sets
2. Predicting the first set blocks (shaded blocks in Fig. 3)
in parallel using the already reconstructed block neighbors
3. Predicting the second set blocks (white blocks in Fig. 3)
in parallel using already reconstructed block neighbors and
also the neighbors from the first set blocks
Take an 8 × 8 block with four 4 × 4 blocks in Fig. 3 as an
example, block0 and block1 are in the first set, while block2
and block3 are in the second set. With the partition above, we define that all blocks in a single partition are processed in parallel. It means that the blocks in the first
partition are predicted without reference to each other (or
blocks in the second partition). Blocks in the second
partition are also predicted without reference to other blocks
in the second partition. In all cases, a block could use the
pixel values from neighboring coding unit for intra
prediction.
Fig. 4 shows the difference between HEVC and the
proposed scheme. To predict the first set of blocks, we use
pixel values from the upper and left coding unit’s
boundaries. For example, to predict block1 in Fig. 4 B, we
used reconstructed pixel values from the dark gray pixels.
Its left reference pixels are different from those used in HEVC, shown in Fig. 4 A: it utilizes the already reconstructed pixel values of the left coding unit's boundaries. Therefore,
block1 and block0 can be predicted in parallel.
Following the prediction of first set of blocks, we then
add any transmitted residual to the predicted value to
generate a reconstructed first set of blocks. After
reconstructing the first set of blocks, we proceed to predict
the second set of blocks. As mentioned before, these blocks
are predicted in parallel and use the reconstructed pixel
values from blocks in the first partition and the left coding
unit’s boundaries, as showed in Fig. 4 D.
In the proposed scheme, none of the prediction modes are ignored; that is, it supports all the prediction directions mentioned in Section III.A. Therefore, it can realize the parallelism without significant performance loss.
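The two-step schedule described above can be sketched as follows, assuming a stand-in DC predictor and per-block residuals. The function names and thread-pool usage are illustrative only; the point is that the blocks inside each set share no dependencies.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dc_predict(refs):
    """Stand-in predictor: DC prediction from whatever reference pixels exist."""
    refs = np.concatenate([r.ravel() for r in refs if r is not None])
    return np.full((4, 4), int(round(refs.mean())), dtype=np.int32)

def predict_8x8_two_steps(top_row, left_col, residuals):
    """Two-step parallel intra prediction of an 8x8 unit made of four 4x4 blocks.
    top_row/left_col: reconstructed pixels on the upper/left CU boundaries.
    residuals: dict block_index -> 4x4 residual added after prediction."""
    recon = {}
    def reconstruct(idx, refs):
        recon[idx] = dc_predict(refs) + residuals[idx]

    with ThreadPoolExecutor() as pool:
        # Step 1: block0 and block1 use only CU-boundary pixels, so they are independent.
        futs = [pool.submit(reconstruct, 0, [top_row[0:4], left_col[0:4]]),
                pool.submit(reconstruct, 1, [top_row[4:8], left_col[0:4]])]
        [f.result() for f in futs]
    with ThreadPoolExecutor() as pool:
        # Step 2: block2 and block3 use the reconstructed first set plus the left boundary.
        futs = [pool.submit(reconstruct, 2, [recon[0][3, :], left_col[4:8]]),
                pool.submit(reconstruct, 3, [recon[1][3, :], recon[0][3, :]])]
        [f.result() for f in futs]
    return recon

top = np.full(8, 128); left = np.full(8, 120)
res = {i: np.zeros((4, 4), dtype=np.int32) for i in range(4)}
print(predict_8x8_two_steps(top, left, res)[2])
```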
Figure 4. The blocks' reference pixels: (A) Block 1's reference pixels in HEVC; (B) Block 0 and Block 1's reference pixels in 2X parallelism; (C) Block 3's reference pixels in HEVC; (D) Block 2 and Block 3's reference pixels in 2X parallelism.

By employing the partitioning strategy, we achieve a direct increase in parallelism, shown graphically in Fig. 5. The 4 × 4 intra prediction within an 8 × 8 coding unit requires four sequential steps, while our partitioning approach requires only two. This is an increase in parallelism by a factor of 2X. In general, the increase in parallelism is N/2, where N is the number of blocks within the macroblock.

Figure 5. Processing order for the 4 × 4 blocks within an 8 × 8 intra predicted unit (legend: predicted block / reconstructed block): (a) traditional prediction structure; (b) 2X parallelism prediction structure.

B. 4X parallel intra prediction
The method can also be extended to 4X parallelism. In this extension, all four blocks are treated as first-set blocks. They are restricted to have the same prediction mode and are simply predicted using their 8 × 8 block neighbors, as shown in Fig. 6. However, each 4 × 4 block can have its own prediction mode, residual, and transform. In this case, the distances between the reference pixels and the predicted pixels increase, which leads to a bit-rate increase, as will be seen in the results later.

Figure 6. The blocks' reference pixels for 4X parallelism.

V. EXPERIMENTAL RESULTS
The parallel intra prediction technology has been integrated into the HEVC reference software HM3.0 [18] and tested with the recommended configuration on the 18 sequences listed in Table II. The test sequences include two 4K sequences, five 1080p sequences, four WVGA sequences, four WQVGA sequences, and three 720p sequences, which are widely used in various applications.
We use the two common test conditions described in [19]: Intra (high efficiency condition) and Intra LoCo (low complexity condition). The common test conditions are set as follows:
1) 10 seconds of each sequence are encoded as intra frames.
2) The QP is set to 22, 27, 32, and 37.
3) The in-loop deblocking filter and RDO are enabled.

TABLE II
TEST SEQUENCES
Class A (4K):     Traffic, PeopleOnStreet
Class B (1080p):  Kimono, ParkScene, Cactus, BasketballDrive, BQTerrace
Class C (WVGA):   BasketballDrill, BQMall, PartyScene, RaceHorses
Class D (WQVGA):  BasketballPass, BQSquare, BlowingBubbles, RaceHorses
Class E (720p):   Vidyo1, Vidyo3, Vidyo4
Table III below shows the summary results of 2X parallel intra prediction vs. HM3.0, and Table IV shows the summary results of 4X parallel intra prediction vs. HM3.0. Y BD-rate denotes the BD-rate increase for the luma component, while U BD-rate and V BD-rate refer to the chroma components. Comparing Table III with Table IV, we find that the coding performance of 4X parallelism is not as good as that of 2X parallelism. In the 4X case, the distances between the reference pixels and the predicted pixels increase, which reduces the prediction accuracy and results in the performance loss.
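The Y, U, and V BD-rate figures in Tables III and IV follow Bjøntegaard's delta bit-rate metric. The sketch below shows one common cubic-fit approximation of that metric from four RD points per curve; it is an illustration, not the exact tool used to produce the reported numbers, and the RD points are made up.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Approximate Bjontegaard delta bit-rate (%) from four RD points per curve.
    Fits PSNR -> log10(rate) with a cubic and integrates over the common PSNR range."""
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)
    pa = np.polyfit(psnr_anchor, lr_a, 3)
    pt = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)            # mean log10 rate difference
    return (10 ** avg_diff - 1) * 100           # percent rate change of test vs. anchor

# Toy usage with made-up RD points (QP 22/27/32/37): positive output = rate increase.
anchor_rate = [12000, 6000, 3000, 1500]; anchor_psnr = [42.0, 39.5, 37.0, 34.5]
test_rate   = [12150, 6070, 3035, 1515]; test_psnr   = [42.0, 39.5, 37.0, 34.5]
print(round(bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr), 2))
```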
TABLE III
RD RESULTS OF 2X PARALLEL INTRA PREDICTION VS. HM3.0 (BD-RATE %)
           Intra                          Intra LoCo
           Y       U       V              Y       U       V
Class A    0.6     0.1     0.0            0.5     0.4     0.4
Class B    0.7     0.6    -0.3            0.7     1.0     0.5
Class C    1.5     1.1     0.8            1.5     1.5     1.3
Class D    1.8     1.4     0.6            1.9     0.6     2.3
Class E    0.7     1.2    -0.5            1.0     0.8     0.9
All        1.1     0.8     0.2            1.1     0.9     1.1
TABLE IV
RD RESULTS OF 4X PARALLEL INTRA PREDICTION VS. HM3.0 (BD-RATE %)
           Intra                          Intra LoCo
           Y       U       V              Y       U       V
Class A    1.1    -0.6    -0.7            1.1     0.3     0.9
Class B    1.2     0.6     0.0            1.3     0.8     1.1
Class C    2.7     1.9     1.0            3.3     2.5     2.3
Class D    3.5     1.4     1.3            3.5     2.5     2.1
Class E    1.6     1.0     0.7            2.1     1.8     0.5
All        2.0     0.9     0.4            2.2     1.5     1.4
Two examples of RD curves are given in Fig. 7 (BasketballPass, LoCo) and Fig. 8 (RaceHorses, HE), showing the performance of the proposed method at four different bit rates. It can be observed that the proposed method achieves the parallelism without significant RD performance loss at both low and high bit rates.

Figure 7. Comparison of RD performance (BasketballPass, LoCo)

Figure 8. Comparison of RD performance (RaceHorses, HE)

VI. CONCLUSIONS
Because intra-coded frames serve various special purposes, such as random access points and refresh synchronization frames, intra coding is an important part of video coding. Since intra-coded blocks are predicted depending on the availability of adjacent blocks in the preceding rows and columns, parallel coding or decoding cannot be carried out among intra-coded blocks, which increases the difficulty of implementing real-time coding in high-definition and ultra-high-definition applications. Therefore, effectively increasing the capacity for parallel computing is of considerable value.
In this paper, a parallel prediction scheme for HEVC is proposed. The parallel prediction unit supports the parallelization of intra-coded blocks within the current coding unit in a two-step parallel process. Parallelism is achieved by limiting the reference pixels of the 4 × 4 subblocks, without ignoring any prediction modes. Experimental results show that the increased parallelization comes with small losses in coding efficiency, for example about 1% for HD sequences with 2X parallelism. We assert that this loss is negligible and well justified by the parallelization capability.
We described 2X and 4X parallelism in detail; in fact, the scheme can be expanded to higher degrees of parallelism and can therefore flexibly fit various applications.
ACKNOWLEDGMENT
This work is supported by the National Natural Science Foundation of China (Nos. 60802077, 61003196, and 61172053).
REFERENCES
[1] K. Ugur, K. Andersson, A. Fuldseth, et al., "High Performance, Low Complexity Video Coding and the Emerging HEVC Standard", IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(12), pp. 1688-1697.
[2] G. Sullivan, P. Topiwala, and A. Luthra, "Fast 4x4 Intra Prediction Based on The Most Probable Mode in H.264/AVC," IEICE Electronics Express, 2008, 5(19), pp. 782-788.
[3] Yinji Piao, Junghye Min, and Jianle Chen, et al., "Encoder improvement of unified intra prediction," JCTVC-C207, Guangzhou, Oct. 2010.
[4] Liang Zhao, Li Zhang, and Xin Zhao, "Further Encoder Improvement of intra mode decision," JCTVC-D283, Daegu, Jan. 2011.
[5] Denilson M. Barbosa, Joao Paulo Kitajama, and Wagner Meira Jr., "Parallelizing MPEG Video Encoding using Multiprocessors", Proceedings of the XII Brazilian Symposium on Computer Graphics and Image Processing, 1999, pp. 215-222.
[6] E. B. van der Tol, E. G. T. Jaspers, R. H. Gelderblom, "Mapping of H.264 Decoding on a Multiprocessor Architecture", SPIE Conf. on Image and Video Communications and Processing, 2003, 5(7), pp. 707-718.
[7] Y. Chen, E. Li, and X. Zhou, "Implementation of H.264 encoder and decoder on personal computers", J. Vis. Commun. Image Representation, 2006, 17(2), pp. 509-532.
[8] A. Gulati and G. Campbell, "Efficient mapping of the H.264 encoding algorithm onto multi-processor DSPs", in Proc. SPIE - IS&T Electronic Imaging, 2005, pp. 94-103.
[9] O. L. Klaus Schoffman, M. Fauster, and L. Boszormeny, "An evaluation of parallelization concepts for baseline profile compliant H.264/AVC decoders", in Proc. Euro-Par Parallel Processing, Lecture Notes in Computer Science, 2007, pp. 782-791.
[10] A. Rodriguez, A. Gonzalez, and M. P. Malumbres, "Hierarchical parallelization of an H.264/AVC video encoder", in Proc. Int. Symp. Parallel Comput. Elect. Eng., 2006, pp. 363-368.
[11] Y. W. Huang, B. Y. Hsieh, and T. C. Chen, "Analysis, Fast Algorithm and VLSI Architecture Design for H.264/AVC Intra Frame Coder," IEEE Trans. Circuits and Systems for Video Technology, 15(3), pp. 378-401, Mar. 2005.
[12] W. Lee, S. Lee, and J. Kim, "Pipelined Intra Prediction Using Shuffled Encoding Order for H.264/AVC," TENCON 2006, pp. 14-17, Nov. 2006.
[13] G. Jin and H. J. Lee, "A Parallel and Pipelined Execution of H.264/AVC Intra Prediction," IEEE International Conference on Computer and Information Technology, CIT'06, pp. 246-250, Sep. 2006.
[14] K. Suh, S. Park, and H. Cho, "An Efficient Hardware Architecture of Intra Prediction and TQ/IQIT Module for H.264 Encoder," ETRI Journal, vol. 27, no. 5, pp. 511-524, Oct. 2005.
[15] K. McCann, W.-J. Han, and I.-K. Kim, et al., "Video Coding Technology Proposal by Samsung (and BBC)", JCTVC-A124, Dresden, Germany, Apr. 2010.
[16] S. Ma and C.-C. J. Kuo, "High-definition video coding with super-macroblocks," Proc. SPIE, vol. 6508, part 1, 650816, Jan. 2007.
[17] F. Bossen, V. Drugeon, and E. Francois, "Video Coding Using a Simplified Block Structure and Advanced Coding Techniques", IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(12), pp. 1667-1675.
[18] HEVC Reference Software HM-3.0 [CP]. https://hevc.hhi.fraunhofer.de/svn/svn_TMuCSoftware/tags/HM-3.0.
[19] F. Bossen, "Common test conditions and software reference configurations," JCTVC-E700, Geneva, Switzerland, Mar. 2011.
Jie Jiang received the B.S. and M.S. degrees, both in signal and information processing, from Xidian University, Xi'an, China, in 2005 and 2008, respectively. She is currently working toward the
Ph.D. degree with the Institute of Intelligent Control and Image
Engineering (ICIE), Xidian University. Her research interests
include video coding and communication.
Baolong Guo received his B.S., M. S. and Ph.D. degrees from
Xidian University in 1984, 1988 and 1995, respectively, all in
communication and electronic system. From 1998 to 1999, he was
a visiting scientist at Doshisha University, Japan. He is currently a
full professor with the Institute of Intelligent Control and Image
Engineering (ICIE) at Xidian University. His research interests
include intelligent information processing, image processing and
communication.
Wei Mo was born in Guangxi Province, China, in 1956. He is a
professor, Ph.D. supervisor at Xidian University. His research
interests include multimedia coding and intelligent information
processing.
Kefeng Fan completed his postdoctoral research in the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include 3D TV, smart TV, DRM, and digital interfaces.
Optimized LSB Matching Steganography Based on
Fisher Information
Yi-feng Sun
Zhengzhou Information Science and Technology Institute, Zhengzhou , China
Email: [email protected]
Dan-mei Niu
Electronic Information Engineering College, Henan University of Science and Technology, Luoyang, China
Email: [email protected]
Guang-ming Tang and Zhan-zhan Gao
Zhengzhou Information Science and Technology Institute, Zhengzhou , China
Email: [email protected] [email protected]
Abstract—This paper proposes an optimized LSB matching
steganography based on Fisher Information. The
embedding algorithm is designed to solve the optimization
problem, in which Fisher information is the objective
function and embedding transferring probabilities are
variables to be optimized. Fisher information is the
quadratic function of the embedding transferring
probabilities, and the coefficients of quadratic term are
determined by the joint probability distribution of cover
elements. By modeling the groups of elements in a cover
image as Gaussian mixture model, the joint probability
distribution of cover elements for each cover image is
obtained by estimating the parameters of Gaussian mixture
distribution. For each sub-Gaussian distribution in
Gaussian mixture distribution, the quadratic term
coefficients of Fisher information are calculated, and the
optimized embedding transferring probabilities are solved
by quadratic programming. By maximum posteriori
probability principle, cover pixels are classified as the
categories corresponding to sub-Gaussian distributions. At
last, in order to embed message bits, pixels chose to add or
subtract one according to the optimized transferring
probabilities of the category. The experiments show that the
security performance of this new algorithm is better than
the existing LSB matching.
Index Terms—steganography, security, information
hiding, quadratic programming, Fisher information
I. INTRODUCTION
The aim of steganography is to hide a secret message imperceptibly in a cover so that the presence of hidden data cannot be detected. However, steganography faces the threat of steganalysis, which aims to expose the presence of hidden data. How to improve the security of steganography is therefore an important problem.
Manuscript received January 1, 2012; revised June 1, 2012; accepted July 1, 2012. This work was supported by the National Natural Science Foundation of China (Grant Nos. 60970141 and 60903220). Corresponding author: Yi-feng Sun, [email protected].
In the Least Significant Bit (LSB) replacement algorithm, the LSBs of cover elements are replaced with message bits.
bits. Some structural asymmetry (never decreasing even
pixels and increasing odd pixels when hiding message bit)
is introduced. It is easy to detect the existence of hidden
message. A trivial modification of LSB replacement is
LSB matching, which randomly increases or decreases
pixel values by one to match the LSBs with the message
bits. LSB matching is much harder to detect than LSB
replacement algorithm.
The embedding in LSB matching is similar to adding
an independent and identically distributed (IID) noise
sequence independent of cover. It is well known that
values of neighboring pixels in natural images are not
independent. Furthermore, there are complex dependences in the noise component of neighboring pixels [1]. These dependences are violated by LSB matching, and many steganalysis methods exploit this fact [1][2]. LSB matching therefore also needs to be improved.
There are two kinds of approaches to improving steganography. The first is to preserve a chosen cover model, as in model-based (MB) steganography [3][4], OutGuess [5], and the statistical-restoration-based steganographic algorithms [6][7], which preserve first- and second-order statistics. The other strategy is to minimize a heuristically defined embedding distortion. The steganography using tri-way pixel-value differencing [8] reduces the quality distortion of the stego-image. Matrix embedding methods [9][10][11] minimize the number of changed cover elements by linear codes, with the number of changes predefined as the embedding distortion. Many other syndrome-coding steganographic methods, such as steganography using wet paper codes [12] and steganographic algorithms using syndrome trellis codes [13][14][15][16], minimize more complex embedding distortions. According to experiments, these steganographic methods achieve better performance.
In this paper, we propose an optimized LSB matching steganography based on Fisher information. Its security improvement can be demonstrated in the sense of Kullback-Leibler (KL) divergence. Firstly, we introduce
the relation between KL divergence and Fisher
information, and explain why Fisher information can be
used to improve steganographic security. Then we
present an embedding optimization framework based on
Fisher information. In the framework, an embedding
algorithm is designed to solve the optimization problem
whose objective is minimizing the Fisher Information.
The optimized LSB matching algorithm is an instance of
the framework. We assume that the groups of pixels in the cover image follow a Gaussian mixture distribution. After obtaining the Gaussian mixture distribution of the pixel groups for each cover image, the optimized embedding transferring probabilities are solved by quadratic programming for each sub-Gaussian distribution. Cover pixels are classified into the categories corresponding to the sub-Gaussian distributions, and the embedding is carried out with the optimized embedding transferring probabilities. The
the optimized embedding transferring probabilities. The
experimental results show that the optimized LSB
matching steganography is better than the existing LSB
matching in the aspect of security performance.
II. STEGANOGRAPHIC SECURITY AND FISHER INFORMATION

The KL divergence between the cover and stego distributions can be used to measure steganographic security [17]. Let the distribution of cover objects be $P_0$ and the distribution of stego objects be $P_\lambda$, where the parameter $\lambda$ denotes the relative payload and $0 \le \lambda \le 1$. The KL divergence between $P_0$ and $P_\lambda$ is denoted as $D(P_0 \| P_\lambda)$. The smaller $D(P_0 \| P_\lambda)$ is, the more secure the steganography will be; if $D(P_0 \| P_\lambda)$ is equal to 0, the stego-system is perfectly secure.
The physical interpretation of cover objects is not discussed in [17]. We consider that each statistic obtained from a cover can represent the cover object. Here the element sequence of the cover, such as the sequence of pixel values or the sequence of Discrete Cosine Transform (DCT) coefficients, is selected as the representation of the cover object. The element sequence is denoted by $\mathbf{C} = (C_1, C_2, \cdots, C_N)$, where $N$ is the number of elements that can be used to embed secret information in a multimedia cover. In the following, we use $P_0^{(N)}$ and $P_\lambda^{(N)}$ instead of $P_0$ and $P_\lambda$ to emphasize that they are $N$-dimensional joint probability distributions.
Assuming that $P_0^{(N)}$ is known and the embedding function is fixed, the KL divergence $D(P_0^{(N)} \| P_\lambda^{(N)})$ is a function of $\lambda$, which can also be denoted as $d_N(\lambda)$. The second derivative of $d_N(\lambda)$ with respect to $\lambda$ (at $\lambda = 0$) is known as the steganographic Fisher Information (FI) [18] and is denoted by $d_N''(0)$. When $\lambda \to 0$,
$$D(P_0^{(N)} \| P_\lambda^{(N)}) \approx \frac{1}{2}\, d_N''(0)\,\lambda^2. \qquad (1)$$
When the relative payload $\lambda$ is fixed, the smaller the Fisher Information is, the smaller the KL divergence will be. Thus Fisher Information can be used to evaluate steganographic security.
Assuming that the embedding operations are mutually independent (MI) [19], the Fisher Information can be calculated. The probability of the cover element being $x$ and the stego element being $y$ is denoted by the conditional probability $P(S_k = y \mid C_k = x) = b_{xy}$, $x \in \chi$, $y \in \chi$, where $\chi$ is the set of cover and stego elements. The matrix $\mathbf{B} = (b_{xy})_{x\in\chi, y\in\chi}$, also named the embedding matrix, corresponds to an embedding method, and the probability $b_{xy}$ is also called the embedding transferring probability. For an embedding matrix $\mathbf{B}$, the Fisher information is calculated as [20]:
$$d_N''(0) = \sum_{\mathbf{y}\in\chi^N} \frac{A(\mathbf{y})^2}{P_0^{(N)}(\mathbf{C}=\mathbf{y})} - N^2, \qquad (2)$$
$$A(\mathbf{y}) = \sum_{i=1}^{N} \sum_{u\in\chi} P_0^{(N)}(\mathbf{C}=\mathbf{y}[u/y_i]) \cdot b_{u\,y_i}. \qquad (3)$$
Here $\mathbf{y} = (y_1, y_2, \cdots, y_N) \in \chi^N$ denotes the value of $\mathbf{C} = (C_1, C_2, \cdots, C_N)$, where $\chi^N = \chi \times \chi \times \cdots \times \chi$, and $\mathbf{y}[u/y_i]$ denotes the sequence $(y_1, \cdots, y_{i-1}, u, y_{i+1}, \cdots, y_N)$, i.e., $u$ replaces the item $y_i$ in $\mathbf{y}$.
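For a toy alphabet, equations (2)-(3) can be evaluated directly by brute force. The sketch below does so for an n = 2 element group; the joint pmf and the embedding matrix are chosen arbitrarily for illustration, and the code is not part of the authors' implementation.

```python
import numpy as np
from itertools import product

def fisher_information(P0, B):
    """Steganographic Fisher information d''_n(0) of an MI embedding, eqs. (2)-(3).
    P0: n-dimensional joint pmf, shape (|chi|,)*n.  B: embedding matrix (|chi|, |chi|)."""
    n = P0.ndim
    chi = range(P0.shape[0])
    fi = 0.0
    for y in product(chi, repeat=n):
        if P0[y] == 0:
            continue
        a = 0.0
        for i in range(n):                  # A(y) = sum_i sum_u P0(y[u/y_i]) * b_{u, y_i}
            for u in chi:
                y_mod = list(y); y_mod[i] = u
                a += P0[tuple(y_mod)] * B[u, y[i]]
        fi += a * a / P0[y]
    return fi - n * n                       # eq. (2)

# Toy usage: 4-letter alphabet, correlated pairs, plain LSB-matching-style B.
P0 = np.array([[.10, .05, .02, .01],
               [.05, .12, .05, .02],
               [.02, .05, .16, .07],
               [.01, .02, .07, .18]])
P0 /= P0.sum()
B = np.zeros((4, 4))
for x in range(4):
    B[x, x] = 0.5
    B[x, min(x + 1, 3)] += 0.25
    B[x, max(x - 1, 0)] += 0.25
print(fisher_information(P0, B))
```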
The computational complexity of the Fisher information is decided by $N$; we call $d_N''(0)$ the $N$-dimensional Fisher Information. The model of the covers is often simplified in order to calculate the KL divergence, for example by assuming that the elements of cover objects are IID, but this ignores the correlation among the elements of cover objects. Hence, we suppose that the cover is composed of IID element groups. The elements in a group are correlated, and the number of elements in a group is $n$. If a group is composed of two elements, we can take the impact of second-order correlation into account; if it includes more elements, we can analyze the influence of higher-order correlation. For a cover $\mathbf{C} = (C_1, C_2, \cdots, C_N)$, starting from the first element $C_1$, every $n$ adjacent elements are divided into a group, whose $n$-dimensional joint distribution is denoted by $P_0^{(n)}$. If $P_0^{(n)}$ replaces $P_0^{(N)}$ and $n$ replaces $N$ in (2) and (3), we obtain the formula for the $n$-dimensional Fisher information corresponding to the element groups, which is denoted by $d_n''(0)$. In the following, we focus on decreasing the value of $d_n''(0)$ to improve the security of the embedding method.
III. THE EMBEDDING OPTIMIZATION FRAMEWORK BASED
ON FISHER INFORMATION
In order to improve the steganographic security, we need to find an embedding algorithm whose matrix $\mathbf{B}$ corresponds to a small Fisher information. In general, the embedding matrix $\mathbf{B} = (b_{xy})_{x\in\chi, y\in\chi}$ is usually constant
for each cover. For example, in the LSB matching algorithm, if the message bit does not match the LSB of the cover pixel, one is randomly either added to or subtracted from the value of the cover pixel. That is to say, $b_{x,x+1} = b_{x,x-1} = 0.25$ and $b_{x,x} = 0.5$ for every cover. Here we regard the probability $b_{xy}$ as a variable, and we adjust $b_{xy}$ to minimize $d_n''(0)$ for each cover, assuming that $P_0^{(n)}$ can be obtained from that cover. Thus $\mathbf{B} = (b_{xy})_{x\in\chi, y\in\chi}$ can be adapted to the cover's statistical properties, and we obtain an optimized embedding algorithm.
Here $d_n''(0)$ is determined by the $n$-dimensional joint probability distribution $P_0^{(n)}$ and the embedding matrix $\mathbf{B} = (b_{xy})_{x\in\chi,y\in\chi}$. If the $b_{xy}$, $x\in\chi$, $y\in\chi$, are viewed as variables, the Fisher information is a function of the $b_{xy}$. To improve the steganographic security, we need to solve the optimization problem in which the Fisher information $d_n''(0)$ is the objective function. Here $n$ is constant, and the optimization problem is
$$\mathbf{B}_{\text{optimized}} = \arg\min_{\mathbf{B}=(b_{xy})} \sum_{\mathbf{y}\in\chi^n} \frac{A(\mathbf{y})^2}{P_0^{(n)}(\mathbf{C}=\mathbf{y})}, \qquad (4)$$
with the constraints
$$\sum_{y\in\chi} b_{xy} = 1,\ x\in\chi; \qquad b_{xy} \ge 0. \qquad (5)$$
We expand the objective function as follows:
$$\sum_{\mathbf{y}\in\chi^n}\frac{A(\mathbf{y})^2}{P_0^{(n)}(\mathbf{C}=\mathbf{y})}
= \sum_{u\in\chi}\sum_{v\in\chi}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\mathbf{y}\in\chi^n}
\frac{P_0^{(n)}(\mathbf{C}=\mathbf{y}[u/y_i])\,P_0^{(n)}(\mathbf{C}=\mathbf{y}[v/y_j])}{P_0^{(n)}(\mathbf{C}=\mathbf{y})}\,b_{u y_i} b_{v y_j}$$
$$= \sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}\sum_{\substack{b\in\chi\\ b\neq a}}
\Biggl(\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\sum_{\substack{y_h\in\chi\\ (h=1,\cdots,n;\ h\neq i,j)}}
\frac{P_0^{(n)}(\mathbf{C}=\mathbf{y}[u/y_i, b/y_j])\,P_0^{(n)}(\mathbf{C}=\mathbf{y}[a/y_i, v/y_j])}{P_0^{(n)}(\mathbf{C}=\mathbf{y}[a/y_i, b/y_j])}\Biggr) b_{ua} b_{vb}$$
$$\quad + \sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}
\Biggl\{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\sum_{\substack{y_h\in\chi\\ (h=1,\cdots,n;\ h\neq i,j)}}
\frac{P_0^{(n)}(\mathbf{C}=\mathbf{y}[u/y_i, a/y_j])\,P_0^{(n)}(\mathbf{C}=\mathbf{y}[a/y_i, v/y_j])}{P_0^{(n)}(\mathbf{C}=\mathbf{y}[a/y_i, a/y_j])}
+ \sum_{i=1}^{n}\sum_{\substack{y_h\in\chi\\ (h=1,\cdots,n;\ h\neq i)}}
\frac{P_0^{(n)}(\mathbf{C}=\mathbf{y}[u/y_i])\,P_0^{(n)}(\mathbf{C}=\mathbf{y}[v/y_i])}{P_0^{(n)}(\mathbf{C}=\mathbf{y}[a/y_i])}\Biggr\} b_{ua} b_{va}. \qquad (6)$$
Here $\mathbf{y}[a/y_i, b/y_j]$ denotes the sequence $(y_1,\cdots,y_{i-1}, a, y_{i+1},\cdots, y_{j-1}, b, y_{j+1},\cdots, y_n)$.
From (6), we find that the Fisher information $d_n''(0)$ is a quadratic function of the probabilities $b_{xy}$. Searching for the $b_{xy}$ that minimize the Fisher information is therefore a quadratic programming problem.
IV. OPTIMIZED LSB MATCHING EMBEDDING ALGORITHM IN SPATIAL DOMAIN
In this paper, we optimize the LSB matching algorithm
in image spatial domain. To get the optimized bxy , we
must know the n-dimensional joint distribution P0( n ) of
the pixel group made of adjacent pixels. The distribution
of pixel group in spatial domain is complex. But
Gaussian mixture model (GMM) can describe many
complex distributions approximately [21]. We take GMM
as the distribution model of the n-dimensional pixel
group in spatial domain.
GMM is the sum of weighted sub-Gaussian
distributions. It can fit very complex distribution by
adjusting the parameters of GMM. The small local areas
of an image often correspond to a particular object or
belong to the same background. Therefore, we can think
that the pixel groups in a local area have the same
distribution characteristics. Sub-Gaussian distributions in
GMM can reflect these local characteristics. The pixel
groups in an image are instances of these sub-Gaussian
distributions in GMM.
For GMM composed of K sub-Gaussian distributions,
pixel groups can be divided into K categories, and each
sub-Gaussian distribution associates to a separate
category. Different pixel groups belonging to different
categories have different distribution characteristics. The
embedding matrix B should be optimized according to
the local distribution properties of the cover image.
Denote the optimized embedding matrix corresponding to the $k$-th sub-Gaussian distribution as $\mathbf{B}^{k}_{\text{optimized}}$, where $k = 1, 2, \cdots, K$. For a pixel selected to embed a message bit, if it belongs to the category of the $k$-th sub-
Gaussian distribution, we should modify it according to $\mathbf{B}^{k}_{\text{optimized}}$.
The whole framework of the optimized embedding algorithm is shown in Figure 1. Firstly, we estimate the GMM parameters for the pixel groups in a cover image. Secondly, we obtain the optimized embedding matrix for each sub-Gaussian distribution in the GMM. Then we judge the sub-Gaussian distribution category of each pixel selected to embed a message bit. Finally, if the LSB of such a pixel does not equal the message bit, the pixel is incremented or decremented by 1 according to the transferring probabilities indicated by the optimized embedding matrix of its category.
Figure 1. The framework of the optimized LSB matching algorithm: estimate the GMM of pixel groups; obtain the optimized embedding matrix for each sub-Gaussian category; select pixels with the stego-key; judge the category of each pixel; and add or subtract 1 according to the transition probabilities indicated by the optimized embedding matrix of that category.
A. Estimating GMM for each cover image
For the pixel groups $\mathbf{y} = (y_1, y_2, \cdots, y_n)$ in a cover image, we assume that they follow the Gaussian mixture distribution
$$P_0(\mathbf{y}) = \sum_{k=1}^{K} \pi_k P_{0,k}(\mathbf{y}), \qquad (7)$$
$$P_{0,k}(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{R}_k|^{1/2}} \exp\Bigl\{-\frac{1}{2}(\mathbf{y} - \mathbf{m}_k)\, \mathbf{R}_k^{-1} (\mathbf{y} - \mathbf{m}_k)^T\Bigr\}, \qquad (8)$$
where $K$ is the number of sub-Gaussian distributions in the GMM. Here $\mathbf{m}_k$ and $\mathbf{R}_k$ represent the mean vector and covariance matrix of the $k$-th sub-Gaussian distribution, respectively, and $\pi_k$ denotes the weight of that sub-Gaussian distribution, where $k = 1, 2, \cdots, K$. In the following, we discuss how to estimate the parameters $\{\mathbf{m}_k, \mathbf{R}_k, \pi_k\}_{k=1,2,\cdots,K}$ for each cover image.
The pixel group is made up of two adjacent pixels in
order to reduce computational complexity in our scheme.
But in a cover image, the horizontal adjacent pixels, the
vertically adjacent pixels, and the adjacent pixels in main
diagonal direction or in vice diagonal direction reflect
different second-order statistical properties. In order to
embody these second-order properties in GMM, we get
pixel group samples as follows: Assume that an image C
has N1 × N 2 pixels. Firstly, from the pixel in the first line
and the first column of the image, every two horizontal
adjacent pixels (Ci , j , Ci , j +1 ) constitute a pixel group
sample, where i = 1, ⋅⋅⋅, N1 and j = 1, ⋅⋅⋅, N 2 − 1 , and there
are a total of N1 × ( N 2 − 1) samples; secondly, every two
vertical adjacent pixels (Ci , j , Ci +1, j ) constitute a sample,
and the number of these samples is ( N1 − 1) × N 2 ; Then,
every two adjacent pixels in the main diagonal
direction (Ci , j , Ci +1, j +1 ) constitute a sample, where
© 2012 ACADEMY PUBLISHER
With the above $4(N_1-1)(N_2-1) + (N_1-1) + (N_2-1)$ samples, we use the MDL algorithm of Purdue University [21] to estimate the parameters of the GMM. The MDL algorithm is an extension of the Expectation-Maximization (EM) algorithm; please refer to the literature [22].
Note that the GMM is a continuous distribution, whereas the optimization problem (4) requires a discrete distribution. In order to reduce computational complexity, the values of the sub-Gaussian probability density $P_{0,k}(\mathbf{y})$ are used in place of the discrete probabilities directly.
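As an illustration of this estimation step, the sketch below gathers the four kinds of adjacent-pixel pair samples and fits a GMM. It substitutes scikit-learn's EM-based GaussianMixture with BIC model selection for the Purdue MDL cluster algorithm used by the authors, so it is a stand-in rather than the original procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pixel_pair_samples(img):
    """Collect the 2-D pixel-group samples described above: horizontal, vertical,
    main-diagonal and anti-diagonal neighbours of a grayscale image."""
    img = img.astype(np.float64)
    pairs = [np.stack([img[:, :-1].ravel(),   img[:, 1:].ravel()],   axis=1),  # horizontal
             np.stack([img[:-1, :].ravel(),   img[1:, :].ravel()],   axis=1),  # vertical
             np.stack([img[:-1, :-1].ravel(), img[1:, 1:].ravel()],  axis=1),  # main diagonal
             np.stack([img[1:, :-1].ravel(),  img[:-1, 1:].ravel()], axis=1)]  # anti-diagonal
    return np.concatenate(pairs, axis=0)

def fit_gmm(samples, k_candidates=(2, 3, 4, 5)):
    """EM fit with BIC selection (a stand-in for the MDL criterion of [21][22])."""
    best = min((GaussianMixture(k, covariance_type='full', random_state=0).fit(samples)
                for k in k_candidates),
               key=lambda g: g.bic(samples))
    return best.weights_, best.means_, best.covariances_   # pi_k, m_k, R_k

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64))          # toy "image"
weights, means, covs = fit_gmm(pixel_pair_samples(cover))
print(len(weights), means.shape, covs.shape)
```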
B. Obtaining the optimized LSB matching embedding
matrix for each sub-Gaussian distribution
Here we embed information into spatial pixel groups in a grayscale image, so $\chi = \{0, 1, \cdots, 255\}$. The embedding matrix of LSB matching is adjusted as follows:
Use $x$, $x = 1, 2, \cdots, 254$, to represent the value of the selected cover pixel. If the secret message bit differs from the LSB of $x$, and $x$ is not equal to 0 or 255, we add 1 with probability $b_{x,x+1}$ or subtract 1 with probability $b_{x,x-1}$. The probabilities $b_{x,x+1}$ and $b_{x,x-1}$ are the variables in the optimization problem; there are $254 \times 2$ unknown variables in all. We suppose that the secret message bits have been encrypted, so they are pseudo-random bit streams, and we assume that half of the pixels' LSBs are already equal to the message bits. So $b_{x,x} = 0.5$, and the new LSB matching embedding matrix becomes
$$\mathbf{B} = \begin{pmatrix}
0.5 & 0.5 & 0 & \cdots & 0 & 0 & 0\\
b_{1,0} & 0.5 & b_{1,2} & \cdots & 0 & 0 & 0\\
0 & b_{2,1} & 0.5 & \cdots & 0 & 0 & 0\\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
0 & 0 & 0 & \cdots & 0.5 & b_{253,254} & 0\\
0 & 0 & 0 & \cdots & b_{254,253} & 0.5 & b_{254,255}\\
0 & 0 & 0 & \cdots & 0 & 0.5 & 0.5
\end{pmatrix}. \qquad (9)$$
For each sub-Gaussian distribution of the GMM, we need to obtain the corresponding optimized LSB matching embedding matrix, denoted as $\mathbf{B}^{k}_{\text{optimized}}$, $k = 1, \cdots, K$. Here the pixel group is made up of two adjacent pixels, and the optimization problem is simplified as
$$\mathbf{B}^{k}_{\text{optimized}} = \arg\min_{\mathbf{B}=(b_{xy})} f_k(\mathbf{B}), \qquad (10)$$
$$\begin{aligned}
f_k(\mathbf{B}) &= \sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}\sum_{\substack{b\in\chi\\ b\neq a}}
\Biggl(\sum_{i=1}^{2}\sum_{\substack{j=1\\ j\neq i}}^{2}
\frac{P_{0,k}(y_i=u,\, y_j=b)\,P_{0,k}(y_i=a,\, y_j=v)}{P_{0,k}(y_i=a,\, y_j=b)}\Biggr) b_{ua} b_{vb}\\
&\quad + \sum_{u\in\chi}\sum_{v\in\chi}\sum_{a\in\chi}
\Biggl\{\sum_{i=1}^{2}\sum_{\substack{j=1\\ j\neq i}}^{2}
\frac{P_{0,k}(y_i=u,\, y_j=a)\,P_{0,k}(y_i=a,\, y_j=v)}{P_{0,k}(y_i=a,\, y_j=a)}
+ \sum_{i=1}^{2}\sum_{y_j\in\chi}
\frac{P_{0,k}(y_i=u,\, y_j)\,P_{0,k}(y_i=v,\, y_j)}{P_{0,k}(y_i=a,\, y_j)}\Biggr\} b_{ua} b_{va}.
\end{aligned} \qquad (11)$$
The constraints are
$$\begin{cases}
b_{x,x+1} + b_{x,x-1} = 0.5,\ \ b_{x,x+1} \ge 0,\ \ b_{x,x-1} \ge 0, & x \in \chi,\ x \neq 0,\ x \neq 255,\\
b_{x,x} = 0.5, & x \in \chi,\\
b_{0,1} = 0.5,\ \ b_{255,254} = 0.5, & \\
b_{x,y} = 0 & \text{otherwise.}
\end{cases} \qquad (12)$$
This optimization problem is a quadratic program. The standard form of quadratic programming is $\min_{\mathbf{x}} \frac{1}{2}\mathbf{x}\mathbf{H}\mathbf{x}^T$, where $\mathbf{x} = \{x_1, x_2, \cdots, x_L\}$ is a row vector and $\mathbf{H}$ represents the quadratic coefficient matrix. By arranging the matrix $\mathbf{B}$ into a row vector, we can obtain the standard form; the elements of $\mathbf{H}$ are the corresponding coefficients in (6).
The complexity of quadratic programming depends on the number of elements of $\mathbf{x} = \{x_1, x_2, \cdots, x_L\}$, and the number of elements of $\mathbf{H}$ is the square of the number of elements of $\mathbf{x}$. If computer memory cannot store the matrix $\mathbf{H}$, it must be computed iteratively, which is slow. Here, although the matrix $\mathbf{B}$ has $256 \times 256$ elements, many of them are 0 in our algorithm because $b_{xy} = 0$ when $|x - y| \ge 2$, and if $b_{xy}$ is 0 its corresponding quadratic coefficient has no effect on the quadratic function. There are $766 \times 766$ nonzero elements in $\mathbf{H}$ that need to be stored in computer memory; they correspond to $b_{x,x-1}$ and $b_{x,x+1}$ for $x = 1, 2, \cdots, 254$, $b_{x,x} = 0.5$ for $x = 0, 1, \cdots, 255$, $b_{0,1} = 0.5$, and $b_{255,254} = 0.5$. A PC can store them, so we use the quadratic programming function in Matlab to solve for the optimized transferring probabilities $b_{x,x-1}$ and $b_{x,x+1}$.
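For a reduced alphabet, the same kind of optimization can be illustrated with SciPy instead of Matlab's quadprog. The sketch below evaluates the n = 2 objective of (4) directly from a toy joint pmf and optimizes the interior probabilities b_{x,x+1} under the constraints of (12); the alphabet size, the pmf, and the solver choice are assumptions made for this illustration only.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

L = 8                                         # reduced alphabet for illustration
rng = np.random.default_rng(1)
J = rng.random((L, L)) + np.eye(L) * 3.0      # toy joint pmf of a pixel pair
P0 = J / J.sum()

def embedding_matrix(p_up):
    """b_{x,x}=0.5; interior x: b_{x,x+1}=p_up[x-1], b_{x,x-1}=0.5-p_up[x-1];
    boundary rows follow (9): b_{0,1}=0.5, b_{L-1,L-2}=0.5."""
    B = np.zeros((L, L)); np.fill_diagonal(B, 0.5)
    B[0, 1] = 0.5; B[L - 1, L - 2] = 0.5
    for x in range(1, L - 1):
        B[x, x + 1] = p_up[x - 1]; B[x, x - 1] = 0.5 - p_up[x - 1]
    return B

def objective(p_up):
    """n = 2 Fisher-information objective of (4): sum_y A(y)^2 / P0(y)."""
    B = embedding_matrix(p_up); val = 0.0
    for y in product(range(L), repeat=2):
        a = sum(P0[u, y[1]] * B[u, y[0]] for u in range(L)) \
          + sum(P0[y[0], u] * B[u, y[1]] for u in range(L))
        val += a * a / P0[y]
    return val

x0 = np.full(L - 2, 0.25)                      # plain LSB matching as the start point
res = minimize(objective, x0, method='SLSQP', bounds=[(0.0, 0.5)] * (L - 2))
print('plain LSB matching :', objective(x0))
print('optimized          :', res.fun)
```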
C. Judging the category of pixels
In the existing LSB matching algorithm, if the message bit does not match the LSB of the cover pixel, one is randomly either added to or subtracted from the value of the cover pixel. In our optimized LSB matching algorithm, if the message bit does not match the LSB of the cover pixel, we first need to judge the category to which the pixel belongs. The embedding matrix of the $k$-th category is $\mathbf{B}^{k}_{\text{optimized}}$, and we add or subtract 1 with the probability $b_{x,x-1}$ or $b_{x,x+1}$ given by $\mathbf{B}^{k}_{\text{optimized}}$.
Now we explain how to judge the category to which a pixel $C_{i,j}$ belongs, where $i$ and $j$ are the row and column indices of the pixel. Firstly, we form a pixel group from $C_{i,j}$ and one of its adjacent pixels, for example the pixel group $(C_{i,j}, C_{i,j+1})$ with the right horizontal neighbor $C_{i,j+1}$; if $C_{i,j}$ is at the right edge of the image, we use the left horizontal neighbor $C_{i,j-1}$ instead. Then, based on the value of the pixel group $\mathbf{y} = (C_{i,j}, C_{i,j+1})$ or $\mathbf{y} = (C_{i,j-1}, C_{i,j})$, the posterior probability of each sub-Gaussian distribution is calculated as follows:
$$P(k \mid \mathbf{y}) = \frac{P_{0,k}(\mathbf{C}=\mathbf{y})\,\pi_k}{\sum_{k=1}^{K} P_{0,k}(\mathbf{C}=\mathbf{y})\,\pi_k}, \qquad k = 1, 2, \cdots, K. \qquad (13)$$
The pixel $C_{i,j}$ is categorized according to the maximum a posteriori probability principle:
$$k_{\text{optimal}} = \arg\max_{k=1,2,\cdots,K} P_{0,k}(\mathbf{C}=\mathbf{y})\,\pi_k, \qquad (14)$$
where $k_{\text{optimal}}$ denotes the category of the pixel $C_{i,j}$.
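Equations (13)-(14) amount to a maximum a posteriori assignment, as in the sketch below; the two-component GMM parameters are illustrative stand-ins for the values estimated in Section IV.A.

```python
import numpy as np
from scipy.stats import multivariate_normal

def judge_category(y, weights, means, covs):
    """Eq. (14): k_optimal = argmax_k  pi_k * P_{0,k}(C = y)."""
    scores = [w * multivariate_normal.pdf(y, mean=m, cov=R)
              for w, m, R in zip(weights, means, covs)]
    post = np.array(scores) / np.sum(scores)      # posteriors of eq. (13)
    return int(np.argmax(scores)), post

# Illustrative 2-component GMM over pixel pairs (stand-ins for estimated parameters).
weights = [0.6, 0.4]
means   = [np.array([60.0, 62.0]), np.array([180.0, 178.0])]
covs    = [np.array([[90.0, 70.0], [70.0, 90.0]]),
           np.array([[40.0, 30.0], [30.0, 40.0]])]
k, post = judge_category(np.array([65.0, 70.0]), weights, means, covs)
print(k, post.round(3))
```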
D. Embedding and Extraction Algorithm
The embedding process of the optimized LSB
matching steganography is summarized as follows:
Input: cover image {Ci , j } which has N = N1 × N 2
pixels, M bits encrypted secret message, stego-key k1 .
Output: stego image {Si , j } .
Step(i): Model the two-dimensional probability
distribution of two spatial adjacent pixels as GMM, and
estimate the parameters of GMM according to the pixel
group samples from cover image {Ci , j } .
Step(ii): For each sub-Gaussian distribution P0, k (y ) of
GMM, work out the matrix H by calculating the
coefficients of nonzero bx , x −1 and bx , x +1 in objective
function f k (B) according to (11), then solve the
quadratic programming problem (10), and get the
k
optimized embedding matrix B optimized
corresponding to
each sub-Gaussian distribution.
Step(iii): Calculate the rate of pixels used for
embedding, which is denoted as λ = M / N . The rate is
equal to the relative payload in general sense. According
to λ and the stego-key k1 , select the set of image pixels
{Ci , j } to embed message bits;
Step(iv): For each selected pixel Ci , j , if its LSB is
equal to the message bit, do not change it, otherwise, turn
to the next step;
Step(v): Judge whether the value of Ci , j is 0 or 255, if
so, replace the LSB of Ci , j with the message bit,
otherwise, turn to the next step;
Step(vi): Constitute the pixel group y with the right
horizontal adjacent Ci , j +1 or the left horizontal adjacent
pixel Ci , j −1 , and then determine the category koptimal of the
pixel group according to (14);
Step(vii): Obtain $b^{k_{\text{optimal}}}_{C_{i,j},\,C_{i,j}+1}$, then produce a pseudo-random number $n_{i,j}$ within the range $(0,1)$; if $n_{i,j} \in (0,\ 2b^{k_{\text{optimal}}}_{C_{i,j},\,C_{i,j}+1})$, then $S_{i,j} = C_{i,j} + 1$; otherwise, $S_{i,j} = C_{i,j} - 1$. The reason for selecting the threshold $2b^{k_{\text{optimal}}}_{C_{i,j},\,C_{i,j}+1}$ is that $b^{k_{\text{optimal}}}_{C_{i,j},\,C_{i,j}+1} + b^{k_{\text{optimal}}}_{C_{i,j},\,C_{i,j}-1} = 0.5$.
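Steps (iv)-(vii) for a single selected pixel can be sketched as follows; the category's optimized probability b_up_of_category is assumed to come from Step (ii), and the value used here is only an example.

```python
import random

def embed_bit(pixel, bit, b_up_of_category, rng=random):
    """Modify one selected cover pixel so that its LSB equals `bit`.
    b_up_of_category: optimized b_{x,x+1} for the pixel's category (b_up + b_down = 0.5)."""
    if pixel & 1 == bit:                 # Step (iv): LSB already matches, keep the pixel
        return pixel
    if pixel in (0, 255):                # Step (v): edge values, plain LSB replacement
        return pixel ^ 1
    n = rng.random()                     # Step (vii): pseudo-random number in (0, 1)
    return pixel + 1 if n < 2.0 * b_up_of_category else pixel - 1

random.seed(7)
stego = [embed_bit(p, b, b_up_of_category=0.18)
         for p, b in zip([14, 15, 0, 200], [1, 1, 1, 0])]
print(stego)   # e.g. [15, 15, 1, 200]; the first entry is 13 or 15 depending on the draw
```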
The extraction process of the optimized LSB matching steganography is summarized as follows:
Input: stego image $\{S_{i,j}\}$ which has $N = N_1 \times N_2$ pixels, the rate λ of pixels used for embedding, and stego-key $k_1$.
Output: secret message.
Step(i): According to λ and the stego-key $k_1$, select the set of image pixels $\{S_{i,j}\}$ used for embedding;
Step(ii): Take the LSB of each selected pixel $S_{i,j}$ and concatenate them into the secret message.
In summary, for a cover image we calculate K optimized embedding matrices, one for each sub-Gaussian distribution. The message bits are then embedded by adding or subtracting 1 according to the probability indicated by the embedding matrix corresponding to the category to which each pixel belongs.
V. EXPERIMENT AND ANALYSIS
In this section, we present some experimental results to
demonstrate the effectiveness of our proposed optimized
LSB matching steganography compared with the existing
LSB matching methods. A common way of testing
steganographic schemes is to report the detection metric
of some steganalysis methods empirically estimated from
a database of cover and stego images where each stego
image carriers a fixed relative payload[16].
Here the cover image database consists of 1000 images
which were downloaded from USDA NRCS Photo
Gallery [23]. The images are of very high resolution TIF
files (mostly 2100×1500) and appear to be scanned from
a variety of film sources. For testing, the images were
resampled to 614×418 and converted to grayscale (The
tool used is Advanced Batch Converter 3.8.20, and the
selected interpolation filter is bilinear).
Firstly, the cover images were used to generate 3 groups of 1000 stego images of the existing LSB matching with the relative payloads λ = 1, λ = 0.75, and λ = 0.5, respectively. Then, another 3 groups of 1000 stego images of the optimized LSB matching were generated with the relative payloads λ = 1, λ = 0.75, and λ = 0.5, respectively. The 2000 stego images of the two kinds of steganography with the same relative payload and the corresponding cover images were used to build a training set and a test set. The training and test sets were built randomly, each containing 50% cover and 50% stego images. The training sets are used to train a steganalysis detector, and the security of the two kinds of steganography with the same relative payload is compared by the detection performance on the test sets.
The steganalysis detector makes two types of errors: it either detects a cover image as stego (false alarm, or false positive) or recognizes a stego image as cover (missed detection, or false negative). The corresponding probabilities are denoted $P_{FA}$ and $P_{MD}$. The receiver operating characteristic (ROC) curve is obtained by plotting $1 - P_{MD}(P_{FA})$ as a function of $P_{FA}$. The ROC curve can be reduced to a scalar detection measure called the minimum error probability:
$$P_E = \min_{P_{FA}} \tfrac{1}{2}\bigl(P_{FA} + P_{MD}(P_{FA})\bigr). \qquad (15)$$
Both the ROC curve and the minimum error probability $P_E$ can be used as detection performance metrics of a steganalysis method for a given steganography at a fixed relative payload.
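The minimum error probability (15) can be estimated from empirical detector outputs as in the following sketch; the Gaussian toy scores are illustrative only.

```python
import numpy as np

def min_error_probability(cover_scores, stego_scores):
    """P_E = min over thresholds of (P_FA + P_MD)/2, cf. eq. (15).
    Higher scores are treated as 'more stego-like'."""
    thresholds = np.unique(np.concatenate([cover_scores, stego_scores]))
    best = 0.5
    for t in thresholds:
        p_fa = np.mean(cover_scores >= t)        # cover flagged as stego
        p_md = np.mean(stego_scores < t)         # stego missed
        best = min(best, 0.5 * (p_fa + p_md))
    return best

rng = np.random.default_rng(3)
cover = rng.normal(0.0, 1.0, 1000)               # toy detector outputs
stego = rng.normal(0.8, 1.0, 1000)
print(round(min_error_probability(cover, stego), 3))
```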
A. ROC curves metric
In the experiments, the first steganalytic detector [2]
adopts the Difference Characteristic Function Moments
as the features, and Fisher linear discriminator (FLD) as
the classifier. The detection performance for the existing
LSB matching is excellent [2].
We adopt ROC curves to evaluate the security
performances of two steganography methods. The ROC
curves of detector show how the detection probability
(true positive rate, the fraction of the stego images that
are correctly classified) and the false alarm probability
(false positive rate, the fraction of the cover images that
are misclassified as stego images) vary as detection
threshold is varied. The lower the curve is, the more difficult detection is, which means that the corresponding steganography is more secure.
Figure 2 shows the ROC curves of the existing LSB
matching and the optimized LSB matching with the
relative payload λ = 1 . Figure 3 shows the ROC curves
with λ = 0.75 . Figure 4 shows the ROC curves
with λ = 0.5. From the figures, the ROC curves of the optimized LSB matching are lower than those of the existing LSB matching; thus the optimized LSB matching is more secure than the existing LSB matching. Furthermore, the distinctions between the ROC curves of the two steganography methods become smaller as the relative payload decreases. It means that the improvement in security performance is more obvious when λ is larger.
B. Minimum error probability metric
We also use the steganalysis detector in the literature[1]
to compare the security of two steganography methods.
The steganalysis detector is well known in the realm of steganalysis and steganography. We use the second-order SPAM features with T=3 and the first-order SPAM features with T=4, respectively; there are 686 dimensions in the second-order features and 162 dimensions in the first-order features. With regard to machine learning, we use soft-margin SVMs with a Gaussian kernel of width γ; soft-margin SVMs penalize the error on the training set through a hyper-parameter C. In this section, the minimum error probability in (15) is used to evaluate the security: a steganography with larger PE is more secure than one with smaller PE.
In the experiment, the parameters are C = 300 and γ = 5. Note that we did not use a grid search with five-fold cross-validation on the training set to obtain the best C and γ as in [1]. The reason is that we only need to find out which steganography has the larger PE; we do not need the best detector with the best C and γ, and the grid search is time-consuming. As a result, the values of PE in our experiments are larger than those in the literature [1], but this does not affect the comparison of the security of the two steganographic methods.
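A minimal sketch of such a detector, assuming scikit-learn and hypothetical feature files (the SPAM feature extraction itself is not shown); C = 300 and γ = 5 are taken from the text above.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical inputs: rows are SPAM feature vectors (686-D second-order with T=3,
# or 162-D first-order with T=4); labels are 0 for cover and 1 for stego images.
X_train, y_train = np.load("train_features.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("test_features.npy"), np.load("test_labels.npy")

# Soft-margin SVM with Gaussian (RBF) kernel; fixed C and gamma instead of a grid search.
clf = SVC(C=300, kernel="rbf", gamma=5)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
p_fa = np.mean(y_pred[y_test == 0] == 1)   # false-alarm rate on cover images
p_md = np.mean(y_pred[y_test == 1] == 0)   # missed-detection rate on stego images
print("error probability at the SVM operating point:", 0.5 * (p_fa + p_md))
```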
Table I reports the security performances of the existing LSB matching steganography and the optimized LSB matching steganography. From Table I, for each relative payload the minimum error probability PE of the optimized LSB matching is larger than that of the existing LSB matching, so the optimized LSB matching is more secure than the existing one. On the other hand, the difference between the minimum error probabilities of the two steganography methods decreases as the relative payload becomes smaller. Thus, the larger λ is, the greater the security improvement of the optimized LSB matching. These results are consistent with those in Section V-A.
Figure 2. The ROC curves (true positive rate vs. false positive rate) for the two kinds of LSB matching steganography when the relative payload is 1.
TABLE I. MINIMUM ERRORS PE OF SPAM STEGANALYZERS

Relative payload λ | SPAM features         | Existing LSB matching | Our optimized LSB matching
1                  | Second-order with T=3 | 0.3359                | 0.3808
1                  | First-order with T=4  | 0.2263                | 0.2643
0.75               | Second-order with T=3 | 0.3636                | 0.3972
0.75               | First-order with T=4  | 0.2556                | 0.2893
0.5                | Second-order with T=3 | 0.3955                | 0.4098
0.5                | First-order with T=4  | 0.2962                | 0.3105
Figure 3. The ROC curves (true positive rate vs. false positive rate) for the two kinds of LSB matching steganography when the relative payload is 0.75.
Figure 4. The ROC curves (true positive rate vs. false positive rate) for the two kinds of LSB matching steganography when the relative payload is 0.5.

VI. SUMMARY
We present an optimized LSB matching algorithm. The probabilities of adding or subtracting one for embedding, also called the embedding transferring probabilities, are determined by solving an optimization problem. We demonstrate that the Fisher information is a quadratic function of the embedding transferring probabilities. Assuming that the groups of pixels in the cover image can be modeled as a GMM, we obtain a Gaussian mixture distribution of the pixel groups for each cover image. Based on this, the optimized embedding transferring probabilities are solved by quadratic programming for each sub-Gaussian distribution. The experiments show that the security performance of this new algorithm is better than that of the existing LSB matching.
Furthermore, the principle of improving the security in our optimized LSB matching algorithm is different from that of steganography using syndrome coding, so the optimized embedding method here can be combined with such methods. The reason is that many steganographic algorithms based on syndrome coding [12][13] do not define the specific modification operation (adding or subtracting 1) on cover elements, and our optimized LSB matching algorithm can be used to decide which operation is selected. The combination of the two kinds of algorithms may further improve the security.
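To illustrate the cover-modeling step mentioned above, the following sketch fits a Gaussian mixture to pixel groups with scikit-learn; the group size and the number of mixture components are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_pixel_group_gmm(cover, group_size=4, n_components=8):
    """Fit a GMM to non-overlapping pixel groups taken from a grayscale cover image."""
    pixels = cover.astype(np.float64).ravel()
    pixels = pixels[: pixels.size - pixels.size % group_size]
    groups = pixels.reshape(-1, group_size)            # one row per pixel group
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(groups)
    # gmm.weights_, gmm.means_ and gmm.covariances_ describe the sub-Gaussian
    # components for which embedding transferring probabilities would then be optimized.
    return gmm
```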
REFERENCES
[1] T. Pevny, P. Bas, and J. Fridrich, “Steganalysis by
subtractive pixel adjacency matrix,” IEEE Transactions on
Information Forensics and Security, vol. 5, no. 2, 2010, pp.
215–224.
[2] Y. Sun, F. Liu, and B. Liu, “Steganalysis based on
difference image,” in IWDW 08, LNCS 5450. Heidelberg:
Springer-Verlag, 2009, pp. 184–198.
[3] P. Sallee, “Model-based steganography,” in Proceedings of
IWDW 2003, LNCS 2939, 2004, pp.154–167.
[4] P. Sallee, “Model-based methods for steganography and
steganalysis,” International Journal of Image and Graphics,
vol. 5, no. 1, 2005, pp. 167–189.
[5] N. Provos, “Defending against statistical steganalysis,” in
Proceedings of 10th USENIX Security Symposium.
Washington, DC, 2001, pp. 323–335.
[6] K. Solanki, K. Sullivan, and U. Madhow, et al., “Provably
secure steganography: Achieving zero K-L divergence
using statistical restoration,” in Proceedings of ICIP’06.
Piscataway: IEEE Signal Processing Society, 2007, pp.
125–128.
[7] A. Sarkar, K. Solanki, and U. Madhow, et al., “Secure
steganography: statistical restoration of the second order
dependencies for improved security,” in Proceedings of
ICASSP’07. Piscataway: IEEE Signal Processing Society,
2007, pp. II277–II280.
[8] K. Changa, C. Changa, and P. S. Huangb, et al., “A novel
image steganographic method using tri-way pixel-value
differencing,” Journal of Multimedia, vol.3, no.2, 2008,
pp.37-44.
[9] A. Westfeld, “F5--A steganographic algorithm high
capacity despite better steganalysis,” in Proceedings of IH
2001, LNCS 2137, 2001, pp. 289–302.
[10] Y. Kim, Z. Duric, and D. Richards, “Modified matrix
encoding technique for minimal distortion steganography,”
in Proceedings of IH 06, LNCS 4437, 2006, pp. 314–327.
[11] J. Fridrich and D. Soukal, “Matrix embedding for large payloads,” IEEE Transactions on Information Forensics and Security, vol. 1, no. 3, 2006, pp. 390–394.
[12] J. Fridrich, M. Goljan, D. Soukal, et al., “Wet paper codes with improved embedding efficiency,” IEEE Transactions on Information Forensics and Security, vol. 1, no. 1, 2006, pp. 102–110.
[13] T. Filler, J. Judas, and J. Fridrich, “Minimizing embedding impact in steganography using Trellis-Coded quantization,” Proceedings of Media Forensics and Security III, SPIE 7451, 2010, pp. 715405-1–715405-14.
[14] T. Filler, J. Judas, and J. Fridrich, “Gibbs construction in steganography,” IEEE Transactions on Information Forensics and Security, vol. 5, no. 4, 2010, pp. 705–720.
[15] T. Pevny, T. Filler, and P. Bas, “Using high-dimensional image models to perform highly undetectable steganography,” in Proceedings of IH’10, LNCS 6387, 2010, pp. 161–177.
[16] T. Filler and J. Fridrich, “Design of adaptive steganographic schemes for digital images,” in Proceedings of Media Watermarking, Security and Forensics III, SPIE 7880, 2011, pp. 78800F-1–78800F-13.
[17] C. Cachin, “An information-theoretic model for steganography,” in Proceedings of IH’98, LNCS 1525. Heidelberg: Springer-Verlag, 1998, pp. 306–318.
[18] T. Filler and J. Fridrich, “Fisher information determines capacity of secure steganography,” in Proceedings of IH’09, LNCS 5806. Heidelberg: Springer-Verlag, 2009, pp. 31–47.
[19] T. Filler, A. D. Ker, and J. Fridrich, “The square root law of steganographic capacity for Markov covers,” in Proceedings of Media Forensics and Security 09, SPIE 7254, Bellingham: SPIE Press, 2009, pp. 08-1–08-11.
[20] A. D. Ker, “Estimating the information theoretic optimal stego noise,” in Proceedings of IWDW’09, LNCS 5703. Heidelberg: Springer-Verlag, 2009, pp. 184–198.
[21] G. McLachlan and D. Peel, Finite Mixture Models. New York: Wiley, 2001.
[22] C. A. Bouman, “CLUSTER: An unsupervised algorithm for modeling Gaussian mixtures,” http://www.ece.purdue.edu/~bouman.
[23] http://photogallery.nrcs.usda.gov.
Yi-feng Sun was born in Xinxiang, China in 1976. He received the B.S., M.S. and Ph.D. degrees in computer science from Zhengzhou information science and technology institute, Zhengzhou, China, in 1999, 2002 and 2010.
He is presently an instructor with the department of Information Security, Zhengzhou information science and technology institute, Zhengzhou, China. His research interests are in image steganography, image steganalysis and computer vision.

Dan-mei Niu was born in Luoyang, China in 1979. She received the B.S. degree in 2001, and received the M.S. degree from Henan university of science and technology, Luoyang, China in 2006.
She is presently an instructor in the electronic information engineering college, Henan university of science and technology, Luoyang, China. Her research interests are in information security and digital rights management.

Guang-ming Tang was born in Wuhan, China in 1963. She received the B.S., M.S. and Ph.D. degrees in information security from Zhengzhou information science and technology institute, Zhengzhou, China, in 1983, 1990, and 2008, respectively.
She is presently a professor with the department of Information Security, Zhengzhou information science and technology institute, Zhengzhou, China. Her fields of professional interest are information hiding, watermarking and software reliability. She has published 51 research articles and 3 books in these areas.
Ms. Tang is a recipient of the Provincial Prize for Progress in Science and Technology.

Zhan-zhan Gao was born in Shijiazhuang, China in 1988. He received the B.S. degree in electronic science and technology from Zhengzhou information science and technology institute, Zhengzhou, China, in 2011. He is pursuing the M.S. degree in information security at Zhengzhou information science and technology institute. His research interests include information hiding and information security.
A Novel Robust Zero-Watermarking Scheme
Based on Discrete Wavelet Transform
Yu Yang1, 2 Min Lei1, 2
1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing, China
2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and
Telecommunications, Beijing, China
Email: [email protected], [email protected]
Huaqun Liu
School of Information and Mechanical Engineering, Beijing Institute of Graphic Communication, Beijing, China
Yajian Zhou1, 2 Qun Luo1
1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing, China
2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and
Telecommunications, Beijing, China
Abstract—In traditional watermarking algorithms, the insertion of a watermark into the original signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking techniques can solve these problems successfully, but most existing zero-watermarking algorithms for audio and images cannot resist some signal processing manipulations or malicious attacks. In this paper, a novel audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed, which is more efficient and robust. The experiments show that the algorithm is robust against common audio signal processing operations such as MP3 compression, re-quantization, re-sampling, low-pass filtering, cutting-replacement, additive white Gaussian noise and so on. These results demonstrate that the proposed watermarking method can be a suitable candidate for audio copyright protection.
Index Terms—zero-watermarking, discrete wavelet transform, copyright protection
I. INTRODUCTION
Recently, the rapid development of the Internet has
increased multimedia services, such as electronic
commerce, pay-per-view, video-on-demand, electronic
newspapers, and peer-to-peer media sharing. As a result,
multimedia data can be obtained quickly over high speed
network connections. However, the authors, publishers,
owners and providers of multimedia data are reluctant to
grant the distribution of their documents in a networked
environment because the ease of intercepting, copying
and redistributing electronic data in their exact original
form encourages copyright violation. Therefore, it is
crucial for the future development of networked
multimedia systems that robust methods are developed to
protect the intellectual property right of data owners
against unauthorized copying and redistribution of the
material made available on the network. Classical
encryption systems do not completely solve the problem
of unauthorized copying because once encryption is
removed from a document, there is no more control of its
dissemination.
Digital watermarking is a good approach for providing
copyright protection of digital contents (for example text,
word, image, audio, video) [1-14]. The technique directly embeds the watermark information into the digital contents. Ideally, the embedding operation should not introduce any perceptible distortion; there must be no perceptible difference between the watermarked and the original version. That is to say, the watermark data should be embedded imperceptibly into the audio media. Apart from imperceptibility, capacity and robustness are two fundamental properties of audio watermarking schemes. To trade off imperceptibility against robustness, a common approach is to incorporate the human auditory system (HAS) into the watermarking system. The watermark should be extractable after various intentional and unintentional attacks. These attacks may include additive noise, re-sampling, MP3 compression, low-pass filtering, re-quantization, and any other attack which removes the watermark or confuses the watermark extraction system.
Considering a trade-off between capacity, transparency
and robustness is the main challenge for audio
watermarking applications.
We focus on the audio watermarking scheme in this
paper. In traditional audio watermarking techniques,
either in spatial domain, transform domain, or dual
domain [15-16], the embedding of the watermark into the original audio inevitably introduces some audible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking techniques were therefore proposed by some researchers to solve these problems [17–25]. Instead of embedding the watermark into the original signal, the zero-watermarking approach constructs a binary pattern based on essential characteristics of the original signal and uses it for watermark recovery. In [21], the property of natural images that the vector quantization indices of neighboring blocks tend to be very similar was utilized to generate the binary pattern. A scheme that combined zero-watermarking with spatial-domain-based neural networks was proposed in [22], in which the differences between the intensity values of selected pixels and the corresponding output values of the neural network model were calculated to generate the binary pattern. Some low-frequency wavelet coefficients were randomly selected from the original image by chaotic modulation and used for character extraction in [23]. In [24], two zero-watermarks were constructed from the original image: one was robust to signal processing and central cropping, constructed from low-frequency coefficients in the discrete wavelet transform domain, and the other was robust to general geometric distortions as well as signal processing, constructed from DWT coefficients of a log-polar mapping of the host image. There are also zero-watermarking algorithms for audio. An efficient and robust zero-watermarking technique for audio signals is presented in [25]. The multi-resolution characteristic of the discrete wavelet transform (DWT), the energy compression characteristic of the discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulants are combined to extract essential features from the original audio signal, which are then used for watermark recovery. Simulation results demonstrate the effectiveness of the scheme in terms of inaudibility, detection reliability, and robustness. However, most of these zero-watermarking techniques are designed for still images, and their robustness against some signal processing manipulations or malicious attacks is not satisfactory. In this paper, a novel robust zero-watermarking technique for audio signals is proposed.
The paper is organized as follows. Section II introduces basic concepts of the discrete wavelet transform. Section III describes the novel robust zero-watermarking scheme. Section IV presents the experimental results, illustrating that the proposed method achieves good robustness. A brief conclusion is given in Section V.
II. DISCRETE WAVELET TRANSFORM

The wavelet transform is a time-domain localized analysis method whose window size is fixed but whose shape is adjustable. It supplies high temporal resolution in the high-frequency part of a signal and high frequency resolution in the low-frequency part, so it can extract information from a signal effectively.

For audio, we first need to convert the one-dimensional audio into a two-dimensional representation. The wavelet transform decomposes an audio signal into a set of band-limited components which can be reassembled to reconstruct the original audio without error. Since the bandwidth of the resulting coefficient sets is smaller than that of the original audio, the coefficient sets can be down-sampled without loss of information. Reconstruction of the original signal is accomplished by up-sampling, filtering and summing the individual sub-bands. After converting the 1-D audio into a 2-D signal, applying the DWT corresponds to processing the audio with 2-D filters in each dimension. The filters divide the input into four non-overlapping multi-resolution coefficient sets: a lower-resolution approximation (LL1) as well as horizontal (HL1), vertical (LH1) and diagonal (HH1) detail components. The sub-band LL1 represents the coarse-scale DWT coefficients, while the coefficient sets LH1, HL1 and HH1 represent the fine-scale DWT coefficients. To obtain the next coarser scale of wavelet coefficients, the sub-band LL1 is further processed until some final scale N is reached, at which point there are 3N+1 coefficient sets consisting of the multi-resolution coefficient sets LLN and LHx, HLx and HHx, where x ranges from 1 to N. Figure 1 shows the representation of a 2-level wavelet transform.
Figure 1: Representation of a 2-level wavelet transform of the 2-D original audio (sub-bands LL2, HL2, LH2, HH2, HL1, LH1, HH1).
Due to its excellent spatio-frequency localization
properties, the DWT is very suitable to identify the areas
in the original audio where a watermark can be embedded
effectively. In particular, this property allows the
exploitation of the masking effect of the human visual
and auditory system such that if a DWT coefficient is
modified, only the region corresponding to that
coefficient will be modified. In general most of the audio
energy is concentrated at the lower frequency coefficient
sets LLx and therefore embedding watermarks in these
coefficient sets may degrade the audio significantly.
Embedding in the low frequency coefficient sets,
however, could increase robustness significantly.
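The decomposition of Figure 1 can be reproduced with PyWavelets, as sketched below; the reshaping to 512×512 and the Haar wavelet are assumptions for illustration, since the paper does not state the mother wavelet.

```python
import numpy as np
import pywt

audio = np.random.randn(512 * 512)        # stand-in for 1-D PCM samples
audio_2d = audio.reshape(512, 512)        # convert the 1-D audio into a 2-D signal

# 2-level 2-D DWT: coarse approximation LL2 plus detail sub-bands at each level
coeffs = pywt.wavedec2(audio_2d, "haar", level=2)
LL2, (HL2, LH2, HH2), (HL1, LH1, HH1) = coeffs
print(LL2.shape, HL2.shape, HL1.shape)    # (128, 128) (128, 128) (256, 256)
```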
III. AUDIO WATERMARKING ALGORITHM

A. Embedding Process

Let A be the original audio signal and let W = {w(i, j) | i = 1, 2, ..., M1, j = 1, 2, ..., M2} be the binary-valued image watermark to be embedded. The watermark embedding procedure is described below (Figure 2).
Step 1: Dimension reduction of the watermark image. Because the audio signal is one-dimensional and the watermark W is two-dimensional, the watermark is rearranged into a one-dimensional signal:
w = {w(i) = w(m1, m2) | 1 ≤ m1 ≤ M1, 1 ≤ m2 ≤ M2, i = 1, 2, ..., M1 × M2}

Step 2: Let the original audio A = (a1, a2, ..., aN) with N PCM (pulse-code modulation) samples be segmented into M = ⌊N/512⌋ blocks Bi, where M = M1 × M2, i = 1, 2, ..., M. Each block contains 512 samples.

Step 3: for i = 1 : M
3.1) The 3-level discrete wavelet packet decomposition is applied to the audio block Bi to obtain the approximate sub-band coefficients Ci = {ci(1), ci(2), ..., ci(64)}, which contain 64 samples.
3.2) From the approximate sub-band coefficients Ci, compute
t(i) = Σ_{j=1}^{64} ci(j) × ci(j) × j,   t'(i) = Σ_{j=1}^{64} ci(j) × ci(j),
and then s'(i) = ⌊t(i) / t'(i)⌋.
3.3) Compute s(i) = s'(i) mod 2.
3.4) End

Step 4: The watermark extraction secret key k(i) = w(i) ⊕ s(i) is obtained by computing the exclusive-or of s(i) and the watermark information w(i).

Figure 2: Embedding Process

B. Extraction Process

The watermark recovery procedure is carried out as follows (Figure 3); a code sketch of the per-block computation is given after the steps.

Step 1: Let the watermarked audio A' = (a'1, a'2, ..., a'N) be segmented into M = ⌊N/512⌋ blocks B'i, where M = M1 × M2, i = 1, 2, ..., M. Each block contains 512 samples.

Step 2: for i = 1 : M
2.1) The 3-level discrete wavelet packet decomposition is applied to the audio block B'i to obtain the approximate sub-band coefficients C'i = {c'i(1), c'i(2), ..., c'i(64)}, which contain 64 samples.
2.2) From the approximate sub-band coefficients C'i, compute
T(i) = Σ_{j=1}^{64} c'i(j) × c'i(j) × j,   T'(i) = Σ_{j=1}^{64} c'i(j) × c'i(j),
and then S'(i) = ⌊T(i) / T'(i)⌋.
2.3) Compute S(i) = S'(i) mod 2.
2.4) End

Step 3: The watermark information w'(i) = k(i) ⊕ S(i) is obtained by computing the exclusive-or of S(i) and the watermark extraction secret key k(i).

Step 4: The binary-valued image watermark is recovered by rearranging the extracted watermark information w'(i) back into an M1 × M2 image.

Figure 3: Extraction Process
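A minimal sketch of the per-block feature bit s(i) and of the secret-key generation, assuming PyWavelets; the Haar wavelet and the helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pywt

def block_feature_bit(block):
    """Steps 3.1-3.3 for one 512-sample block: 3-level wavelet packet decomposition,
    then s(i) = floor(t(i) / t'(i)) mod 2 from the 64 approximation coefficients."""
    wp = pywt.WaveletPacket(data=block, wavelet="haar", maxlevel=3)
    c = np.asarray(wp["aaa"].data, dtype=np.float64)   # approximate sub-band, 64 samples
    j = np.arange(1, c.size + 1)
    t = np.sum(c * c * j)
    t_prime = np.sum(c * c)
    return int(np.floor(t / t_prime)) % 2

def generate_secret_key(audio, watermark_bits):
    """Step 4: k(i) = w(i) XOR s(i) for M = M1*M2 blocks of 512 samples each."""
    m = len(watermark_bits)
    blocks = np.asarray(audio[: m * 512], dtype=np.float64).reshape(m, 512)
    s = np.array([block_feature_bit(b) for b in blocks], dtype=np.uint8)
    return np.bitwise_xor(np.asarray(watermark_bits, dtype=np.uint8), s)

# Extraction mirrors this: recompute the feature bits S(i) from the watermarked
# audio and recover w'(i) = k(i) XOR S(i).
```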
IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this experiment, two binary stamp images of size 32 × 32 (i.e., M1 = M2 = 32), displayed in Fig. 5, are taken as the original watermarks. We evaluate the performance of the proposed watermarking method on different types of 16-bit mono audio signals sampled at 44.1 kHz, as shown in Fig. 4. The sound files are: (a) Pop, (b) Speech, (c) Classic. Each audio file contains 524,288 samples with a duration of 11 seconds.
Figure 4: The signal of original audio
Figure 5: Watermarks with size 32 × 32
In order to evaluate the quality of watermarked audio,
the following signal-to-noise radio (SNR) equation is
used:
SNR = 10 log10 [ Σ_{n=1}^{N} Y²(n) / Σ_{n=1}^{N} (Y(n) − Y'(n))² ]

where Y(n) and Y'(n) are the original audio signal and the watermarked audio signal, respectively.
In order to evaluate the robustness of the watermarking algorithm, the normalized correlation coefficient (NC) and the bit error rate (BER) are employed:

NC(W, W*) = Σ_{i=1}^{M1} Σ_{j=1}^{M2} W(i, j) W*(i, j) / [ sqrt(Σ_{i=1}^{M1} Σ_{j=1}^{M2} W(i, j)²) × sqrt(Σ_{i=1}^{M1} Σ_{j=1}^{M2} W*(i, j)²) ]

BER = [ Σ_{i=1}^{M1} Σ_{j=1}^{M2} |W(i, j) − W*(i, j)| ] / (M1 × M2)

where W(i, j) and W*(i, j) are the original watermark and the extracted watermark respectively, and M1 × M2 is the size of the watermark. Equivalently, the bit error rate can be written as

BER = B / (M1 × M2) × 100%

where B is the number of erroneously extracted bits and M1 × M2 is the size of the watermark.
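The three quality metrics above translate directly into NumPy; a small sketch with illustrative function names follows.

```python
import numpy as np

def snr_db(original, watermarked):
    """Signal-to-noise ratio of the watermarked audio in dB (infinite if identical)."""
    x = np.asarray(original, dtype=np.float64)
    err = x - np.asarray(watermarked, dtype=np.float64)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(err ** 2))

def nc(w, w_star):
    """Normalized correlation between original and extracted binary watermarks."""
    w, w_star = np.asarray(w, float), np.asarray(w_star, float)
    return np.sum(w * w_star) / (np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_star ** 2)))

def ber(w, w_star):
    """Fraction of erroneously extracted watermark bits."""
    return np.mean(np.asarray(w) != np.asarray(w_star))
```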
A. Imperceptibility Analysis

One of the main requirements of audio watermarking techniques is the inaudibility of the embedded watermark. For the proposed scheme, this requirement is naturally achieved because the watermark is embedded into the secret key rather than into the original audio signal itself. In fact, the watermarked audio is identical to the original one.
B. Robustness Test and Analysis

In order to test the robustness of the proposed method, nine different types of attacks are applied, as follows:
(1) Noise addition: 20 dB additive white Gaussian noise (AWGN) is added to the watermarked audio signal (a small simulation sketch is given after this list).
(2) Re-sampling: the watermarked signal, originally sampled at 44.1 kHz, is re-sampled at 22.05 kHz and then restored by sampling again at 44.1 kHz.
(3) Low-pass filtering: the cut-off frequency is 11.025 kHz.
(4) Re-quantization: the 16-bit watermarked audio signal is quantized down to 8 bits/sample and then re-quantized back to 16 bits/sample.
(5) MP3 compression: MPEG-1 Layer 3 compression at 64 kbps is applied to the watermarked signal.
(6) MP3 compression: MPEG-1 Layer 3 compression at 32 kbps is applied to the watermarked signal.
(7) MP3 compression: MPEG-1 Layer 3 compression at 128 kbps is applied to the watermarked signal.
(8) Noise reduction: the preset shape is hiss removal.
(9) Equalization.
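As an example of how such an attack can be simulated, the sketch below adds white Gaussian noise to the watermarked signal; interpreting "20 dB" as the resulting signal-to-noise ratio is an assumption for illustration.

```python
import numpy as np

def add_awgn(signal, snr_db=20.0, rng=None):
    """Attack (1): add white Gaussian noise so that the signal-to-noise ratio is snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(signal, dtype=np.float64)
    noise_power = np.mean(x ** 2) / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
```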
Table 1 shows the NC and BER of the proposed watermarking method, in terms of robustness against the above attacks, for the three different types of watermarked audio signals, "Speech", "Classic" and "Pop", respectively.
Table 1. The results of attacks

Attack | Speech NC | Speech BER | Classic NC | Classic BER | Pop NC | Pop BER
1      | 1         | 0          | 1          | 0           | 1      | 0
2      | 1         | 0          | 1          | 0           | 1      | 0
3      | 1         | 0          | 1          | 0           | 1      | 0
4      | 1         | 0          | 1          | 0           | 1      | 0
5      | 1         | 0          | 1          | 0           | 1      | 0
6      | 1         | 0          | 1          | 0           | 1      | 0
7      | 1         | 0          | 1          | 0           | 1      | 0
8      | 0.9991    | 0.001      | 1          | 0           | 1      | 0
9      | 1         | 0          | 1          | 0           | 1      | 0
Table 2. Comparison between our algorithm and other algorithms (NC)

Attack                    | NC (ours) | NC ([25]) | NC ([26]) | NC ([27])
Equalization              | 1         | 0.97      | -         | 0.998
MP3 compression (32 kbps) | 1         | 0.99      | -         | 0.999
Low-pass filtering        | 1         | 1         | 0.9971    | 0.824
Noise addition            | 1         | 1         | 0.9668    | 0.999
Re-sampling               | 1         | 1         | -         | 0.981
Table 2 compares our algorithm with other algorithms; it is clear that the robustness of our algorithm is stronger than that of the other algorithms.
V. CONCLUSION

A novel audio zero-watermarking scheme based on the discrete wavelet transform (DWT) is proposed in this paper, which is efficient and robust. The experiments show that the proposed algorithm has good robustness against common audio signal processing operations such as MP3 compression, re-quantization, re-sampling, low-pass filtering, cutting-replacement and additive white Gaussian noise. These results demonstrate that our algorithm can be a suitable candidate for audio copyright protection.
ACKNOWLEDGMENT
This work is supported by the National Natural
Science Foundation of China under Grant NO. 60973146,
NO. 61170272, NO. 61170269, NO. 61003285, and
Institute Level Project Funded by Beijing Institute of
Graphic Communication under Grant NO. E-6-2012-23.
REFERENCES
[1] Min Lei, Research on the digital watermarking and
steganalysis algorithms of audio information hiding[D].A
dissertation for the degree of doctor of philosophy, Beijing
University of Posts and Telecommunications. 2011:28-44.
[2] Yu Yang, Min Lei, Xiayun Jia, et al., “A New Digital Audio Watermarking Based on DCT-DWT-SVD,” in Proceedings of the 2011 IEEE International Conference on Information Theory and Information Security, August 7-9, 2011, Hangzhou, China, IEEE Press, Beijing, 2011, pp. 906-909.
[3] Lei Min and Yang Yu, “Recent Advances in Audio-Based Steganalysis Research,” in Proceedings of the 2nd Asia-Pacific Conference on Information Network and Digital Content Security, August 7-9, 2011, Hangzhou, China, Atlantis Press, France, 2011, pp. 208-213.
[4] LEI Min, YANG Yu, LUO Shou-shan et al. Semi-fragile
Audio Watermarking Algorithm in DWT Domain [J].
China Communications, 2010, 7(4): 71-75
[5] S. C. Liu and S. D. Lin, “BCH code based robust audio
watermarking in the cepstrum domain,” Journal of
Information Science and Engineering, vol. 22, pp. 535-543,
2006.
[6] M. Pooyan and A. Delforouzi, “Adaptive and Robust
Audio watermarking in Wavelet Domain,” in Proceedings
of International Conference on International Information
Hiding and Multimedia Signal Processing (IIH-MSP 2007),
vol. 2, pp. 287-290, 2007
[7] G. Zeng and Z. Qiu, “Audio Watermarking in DCT:
Embedding Strategy and Algorithm,” in Proceedings of
9th International Conference on Signal Processing
(ICSP’09), pp. 2193-2196, 2008.
[8] Samira Lagzian, Mohsen Soryani and Mahmood Fathy, “A
new robust watermarking scheme based on RDWT-SVD,”
International Journal of Intelligent Information Processing,
Vol. 2, No 1, 2011.
[9] Pranab Kumar Dhar, Mohammad Ibrahim Khan and Saif
Ahmad, “A new DCT-based watermarking method for
copyright protection of digital audio,” International
Journal of Computer Science & Information Technology.
Vol. 2, No. 5, 2010, pp. 91-101.
[10] Chen Yongqiang, Zhang yanqing, Hu Hanping and Ling
Hefei, “A novel gray image watermarking scheme,”
Journal of Software, Vol. 6, No, 5, 2011, pp. 849-856.
[11] Kilari Veera Swamy, B. Chandra Mohan, Y. V. Bhaskar
Reddy and S. Srinivas Kumar, “Image compression and
watermarking scheme using scalar quantization,” The
International Journal of Next Generation Network, Vol. 2,
No. 1, 2010, pp. 37-47.
[12] Hasan M. Meral, Emre Sevinc, Ersin Unkar, Bulent Sankur,
A. Sumru Ozsoy and Tunga Gungor, “Syntactic tool for
text watermarking,” in Proceedings of the in International
society for Optical Engineering, 2007.
[13] Pranab Kumar Dhar, Mohammad Ibrahim Khan and Jong-Myon Kim, “A new audio watermarking system using
discrete fourier transform for copyright protection,”
International journal of computer science and network
security. 6(10), 2010: 35-40
[14] Vivekananda B K, Indranil S and Abhijit D, “An Audio Watermarking Scheme Using Singular Value Decomposition and Dither-Modulation Quantization,” Multimedia Tools and Applications, 52(2-3), 2010: 369-383.
[15] X.-Y. Wang and H. Zhao, “A novel synchronization
invariant audio watermarking scheme based on DWT and
DCT,” IEEE Transactions on Signal Processing, vol. 54,
no. 12, pp. 4835–4840, 2006.
[16] X.-Y. Wang, W. Qi, and P. Niu, “A new adaptive digital
audio watermarking based on support vector regression,”
IEEE Transactions on Audio, Speech, and Language
Processing, vol. 15, no. 8, pp. 2270–2277, 2007.
[17] T. Sun, W. Quan, and S.-X. Wang, “Zero-watermark
watermarking for image authentication,” in Proceedings of
the Signal and Image Processing, pp. 503–508, Kauai,
Hawaii, USA, August 2002.
[18] G. Horng, C. Chen, B. Ceng, and T. Chen, “Neural
network based robust lossless copyright protection
technique,”http://www.csie.cyut.edu.tw/TAAI2002/TAAI2
002PDF/Parallel.
[19] Q. Wen, T.-F. Sun, and S.-X. Wang, “Concept and
application of zero-watermark,” Tien Tzu Hsueh Pao/Acta
Electronica Sinica, vol. 31, no. 2, pp. 214–216, 2003.
[20] S. Yang, C. Li, F. Sun, and Y. Sun, “Study on the method
of image non-watermark in DWT domain,” Chinese
Journal Image Graphics, vol. 8A, no. 6, pp. 664–669, 2003.
[21] D. Charalampidis, “Improved robust VQ-based
watermarking,” Electronics Letters, vol. 41, no. 23, pp.
21–22, 2005.
[22] J. Sang, X. Liao, and M. S. Alam, “Neural network based
zero-watermark scheme for digital images,” Optical
Engineering, vol. 45, no. 9, 2006.
[23] C. Hanqiang, X. Hua, L. Xutao, L. Miao, Y. Sheng, and W.
Fang, “A zero-watermarking algorithm based on DWT and
chaotic modulation,” in Independent Component Analyses,
vol. 6247 of Proceedings of SPIE, pp. 1–9, Orlando, Fla,
USA, 2006.
[24] L. Jing and F. Liu, “Double zero-watermarks scheme
utilizing scale invariant feature transform and log-polar
mapping,” in Proceedings of the IEEE International
Conference on Multimedia and Expo, pp. 2118–2121, Las Vegas, Nev., USA, February 2007.
[25] Ning Chen, Jie Zhu. “A robust zero-watermarking
algorithm for audio,” EURASIP Journal on Advances in
Signal Processing, 2008.
[26] W. Song, J. J. Hou, Z. H. Li and L. Huang, “A novel zero-bit watermarking algorithm based on logistic chaotic system and singular value decomposition,” Acta Physica Sinica, Vol. 58, No. 7, 2009, pp. 4449-4456.
[27] L. Tan, B. Wu, Z. Liu and M. T. Zhou, “An audio information hiding algorithm with high capacity based on chaos and the wavelet transform,” Acta Electronica Sinica, Vol. 38, No. 8, 2010, pp. 1812-1818.
Stego Key Estimation in LSB Steganography
Jing LIU
Zhengzhou information science and technology institute, ZhengZhou, China
Email: [email protected]
Guangming TANG
Zhengzhou information science and technology institute, ZhengZhou, China
Email:[email protected]
Abstract—Many kinds of multimedia can be accessed conveniently in social networks, and some of this multimedia may be used to hide harmful information. We consider the problem of estimating the stego key used for hiding with the least significant bit (LSB) paradigm, which has been shown to be much more difficult than detecting the hidden message. A previous framework for stego key search was provided by the theory of hypothesis testing; however, the test threshold is hard to determine. In this paper, we propose a new method in which the correct key can be identified by inspecting the difference between the stego sequence and its shifted sequence on the embedding path. It is shown that the new technique is much simpler and quicker than those previously known.
Index Terms—social networks, multimedia contents security,
steganography, stego key estimation
I. INTRODUCTION
The aim of steganography is to hide information
imperceptibly into cover objects, such as digital images.
To hide a message, some features of the original image,
also called the cover image, which are chosen by a stego
key are slightly modified by the embedding technique to
obtain the stego image. Before embedding, the message is
usually encrypted[1]. Steganography is significant to
information security. However, steganographic technique
can be used unlawfully by criminals and terrorists,
especially in social networks.
Steganalysis aims to break steganography. Steganalysis can be classified into two categories: active and passive [2].
The goal of the active steganalysis is to estimate some
parameters (stego key, message length etc.) of the
embedding algorithm or the hidden message[3][4], while
passive steganalysis deals with identifying the
presence/absence of a hidden message or the embedding
algorithm used [5][6]. In this paper, we investigate active
steganalysis since the aim is to estimate the stego key
under the assumptions that we already know the
Manuscript received January 1, 2012; revised June 1, 2011;
accepted July 1, 2011.
This work was supported in part by the National Natural Science
Foundation of China(No.61101112), and Postdoctoral Science
Foundation of China(No.2011M500775).
Corresponding author: Liu Jing, email:[email protected]
steganographic algorithm.
Trivedi et al.[7][8] presented a method for secret key
estimation in sequential steganography. The authors used
a sequential probability ratio test to determine the
embedding key, which was, in their interpretation, the
beginning and the ending of the subsequence modulated
during embedding. In fact, sequential embedding is
typically used for watermarking. Fridrich et al[9][10]
considered a more typical situation for a steganographic
application, in which the key determined a pseudorandomly ordered subset of all indices in the cover image
to be used for embedding. They performed chi-square test
to estimate the correct key. Unfortunately, chi-square test
suffered from the drawback of no simple way to choose
the test threshold to attain a desired target performance.
In this paper, we propose a novel approach to search for the stego key used in LSB steganography. The key can be determined by a simple count. In Section II, we describe the LSB embedding algorithm. In Section III, we give the definitions used in our method and then describe the method for identifying the correct key. In Section IV, we present two kinds of experiments: we verify our method in the first experiment and show the effectiveness of the new method by attacking steganographic software in the second experiment. Finally, we outline some further directions for research.
II. LSB STEGANOGRAPHY

For an image I of size M × N, let the pixel value at (i, j) be I(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N, with I(i, j) ∈ {0, 1, ..., 255}. Let K be the space of all possible stego keys. For each key K ∈ K, let Path(K) denote the ordered set of element indices visited along the path generated from the key K. When embedding message bits, the elements in the sequence {I(i, j), (i, j) ∈ Path(K0)}, K0 ∈ K, are chosen in order and modified by the embedding operation shown in Table I. After embedding, the stego image I' with embedding ratio q (the number of embedded bits divided by MN) is obtained.
Our task is to identify the stego key K0 under the condition that we have complete knowledge of the embedding algorithm and one stego image.
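A minimal sketch of this embedding operation (cf. Table I below), assuming NumPy; using NumPy's PRNG seeded by the key to generate Path(K) is an illustrative stand-in for whatever path generator a real tool uses.

```python
import numpy as np

def lsb_embed(cover, bits, key):
    """LSB replacement along a key-generated pseudo-random path."""
    stego = cover.copy().ravel()
    rng = np.random.default_rng(key)                  # the key seeds the path generator
    path = rng.permutation(stego.size)[:len(bits)]    # Path(K): pixel indices visited in order
    bits = np.asarray(bits, dtype=stego.dtype)
    stego[path] = (stego[path] & 0xFE) | bits         # stego pixel = 2x + message bit
    return stego.reshape(cover.shape), path
```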
TABLE I. LSB EMBEDDING OPERATION

Cover pixel | Message bit 0 | Message bit 1
2x          | 2x            | 2x + 1
2x + 1      | 2x            | 2x + 1

III. PROPOSED METHOD

A. Definitions

To explain the details of our new technique, we first briefly introduce the definitions used in our method.

A non-boundary pixel I(i, j) has eight adjacent pixels in the image: I(i−1, j−1), I(i−1, j), ..., I(i+1, j+1). The differences between the center pixel and its adjacent pixels are denoted by

D1(i, j) = I(i−1, j−1) − I(i, j),  D2(i, j) = I(i−1, j) − I(i, j),  ...,  D8(i, j) = I(i+1, j+1) − I(i, j).

According to Dk, k = 1, 2, ..., 8, we define three kinds of pixel sets, H, L and M:
High pixel set: I(i, j) ∈ H if max Dk < 0;
Low pixel set: I(i, j) ∈ L if min Dk > 0;
Middle pixel set: I(i, j) ∈ M if min Dk ≤ 0 and max Dk ≥ 0.

According to the parity of the center pixel value, the three sets can be divided further into six pixel subsets: H_E, H_O, L_E, L_O, M_E and M_O. When the center pixel is modified by LSB embedding, there are ten movements among the six subsets, which are shown in Fig. 1. But only four movements, which span different pixel sets, change the pixel frequencies of the sets H, L and M. These four movements involve the subsets

H_O^{max Dk = −1},  M_E^{max Dk = 0},  L_E^{min Dk = 1},  M_O^{min Dk = 0},

where X_y denotes the pixels belonging to set X (called X pixels) that meet requirement y. To simplify the notation, we use simple symbols to denote the corresponding pixel subsets, as shown in Table II.

Figure 1. Movements of each kind of pixels after LSB embedding.

TABLE II. SYMBOLS FOR PIXEL SETS

Pixel subset        | Symbol
H_O^{max Dk = −1}   | HM
L_E^{min Dk = 1}    | LM
M_E^{max Dk = 0}    | MH
M_O^{min Dk = 0}    | ML

Finally, we define a flipping operation on the sequence {I(i, j), (i, j) ∈ Path(K)}:
0 ↔ 1, 2 ↔ 3, ..., 254 ↔ 255.
A shifted sequence {Î(i, j), (i, j) ∈ Path(K)} is then obtained.

Let F_X(K) and F̂_X(K) be the frequencies of X pixels in {I(i, j), (i, j) ∈ Path(K)} and {Î(i, j), (i, j) ∈ Path(K)}, respectively. Let α and 1 − α be the frequencies of "1" and "0" in the message bits, and let β be the embedding ratio on the path generated from K, defined as β = n / |Path(K)|, where n denotes the number of embedded message bits along Path(K) and |Path(K)| denotes the number of pixels in Path(K). Let S(K) denote the frequency difference of M pixels between {Î(i, j), (i, j) ∈ Path(K)} and {I(i, j), (i, j) ∈ Path(K)}, namely S(K) = F̂_M(K) − F_M(K).

B. Method for Identifying the Correct Key

Obviously,

F_H(K) + F_L(K) + F_M(K) = F̂_H(K) + F̂_L(K) + F̂_M(K).   (1)

The frequency change of H is independent of that of L. Therefore, we only consider the frequency of M pixels. It is apparent that

F̂_M(K) − F_M(K) = F_HM(K) + F_LM(K) − F_MH(K) − F_ML(K).   (2)

Along the path, only part of the pixels is actually modified by the embedding, which depends on the embedding ratio β and on the message-bit frequency α. Taking this into account, the frequency difference becomes

S(K) = F̂_M(K) − F_M(K) = α(β − 1)[F_MH(K) − F_HM(K) + F_ML(K) − F_LM(K)].   (3)
Usually, the message is encrypted, so the message bits are i.i.d. realizations of a binary random variable uniformly distributed on {0, 1}; therefore α = 1/2 [11]. Hence

S(K) = (β − 1)/2 · [F_MH(K) − F_HM(K) + F_ML(K) − F_LM(K)].   (4)

We use β0 and βj to denote the embedding ratio along the correct path and an incorrect path, respectively. Because the number of embedded message bits along the correct path equals the number of pixels along the correct path, the embedding ratio along the path generated from the correct key K0 ∈ K is β0 = 1. We have

S(K0) = (β0 − 1)/2 · [F_MH(K) − F_HM(K) + F_ML(K) − F_LM(K)] = 0.   (5)

ZHAI et al. have proved that Dk follows a generalized Gaussian distribution with mean zero [11]; therefore

F_HM(K) < F_MH(K),  F_LM(K) < F_ML(K).   (6)

If the stego image is not fully embedded, the number of embedded message bits along an incorrect path is smaller than the number of pixels on that path, so the embedding ratio along the path generated from an incorrect key Kj ∈ K is smaller than 1, namely βj < 1. Thus

S(Kj) = (βj − 1)/2 · [F_MH(K) − F_HM(K) + F_ML(K) − F_LM(K)] < 0.   (7)

From (5) and (7), we get

S(K0) > S(Kj).   (8)

Therefore, the difference between the correct key and an incorrect key is obtained. By calculating S(K) for every K ∈ K, the key K with the maximal value of S(K) can be determined as the correct stego key.
IV. EXPERIMENTS AND ANALYSIS

A. Experiments on Typical Images

We perform some experiments to verify (8). We test 100 typical images sized 512×512, downloaded from http://sipi.usc.edu/services/databased-/Database.html. We simulate the course of LSB embedding using a Matlab routine, in which the key space is 2^20. For embedding ratio q = 0.60, we embed an encrypted message into each image and calculate the value of S(K). We find that S(K0) > S(Kj) holds in all images for q = 0.60 and all keys in the 2^20 key space. Fig. 2 shows the values of S(K) in "lena.bmp".

Figure 2. S(K) for all candidate keys in "lena.bmp".

We can see that (8) is verified. However, the value of S(K0) is not equal to zero, which is inconsistent with (5). There are two factors leading to this. One is that we obtained (5) and (7) under the important assumption that all neighboring pixels are unmodified; this is not true in practice, since for any payload the pixels are chosen with the same probability for the embedding path. The other is that the frequency changes of the HM, MH, LM and ML pixels in the practical embedding process are not exactly the same as in theory. Despite these two factors, we find that S(K0) > S(Kj) holds in all testing images.

We also apply the method in [10] to the stego images above; its searching rate is 37 keys per second on a Pentium IV machine running at 3.0 GHz with 512 MB of memory, while the searching rate of our method is 115 keys per second.

B. Experiment on Typical LSB Steganographic Software

We conduct this experiment using "Hide and Seek 4.1" [12], which is a popular LSB steganographic software. It uses GIF images containing 320×480 pixels as cover images and adopts random(num) of Borland C++ 3.1 as the generator. The maximal embedded message length is limited to 19000 bytes. The initialized state of random(num) is a 16-bit seed, and "num" is used to control the maximal migration step, which is related to the numbers of message bits and pixels that have not yet been embedded. So the stego key of Hide and Seek 4.1 consists of the seed and the message length; therefore, we should recover the seed and the exact length of the message.

We convert 1000 images, downloaded from http://www.cs.washington.edu/research/magedatabse/groundtruth/, into GIF format with 320×480 pixels and insert messages of different lengths using Hide and Seek 4.1. We perform the experiment with our stego key recovery algorithm, described as follows (a sketch of this search loop is given after the steps).

0. Estimate the embedding ratio using the method presented in [13] and calculate the possible length range of the secret message [l_min, l_max].
1. For each k ∈ [l_min, l_max], test each seed K_i ∈ [0, 2^16 − 1] of the seed generator used in Hide and Seek 4.1: with K_i^k = (K_i, k), generate the embedding pixel set {I(i, j), (i, j) ∈ Path(K_i^k)}.
2. Calculate S(K_i^k) on the sequence {I(i, j), (i, j) ∈ Path(K_i^k)}.
3. Let S_max = max{S(K_i^k) : K_i ∈ [0, 2^16 − 1], k ∈ [l_min, l_max]} and T = {(K_i, k) : S(K_i^k) = S_max}.
4. If |T| = 1, then the seed and the message length in T are declared as the correct seed and length. Otherwise, the algorithm cannot find the correct key.
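The exhaustive search of steps 0–4 can be organized as below, reusing the s_statistic sketch given earlier; path_from_key is a placeholder for a reimplementation of the target software's path generator.

```python
def recover_stego_key(img, path_from_key, seeds, lengths):
    """Score every candidate (seed, message length) pair by S and keep the maximizers;
    a unique maximizer is declared the recovered stego key (|T| = 1), else failure."""
    best, argmax = None, []
    for k in lengths:                       # candidate message lengths [l_min, l_max]
        for seed in seeds:                  # candidate 16-bit seeds
            s = s_statistic(img, path_from_key(seed, k))
            if best is None or s > best:
                best, argmax = s, [(seed, k)]
            elif s == best:
                argmax.append((seed, k))
    return argmax[0] if len(argmax) == 1 else None
```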
Because our algorithm uses the method of [13] to
estimate the message length, where the possible deviation of the estimated embedding ratio is [−0.02, 0.02], the effective length of the stego key we actually have to test is only the 16 seed bits plus the logarithm (base 2) of the number of candidate message lengths in a range of relative width 0.04. Table III shows the results of our method, where Pr(succ) means the probability of success, defined as: Pr(succ) = (number of images whose stego key is recovered) / (total number of images).
TABLE III. EXPERIMENT RESULTS ON HIDE AND SEEK 4.1

Embedding ratio q | Pr(succ)
0.001             | 0
0.005             | 12.4%
0.06              | 89.3%
0.08              | 100%
0.50              | 100%
0.91              | 100%
0.95              | 84.4%
0.97              | 44.2%
0.99              | 0
From Table III, we can see the performance of our method in recovering the stego key of LSB steganographic software. When 0.08 ≤ q ≤ 0.91, our method can recover the stego key for all of the images; that is, for 0.08 ≤ q ≤ 0.91 the probability that S(K0) > S(Kj) is 1. But when the embedding ratio approaches 0 or 1, our method fails. When q is close to 0, the amount of data is not sufficient to recover the stego key, while when q is close to 1, the difference in embedding ratio between the correct path and an incorrect path is so small that the correct key cannot be detected. In addition, the embedded message length influences our stego key search rate the most, because all the elements along the path must be processed: the larger the message length, the slower the search rate. The testing speed ranges from about 43 to 14500 keys per second.
V. CONCLUSIONS AND FUTURE WORK
We have proposed a novel and simple method for recovering the stego key of LSB steganography, which achieves satisfactory performance on images.
In future work, we should consider how to reduce the computational time and improve the success ratio when searching for the stego key of images with larger or smaller embedding rates. Finally, we should extend the method to spatial sequential LSB steganography, whose stego key can be considered as the beginning and the ending of the message during embedding.
REFERENCES

[1] Joyshree Nath and Asoke Nath, “Advanced Steganography Algorithm using Encrypted secret message,” International Journal of Advanced Computer Science and Applications, vol. 2, no. 3, 2011, pp. 19-24.
[2] R. Chandramouli, “A mathematical framework for active steganalysis,” Special issue on multimedia watermarking, Springer/ACM Multimedia Systems, vol. 9, no. 3, 2003, pp. 303-311.
[3] T. Pevny, J. Fridrich, and A. Ker, “From blind to quantitative steganalysis,” SPIE Electronic Imaging, San Jose, CA, vol. 7254, 2009, pp. 0C 1-0C 14.
[4] Kang Leng Chiew and Josef Pieprzyk, “Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography,” 2010 International Conference on Availability, Reliability and Security, Krakow, Poland, 2010, pp. 683-688.
[5] J. Fridrich, J. Kodovsky, V. Holub and M. Goljan, “Steganalysis of Content-Adaptive Steganography in Spatial Domain,” 13th Information Hiding Conference, Prague, Czech Republic, May 2011.
[6] J. Kodovsky, T. Pevny and J. Fridrich, “Modern Steganalysis Can Detect YASS,” SPIE, Electronic Imaging, Media Forensics and Security XII, San Jose, CA, Jan 2010, pp. 02-01 - 02-11.
[7] S. Trivedi and R. Chandramouli, “Locally Most Powerful Detector for Secret Key Estimation in Spread Spectrum Data Hiding,” in E. Delp (ed.): Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents VI, San Jose, vol. 5306, 2004, pp. 1-12.
[8] S. Trivedi and R. Chandramouli, “Secret Key Estimation in Sequential Steganography,” IEEE Trans. on Signal Processing, Supplement on Secure Media, 2005.
[9] J. Fridrich, M. Goljan, D. Soukal and T. Holotyak, “Searching for the stego key,” Steganography and Watermarking of Multimedia Contents of EI SPIE, San Jose, vol. 5306, 2004, pp. 70-82.
[10] J. Fridrich, M. Goljan, and D. Soukal, “Forensic Steganalysis: Determining the Stego Key in Spatial Domain Steganography,” in Proc. EI SPIE, San Jose, CA, 2005, pp. 631-642.
[11] ZHAI Wei-dong, LV Shu-wang, and LIU Zhen-hua, “Spatial stego-detecting algorithm in color images based on GGD,” Journal of China Institute of Communications, vol. 25, no. 2, 2004, pp. 33-42.
[12] C. Moroney, Hide and Seek 4.1. Available at: http://www.netlink.co.uk/users/hassop/pgp/hdsk41b.zip, 2010.
[13] J. Fridrich and M. Goljan, “On Estimation of Secret Message Length in LSB Steganography in Spatial Domain,” in Proc. EI SPIE, San Jose, CA, vol. 5306, Jan 2004, pp. 23-34.
Jing LIU was born in Xuancheng, China in 1985. She received
the B.S., M.S. degrees in information security from Zhengzhou
information science and technology
institute, Henan, China, in 2007 and
2010 respectively.
Currently, she is pursuing the PH.D.
degree in information security at
Zhengzhou information science and
technology institute. Her research
interest includes information hiding and
information security.
Guangming Tang was born in Wuhan, China in 1963. She
received the B.S., M.S. and PH.D. degrees in information
security from Zhengzhou information
science and technology institute, Henan,
China, in 1983, 1990, and 2008,
respectively. Her fields of professional interest are information hiding, watermarking and software reliability.
She is presently a professor with the
department of Information Security,
Zhengzhou information science and
technology institute, Henan, China. She
has published 51 research articles and 3 books in these areas.
Ms. Tang is a recipient of Provincial Prize for Progress in
Science and Technology.
Facial Expression Spacial Charts for Describing
Dynamic Diversity of Facial Expressions
H. Madokoro and K. Sato
Department of Machine Intelligence and Systems Engineering, Faculty of Systems Science and Technology,
Akita Prefectural University, Yurihonjo, Japan
Email: {madokoro, ksato}@akita-pu.ac.jp
Abstract— This paper presents a new framework to describe
individual facial expression spaces, particularly addressing
the dynamic diversity of facial expressions that appear as
an exclamation or emotion, to create a unique space for
each person. We name this framework Facial Expression
Spatial Charts (FESCs). The FESCs are created using Self–
Organizing Maps (SOMs) and Fuzzy Adaptive Resonance
Theory (ART) of unsupervised neural networks. For facial
images with emphasized sparse representations using Gabor
wavelet filters, SOMs extract topological information in
facial expression images and classify them as categories in
the fixed space that are decided by the number of units on
the mapping layer. Subsequently, Fuzzy ART integrates categories classified by SOMs using adaptive learning functions
under fixed granularity that is controlled by the vigilance
parameter. The categories integrated by Fuzzy ART are
matched to Expression Levels (ELs) for quantifying facial
expression intensity based on the arrangement of facial
expressions on Russell’s circumplex model. We designate
the category that contains neutral facial expression as the
basis category. Actually, FESCs can visualize and represent
dynamic diversity of facial expressions consisting of ELs
extracted from facial expressions. In the experiment, we
created an original facial expression dataset consisting of
three facial expressions—happiness, anger, and sadness—
obtained from 10 subjects during 7–20 weeks at one-week
intervals. Results show that the method can adequately
display the dynamic diversity of facial expressions between
subjects, in addition to temporal changes in each subject.
Moreover, we used stress measurement sheets to obtain
temporal changes of stress for analyzing psychological effects
of the stress that subjects feel. We estimated stress levels of
four grades using Support Vector Machines (SVMs). The
mean estimation rates for all 10 subjects and for 5 subjects
over more than 10 weeks were, respectively, 68.6% and
77.4%.
Index Terms— Facial Expression Spatial Charts, SOMs,
Fuzzy ART, SVMs, SRS-18.
I. INTRODUCTION
A face sends information of various types. Humans can
recognize intentions and emotions from diverse information that is exhibited through facial expressions. Especially for people with whom we share a close relation,
This paper is based on “Estimation of Psychological Stress Levels
Using Facial Expression Spatial Charts,” by H. Madokoro and K. Sato,
which appeared in the Proceedings of the 2010 IEEE International
Conference on Systems, Man, and Cybernetics (SMC 2010), Istanbul, Turkey, October 2010. © 2010 IEEE.
This work was supported in part by the Ministry of Education,
Culture, Sports, Science and Technology (MEXT), Grant-in-Aid for
Young Scientists (B), No.21700257
we can feel and understand health conditions or moods
directly from facial expressions. For the role of facial
expressions in human communication, it is desirable to
develop advanced interfaces between humans and machines in the future [1].
In the 1970s, from a study to determine how to express emotions related to facial expressions, Ekman and
Friesen defined six facial expressions shown by people
feeling six basic emotions (happiness, disgust, surprise,
sadness, anger, and fear) that are apparently universal
among cultures. They described that these are basic facial
expressions because their associated emotions are distinguishable with high accuracy. However, real expressions
are blended intermediate facial expressions that often
show mixtures of two or three emotions. Human beings
often express various facial expressions simultaneously.
For example, eyes can express crying but the mouth can
express a smile when someone is moved by an extremely
kind deed. Moreover, the processes of expressive facial
expressions contain individual differences such as differences of face shapes among people.
Regarding this difference, Akamatsu described that
human faces present diversity of two types: static diversity
and dynamic diversity [2]. Static diversity is individual
diversity that is configured by facial componential position, size, location, etc., consisting of the eyes, nose,
mouth, and ears. We can identify a person and determine
their gender and impressions using static diversity. We are
able to move facial muscles to express internal emotions
unconsciously and sequentially or express emotions as
a message. This is called dynamic diversity. Facial expressions are expressed as a shift from a neutral facial
expression to one of changed shapes of parts and overall
configurations constructed with the face. For studying facial expression analysis, we must consider and understand
not only static diversity but also dynamic diversity.
When dealing with dynamic diversity of facial expressions, it is necessary to describe relations between
physical parameters of facial pattern changes with expressions and psychological parameters of recognized
emotion. Physical parameters of facial expressions are
the described facial deformations created by expression
under consistent measurements of facial patterns that
differ according to the size and shape of each
person. Ekman and Friesen proposed a Facial Action
Coding System (FACS) as a method to describe facial
expression changes from movements on a facial surface.
Viewed comprehensively, the FACS is the most popular
and standard method for objective description of facial
expressions. It is often used in behavioral sciences and
psychology. The FACS was developed originally as a tool
to measure facial expressions consisting of anatomical
stand-alone Action Units (AUs). The FACS is useful to
realize natural and flexible man-machine interfaces in the
fields of human cognition and behavior science studies.
However, special training is necessary to describe AUs as
an observer. Moreover, movements of facial expressions
that can not be described as AUs exist in practice because
AUs are classified subjectively by an observer to examine numerous quantities of facial expressions. Parameter
description methods aside from those of AUs of FACS
are examined because FACS systematizes appearances
of images as facial expressions, not always as suitable
parameters to describe shape changes of the facial surface
shown by expressions.
In contrast, psychological parameters can be acquired from cognitive judgment tasks related to emotions, in which facial expression images are presented to a subject as visual stimuli. Although the physical parameters of facial expression patterns are unique to each person, the psychological parameters of emotion are universal among humans. The recognized emotion shows various attributes according to the degree of physical change of the facial expression patterns, such as open or closed eyes and mouth. For estimating the degree of emotion, it is necessary to relate the physical variations of individual facial expression patterns to the psychological variations according to their levels. Moreover, because emotion is universal, the feature space that describes these changes must be based on a scale common to each person. The fact that expressed facial expressions differ among people is associated with the fact that the shapes of faces differ among people. For example, the range within which expressions on the facial surface change according to an emotion differs among people. Akamatsu described that adaptive learning mechanisms are necessary to modify models according to the characteristics of expression of each person, as a platform for a classification mechanism of emotion [2].
For organizing and visualizing facial expression spaces, this paper presents a novel framework to describe the dynamic diversity of facial expressions. The framework accommodates dynamic changes of facial expressions as topological changes of facial patterns driven by the facial muscles of expression. For that reason, the framework is suited to describing the richness of facial expressions using Expression Levels (ELs). The target facial expressions are happiness, anger, and sadness from the six basic facial expressions, represented quantitatively and visually as a chart whose axes correspond to each expression. From temporal facial expression images, we use Self-Organizing Maps (SOMs) [3], whose self-mapping characteristics extract facial expression categories according to the expressions.
categories according to expressions. We also use Adap-
© 2012 ACADEMY PUBLISHER
315
tive Resonance Theory (ART) that contains stability and
plasticity that enable classification to integrate categories
adaptively under constant granularity. We infer relations
between categories created by Fuzzy ART and ELs based
on Russell’s circumplex model. This paper presents Facial
Expression Spatial Charts (FESCs) to represent dynamic
diversity of facial expressions as a dynamic and spatial
chart. For the experiment, we created original facial
expression datasets including images obtained during 7–
20 weeks at one-week intervals from 10 subjects with
three facial expressions: happiness, anger, and sadness.
Experimental results show that, by creating FESCs for each subject, our method can visualize and quantify facial expressions across subjects and their temporal changes. For analyzing psychological stress, which is among the factors that affect facial expressions, we use stress measurement sheets to assess temporal changes of stress in each subject. We analyze relations between FESCs and psychological stress values. Moreover, we estimate stress levels from FESCs.
The remainder of this paper is organized as follows. We review related work to clarify the position of this study in Section II. In Section III, we define two terms: ELs, which quantitatively represent individual facial expression spaces, and FESCs, which we propose in this paper. In Section IV, we explain our proposed method: capture of facial expression images, preprocessing, classification of facial expression patterns with SOMs, integration of facial expression categories with Fuzzy ART, and creation of FESCs based on ELs. We explain our originally developed facial expression datasets in Section V. We show results of FESCs for 10 subjects in Section VI. In Section VII, we estimate stress levels using FESCs as an application of our method. Finally, we present conclusions and future work in Section VIII.
II. RELATED WORK
Approaches dealing with facial expressions have shifted from an emphasis on spatial factors, as an extension of pattern classification for recognition of the six basic facial expressions. Increasingly, temporal changes are analyzed to address the problems posed by the dynamic diversity of facial expressions [4]. In this section, we review related work, particularly addressing the latter approaches. In an early study addressing facial expression changes and their timing factors, Bassili [5] classified facial expressions using motion feature points captured by markers applied to a human face. However, the influence of motion components is not clear because the method could not control the stimuli for dynamic changes of facial expressions in the visual psychological experiment.
In recent studies specifically examining the dynamic diversity of facial expressions, Ohta et al. [6] proposed a method based on a model of facial structural elements. They described facial expression patterns using the parameter values of facial muscle configurations matched to facial video images. They created a variable model of the facial structural elements of the eyebrows, eyes, and
mouth. Through comparison with a target facial pattern
and pre-defined standard patterns, this model can extract expression levels to facilitate recognition of facial
expressions. Using expression levels, the intervals and
temporal changes can be extracted to reveal patterns in the
motion of the face that are related to facial expressions.
Moreover, the onset, duration, and termination processes of expressions can be detected from increasing and decreasing expression levels. However, standard facial expression patterns must be set in order to calculate expression levels; therefore, this model can only describe levels up to the maximum of the standard facial expression patterns. This model also has the drawback that the feature points must be set manually on the first frame of the facial images.
Nishiyama et al. [7] proposed facial scores as a framework to interpret dynamic aspects of detailed facial expressions based on the timing structures of movements of facial parts. They used modes, which are facial expressions segmented according to expression levels, as an index to describe fine differences in the temporal factors of facial expression changes. After the static or dynamic status of facial-part movements traced using Active Appearance Models (AAMs) is determined, the facial expression images are segmented using indicators of movement derived from temporal subtraction norms of the feature vectors. The modes are extracted as four patterns by repeatedly integrating intervals with hierarchical clustering defined by the distances between segments. Although the temporal resolution is high for representing timing structures from modes, the capability of describing facial expression degrees is low. Various other methods have also been proposed to extract facial expression intensity [8]–[16]; similarly, their spatial resolution is limited to about four levels, although their temporal resolution is high.
The level of facial expressions differs according to mental state, context, situation, etc. We actively use this difference by taking facial expression images over a long term for each subject. We aim to describe and quantify individual facial expression spaces from the temporal changes in long-term facial expression datasets. The degrees of the expression levels differ among people and with their feelings at any particular time. We therefore create facial expression spaces adaptively and at constant granularity, thereby avoiding quantification relative to a fixed maximum expression level. Moreover, we obtain facial expression images while taking the effects of mental status into consideration.
III. EXPRESSION LEVELS AND FACIAL EXPRESSION SPATIAL CHARTS
In this section, we explain ELs, which quantify individual facial expression spaces, and FESCs, the framework that integrates the ELs.
Figure 1. Correspondence relations between Russell's circumplex model and an FESC: (a) Russell's circumplex model; (b) FESC.
A. Expression Levels

As described in this paper, we introduce Expression Levels (ELs) for quantifying facial expression intensity based on the arrangement of facial expressions on Russell's circumplex model [17], as portrayed in Fig. 1(a). In this model, all emotions are constellated on a two-dimensional space: the pleasure dimension of pleasure–displeasure and the arousal dimension of arousal–sleepiness. The ELs include features of both the pleasure and arousal dimensions. We extract the dynamics of facial parts such as the eyes, eyebrows, and mouth as topological changes of facial expressions. Input images are categorized using the extracted features of topological changes. The ELs are obtained as categories sorted according to their differences in intensity from the expressions that are regarded as neutral facial expressions.

B. Facial Expression Spatial Charts

Facial Expression Spatial Charts (FESCs) are a new framework to describe facial expression spaces and the patterns of ELs constituting each facial expression. Facial expression spaces are spatial configurations of facial expressions that are used to analyze the semantic and polar characteristics of the various emotions portrayed by facial expressions [2]. They represent a correspondence relation between the physical parameters that present the facial changes expressed by facial expressions and the psychological parameters that are recognized as emotions. Psychological parameters can be extracted from psychological experiments in which subjects make cognitive judgments related to emotions. Physical parameters must be described based on a certain standard of types and on the facial deformation that invariably arises from expressions on facial patterns that differ in each person, as represented by FACS.

Our target facial expressions are happiness in the first quadrant, anger in the second quadrant, and sadness in the third quadrant of Russell's circumplex model. Fig. 1 shows the correspondence relation between Russell's circumplex model and an FESC. The value on each axis of the FESC shows the maximum value of the ELs. The FESC is created by connecting the maximum values of the ELs.

IV. PROPOSED METHOD

A. Whole architecture

Akamatsu described the adaptive learning mechanisms necessary for modification according to the individual characteristic features of facial expressions, because the processes of expression differ among individuals. For example, a subject expresses facial surface changes of a certain size; the expressions and their sizes differ among individuals because the shapes of faces differ among people. Therefore, in this study, our target is the intentional facial expressions of a person. We use SOMs for extracting the topological changes of expressions and for normalization with compression in the direction of the temporal axis. SOMs perform unsupervised classification of input data into a mapping space that is defined in advance. After classification by SOMs, the facial images are integrated using Fuzzy ART, which is an adaptive learning algorithm with stability and plasticity. Fuzzy ART performs unsupervised classification at a constant granularity that is controlled by the vigilance parameter. Therefore, using SOMs and Fuzzy ART, time-series datasets showing changes over a long term are classified against a common standard. Moreover, as an application of FESCs, we estimate psychological stress levels using Support Vector Machines (SVMs) [18]. Fig. 2 depicts an overview of the procedure of our proposed method. Detailed procedures of preprocessing, category classification with SOMs, category integration with Fuzzy ART, and stress estimation with SVMs are explained below.

Figure 2. Procedure of the proposed method from acquisition of facial images to generation of FESCs (acquisition of facial images, smoothing of histograms, Gabor wavelet filters, coarse graining of 200 frames to 8 × 9 pixels, SOMs, Fuzzy ART networks, FESCs, SVMs, and estimation of stress grades).

Figure 3. SOMs and Fuzzy ART: (a) SOM (input layer, weights, mapping layer); (b) Fuzzy ART (F0, F1, F2, vigilance parameter).

B. Preprocessing

For this study, we use view-based feature representation of holistic images, not feature-based representation such as AUs. Feature-based representation is superior to view-based representation for detailed description of local feature changes related to expressions. In contrast, feature-based representation demands high calculation costs for extracting and tracing feature points, and it presents problems of precision and stability when numerous samples are processed automatically. For our method, we use view-based representation after converting the images with Gabor wavelet filters, which show characteristics similar to those of the human primary visual cortex. Our processing target is to extract ELs from the pattern changes of one facial expression relative to the neutral facial expression. Therefore, we consider that the changed parts are apparent in the feature space after Gabor wavelet conversion, without tracking feature points based on AUs.

The period during which images were obtained extended from several weeks to several months. We were unable to constrain external factors completely, e.g. lighting variations, although we took the facial expression images under constant conditions. Therefore, in the first step, the brightness values are preprocessed by normalizing the histogram of the target images.

In the next step, features are extracted using Gabor wavelet filtering. In the fields of computer vision and image processing, the Gabor wavelet representation is a popular information-processing model based on human visual characteristics. The Gabor wavelet representation, which can emphasize an arbitrary feature through its inner parameters, shows the same characteristic of response selectivity found in a receptive field.

At the final step, we applied downsampling for noise reduction and compression of the data size. In this method, we set the initial position of a template to contain the facial parts when capturing facial images, and we use template matching to trace the region of interest of the face in real time. However, the trace results of the region of interest contain errors caused by body motion; these errors can be removed through the downsampling procedure. The downsampling window that we set is 10 × 10 pixels, so the dimension of the target images is compressed from 80 × 90 pixels to 8 × 9 pixels.

C. Classification of facial patterns with SOMs

For classification according to ELs, the 200-frame images are normalized to a constant range. In this method, we
used SOMs, which are unsupervised neural networks with competitive learning in neighborhood regions. Fig. 3(a) depicts the network architecture of a SOM. The network architecture of a SOM typically includes two layers: the input layer and the mapping layer. All units on the mapping layer are connected to all units of the input layer, with weights maintained between the two layers. When a set of input data is presented, the unit whose weights are most similar to the input data fires. The weights of the firing unit and its neighboring units are updated to move closer to the input data; this constitutes the learning of SOMs. The similarity among the input data is reflected, through topology preservation, in the distance of the firing unit on the one-dimensional or two-dimensional array of units. As learning progresses, similar features are mapped to neighboring units, whereas dissimilar features are mapped to distant units. The SOM training algorithm is the following.
1) Let $w_{i,j}(t)$ be the weight from input unit $i$ to mapping-layer unit $j$ at time $t$. The weights are initialized with random numbers.
2) Let $x_i(t)$ be the input data presented to input unit $i$ at time $t$. The Euclidean distance $d_j$ between the input and the weights of unit $j$ is calculated as
$d_j = \sqrt{\sum_{i=1}^{I} (x_i(t) - w_{i,j}(t))^2}$.  (1)
3) The winning unit $c$ is defined as the unit for which $d_j$ becomes minimum:
$c = \arg\min_j (d_j)$.  (2)
4) Let $N_c(t)$ be the set of units in the neighborhood of unit $c$. Each weight $w_{i,j}(t)$ inside $N_c(t)$ is updated using the Kohonen training rule, where $\alpha(t)$ is the training coefficient, which decreases with time:
$w_{i,j}(t+1) = w_{i,j}(t) + \alpha(t)\,(x_i(t) - w_{i,j}(t))$.  (3)
5) Training is finished when the iterations reach the maximum number.
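To make the procedure above concrete, the following Python sketch implements the training rule of Eqs. (1)–(3) with NumPy. The map size, the learning-rate schedule, and the Gaussian neighborhood function are our own illustrative assumptions; the paper itself does not report these settings.

import numpy as np

def train_som(data, map_h=10, map_w=10, n_iter=2000, alpha0=0.5, sigma0=3.0):
    # Minimal SOM trainer following Eqs. (1)-(3): Euclidean winner search and
    # Kohonen update of the winner's neighborhood. Map size, schedules, and the
    # Gaussian neighborhood are illustrative assumptions, not the authors' settings.
    n_samples, dim = data.shape
    rng = np.random.default_rng(0)
    weights = rng.random((map_h, map_w, dim))          # random initialization
    # grid coordinates of every mapping-layer unit, used for the neighborhood
    grid = np.stack(np.meshgrid(np.arange(map_h), np.arange(map_w), indexing="ij"), axis=-1)

    for t in range(n_iter):
        alpha = alpha0 * (1.0 - t / n_iter)            # decreasing training coefficient
        sigma = max(sigma0 * (1.0 - t / n_iter), 0.5)  # shrinking neighborhood radius
        x = data[rng.integers(n_samples)]              # present one input vector

        d = np.linalg.norm(weights - x, axis=-1)       # Eq. (1): distance to every unit
        c = np.unravel_index(np.argmin(d), d.shape)    # Eq. (2): winning unit

        # Eq. (3): update the winner and its neighbors N_c(t)
        grid_dist = np.linalg.norm(grid - np.array(c), axis=-1)
        h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))[..., None]
        weights += alpha * h * (x - weights)
    return weights

# Example: 200 preprocessed frames, each an 8 x 9 Gabor feature image (72 dims, placeholder data)
frames = np.random.rand(200, 72)
som_weights = train_som(frames)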
D. Integration of facial patterns with Fuzzy ART

The input data are classified into a fixed number of units on the mapping layer; therefore, the classification results are relative. In contrast, classification under a fixed granularity is required for the long-term datasets of each subject. In our method, facial expression categories are integrated by applying Fuzzy ART to the learned weights of the SOMs. ART, which was proposed by Grossberg et al., is a theoretical model of unsupervised, self-organizing neural networks that form categories adaptively in real time while maintaining stability and plasticity. ART has many variations [19]. We use Fuzzy ART [20], which accepts analog-valued inputs. Fig. 3(b) depicts the network architecture of Fuzzy ART, which consists of three fields: Field 0 (F0), which receives the input data; Field 1 (F1), for feature representation; and Field 2 (F2), for category representation.

Figure 4. Mean images in each EL of the three facial expressions: (a) happiness; (b) anger; (c) sadness.

The Fuzzy ART algorithm is the following.
The numbers of neurons on F1 and F2 are, respectively, $m$ and $n$. The weights $w_i$ connect each F2 neuron $i$ to the corresponding F1 neurons; all $w_i$ are initialized to one. Fuzzy ART dynamics are determined by a choice parameter $a$ ($a > 0$), a learning rate parameter $r$ ($0 \le r \le 1$), and a vigilance parameter $p$ ($0 \le p \le 1$).
For an input $x$ and each unit $i$ on F2, the choice function $T_i$ is defined as
$T_i = \dfrac{|x \wedge w_i|}{a + |w_i|}$.  (4)
The unit $c$ for which $T_i$ is maximum is selected as the winning category; if more than one unit is maximal, the category with the smallest index is chosen. When category $c$ is selected, the $c$-th neuron on F2 is set to 1 and the other neurons are set to zero. Resonance or reset is then judged by testing whether the selected category matches $x$. For the activation value $x \wedge w_c$ propagated from the signal of the $c$-th unit on F2 to F1, if the match function satisfies
$\dfrac{|x \wedge w_c|}{|x|} \ge p$,  (5)
then resonance occurs between $x$ and $c$: the chosen category is accepted and its weights are updated as
$w_c' = r\,(x \wedge w_c) + (1 - r)\,w_c$.  (6)
Otherwise, $c$ is reset, the unit with the next largest value of $T_i$ is chosen, and resonance or reset is judged again. Fuzzy ART creates a new unit on F2 if all units are reset.
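As a hedged illustration of Eqs. (4)–(6), the sketch below runs the choice–match–reset loop over a set of input vectors scaled to [0, 1]. The parameter values and the omission of complement coding are assumptions made for brevity, not settings reported by the authors.

import numpy as np

def fuzzy_art(inputs, a=0.01, r=1.0, p=0.75):
    # Minimal Fuzzy ART following Eqs. (4)-(6).
    # inputs: (n_samples, dim) array with values scaled to [0, 1];
    # a: choice parameter, r: learning rate, p: vigilance.
    # The values of a, r, p are illustrative; complement coding is omitted.
    weights = []                       # one weight vector per F2 category
    labels = []
    for x in inputs:
        if not weights:
            weights.append(x.copy())   # the first input creates the first category
            labels.append(0)
            continue
        W = np.array(weights)
        T = np.sum(np.minimum(x, W), axis=1) / (a + np.sum(W, axis=1))   # Eq. (4)
        chosen = None
        for c in np.argsort(-T, kind="stable"):   # ties resolved toward the smallest index
            match = np.sum(np.minimum(x, W[c])) / (np.sum(x) + 1e-12)    # Eq. (5)
            if match >= p:
                chosen = int(c)        # resonance: accept category c
                break
            # otherwise category c is reset and the next-largest T_i is tried
        if chosen is None:
            weights.append(x.copy())   # all categories reset: create a new F2 unit
            labels.append(len(weights) - 1)
        else:
            weights[chosen] = r * np.minimum(x, weights[chosen]) + (1 - r) * weights[chosen]   # Eq. (6)
            labels.append(chosen)
    return np.array(weights), labels

# Example: integrate learned SOM weight vectors (scaled to [0, 1]) into categories.
som_vectors = np.random.rand(100, 72)
categories, assignments = fuzzy_art(som_vectors, p=0.75)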
E. Allocation of ELs to FESCs

The facial expression categories classified by SOMs and integrated by Fuzzy ART are sorted into the order of ELs starting from the neutral facial expression category. In this dataset, the number of neutral facial expression images is the largest; the category containing the maximum number of images is therefore selected as the neutral facial expression category. The ELs are then sorted by the correlation values of each category. Fig. 4 presents an example of categories corresponding to ELs. The numbers of ELs of happiness, anger, and sadness are, respectively, eight steps, nine steps, and six steps. This number is the same as the number of categories integrated using Fuzzy ART.
The maximum values of the ELs are depicted on an FESC. The center of an FESC is EL = 0, which represents a neutral facial expression. With increasing ELs, facial expression categories are assigned toward the outside of the triangle.
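The ordering step can be sketched as follows. The use of Pearson correlation between category mean images, and the highest-correlation-first ordering, are our reading of the description above rather than details given by the authors.

import numpy as np

def assign_els(frame_categories, category_means):
    # frame_categories: category label of every frame in one dataset;
    # category_means: dict {category: mean feature vector of that category}.
    # Pearson correlation is an assumed choice of correlation measure.
    ids, counts = np.unique(frame_categories, return_counts=True)
    neutral = ids[np.argmax(counts)]          # the most frequent category is neutral (EL = 0)
    corr = {c: np.corrcoef(category_means[c], category_means[neutral])[0, 1]
            for c in ids if c != neutral}
    ordered = sorted(corr, key=corr.get, reverse=True)   # high correlation with neutral = low EL
    el_of = {neutral: 0}
    el_of.update({c: level for level, c in enumerate(ordered, start=1)})
    return el_of

def fesc_axis_values(el_by_expression):
    # Each FESC axis shows the maximum EL reached by that expression.
    return {expr: max(els.values()) for expr, els in el_by_expression.items()}

# Hypothetical usage for one subject and one week:
# els = {"happiness": assign_els(ids_h, means_h), "anger": assign_els(ids_a, means_a),
#        "sadness": assign_els(ids_s, means_s)}
# chart = fesc_axis_values(els)   # e.g. {"happiness": 8, "anger": 9, "sadness": 6}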
F. Stress Estimation with SVMs
In this study, we specifically examine the effects of psychological stress on facial expressions. We measured psychological stress values using a stress checking sheet at the same time as taking the facial expression images. As an application of our study, we estimate stress levels using FESCs. We used SVMs [18], which achieve high recognition capability by mapping input data to a high-dimensional space using kernel tricks.
Typical choices for the kernel function $K$ include the polynomial kernel, the Radial Basis Function (RBF) kernel, and the sigmoid kernel. For this study, we used the RBF kernel defined as
$K(x, x_i) = \exp\!\left(-\dfrac{\|x - x_i\|^2}{\lambda}\right)$,  (7)
where $\lambda$ is the variance of the RBF. The behavior of the kernel differs depending on the setting of $\lambda$; therefore, we evaluate our method while varying $\lambda$ over a certain range.
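For reference, Eq. (7) can be written directly as a precomputed kernel; the value of λ below is a placeholder, since the paper evaluates a range of settings rather than a single value.

import numpy as np

def rbf_kernel(X, Y, lam=1.0):
    # K(x, x_i) = exp(-||x - x_i||^2 / lambda), as in Eq. (7); lam is a placeholder value.
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)                     # pairwise squared Euclidean distances
    return np.exp(-np.maximum(sq, 0.0) / lam)  # clip tiny negatives from round-off

# A Gram matrix built this way can be passed to an SVM implementation that
# accepts precomputed kernels, e.g. scikit-learn's SVC(kernel="precomputed").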
V. DATASETS
For this study, we created an original dataset dealing with long-term facial expression changes. Additionally, for analyzing the psychological effects on the temporal changes of ELs, we measured psychological stress levels using a stress sheet at each session of taking facial expression images.
A. Facial expression dataset
Humans show facial expressions of two types: spontaneous facial expressions and intentional facial expressions. Although spontaneous facial expressions have the advantage of corresponding directly to affect or emotion, collecting a steady, long-term dataset of them, with subjects paying no regard to the camera or the situation, is a challenging task. Moreover, the cause-and-effect relation between emotions and spontaneous facial expressions is uncertain. In contrast, intentional facial expressions are used as a communication method to convey something positively to another person, especially in social communication. We therefore set the target of creating an original intentional facial expression dataset, so as to obtain long-term facial expression data for selected subjects. Moreover, intentional facial expression datasets are suitable for maintaining the number of subjects required for a horizontal dataset.
Open datasets of facial expression images have been released by some universities and research institutes, and are generally used in many conventional studies for performance comparisons of facial expression recognition or automatic analysis of facial expressions [1]. However, the specifications vary among datasets. As static facial images, the dataset presented by Ekman and Friesen is a popular collection of various facial expressions used as visual stimuli in psychological examinations of facial expression cognition. As dynamic facial images, the Cohn–Kanade dataset and the Ekman–Hager dataset are widely used, especially in experimental applications [21]. In recent years, the MMI Facial Expression Database presented by Pantic et al. [22] has become a widely used open dataset containing both static and dynamic images. These dynamic datasets contain a sufficient number of subjects to serve as horizontal datasets. However, images are taken only once for each person; no dataset exists in which the same subject has been traced over a long term.
Facial expression patterns differ each week, even for the same subject, according to the subject's mental status, circumstances, and context. We set a long acquisition period for obtaining facial expression images, thereby creating individual FESC models and using them to estimate stress levels for each subject. Existing methods created an integrated model for the classification and recognition of facial expressions, although expression patterns are affected by individual differences. We instead create individual models for each subject to extract ELs, create FESCs, and estimate stress levels using each FESC. For the development of these individual models, we create long-term facial expression datasets compiled from information gathered over periods as long as 20 weeks.
B. Target facial expressions
In our daily life, we frequently observe mixed facial
expressions rather than a single facial expression. We
believe that the estimation accuracy can be improved if
mixed facial expression datasets are used. Considering
the load for subjects, it is a challenging task to collect
numerous samples of mixed facial expressions. Moreover,
the mixture ratio of facial expressions remains unclear.
Therefore, the target of this study is particular facial
expressions of three types.
To reduce the load on subjects when taking long-term facial expression images, we selected the target facial expressions from the six basic facial expressions described by Ekman. As a cultural factor of expression, Ekman reported that Japanese people express a smile even at times when they feel disgust [23]; we therefore considered disgust a difficult facial expression for Japanese subjects to pose. Regarding fear, we received opinions that it is extremely difficult to pose because it arises only in rare situations that are not usually encountered in daily life. For surprise, many subjects, especially men, felt embarrassed, although they had no difficulty in expressing it; moreover, mixed facial expressions with happiness were apparent. Considering these restrictions, the opinions of the subjects, and the assignment of facial expressions on Russell's circumplex model, we selected happiness, anger, and sadness as the target facial expressions.
Figure 5. Set of target images and regions of interest: (a) acquired images (200 frames); (b) ROI (80 × 90 pixels).
C. Acquisition of facial expression images
We took images of three facial expressions from 10 subjects over a long term. The acquisition periods differed among subjects, but images were taken over 7–20 weeks at one-week intervals. The subjects were university students: five females (Subjects A, B, C, and D were 19 years old; Subject E was 21) and five males (Subjects F and J were 19; Subjects G, H, and I were 22).
We began to take facial expression images once a subject had become accustomed to the experimental environment after some trials. Considering generality and usability, we used a USB camera (Qcam, Logicool; Logitec Corp.). We set the environment to simulate a normal indoor condition (lighting by fluorescent lamps). We took frontal facial images that include the region containing facial components such as the eyebrows, eyes, nose, and mouth. We instructed subjects in advance to keep the head position as still as possible. The images were fitted to a constant range including the facial region. We used a method based on Haar-like features and Boosting [24] to track the face region and adjust the centers of the images.
Fig. 5 depicts captured images. We set the region of interest to 80 × 90 pixels, including the eyebrows, eyes, nose, mouth, cheeks, and jaw, which all contribute to the impression of the whole face as facial feature components. One dataset consists of facial expression image sequences containing the neutral facial expression and the facial expression indicated to the subject. As a guideline, we instructed subjects to express an emotion 3–4 times during the image-taking time of 20 s. One set of data consists of 200 frames at a sampling rate of 10 frames per second.
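A rough sketch of this acquisition step is given below, using OpenCV's Haar-cascade face detector to stand in for the Haar-like-feature tracking of [24]. The cascade file, the largest-face heuristic, and the exact cropping are illustrative assumptions; only the frame count, frame rate, and 80 × 90 ROI follow the description above.

import cv2

# Sketch of the acquisition step (200 frames at 10 fps, ROI of 80 x 90 pixels).
# The cascade file and the cropping heuristic are assumptions, not the authors'
# exact implementation.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                    # USB camera
cap.set(cv2.CAP_PROP_FPS, 10)

frames = []
while len(frames) < 200:                     # one set = 200 frames (about 20 s)
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        continue
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detected face
    roi = cv2.resize(gray[y:y + h, x:x + w], (80, 90))   # fixed 80 x 90 region of interest
    frames.append(roi)
cap.release()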
As conditions of this experiment, we explained the following preferred procedures to all subjects in advance:
• show posed facial expressions to the camera;
• avoid movement of the head as much as possible;
• repeat expressions three times during the acquisition time of 20 s;
• make a maximum facial expression;
• return to the neutral facial expression in each interval; and
• relax.
Subjects were not trained using FACS. We agreed with participants that the facial expression images would not be used except for this research. We did not explain details of this study or of the datasets used for stress level estimation.

D. Stress measurements
For this study, we used the stress measurement sheets known as the Stress Response Scale 18 (SRS-18) by Suzuki et al. [25]. The SRS-18 comprises question sheets that can easily measure, in a short time, responses related to psychological stress, covering many responses that we meet in daily life. Specific psychological stress responses caused by stressors include gloom, anxiety, and anger (emotional responses), lethargy and difficulty concentrating (cognitive responses), and decreased work efficiency (behavioral responses). This sheet can measure stress responses according to three factors: dysphoria or anxiety, displeasure or anger, and lassitude. The SRS-18 has 18 questions, each eliciting one of four answers: strongly no, no, yes, and strongly yes. The scores for the answers correspond respectively to zero to three points, so the range of the total is 0–54 points. A high total score indicates a high level of psychological stress. Moreover, four grades are classified from the points: Level 1 (weak), Level 2 (normal), Level 3 (slightly high), and Level 4 (high). Subjects complete this sheet before the facial expression images are taken, to reduce the influence of the stress check on the results.
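The scoring described above reduces to simple arithmetic, sketched below. The grade cut-off points are placeholders only; the actual boundaries are defined in the SRS-18 manual [25] and are not reproduced in this paper.

def srs18_score(answers):
    # Total SRS-18 score: 18 items, each answered 0 (strongly no) to 3 (strongly yes),
    # giving a total of 0-54 points.
    assert len(answers) == 18 and all(0 <= a <= 3 for a in answers)
    return sum(answers)

def stress_grade(total, cutoffs=(10, 20, 31)):
    # Map a total score to Level 1 (weak) ... Level 4 (high).
    # The cut-off values here are placeholders; the real boundaries come from
    # the SRS-18 manual [25] and are not stated in this paper.
    for level, upper in enumerate(cutoffs, start=1):
        if total <= upper:
            return level
    return 4

# Example: a sheet answered mostly "no" gives a low total and a low stress level.
print(stress_grade(srs18_score([1, 0, 2, 1] + [0] * 14)))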
Because the SRS-18 is based on subjective assessment, we consider that subjective factors may affect estimation accuracy depending on the subject. An alternative is a salivary amylase monitor produced by Nipro Corp., which measures stress values inferred from activated salivary amylase; Kashino et al. [26] describe the usability of stress measurement and evaluation using such a monitor. For this measurement, subjects must refrain from eating for two hours beforehand, and during the measurement they hold a chip in the mouth for about 30 s to sample saliva. The load on subjects in this study would therefore be very high. Furthermore, the salivary amylase monitor responds sensitively to momentary stress and presents a wide variation of measurement results. In contrast, the SRS-18, which consists of only 18 question items, can be completed in a short time and can be used to assess various bodily responses. We therefore consider the SRS-18 a useful check sheet for our experiment.
VI. EXPERIMENTAL RESULTS
In this section, we first present the results of each step of creating an FESC, applied to one subject. Subsequently, we present the FESC results for all 10 subjects.
A. Results of an FESC (Subject A)
For correspondence with ELs, we extracted the categories classified with SOMs and integrated with Fuzzy ART. The category that contains the maximum number of images is selected as the neutral facial expression category. For example, in Fig. ??, the sixth category, which corresponds to 60 images, is the base category. Each category is put into correspondence with an EL by calculating the correlations among categories and sorting them toward smaller correlation values.

Figure 6. Transition of ELs in each facial expression (Subject A at the ninth week): (a) happiness; (b) anger; (c) sadness.

Figure 7. FESCs (Subject A): (a) 1st week; (b) 2nd week; (c) 3rd week; (d) 4th week; (e) 9th week; (f) 20th week.
We evaluated the temporal changes of ELs to verify the correspondence between ELs and facial expressions. Fig. 6 presents the temporal changes of the ELs of happiness, anger, and sadness. The horizontal axis shows the frames, of which each dataset contains 200; the vertical axis shows the ELs. We marked dashed vertical lines at the start and end positions of each expression. The subject showed each expression three or four times during one dataset. In this dataset of Subject A at the ninth week, happiness is expressed three times; anger and sadness are each expressed four times. The start and end timings of expression are represented as changes of the ELs. The ELs change according to the expressions, although the result contains slight variation.
Fig. 7 depicts some examples of FESCs. The FESCs show temporal changes of facial expression patterns, which changed from week to week in the same subject. We consider that these changes are attributable to psychological effects. In the next section, we analyze these results together with the stress assessed using the SRS-18.
B. Results of FESCs (10 subjects)
We created FESCs for the 10 subjects. The image acquisition periods differ among subjects, ranging from 7 to 20 weeks. Fig. 8 portrays the average FESCs of the 10 subjects. For the facial expression of happiness, among the average ELs of all subjects, the maximum value is 9.1, for Subject J, and the minimum value is 5.6, for Subject G. For facial expressions of anger, Subjects E and F respectively show the maximum and minimum ELs. For facial expressions of sadness, Subjects C and J respectively show the maximum and minimum ELs. The triangle of the FESC of Subject J is smaller than those of the other subjects.
Regarding sex differences, Kring et al. [27] report that the emotional expressions shown by women's faces are richer than those of men. In our experience, most female subjects were not averse to showing intense facial expressions in front of a camera, whereas we inferred that male subjects were shy or not good at making facial expressions. This trend appears in the FESCs as the size of the triangles.
VII. ESTIMATION OF STRESS LEVELS

Although spontaneous and intentional facial expressions are triggered by emotional changes and by intentional social constraints, such as when one makes a fake smile, the degree of the actual facial expression is modified by various psychological effects, the situation, the atmosphere, etc. In this study, we acquired facial expression images continually over a long period in an identical situation. The FESCs show different distributions each week; therefore, the ELs, which show the degrees of expression in our method, also differ each week. In this experiment, we specifically examine the relation between expressions and psychological stress in order to estimate stress levels from FESCs.
We used SVMs [18], which achieve high recognition capability by mapping input data to a high-dimensional space using kernel tricks. We evaluated estimation rates using Leave-One-Out Cross-Validation (LOOCV). The estimation targets are stress evaluation values of four
steps: Level 1 (weak), Level 2 (normal), Level 3 (slightly high), and Level 4 (high). Herein, the distribution of the target datasets is 48.6% Level 2, 33.6% Level 1, 13.1% Level 3, and 4.7% Level 4. We used Radial Basis Functions (RBFs) as the kernel function for the SVMs. We set the γ parameter, which controls the spread of the Gaussian functions, to the inverse of the number of input vectors.

Figure 8. FESCs of 10 subjects (Subjects A–E, male; Subjects F–J, female): (a) Subject A (20 weeks); (b) Subject B (12 weeks); (c) Subject C (8 weeks); (d) Subject D (7 weeks); (e) Subject E (7 weeks); (f) Subject F (11 weeks); (g) Subject G (13 weeks); (h) Subject H (13 weeks); (i) Subject I (10 weeks); (j) Subject J (7 weeks).
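The evaluation protocol described above can be sketched as follows with scikit-learn. The feature layout (the three maximum ELs of each weekly FESC), the random data, and the reading of the γ setting as 1 divided by the input dimension are assumptions for illustration, not the authors' exact configuration.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

# Placeholder data: one FESC (happiness, anger, sadness maximum ELs) per week,
# with the SRS-18 stress grade (Level 1-4) of that week as the target.
X = np.random.rand(13, 3) * 10
y = np.random.randint(1, 5, size=13)

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # gamma = 1 / n_features is one reading of "the inverse number of input vectors"
    clf = SVC(kernel="rbf", gamma=1.0 / X.shape[1])
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
print("LOOCV estimation rate: %.1f%%" % (100.0 * correct / len(X)))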
Fig. 9 portrays the stress estimation results for the 10 subjects. The mean estimation rate is 68.6%. The highest estimation rate, 90.9%, is that of Subject B; subsequently, the estimation rates of Subjects G and F are, respectively, 84.6% and 81.8%. The estimation rate of Subject A, whose facial expression images were taken the most times, is 80.0%. In contrast, the estimation rates of Subjects C and I are both 50.0%, and the lowest estimation rate, 42.9%, is that of Subject J. The data lengths of Subjects J and C are, respectively, only seven and eight weeks. We consider that it is difficult for SVMs to create a classifier of
four categories from so few data. Therefore, we selected the datasets of the subjects recorded for more than 10 weeks; the mean estimation rate of Subjects A, B, F, G, H, and I is 77.4%. We consider that the estimation performance would be improved if long-term datasets of more than 10 weeks were obtained by continuing to collect vertical datasets.

Figure 9. Stress estimation results with SVMs (estimation accuracy [%] for each subject).
Using our method, we achieved efficient estimation of stress levels, although the SVMs were trained under a disproportionate training data distribution. We evaluated all datasets using LOOCV. The number of datasets for each stress level varies: the number of Level 2 datasets is the largest, about 50%, and this rate reaches 80% when the Level 1 datasets are included. Moreover, five patterns of datasets, corresponding to four subjects, consisted of a single set of data for a stress level, and six patterns from five subjects produced only two samples. We used all of these few-sample datasets without exception, although it is difficult to learn from and estimate such samples with conventional generalization capability. Collecting these data evenly is a challenging task because stress distributions vary among individuals. We consider that the estimation performance would be improved by increasing the image acquisition period, enabling us to address seasonal variations.
VIII. CONCLUSION

This paper has presented FESCs as a framework to describe individual facial expression spaces, based on the consideration that facial expressions created by emotion form an individual space for each person. The ELs are created from categories that are classified by SOMs and integrated with Fuzzy ART. The FESCs are created with axes given by the ELs of three facial expressions (happiness, anger, and sadness), based on Russell's circumplex model. We created an original facial expression dataset of 10 subjects (five male and five female) over periods of seven weeks or longer. Using this dataset, our method can express individual facial expression spaces using FESCs. Moreover, we used the SRS-18 to measure the stress level of each subject before taking images, and analyzed the effects of psychological stress using FESCs. The results show that happiness and sadness are affected by stress in most subjects.
Future studies must evaluate the discrimination of intentional and spontaneous facial expressions using the horizontal symmetry properties of the face, and must represent facial expression rhythms created by individual patterns of temporal changes of ELs. Moreover, we will seek to increase the number of subjects for horizontal studies between subjects, and to capture long-term datasets for vertical studies of each subject, in order to analyze and elucidate the relations between facial expressions and stress.
ACKNOWLEDGMENT
The authors would like to thank fellow engineers at
SmartDesign Co. Ltd. for providing much appreciated
advice and for developing a system for taking facial
images for these experiments. We also thank the 10
students at our university who participated as subjects by
letting us take facial images over such a long period.
REFERENCES
[1] M. Pantic and L.J.M. Rothkrantz, “Automatic Analysis of
Facial Expressions: The State of the Art,” IEEE Trans. on
Pattern Anal. and Mach. Intell., vol. 22, no. 12, pp. 1424-1445, Dec. 2000.
[2] S. Akamatsu, “Recognition of Facial Expressions by Human and Computer [I]: Facial Expressions in Communications and Their Automatic Analysis by Computer,” The
Journal of the Institute of Electronics, Information, and
Communication Engineers, vol. 85, no. 9, Sep. 2002, pp.
680-685.
[3] T. Kohonen, “Self-organized formation of topologically
correct feature maps,” Biological Cybernetics, vol. 43, no.
1, pp. 59–69, 1982.
[4] M. Pantic and I. Patras, “Dynamics of Facial Expression:
Recognition of Facial Actions and Their Temporal Segments From Face Profile Image Sequences,” IEEE Trans.
Syst., Man, Cybern. B, Cybern., vol. 36, no. 2, pp. 433-449, April 2006.
[5] J.N. Bassili, “Facial motion in the perception of faces
and of emotional expression,” Journal of Experimental
Psychology: Human Perception and Performance, vol. 4,
no. 3, pp. 373-379, 1978.
[6] H. Ohta, H. Saji, and H. Nakatani, “Recognition of Facial
Expressions Using Muscle-Based Feature Models,” Trans.
of the Institute of Electronics, Information and Communication Engineers, D-II, vol. J82-DII, no. 7, July 1999, pp.
1129-1139.
[7] M. Nishiyama, H. Kawashima, T. Hirayama, and T. Matsuyama, “Facial Expression Representation based on Timing Structures in Faces”, IEEE International Workshop on
Analysis and Modeling of Faces and Gestures, pp. 140154, 2005.
[8] M. Oda and K. Isono, “Effects of time function and
expression speed on the intensity and realism of facial
expressions,” Proc. IEEE International Conference on Systems, Man and Cybernetics, pp. 1103–1109, 2008.
[9] P. Yang, Q. Liu, and D. N. Metaxas, “RankBoost with
l1 regularization for facial expression recognition and
intensity estimation,” Proc. IEEE International Conference
on Computer Vision, pp. 1018–1025, 2009.
[10] K.K. Lee and Y. Xu, “Real-time estimation of facial
expression intensity,” Proc. IEEE International Conference
on Robotics and Automation, vol. 2, pp. 2567–2572, 2003.
[11] M.A. Amin and H. Yan, “Expression intensity measurement from facial images by self organizing maps,” Proc.
International Conference on Machine Learning and Cybernetics, vol. 6, pp. 3490–3496, 2008.
© 2012 ACADEMY PUBLISHER
323
[12] S. Chin and K.Y. Kim, “Emotional Intensity-based Facial Expression Cloning for Low Polygonal Applications,”
IEEE Transactions on Systems, Man, and Cybernetics
(Part C), vol. 39, no. 3, pp. 315–330, 2009.
[13] J.J. Lien, T. Kanade, J.F. Cohn, and C.C. Li, “Subtly different facial expression recognition and expression intensity
estimation,” Proc. IEEE Computer Vision and Pattern
Recognition, pp. 853–859, 1998.
[14] J.R. Delannoy and J. McDonald, “Automatic estimation
of the dynamics of facial expression using a three–level
model of intensity,” Proc. IEEE International Conference
on Automatic Face & Gesture Recognition, pp. 1-6, 2008.
[15] L. Yin, X. Wei, P. Longo, and A. Bhuvanesh, “Analyzing
Facial Expressions Using Intensity-Variant 3D Data For
Human Computer Interaction,” Proc. International Conference on Pattern Recognition, vol. 1, pp. 1248–1251, 2006.
[16] J. Reilly, J. Ghent, and J. McDonald, “Non-Linear Approaches for the Classification of Facial Expressions at
Varying Degrees of Intensity,” Proc. Machine Vision and
Image Processing Conference, pp. 125–132, 2007.
[17] J.A. Russell and M. Bullock, “Multidimensional Scaling of Emotional Facial Expressions: Similarity From
Preschoolers to Adults,” Journal of Personality and Social
Psychology, vol. 48, 1985, pp. 1290-1298.
[18] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[19] G.A Carpenter and S. Grossberg, Pattern Recognition by
Self-Organizing Neural Networks, The MIT Press, 1991.
[20] G.A Carpenter, S. Grossberg, and D.B. Rosen, “Fuzzy
ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system,” Neural Networks,
vol. 4, pp. 759-771, 1991.
[21] T. Kanade, J.F. Cohn, and Y. Tian,“Comprehensive
database for facial expression analysis,” Proc. Face and
Gesture, pp. 46-53, 2000.
[22] M. Pantic, M.F. Valstar, R. Rademaker and L. Maat, “Webbased Database for Facial Expression Analysis”, Proc.
IEEE Int’l. Conf. Multimedia and Expo, July 2005.
[23] P. Ekman, “Emotions Revealed: Understanding Faces and
Feelings,” Orion Publishing, 2004.
[24] P. Viola and M.J. Jones, “Rapid Object Detection using
a Boosted Cascade of Simple Features,” Proc. Computer
Vision and Pattern Recognition, pp. 511-518, 2001.
[25] S. Suzuki, Stress Response Scale-18, Kokoronet, Jul. 2007.
[26] A. Kashino, H. Kashiwa, and K. Suemori, “Stress evaluation of clinical laboratory technologists by the salivary
amylase monitor on working time,” The medical journal
of Kawasaki Hospital, vol. 5, pp. 25-27, 2010.
[27] A.M. Kring and A.H. Gordon, “Sex Differences in Emotion: Expression, Experience, and Physiology,” Journal of
Personality and Social Psychology, vol. 74, pp. 686-703,
1998.
Hirokazu Madokoro received the ME degree in information
engineering from Akita University in 2000 and joined Matsushita Systems Engineering Corporation. He moved to Akita
Prefectural Industrial Technology Center and Akita Research
Institute of Advanced Technology in 2002 and 2006, respectively. He received the PhD degree from Nara Institute of
Science and Technology in 2010. He is currently an assistant
professor at the Department of Machine Intelligence and Systems Engineering, Akita Prefectural University. His research
interests include machine learning and robot vision. He is
a member of the Robotics Society of Japan, the Japan Society
for Welfare Engineering, the Institute of Electronics, Information
and Communication Engineers, and the IEEE.
Kazuhito Sato received the ME degree in electrical engineering
from Akita University in 1975 and joined Hitachi Engineering Corporation. He moved to Akita Prefectural Industrial
Technology Center and Akita Research Institute of Advanced
Technology in 1979 and 2005, respectively. He received the
PhD degree from Akita University in 1997. He is currently an
associate professor at the Department of Machine Intelligence
and Systems Engineering, Akita Prefectural University. He is
engaged in the development of equipment for noninvasive inspection of electronic parts, various kinds of expert systems, and MRI brain image diagnostic algorithms. His current research interests include biometrics, medical image processing, facial expression analysis, and computer vision. He is a member of the
Medical Information Society, the Medical Imaging Technology
Society, the Japan Society for Welfare Engineering, the Institute
of Electronics, Information and Communication Engineers, and
the IEEE.
Call for Papers and Special Issue Proposals
Aims and Scope.
Journal of Multimedia (JMM, ISSN 1796-2048) is a scholarly peer-reviewed international scientific journal published bimonthly, focusing on
theories, methods, algorithms, and applications in multimedia. It provides a high profile, leading edge forum for academic researchers, industrial
professionals, engineers, consultants, managers, educators and policy makers working in the field to contribute and disseminate innovative new work
on multimedia.
The Journal of Multimedia covers the breadth of research in multimedia technology and applications. JMM invites original, previously
unpublished, research, survey and tutorial papers, plus case studies and short research notes, on both applied and theoretical aspects of multimedia.
These areas include, but are not limited to, the following topics:
• Multimedia Signal Processing
• Multimedia Content Understanding
• Multimedia Interface and Interaction
• Multimedia Databases and File Systems
• Multimedia Communication and Networking
• Multimedia Systems and Devices
• Multimedia Applications
JMM EDICS (Editors Information Classification Scheme) can be found at http://www.academypublisher.com/jmm/jmmedics.html.
Special Issue Guidelines
Special issues feature specifically aimed and targeted topics of interest contributed by authors responding to a particular Call for Papers or by
invitation, edited by guest editor(s). We encourage you to submit proposals for creating special issues in areas that are of interest to the Journal.
Preference will be given to proposals that cover some unique aspect of the technology and ones that include subjects that are timely and useful to the
readers of the Journal. A Special Issue is typically made of 10 to 15 papers, with each paper 8 to 12 pages of length.
The following information should be included as part of the proposal:
• Proposed title for the Special Issue
• Description of the topic area to be focused upon and justification
• Review process for the selection and rejection of papers
• Name, contact, position, affiliation, and biography of the Guest Editor(s)
• List of potential reviewers
• Potential authors to the issue
• Tentative time-table for the call for papers and reviews
If a proposal is accepted, the guest editor will be responsible for:
• Preparing the “Call for Papers” to be included on the Journal’s Web site.
• Distribution of the Call for Papers broadly to various mailing lists and sites.
• Getting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Instructions for Authors.
• Providing us the completed and approved final versions of the papers formatted in the Journal’s style, together with all authors’ contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.
Special Issue for a Conference/Workshop
A special issue for a Conference/Workshop is usually released in association with the committee members of the Conference/Workshop like
general chairs and/or program chairs who are appointed as the Guest Editors of the Special Issue. Special Issue for a Conference/Workshop is
typically made of 10 to 15 papers, with each paper 8 to 12 pages of length.
Guest Editors are involved in the following steps in guest-editing a Special Issue based on a Conference/Workshop:
• Selecting a Title for the Special Issue, e.g. “Special Issue: Selected Best Papers of XYZ Conference”.
• Sending us a formal “Letter of Intent” for the Special Issue.
• Creating a “Call for Papers” for the Special Issue, posting it on the conference web site, and publicizing it to the conference attendees. Information about the Journal and Academy Publisher can be included in the Call for Papers.
• Establishing criteria for paper selection/rejection. The papers can be nominated based on multiple criteria, e.g. rank in the review process plus the evaluation from the Session Chairs and the feedback from the Conference attendees.
• Selecting and inviting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be informed of the Author Instructions. Usually, the Proceedings manuscripts should be expanded and enhanced.
• Providing us the completed and approved final versions of the papers formatted in the Journal’s style, together with all authors’ contact information.
• Writing a one- or two-page introductory editorial to be published in the Special Issue.
More information is available on the web site at http://www.academypublisher.com/jmm/.