Digital Object Identifier 10.1109/MCI.2013.2247904
Volume 8 Number 2 ❏ May 2013
www.ieee-cis.org
Features
20 Learning Deep Physiological Models of Affect
by Héctor P. Martínez, Yoshua Bengio, and Georgios N. Yannakakis
34 Fuzzy Logic Models for the Meaning of Emotion Words
by Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan
50 Modeling Curiosity-Related Emotions for Virtual Peer Learners
by Qiong Wu and Chunyan Miao
63 Goal-Based Denial and Wishful Thinking
by César F. Pimentel and Maria R. Cravo
Column
77 Book Review
by Gouhei Tanaka
Departments
2 Editor’s Remarks
3 President’s Message
by Marios M. Polycarpou
4 Society Briefs
Newly Elected CIS Administrative Committee Members (2013–2015)
by Marios M. Polycarpou
IEEE Fellows—Class of 2013
by Erkki Oja
IEEE CIS GOLD Report: Inaugural Elevator Pitch Competition and Other GOLD Activities
by Heike Sichtig, Stephen G. Matthews, Demetrios G. Eliades, Muhammad Yasser, and Pablo A. Estévez
12 Publication Spotlight
by Derong Liu, Chin-Teng Lin, Garry Greenwood, Simon Lucas, and Zhengyou Zhang
15 Conference Report
A Report on the IEEE Life Sciences Grand Challenges Conference
by Gary B. Fogel
17 Guest Editorial
Special Issue on Computational Intelligence and Affective Computing
by Dongrui Wu and Christian Wagner
80 Conference Calendar
by Gary B. Fogel
On the cover: ©ISTOCKPHOTO.COM/YANNIS NTOUSIOPOULOS
IEEE Computational Intelligence Magazine (ISSN 1556-603X) is published quarterly by The Institute of Electrical and Electronics Engineers, Inc. Headquarters: 3 Park Avenue, 17th Floor, New York, NY 10016-5997, U.S.A. +1 212 419 7900. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society, or its members. The magazine is a membership benefit of the IEEE Computational Intelligence Society, and subscriptions are included in the Society fee. Replacement copies for members are available for $20 (one copy only). Nonmembers can purchase individual copies for $163.00. Nonmember subscription prices are available on request. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01970, U.S.A.; and 2) pre-1978 articles without fee. For other copying, reprint, or republication permission, write to: Copyrights and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway NJ 08854 U.S.A. Copyright © 2013 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY and at additional mailing offices. Postmaster: Send address changes to IEEE Computational Intelligence Magazine, IEEE, 445 Hoes Lane, Piscataway, NJ 08854-1331 U.S.A. PRINTED IN U.S.A. Canadian GST #125634188.
Digital Object Identifier 10.1109/MCI.2013.2247812
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE
CIM Editorial Board
Editor-in-Chief
Kay Chen Tan
National University of Singapore
Department of Electrical and Computer Engineering
4 Engineering Drive 3
SINGAPORE 117576
(Phone) +65-6516-2127
(Fax) +65-6779-1103
(E-mail) [email protected]
Founding Editor-in-Chief
Gary G. Yen, Oklahoma State University, USA
Editors-At-Large
Piero P. Bonissone, General Electric Global Research, USA
David B. Fogel, Natural Selection Inc., USA
Vincenzo Piuri, University of Milan, ITALY
Marios M. Polycarpou, University of Cyprus, CYPRUS
Jacek M. Zurada, University of Louisville, USA
Associate Editors
Hussein Abbass, University of New South Wales, AUSTRALIA
Cesare Alippi, Politecnico di Milano, ITALY
Oscar Cordón, European Centre for Soft Computing, SPAIN
Pauline Haddow, Norwegian University of Science and Technology, NORWAY
Hisao Ishibuchi, Osaka Prefecture University, JAPAN
Yaochu Jin, University of Surrey, UK
Jong-Hwan Kim, Korea Advanced Institute of Science and Technology, KOREA
Jane Jing Liang, Zhengzhou University, CHINA
Chun-Liang Lin, National Chung Hsing University, TAIWAN
Yew Soon Ong, Nanyang Technological University, SINGAPORE
Ke Tang, University of Science and Technology of China, CHINA
Chuan-Kang Ting, National Chung Cheng University, TAIWAN
Slawo Wesolkowski, DRDC, CANADA
Jun Zhang, Sun Yat-Sen University, CHINA
Editor’s Remarks
Kay Chen Tan
National University of Singapore, SINGAPORE
It’s Just “Emotions” Has Taken Over…
It is believed that the main difference between a machine and the human operating it is the latter’s sense of feeling, pervasiveness, and ability to understand rather than merely to process. We may often be amused by a smartphone’s speech recognition ability (for example, we said “Define perception” and the phone came up with “Are you asking about ‘The Fine Person’?”) or frustrated by it (we said “Call Billy White,” not “Lily is white!”). As much as technology has tried to emulate humans, there is still a great deal to be done before it comes anywhere close to the complexity of the human being; emotions and affect show how wide this gap is. To date, the complexity of the human mind and emotions continues to baffle researchers and scientists alike.
Affective computing is a recent phenomenon popularized by MIT’s Professor Rosalind Picard. It has led the way for us at the IEEE to develop systems capable of recognizing, interpreting, processing, and simulating human emotions. These systems can then be incorporated into machines that interact with human subjects for various purposes, including psychological analysis and educational assistance. Can you imagine a future filled with smartphones or even smart cars that sense our moods from the tone of our voice or our body gestures, so that music or encouraging quotes that cheer us up can be selected and recommended automatically? How about computers that recognize students’ state of mind from their body gestures and adapt accordingly to enhance the learning experience? Or, pushing further, how about affective marketing, where an online shopping experience is modulated according to the emotions detected from your facial features? This is certainly part of what research in affective computing aspires to achieve, making life more fulfilling for everyone!
IEEE Periodicals/Magazines Department
Associate Editor
Laura Ambrosio
Senior Art Director
Janet Dudar
Assistant Art Director
Gail A. Schnitzer
Production Coordinator
Theresa L. Smith
Business Development Manager
Susan Schneiderman
Advertising Production Manager
Felicia Spagnoli
Production Director
Peter M. Tuohy
Editorial Director
Dawn Melley
Staff Director, Publishing Operations
Fran Zappulla
IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
Digital Object Identifier 10.1109/MCI.2013.2247813
Delighted participants of the Ninth International Conference on Simulated Evolution and Learning (SEAL), Hanoi, December 2012.
Digital Object Identifier 10.1109/MCI.2013.2247814
Date of publication: 11 April 2013
(continued on page 11)
President’s Message
Marios M. Polycarpou
University of Cyprus, CYPRUS
Computational Intelligence in the Undergraduate Curriculum
Computational intelligence is at the heart of many new technological developments. For example, there has recently been much discussion, even in popular media such as The New York Times, about the need to handle Big Data. This is an area of particular interest to industry, with huge potential for creating new jobs. Computational intelligence has a key role to play in some vital aspects of Big Data, namely its analysis, its visualization, and real-time decision making based on it. Other emerging areas where computational intelligence will play a key role include human-computer interaction, optimization in large-scale systems, nature-inspired computing, and the Internet of Things.
For computational intelligence to become an integral component of new technological enterprises, it is crucial that graduating engineers and computer scientists are familiar with computational intelligence methods. Ever since I began my appointment as President of the IEEE Computational Intelligence Society, I have been promoting the need for an introductory course in computational intelligence for students graduating with a degree in Electrical/Electronic Engineering, Computer Engineering, Computer Science, and possibly other related fields. My long-term vision is that such a course will include not only specific techniques, such as neural network computing, fuzzy logic, and evolutionary computation, but, more importantly, will give students the fundamental knowledge of, and motivation for, computational intelligence, together with examples that show its practical use in real-world applications. Naturally, one introductory course is not enough to cover everything a student needs to know in computational intelligence; however, it plants the seed for further study and familiarizes the student with the importance of computational intelligence in new technological developments.
I believe that, just as graduating electrical engineers need to have taken at least one course in topics such as communications, signal processing, and automation and control, they also need a corresponding introductory course in computational intelligence. Of course, this will not happen overnight, and it will require a major effort by academic researchers in the area of computational intelligence. It will also require the development of new textbooks with a holistic view of computational intelligence. However, adding an introductory computational intelligence course to the standard undergraduate curriculum will offer a new dimension to the field and will equip graduating engineers and computer scientists with knowledge and skills that are essential for new technological advances. The time is ripe to pursue this ambitious goal!
Digital Object Identifier 10.1109/MCI.2013.2247815
Date of publication: 11 April 2013
CIS Society Officers
President – Marios M. Polycarpou, University of Cyprus, CYPRUS
President Elect – Xin Yao, University of Birmingham, UK
Vice President – Conferences – Gary B. Fogel, Natural Selection, Inc., USA
Vice President – Education – Cesare Alippi, Politecnico di Milano, ITALY
Vice President – Finances – Enrique H. Ruspini, SRI International, USA
Vice President – Members Activities – Pablo A. Estevez, University of Chile, CHILE
Vice President – Publications – Nikhil R. Pal, Indian Statistical Institute, INDIA
Vice President – Technical Activities – Hisao Ishibuchi, Osaka Prefecture University, JAPAN
Publication Editors
IEEE Transactions on Neural Networks and Learning Systems
Derong Liu, University of Illinois, Chicago, USA
IEEE Transactions on Fuzzy Systems
Chin-Teng Lin, National Chiao Tung University, TAIWAN
IEEE Transactions on Evolutionary Computation
Garrison Greenwood, Portland State University, USA
IEEE Transactions on Computational Intelligence and AI in Games
Simon Lucas, University of Essex, UK
IEEE Transactions on Autonomous Mental Development
Zhengyou Zhang, Microsoft Research, USA
Administrative Committee
Term ending in 2013:
Bernadette Bouchon-Meunier, University Pierre et Marie Curie, FRANCE
Janusz Kacprzyk, Polish Academy of Sciences, POLAND
Simon Lucas, University of Essex, UK
Luis Magdalena, European Centre for Soft Computing, SPAIN
Jerry M. Mendel, University of Southern California, USA
Term ending in 2014:
Pau-Choo (Julia) Chung, National Cheng Kung University, TAIWAN
David B. Fogel, Natural Selection Inc., USA
Yaochu Jin, University of Surrey, UK
James M. Keller, University of Missouri-Columbia, USA
Jacek M. Zurada, University of Louisville, USA
Term ending in 2015:
James C. Bezdek, University of Melbourne, AUSTRALIA
Piero P. Bonissone, General Electric Co., USA
Jose C. Principe, University of Florida, USA
Alice E. Smith, Auburn University, USA
Lipo Wang, Nanyang Technological University, SINGAPORE
Digital Object Identifier 10.1109/MCI.2013.2247816
Society Briefs
Marios M. Polycarpou
University of Cyprus, CYPRUS
Newly Elected CIS Administrative Committee Members (2013–2015)
Digital Object Identifier 10.1109/MCI.2013.2247817
Date of publication: 11 April 2013
James C. Bezdek, University of Melbourne, AUSTRALIA
Jim received the Ph.D. in Applied Mathematics from Cornell University in 1973. Jim is past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association), and the IEEE CIS (Computational Intelligence Society); founding editor of the International Journal of Approximate Reasoning and the IEEE Transactions on Fuzzy Systems; a Life Fellow of the IEEE and IFSA; and a recipient of the IEEE 3rd Millennium Medal, the CIS Fuzzy Systems Pioneer Award, the IEEE Rosenblatt Technical Field Award, and the IPMU Kampé de Fériet Award. Jim’s interests: woodworking, optimization, motorcycles, pattern recognition, cigars, clustering in very large data, fishing, co-clustering, blues music, wireless sensor networks, gardening, poker, and visual clustering. Jim retired in 2007, and will be coming to a university near you soon.
Piero P. Bonissone, General Electric Co., USA
Piero P. Bonissone is currently a Chief Scientist at GE Global Research. Dr. Bonissone has been a pioneer in the field of fuzzy logic, AI, soft computing, and approximate reasoning systems applications since 1979. During the eighties, he
conceived and developed the Diesel Electric Locomotive Troubleshooting Aid (DELTA), one of the first fielded expert systems, which helped maintenance technicians troubleshoot diesel-electric locomotives. He was the PI in many DARPA programs, including the Strategic Computing Initiative, Pilot’s Associate, the Submarine Operational Automation System, and the Planning Initiative (ARPI). During the nineties, he led many projects in fuzzy control, from the hierarchical fuzzy control of turbo-shaft engines to the use of fuzzy logic in dishwashers, locomotives, and resonant converters for power supplies. He designed and integrated case-based and fuzzy-neural systems to accurately estimate the value of single-family residential properties used as mortgage collateral. In early 2000, he designed a fuzzy rule-based classifier, trained by evolutionary algorithms, to automate the placement of insurance applications for long-term care and term life while minimizing the variance of its decisions. More recently he led a Soft Computing (SC) group in developing SC applications for the diagnostics and prognostics of processes and products, including predicting the remaining life of each locomotive in a fleet to support efficient asset selection. His current interests are the development of multi-criteria decision-making systems for PHM and the automation of the intelligent systems life cycle to create, deploy, and maintain SC-based systems, providing customized performance while adapting to avoid obsolescence.
He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), of the Association for the Advancement of Artificial Intelligence (AAAI), and of the International Fuzzy Systems Association (IFSA), and a Coolidge Fellow at GE Global Research. He is the recipient of the 2012 Fuzzy Systems Pioneer Award from the IEEE Computational Intelligence Society. Since 2010, he has been the President of the Scientific Committee of the European Centre for Soft Computing. In 2008 he received the II Cajastur International Prize for Soft Computing from the European Centre for Soft Computing. In 2005 he received the Meritorious Service Award from the IEEE Computational Intelligence Society. He has received two Dushman Awards from GE Global Research. He served as Editor-in-Chief of the International Journal of Approximate Reasoning for 13 years. He is on the editorial boards of four technical journals and is Editor-at-Large of the IEEE Computational Intelligence Magazine. He has coedited six books and has over 150 publications in refereed journals, book chapters, and conference proceedings, with an H-Index of 31 (Google Scholar). He has been issued 66 patents by the U.S. Patent Office (plus 19 pending patents). From 1982 until 2005 he was an Adjunct Professor at Rensselaer Polytechnic Institute, in Troy, NY, where he supervised 5 Ph.D. theses and 33 Master’s theses. He has cochaired 12 scientific conferences and symposia focused on Multi-Criteria Decision-Making, Fuzzy Sets, Diagnostics, Prognostics, and Uncertainty Management in AI. Dr. Bonissone is very active in the IEEE, where he was a member of the Fellow Evaluation Committee
from 2007 to 2009. He was an Executive Committee member of the NNC/NNS/CIS society from 1993 to 2012 and an IEEE CIS Distinguished Lecturer from 2004 to 2011.
Jose C. Principe, University of Florida, USA
Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida, where he teaches advanced signal processing, machine learning, and artificial neural network (ANN) modeling. He is BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. His primary area of interest is the processing of time-varying signals with adaptive neural models. The CNEL Lab is studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information) and applying these advanced algorithms to Brain Machine Interfaces (both motor and somatosensory feedback).
Dr. Principe is an IEEE, ABME, and AIBME Fellow. He is the past Editor-in-Chief of the IEEE Transactions on Biomedical Engineering, past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, and past President of the International Neural Network Society. He received the IEEE EMBS Career Award and the IEEE Neural Network Pioneer Award. He holds honorary doctorates from the U. of Reggio Calabria (Italy), S. Luis Maranhao (Brazil), and Aalto U. (Finland). Currently he is the Editor-in-Chief of the IEEE Reviews in Biomedical Engineering. Dr. Principe has more than 600 publications. He has directed 73 Ph.D. dissertations and 65 Master’s theses. He wrote four books: an interactive electronic book entitled “Neural and Adaptive Systems: Fundamentals through Simulation” (Wiley), “Brain Machine Interface Engineering,” “Kernel Adaptive Filtering” (Wiley), and “Information Theoretic Learning” (Springer).
Alice E. Smith, Auburn University, USA
Alice E. Smith is the W. Allen and Martha Reed Professor in the Department of Industrial and Systems Engineering at Auburn University and served as department chair from 1999 to 2011. Previously, she was on the faculty of the Department of Industrial Engineering at the University of Pittsburgh, which she joined after industrial experience with Southwestern Bell Corporation. Her degrees are from Rice University, Saint Louis University, and Missouri University of Science and Technology.
Dr. Smith holds one U.S. patent and several international patents and has authored more than 200 publications, which have garnered over 1,600 citations (ISI Web of Science). Several of her papers are among the most highly cited in their respective journals, including the 2nd most cited paper of IEEE Transactions on Reliability. Dr. Smith has served as a principal investigator on over US$6 million of sponsored research. Her research in computational intelligence has been funded by NASA, the U.S. Department of Defense, NIST, the Missile Defense Agency, the U.S. Department of Transportation, Lockheed Martin, and the U.S. National Science Foundation, from which she has been awarded 16 grants, including a CAREER grant and an ADVANCE Leadership grant. Her international research collaborations have been sponsored by the federal governments of Japan, Turkey, the United Kingdom, The Netherlands, Egypt, South Korea, Iraq, China, Algeria, and the U.S., and by the Institute of International Education.
Her current service to IEEE CIS includes Associate Editor of IEEE Transactions on Evolutionary Computation (a position held since 1998), Vice Chair of the IEEE Evolutionary Computation Technical Committee, Vice Chair of the IEEE Evolutionary Computation Technical Committee Task Force on Education, and Member of the Women in Computational Intelligence Committee. In past service to CIS, she was General Chair of the Congress on Evolutionary Computation (CEC) 2011, Program Chair of CEC 2004, Technical Chair (Americas) of CEC 2000, Special Sessions Chair of CEC 1999, and on the program or technical committees of seven other CECs. She also served on the IEEE Evolutionary Computation Technical Committee from 1999 to 2000 and from 2007 to 2011.
Dr. Smith is a Senior Member of the IEEE, a Fellow of the Institute of Industrial Engineers, a senior member of the Society of Women Engineers, a member of Tau Beta Pi, and a Registered Professional Engineer. She is the Area Editor for Heuristic Search and Learning of the INFORMS Journal on Computing and an Area Editor of Computers & Operations Research.
Lipo Wang, Nanyang Technological University, SINGAPORE
Dr. Lipo Wang’s research interest is computational intelligence with applications to bioinformatics, data mining, optimization, and image processing. He is (co)author of over 240 papers. He holds a U.S. patent in neural networks. He has coauthored 2 monographs and (co)edited 15 books. He has been, or will be, a keynote or panel speaker at several international conferences. He received his Bachelor’s degree from the National University of Defense Technology (China) and his Ph.D. from Louisiana State University (USA). He was on staff at the National Institutes of Health (USA) and Stanford University (USA). He was on the faculty of Deakin University (Australia) and is now on the faculty of Nanyang Technological University (Singapore).
He is or has been an Associate Editor or Editorial Board Member of 20 international journals, including IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, and IEEE Transactions on Evolutionary Computation. He is an elected member of the AdCom (2010–2015) of the IEEE Computational Intelligence Society (CIS) and served as IEEE CIS Vice President for Technical Activities (2006–2007) and Chair of the Emergent Technologies Technical Committee (2004–2005). He is an elected member of the Board of Governors of the International Neural Network Society (2011–2013) and a CIS Representative to the AdCom of the IEEE Biometrics Council (2011). He serves as Chair of the Education Committee of the IEEE Engineering in Medicine and Biology Society (2011, 2012). He was President of the Asia-Pacific Neural Network Assembly (APNNA) in 2002/2003 and received the 2007 APNNA Excellent Service Award. He was Founding Chair of both the IEEE Engineering in Medicine and Biology Singapore Chapter and the IEEE Computational Intelligence Singapore Chapter. He serves or has served as IEEE CIDM 2013 Program Co-Chair, IEEE EMBC 2011 & 2010 Theme Co-Chair, IJCNN 2010 Technical Co-Chair, IEEE CEC 2007 Program Co-Chair, and IJCNN 2006 Program Chair, as well as on the steering, advisory, organizing, or program committees of over 200 international conferences.
Erkki Oja
Aalto University, FINLAND
IEEE Fellows—Class of 2013
Digital Object Identifier 10.1109/MCI.2013.2247818
Date of publication: 11 April 2013
Danilo Mandic, Imperial College London, UK
for contributions to multivariate and nonlinear learning systems
Dr. Mandic obtained his Ph.D. in the area of nonlinear adaptive systems from Imperial College in 1999 and is currently a Professor of Signal Processing at the same institution. He has been working in the areas of nonlinear and multidimensional adaptive filters, complex- and quaternion-valued neural networks, time-frequency analysis, and complexity science. His research has found applications in biomedical engineering (brain-computer interface), human-computer interaction (body sensor networks), and renewable energy and smart grid. He has published two research monographs, Recurrent Neural Networks for Prediction (Wiley, 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models (Wiley, 2009), has coedited a book on Data Fusion (Springer, 2008), and has been a part-editor for the Springer Handbook on Neuro- and Bioinformatics (Springer, 2013). Dr. Mandic has held visiting positions at RIKEN (Japan), KU Leuven (Belgium), and Westminster University (UK).
Professor Mandic has been a Publicity
Chair for the World Congress on
Computational Intelligence (WCCI) in
2014, Plenary Talks Chair at EUSIPCO
2013, European Liaison at ISNN in 2011
and a Program Co-Chair for ICANN in
2007. He has given keynote and tutorial
talks at foremost conferences in Signal
Processing and Computational
Intelligence (ICASSP in 2013 and 2007,
IJCNN in 2010, 2011, and 2012), and has
been an Associate Editor for IEEE
Transactions on Neural Networks and Learning
Systems (since 2008), IEEE Signal Processing
Magazine, and IEEE Transactions on Signal
Processing. He is also a Co-Chair of the
Task Force on Complex Neural Networks and
a Member of the Task Force on Smart Grid
(both within IEEE CIS) and of the Signal
Processing Theory and Methods technical
committee within the IEEE SPS.
Dr. Mandic has won several Best
Paper awards at international conferences in Computational Intelligence
(2010, 2009, 2006, 2004, 2002), and was
appointed by the World University
Service (WUS) as a Visiting Lecturer
within the Brain Gain Program (BGP).
His Ear-EEG device has been shortlisted
for the Annual Brain Computer
Interface Award in 2012. He has been
granted patents and has had successful
industrial collaborations in the areas of
brain- and human-computer interface.
Dr. Mandic takes great satisfaction in educating new generations of researchers, and his Ph.D. students and PDRAs have won Best Thesis awards at his home Department in 2007 and 2011, Best Research at the Department in 2012, and Best Student Paper awards at ISNN 2010, MELECON 2004, and RASC 2002.
Ron Sun, Rensselaer Polytechnic Institute, NY, USA
for contributions to cognitive architectures and computations
Ron Sun is Professor of Cognitive Sciences at RPI, and formerly the James C. Dowell Professor of Engineering and Professor of Computer Science at the University of Missouri-Columbia. He heads the Cognitive Architecture Laboratory at RPI. He received his Ph.D. in Computer Science from Brandeis University.
His research interests center around the study of cognition, especially in the areas of cognitive architectures, human reasoning and learning, cognitive social simulation, and hybrid connectionist-symbolic models. He has published many papers in these areas, as well as nine books, including “Duality of the Mind” and “Cambridge Handbook of Computational Psychology.” For his paper on integrating rule-based and connectionist models to account for everyday human reasoning, he received
the 1991 David Marr Award from
Cognitive Science Society. For his work
on understanding human skill learning,
he received the 2008 Hebb Award from
International Neural Network Society.
His early major contribution was in
hybrid connectionist-symbolic models.
His 1995 “Artificial Intelligence” paper
demonstrated that the integration of
symbolic and connectionist processes can
capture complex human reasoning. He
has furthermore made seminal contributions to advancing hybrid cognitive architectures and their applications to understanding human cognition/intelligence.
His 2001 “Cognitive Science” paper
addressed for the first time the cognitive
phenomenon of “bottom-up learning”.
His 2005 “Psychological Review” paper
proposed a framework that centered on
the interaction of implicit and explicit
cognitive processes (computationally, with
connectionist and symbolic representations). The latter article was the first successful attempt at accounting for a wide
range of cognitive phenomena that up to
that point had not been adequately captured either psychologically or in computational systems. His recent paper in
Psychological Review presents the most
comprehensive and integrative theory of
human creativity based on a dual representational framework. This theory and its
resulting model account for a wide variety of empirical data and phenomena, and
point to future intelligent systems capable
of creativity. These models, theories, and
methods are of fundamental importance
for understanding human cognition/
intelligence, and have significant implications for developing future computational
intelligence systems.
He is the founding co-editor-in-chief
of the journal Cognitive Systems Research,
and serves on the editorial boards of many
other journals. He chaired a number of
major international conferences, including
CogSci and IJCNN. He is a member of the Governing Boards of the Cognitive Science Society and the International Neural Network Society, and served as President of the International Neural Network Society for a two-year term (2011–2012). Further information about his work is available at http://sites.google.com/site/drronsun.
Andrzej Cichocki, RIKEN Brain
Science Institute, JAPAN, and Warsaw
University of Technology, POLAND
for contributions to applications of blind signal processing and artificial neural networks
Prof. Andrzej Cichocki received the M.Sc. (with honors), Ph.D., and Dr.Sc. (Habilitation) degrees, all in electrical engineering, from Warsaw University of Technology (Poland). Since 1976, he has been with the Institute of Theory of Electrical Engineering, Measurement and Information Systems, Faculty of Electrical Engineering, at the Warsaw University of Technology, where he became a full Professor in 1995. He spent several years at the University of Erlangen-Nuremberg (Germany) as an Alexander von Humboldt Research Fellow and Guest Professor. In 1995–1997 he was a team leader of the Laboratory for Artificial Brain Systems in the Brain Information Processing Group of the Frontier Research Program, RIKEN (Japan). He is currently a Senior Team Leader and Head of the Laboratory for Advanced Brain Signal Processing at the RIKEN Brain Science Institute (JAPAN). He has given keynote and tutorial talks at international conferences in Computational Intelligence and Signal Processing and has served as a member of program and technical committees (EUSIPCO, IJCNN, ICA, ISNN, ICONIP, ICAISC, ICASSP). He has coauthored more than 400 papers in international journals and conferences and four monographs in English (two of them translated into Chinese): “Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis” (John Wiley, 2009); “Adaptive Blind Signal and Image Processing” (coauthored with Professor Shun-ichi Amari; Wiley, April 2003, revised edition); “CMOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems” (coauthored with Professor Rolf Unbehauen; Springer-Verlag, 1989); and “Neural Networks for Optimization and Signal Processing” (Wiley-Teubner, 1994). He serves or has served as an Associate Editor of IEEE Transactions on Neural Networks, IEEE Transactions on Signal Processing, and Journal of Neuroscience Methods, and as founding Editor-in-Chief of the journal Computational Intelligence and Neuroscience. His current research focuses on tensor decompositions, multiway blind source separation, brain-machine interfaces, human-robot interaction, EEG hyperscanning, brain-to-brain interfaces, and their practical applications. His publications currently report over 18,700 citations according to Google Scholar, with an h-index of 58 and an i10-index of 250.
Zhi-Hua Zhou,
Nanjing University, CHINA
for contributions to learning systems in data
mining and pattern recognition
Zhi-Hua Zhou received the B.Sc., M.Sc., and Ph.D. degrees in computer science from Nanjing University in 1996, 1998, and 2000, respectively, all with the highest honors. He joined the Department of Computer Science and Technology of Nanjing University in 2001 and is currently a Professor and Deputy Director of the National Key Laboratory for Novel Software Technology.
Dr. Zhou is actively pursuing research
in the field of machine learning, data
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE
mining and pattern recognition. He has made significant contributions to the theories and algorithms of ensemble learning, multi-instance learning, multi-label learning, and semi-supervised learning, among others, and many of his techniques have been applied to diverse areas such as computer-aided medical diagnosis, biometric authentication, bioinformatics, multimedia retrieval and annotation, mobile and network communications, and circuit design. He has published more than 100 papers in leading international journals and conference proceedings, and his papers have been cited more than 9,000 times according to Google Scholar, with an h-index of 52. He has authored the book “Ensemble Methods: Foundations and Algorithms” (CRC Press, 2012) and coedited eight conference proceedings. He also holds twelve patents. He is a Fellow of the International Association for Pattern Recognition and has received many awards, including nine international journal/conference paper or competition awards. He is also the recipient of the 2013 IEEE CIS Outstanding Early Career Award.
Dr. Zhou is the founder and steering committee chair of ACML and a steering committee member of PAKDD and PRICAI, and he has served as General Chair, Program Committee Chair, Vice Chair, or Area Chair for dozens of international conferences. He is currently the Associate Editor-in-Chief of Chinese Science Bulletin and a member of the Advisory Board of International Journal of Machine Learning and Cybernetics. He serves or has served as an Associate Editor or Editorial Board member of more than twenty journals, including the ACM Transactions on Intelligent Systems and Technology and the IEEE Transactions on Knowledge and Data Engineering. He is the Vice Chair of the CIS Data Mining Technical Committee, Vice Chair of the IEEE Nanjing Section, Chair of the IEEE Computer Society Nanjing Chapter, Chair of the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation, and Chair of the Machine Learning Technical Committee of the China Association of Artificial Intelligence.
Gail Carpenter,
Boston University, MA, USA
for contributions to adaptive resonance theory
and modeling of Hodgkin-Huxley neurons
Gail Carpenter received a B.A. in mathematics from the University of Colorado, Boulder, in 1970, and a Ph.D. in mathematics from the University of Wisconsin, Madison, in 1974.
She has since been an instructor in
applied mathematics at MIT, a professor
of mathematics at Northeastern University, and a professor of cognitive and neural systems (CNS) and mathematics at
Boston University.
Gail Carpenter’s neural modeling work began with her Ph.D. thesis, “Traveling wave solutions of nerve impulse equations.” In a series of papers published in the 1970s, she defined generalized Hodgkin-Huxley models, used dynamical systems techniques to analyze their solutions, characterized the qualitative properties of the burst patterns that a typical neuron may propagate, and investigated normal and abnormal signal patterns in nerve cells.
Together with Stephen Grossberg and
their students and colleagues, Prof.
Carpenter has, since the 1980s, developed
the Adaptive Resonance Theory (ART)
family of neural networks for fast, stable
online learning, pattern recognition, and
prediction, including both unsupervised
(ART 1, ART 2, ART 2-A, ART 3, fuzzy
ART, distributed ART) and supervised
(ARTMAP, fuzzy ARTMAP, ARTEMAP, ARTMAP-IC, ARTMAP-FTR,
distributed ARTMAP, default ARTMAP,
biased ARTMAP, self-supervised
ARTMAP) systems. These ART models,
designed by integrating cognitive and
neural principles with systems-level computational constraints, have been used in
a wide range of applications, including
remote sensing, medical diagnosis, automatic target recognition, mobile robots,
and database management.
Prof. Carpenter’s recent research has
focused on questions such as: How can a
neural system learning from one example
at a time absorb information that is
inconsistent but correct, as when a family
pet is called Spot, dog, and animal, while
rejecting nominally similar incorrect
information, as when the same pet is
called wolf? How does this system transform such scattered information into the
knowledge that dogs are animals, but not
conversely? How can a real-time system,
initially trained with a few labeled examples and a limited feature set, continue to
learn from experience when confronted
with oceans of additional information,
without eroding reliable early memories?
How can such individual systems adapt to
their unique application contexts? How
can a neural system that has made an
error refocus attention on environmental
features that it had initially ignored? Systems based on distributed ARTMAP
address these questions and their applications to technology. Other aspects of Prof.
Carpenter’s research include the development, computational analysis, and application of neural models of vision, synaptic
transmission, and circadian rhythms. Her
work in vision has ranged from models of
the retina to color processing and long-range figure completion.
Gail Carpenter has organized
numerous conferences and symposia for
the IEEE, the International Neural
Network Society (INNS), and the
American Mathematical Society (AMS).
At Boston University, she has served as
founder and director of the CNS
Technology Lab and as a founding
member of the Center for Adaptive
Systems and the Department of
Cognitive and Neural Systems. She
received the IEEE Neural Networks
Pioneer Award and the INNS Gabor
Award, and is an INNS Fellow.
Hani Hagras,
University of Essex, UK
for contributions to fuzzy systems
Prof. Hani Hagras is a Professor of Computational Intelligence, Director of the
Computational Intelligence Centre, Head
of the Fuzzy Systems Research Group and
Head of the Intelligent Environments
Research Group at the University of Essex, UK.
He received his Ph.D. in Computer Science from the University of Essex in 2000. His major research interests are in computational intelligence, notably type-2 fuzzy systems, fuzzy logic, neural networks, genetic algorithms, and evolutionary computation. His research interests also include ambient intelligence, pervasive computing, and intelligent buildings. He is also interested in embedded agents, robotics, and intelligent control.
He has authored more than 250 papers in international journals, conferences, and books. Over the last five years his work has received funding totalling about £4 million from the European Union, the UK Technology Strategy Board (TSB), the UK Department of Trade and Industry (DTI), the UK Engineering and Physical Sciences Research Council (EPSRC), the UK Economic and Social Research Council (ESRC), and several industrial companies. He also holds three industrial patents in the field of computational intelligence and intelligent control.
His research has won numerous prestigious international awards. Most recently, the IEEE Computational Intelligence Society (CIS) awarded him the 2013 Outstanding Paper Award of the IEEE Transactions on Fuzzy Systems, and he also won the 2004 Outstanding Paper Award of the same journal. He was also the Chair of the IEEE CIS Chapter that won the 2011 IEEE CIS Outstanding Chapter Award. His work with IP4 Ltd won the 2009 Lord Stafford Award for Achievement in Innovation for East of England. His work also won the 2011 Best Knowledge Transfer Partnership Project for London and the Eastern Region, as well as best paper awards at several conferences, including the 2006 IEEE International Conference on Fuzzy Systems and the 2012 UK Workshop on Computational Intelligence.
He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the Institution of Engineering and Technology (IET). He served as Chair of the IEEE Computational Intelligence Society (CIS) Senior Members Sub-Committee and as Chair of the IEEE CIS Task Force on Intelligent Agents. He is currently Chair of the IEEE CIS Task Force on Extensions to Type-1 Fuzzy Sets and a Vice Chair of the IEEE CIS Technical Committee on Emergent Technologies. He is a member of the IEEE CIS Fuzzy Systems Technical Committee.
He is an Associate Editor of the IEEE Transactions on Fuzzy Systems and of the International Journal of Robotics and Automation.
Prof. Hagras has chaired several international conferences. Most recently, he served as Co-Chair of the 2013, 2011, and 2009 IEEE Symposia on Intelligent Agents and of the 2011 IEEE International Symposium on Advances in Type-2 Fuzzy Logic Systems. He was also the General Co-Chair of the 2007 IEEE International Conference on Fuzzy Systems in London.
IEEE CIS GOLD Report:
Inaugural Elevator Pitch Competition
and Other GOLD Activities
Heike Sichtig
U.S. Food and Drug Administration, USA
Stephen G. Matthews
University of Bristol, UK
Demetrios G. Eliades
University of Cyprus, CYPRUS
Muhammad Yasser
Babcock-Hitachi K.K., JAPAN
Pablo A. Estévez
University of Chile, CHILE
Digital Object Identifier 10.1109/MCI.2013.2247819
Date of publication: 11 April 2013
Both the Computational Intelligence Society (CIS) and IEEE strive to assist and support younger members as they enter their professional careers after graduation. For this purpose, the IEEE launched the GOLD (Graduates Of the Last Decade) Program to help students transition to young professionals within the larger IEEE community. IEEE members are automatically added to the GOLD member community when they graduate, and GOLD benefits are available for the first ten years after graduation from university. Similarly, CIS established the GOLD subcommittee to help increase the number of activities for young IEEE professionals in the Computational Intelligence (CI) field.
The IEEE CIS GOLD subcommittee
is dedicated to serving the needs of a
vibrant community of CI engineers,
scientists, and technical experts with
member representation across the globe.
In this article, we showcase
some of the CIS GOLD activities of
2012. The CIS GOLD subcommittee
hosted a “Novel CI Research Idea
Pitch” competition during the Student
and GOLD reception at WCCI 2012
in Brisbane, Australia. The competition
was a fantastic opportunity to socialize
with like-minded students and GOLDs,
and an opportunity to relax after a long
conference day.
The competition challenge was to design a one-page research proposal for a CI idea and then to pitch that idea to a panel of CI experts and to their peers using an “elevator pitch” (3-minute time limit). An “elevator pitch” is a short summary of a research idea. The research area was limited to computational intelligence, and participants were asked to submit a quad chart (a high-level overview of an idea) and to “sell” their idea to the judges to qualify for prizes. A panel of three CI experts selected the three best pitches, and the audience (your peers) then ranked the selected pitches by secret ballot. Prizes included an iPad 2 for the winner, and certificates and free full-year 2013 IEEE CIS memberships for the three best pitches.
On June 12, 2012, the reception was soon bustling with conference attendees awaiting the start of the inaugural GOLD “elevator pitch” competition. Seven GOLDs/students had registered for the event and were ready to present their novel CI ideas to a like-minded audience and senior CI experts. Some entrants must have felt a bit uneasy before their pitches; however, they were put at ease by the audience and by the panel of senior CIS experts: Gary Fogel, Piero Bonissone, and Pablo Estevez. The competition was a fun event and, judging by the responses, a huge success! A big thank you goes to the photographers and videographers Albert Lam and Erdal Kayacan for capturing the exhilarating moments! Ahsan Chowdhury was awarded 1st place with an Apple iPad, Stephen G. Matthews was awarded 2nd place, and Aram (Alex) Ter-Sarkissov was awarded 3rd place. All entrants demonstrated strong skills in presenting a research idea to an audience. This is invaluable experience for GOLDs/students learning to pitch ideas and sell themselves in a short period of time. Well done to all entrants!
Gary Fogel, one of the panel judges,
wrote: “It was a pleasure to even be considered to serve on the panel of judges for the
inaugural GOLD ‘elevator pitch’ competition. This was a very fun and interesting
event and the students came up with some
very creative ideas. I have to applaud all of
the students that entered simply for competing—and it was clear that most of them took
the task quite seriously, which was a pleasure
to see. Thanks to all of the students, congratulations to the winners, and to the
organizers—I hope this will be the start of a
long-lasting GOLD tradition! –Gary F.”
Following the elevator pitch
competition, the CIS GOLD and CIS
student subcommittees distributed a survey to attendees of the reception. The
feedback showed strong support for both
the reception and the competition. Most
respondents knew of the CIS student travel grants, but
not many knew about other benefits
such as CI Webinars, Summer Schools,
Student Research Grants and the Ph.D.
Dissertation Award. CIS GOLD will
endeavor to keep you informed about
what’s going on in the society and the
benefits of being a member.
The “elevator pitch” competition is just one of the initiatives conducted by the CIS GOLD subcommittee, whose hard work throughout 2012 produced several achievements:
❏ The “elevator pitch” competition at WCCI 2012 was great fun and an invaluable experience for students/GOLDs. We hope this competition runs again.
❏ The survey conducted at WCCI 2012 provided data that we analyzed to produce a report. The information from this survey helps the subcommittee understand how best to support our CIS GOLDs in the future.
❏ We have contacted all IEEE CIS chapters, and many of them have responded positively. We intend to maintain and improve our good relations and coordination with all IEEE CIS chapters.
❏ We inform CIS GOLDs about activities on our website¹, Facebook², and LinkedIn³ pages. We hope our website can help improve our interaction with all IEEE CIS GOLD members, and we aim to provide the latest updates about our activities to all IEEE CIS GOLD members and others.
❏ Heike Sichtig, chair of the CIS GOLD subcommittee during 2011–2012, received the 2012 IEEE Members and Geographic Activities GOLD Achievement Award. She said: “I am very proud to be part of this society. It truly provides support and is the best networking tool in today’s world!”
Through these activities, we strive to maintain a positive level of participation from IEEE CIS GOLD members, and we feel we have achieved this. In all of these activities we have seen enthusiastic participation, and we expect that interaction to increase in the future. We are keen to hear your feedback and suggestions to learn what CIS GOLDs want, so please contact the CIS GOLD chair. Be sure to “like” us on Facebook and connect with us on LinkedIn, so you can meet other members! Finally, please check our website for a link to additional commentary and pictures from the “elevator pitch” competition.
¹ http://cis.ieee.org/gold-graduates-of-the-last-decade.html
² https://www.facebook.com/pages/IEEE-CIS-GOLD/212664895442435
³ http://www.linkedin.com/groups/IEEE-CIS-GOLD-Computational-Intelligence-438209
2012 CIS Student/GOLD social event organizers: CIS GOLD Chair Heike Sichtig (left), CIS Student Activities Chair Demetrios Eliades, and CIS President Marios M. Polycarpou (right).
Panel of “elevator pitch” judges: Pablo Estevez, Gary Fogel and Piero Bonissone (left to right).
Winner of the “elevator pitch” competition Ahsan Chowdhury (second from left) with the three judges Gary Fogel, Pablo Estevez and Piero Bonissone. Our videographer/photographer Albert Lam is in the background!
Editor’s Remarks
(continued from page 2)
This special issue, guest edited by Dongrui Wu and Christian Wagner, presents four feature articles that address notable advances in affective computing using computational intelligence technologies. Gaming is a major area where affective computing can help: by sensing and interpreting the emotions and body gestures of the gamer during game play, a game can adapt to make the gaming experience more realistic. The first article addresses feature extraction and selection in affective modelling, focusing on deep learning approaches tested on games. Emotions are sometimes tough to pin down even in words, and the second article attempts to model the representation of emotion words in a game so that similar emotion words, and those belonging to the same subsets, are classified accurately.
This issue also includes a feature article on a topic close to many of us in academia who are in constant contact with students: curiosity as a motivation for learning. Using affective computing in a virtual learning environment, the authors present a model that enhances curiosity in order to motivate students who may require more prodding than others. Decision-making in humans is certainly not a simple black/white, yes/no, binary process. It is influenced by beliefs, logic, and emotions, as considered in the fourth article of this issue. That article uses affective computing technologies to model one of these complexities, belief revision in machines, relating denial and wishful thinking to a machine’s goals and objectives.
In addition to a “Book Review” on
complex-valued neural networks, there is
also a report on the 2012 IEEE Life
Sciences Grand Challenges conference
and an update on the activities of IEEE
CIS GOLD. In the “Society Briefs” column, we congratulate the new IEEE
Fellows in the Class of 2013 elevated
through CIS, and welcome our five
newly elected AdCom members who
will help manage and administer CIS.
As we cross the halfway point of 2013, it is also time to take stock of what we have done well and what we can do better before the end of the year comes upon us. Please let us know of any suggestions or comments, both on areas we have done well in that you would like to see continue and on areas where we can improve, by e-mailing me at [email protected].
We look forward to hearing from
you and hope you will enjoy this issue as
much as we’ve enjoyed putting it
together for you!
Publication Spotlight
Derong Liu, Chin-Teng Lin, Garry Greenwood, Simon Lucas, and Zhengyou Zhang
Digital Object Identifier 10.1109/MCI.2013.2247820
Date of publication: 11 April 2013
CIS Publication Spotlight
IEEE Transactions on Neural
Networks and Learning Systems
Low-Rank Structure Learning via Nonconvex Heuristic Recovery, by Y. Deng, Q. Dai, R. Liu, Z. Zhang, and S. Hu, IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, No. 3, March 2013, pp. 383–396.
Digital Object Identifier: 10.1109/TNNLS.2012.2235082
“A nonconvex framework is proposed for learning the essential low-rank structure from corrupted data. Different from traditional approaches, which directly utilize convex norms to measure the sparseness, this method introduces more reasonable nonconvex measurements to enhance the sparsity in both the intrinsic low-rank structure and the sparse corruptions. It includes how to combine the widely used Lp norm (0<p<1) and log-sum term into the framework of low-rank structure learning. Although the proposed optimization is no longer convex, it still can be effectively solved by a majorization–minimization (MM)-type algorithm, with which the nonconvex objective function is iteratively replaced by its convex surrogate and the nonconvex problem finally falls into the general framework of reweighted approaches. It is proved that the MM-type algorithm can converge to a stationary point after successive iterations. The proposed model is applied to solve two typical problems: robust principal component analysis and low-rank representation. Experimental results on low-rank structure learning demonstrate that our nonconvex heuristic methods, especially the log-sum heuristic recovery algorithm, generally perform much better than the convex-norm-based method (0<p<1) for both data with higher rank and with denser corruptions.”
Qualitative Adaptive Reward Learning with Success Failure Maps: Applied to Humanoid Robot Walking, by J. Nassour, V. Hugel, F.B. Ouezdou, and G. Cheng, IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, No. 1, January 2013, pp. 81–93.
Digital Object Identifier: 10.1109/TNNLS.2012.2224370
“A learning mechanism is proposed to learn from negative and positive feedback with reward coding adaptively. It is composed of two phases: evaluation and decision making. In the evaluation phase, a Kohonen self-organizing map technique is used to represent success and failure. Decision making is based on an early warning mechanism that enables avoiding repeating past mistakes. The behavior to risk is modulated in order to gain experiences for success and for failure. Success map is learned with adaptive reward that qualifies the learned task in order to optimize the efficiency. The approach is presented with an implementation on the NAO humanoid robot, controlled by a bio-inspired neural controller based on a central pattern generator. The learning system adapts the oscillation frequency and the motor neuron gain in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.”
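The Deng et al. abstract describes a majorization–minimization (MM) scheme. At each step the nonconvex log-sum objective is replaced by a convex surrogate, which turns the problem into a reweighted one. The sketch below illustrates that idea on a toy problem rather than the paper's matrix setting. It assumes a vector denoising objective 0.5‖x − y‖² + λ Σ log(|xᵢ| + ε). The function names and the parameter values (λ = 0.5, ε = 0.1) are illustrative choices, not taken from the paper. Linearizing the concave log term at the current iterate gives a weighted-L1 surrogate whose minimizer is an elementwise soft-threshold, so no step can increase the objective.

```python
import math


def soft_threshold(v, t):
    """Minimizer of 0.5*(x - v)**2 + t*|x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def log_sum_objective(x, y, lam, eps):
    """Toy objective (an assumption): 0.5*||x - y||^2 + lam*sum(log(|x_i| + eps))."""
    fit = 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    penalty = lam * sum(math.log(abs(xi) + eps) for xi in x)
    return fit + penalty


def log_sum_mm(y, lam=0.5, eps=0.1, iters=50):
    """MM iterations for the toy log-sum denoising problem.

    The concave term log(|x| + eps) is majorized by its tangent at the
    current iterate, which yields a weighted-L1 (convex) surrogate with
    weights 1 / (|x_i| + eps). Its exact minimizer is a soft-threshold.
    """
    x = list(y)  # start from the observations
    history = [log_sum_objective(x, y, lam, eps)]
    for _ in range(iters):
        weights = [1.0 / (abs(xi) + eps) for xi in x]  # reweighting step
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, weights)]
        history.append(log_sum_objective(x, y, lam, eps))
    return x, history
```

For example, `log_sum_mm([3.0, 0.05, -2.0, 0.2])` drives the two small entries to exactly zero and shrinks the large ones only slightly. This is the sparsity-enhancing behavior a log-sum penalty is meant to add over a plain L1 penalty.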
IEEE Transactions on
Fuzzy Systems
A Novel Approach to Filter Design for
T–S Fuzzy Discrete-Time Systems with
Time-Varying Delay, IEEE Transactions
on Fuzzy Systems, Vol. 20, No. 6,
December 2012, pp. 1114–1129.
Digital Object Identifier: 10.1109/
TFUZZ.2012.2196522
“In this paper, the problem of l2–l∞
filtering for a class of discrete-time Takagi-Sugeno (T-S) fuzzy time-varying
delay systems is studied. The authors
focused on the design of full- and
reduced-order filters that guarantee the
filtering error system to be asymptotically stable with a prescribed H∞ performance. Sufficient conditions for the
obtained filtering error system are
proposed by applying an input-output
approach and a two-term approximation
method, which is employed to approximate the time-varying delay. The corresponding full- and reduced-order filter
design is cast into a convex optimization
problem, which can be efficiently solved
by standard numerical algorithms. Finally,
simulation examples are provided to
illustrate the effectiveness of the proposed approaches.”
Fuzzy c-Means Algorithms for Very
Large Data, IEEE Transactions on
Fuzzy Systems, Vol. 20, No. 6,
December 2012, pp. 1130–1146.
Digital Object Identifier: 10.1109/
TFUZZ.2012.2201485
“Very large (VL) data or big data
are any data that we cannot load into
our computer’s working memory. This
is not an objective definition, but a
definition that is easy to understand
and practical, because there is a dataset
too big for any computer we might
use; hence, this is VL data for us. Clustering is one of the primary tasks used
in the pattern recognition and data
mining communities to search VL
databases (including VL images) in
various applications, and so, clustering
algorithms that scale well to VL data
are important and useful. This paper
compares the efficacy of three different implementations of techniques
aimed to extend fuzzy c-means
(FCM) clustering to VL data. Specifically, we compare methods that are
based on 1) sampling followed by
noniterative extension; 2) incremental
techniques that make one sequential
pass through subsets of the data; and 3)
kernelized versions of FCM that provide approximations based on sampling, including three proposed algorithms. Empirical results show that
random sampling plus extension FCM,
bit-reduced FCM, and approximate
kernel FCM are good choices to
approximate FCM for VL data. We
conclude by demonstrating the VL
algorithms on a dataset with 5 billion
objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.”
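Of the strategies compared in the fuzzy c-means abstract, random sampling plus extension is the simplest to illustrate. Ordinary FCM is run on a random sample, and memberships for every object are then computed in a single non-iterative pass from the sample's cluster centers. The sketch below is a pure-Python toy under stated assumptions: fuzzifier m = 2, a fixed number of iterations instead of a convergence test, and centers initialized at randomly chosen sample points. The function names are hypothetical. This is not the authors' implementation, which targets data that does not fit in memory.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def memberships(point, centers, m):
    """Standard FCM memberships of one point in each cluster (they sum to 1)."""
    d2 = [sq_dist(point, c) for c in centers]
    for i, d in enumerate(d2):
        if d == 0.0:  # point coincides with a center: crisp membership
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    # With squared distances, (d_i / d_k)^(2/(m-1)) == (d2_i / d2_k)^(1/(m-1))
    exp = 1.0 / (m - 1.0)
    return [1.0 / sum((d2[i] / d2[k]) ** exp for k in range(len(centers)))
            for i in range(len(centers))]


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Plain FCM: alternate membership and center updates a fixed number of times."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # initialize at c randomly chosen points
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]  # uses current centers
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers.append(tuple(
                sum(wj * p[d] for wj, p in zip(w, points)) / total
                for d in range(dim)))
    return centers


def rse_fcm(points, c, sample_size, m=2.0, seed=0):
    """Random sampling plus extension: cluster a sample, then extend to all data."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centers = fcm(sample, c, m=m, seed=seed)
    # Non-iterative extension: one membership pass over the full dataset
    return centers, [memberships(p, centers, m) for p in points]
```

In a genuinely very large setting, the extension pass would stream over the data in chunks rather than build the full membership list in memory.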
IEEE Transactions on
Evolutionary Computation
Continuous Dynamic Constrained Optimization—The Challenges, by T. Nguyen
and X. Yao, IEEE Transactions on Evolutionary Computation, Vol. 16, No. 6,
December 2012, pp. 769–786.
Digital Object Identifier: 10.1109/
TEVC.2011.2180533
“Many real-world dynamic problems
have both objective functions and constraints that can change over time. Currently no research addresses whether
current algorithms work well on continuous dynamic constrained optimization
problems. There also is no benchmark
problem that reflects the common characteristics of continuous dynamic optimization problems. This paper attempts
to close this gap. The authors present
some investigations on the characteristics
that might make these problems difficult
to solve by some existing dynamic optimization and constraint handling algorithms. A set of benchmark problems
with these characteristics is presented.
Finally, a list of potential requirements that
an algorithm should meet to solve these
types of problems is proposed.”
The Automatic Design of Multiobjective
Ant Colony Optimization Algorithms, by M. López-Ibáñez and
T. Stützle, IEEE Transactions on Evolutionary Computation, Vol. 16, No. 6,
December 2012, pp. 861–875.
Digital Object Identifier: 10.1109/
TEVC.2011.2182651
“Multiobjective optimization problems are problems with several, often
conflicting, objectives that must be
optimized. Without any a priori preference information, the Pareto optimality
principle establishes a partial order
among solutions, and the output of the
algorithm becomes a set of nondominated solutions rather than a single one.
Various ant colony optimization (ACO)
algorithms have been proposed in
recent years for solving such problems.
This paper proposes a formulation
of algorithmic components that suffices
to describe most multiobjective
ACO algorithms proposed so far. The
proposed framework facilitates the
application of automatic algorithm configuration techniques.”
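A dominance test and a filter that keeps only nondominated solutions make the Pareto optimality principle from the multiobjective ACO abstract concrete. The sketch below assumes every objective is minimized and uses a simple quadratic-time filter. The function names are illustrative. It shows only the partial order and is not part of the authors' ACO framework.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization:
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(solutions):
    """Keep the solutions that no other solution dominates (O(n^2) filter)."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]
```

Because dominance is only a partial order, `nondominated([(1, 5), (2, 3), (3, 4), (4, 1)])` keeps three mutually incomparable trade-offs, `(1, 5)`, `(2, 3)`, and `(4, 1)`, and discards only `(3, 4)`, which `(2, 3)` dominates.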
IEEE Transactions on
Computational Intelligence
and AI in Games
Monte Carlo Tree Search for the Hide-and-Seek Game Scotland Yard, by Pim
Nijssen and Mark H.M. Winands,
IEEE Transactions on Computational
Intelligence and AI in Games, Vol. 4,
No. 4, December 2012, pp. 282–294.
Digital Object Identifier: 10.1109/
TCIAIG.2012.2210424
“This paper develops a strong
Monte-Carlo Tree Search player for
Scotland Yard, an interesting asymmetric imperfect information 2-player
strategy game. The game involves one
player controlling five detectives trying
to capture a “hider.” A novel combination of techniques is used, including
determinization, location categorization and coalition reduction, the latter
of which aims to optimally balance the
tendencies for detectives to behave in
glory hunting versus parasitic ways.”
IEEE Transactions on Autonomous Mental Development
A Unified Account of Gaze Following, by H. Jasso, J. Triesch, G. Deák, and J.M. Lewis, IEEE Transactions on Autonomous Mental Development, Vol. 4, No. 4, December 2012, pp. 257–272.
Digital Object Identifier: 10.1109/TAMD.2012.2208640
“Gaze following, the ability to redirect one’s visual attention to look at what another person is seeing, is foundational for imitation, word learning, and theory-of-mind. Previous theories have suggested that the development of gaze following in human infants is the product of a basic gaze following mechanism, plus the gradual incorporation of several distinct new mechanisms that improve the skill, such as spatial inference, and the ability to use eye direction information as well as head direction. In this paper, we offer an alternative explanation based on a single learning mechanism. From a starting state with no knowledge of the implications of another organism’s gaze direction, our model learns to follow gaze by being placed in a simulated environment where an adult caregiver looks around at objects. Our infant model matches the development of gaze following in human infants as measured in key experiments that we replicate and analyze in detail.”
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE
CALL FOR PAPERS
General Game Systems
IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG)
Special issue: General Game Systems
Special issue editors: Cameron Browne, Nathan Sturtevant and Julian Togelius
General game playing (GGP) involves the development of AI agents for playing a range of games well, rather than specialising in any one particular game. Such systems have potential benefits for AI research, where the creation of general intelligence remains one of the open grand challenges.
GGP was first proposed in the 1960s and became a reality in the 1990s with the Metagame system for general Chess-like
games. The specification of the game description language (GDL) and annual AAAI GGP competitions followed in the first
decade of this century, providing a platform for serious academic study into this topic. The recent advent of Monte Carlo
tree search (MCTS) methods has allowed the development of truly competitive GGP agents, and there is exciting new
research into applying GGP principles to general video games.
The field of general games research is now becoming fully rounded, with the development of complete general game
systems (GGS) for playing, analysing and/or designing new games. These include not only GGP, but also any system that
attempts to model a range of games; the definition is itself kept deliberately broad. The key feature of such systems is their
generality, but the issue of representation remains an obstacle to true universality while they rely on formal descriptions
of target domains.
The purpose of this special issue is to draw together the various research topics related to AI and CI in general games, to
give an indication of where the field currently stands and where it is likely to head. It will explore questions such as: How
good and how general are existing systems, and how good and how general can they become? What have we learnt about AI and CI
from studying general games? How do we apply existing GGP expertise to general video games? We invite high quality work on
any aspect of general games research in any genre of game (digital or physical), including play, analysis and design. Topics
include but are not limited to:
❑ General game playing
❑ General game description and representation
❑ General game design and optimisation
❑ Generalised Monte Carlo tree search (MCTS) approaches
❑ Real-time, nondeterministic and imperfect information extensions to GGP
❑ General video game playing
❑ Framing issues and constraints on generality
❑ Bridging the gap between academic and commercial applications
Authors should follow normal T-CIAIG guidelines for their submissions, but identify their papers for this special issue
during the submission process. Submissions should be 8 to 12 pages long, but may exceed these limits in special cases.
Short papers of 4 to 6 pages are also invited. See http://www.ieee-cis.org/pubs/tciaig/ for author information.
Deadline for submissions: May 3, 2013
Notification of Acceptance: July 5, 2013
Final copy due: October 4, 2013
Publication: December 6, 2013
Digital Object Identifier 10.1109/MCI.2013.2247901
Conference Report
Gary B. Fogel
Natural Selection, Inc., USA
A Report on the IEEE Life Sciences Grand Challenges Conference
On October 4–5, 2012 I had the good fortune to attend the first IEEE Life Sciences Grand Challenges Conference (IEEE LSGCC), held at the National Academy of Sciences in Washington, D.C. The two-day meeting had attendees from essentially all IEEE societies, reviewing applications and advancements of engineering in biomedicine. IEEE Life Sciences represents a new direction for IEEE, focused on the ever-increasing need for engineering solutions that deliver higher-quality, lower-cost healthcare. As biology generates larger datasets, the need for computational intelligence approaches also increases. As a biologist, I was delighted to see presentations from both engineers and biologists, with IEEE pulling the two fields closer together.
The meeting itself was largely focused on medical applications, including improved devices, the use of robots as medical assistants, and even visualization methods for modeling biological systems such as blood flow in the heart, so that new types of replacement valves could be tested in a realistic simulation environment where the researcher can interact with the simulation in three-dimensional projections.
IEEE CIS was mentioned in a lecture by Shangkai Gao (Tsinghua University) on the importance of, and future directions in, brain-computer interfaces. It was good to see the importance of machine learning featured, as well as a cover from a previous special issue on this topic in IEEE Computational Intelligence Magazine!
The start of the first IEEE Life Sciences Grand Challenges Conference in Washington, D.C. at the National Academy of Sciences.
Digital Object Identifier 10.1109/MCI.2013.2247821
Date of publication: 11 April 2013
A highlight for me was the lecture by Nobel Prize winner Phillip Sharp on the convergence of the life sciences, physical sciences, and engineering. This convergence, which was the focus of the meeting, will allow for knowledge integration and iteration, providing actionable insights to future clinicians. A common theme in many talks was the need to carry better modeling and engineering throughout the healthcare chain, not just to the clinician but also to better informed and engaged patients. While some presentations highlighted promising advances already being made in these directions, the advent of big data in biology and the scope and size of the problems in systems biology remain daunting. These “grand challenges” are the reason why this new direction for IEEE will pay dividends for researchers and patients for years to come.
For more information on the IEEE
Life Sciences Initiative, please visit
http://lifesciences.ieee.org.
IEEE Transactions on Fuzzy Systems
Special Issue on
Web-Based Intelligence Support Systems
using Fuzzy Set Technology
I. Aims and Scope
Web-based technology has enjoyed tremendous growth
and exhibited a wealth of development at both conceptual and algorithmic levels. In particular, there have been
numerous successful realizations of Web-based support
systems in various application areas, including e-learning,
e-commerce, e-government, and e-market. Web-based
support systems are highly visible and influential examples of user-oriented technology supporting numerous
human pursuits realized across the Internet. In the two
categories of decision support systems and recommender
systems, the facet of user centricity and friendliness is
well documented.
A review of recent literature shows that more and
more successful developments in Web-based support systems are being integrated with fuzzy sets to enhance intelligence-oriented functionality: web search systems using
fuzzy matching; Internet shopping systems using fuzzy
multi-agents; product recommender systems supported by
fuzzy measure algorithms; e-logistics systems using fuzzy
optimization models; online customer segmentation using fuzzy
data mining; fuzzy case-based reasoning in e-learning systems; and, particularly, online decision support systems supported by fuzzy set techniques. These developments have
demonstrated how fuzzy set technology can
benefit the implementation of Web-based support systems
in real-time business decision making and in government
online services.
In light of the above observations, this special issue is
intended to form an international forum presenting
innovative developments of fuzzy set applications in
Web-based support systems. The ultimate objective is to
bring together well-focused, high-quality research results in Web-based support systems, with the intent of identifying the most
promising avenues, reporting the main results and promoting
the visibility and relevance of fuzzy sets. The intent is to
raise awareness of the domain of Web-based technologies
as a high-potential subject area to be pursued by the
fuzzy set research community.
Digital Object Identifier 10.1109/MCI.2013.2247903
II. Topics Covered
Fuzzy set technology in:
❏ Web-based group support systems
❏ Web-based decision support systems
❏ Web-based personalized recommender systems
❏ Web-based knowledge management systems
❏ Web-based customer relationship management
❏ Web-based tutoring systems
and their applications to:
❏ E-business intelligence
❏ E-commerce intelligence
❏ E-government intelligence
❏ E-learning intelligence
III. Important Dates
Aug. 1, 2013: Submission deadline
Nov. 1, 2013: Notification of the first-round review
Jan. 1, 2014: Revised submission due
Mar. 1, 2014: Final notice of acceptance/rejection
IV. Submission Guidelines
Manuscripts should be prepared according to the instructions in the “Information for Authors” section of the journal and submitted through the
IEEE TFS journal website: http://mc.manuscriptcentral.com/tfs-ieee/. Clearly mark “Special Issue on Web-Based
Intelligence Support Systems using Fuzzy Set Technology”
in your cover letter to the Editor-in-Chief. All submitted
manuscripts will be reviewed using the standard procedure
that is followed for regular submissions.
V. Guest Editors
Prof. Witold Pedrycz
Department of Electrical & Computer Engineering
University of Alberta, Canada
e-mail: [email protected]
Prof. Jie Lu
School of Software
Faculty of Engineering and Information Technology,
University of Technology, Sydney, Australia
e-mail: [email protected]
Guest Editorial
Dongrui Wu
GE Global Research, USA
Christian Wagner
University of Nottingham, UK
Special Issue on Computational Intelligence and Affective Computing
Affective Computing (AC) was first introduced by Professor Picard (MIT Media Lab) in 1995 as “computing that relates to, arises from, or deliberately influences emotions.” It has gained popularity rapidly over the last decade, largely because of its great potential for the next generation of human-computer interfaces. Figure 1 shows the number of publications containing the phrase “affective computing” returned by Google Scholar over the last 17 years. In 2012 there were close to 2000 such publications.
FIGURE 1 Number of Google Scholar publications on affective computing since 1995. [Plot of Number of Publications (0 to 2000) against Year (1996 to 2012).]
Many countries have also been very supportive of AC research, particularly in priority areas such as supporting children’s social and cognitive development and addressing a rapidly aging population, where human-tangible computing such as affective robot companions is expected to provide essential benefits. In April 2012 the United States National Science Foundation awarded $10M to a 5-year project, “Socially Assistive Robotics,” under the Expeditions in Computing program¹, which² “will develop the fundamental computational techniques that will enable the design, implementation, and evaluation of robots that encourage social, emotional, and cognitive growth in children, including those with social or cognitive deficits.” The European Union has funded many relevant projects under the 6th and 7th Framework Programmes. The HUMAINE³ (HUman-MAchine Interaction Network on Emotions) Network of Excellence was established in 2004 and now has 33 partners from 14 countries. The RoboCom (Robot Companions for Citizens) project⁴ is one of the six candidates for the two €1 billion, 10-year Future and Emerging Technologies Flagships⁵. These robots will be able to display soft behavior based on new levels of perceptual, cognitive and emotive capabilities.
There are also two journals and an international conference dedicated to AC. The HUMAINE association established the biennial International Conference on Affective Computing and Intelligent Interaction (Beijing, China, 2005; Lisbon, Portugal, 2007; Amsterdam, The Netherlands, 2009; Memphis, USA, 2011; Geneva, Switzerland, 2013) in 2005, and the IEEE/ACM Transactions on Affective Computing in 2010. IGI Global established the International Journal of Synthetic Emotions in 2010.
¹ http://nsf.gov/news/news_summ.jsp?cntn_id=123707
² http://www.robotshelpingkids.org/index.php
³ http://emotion-research.net/
⁴ http://www.robotcompanions.eu/
⁵ http://cordis.europa.eu/fp7/ict/programme/fet/flagship/6pilots_en.html
Digital Object Identifier 10.1109/MCI.2013.2247822
Date of publication: 11 April 2013
Notably, the IEEE Computational Intelligence Society (CIS) is very active in AC research. It is a sponsor of the IEEE Transactions on Affective Computing, of the Workshop on Affective Computational Intelligence at the 2011 IEEE Symposium Series on Computational Intelligence (SSCI 2011), and of the Symposium on Computational Intelligence for Creativity and Affective Computing at SSCI 2013. The CIS Emergent Technologies Technical Committee has established an Affective Computing Task Force⁶, which is currently chaired by the two Guest Editors of this special issue. The Task Force organized a special session on “Affective Computing and Computational Intelligence” at the 2012 World Congress on Computational Intelligence (WCCI 2012), with a view to making it a biennial event held at WCCI.
⁶ https://sites.google.com/site/drwu09/actf
The combination of AC and computational intelligence is very natural. AC
raises many new challenges for signal
processing, affect recognition and modeling, and information aggregation.
Physiological signals, which are frequently
used as a basis for affect recognition, are
very noisy and highly subject-dependent.
Computational intelligence methods,
including fuzzy sets and systems, neural
networks, and evolutionary algorithms,
are well suited to developing intuitive and robust emotion recognition
algorithms. Further, emotions, which are
intrinsic to human beings, may also
inspire new CI algorithms, just like the
human brain inspired neural networks
and the survival of the fittest in nature
inspired evolutionary computation.
AC research itself has rapidly expanded
and today frequently goes beyond the initial core research challenge of mapping
body signals (facial expressions, voice, gesture, physiological signals, etc.) to affective
states. As an area that relies on contributions from a range of academic disciplines,
including Psychology, Biology, and
Computer Science, much of the research
in AC is firmly grounded in a multidisciplinary approach. The four articles in this
special issue of IEEE Computational
Intelligence Magazine represent some of the latest
progress in combining AC
and computational intelligence. They
were selected from 20 submissions
through peer review and provide a highly
interesting view of current research
and potential avenues for computational
intelligence in AC. The breadth of
the research captured by these articles indicates the importance of
affect in modern human-centric computation and the potential for further development of Computational
Intelligence in this space.
The first article, “Learning Deep
Physiological Models of Affect,” describes
the first study that applies deep learning
to AC using psychophysiological signals
(skin conductance and blood volume
pulse). Deep learning is a very active
research area in machine learning, especially for object recognition in images. In
this article the authors use a deep artificial
neural network for automatic feature
extraction and feature selection. They
adopt preference-based (or ranking-based)
annotations for emotion rather than traditional rating-based annotation, as the former provides more reliable self-report
data. Experiments show that deep learning can extract meaningful multimodal
data attributes beyond manual ad-hoc feature design. For some affective states, deep
learning without feature selection
achieved similar or even better performance than models built on ad-hoc
extracted features boosted by automatic
feature selection. More importantly, the
method is generic and applicable to any
affective modeling task.
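The contrast between preference-based and rating-based annotation can be made concrete. In the minimal Python sketch below, each pairwise preference (e.g., “session X was more frustrating than session Y”) is used directly as a training constraint, and a linear scorer is fitted with a pairwise logistic (RankNet-style) loss. This is a swapped-in illustration, not the deep network or training procedure of the article; the features, values, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One annotation = (features of the preferred item, features of the other item),
# e.g. "session X was more frustrating than session Y". Features are hypothetical.
Pair = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear affect score; a simple stand-in for a deep network."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs: List[Pair], dim: int, lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Stochastic gradient descent on the pairwise logistic loss
    log(1 + exp(-(score(preferred) - score(other))))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, other in pairs:
            margin = score(w, preferred) - score(w, other)
            # d(loss)/d(margin) = -1 / (1 + exp(margin)); clamp to avoid overflow.
            g = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
            for i in range(dim):
                w[i] -= lr * g * (preferred[i] - other[i])
    return w


def pairwise_accuracy(w: Sequence[float], pairs: List[Pair]) -> float:
    """Fraction of annotated pairs that the model orders as the annotator did."""
    return sum(score(w, a) > score(w, b) for a, b in pairs) / len(pairs)


# Toy features per session: [mean skin conductance, mean heart rate], both scaled to [0, 1].
pairs = [([0.9, 0.2], [0.1, 0.3]), ([0.7, 0.5], [0.2, 0.4]), ([0.6, 0.1], [0.3, 0.9])]
w = train_pairwise(pairs, dim=2)
print(w, pairwise_accuracy(w, pairs))
```

Only the ordering of scores matters, so the model never needs an absolute rating scale; its quality is measured as the fraction of annotated pairs it orders correctly.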
In the second article, the authors
present two models that employ interval
type-2 fuzzy sets to model the meaning
of words describing emotion. The first
model represents three factors for each
word: dominance, valence, and activation.
After describing the model the authors
deploy it in conjunction with similarity
measures for the task of translating from
one emotion vocabulary to another. As
an initial outcome, the authors show that
while the model works well with smaller
vocabularies, performance (rated by
comparison with human translators)
decreases when larger vocabularies are
used. The authors then introduce a second model which aims to overcome this
limitation by taking a different approach
to modeling words where interval
type-2 fuzzy sets are used to represent
the truth values of answers to questions
about emotion. A crowd-sourced evaluation of the latter approach is conducted
and the results presented.
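A rough sketch of the first model’s translation step follows, under stated assumptions: each emotion word is described per affect dimension by an interval type-2 fuzzy set with trapezoidal upper and lower membership functions on a 0 to 10 scale, similarity is computed with the Jaccard measure for IT2 fuzzy sets (one common choice; the article’s own similarity measures may differ), and per-dimension similarities are simply averaged. All membership parameters and names are hypothetical, and only two of the three dimensions are used for brevity.

```python
from typing import Dict, Tuple

# (a, b, c, d, height): trapezoid rising on [a, b], flat on [b, c], falling on [c, d].
Trap = Tuple[float, float, float, float, float]
# Interval type-2 fuzzy set as (upper MF, lower MF); the lower MF should lie under the upper one.
IT2 = Tuple[Trap, Trap]
# A word maps each affect dimension (e.g. "valence") to an IT2 fuzzy set on a 0-10 scale.
Word = Dict[str, IT2]


def trap(x: float, t: Trap) -> float:
    """Membership grade of x in a trapezoidal membership function."""
    a, b, c, d, h = t
    if x < a or x > d:
        return 0.0
    if x < b:
        return h * (x - a) / (b - a)
    if x <= c:
        return h
    return h * (d - x) / (d - c)


def jaccard_it2(A: IT2, B: IT2, lo: float = 0.0, hi: float = 10.0, n: int = 201) -> float:
    """Jaccard similarity of two IT2 fuzzy sets on a discretized domain:
    sum of min(upper) + min(lower) divided by sum of max(upper) + max(lower)."""
    num = den = 0.0
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        ua, la = trap(x, A[0]), trap(x, A[1])
        ub, lb = trap(x, B[0]), trap(x, B[1])
        num += min(ua, ub) + min(la, lb)
        den += max(ua, ub) + max(la, lb)
    return num / den if den > 0 else 0.0


def word_similarity(w1: Word, w2: Word) -> float:
    # Assumption: average the per-dimension similarities over the shared dimensions.
    dims = w1.keys() & w2.keys()
    return sum(jaccard_it2(w1[d], w2[d]) for d in dims) / len(dims)


def translate(word: Word, target_vocab: Dict[str, Word]) -> str:
    """Map a word to the most similar entry of another emotion vocabulary."""
    return max(target_vocab, key=lambda name: word_similarity(word, target_vocab[name]))


# Hypothetical membership functions (valence and activation only, for brevity).
happy: Word = {
    "valence": ((6, 7.5, 9, 10, 1.0), (7, 8, 8.5, 9.5, 0.6)),
    "activation": ((4, 5.5, 7, 8.5, 1.0), (5, 6, 6.5, 7.5, 0.6)),
}
target_vocab: Dict[str, Word] = {
    "joyful": {
        "valence": ((6.5, 8, 9, 10, 1.0), (7.5, 8.2, 8.8, 9.5, 0.6)),
        "activation": ((5, 6, 7.5, 9, 1.0), (5.5, 6.5, 7, 8, 0.6)),
    },
    "sad": {
        "valence": ((0, 1, 2.5, 4, 1.0), (0.5, 1.5, 2, 3, 0.6)),
        "activation": ((1, 2, 3.5, 5, 1.0), (1.5, 2.5, 3, 4, 0.6)),
    },
}
print(translate(happy, target_vocab))  # joyful
```

Because the output is simply the arg-max of similarity, every source word always receives some translation, however weak its best match is.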
The third article, “Modeling Curiosity-Related Emotions for Virtual Peer
Learners,” proposes a virtual peer learner
with curiosity-related emotions. It represents one of the latest advances in personalized learning, which the
United States National Academy of
Engineering selected as one of its 14 Grand
Challenges⁷. The idea is that “instruction can
be individualized based on learning styles,
speeds, and interests to make learning more reliable. ... Personal learning approaches range from
modules that students can master at their own
pace to computer programs designed to match the
way it presents content with a learner’s personality.” Experiments show that the curiosity-related emotions can guide the curious
peer learner to behave naturally in a
virtual learning environment, and the
curious virtual peer learner can demonstrate a higher tendency for learning in
breadth and depth.
In the fourth article, “Goal-Based
Denial and Wishful Thinking,” the
authors propose a novel approach to
model an agent’s beliefs that aims to
incorporate denial and wishful thinking.
While not traditionally related to AC,
their work on belief revision highlights
an important aspect of emotion in
belief structures, with direct consequences for the design of artificial
agents. They describe how traditional
rational belief systems for autonomous
artificial agents can be extended to capture a more human-like approach to
belief creation, preservation and revision. Significantly, the authors show how their approach enables the autonomous ranking and re-ranking of beliefs subject to new evidence and changes in an agent’s goals, which in turn allows an agent to autonomously revise its beliefs without relying on their external prioritization. Using example scenarios, the authors instantiate their belief model and demonstrate the agent’s behavior, in particular its “denial and wishful thinking” belief revision driven by the context it experiences.
⁷ http://www.engineeringchallenges.org/cms/8996/9127.aspx
In summary, the four selected papers
for this special issue highlight a subset
of the challenging and novel applications of computational intelligence to
AC. We would like to express our sincere thanks to all the authors and our gratitude to the reviewers for extending their
cooperation in preparing and revising the
papers. Special thanks go to Professor Kay
Chen Tan, Editor-in-Chief of IEEE
Computational Intelligence Magazine, for his
suggestions and advice throughout the
entire process of this special issue. We hope
that this issue will inspire others to work
on the exciting new frontier of computational intelligence and AC.
IEEE Transactions on Autonomous Mental Development
Special Issue on Behavior Understanding and Developmental Robotics
Call for Papers
We solicit papers that examine scientific, technological and application challenges that arise from the mutual interaction of
developmental robotics and computational human behavior understanding. While some of the existing techniques of multimodal behavior analysis and modeling can be readily re-used for robots, novel scientific and technological challenges arise
when one aims to achieve human behavior understanding in the context of natural and life-long human-robot interaction.
We seek contributions that deal with the two sides of this problem: (i) Behavior analysis for developmental robotics;
(ii) Behavior analysis through developmental robotics. Topics include the following, among others:
Adaptive human-robot interaction
Action and language understanding
Sensing human behavior
Incremental learning of human behavior
Learning by demonstration
Intrinsic motivation
Robotic platforms for behavior analysis
Multimodal interaction
Human-robot games
Semiotics for robots
Social and affective signals
Imitation
Contributions can exemplify diverse approaches to behavior analysis, but the relevance to developmental robotics should
be clear and explicitly argued. In particular, it should involve one of the following: 1) incremental and developmental
learning techniques, 2) techniques that allow adapting to changes in human behavior, 3) techniques that study evolution and
change in human behavior. Interested parties are encouraged to contact the editors with questions about the suitability of
a manuscript.
Editors:
Albert Ali Salah, Boğaziçi University, [email protected]; Pierre-Yves Oudeyer, INRIA, [email protected];
Çetin Meriçli, Carnegie Mellon University, [email protected];
Javier Ruiz-del-Solar, Universidad de Chile, jruizd@ing.uchile.cl
Instructions for Authors:
http://cis.ieee.org/ieee-transactions-on-autonomous-mental-development.html
We are accepting submissions through Manuscript Central at http://mc.manuscriptcentral.com/tamd-ieee (please select
“Human Behavior Understanding” as the submission type)
When submitting your manuscript, please also cc it to the editors.
Timeline:
30 April 2013: Deadline for paper submission
15 July 2013: Notification of the first round of review results
15 October 2013: Final version
20 October 2013: Electronic publication
December 2013: Printed publication
Digital Object Identifier 10.1109/MCI.2013.2247902
© PHOTODISC
Héctor P. Martínez
IT University of Copenhagen, DENMARK
Yoshua Bengio
University of Montreal, CANADA
Georgios N. Yannakakis
University of Malta, MALTA
Abstract: Feature extraction and feature selection are crucial phases in the process of affective modeling. Both, however, incorporate substantial limitations that hinder the development of reliable and accurate models of affect. For the purpose of modeling affect manifested through physiology, this paper builds on recent advances in machine learning with deep learning (DL) approaches. The efficiency of DL algorithms that train artificial neural network models is tested and compared against standard feature extraction and selection approaches followed in the literature. Results on a game data corpus, containing players’ physiological signals (i.e., skin conductance and blood volume pulse) and subjective self-reports of affect, reveal that DL outperforms manual ad-hoc feature extraction, as it yields significantly more accurate affective models. Moreover, it appears that DL meets and even outperforms affective models that are boosted by automatic feature selection, for several of the scenarios examined. As the DL method is generic and applicable to any affective modeling task, the key findings of the paper suggest that ad-hoc feature extraction and, to a lesser degree, feature selection could be bypassed.
Digital Object Identifier 10.1109/MCI.2013.2247823
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE
I. Introduction
More than 15 years after the early studies in Affective Computing (AC) [1], the problem of detecting and modeling emotions in the context of human-computer interaction (HCI) remains complex and largely unexplored. The detection and modeling of emotion is, primarily, the study and use of artificial intelligence (AI) techniques for the construction of computational models of emotion. The key challenges one faces when attempting to model emotion [2] are inherent in the vague definitions and fuzzy boundaries of emotion, and in the modeling methodology followed. In this context, open research questions are still present in all key components of the modeling process. These include, first, the appropriateness of the modeling tool employed to map emotional manifestations and responses to annotated affective states; second, the processing of signals that express these manifestations (i.e., model input); and third, the way affective annotation (i.e., model output) is handled. This paper touches upon all three key components of an affective model (i.e., input, model, output) and introduces the use of deep learning (DL) [3], [4], [5] methodologies for affective modeling from multiple physiological signals.
Traditionally in AC research, behavioral and bodily responses to stimuli are collected and used as the affective model input. The input can be of three main types: a) behavioral responses to emotional stimuli expressed through an interactive application (e.g., data obtained from a log of actions performed in a game); b) objective data collected as bodily responses to stimuli, such as physiological signals and facial expressions; and c) the context of the interaction. Before these data streams are fed into the computational model, an automatic or ad-hoc feature extraction procedure is employed to derive appropriate signal attributes (e.g., average skin conductance) that will feed the model. It is also common to introduce an automatic or a semi-automatic feature selection procedure that picks the most appropriate of the features extracted.
While the phases of feature extraction and feature selection are beneficial for affective modeling, they inherit a number of critical limitations that make their use cumbersome in highly complex multimodal input spaces. First, manual feature extraction limits the creativity of attribute design to the expert (i.e., the AC researcher), resulting in potentially inappropriate affect detectors that might not be able to capture the manifestations of the affect embedded in the raw input signals. Second, both feature extraction and, to a larger degree, feature selection are computationally expensive phases. In particular, the computational cost of feature selection may increase combinatorially (quadratically, in the greedy case) with respect to the number of features considered [6]. In general, there is no guarantee that any search algorithm is able to converge to optimal feature sets for the model; even exhaustive search may be approximate, since models are often trained with non-deterministic algorithms.
Our hypothesis is that the use of non-linear unsupervised and supervised learning methods relying on the principles of DL [3], [4] can eliminate the limitations of the current feature extraction and feature selection practices in affective modeling. We test the hypothesis that DL could construct feature extractors that are more appropriate than selected ad-hoc features picked via automatic selection. Learning within deep artificial neural network (ANN) architectures has proven to be a powerful machine learning approach for a number of benchmark problems and domains, including image and speech recognition [7], [8]. DL allows the automation of feature extraction (and feature selection, in part) without compromising on the accuracy of the obtained computational models and the physical meaning of the data attributes extracted [9]. Using deep learning we were able to extract meaningful multimodal data attributes beyond manual ad-hoc feature design. These learned attributes led to more accurate affective models and, at the same time, can potentially save computational resources by bypassing the computationally expensive feature selection phase. Most importantly, with the use of DL we gain simplicity, as multiple signals can be fused and fed directly, with limited preprocessing, to the model for training.
Other common automatic feature extraction techniques within AC are principal component analysis (PCA) and Fisher projection. However, they are typically applied to a set of features extracted a priori [10], while we apply DL directly to the raw data signals. Moreover, DL techniques can operate with any signal type and are not restricted to discrete signals as, for example, sequential data mining techniques are [11]. Finally, compared to dynamic affect modeling approaches such as Hidden Markov Models and Dynamic Bayesian Networks, DL models are advantageous with respect to their ability to reduce signal resolution across the several layers of their architectures.
This paper focuses on developing DL models of affect using data which are annotated in a ranking format (pairwise preferences). We emphasize the benefits of preference-based (or ranking-based) annotations for emotion (e.g., X is more frustrating than Y) as opposed to rating-based annotation [12] (such as the self-assessment
manikins [13], a tool to rate levels of arousal and valence in
discrete or continuous scales [14]) and introduce the use of
DL algorithms for preference learning, namely, preference
deep learning (PDL). In this paper, the PDL algorithm proposed is tested on emotional manifestations of relaxation, anxiety, excitement, and fun, embedded in physiological signals (i.e.,
skin conductance and blood volume pulse) derived from a
game-based user study of 36 participants. The study compares
DL against ad-hoc feature extraction on physiological signals,
used broadly in the AC literature, showing that DL yields
models of equal or significantly higher accuracy when a single
signal is used as model input. When the skin conductance and
blood volume pulse signals are fused, DL outperforms standard
feature extraction across all affective states examined. The
advantage of DL is maintained even when automatic feature
selection is employed to improve models built on ad-hoc features; in several affective states the performance of models built
on automatically selected ad-hoc features does not reach the
corresponding accuracy of the PDL approach.
This paper advances the state of the art in affective modeling in several ways. First, to the best of the authors’ knowledge, this is the first time deep learning is introduced to the
domain of psychophysiology, yielding efficient computational
models of affect. Second, the paper shows the strength of the
method when applied to the fusion of different physiological
signals. Third, the paper introduces PDL, i.e., the use of deep
ANN architectures trained on ranked (pairwise preference)
annotations of affect. Finally, the key findings of the paper
show the potential of DL as a mechanism for eliminating
manual feature extraction and even, on some occasions,
bypassing automatic feature selection for affective modeling.
II. Computational Modeling of Affect
Emotions and affect are mental and bodily processes that can
be inferred by a human observer from a combination of contextual, behavioral and physiological cues. Part of the complexity of affect modeling emerges from the challenges of finding
objective and measurable signals that carry affective information (e.g., body posture, speech and skin conductance) and
designing methodologies to collect and label emotional experiences effectively (e.g., induce specific emotions by exposing
participants to a set of images). Although this paper is only concerned with computational aspects of creating physiological
detectors of affect, the signals and the affective target values
collected shape the modeling task and, thus, influence the efficacy and applicability of different computational methods.
Consequently, this section gives an overview of the field
beyond the input modalities and emotion annotation protocols
examined in our case study. Furthermore, the studies surveyed
are representative of the two principal applications of AI for
affect modeling and cover the two key research pillars of this
paper: 1) defining feature sets to extract relevant bits of information from objective data signals (i.e., for feature extraction),
and 2) creating models that map a feature set into predicted
affective states (i.e., for training models of affect).
A. Feature Extraction
In the context of affect detection, we refer to feature extraction as
the process of transforming the raw signals captured by the hardware (e.g., a skin conductance sensor, a microphone, or a camera)
into a set of inputs suitable for a computational predictor of affect.
The most common features extracted from unidimensional continuous signals (i.e., temporal sequences of real values such
as blood volume pulse, accelerometer data, or speech) are simple
statistical features, such as average and standard deviation values,
calculated on the time or frequency domains of the raw or the
normalized signals (see [15], [16] among others). More complex
feature extractors inspired by signal processing methods have also
been proposed by several authors. For instance, Giakoumis et al.
[17] proposed features extracted from physiological signals using
Legendre and Krawtchouk polynomials while Yannakakis and
Hallam [18] used the approximate entropy [19] and the parameters
of linear, quadratic and exponential regression models fitted to a
heart rate signal. The focus of this paper is on DL methods that can
automatically derive feature extractors from the raw data, as
opposed to a fixed set of hand-crafted extractors that represent
pre-designed statistical features of the signals.
Unidimensional symbolic or discrete signals—i.e., temporal
sequences of discrete labels, typically events such as clicking a
mouse button or blinking an eye—are usually transformed
with ad-hoc statistical feature extractors such as counts, similarly to continuous signals. In contrast, Martínez and
Yannakakis [11] used frequent sequence mining methods [20]
to find frequent patterns across different discrete modalities,
namely gameplay events and discrete physiological events. The
count of each pattern was then used as an input feature to an
affect detector. This methodology is only applicable to discrete
signals: continuous signals must be discretized, which involves a
loss of information. In this respect, the key advantage of the DL
methodology proposed in this paper is that it can handle both
discrete and continuous signals; a lossless transformation can
convert a discrete signal into a binary continuous signal, which
can potentially be fed into a deep network—DL has been successfully applied to classify binary images, e.g., [21].
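A minimal sketch of the lossless conversion mentioned above, assuming each discrete event is stored as a (label, sample index) pair already aligned to the sampling grid of the continuous signals. The function names and this event format are hypothetical and are not taken from [11] or [21].

```python
def events_to_binary_channels(events, n_samples):
    """Turn discrete events into one binary time series per event label.

    events: iterable of (label, sample_index) pairs (hypothetical format),
            with sample_index on the same grid as the continuous signals.
    n_samples: length of each resulting series.
    Returns a dict mapping each label to a list of 0/1 values.
    """
    channels = {}
    for label, idx in events:
        if not 0 <= idx < n_samples:
            raise ValueError(f"event {label!r} at {idx} is outside the signal")
        # One channel per event type, initialised to "no event".
        channel = channels.setdefault(label, [0] * n_samples)
        channel[idx] = 1  # mark the sample at which the event occurred
    return channels


def binary_channel_to_events(label, channel):
    """Inverse mapping: recover the (label, index) pairs from a channel."""
    return [(label, i) for i, v in enumerate(channel) if v == 1]


events = [("blink", 2), ("click", 5), ("blink", 7)]
channels = events_to_binary_channels(events, n_samples=10)
print(channels["blink"])  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```

The mapping can be inverted exactly only while no two events of the same type fall on the same sample; with coarser sampling, a count-valued channel would be needed to keep it lossless.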
Affect recognition based on signals with more than one
dimension typically boils down to affect recognition from
images or videos of body movements, posture or facial expressions. In most studies, a series of relevant points of the face or
body are first detected (e.g., right mouth corner and right
elbow) and tracked along frames. Second, the tracked points are
aggregated into discrete Action Units [22], gestures [23] (e.g., lip
stretch or head nod) or continuous statistical features (e.g.,
body contraction index), which are then used to predict the
affective state of the user [24]. Both above-mentioned feature
extraction steps are, by definition, supervised learning problems
as the points to be tracked and action units to be identified
have been defined a priori. While these problems have been
investigated extensively under the name of facial expression or
gesture recognition, we will not survey them broadly as this
paper focuses on methods for automatically discovering new or
unknown features in an unsupervised manner.
Deep neural network architectures such as convolutional
neural networks (CNNs), a popular technique for object
recognition in images [25], have also been applied to facial
expression recognition. In [26], CNNs were used to detect
predefined features such as eyes and mouth which later were
used to detect smiles. Contrary to our work, in that study
each of the layers of the CNN was trained independently
using backpropagation, i.e., labeled data was available for
training each level. More recently, Rifai et al. [27] successfully
applied a variant of auto-encoders [21] and convolutional
networks, namely Contractive Convolutional Neural
Networks, to learn features from images of faces and predict
the displayed emotion, breaking the previous state-of-the-art
on the Toronto Face Database [28]. The key differences of this
paper with that study reside in the nature of the dataset and
the method used. While Rifai et al. [27] used a large dataset
(over 100,000 samples; 4,178 of them were labeled with an
emotion class) of static images displaying posed emotions, we
use a small dataset (224 samples, labeled with pairwise orders)
with a set of physiological time-series recorded along an
emotional experience. The reduced size of our dataset (which
is of the same magnitude as datasets used in related psychophysiological studies—e.g., [29], [30]) does not allow the
extraction of large feature sets (e.g., 9,000 features in [27]),
which would lead to affect models of poor generalizability.
The nature of our preference labels also calls for a modified
CNN training algorithm for affective preference learning
which is introduced in this paper. Furthermore, while the use
of CNNs to process images is extensive, to the best of the
authors’ knowledge, CNNs have not been applied before to
process (or as a means to fuse) physiological signals.
As in many other machine learning applications, in affect
detection it is common to apply dimensionality reduction
techniques to the complete set of features extracted. A wide
variety of feature selection (FS) methods have been used in the
literature including sequential forward [31], sequential floating
forward [10], sequential backwards [32], n-best individuals [33],
perceptron [33] and genetic [34] feature selection. Fisher projection and Principal Component Analysis (PCA) have also been
widely used as dimensionality reducers on different modalities of AC signals (see, e.g., [10] among others). An auto-encoder can be viewed as a non-linear generalization of PCA
[8]; however, while PCA has been applied in AC to project
sets of manually extracted features into low-dimensional spaces,
in this paper auto-encoders are used to train unsupervised
CNNs that project subsets of the raw input signals onto a
learned set of features. We expect that information relevant for
prediction can be extracted more effectively by applying dimensionality reduction directly to the raw physiological signals than to a set of designer-selected extracted features.
B. Training Models of Affect
The selection of a method to create a model that maps a given
set of features to predictions of affective variables is strongly
influenced by the dynamic aspect of the features (stationary or
sequential) and the format in which training examples are
given (continuous values, class labels or ordinal labels). A vast set
of off-the-shelf machine learning (ML) methods has been
applied to create models of affect based on stationary features,
irrespective of the specific emotions and modalities involved.
These include Linear Discriminant Analysis [35], Multi-layer
Perceptrons [32], K-Nearest Neighbors [36], Support Vector
Machines [37], Decision Trees [38], Bayesian Networks [39],
Gaussian Processes [29] and Fuzzy-rules [40]. On the other
hand, Hidden Markov Models [41], Dynamic Bayesian Networks [42] and Recurrent Neural Networks [43] have been
applied for constructing affect detectors that rely on features
which change dynamically. In the approach presented here,
deep neural network architectures hierarchically reduce the resolution of temporal signals down to a set of features that can be
fed to simple stateless models, eliminating the need for complex
sequential predictors.
In all the above-mentioned studies, the prediction targets are either class labels or continuous values. Class labels
are assigned either using an induction protocol (e.g., participants are asked to self-elicit an emotion [36], presented
with stories to evoke a specific emotion [44]) or via rating- or rank-based questionnaires given to users experiencing
the emotion (self-reports) or experts (third-person reports).
If ratings are used, they can be binned into discrete or
binary classes (e.g., on a scale from 1 to 5 measuring stress,
values above 3 correspond to a stressed user and values below 3
to a user not stressed at all [45]) or used as target values for
supervised learning (e.g., two experts rate the amount of
sadness of a facial expression and the average value is used
as the sadness intensity [46]). Alternatively, if ranks are used,
the problem of affective modeling becomes one of preference
learning. In this paper we use object ranking methods—a subset of preference learning algorithms [47], [48]—which
train computational models using partial orders among the
training samples. These methods allow us to avoid binning
together ordinal labels and to work with comparative questionnaires, which provide more reliable self-report data
compared to ratings, as they generate less inconsistency and
order effects [12].
Object ranking methods and comparative (rank) questionnaires have been scarcely explored in the AC literature,
despite their well-known advantages. For example, Tognetti
et al. [49] applied Linear Discriminant Analysis to learn
models of preferences over game experiences based on physiological statistical features and comparative pairwise self-reports (i.e., participants played pairs of games and ranked
games according to preference). Along the same lines,
Yannakakis et al. [50], [51] and Martínez et al. [34], [33]
trained single and multiple layer perceptrons via genetic
algorithms (i.e., neuroevolutionary preference learning) to learn
models for several affective and cognitive states (e.g., fun,
challenge and frustration) using physiological and behavioral
data, and pairwise self-reports. In this paper we introduce a
deep learning methodology for data given in a ranked
format (i.e., Preference Deep Learning) for the purpose of
modeling affect.

[Figure: (a) Feature Extraction (Convolutional Neural Network): Input Signal → Convolutional Layer 1 → Feature Maps 1 → Pooling Layer 1 → Feature Maps 1 (Subsampled) → Convolutional Layer 2 → Feature Maps 2 → Pooling Layer 2 → Feature Maps 2 (Subsampled) → Extracted Features x0 to x8; (b) Model of Affect (Single-Layer Perceptron).]
FIGURE 1 Example structure of a deep ANN architecture. The architecture contains: (a) a convolutional neural network (CNN) with two convolutional and two pooling layers, and (b) a single-layer perceptron (SLP) predictor. In the illustrated example the first convolutional layer (3
neurons and patch length of 20 samples) processes a skin conductance signal which is propagated forward through an average-pooling layer
(window length of 3 samples). A second convolutional layer (3 neurons and patch length of 11 samples) processes the subsampled feature
maps and the resulting feature maps feed the second average-pooling layer (window length of 6 samples). The final subsampled feature maps
form the output of the CNN, which provides a number of extracted (learned) features that feed the input of the SLP predictor.
III. Deep Artificial Neural Networks
We investigate an effective method of learning models that
map signals of user behavior to predictions of affective states.
To bypass the manual ad-hoc feature extraction stage, we use
a deep model composed of (a) a multi-layer convolutional
neural network (CNN) that transforms the raw signals into a
reduced set of features that feed (b) a single-layer perceptron
(SLP) which predicts affective states (see Fig. 1). Our
hypothesis is that the automation of feature extraction via
deep learning will yield physiological affect detectors of higher
predictive power, which, in turn, will deliver affective models
of higher accuracy. The advantages of deep learning techniques mentioned in the introduction of the paper have led
to very promising results in computer vision as they have
outperformed other state-of-the-art methods [52], [53].
Furthermore, convolutional networks have been successfully
applied to diverse temporal datasets (e.g., [54], [25])
including electroencephalogram (EEG) signals [55] for seizure prediction.
To train the convolutional neural network (see Section
III-A) we use denoising auto-encoders [56], an unsupervised
learning method to train filters or feature extractors which
transform the information of the input signal (see Section III-B) in order to capture a distributed representation of its leading factors of variation, but without the linearity assumption
of PCA. The SLP is then trained using backpropagation [57]
to map the outputs of the CNN to the given affective target
values. In the case study examined in this paper, target values
are given as pairwise comparisons (partial orders of length 2)
making error functions commonly used with gradient
descent methods, such as the difference of squared errors or
cross-entropy, unsuitable for the task. For that purpose, we use
the rank margin error function for preference data [58], [59] as
detailed in Section III-C below. Additionally, we apply an
automatic feature selection method to reduce the dimensionality of the feature space improving the prediction accuracy of
the models trained (see Section III-D).
A. Convolutional Neural Networks
Convolutional or time-delay neural networks [25] are hierarchical models that alternate convolutional and pooling layers
(see Fig. 1) in order to process large input spaces in which a
spatial or temporal relation among the inputs exists (e.g.,
images, speech or physiological signals).
Convolutional layers contain a set of neurons that detect
different patterns on a patch of the input (e.g., a time window
in a time-series or part of an image). The inputs of each neuron
(i.e., its receptive field) determine the size of the patch. Each
neuron contains a number of trainable weights equal to the
number of its inputs and an additional bias parameter (also
trainable); the output is calculated by applying an activation
function (e.g., logistic sigmoid) to the weighted sum of the
inputs plus the bias (see Fig. 2). Each neuron sequentially scans
the input, assessing at each patch location the similarity to the
pattern encoded on the weights. The consecutive outputs generated at every location of the input assemble a feature map (see
Fig. 1). The output of the convolutional layer is the set of feature maps resulting from convolving each of the neurons across
the input. Note that the convolution of each neuron produces
the same number of outputs as the number of samples in the
input signal (e.g., the sequence length) minus the size of the
patch (i.e., the size of the receptive field of the neuron), plus 1
(see Fig. 1).
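The scan described above can be sketched in plain Python for a single-channel signal. The sketch follows Fig. 2 (a weighted sum of the patch plus a bias, passed through an activation function, with the patch normalized to zero mean); the logistic sigmoid as activation, applying the normalization per patch, and the function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed choice of activation function)."""
    return 1.0 / (1.0 + math.exp(-z))


def conv_layer(signal, weights, biases):
    """Apply a 1-D convolutional layer to a single signal.

    signal:  list of floats (length N).
    weights: one weight vector per neuron, all of patch length k.
    biases:  one bias per neuron.
    Returns one feature map per neuron, each of length N - k + 1.
    """
    k = len(weights[0])
    n_positions = len(signal) - k + 1
    if n_positions < 1:
        raise ValueError("signal is shorter than the patch length")
    feature_maps = []
    for w, bias in zip(weights, biases):
        fmap = []
        # Slide the neuron's receptive field one sample at a time.
        for start in range(n_positions):
            patch = signal[start:start + k]
            mean = sum(patch) / k
            # Zero-mean patch, so the neuron ignores the signal's baseline level.
            centered = [x - mean for x in patch]
            z = sum(x * wi for x, wi in zip(centered, w)) + bias
            fmap.append(sigmoid(z))
        feature_maps.append(fmap)
    return feature_maps


# Three neurons with a receptive field of 20 samples, as in Fig. 2.
signal = [math.sin(0.1 * t) for t in range(100)]
weights = [[0.01 * (i + j) for j in range(20)] for i in range(3)]
maps = conv_layer(signal, weights, biases=[0.0, 0.1, -0.1])
print(len(maps), len(maps[0]))  # 3 81
```

With 100 input samples and a receptive field of 20, each neuron yields 100 - 20 + 1 = 81 outputs, matching the rule stated above; adding a constant offset to the signal leaves the feature maps unchanged because of the zero-mean normalization.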
As soon as feature maps have been generated, a pooling layer
aggregates consecutive values of the feature maps resulting from
the previous convolutional layer, reducing their resolution with
a pooling function (see Fig. 3). The maximum or average values
are the two most commonly used pooling functions, providing
max-pooling and average-pooling layers, respectively. This aggregation is typically done inside each feature map, so that the output of a pooling layer presents the same number of feature
maps as its input but at a lower resolution (see Fig. 1).

[Figure: input signal x = [x0 x1 … x19]; three neurons with weights w0, w1, w2 compute y0 = s(x · w0 + θ0), y1 = s(x · w1 + θ1) and y2 = s(x · w2 + θ2), each contributing one value to its own feature map.]
FIGURE 2 Convolutional layer. The neurons in a convolutional layer
take as input a patch on the input signal x. Each of the neurons calculates a weighted sum of the inputs (x · w), adds a bias parameter
θ and applies an activation function s(x). The output of each neuron
contributes to a different feature map. In order to find patterns that
are insensitive to the baseline level of the input signal, x is normalized to zero mean. In this example, the convolutional layer
contains 3 neurons with 20 inputs each.

[Figure: three input feature maps y0, y1, y2, each averaged (Avg) over a pooling window of three samples (t2 to t4), giving one output value per map.]
FIGURE 3 Pooling layer. The input feature maps are subsampled
independently using a pooling function over non-overlapping windows, resulting in the same number of feature maps with a lower
temporal resolution. In this example, an average-pooling layer with a
window length of 3 subsamples 3 feature maps.

[Figure: Input → Encoder → Output; during training, Output → Decoder → Reconstructed Input.]
FIGURE 4 Structure of an auto-encoder. The encoder generates the
learned representation (extracted features) from the input signals.
During training the output representation is fed to a decoder that
attempts to reconstruct the input.
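Non-overlapping pooling, together with the way feature-map lengths shrink through a stack such as the one in Fig. 1, can be sketched as follows. How leftover samples are handled when a map's length is not a multiple of the window is not stated in the text, so this sketch simply drops them (an assumption); the function names and the 120-sample input length in the example are hypothetical.

```python
def pool(feature_maps, window, mode="avg"):
    """Subsample each feature map independently over non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped (assumption).
    """
    reduce = (lambda w: sum(w) / len(w)) if mode == "avg" else max
    pooled = []
    for fmap in feature_maps:
        n_windows = len(fmap) // window
        pooled.append([reduce(fmap[i * window:(i + 1) * window])
                       for i in range(n_windows)])
    return pooled


def stack_output_length(n_samples, layers):
    """Track the feature-map length through ('conv', k) and ('pool', w) layers."""
    length = n_samples
    for kind, size in layers:
        if kind == "conv":
            length = length - size + 1  # N - k + 1 positions per neuron
        elif kind == "pool":
            length = length // size     # one value per full window
        else:
            raise ValueError(f"unknown layer type {kind!r}")
    return length


print(pool([[1, 2, 3, 4, 5, 6, 7]], window=3))              # [[2.0, 5.0]]
print(pool([[1, 2, 3, 4, 5, 6, 7]], window=3, mode="max"))  # [[3, 6]]

# Layer sizes from the Fig. 1 caption, on a hypothetical 120-sample input.
fig1 = [("conv", 20), ("pool", 3), ("conv", 11), ("pool", 6)]
length = stack_output_length(120, fig1)
print(length, 3 * length)  # 3 9
```

With the layer sizes from the Fig. 1 caption, a 120-sample input would leave maps of 3 values each; three such maps give nine extracted features. The actual input length used in the paper is not given in this section.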
B. Auto-Encoders
An auto-encoder (AE) [60], [8], [21] is a model that transforms
an input space into a new distributed representation (extracted
features) by applying a deterministic parametrized function
(e.g., single layer of logistic neurons) called the encoder (see
Fig. 4). The AE also learns how to map back the output of the
encoder into the input space, with a parametrized decoder, so
as to have small reconstruction error on the training examples,
i.e., the original and corresponding decoded inputs are similar.
However, constraints on the architecture or the form of the
training criterion prevent the auto-encoder from simply learning the identity function everywhere. Instead, it will learn to
have small reconstruction error on the training examples (and
where it generalizes) and high reconstruction error elsewhere.
Regularized auto-encoders are linked to density estimation in
several ways [56], [61]; see [62] for a recent review of regularized auto-encoders. In this paper, the encoder weights (used to
obtain the output representation) are also used to reconstruct
the inputs (tied weights). By defining the reconstruction error
as the sum of squared differences between the inputs and the
reconstructed inputs, we can use a gradient descent method
such as backpropagation to train the weights of the model. A
denoising auto-encoder (DA) [56] is a variant of the basic
model that during training adds a variable amount of noise to
the inputs before computing the outputs. The resulting training
objective is to reconstruct the original uncorrupted inputs, i.e.,
one minimizes the discrepancy between the outputs of the
decoder and the original uncorrupted inputs.
Auto-encoders are among several unsupervised learning
techniques that have provided remarkable improvements to
gradient-descent supervised learning [4], especially when the
number of labeled examples is small or in transfer settings [62].
ANNs that are pretrained using these techniques usually converge to more robust and accurate solutions than ANNs with
randomly sampled initial weights. In this paper, we use a DA
method known as Stacked Convolutional Auto-encoders [63]
to train all convolutional layers of our CNNs from bottom to
top. We trained the filters of each convolutional layer patchwise,
i.e., by considering the input at each position (one patch) in
the sequence as one example. This allows faster training than
training convolutionally, but may yield translated versions of
the same filter.
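The denoising auto-encoder training described in this subsection can be sketched as follows for a single convolutional layer trained patchwise: every length-k window of the signal, normalized to zero mean, is one training example. Several choices are assumptions not fixed by the text: additive Gaussian corruption, a sigmoid encoder with a linear decoder that reuses the transposed (tied) weights, plain stochastic gradient descent, and all hyperparameter values. Stacking (training the next layer on the pooled outputs of this one) is omitted.

```python
import math
import random


def extract_patches(signal, k):
    """Patchwise training set: each length-k window is one zero-mean example."""
    patches = []
    for start in range(len(signal) - k + 1):
        p = signal[start:start + k]
        m = sum(p) / k
        patches.append([x - m for x in p])
    return patches


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(k, n_hidden, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_hidden)]
    return W, [0.0] * n_hidden, [0.0] * k


def forward(W, b, c, x):
    """Tied weights: encoder h = s(W x + b), linear decoder r = W^T h + c."""
    h = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]
    r = [sum(W[i][j] * h[i] for i in range(len(h))) + c[j] for j in range(len(x))]
    return h, r


def reconstruction_error(W, b, c, patches):
    """Mean of 0.5 * squared error between clean patches and reconstructions."""
    total = 0.0
    for x in patches:
        _, r = forward(W, b, c, x)
        total += 0.5 * sum((rj - xj) ** 2 for rj, xj in zip(r, x))
    return total / len(patches)


def train_denoising_ae(patches, n_hidden, noise_std=0.1, lr=0.05, epochs=10, seed=0):
    rng = random.Random(seed)
    k = len(patches[0])
    W, b, c = init_params(k, n_hidden, rng)
    for _ in range(epochs):
        for x in patches:  # stochastic gradient descent, one patch per step
            # Corrupt the input (additive Gaussian noise is an assumed noise model).
            x_noisy = [xj + rng.gauss(0.0, noise_std) for xj in x]
            h, r = forward(W, b, c, x_noisy)
            dr = [rj - xj for rj, xj in zip(r, x)]  # compare with the CLEAN patch
            dh = [sum(W[i][j] * dr[j] for j in range(k)) for i in range(n_hidden)]
            da = [dh[i] * h[i] * (1.0 - h[i]) for i in range(n_hidden)]
            for i in range(n_hidden):
                for j in range(k):
                    # Tied weights receive gradient from both encoder and decoder.
                    W[i][j] -= lr * (da[i] * x_noisy[j] + h[i] * dr[j])
                b[i] -= lr * da[i]
            for j in range(k):
                c[j] -= lr * dr[j]
    return W, b, c


signal = [math.sin(0.2 * t) for t in range(200)]
patches = extract_patches(signal, k=20)
before = reconstruction_error(*init_params(20, 5, random.Random(0)), patches)
W, b, c = train_denoising_ae(patches, n_hidden=5, seed=0)
after = reconstruction_error(W, b, c, patches)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

After training, the rows of `W` play the role of the convolutional filters: applying the encoder at every position of a signal reproduces the feature maps of a convolutional layer.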
C. Preference Deep Learning
The outputs of a trained CNN define a number of learned features extracted from the input signal. These, in turn, may feed
any function approximator or classifier that attempts to find a
mapping between the input signal and a target output (i.e.,
affective state in our case). In this paper, we train a single layer
perceptron to learn to predict the affective state of a user based
on the learned features of her physiology (see Fig. 1). To this
end, we use backpropagation [57], which optimizes an error
function iteratively across a number of epochs by adjusting the
weights of the SLP proportionally to the gradient of the error
with respect to the current value of the weights and current
data samples.
We use the rank margin error function [64] which, given
two data samples $\{x_P, x_N\}$ such that $x_P$ is preferred over (or
should be greater than) $x_N$, is calculated as follows:
$$E(x_P, x_N) = \max\{0,\; 1 - (f(x_P) - f(x_N))\} \tag{1}$$
where $f(x_P)$ and $f(x_N)$ represent the outputs of the SLP for
the preferred and non-preferred sample, respectively. This
function decreases linearly as the difference between the predicted values for the preferred and non-preferred samples
increases. The function becomes zero if this difference is
greater than or equal to 1, i.e., there is enough margin to separate the
preferred “positive example” score $f(x_P)$ from the non-preferred “negative example” score $f(x_N)$. By minimizing this
function, the neural network is driven towards learning outputs separated by at least one unit of distance between the
preferred and non-preferred data samples. In each training
epoch, for every pairwise preference in the training dataset,
the output of the neural network is computed for the two
data samples in the preference (preferred and non-preferred)
and the rank-margin error is backpropagated through the
network in order to obtain the gradient required to update
the weights. Note that while all layers of the deep architecture could be trained (including supervised fine-tuning of the
CNNs), due to the small number of labeled examples available here, the Preference Deep Learning algorithm is constrained to the last layer (i.e., the SLP) of the network in order to
avoid overfitting.
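Training the SLP on pairwise preferences with the rank margin error of Eq. (1) can be sketched as follows. The sketch assumes a linear output unit, for which the gradient of Eq. (1) with respect to the weights is -(x_P - x_N) whenever the margin is violated and zero otherwise. The paper's activation function, learning rate, and epoch count are not stated here, and the synthetic data are for illustration only.

```python
import random


def rank_margin_error(f_p, f_n):
    """Eq. (1): E(x_P, x_N) = max(0, 1 - (f(x_P) - f(x_N)))."""
    return max(0.0, 1.0 - (f_p - f_n))


def slp_output(w, bias, x):
    """Single-layer perceptron with a linear output unit (assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_preference_slp(pairs, n_features, lr=0.01, epochs=50, seed=0):
    """Gradient descent on the rank margin error over (preferred, other) pairs.

    For a linear unit, dE/dw = -(x_P - x_N) while the margin is violated and 0
    otherwise; the bias cancels in f(x_P) - f(x_N), so it never changes here.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        for x_p, x_n in pairs:
            err = rank_margin_error(slp_output(w, bias, x_p), slp_output(w, bias, x_n))
            if err > 0.0:  # inside the margin: raise f(x_P) relative to f(x_N)
                w = [wi + lr * (p - n) for wi, p, n in zip(w, x_p, x_n)]
    return w, bias


def pairwise_accuracy(w, bias, pairs):
    """Fraction of pairs whose preferred sample receives the higher output."""
    hits = sum(slp_output(w, bias, p) > slp_output(w, bias, n) for p, n in pairs)
    return hits / len(pairs)


# Synthetic preferences generated by a hidden linear score (illustration only).
rng = random.Random(1)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
hidden_score = lambda x: x[0] - 2.0 * x[1]
pairs = []
for _ in range(300):
    a, b = rng.sample(samples, 2)
    pairs.append((a, b) if hidden_score(a) > hidden_score(b) else (b, a))

w, bias = train_preference_slp(pairs, n_features=2)
print(rank_margin_error(0.8, 0.5))  # 0.7 (up to floating-point rounding)
print(pairwise_accuracy(w, bias, pairs))
```

In the setting described above, each input vector would be the set of features extracted by the fixed, pretrained CNN, since only the last layer is trained.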
D. Automatic Feature Selection
Automatic feature selection (FS) is an essential process towards
picking those features (deep learned or ad-hoc extracted) that
are appropriate for predicting the examined affective states. In
this paper, we use Sequential Forward Feature Selection (SFS)
for its low computational effort and its good performance, demonstrated in comparison with more advanced but time-consuming feature subset selection algorithms such as the
genetic-based FS [34]. While a number of other FS algorithms
are available for comparison, in this paper we focus on the
comparative benefits of learned physiological detectors over
ad-hoc designed features. The impact of FS on model performance is further discussed in Section VI.
In brief, SFS is a bottom-up search procedure where one
feature is added at a time to the current feature set (see e.g.,
[48]). The feature to be added is selected from the subset of the
remaining features such that the new feature set generates the
maximum value of the performance function over all candidate
features for addition. Since we are interested in the minimal feature subset that yields the highest performance, we terminate
the selection procedure when an added feature yields validation
performance equal to or lower than the performance obtained without it.
The performance of a feature set selected by automatic FS is
measured through the average classification accuracy of the
model in three independent runs using 3-fold cross-validation.
In the experiments presented in this paper, the SFS algorithm
selects the input feature set for the SLP model.
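The SFS procedure and its stopping rule can be sketched as follows. `evaluate` stands for the performance function (in this paper, average 3-fold cross-validation accuracy over three runs); here it is a hypothetical callable, and seeding the search with a score of minus infinity, so that the first feature is always added, is an assumption.

```python
def sequential_forward_selection(features, evaluate):
    """Greedy bottom-up SFS with the stopping rule described above.

    features: candidate feature identifiers.
    evaluate: hypothetical callable returning the validation performance of a
              feature list (e.g., mean cross-validation accuracy).
    Returns the selected features (in order of addition) and their score.
    """
    selected = []
    best_score = float("-inf")  # assumption: the first feature is always added
    remaining = list(features)
    while remaining:
        # Score each remaining feature when added to the current set.
        scores = {f: evaluate(selected + [f]) for f in remaining}
        candidate = max(remaining, key=scores.__getitem__)
        if scores[candidate] <= best_score:
            break  # equal or lower performance than without it: stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Toy performance function: features 2 and 0 help, 1 is neutral, 3 hurts.
gains = {0: 0.25, 1: 0.0, 2: 0.5, 3: -0.125}
toy_score = lambda subset: 0.5 + sum(gains[f] for f in subset)
print(sequential_forward_selection([0, 1, 2, 3], toy_score))  # ([2, 0], 1.25)
```

Each iteration calls `evaluate` once per remaining feature, so a complete run needs at most n + (n - 1) + ... + 1 evaluations for n features, i.e., O(n^2), which is the greedy case mentioned in the introduction.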
IV. The Maze-Ball Dataset
The dataset used to evaluate the proposed methodology was gathered during an experimental game survey where 36 participants
played four pairs of different variants of the same video game. The
test-bed game, named Maze-Ball, is a 3D prey/predator game that
features a ball, controlled by the arrow keys, inside a maze. The goal
of the player is to maximize her score in 90 seconds by collecting
a number of pellets scattered in the maze while avoiding enemies
that wander around. Eight different game variants were presented
to the players. The games differed with respect to the virtual
camera profile used, which determined how the virtual world was
presented on screen. We expected that different camera profiles
would induce different experiences and affective states, which
would, in turn, reflect on the physiological state of the players,
making it possible to predict the players’ affective self-reported
preferences using information extracted from their physiology.
Blood volume pulse (BVP) and skin conductance (SC)
were recorded at 31.25 Hz during each game session. The
players filled in a 4-alternative forced choice questionnaire after
completing a pair of game variants reporting whether the first
or the second game of the pair (i.e., pairwise preference) felt
more anxious, exciting, frustrating, fun and relaxing, with the additional options
“equally” and “none at all” [33]. While three additional
labels were collected in the original experiment (boredom, challenge and frustration), we focus only on affective states or states
that are implicitly linked to affective experiences, such as fun
(thereby, removing the cognitive state of challenge), and report
only results for states in which prediction accuracies of over
70% were achieved in at least one of the input feature sets
examined (thereby, removing frustration). Finally, boredom was
removed due to the small number of clear preferences available
(i.e., most participants reported not feeling bored during any of
the games). The details of the Maze-Ball game design and the
experimental protocol followed can be found in [33], [34].
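The following is a minimal sketch of how the 4-alternative answers can be turned into the pairwise preferences used for training. It assumes that “equally” and “none at all” answers are discarded because they express no clear preference. The `PairReport` record, its field names and the answer strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairReport:
    """One questionnaire answer for one affective state (hypothetical layout)."""
    participant: str
    first: str    # game variant played first in the pair
    second: str   # game variant played second in the pair
    answer: str   # "first", "second", "equally" or "none at all"


def clear_preferences(reports):
    """Return (participant, preferred, other) triples for clear answers only."""
    pairs = []
    for r in reports:
        if r.answer == "first":
            pairs.append((r.participant, r.first, r.second))
        elif r.answer == "second":
            pairs.append((r.participant, r.second, r.first))
        # "equally" and "none at all" carry no ordering and are skipped
    return pairs


reports = [
    PairReport("p1", "v1", "v2", "first"),
    PairReport("p1", "v3", "v4", "second"),
    PairReport("p2", "v1", "v2", "equally"),
    PairReport("p2", "v3", "v4", "none at all"),
]
print(clear_preferences(reports))
```

Each triple keeps the participant identifier. Both members of a pair therefore come from the same player, which is what lets a preference model work on within-participant differences.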
A. Ad-Hoc Extraction of Statistical Features
This section lists the statistical features extracted from the two
physiological signals monitored. Some features are extracted for both signals, while others are signal-dependent, as seen in the list below. These specific statistical features were chosen to cover a broad range of the BVP and SC signal dynamics (tonic and phasic) proposed in the majority of previous studies in the field of psychophysiology (e.g., see [15], [65], [51], among many others).
❏ Both signals ($a \in \{BVP, SC\}$): average $E\{a\}$, standard deviation $\sigma\{a\}$, maximum $\max\{a\}$, minimum $\min\{a\}$, the difference between maximum and minimum signal recording $\Delta a = \max\{a\} - \min\{a\}$, time when maximum $a$ occurred $t_{\max}\{a\}$, time when minimum $a$ occurred $t_{\min}\{a\}$ and the difference $\Delta a_t = t_{\max}\{a\} - t_{\min}\{a\}$; autocorrelation (lag equals 1) of the signal $\rho_{a1}$ and mean of the absolute values of the first and second differences of the signal [15] ($\delta_{a1}$ and $\delta_{a2}$, respectively).
❏ BVP: average inter-beat amplitude $E\{IBAmp\}$; given the inter-beat time intervals (RR intervals) of the signal, the following heart rate variability (HRV) parameters were computed: the standard deviation of RR intervals $\sigma\{RR\}$, the fraction of RR intervals that differ by more than 50 ms from the previous RR interval $pRR50$ and the root-mean-square of successive differences of RR intervals $RMS_{RR}$ [65].
❏ SC: initial, $SC_{in}$, and last, $SC_{last}$, SC recording, the difference between initial and final SC recording $\Delta SC_{l-i} = SC_{last} - SC_{in}$ and Pearson’s correlation coefficient $R_{SC}$ between raw SC recordings and the time $t$ at which data were recorded.
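The feature list above maps fairly directly onto code. Below is a minimal Python sketch under these assumptions:

- Each signal is a list of samples at 31.25 Hz.
- For BVP, RR intervals (in ms) and inter-beat amplitudes have already been produced by a beat detector, which is not shown.
- Following [15], the "second differences" are taken as $|a_{n+2} - a_n|$.
- The population standard deviation is used.

Function and key names are hypothetical. A `z_transform` helper is included for the dataset-wide standardization that is applied to the statistical features before training.

```python
import math
import statistics

FS = 31.25  # Hz; sampling rate of the BVP and SC recordings


def common_features(a, fs=FS):
    """Eleven features computed for both signals (a in {BVP, SC})."""
    n = len(a)
    mean = statistics.fmean(a)
    i_max = max(range(n), key=a.__getitem__)  # sample index of max{a}
    i_min = min(range(n), key=a.__getitem__)  # sample index of min{a}
    c = [x - mean for x in a]
    rho1 = sum(c[i] * c[i + 1] for i in range(n - 1)) / sum(x * x for x in c)
    return {
        "E": mean,
        "sigma": statistics.pstdev(a),
        "max": a[i_max],
        "min": a[i_min],
        "delta": a[i_max] - a[i_min],
        "t_max": i_max / fs,               # seconds from session start
        "t_min": i_min / fs,
        "delta_t": (i_max - i_min) / fs,
        "rho1": rho1,                      # lag-1 autocorrelation
        "delta1": statistics.fmean(abs(a[i + 1] - a[i]) for i in range(n - 1)),
        "delta2": statistics.fmean(abs(a[i + 2] - a[i]) for i in range(n - 2)),
    }


def hrv_features(rr_ms, ib_amplitudes):
    """BVP-only features from RR intervals (ms) and inter-beat amplitudes."""
    succ = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive RR differences
    return {
        "E_IBAmp": statistics.fmean(ib_amplitudes),
        "sigma_RR": statistics.pstdev(rr_ms),
        "pRR50": sum(abs(d) > 50.0 for d in succ) / len(succ),
        "RMS_RR": math.sqrt(statistics.fmean(d * d for d in succ)),
    }


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def sc_features(sc, fs=FS):
    """SC-only features."""
    t = [i / fs for i in range(len(sc))]
    return {
        "SC_in": sc[0],
        "SC_last": sc[-1],
        "delta_SC_li": sc[-1] - sc[0],
        "R_SC": pearson(sc, t),
    }


def z_transform(rows):
    """Standardize each feature column over the whole dataset (mean 0, std 1)."""
    cols = list(zip(*rows))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]
```

Counting the keys gives 11 + 4 = 15 features per signal. That is the number the CNNs are later constrained to output.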
V. Experiments
To test the efficacy of DL in constructing accurate models of affect, we pretrained several convolutional neural networks (using denoising auto-encoders) to extract features for each of the physiological signals and across all reported affective states in the dataset. The topologies of the networks were selected after preliminary experiments with 1- and 2-layer CNNs and
trained using the complete unlabeled dataset. In all experiments reported in this paper, the final number of features pooled from the CNNs is 15, to match the number of ad-hoc extracted statistical features (see Section IV-A). A larger number of pooled features could potentially yield higher prediction accuracies. We restricted the size to 15 to ensure a fair comparison against the accuracies yielded by the ad-hoc extracted features.

The input signals are not normalized using global, baseline or subject-dependent constants. Instead, the first convolutional layer of every CNN subtracts the mean value within each patch presented, so every patch has zero mean. As a result, the learned features are sensitive only to variation within the desired time window (patch) and insensitive to the baseline level (see Fig. 2). For the statistical features, we apply a z-transformation to the complete dataset, so that the mean and the standard deviation of each feature in the dataset are 0 and 1, respectively. Independently of model input, the use of preference learning models automatically minimizes the effects of between-participant physiological differences, because these models are trained and evaluated using within-participant differences (as noted in [33], [12] among other studies).

We present a comparison between the prediction accuracy of several SLPs trained either on the learned features of the CNNs or on the ad-hoc designed statistical features. The affective models are trained with and without automatic feature selection and compared. This section presents the key findings derived from the SC (Section V-A) and the BVP (Section V-B) signals and concludes with the analysis of the fusion of the two physiological signals (Section V-C). All the experiments presented here were run 10 times, and the average (and standard error) of the resulting models’ prediction accuracies is reported. The prediction accuracy of a model is calculated as the average 3-fold cross-validation (CV) accuracy, i.e., the average percentage of correctly classified pairs on each fold. More folds (e.g., 10) or other validation methods such as leave-one-out cross-validation are possible. We considered 3-fold CV appropriate for testing the generalizability of the trained ANNs, given the relatively small size of this dataset and its high across-subject variation.

[Figure 5: weight value (y axis, -0.5 to 0.5) versus time within the input patch (x axis, s) for neurons N1 to N5; panels (a) and (b).]
FIGURE 5 Learned features of the best-performing convolutional neural networks. Lines are plotted connecting the values of consecutive connection weights for each neuron Nx. The x axis displays the time stamp (in seconds) of the samples connected to each weight within the input patch. (a) $\mathrm{CNN}^{SC}_{80}$ (skin conductance). (b) $\mathrm{CNN}^{BVP}_{1\times45}$ (blood volume pulse).

A. Skin Conductance
The focus of the paper is on the effectiveness of DL for affective modeling. The topology of the CNNs can be critical for the performance of the model, but an exhaustive empirical validation of all possible CNN topologies and parameter sets is beyond the scope of this paper. For this reason, and also due to space considerations, we took three steps:
❏ we systematically tested critical CNN parameters (e.g., the patch length, the number of layers, and the number of neurons);
❏ we fixed a number of CNN parameters (e.g., pooling window length) based on suggestions from the literature;
❏ we discuss results from representative CNN architectures.
In particular, for the skin conductance signal we present results on two pretrained CNNs. The first, labeled $\mathrm{CNN}^{SC}_{20\times11}$, contains two convolutional layers with 5 logistic neurons per patch location at each layer, as well as average-pooling over non-overlapping windows of size 3. Each of the neurons in the first and second convolutional layers has 20 and 11 inputs, respectively. The second network (labeled $\mathrm{CNN}^{SC}_{80}$) contains one convolutional layer with 5 logistic neurons of 80 inputs each, at each patch location.
Both CNNs examined here were selected based on a number of criteria. The number of inputs of the first convolutional layer of the two CNNs was chosen to extract features at different time resolutions (20 and 80 inputs, corresponding to 12.8 and 51.2 seconds, respectively). This gives an indication of the impact the time resolution might have on performance. Extensive experiments with smaller and larger time windows did not seem to affect the models’ prediction accuracy. The small window on the intermediate pooling layer was chosen to minimize the amount of information lost from the feature maps. The number of inputs to the neurons in the next layer was adjusted to cover about a third of the pooled feature maps. Finally, we selected 5 neurons in the first convolutional layer as a good compromise between expressivity and dissimilarity among the features learned. A low number of neurons yielded features with low expressivity, while a large number of neurons generally resulted in very similar features.
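The sketch below is a minimal forward pass that makes the data flow of $\mathrm{CNN}^{SC}_{80}$ concrete. It chains these stages:

- the 20-sample initial average pooling described below;
- per-patch mean subtraction followed by 5 logistic neurons with 80 inputs each;
- a final average pooling that reduces each feature map to 3 values.

Assumptions:

- Randomly initialized weights stand in for the weights learned by denoising auto-encoder pretraining, which is not shown.
- The text does not specify how the final pooling windows are laid out, so splitting each map into 3 near-equal segments is an assumption.
- All function names are hypothetical.

```python
import math
import random

FS = 31.25      # Hz, sampling rate of the recordings
POOL_IN = 20    # initial average-pooling window (samples), i.e. 1.5625 Hz
N_INPUTS = 80   # inputs per convolutional neuron in CNN^SC_80
N_NEURONS = 5   # logistic neurons per patch location
N_OUT = 3       # pooled values kept per feature map


def avg_pool(x, width):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(x[i:i + width]) / width for i in range(0, len(x) - width + 1, width)]


def conv_logistic(x, weights, biases):
    """Slide each neuron over x; each patch is mean-centred before the dot product."""
    n = len(weights[0])
    maps = []
    for w, b in zip(weights, biases):
        fmap = []
        for start in range(len(x) - n + 1):
            patch = x[start:start + n]
            mu = sum(patch) / n  # removes the baseline level of this patch
            z = b + sum(wi * (p - mu) for wi, p in zip(w, patch))
            fmap.append(1.0 / (1.0 + math.exp(-z)))
        maps.append(fmap)
    return maps


def pool_to(fmap, k):
    """Average-pool one feature map into k near-equal, non-overlapping segments."""
    edges = [round(i * len(fmap) / k) for i in range(k + 1)]
    return [sum(fmap[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def cnn_sc80_features(sc, weights, biases):
    """Forward pass: pool, convolve, pool; yields 5 x 3 = 15 features."""
    pooled = avg_pool(sc, POOL_IN)
    maps = conv_logistic(pooled, weights, biases)
    return [v for fmap in maps for v in pool_to(fmap, N_OUT)]


# Usage: random weights stand in for pretrained ones (hypothetical values).
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_NEURONS)]
biases = [0.0] * N_NEURONS
session = [2.0 + 0.05 * math.sin(i / 200) for i in range(int(90 * FS))]  # synthetic SC
features = cnn_sc80_features(session, weights, biases)
print(len(features), "features; patch span =", N_INPUTS * POOL_IN / FS, "s")
```

A 90-second session (2812 samples) is pooled to 140 values, which leaves 61 patch positions per neuron. Because each patch is mean-centred, adding a constant offset to the whole signal leaves the output essentially unchanged. This is the baseline insensitivity described for the first convolutional layer.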
Both topologies are built on top of an average-pooling layer with a window length of 20 samples and are topped with an average-pooling layer that pools 3 outputs per neuron. SC is usually sampled at high frequencies (e.g., 256 Hz). We believe, however, that most of the affect-relevant information contained in the signal can be found at lower time resolutions, as even rapid arousal changes (i.e., a phasic change of SC) can be captured at a lower resolution and a lower computational cost [66], [33]. For that purpose, this initial pooling stage aims to facilitate feature learning at a resolution of 1.56 Hz (31.25 Hz / 20). Moreover, experiments with dissimilar pooling layers showed that features extracted at higher SC resolutions do not necessarily yield models of higher accuracy. The selection of 5 neurons for the last convolutional layer and the following pooling layer was made to match exactly the number of ad-hoc statistical features of SC (i.e., 15).

1) Deep Learned Features
Figure 5(a) depicts the values of the 80 connection weights of the five neurons in the convolutional layer of $\mathrm{CNN}^{SC}_{80}$. These weights cover 51.2 seconds of the SC signal (0.64 seconds per weight) at each evaluation.
❏ N1 outputs a maximal value for areas of the SC signal in which a long decay is followed by 10 seconds of an incremental trend and a final decay.
❏ N2 shows a similar pattern, but the increment is detected earlier in the time window and the follow-up decay is longer.
❏ A high output of N1 or N2 would suggest that a change in the experience elicited a heightened level of arousal that decayed naturally seconds after.
❏ N4, in contrast, detects a second incremental trend in the signal that elevates the SC level even further.
❏ N5 also detects two increments, but several seconds further apart.
❏ N3 detects three consecutive SC increments.
N3, N4 and N5 could detect changes in the level of arousal caused by consecutive stimuli presented a few seconds apart. Overall, this convolutional layer captures long and slow changes (10 seconds or more) of skin conductance. Standard statistical features related to variation (such as standard deviation and average first/second absolute differences) cannot model these local patterns with the same precision. This further suggests that learned and ad-hoc features extract dissimilar aspects of the signal.

[Figure 6: bar charts of average prediction accuracy (%, 40 to 90) for relaxation, anxiety, excitement and fun, comparing $\mathrm{CNN}^{SC}_{80}$, $\mathrm{CNN}^{SC}_{20\times11}$ and statistical features; panels (a) and (b).]
FIGURE 6 Skin conductance: average accuracy of SLPs trained on statistical features (statistical), and features pooled from each of the CNN topologies ($\mathrm{CNN}^{SC}_{20\times11}$ and $\mathrm{CNN}^{SC}_{80}$). The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.

2) DL vs. Ad-Hoc Feature Extraction
Figure 6(a) depicts the average prediction accuracies (3-fold CV) of SLPs trained on the outputs of the CNNs, compared to the corresponding accuracies obtained by SLPs trained on the ad-hoc extracted statistical features. Both CNN topologies yield predictors of relaxation with accuracies over 60% (66.07% and 65.38% for $\mathrm{CNN}^{SC}_{20\times11}$ and $\mathrm{CNN}^{SC}_{80}$, respectively). These accuracies are significantly higher than those of the models built on statistical features. Given these performance differences, it appears that learned local features could detect aspects of SC that are more relevant to the prediction of this particular affective state than the proposed set of ad-hoc statistical features. Models trained on automatically selected features further validate this result [see Fig. 6(b)], showing differences of more than 5% with respect to statistical features. Furthermore, the relaxation models trained on selected
ad-hoc features, despite the benefits of FS, yield accuracies
lower than the models trained on the complete sets of learned
features. This suggests that CNNs can extract general information from SC that is more relevant for affect modeling
than statistical features selected specifically for the task. An
alternative interpretation is that the feature space created by
CNNs allows backpropagation to find more general solutions
than the greedy-reduced (via SFS) space of ad-hoc features.
For all other emotions considered, neither the CNNs nor the ad-hoc statistical features lead to models that significantly outperform chance prediction (see [67] for random baselines on this dataset). When feature selection is used [see Fig. 6(b)], CNN-based models outperform statistical-based models on the prediction of every affective state, with accuracies above 60% for at least one topology.
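The comparisons above rely on sequential forward selection (SFS), scored by 3-fold CV pair accuracy and summarized as mean and standard error over repeated runs. The following Python sketch shows one common form of SFS wrapped around a pairwise single-layer perceptron. Assumptions:

- The perceptron is a linear scorer trained by stochastic gradient descent with a logistic (RankNet-style) pairwise loss. This may differ from the training used in the experiments.
- SFS stops as soon as no remaining feature improves the CV accuracy.
- All names are hypothetical.

Since the logistic output of an SLP is monotonic in $w \cdot x$, ranking a pair by $w \cdot x$ gives the same decision, and a bias term cancels in pairwise differences.

```python
import math
import random
import statistics


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pairwise_slp(pairs, n_feat, epochs=200, lr=0.1, seed=0):
    """Fit w so that w.x_pref > w.x_other for each (x_pref, x_other) pair."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.05, 0.05) for _ in range(n_feat)]
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            d = [p - o for p, o in zip(x_pref, x_other)]
            margin = sum(wi * di for wi, di in zip(w, d))
            g = _sigmoid(-margin)  # -dLoss/dmargin for loss = log(1 + e^-margin)
            w = [wi + lr * g * di for wi, di in zip(w, d)]
    return w


def pair_accuracy(w, pairs):
    """Fraction of pairs in which the preferred item receives the higher score."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sum(score(p) > score(o) for p, o in pairs) / len(pairs)


def cv_accuracy(pairs, cols, k=3, seed=0):
    """Average k-fold CV pair accuracy using only the feature indices in cols."""
    sub = [([p[c] for c in cols], [o[c] for c in cols]) for p, o in pairs]
    order = list(range(len(sub)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [sub[i] for i in order if i not in held_out]
        w = train_pairwise_slp(train, len(cols), seed=seed)
        accs.append(pair_accuracy(w, [sub[i] for i in fold]))
    return statistics.fmean(accs)


def sfs(pairs, n_feat, score=cv_accuracy):
    """Sequential forward selection: greedily add the best feature until none helps."""
    selected, best = [], 0.0
    while len(selected) < n_feat:
        remaining = [c for c in range(n_feat) if c not in selected]
        top, c_top = max((score(pairs, selected + [c]), c) for c in remaining)
        if top <= best:
            break
        selected, best = selected + [c_top], top
    return selected, best


def mean_and_se(accuracies):
    """Mean and standard error over repeated runs (10 runs in the experiments)."""
    return (statistics.fmean(accuracies),
            statistics.stdev(accuracies) / math.sqrt(len(accuracies)))
```

Here the same folds are used both to choose features and to report accuracy, which makes the reported figure optimistic. A nested split would avoid that. SFS is greedy: once a feature is added it is never removed, which is what makes it cheap.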
Predicting complex affective states based solely on SC is difficult. Even so, these results suggest that unsupervised CNNs trained as a stack of denoising auto-encoders are a promising method for automatically extracting features from this modality: they achieved higher prediction accuracies than a well-defined set of ad-hoc statistical features. Results also show that there are particular affective states (relaxation and, to a lesser degree, anxiety) for which DL is able to automatically extract features that are beneficial for their prediction. On the other hand, for some affective states (fun and excitement), DL appears to have a lesser effect on SC-based prediction compared to models built on the ad-hoc designed features. Prediction accuracies for those affective states with both types of features (ad-hoc or CNN-extracted) are rather low, suggesting that SC is not an appropriate signal for modeling them in this dataset. It is worth mentioning that earlier studies on this dataset [67] report higher accuracies on the ad-hoc statistical features than those reported here. In that study, however, two different signal components were extracted from the SC signal, leading to three times the number of features examined in this paper (i.e., 45 features). Given the results obtained in this paper, we anticipate that DL can reach and surpass those baseline accuracies by using more learned features, for example, by combining CNNs with different input lengths that capture information at different time resolutions.

[Figure 7: bar charts of average prediction accuracy (%, 40 to 90) for relaxation, anxiety, excitement and fun, comparing $\mathrm{CNN}^{BVP}_{1\times45}$, $\mathrm{CNN}^{BVP}_{30\times45}$ and statistical features; panels (a) and (b).]
FIGURE 7 Blood volume pulse: average accuracy of SLPs trained on statistical features (statistical), and features pooled from each of the CNN topologies ($\mathrm{CNN}^{BVP}_{1\times45}$ and $\mathrm{CNN}^{BVP}_{30\times45}$). The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.
B. Blood Volume Pulse
Following the same systematic approach for selecting CNN topology and parameter sets, we present two convolutional networks for the experiments on the BVP signal:
1) $\mathrm{CNN}^{BVP}_{1\times45}$: one max-pooling layer with non-overlapping windows of length 30, followed by a convolutional layer with 5 logistic neurons per patch location and 45 inputs at each neuron;
2) $\mathrm{CNN}^{BVP}_{30\times45}$: two convolutional layers with 10 and 5 logistic neurons per patch location, respectively, and an intermediate max-pooling layer with a window of length 30; the neurons of each layer have 30 and 45 inputs, respectively.
As in the CNNs used in the SC experiments, both topologies are topped with an average-pooling layer that reduces the length of the outputs from each of the 5 output neurons down to 3. The CNNs thus output 5 feature maps of length 3, which amounts to 15 features.

The initial pooling layer of the first network collects the maximum value of the BVP signal every 0.96 seconds (30 samples at 31.25 Hz). This yields an approximation of the signal’s upper envelope, that is, a smooth line joining the extremes of the signal’s peaks. Decrements in this function are directly linked with increments in heart rate (HR), and further connected with increased arousal and corresponding affective states (e.g., excitement and fun [33], [18]). Neurons with 45 inputs were selected to capture long patterns of variation (i.e., 43.2 seconds), as sudden and rapid changes in heart rate were not expected during the experimental game survey. The second network follows the same rationale, but its first pooling layer does not collect the maximum of the raw BVP signal. Instead, it processes the outputs of 10 neurons that analyze signal patches of 0.96 seconds, which could operate as a beat detector mechanism.
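The sketch below shows the first pooling stage of $\mathrm{CNN}^{BVP}_{1\times45}$. Non-overlapping max pooling over 30 samples turns the raw BVP into an approximation of its upper envelope. The sketch also computes the time spans implied by the stated layer sizes. The synthetic pulse wave in the usage part is an illustration only: a half-rectified 1.2 Hz sine whose amplitude decays over the session. The names are hypothetical.

```python
import math

FS = 31.25     # Hz, sampling rate of the BVP recordings
POOL = 30      # non-overlapping max-pooling window of CNN^BVP_1x45 (samples)
N_INPUTS = 45  # inputs per convolutional neuron after pooling


def max_pool(x, width):
    """Non-overlapping max pooling; approximates the BVP upper envelope."""
    return [max(x[i:i + width]) for i in range(0, len(x) - width + 1, width)]


print("envelope step:", POOL / FS, "s")          # time covered by one pooled sample
print("neuron span:", N_INPUTS * POOL / FS, "s")  # time covered by one patch

# Usage with a synthetic pulse wave (assumption): 1.2 Hz beats whose
# amplitude decays linearly over a 90-second session.
n = int(90 * FS)
bvp = [(2.0 - i / n) * max(0.0, math.sin(2 * math.pi * 1.2 * i / FS)) for i in range(n)]
envelope = max_pool(bvp, POOL)
print(len(envelope), "envelope samples;", len(envelope) - N_INPUTS + 1, "patch positions")
```

The 0.96-second window is longer than one beat at typical resting heart rates (about 0.83 s at 72 beats per minute), so each window contains at least one peak. The pooled sequence therefore tracks the peak height rather than individual beats.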
1) Deep Learned Features
Figure 5(b) depicts the 45 connection weights of each neuron in $\mathrm{CNN}^{BVP}_{1\times45}$, which cover 43.2 seconds of the BVP signal’s upper envelope. The trend of the BVP’s upper envelope is negatively correlated with heart rate. A neuron therefore produces maximal output when its consecutive decreasing weights are aligned with a time window containing an HR increment, and its consecutive increasing weights with HR decays. On that basis:
❏ N2 and N5 detect two 10-second-long periods of HR increments, separated by an HR decay period.
❏ N1 and N4 detect two overlapping increments in HR, followed by a decay in N4.
❏ N3, on the other hand, detects a negative trend in HR with a small peak in the middle.
This convolutional layer appears to capture dissimilar local complex patterns of BVP variation which are, arguably, not available through common ad-hoc statistical features.
2) DL vs. Ad-Hoc Feature Extraction
Predictors of excitement and fun trained on features extracted with $\mathrm{CNN}^{BVP}_{1\times45}$ outperformed the ad-hoc feature sets, both the complete [see Fig. 7(a)] and the automatically selected feature sets [see Fig. 7(b)]. It is worth noting that no other model improved on baseline accuracy using all features [see Fig. 7(a)]. In particular, excitement and fun models based on statistical features achieved performances of 61.1% and 64.3%, respectively. These are significantly lower than the corresponding accuracies of $\mathrm{CNN}^{BVP}_{1\times45}$ [68.0% and 69.7%, respectively; see Fig. 7(b)]. They are not significantly different from the accuracies of $\mathrm{CNN}^{BVP}_{1\times45}$ with the complete set of features [57.3% and 63.0%, respectively; see Fig. 7(a)]. Given the reported links between fun and heart rate [18], this result suggests that $\mathrm{CNN}^{BVP}_{1\times45}$ effectively extracted HR information from the BVP signal to predict reported fun. The efficacy of CNNs is further supported by the results reported in [67], where SLP predictors of fun trained on statistical features of the HR signal (in the same
dataset examined here) do not outperform the DL models presented in this paper. For reported fun and excitement, CNN-based feature extraction demonstrates a clear advantage: it extracts affect-relevant information from BVP while bypassing beat detection and heart rate estimation.

Models built on selected features for relaxation and anxiety yielded low accuracies of around 60%, with small differences between learned and ad-hoc features. This suggests that BVP-based emotional manifestations are not the most appropriate predictors for those two states in this dataset. The periodicity of blood volume pulse poses challenges for affective modeling. Despite this, CNNs managed to extract powerful features to predict two affective states, outperforming the statistical features proposed in the literature and matching more complex data processing methods used in similar studies [67].

C. Fusion of SC and BVP
To test the effectiveness of learned features in fused models, we combined the outputs of the BVP and SC CNNs presented earlier into one SLP. We then compared its performance against a combination of all ad-hoc BVP and SC features. For space considerations, we only present the combination of the best-performing CNNs trained on each signal individually, i.e., $\mathrm{CNN}^{SC}_{80}$ and $\mathrm{CNN}^{BVP}_{1\times45}$. The fusion of CNNs from both signals generates models that yield higher prediction accuracies than models built on ad-hoc features across all affective states, using both all features and subsets of selected features (see Fig. 8). This result further validates the effectiveness of CNNs for modeling affect from physiological signals: models trained on automatically selected learned features from the two signals yield prediction accuracies of around 70–75%. In all cases but one (i.e., anxiety prediction with SFS), these performances are significantly higher than those of the corresponding models built on commonly used ad-hoc statistical features.

[Figure 8: bar charts of average prediction accuracy (%, 40 to 90) for relaxation, anxiety, excitement and fun, comparing $\mathrm{CNN}^{SC+BVP}_{80+1\times45}$ and statistical features; panels (a) and (b).]
FIGURE 8 Fusion of SC and BVP signals: average accuracy of SLPs trained on blood volume pulse and skin conductance using statistical features on the raw signal (statistical) and features pooled from $\mathrm{CNN}^{SC}_{80}$ and $\mathrm{CNN}^{BVP}_{1\times45}$ ($\mathrm{CNN}^{SC+BVP}_{80+1\times45}$). The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.

VI. Discussion
The results obtained are more than encouraging with respect to the applicability and efficacy of DL for affective modeling. Nevertheless, a number of research directions should be considered in future work. The Maze-Ball game dataset includes key components for affective modeling and is representative of a typical affective modeling scenario, but our PDL approach still needs to be tested on diverse datasets. The reduced size of the dataset limited the number of features that could be learned. Currently, deep architectures are widely used to extract thousands of features from large datasets, which yields models that outperform other state-of-the-art classification or regression methods (e.g., [27]). We expect that applying DL to model affect in large physiological datasets would show larger improvements with respect to statistical features and provide new insights into the relationship between physiology and affect. Moreover, to demonstrate the robustness of the algorithm, more and dissimilar modalities of user input need to be considered, and different domains (beyond games) need to be explored. To that end, different approaches to multimodal fusion in conjunction with DL need to be investigated. The accuracies obtained across different affective states and modalities of user input, however, already provide sufficient evidence that the method would generalize well in dissimilar domains and modalities.
The paper did not provide a thorough analysis of the impact of feature selection on the efficiency of DL, as the focus was on feature extraction. To that end, more feature selection methods will need to be investigated and compared to SFS. Ad-hoc feature performances might be improved with more advanced FS methods, such as genetic-search-based FS [34]. Even so, in several experiments the obtained results already show that DL without feature selection matches, and even beats, a rather effective and popular FS mechanism. Although in this paper we have compared DL to a complete and representative set of ad-hoc features, a wider set of features could be explored in future work. For instance, heart rate variability features derived from the Fourier transform of BVP (see [33]) could be included in the comparison. However, we expect that CNNs would be able to extract relevant frequency-based features, as their successful application in other domains already demonstrates (e.g., music sample classification [54]). Furthermore, other automatic feature extraction methods, such as principal component analysis, which is common in domains such as image classification [68], will be explored for psychophysiological modeling and compared to DL in this domain.
Despite the good results reported in this paper on the skin conductance and blood volume pulse signals, we expect that certain well-designed ad-hoc features can still outperform automatically learned features. Consider, for example, the final score of a game, a behavioral attribute of play that is highly correlated with reported fun in games [69]. Convolutional networks may not capture it, because they tend to find patterns that are invariant with respect to the position in the signal. Such an ad-hoc feature, however, may carry information of high predictive power for particular affective states. We argue that DL is expected to be of limited use for low-resolution signals (e.g., player score over time), which could generate well-defined feature spaces for affective modeling.

An advantage of ad-hoc extracted statistical features lies in the ease of interpreting the physical properties of the signal they capture, as they are usually based on simple statistical metrics. Therefore, prediction models trained on statistical features can be analyzed with low effort, providing insights into affective phenomena. Artificial neural networks have traditionally been considered black boxes whose high predictive power comes at the cost of a more difficult interpretation of what the model has learned. We have shown, however, that appropriate visualization tools can ease the interpretation of neural-network-based features. Moreover, learned features derived from DL architectures may define data-based extracted patterns, which could lead to the advancement of our understanding of emotion manifestations via physiology (and beyond).

Finally, DL can automatically provide a more complete and appropriate set of features than ad-hoc feature extraction, but parameter tuning is a necessary phase in (and a limitation of) the training process. This paper introduced a number of CNN topologies that performed well on the SC and BVP signals. Empirical results also showed that, in general, the performance of the CNN topologies is not affected significantly by parameter tuning. Future work, however, will aim to further test the sensitivity of CNN topologies and parameter sets, as well as the generality of the extracted features across physiological datasets. This would reduce the experimentation effort required for future applications of DL to psychophysiology.

VII. Conclusions
This paper introduced the application of deep learning (DL) to the construction of reliable models of affect built on physiological manifestations of emotion. The algorithm proposed employs a number of convolutional layers that learn to extract relevant features from the input signals. The algorithm was tested on two physiological signals (skin conductance and blood volume pulse), individually and in fusion, for predicting the reported affective states of relaxation, anxiety, excitement and fun (given as pairwise preferences). The dataset is derived from 36 players of a 3D prey/predator game. The proposed preference deep learning (PDL) approach matches or outperforms the standard ad-hoc feature extraction used in the affective computing literature: it yields models of equal or significantly higher prediction accuracy across all affective states examined. The increase in performance is more evident when automatic feature selection is employed.

Results, in general, suggest that DL methodologies are highly appropriate for affective modeling and, more importantly, indicate that ad-hoc feature extraction can be redundant for physiology-based modeling. Furthermore, in some of the affective states examined, DL without feature selection reaches or even outperforms models built on ad-hoc extracted features that are boosted by automatic feature selection. Examples are relaxation models built on SC, fun and excitement models built on BVP, and relaxation models built on fused SC and BVP. These findings showcase the potential of DL for affective modeling, as both manual feature extraction and automatic feature selection could ultimately be bypassed.

With small modifications, the proposed methodology can be applied to affect classification and regression tasks across any type of input signal. Thus, the method is directly applicable to affect detection in one-dimensional time-series input signals such as electroencephalograph (EEG), electromyograph (EMG) and speech, but also in two-dimensional input signals such as images [27] (e.g., for facial expression and head pose analysis). Finally, results suggest that the method is powerful when fusing different types of input signals and, thus, it is expected to perform equally well across multiple modalities.
Acknowledgment
The authors would like to thank Tobias Mahlmann for his
work on the development and administration of the cluster
used to run the experiments. Special thanks for proofreading go to Yana Knight. Thanks also go to the Theano development team, to all participants in our experiments, and to Ubisoft, NSERC and Canada Research Chairs for funding. This
work is funded, in part, by the ILearnRW (project no: 318803)
and the C2Learn (project no. 318480) FP7 ICT EU projects.
References
[1] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 2000.
[2] R. Calvo and S. D’Mello, “Affect detection: An interdisciplinary review of models,
methods, and their applications,” IEEE Trans. Affective Comput., vol. 1, no. 1, pp. 18–37,
2010.
[3] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,”
Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
[4] Y. Bengio, “Learning deep architectures for AI,” Found. Trends® Mach. Learn., vol. 2,
no. 1, pp. 1–127, 2009.
[5] I. Arel, D. Rose, and T. Karnowski, “Deep machine learning–A new frontier in artificial intelligence research [Research Frontier],” IEEE Comput. Intell. Mag., vol. 5, no. 4, pp. 13–18, Nov. 2010.
[6] M. Dash and H. Liu, “Feature selection for classification,” Intell. Data Anal., vol. 1, nos. 1–4, pp. 131–156, 1997.
[7] M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun, “Unsupervised learning of invariant feature hierarchies with applications to object recognition,” in Proc. IEEE Conf.
Computer Vision Pattern Recognition, 2007, pp. 1–8.
[8] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural
networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[9] Y. Bengio and O. Delalleau, “On the expressive power of deep architectures,” in Algorithmic Learning Theory. Berlin, Germany: Springer-Verlag, 2011, pp. 18–36.
[10] E. Vyzas and R. Picard, “Affective pattern classification,” in Proc. AAAI 1998 Fall Symp. Emotional and Intelligent: The Tangled Knot of Cognition, 1998, pp. 176–182.
[11] H. P. Martínez and G. N. Yannakakis, “Mining multimodal sequential patterns: A case study on affect detection,” in Proc. 13th Int. Conf. Multimodal Interfaces, 2011, pp. 3–10.
[12] G. N. Yannakakis and J. Hallam, “Ranking vs. preference: A comparative study of self-reporting,” in Proc. 4th Int. Conf. Affective Computing Intelligent Interaction, 2011, pp. 437–446.
[13] J. Morris, “Observations: SAM: The self-assessment manikin, an efficient cross-cultural measurement of emotional response,” J. Advertising Res., vol. 35, no. 6, pp. 63–68, 1995.
[14] J. Russell, “A circumplex model of affect,” J. Personality Social Psychol., vol. 39, no.
6, p. 1161, 1980.
[15] R. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: Analysis
of affective physiological state,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10,
pp. 1175–1191, 2001.
[16] D. Ververidis and C. Kotropoulos, “Automatic speech classification to five emotional states based on gender information,” in Proc. EUSIPCO, Vienna, Austria, 2004, pp. 341–344.
[17] D. Giakoumis, D. Tzovaras, K. Moustakas, and G. Hassapis, “Automatic recognition
of boredom in video games using novel biosignal moment-based features,” IEEE Trans.
Affective Comput., vol. 2, no. 3, pp. 119–133, July–Sept. 2011.
[18] G. N. Yannakakis and J. Hallam, “Entertainment modeling through physiology in
physical play,” Int. J. Human-Comput. Stud., vol. 66, no. 10, pp. 741–755, Oct. 2008.
[19] S. Pincus, “Approximate entropy as a measure of system complexity,” in Proc. National
Academy Sciences, 1991, vol. 88, no. 6, pp. 2297–2301.
[20] N. Lesh, M. Zaki, and M. Ogihara, “Mining features for sequence classification,” in
Proc. 5th ACM Int. Conf. Knowledge Discovery Data Mining, 1999, pp. 342–346.
[21] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layerwise training
of deep networks,” in Advances in Neural Information Processing Systems. Cambridge, MA:
MIT Press, 2007, vol. 19, p. 153.
[22] P. Ekman and W. Friesen, “Facial action coding system: A technique for the measurement of facial movement,” in From Appraisal to Emotion: Differences Among Unpleasant
Feelings, Motivation and Emotion, P. C. Ellsworth, and C. A. Smith, Eds. Palo Alto, CA:
Consulting Psychologists Press, 1988, vol. 12, pp. 271–302.
[23] G. Caridakis, S. Asteriadis, K. Karpouzis, and S. Kollias, “Detecting human behavior
emotional cues in natural interaction,” in Proc. 17th Int. Conf. Digital Signal Processing, July
2011, pp. 1–6.
[24] A. Kleinsmith and N. Bianchi-Berthouze, “Affective body expression perception and
recognition: A survey,” IEEE Trans. Affective Comput., vol. PP, no. 99, p. 1, 2012.
[25] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time
series,” in The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press,
1995, vol. 3361, pp. 255–258.
[26] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, “Subject independent facial expression recognition with robust face detection using a convolutional neural network,” Neural
Netw., vol. 16, no. 5, pp. 555–559, 2003.
[27] S. Rifai, Y. Bengio, A. Courville, P. Vincent, and M. Mirza, “Disentangling factors
of variation for facial expression recognition,” in Proc. European Conf. Computer Vision,
2012, pp. 802–822.
[28] J. Susskind, A. Anderson, and G. E. Hinton, “The Toronto face dataset,” U. Toronto,
Toronto, ON, Canada, Tech. Rep. UTML TR 2010-001, 2010.
[29] A. Kapoor, W. Burleson, and R. Picard, “Automatic prediction of frustration,” Int. J.
Human-Comput. Stud., vol. 65, no. 8, pp. 724–736, 2007.
[30] S. Tognetti, M. Garbarino, A. Bonarini, and M. Matteucci, “Modeling enjoyment
preference from physiological responses in a car racing game,” in Proc. IEEE Conf. Computational Intelligence Games, 2010, pp. 321–328.
[31] C. Lee and S. Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE
Trans. Speech Audio Processing, vol. 13, no. 2, pp. 293–303, 2005.
[32] J. Wagner, J. Kim, and E. André, “From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification,” in
Proc. IEEE Int. Conf. Multimedia and Expo, 2005, pp. 940–943.
[33] G. N. Yannakakis, H. P. Martínez, and A. Jhala, “Towards affective camera control in
games,” User Model. User-Adapted Interact., vol. 20, no. 4, pp. 313–340, 2010.
[34] H. P. Martínez and G. N. Yannakakis, “Genetic search feature selection for affective
modeling: A case study on reported preferences,” in Proc. 3rd Int. Workshop Affective Interaction Natural Environments, 2010, pp. 15–20.
[35] D. Giakoumis, A. Drosou, P. Cipresso, D. Tzovaras, G. Hassapis, T. Zalla, A. Gaggioli, and G. Riva, “Using activity-related behavioural features towards more effective
automatic stress detection,” PLoS ONE, vol. 7, no. 9, p. e43571, 2012.
[36] O. AlZoubi, R. Calvo, and R. Stevens, “Classification of EEG for affect recognition: An adaptive approach,” in AI 2009: Proc. 22nd Australasian Joint Conf. Advances in Artificial Intelligence, 2009, pp. 52–61.
[37] M. Soleymani, M. Pantic, and T. Pun, “Multimodal emotion recognition in response
to videos,” IEEE Trans. Affective Comput., vol. 3, no. 2, pp. 211–223, 2012.
[38] S. McQuiggan, B. Mott, and J. Lester, “Modeling self-efficacy in intelligent tutoring systems: An inductive approach,” User Model. User-Adapted Interact., vol. 18, no. 1, pp. 81–123, 2008.
[39] H. Gunes and M. Piccardi, “Bi-modal emotion recognition from expressive face and
body gestures,” J. Netw. Comput. Appl., vol. 30, no. 4, pp. 1334–1345, 2007.
[40] R. Mandryk and M. Atkins, “A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies,” Int. J. Human-Comput. Stud.,
vol. 65, no. 4, pp. 329–347, 2007.
[41] J. F. Grafsgaard, K. E. Boyer, and J. C. Lester, “Predicting facial indicators of confusion with hidden Markov models,” in Affective Computing and Intelligent Interaction, (Series
Lecture Notes in Computer Science), S. D’Mello, A. Graesser, B. Schuller, and J.-C.
Martin, Eds. Berlin, Germany: Springer-Verlag, 2011, vol. 6974, pp. 97–106.
[42] R. Kaliouby and P. Robinson, “Real-time inference of complex mental states from
facial expressions and head gestures,” in Real-Time Vision Human-Computer Interaction.
New York: Springer-Verlag, 2005, pp. 181–200.
[43] H. Kobayashi and F. Hara, “Dynamic recognition of basic facial expressions by discrete-time recurrent neural network,” in Proc. Int. Joint Conf. Neural Networks, Oct. 1993,
vol. 1, pp. 155–158.
[44] K. Kim, S. Bang, and S. Kim, “Emotion recognition system using short-term monitoring of physiological signals,” Med. Biol. Eng. Comput., vol. 42, no. 3, pp. 419–427, 2004.
[45] J. Hernandez, R. R. Morris, and R. W. Picard, “Call center stress recognition with
person-specific models,” in Affective Computing and Intelligent Interaction, (Series Lecture
Notes in Computer Science), S. D’Mello, A. Graesser, B. Schuller, and J.-C. Martin, Eds.
Berlin, Germany: Springer-Verlag, 2011, vol. 6974, pp. 125–134.
[46] J. Bailenson, E. Pontikakis, I. Mauss, J. Gross, M. Jabon, C. Hutcherson, C. Nass, and
O. John, “Real-time classification of evoked emotions using facial feature tracking and
physiological responses,” Int. J. Human-Comput. Stud., vol. 66, no. 5, pp. 303–317, 2008.
[47] J. Fürnkranz and E. Hüllermeier, “Preference learning,” Künstliche Intell., vol. 19, no.
1, pp. 60–61, 2005.
[48] G. N. Yannakakis, “Preference learning for affective modeling,” in Proc. Int. Conf. Affective Computing Intelligent Interaction, Amsterdam, The Netherlands, Sept. 2009, pp. 126–131.
[49] S. Tognetti, M. Garbarino, A. Bonanno, M. Matteucci, and A. Bonarini, “Enjoyment recognition from physiological data in a car racing game,” in Proc. 3rd Int. Workshop Affective Interaction Natural Environments, 2010, pp. 3–8.
[50] G. N. Yannakakis, J. Hallam, and H. H. Lund, “Entertainment capture through heart
rate activity in physical interactive playgrounds,” User Model. User-Adapted Interact., vol.
18, no. 1, pp. 207–243, 2008.
[51] G. N. Yannakakis and J. Hallam, “Entertainment modeling through physiology in
physical play,” Int. J. Human-Comput. Stud., vol. 66, no. 10, pp. 741–755, 2008.
[52] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep
convolutional neural networks,” in Advances in Neural Information Processing Systems 25.
Cambridge, MA: MIT Press, 2012.
[53] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–15, 2013.
[54] P. Hamel, S. Lemieux, Y. Bengio, and D. Eck, “Temporal pooling and multiscale
learning for automatic annotation and ranking of music audio,” in Proc. 12th Int. Conf.
Music Information Retrieval, 2011, pp. 729–734.
[55] P. Mirowski, Y. LeCun, D. Madhavan, and R. Kuzniecky, “Comparing SVM and
convolutional networks for epileptic seizure prediction from intracranial EEG,” in Proc.
IEEE Workshop Machine Learning Signal Processing, 2008, pp. 244–249.
[56] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proc. Int. Conf. Machine Learning,
2008, pp. 1096–1103.
[57] D. Rumelhart, Backpropagation: Theory, Architectures, and Applications. Hillsdale, NJ:
Lawrence Erlbaum, 1995.
[58] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and
K. Weinberger, “Learning to rank with (a lot of) word features,” Inform. Retrieval, vol. 13,
no. 3, pp. 291–314, 2010.
[59] D. Grangier and S. Bengio, “Inferring document similarity from hyperlinks,” in Proc.
ACM Int. Conf. Information Knowledge Management, 2005, pp. 359–360.
[60] G. E. Hinton and R. S. Zemel, “Autoencoders, minimum description length, and Helmholtz free energy,” in Proc. Neural Information Processing System NIPS’1993, 1994, pp. 3–10.
[61] G. Alain, Y. Bengio, and S. Rifai, “Regularized auto-encoders estimate local statistics,” Dept. IRO, Université de Montréal, Montreal, QC, Canada, Tech. Rep. Arxiv
Report 1211.4246, 2012.
[62] Y. Bengio, A. Courville, and P. Vincent, “Unsupervised feature learning and deep
learning: A review and new perspectives,” Université de Montréal, Tech. Rep. Arxiv
Report 1206.5538, 2012.
[63] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional autoencoders for hierarchical feature extraction,” in Proc. Int. Conf. Artificial Neural Networks and Machine Learning, 2011, pp. 52–59.
[64] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. Int. Conf. Machine Learning,
2008, pp. 160–167.
[65] J. Goldberger, S. Challapalli, R. Tung, M. Parker, and A. Kadish, “Relationship of
heart rate variability to parasympathetic effect,” Circulation, vol. 103, no. 15, p. 1977, 2001.
[66] N. Ravaja, T. Saari, M. Salminen, J. Laarni, and K. Kallinen, “Phasic emotional
reactions to video game events: A psychophysiological investigation,” Media Psychol., vol.
8, no. 4, pp. 343–367, 2006.
[67] H. P. Martínez, M. Garbarino, and G. N. Yannakakis, “Generic physiological features
as predictors of player experience,” Affective Comput. Intell. Interact., pp. 267–276, 2011.
[68] W. Zhao, R. Chellappa, and A. Krishnaswamy, “Discriminant analysis of principal
components for face recognition,” in Proc. 3rd IEEE Int. Conf. Automatic Face Gesture
Recognition, 1998, pp. 336–341.
[69] H. P. Martínez, K. Hullett, and G. N. Yannakakis, “Extending neuro-evolution preference learning through player modeling,” in Proc. IEEE Conf. Computational Intelligence
and Games, Copenhagen, Denmark, Aug. 2010, pp. 313–320.
Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan
University of Southern California, USA
Abstract—This paper presents two models that use interval type-2 fuzzy sets (IT2 FSs) for representing the meaning of words that refer to emotions. In the first model, the meaning of an emotion word is represented by IT2 FSs on valence, activation, and dominance scales. In the second model, the meaning of an emotion word is represented by answers to an open-ended set of questions from the game of Emotion Twenty Questions (EMO20Q). The notion of meaning in the two proposed models is made explicit using the Fregean framework of extensional and intensional components of meaning. Inter- and intra-subject uncertainty is captured by using IT2 FSs learned from interval approach surveys. Similarity and subsethood operators are used for comparing the meaning of pairs of words. For the first model, we apply similarity and subsethood operators to the task of translating one emotional vocabulary, represented as a computing with words (CWW) codebook, to another. This act of translation is shown to be an example of CWW that is extended to use the three scales of valence, activation, and dominance to represent a single variable. We experimentally evaluate the use of the first model for translations and mappings between vocabularies. Accuracy is high when using a small emotion vocabulary as an output, but performance decreases when the output vocabulary is larger. The second model was devised to deal with larger emotion vocabularies, but presents interesting technical challenges in that the set of scales underlying two different emotion words may not be the same. We evaluate the second model by comparing it with results from a single-slider survey. We discuss the theoretical insights that the two models allow and the advantages and disadvantages of each.
Digital Object Identifier 10.1109/MCI.2013.2247824
Date of publication: 11 April 2013
34 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2013
1556-603X/13/$31.00©2013IEEE
© CORBIS
I. Introduction
Words and natural language play a central role in how we describe and understand emotions. One can learn about emotions first-hand by observing physiological or behavioral data, but to convey emotional information to others who are not first-hand observers, one must use natural language descriptions. The field of affective computing deals with creating computer systems that can recognize and understand human emotions. To realize the goals of affective computing, it is necessary not only to recognize and model emotional behavior, but also to understand the language that is used to describe such behavior. For example, a computer system that recognizes a user’s emotion from speech should not only recognize the emotion from expressive speech acoustics, but also understand when the user says “I am beginning to feel X,” where “X” is a variable representing some emotion word or description. The ability to understand descriptions of emotions is important not only for human-computer interaction, but also in deliberative decision-making activities where behavioral analytics are derived from natural language (for example, in mental health assessments). Such analytics often rely on abstract scales that are defined in terms of natural language.
This paper looks at the problem of creating a computational model for the conceptual meaning of words used to name and describe emotions. To do this, we represent the meaning of emotion words as interval type-2 fuzzy sets (IT2 FSs) that constrain an abstract emotion space. We present two models that represent different views of what this emotion space might be like. The first model consists of the Cartesian product of the abstract scales of valence, activation, and dominance. These scales have been postulated to represent the conceptual meaning of emotion words [1]. The second model is based on scales derived from answers to yes/no questions, where each scale can be seen as the truth value of a proposition. In each model, the meaning of an emotion word is represented as a fuzzy set in an emotion space, but the two models represent different theoretical organizations of emotion concepts. In the first, a spatial metaphor is used to organize emotion concepts on valence, activation, and dominance scales. In the second model, emotion concepts are represented as lists of propositions and associated truth values.
In both models, the algebraic properties of fuzzy sets can be used as a computational model for the meaning of an emotion word. We outline the properties of these models and describe the methodology that estimates the fuzzy set shape parameters from data collected in interval approach surveys [2], [3]. In an interval approach survey, subjects rate words on abstract scales, but instead of picking a single value on each scale (as in a Likert scale survey), they select interval ranges on these scales. In the two models we present, the survey results are aggregated into fuzzy sets for the words in an emotion vocabulary. The fuzzy set representation allows one to compute logical relations among these emotion words. By using the relations of similarity and subsethood as measures of mappings between items of two vocabularies, one can translate between these vocabularies. This allows us to use our first model for several applications that involve mapping between vocabularies of emotion words: converting emotion labels from one codebook to another, both when the codebooks are in the same language (for example, when using different emotion annotation schemes) and when they are in different languages, such as when translating emotion words from one language to another (here, Spanish and English). These applications show one way our proposed model may be used and provide experimental evidence by which we can evaluate the model. To evaluate the first model, we compare its performance on these translation tasks with human performance, which serves as a benchmark.
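The similarity and subsethood relations used for translation can be illustrated with a small sketch. It assumes that each word is an IT2 FS whose upper and lower membership functions (MFs) are trapezoids sampled on a common grid, and it uses the Jaccard similarity and a subsethood measure that are common choices for IT2 FSs in perceptual computing; it is not necessarily the exact set of operators used in this paper. The function names and the two “words” on a 0–10 valence scale are hypothetical, with made-up parameters rather than values estimated from the surveys.

```python
import numpy as np

def trapezoid(x, a, b, c, d, height=1.0):
    """Trapezoidal membership: support [a, d], plateau [b, c] at `height`."""
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= a).astype(float)
    fall = np.clip((d - x) / (d - c), 0.0, 1.0) if d > c else (x <= d).astype(float)
    return height * np.minimum(rise, fall)

def it2_fs(x, upper, lower, lower_height):
    """IT2 FS stored as (upper MF, lower MF) sampled on the grid x.

    The two MFs bound the footprint of uncertainty (FOU), so the lower MF
    must never exceed the upper MF.
    """
    umf = trapezoid(x, *upper)
    lmf = trapezoid(x, *lower, height=lower_height)
    if np.any(lmf > umf + 1e-12):
        raise ValueError("lower MF must lie under the upper MF")
    return umf, lmf

def jaccard_similarity(A, B):
    """Jaccard similarity of two IT2 FSs on the same grid: symmetric, in [0, 1]."""
    (ua, la), (ub, lb) = A, B
    num = np.minimum(ua, ub).sum() + np.minimum(la, lb).sum()
    den = np.maximum(ua, ub).sum() + np.maximum(la, lb).sum()
    return float(num / den)

def subsethood(A, B):
    """Degree to which A is contained in B: in [0, 1], not symmetric."""
    (ua, la), (ub, lb) = A, B
    num = np.minimum(ua, ub).sum() + np.minimum(la, lb).sum()
    return float(num / (ua.sum() + la.sum()))

# Hypothetical words on a 0-10 valence scale (illustrative numbers, not survey data).
x = np.linspace(0.0, 10.0, 1001)
happy = it2_fs(x, (6, 7.5, 9, 10), (7, 8, 8.5, 9.5), 0.6)
ecstatic = it2_fs(x, (8, 8.75, 9.25, 10), (8.5, 8.9, 9.1, 9.5), 0.6)

print("similarity(happy, ecstatic) =", round(jaccard_similarity(happy, ecstatic), 3))
print("subsethood(ecstatic, happy) =", round(subsethood(ecstatic, happy), 3))
print("subsethood(happy, ecstatic) =", round(subsethood(happy, ecstatic), 3))
```

Because subsethood normalizes only by the size of its first argument, the narrower hypothetical word “ecstatic” comes out as more nearly a subset of “happy” than the reverse. Similarity, by contrast, is symmetric. This asymmetry is what makes subsethood a useful complement to similarity when mapping between vocabularies of different granularity.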
Our results show that the performance of the first model decreases as the vocabulary size grows, which indicates that a three-scale representation of emotions is suitable only for small vocabularies. To address this limitation, our second model draws inspiration from the game of twenty questions, in which players can identify objects from a large set by asking questions. Because people’s beliefs about emotions can be subjective, many of the answers to questions about emotions are vague and can be represented as fuzzy sets. To evaluate this model, we test the estimated IT2 FSs on data from different subjects who took a single-value survey, computing the membership of each of their responses in the estimated IT2 FSs.
Other research has presented related methodologies: using fuzzy logic for affective computing, developing emotion lexical resources, and representing emotions using valence, activation, and dominance dimensions. We begin by describing some of these works and the novelties that our paper introduces.
There are many examples where fuzzy logic has been applied to the task of recognizing and representing observed emotional behavior. The work in [4] gives an example where fuzzy logic is applied to multimodal emotion recognition. Other examples of fuzzy logic in emotion recognition are [5]–[7], which use fuzzy logic rules to map acoustic features to a dimensional representation in valence, activation, and dominance. The work in [8] uses an IT2 FS model for emotion recognition from facial expressions. The model of [9] uses fuzzy logic for emotional behavior generation.
Another related trend of research is the development of lexical resources. Our work can be seen as a lexical resource framework like the Dictionary of Affective Language (DAL) [10]. In the DAL, 8745 common English words were evaluated for valence and activation (as well as a third dimension, imagery). The DAL’s data collection methodology was similar to our survey in presenting subjects with words as stimuli, but in the DAL the value of each word’s dimensions is the mean across all subjects, so there is no estimate of the intra-subject variation. Also, compared with the DAL, we focus on words that are names of emotions, rather than words that might have emotional connotation. As such, our approach is more geared toward analyzing the meaning of short utterances explicitly referring to emotions, which we call natural language descriptions of emotion [11], while the DAL would be more appropriate for characterizing emotional tone at the document level. Another related research trend outside the domain of affective computing is the study of linguistic description of signals [12], [13], which aims to associate words with the signals they describe.
One distinguishing trait of this research is that we use the dimensional approach and fuzzy logic to model emotion concepts used in natural language descriptions of emotions [11], rather than to characterize data from emotional human behavior [4]–[7], [14]. Focusing on the conceptual meaning of emotion words allows us to consider cases where emotion is communicated through linguistic meaning, as opposed to paralinguistics or body language. The dimensional approach has been used to describe both emotional data and emotion concepts, but more often than not this distinction is not made clear. By describing our model of the meaning of emotion words in terminology established by the philosophy of language, we hope to clarify this issue. Furthermore, by rigorously defining the idea of an emotional variable and operations on such variables in terms of fuzzy logic, we can establish general relations such as similarity and subsethood that can be applied even if the underlying representation of valence, activation, and dominance is changed. Another contrast between this work and other research using fuzzy logic to represent emotional dimensions is that we use IT2 FSs [15] and the interval approach [3]. This allows our model to account for both inter- and intra-subject variability. Compared with the earlier developments of [16]–[18], this paper offers a more detailed description of the theoretical framework and a more detailed analysis of experimental results, incorporating subsethood and applying newer developments of the interval approach [19] (Section III-D). This paper also extends these results by proposing a second model to deal with larger emotion vocabularies (Section IV-C).
By restricting our attention to the conceptual level, we focus on input/output relations whose objects are words, rather than observations of stimuli and behavior. As such, this work can be seen as an instance of Computing with Words (CWW) [20]–[22]. CWW is a paradigm that considers words as the input and output objects of computation. Perceptual computing [23], [24] is an implementation of the CWW paradigm that we draw upon in this work.
The rest of the paper is organized as follows. In Section II, we describe what we mean by the “meaning” of emotion words. This is an important topic in its own right, but we give an introduction that we deem sufficient for the purposes of this article. In Section III, we describe the fuzzy logic framework and the proposed computational models for emotion words. In Section IV, we describe the experimental implementation and testing of the models. The results are presented in Section V. We discuss the advantages and disadvantages of these models in Section VI and conclude in Section VII.
II. The Meaning of Meaning
What does it mean to say that our model represents the meaning of emotion words? We believe this is an important question
and therefore we will briefly discuss meaning in general in Section II-A and then explain how it relates to the meaning of
emotion words in Section II-B.
A. Meaning in General
In an influential paper around the end of the 19th century, the
philosopher of language Gottlob Frege described two components of meaning: extension and intension [25]. The extensional
component of meaning is a mapping from words to things in
the world, whereas the intensional meaning is a mapping from
words to concepts. The classic example is given by the terms “morning star,” “evening star,” and “Venus.”
The extensional meaning of these three terms is the same,
namely the second planet in the solar system. However, the
intensional meaning of these three terms is different, which
explains why the three terms cannot be freely substituted in an
arbitrary sentence without changing the meaning of the sentence. In this paper, we focus on the meaning of individual
words, but we touch upon the topic of the meaning of phrases
in the second model.
Although the notions of extension and intension are most frequently associated with the field of philosophy of language, the idea can also be described in mathematical terms [26]. One can think of the extension of a function as a set of ordered pairs, where the first item of each pair is an input to the function and the second item is the corresponding output. The intensions of a function are described by its symbolic or algorithmic representations. Therefore, we can have “$f(x) = x^2$” or “$f(x) = x \times x$” as intensions of the extensional set of pairs $\{(1, 1), (2, 4), (3, 9), \ldots\}$. Extension and intension have been formally described in the study of formal concept analysis [27].
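The function example can be made concrete in a few lines. In this sketch, two Python functions play the role of two different intensions (symbolic definitions), and the extension is approximated as the set of (input, output) pairs on a finite domain, since a program cannot enumerate the full infinite set. The function names and the choice of domain are illustrative assumptions.

```python
def square_by_power(x: int) -> int:
    # Intension 1: f(x) = x^2
    return x ** 2

def square_by_product(x: int) -> int:
    # Intension 2: f(x) = x * x
    return x * x

def extension(f, domain):
    """Extension of f restricted to a finite domain: the set of (input, output) pairs."""
    return {(x, f(x)) for x in domain}

domain = range(1, 4)
print(sorted(extension(square_by_power, domain)))
# Same extension on this domain, although the definitions (intensions) differ.
print(extension(square_by_power, domain) == extension(square_by_product, domain))
```

Comparing the two functions by their extensions reports them as equal, even though they are distinct definitions. This mirrors the “morning star”/“evening star” case: the same referent, reached through different concepts.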
We believe that by defining meaning in this way, we can
describe our model more precisely. Without explicitly describing “meaning,” whether in terms of extension and intension or
otherwise, this important concept tends to get blurred.
Although this topic is complex, the intuition behind it is rather
simple: similar, intuitive distinctions along the lines of intension
and extension are common. Extension-related terms include:
referent, percept, object, empirical data, Aristotelian world view,
or stimulus meaning. Intension-related terms include: signified,
concept, subject, knowledge structure, schema, Platonic world
view, or linguistic meaning. The process of understanding a
word is a mapping, or interpretation, from the word itself to the
word’s meaning, whether it be intensional or extensional. We
argue that, when understanding natural language in the absence
of first-hand, perceptual evidence, people refer to intensional
meaning rather than extensional meaning. It is intensional
meaning that we focus on in this paper.
B. The Meaning of Emotion Words
According to the definition of meaning described above, the
extensional meaning of an emotion word is the set of human
behaviors and states of the world that the word refers to. The
intensional meaning of an emotion word is the concept that
people have when using it to communicate. Although most
other examples of emotion research do not make an explicit
distinction between intensional and extensional meaning, it
seems that many tend towards extensional meaning, especially
those that deal with the analysis of emotional data that has
been annotated with emotional labels. In this view, the extensional meaning of an emotion word used as an annotation label
refers to the set of all data to which it has been applied. The
focus on intensional meaning in this work therefore can be
seen as one of its distinguishing features, though it could be
said that machine learning that generalizes from training data is
in fact a way to infer intensional meaning.
The question then arises of what form this intensional meaning takes and, in particular, how we can simulate this subjective form of meaning for emotion words in a
computer. The two computational models we describe mirror
two different theoretical views of intensional meaning. One
view seeks to represent the intensional meaning of emotion
words as points or regions of an abstract, low-dimensional
semantic space of valence, activation, and dominance. The
other view seeks to represent the intensional meaning of
emotion words in relation to other propositions. This latter
perspective is exemplified in the Emotion Twenty Questions
(EMO20Q) game. EMO20Q is played like the normal
twenty questions guessing game except that the objects to be
guessed are emotions. One player, the answerer, picks an
emotion word and the other player, the questioner, tries to
guess the emotion word by asking twenty or fewer yes-no
questions. Each question can be seen as a proposition about
the emotion word, which prompts an answer that ranges on a
scale from assent to dissent.
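The question-asking view of EMO20Q can be sketched as a questioner who prunes candidate emotion words after each answer. This sketch rests on several simplifying assumptions. Each word stores a single crisp truth value in [0, 1] per question (1 = assent, 0 = dissent) rather than an IT2 FS. The knowledge base and its numbers are hypothetical. The `tolerance` pruning rule is also an illustrative choice, not the procedure used in the paper. The three questions are taken from the examples discussed later in this section.

```python
from typing import Dict

# Hypothetical knowledge base: emotion word -> {question: truth value in [0, 1]}.
# All numbers are illustrative, not collected data.
KB: Dict[str, Dict[str, float]] = {
    "happiness": {"is it positive?": 1.0, "is it a strong emotion?": 0.6,
                  "is it related to another person?": 0.3},
    "pride":     {"is it positive?": 0.9, "is it a strong emotion?": 0.6,
                  "is it related to another person?": 0.5},
    "anger":     {"is it positive?": 0.0, "is it a strong emotion?": 0.9,
                  "is it related to another person?": 0.8},
    "sadness":   {"is it positive?": 0.0, "is it a strong emotion?": 0.5,
                  "is it related to another person?": 0.3},
}

def update(candidates: Dict[str, Dict[str, float]], question: str,
           answer: float, tolerance: float = 0.3) -> Dict[str, Dict[str, float]]:
    """Keep words whose stored truth value is within `tolerance` of the answer.

    Words with no stored value for the question are kept (unknown, not refuted).
    """
    kept = {}
    for word, props in candidates.items():
        value = props.get(question)
        if value is None or abs(value - answer) <= tolerance:
            kept[word] = props
    return kept

candidates = dict(KB)
for question, answer in [("is it positive?", 0.0),
                         ("is it related to another person?", 1.0)]:
    candidates = update(candidates, question, answer)
    print(f"{question!r} -> {answer}: {sorted(candidates)}")
```

With these made-up values, the first answer leaves “anger” and “sadness”, and the second leaves only “anger”. In a fuzzy version, each stored value would be replaced by a set of plausible truth values, and the hard cutoff would be replaced by a graded degree of agreement.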
Scale-based models of emotion have an interesting history
that goes back to Spearman’s attempts to measure general
intelligence using factor analysis. At first Spearman hypothesized that there was one underlying scale that could represent
a person’s intelligence, but later it came to be realized that
intelligence was a complex concept that required multiple
scales. Factor analysis was the method used to isolate these
scales, and in turn factor analysis was used in the pioneering
work [28] that first identified valence, activation, and dominance as factors in the connotative meanings of words. In
[28], psychologists, aided by one of the early computers, conducted semantic differential surveys that tried to measure the
meaning of words on Likert scales whose endpoints were
defined by thesaurus antonyms. Valence, activation, and dominance were identified as interpretations of the factors that
were encountered. Some of the early applications of this
emotional model to language are [29], [1], [30], [10]. The pictorial representation of these dimensions, which we use in the
interval surveys, was developed by [31]. It should be noted
that the valence, dominance, and activation representation is
merely a model for emotional meaning and these scales most
likely do not exhaustively describe all emotional concepts. In
[32], it is argued that four dimensions are required: “unpredictability” in addition to the three scales we use. The
approach we advocate here is based on an algebraic model
that is generalizable to any scales. Our choice of the three
scales for this model was motivated by their wide usage and by the need
to balance theoretical and practical concerns.
FIGURE 1 Translation as a perceptual computer. [Block diagram titled “Perceptual Computer for Translation”: word in Language 1 → encoder → fuzzy sets → computing with words (CWW) engine → fuzzy sets → decoder → word in Language 2.]

The second model we propose takes a different perspective. Rather than using theoretically motivated scales for various characteristics of emotions, the second model aims to represent the intensional meaning of emotion words in terms of natural language propositions that can be assented to or dissented from. This view could also be construed as an abstract scale of truth with respect to various propositions (an idea considered in the study of veristic fuzzy sets [33]–[35]), but we see it as qualitatively different from the first model, for two reasons. First, the number of propositions about emotions will generally be larger than the number of emotion words, whereas in the scale-based representation the number of scalar dimensions is smaller than the size of the emotion vocabulary. Second, propositions can be verbally (or orthographically) expressed as linguistic stimuli, whereas abstract scales carry more cognitive implications and are language independent. Some questions from EMO20Q correspond closely to the scales in the first model: “is it positive?” is similar to valence, “is it a strong emotion?” is similar to activation, and “is it related to another person?” hints at dominance. However, model 2 also contains many very specific questions, such as “would you feel this emotion on your birthday?”

The models we propose can be seen as an algebraic representation in which theoretical entities such as emotion concepts are considered virtual objects [36] with abstract scales. In this view, a collection of scales that describe an object can be seen as a suite of congruence relations. Recall that a congruence relation $\equiv \pmod{P}$ is an equivalence relation that holds given some property or function $P$. A suite of congruence relations is a bundle of equivalence relations $\{\equiv_i : i \in I\}$, again given some property $P$. In both of the models we present, the properties $P$ are fuzzy sets in an emotion space. In the first model, $I$ is a set that can contain valence, activation, and/or dominance. In the second model, $I$ is a set of propositions derived from the EMO20Q game [37]–[40]. For example, given the statement “$\varepsilon$ makes you smile,” we can say that happy and amused are congruent with respect to this statement about smiling behavior. In terms of the scales, the equivalence relations on each scale divide the scale space into equivalence classes. In the next section, we describe this space of emotions in more detail.

III. IT2 FS Model for the Meaning of Emotion Words
A. Emotion Space and Emotional Variables
Let $E$ be an emotion space, an abstract space
of possible emotions (this will be
explained later in terms of valence, activation, and dominance, but for the time
being we will remain agnostic about the
underlying representation). An emotion variable $\varepsilon$ represents an
arbitrary region in this emotion space, i.e., $\varepsilon \subset E$, with the
subset symbol $\subset$ used instead of set membership ($\in$) because
we wish to represent regions in this emotion space in addition to single points.
The intensional meaning of an emotion word can be represented by a region of the emotion space that is associated with
that word. An emotion codebook $C = (W_C, \mathrm{eval}_C)$ is a set of words
$W_C$ and a function $\mathrm{eval}_C$ that maps words of $W_C$ to their corresponding region in the emotion space, $\mathrm{eval}_C : W_C \to E$. Thus, an
emotion codebook can be seen as a dictionary for looking up the
meaning of words in a vocabulary. Words in an emotion codebook can also be seen as constant emotion variables. The region
of the emotion space that $\mathrm{eval}_C$ maps words to is determined by
interval surveys, as described in Section III-D.
We consider two basic relations on emotion variables: similarity and subsethood. Similarity, $sm : E \times E$, is a binary equivalence relation between two emotion variables (we will see that
the fuzzy logic interpretation of similarity will actually be a
function, $sm : E \times E \to [0, 1]$, which measures the amount of
similarity between the variables rather than being true or false).
Subsethood, $ss : E \times E$, is a binary relation between two emotion variables that is true if the first variable of the relation is
contained in the second. Like similarity, the fuzzy logic interpretation of subsethood is a value between zero and one. Further details are provided in Section III-C, where we define
the fuzzy logic interpretation of these relations.
Finally, a translation is a mapping from the words of
one vocabulary to another, as determined by the corresponding codebooks:
$$\mathrm{translate} : W_1 \times C_1 \times C_2 \to W_2, \tag{1}$$
which is displayed schematically in Figure 1. This can be
decomposed by thinking of $C_1 \times C_2$ as a similarity or subsethood matrix, which is denoted as the CWW engine in the
figure. Translation can be seen as selecting the word from the
output language $w_{\text{output}} \in W_2$ such that the similarity or subsethood is maximized for a given $w_{\text{input}} \in W_1$. In the case of
similarity, the translation output is
$$w_{\text{output}} = \arg\max_{w_2 \in W_2} sm\big(\mathrm{eval}_{C_2}(w_2), \mathrm{eval}_{C_1}(w_{\text{input}})\big), \tag{2}$$
where the argmax functions as the decoder in Figure 1. The
formulation of similarity and subsethood in terms of IT2 FSs
will be described in Section III-C and we will empirically
evaluate the use of similarity and subsethood for use in translation in Section V.
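As a rough illustration of (2), the Python sketch below treats translation as an argmax over the output codebook. It rests on simplifying assumptions that are not part of the model: each codebook maps a word to a single crisp interval on one 0–10 scale rather than to IT2 FSs over three scales, and similarity is the crisp Jaccard index of interval lengths. The helpers `interval_jaccard` and `translate` and both codebooks are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical simplification: a region is one crisp interval on a single 0-10 scale.
Interval = Tuple[float, float]


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Crisp Jaccard index |A ∩ B| / |A ∪ B| of two closed intervals, measured by length."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def translate(w_input: str,
              codebook_in: Dict[str, Interval],
              codebook_out: Dict[str, Interval],
              sim: Callable[[Interval, Interval], float] = interval_jaccard) -> str:
    """Pick the output word whose region is most similar to the input word's region, as in (2)."""
    region_in = codebook_in[w_input]  # plays the role of eval_C1(w_input)
    return max(codebook_out, key=lambda w2: sim(codebook_out[w2], region_in))


# Hypothetical valence-only codebooks, for illustration only.
c1 = {"cheerful": (6.0, 9.0), "blah": (2.0, 5.0)}
c2 = {"happy": (6.0, 10.0), "neutral": (4.0, 6.0), "sad": (0.0, 4.0)}
print(translate("cheerful", c1, c2))  # happy
print(translate("blah", c1, c2))      # sad
```

Swapping `sim` for a subsethood function gives the subsethood-based variant of the same decoder.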
B. Fuzzy Logic and Emotion Concepts
In Section III-A, the definition of an emotion space E followed a traditional set theoretic formulation. Traditional, nonfuzzy sets have crisp boundaries, which means that we can
precisely determine whether a region in the emotion space is
a member of any given set representing an emotion word.
However, this seems to contradict the intuition and evidence
that emotion concepts are somewhat vague and not precisely
defined sets [41]. There are several sources of uncertainty that
theoretically preclude precise set boundaries in either of the
two models we present. There is modeling uncertainty because
a computational model is necessarily an approximation of
human thought processes. There is measurement uncertainty
because the precision on these scales may be limited by perceptual processes of mapping sensory data to concepts and in
distinguishing between concepts. Finally, there is uncertainty
due to inter- and intra-subject variation. Postulating a blurred
boundary between emotion concepts leads us to use fuzzy
logic, in particular IT2 FSs.
If we deem that emotion concepts can be represented as
fuzzy sets in either of these two models, then how do we
determine the shapes of sets in this space? As we describe later
in Section III-D, we use the interval approach survey methodology. One could use a Likert-type survey in which the
scales represent valence, activation, and dominance, and
query users with emotion words as stimuli; however, subjects
may be unsure about picking a specific point on the scale due
to vagueness in the meaning of emotion words, especially
broadly defined emotion words like those typically used as primary emotions. To deal with this intra-subject uncertainty, we
turn to interval surveys and IT2 FSs.
Just as type-1 fuzzy sets extend classical sets by postulating
set membership grade to be a point in [0,1], type-2 fuzzy sets
further extend this generalization by defining a membership
function’s membership grade at a given point in the domain
to be a distribution in [0,1] rather than a single point, which
allows for uncertainty in the membership grade [42]. The
rationale for type-2 fuzzy logic is that even if a membership
function takes a value between 0 and 1, there is still no
uncertainty being represented because the membership
value is a fixed point. What is represented by type-1 fuzzy sets
is partial membership, not uncertainty. Whenever there is
uncertainty, type-2 fuzzy logic is motivated on theoretical
grounds [21]. The region of uncertainty in the membership
grade with respect to the domain is known as the footprint
of uncertainty.
While general type-2 fuzzy logic systems account for
uncertainty, they are more conceptually and computationally
complex, and methods to estimate them directly from human
input are still ongoing areas of research [43]. IT2 FSs use intervals to capture uncertainty of the membership grade [15].
FIGURE 2 Example of a trapezoidal interval type-2 membership function (IT2 MF). A normalized trapezoidal IT2 MF can be specified with nine parameters, (a, b, c, d, a′, b′, c′, d′, e′). The height of the upper membership function (e) can be omitted in normalized IT2 FSs because it is always equal to 1. [Plot: upper trapezoid with base points a, d and top points b, c at height 1; lower trapezoid with base points a′, d′ and top points b′, c′ at height e′.]
Instead of an arbitrary distribution in [0, 1], as is the case for
general type-2 fuzzy sets, IT2 FSs use an interval $[l, u] \subseteq [0, 1]$
to represent an area of uniform uncertainty in the membership
function’s value, where $l$ and $u$, with $0 \le l \le u \le 1$, are the lower and upper
bounds of the uncertainty interval, respectively. IT2 FSs can be
regarded as a first-order representation of uncertainty because
they are the simplest type of fuzzy set that will account for
uncertainty in the membership function. Also, as will be discussed in Section III-D, there is a method for constructing IT2
FSs from human input, which makes the use of IT2 FSs practical for human-computer interaction.
IT2 FSs have been widely used because they approximate
the capability to represent the uncertainty of general type-2
fuzzy set models while still using many of the same techniques used for type-1 fuzzy sets. IT2 FSs can be represented
as two type-1 membership functions: an upper membership
function, which defines the upper bound of membership, and
a lower membership function, which represents the lower
bound on membership. When these coincide, the IT2 FS
reduces to a type-1 fuzzy set [44], [45]. If the difference
between the upper and lower membership function is wide,
this means that we have much uncertainty about the membership grade.
An example of an interval type-2 membership function
can be seen in Figure 2. The area between the upper and lower
membership functions is the footprint of uncertainty. In this
paper, as an engineering decision we have restricted ourselves
to trapezoidal membership functions, which can be specified
in a concise way using a 5-tuple (a, b, c, d, e). The first number
of the tuple, a, represents the x-value of the left side point of
the base of the trapezoid, b represents the x-value of the left
side point of the top of the trapezoid, c represents the x-value
of the right side point of the top of the trapezoid, d represents
the x-value of the right side point of the base of the trapezoid, and e represents the height of the trapezoid (i.e., the
y-value of the top of the trapezoid). Since IT2 FSs consist of
an upper and lower membership function, they can be represented as a 10-tuple. However, in the case of normalized
interval type-2 membership functions, those whose upper
membership function reaches 1, we can leave out the height
of the upper membership function and specify the fuzzy set
as a 9-tuple consisting of a 4-tuple for the upper membership
function, with the fifth value assumed to be equal to 1, and a
5-tuple for the lower membership function (we must include
the fifth value, e′, as described above, because in general the
height of the lower membership function can be anywhere
between 0 and 1).
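The trapezoid parameterization can be sketched in Python as follows, assuming the lower MF lies inside the upper MF (the code does not check this). The names `trapezoid`, `TrapezoidIT2`, and `from_9_tuple` are hypothetical; evaluating the pair of MFs at a point returns the vertical slice of the footprint of uncertainty as a (lower, upper) interval.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float, e: float = 1.0) -> float:
    """Trapezoidal MF from the 5-tuple (a, b, c, d, e): base [a, d], top [b, c], height e."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return e
    if x < b:                          # rising edge; here a <= x < b, so b > a
        return e * (x - a) / (b - a)
    return e * (d - x) / (d - c)       # falling edge; here c < x <= d, so d > c


@dataclass(frozen=True)
class TrapezoidIT2:
    """Normalized trapezoidal IT2 MF: upper (a, b, c, d) with height 1, lower (a', b', c', d', e')."""
    upper: Tuple[float, float, float, float]
    lower: Tuple[float, float, float, float, float]

    @classmethod
    def from_9_tuple(cls, p: Sequence[float]) -> "TrapezoidIT2":
        a, b, c, d, a2, b2, c2, d2, e2 = p
        return cls((a, b, c, d), (a2, b2, c2, d2, e2))

    def membership_interval(self, x: float) -> Tuple[float, float]:
        """Return (lower, upper) membership grades at x, i.e., a vertical slice of the FOU."""
        return trapezoid(x, *self.lower), trapezoid(x, *self.upper)


mf = TrapezoidIT2.from_9_tuple((1, 3, 5, 7, 2, 3.5, 4.5, 6, 0.6))
print(mf.membership_interval(4))    # (0.6, 1.0)
print(mf.membership_interval(6.5))  # (0.0, 0.25)
```

Shoulder MFs, such as those at the edges of a 0–10 scale, fit the same tuple with a = b or c = d.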
C. Similarity and Subsethood
Similarity and subsethood form important parts of our model
of emotions.
The notion of similarity allows us to indicate that some
pairs of emotion concepts are more or less similar. For example,
we would say that angry is more similar to frustrated than it is
to happy. When we make this judgment, we do not explicitly consider specific experiential examples of angry, frustrated,
and happy data. Rather, we argue that one can make similarity
judgments based on mental representations of emotions. Two
people could have disjoint sets of formative emotional stimuli
but still largely agree on the emotion concepts that form the
intensional meaning of emotion words. In the fuzzy logic
interpretation, similarity ranges from 0 to 1, where 1 indicates equality
of two membership functions and 0 indicates that the membership functions have no overlap.
The notion of subsethood allows us to capture that
some general emotions might encompass other emotions.
For example, “amused” might be a subset of “happy.” The
notion of subsethood is defined for traditional sets as being
a Boolean value, but for fuzzy sets it takes a value between
0 and 1.
Similarity and subsethood are closely related. For clarity, we
present the definitions of similarity and subsethood in terms of
crisp sets, then type-1 and type-2 fuzzy sets. The definitions of
the fuzzy set similarity and subsethood follow naturally from
crisp sets.
The general form of similarity is based on the Jaccard
Index, which states that the similarity of two sets is the cardinality of the intersection divided by the cardinality of the
union, i.e.,
$$sm_J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \tag{3}$$
For fuzzy sets, the set operations of intersection and union
($\cap$ and $\cup$) are realized by the min and max functions, and the
cardinality operator ($|\cdot|$) is realized by summing along the
domain of the variable. Thus, for type-1 fuzzy sets,
$$sm_J(A, B) = \frac{\sum_{i=1}^{N} \min\big(\mu_A(x_i), \mu_B(x_i)\big)}{\sum_{i=1}^{N} \max\big(\mu_A(x_i), \mu_B(x_i)\big)}. \tag{4}$$
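A direct transcription of (4) for membership functions sampled at shared points $x_1, \dots, x_N$ might look like the following minimal sketch; the function name and the convention of returning 0 for two empty sets are assumptions.

```python
from typing import Sequence


def jaccard_t1(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Eq. (4): sum of pointwise minima over sum of pointwise maxima.

    Both membership vectors must be sampled at the same points x_1, ..., x_N.
    """
    if len(mu_a) != len(mu_b):
        raise ValueError("membership vectors must share the same sample points")
    num = sum(min(a, b) for a, b in zip(mu_a, mu_b))  # |A ∩ B| via min
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))  # |A ∪ B| via max
    return num / den if den > 0 else 0.0  # assumed convention for two empty sets


mu_a = [0.0, 0.5, 1.0, 0.5, 0.0]
mu_b = [0.0, 0.0, 0.5, 1.0, 0.5]
print(jaccard_t1(mu_a, mu_b))  # 0.3333333333333333
```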
For IT2 FSs, the right-hand side of this equation becomes
$$\frac{\sum_{i=1}^{N} \min\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \min\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}{\sum_{i=1}^{N} \max\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \max\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}, \tag{5}$$
where $\overline{\mu}(x)$ and $\underline{\mu}(x)$ are the upper and lower membership
functions, respectively. The formulas for similarity are symmetric
($sm_J(A, B) = sm_J(B, A)$) and reflexive ($sm_J(A, A) = 1$) [23].
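Equation (5) can be sketched the same way from sampled upper and lower MFs. This sketch uses the pointwise maximum in both denominator terms, mirroring the union in (4); `jaccard_it2` is a hypothetical name.

```python
from typing import Sequence


def jaccard_it2(upper_a: Sequence[float], lower_a: Sequence[float],
                upper_b: Sequence[float], lower_b: Sequence[float]) -> float:
    """Jaccard similarity of two IT2 FSs from their sampled upper and lower MFs, as in (5)."""
    def s_min(p, q):
        return sum(min(x, y) for x, y in zip(p, q))

    def s_max(p, q):
        return sum(max(x, y) for x, y in zip(p, q))

    num = s_min(upper_a, upper_b) + s_min(lower_a, lower_b)  # intersections
    den = s_max(upper_a, upper_b) + s_max(lower_a, lower_b)  # unions
    return num / den if den > 0 else 0.0


ua, la = [0.0, 1.0, 1.0, 0.0], [0.0, 0.5, 0.5, 0.0]
ub, lb = [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]
print(jaccard_it2(ua, la, ub, lb))  # 0.3333333333333333
print(jaccard_it2(ua, la, ua, la))  # 1.0
```

The second call illustrates reflexivity; swapping the argument pairs illustrates symmetry.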
We also examined a different, earlier similarity method
called the Vector Similarity Method (VSM) [46]. This method
was used in earlier experiments [16], so we tested it in addition
to the newer Jaccard-based method. The VSM uses the intuition
that the similarity of two fuzzy sets is based on two notions: similarity
of shape and similarity of proximity. Thus, the similarity of two
fuzzy sets can be seen as a two-element vector: $sm_V(A, B) = (sm_{\text{shape}}(A, B), sm_{\text{proximity}}(A, B))^T$. The proximity similarity is based on the Euclidean distance between the fuzzy
set centroids. The shape similarity is based on the
Jaccard similarity between the two fuzzy sets once their centroids have been aligned. To convert the vector similarity to a
single scalar, the product of $sm_{\text{shape}}$ and $sm_{\text{proximity}}$ is taken.
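The sketch below is a simplified, type-1 illustration of the VSM idea, not the method of [46], which is defined for IT2 FSs. It assumes a proximity similarity of the form $e^{-r d}$, where $d$ is the distance between centroids and $r$ is a hypothetical rate parameter, and it aligns centroids by translating the sampled MF of B with linear interpolation, ignoring any mass shifted past the edges of the domain. All function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def jaccard(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Type-1 Jaccard similarity on a shared sampled domain."""
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return sum(min(a, b) for a, b in zip(mu_a, mu_b)) / den if den > 0 else 0.0


def centroid(xs: Sequence[float], mu: Sequence[float]) -> float:
    """Centroid sum(x * mu(x)) / sum(mu(x)) of a sampled type-1 MF."""
    return sum(x * m for x, m in zip(xs, mu)) / sum(mu)


def value_at(xs: Sequence[float], mu: Sequence[float], t: float) -> float:
    """Linearly interpolate the sampled MF at t (xs increasing); 0 outside [xs[0], xs[-1]]."""
    if t < xs[0] or t > xs[-1]:
        return 0.0
    k = bisect.bisect_right(xs, t) - 1
    if k >= len(xs) - 1:
        return mu[-1]
    return mu[k] + (mu[k + 1] - mu[k]) * (t - xs[k]) / (xs[k + 1] - xs[k])


def vsm_t1(xs: Sequence[float], mu_a: Sequence[float], mu_b: Sequence[float],
           r: float = 1.0) -> float:
    """Simplified type-1 vector similarity: shape similarity times proximity similarity."""
    ca, cb = centroid(xs, mu_a), centroid(xs, mu_b)
    delta = ca - cb
    # Translate B by delta so that its centroid coincides with A's, then compare shapes.
    mu_b_aligned: List[float] = [value_at(xs, mu_b, x - delta) for x in xs]
    sm_shape = jaccard(mu_a, mu_b_aligned)
    # Assumed proximity form: exponential decay in the distance between centroids.
    sm_proximity = math.exp(-r * abs(ca - cb))
    return sm_shape * sm_proximity


xs = [float(x) for x in range(11)]
mu_a = [0.0, 0.5, 1.0, 0.5] + [0.0] * 7
mu_b = [0.0] * 5 + [0.5, 1.0, 0.5] + [0.0] * 3
print(vsm_t1(xs, mu_a, mu_b))  # identical shapes, centroids 4 apart: equals math.exp(-4)
```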
The subsethood measure is closely related to similarity and
is based on Kosko’s subsethood [47] for type-1 fuzzy sets. The
measure of subsethood of a set A in another set B is defined as:
$$ss_K(A, B) = \frac{|A \cap B|}{|A|}. \tag{6}$$
As with the similarity metric, when the set and cardinality operators are replaced by their fuzzy logic realizations, one obtains
$$ss_K(A, B) = \frac{\sum_{i=1}^{N} \min\big(\mu_A(x_i), \mu_B(x_i)\big)}{\sum_{i=1}^{N} \mu_A(x_i)} \tag{7}$$
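Equation (7) translates into a few lines of Python. The sketch below uses a hypothetical function name and toy MFs, and also shows that, for a narrow set nested inside a broad one, the two directions of subsethood differ.

```python
from typing import Sequence


def subsethood_t1(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Kosko subsethood, eq. (7): degree to which A is contained in B."""
    den = sum(mu_a)
    if den == 0:
        raise ValueError("subsethood of an empty set is undefined")
    return sum(min(a, b) for a, b in zip(mu_a, mu_b)) / den


# Hypothetical sampled MFs: a narrow "amused" set inside a broader "happy" set.
amused = [0.0, 0.5, 1.0, 0.5, 0.0]
happy = [0.5, 1.0, 1.0, 1.0, 0.5]
print(subsethood_t1(amused, happy))  # 1.0
print(subsethood_t1(happy, amused))  # 0.5
```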
for the case of type-1 fuzzy sets; for IT2 FSs, the
right-hand side of the equation becomes
$$\frac{\sum_{i=1}^{N} \min\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \min\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}{\sum_{i=1}^{N} \overline{\mu}_A(x_i) + \sum_{i=1}^{N} \underline{\mu}_A(x_i)}. \tag{8}$$
Unlike similarity, subsethood is asymmetric: in general,
$ss_K(A, B) \neq ss_K(B, A)$.
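For IT2 FSs, (8) can be sketched from sampled upper and lower MFs; only A's MFs appear in the denominator, which is what makes the measure asymmetric. The function name and the toy MFs below are hypothetical.

```python
from typing import Sequence


def subsethood_it2(upper_a: Sequence[float], lower_a: Sequence[float],
                   upper_b: Sequence[float], lower_b: Sequence[float]) -> float:
    """IT2 subsethood of A in B from sampled upper and lower MFs, as in (8)."""
    num = (sum(min(x, y) for x, y in zip(upper_a, upper_b))
           + sum(min(x, y) for x, y in zip(lower_a, lower_b)))
    den = sum(upper_a) + sum(lower_a)  # only A's cardinality appears, hence the asymmetry
    if den == 0:
        raise ValueError("subsethood of an empty set is undefined")
    return num / den


ua, la = [0.0, 0.5, 1.0, 0.5, 0.0], [0.0, 0.25, 0.5, 0.25, 0.0]
ub, lb = [0.5, 1.0, 1.0, 1.0, 0.5], [0.25, 0.5, 0.5, 0.5, 0.25]
print(subsethood_it2(ua, la, ub, lb))  # 1.0
print(subsethood_it2(ub, lb, ua, la))  # 0.5
```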
These equations give the similarity and subsethood measures for fuzzy variables of one dimension. To aggregate the
similarity of the three dimensions of valence, activation, and
dominance, we tried several methods: averaging the
similarity of the individual dimensions, $sm_{\text{avg}}(A, B) = \frac{1}{3}\sum_{i \in \{\text{Val.}, \text{Act.}, \text{Dom.}\}} sm_i(A_i, B_i)$; taking the product of the
similarity of the individual dimensions, $sm_{\text{prod}}(A, B) = \prod_{i \in \{\text{Val.}, \text{Act.}, \text{Dom.}\}} sm_i(A_i, B_i)$; and taking the linguistic weighted
average [48], $sm_{\text{lwa}}(A, B) = \sum_{i \in \{\text{Val.}, \text{Act.}, \text{Dom.}\}} sm_i(A_i, B_i)\, w_i \big/ \sum_{i \in \{\text{Val.}, \text{Act.}, \text{Dom.}\}} w_i$. The results of these different choices are
described in Section V.
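The three aggregation rules can be sketched as follows. The sketch takes per-dimension scores as plain numbers and, as a simplification, implements the weighted average with crisp numeric weights; the linguistic weighted average of [48] operates on fuzzy weights, so the `wavg` branch is only the crisp special case of the formula above. The `dims` parameter, which allows dominance to be left out, and all names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

DIMENSIONS = ("valence", "activation", "dominance")


def aggregate(sims: Dict[str, float], method: str = "avg",
              weights: Optional[Dict[str, float]] = None,
              dims: Sequence[str] = DIMENSIONS) -> float:
    """Combine per-dimension similarities (or subsethoods) into one score."""
    vals = [sims[d] for d in dims]
    if method == "avg":
        return sum(vals) / len(vals)
    if method == "prod":
        return math.prod(vals)  # Python 3.8+
    if method == "wavg":
        # Crisp weighted average; the LWA of [48] uses fuzzy (IT2) weights instead.
        if weights is None:
            raise ValueError("wavg requires weights")
        w = [weights[d] for d in dims]
        return sum(v * wi for v, wi in zip(vals, w)) / sum(w)
    raise ValueError(f"unknown method: {method}")


sims = {"valence": 0.8, "activation": 0.5, "dominance": 0.25}
print(aggregate(sims, "avg"))
print(aggregate(sims, "prod"))
print(aggregate(sims, "wavg", weights={"valence": 2.0, "activation": 1.0, "dominance": 1.0}))
print(aggregate(sims, "avg", dims=("valence", "activation")))  # dominance left out
```

Note that the product rule penalizes a low score on any single dimension much more heavily than the average does.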
D. Interval Surveys Using the Interval Approach
To estimate the interval type-2 fuzzy sets over the valence, activation, and dominance scales, we used the interval approach [2], [3]. This survey methodology uses a Likert-like scale, but the subjects select interval ranges instead of single numbers on the scale, which results in IT2 FSs. One of the novelties of our work that adds to [2], [3] is that we model a phenomenon whose underlying variable is composed of multiple scales: three separate scales (valence, activation, and dominance) in the case of our first model, and an open-ended number of scales in our second model.

The interval approach assumes that most people will be able to describe words on a scale, similar to a Likert scale. However, while the Likert scale approach allows the subject to choose only a single point on the scale, the interval approach allows the subject to select an interval that encloses the range on the scale that the word applies to. Thus, while a Likert scale can capture direction and intensity on a scale, the interval approach also captures uncertainty. The uncertainty that an individual user has about a word can be thought of as intra-user uncertainty. Users do not need to know the details of interval type-2 fuzzy logic; they can indicate their uncertainty as an interval, which the interval approach then aggregates into IT2 FSs.

After collecting a set of intervals from an interval approach survey, the interval approach estimates an IT2 FS that takes into account the collective uncertainty of a group of subjects. This type of uncertainty can be thought of as inter-user uncertainty. The interval approach consists of a series of steps to learn the fuzzy sets from the survey data, which can broadly be grouped into the data part and the fuzzy set part. The data part takes the survey data, preprocesses it, and computes statistics for it. The fuzzy set part creates type-1 fuzzy sets for each subject and then aggregates them with the union operation to form IT2 FSs. A new version of the interval approach, the enhanced interval approach, was proposed in [19]. This enhancement aims to produce tighter membership functions by placing new constraints on the overlap of subject-specific membership functions in the reasonable-interval processing stage. We tested this method as well as the original interval approach and found that the enhanced interval approach did in fact yield tighter membership functions, but that this did not necessarily improve the overall performance measures when compared with the original method (cf. Section VI).

IV. Methodology
This section describes the experimental methodologies that were used to create the two models for emotion codebooks. For the first model, we use an interval approach survey for emotion words and adapt the CWW paradigm to account for 3-dimensional fuzzy scales, specifically, by implementing similarity and subsethood measures for fuzzy sets that have three dimensions. In the case of the second model, the interval survey is separate from the elicitation of emotional information. The emotional information is collected from the EMO20Q game, and the fuzzy sets are then calculated from the answers to the questions in the game.

A. Emotion Vocabularies
In our experiments, we examined four different emotion vocabularies. The first vocabulary consisted of seven emotion category words: angry, disgusted, fearful, happy, neutral, sad, and surprised. These categories are commonly used for labeling emotional data. We refer to this vocabulary as Emotion Category Words. These emotions are posited to be basic in that they are reliably distinguishable from facial expressions [49].

TABLE 1 Similarity between words of the Blog Moods vocabulary and the Emotion Category Word vocabulary.

| Blog mood | ANGRY | DISGUSTED | FEARFUL | HAPPY | NEUTRAL | SAD | SURPRISED |
|---|---|---|---|---|---|---|---|
| AMUSED | 0.004 | 0.003 | 0.005 | 0.060 | 0.004 | 0.005 | 0.053 |
| TIRED | 0.006 | 0.003 | 0.034 | 0.001 | 0.038 | 0.196 | 0.001 |
| CHEERFUL | 0.003 | 0.003 | 0.003 | 0.109 | 0.001 | 0.002 | 0.088 |
| BORED | 0.015 | 0.012 | 0.075 | 0.004 | 0.064 | 0.335 | 0.004 |
| ACCOMPLISHED | 0.015 | 0.013 | 0.008 | 0.151 | 0.006 | 0.008 | 0.139 |
| SLEEPY | 0.007 | 0.005 | 0.018 | 0.009 | 0.172 | 0.128 | 0.010 |
| CONTENT | 0.005 | 0.004 | 0.007 | 0.044 | 0.015 | 0.012 | 0.040 |
| EXCITED | 0.015 | 0.017 | 0.006 | 0.255 | 0.002 | 0.002 | 0.213 |
| CONTEMPLATIVE | 0.006 | 0.004 | 0.012 | 0.006 | 0.161 | 0.075 | 0.007 |
| BLAH | 0.014 | 0.010 | 0.049 | 0.005 | 0.166 | 0.359 | 0.007 |
| AWAKE | 0.020 | 0.017 | 0.016 | 0.061 | 0.015 | 0.014 | 0.068 |
| CALM | 0.003 | 0.002 | 0.011 | 0.007 | 0.137 | 0.069 | 0.008 |
| BOUNCY | 0.009 | 0.012 | 0.002 | 0.361 | 0.000 | 0.001 | 0.311 |
| CHIPPER | 0.002 | 0.002 | 0.001 | 0.066 | 0.002 | 0.003 | 0.059 |
| ANNOYED | 0.393 | 0.380 | 0.080 | 0.041 | 0.002 | 0.023 | 0.076 |
| CONFUSED | 0.026 | 0.020 | 0.064 | 0.014 | 0.046 | 0.170 | 0.017 |
| BUSY | 0.068 | 0.079 | 0.049 | 0.111 | 0.013 | 0.012 | 0.116 |
| SICK | 0.008 | 0.004 | 0.032 | 0.001 | 0.023 | 0.204 | 0.001 |
| ANXIOUS | 0.207 | 0.181 | 0.091 | 0.028 | 0.003 | 0.025 | 0.038 |
| EXHAUSTED | 0.015 | 0.011 | 0.048 | 0.003 | 0.046 | 0.298 | 0.004 |
| DEPRESSED | 0.008 | 0.005 | 0.050 | 0.001 | 0.015 | 0.218 | 0.001 |
| CURIOUS | 0.038 | 0.042 | 0.014 | 0.203 | 0.011 | 0.006 | 0.176 |
| DRAINED | 0.009 | 0.007 | 0.039 | 0.002 | 0.061 | 0.280 | 0.003 |
| AGGRAVATED | 0.578 | 0.618 | 0.114 | 0.047 | 0.002 | 0.020 | 0.087 |
| ECSTATIC | 0.000 | 0.000 | 0.000 | 0.108 | 0.000 | 0.000 | 0.117 |
| BLANK | 0.006 | 0.004 | 0.017 | 0.005 | 0.133 | 0.137 | 0.006 |
| OKAY | 0.016 | 0.013 | 0.035 | 0.017 | 0.076 | 0.057 | 0.020 |
| HUNGRY | 0.084 | 0.082 | 0.029 | 0.045 | 0.013 | 0.034 | 0.052 |
| HOPEFUL | 0.009 | 0.007 | 0.007 | 0.047 | 0.010 | 0.009 | 0.050 |
| COLD | 0.005 | 0.003 | 0.026 | 0.001 | 0.047 | 0.123 | 0.002 |
| CREATIVE | 0.027 | 0.037 | 0.007 | 0.524 | 0.001 | 0.002 | 0.462 |
| PISSED_OFF | 0.383 | 0.363 | 0.052 | 0.016 | 0.000 | 0.008 | 0.035 |
| GOOD | 0.004 | 0.003 | 0.004 | 0.067 | 0.005 | 0.006 | 0.060 |
| THOUGHTFUL | 0.005 | 0.003 | 0.004 | 0.011 | 0.079 | 0.029 | 0.012 |
| FRUSTRATED | 0.186 | 0.233 | 0.068 | 0.022 | 0.001 | 0.012 | 0.030 |
| CRANKY | 0.325 | 0.351 | 0.099 | 0.045 | 0.002 | 0.022 | 0.060 |
| STRESSED | 0.288 | 0.304 | 0.158 | 0.044 | 0.003 | 0.026 | 0.053 |
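Decoding Table 1 with the argmax of (2) amounts to picking the largest entry in each row. The sketch below does this for a few rows copied from the table; it reads off the reported similarities and does not recompute them. The names `TABLE1_ROWS` and `map_to_category` are hypothetical.

```python
CATEGORIES = ["angry", "disgusted", "fearful", "happy", "neutral", "sad", "surprised"]

# Rows copied from Table 1: similarity of a Blog Moods word to each category word.
TABLE1_ROWS = {
    "aggravated": [0.578, 0.618, 0.114, 0.047, 0.002, 0.020, 0.087],
    "anxious":    [0.207, 0.181, 0.091, 0.028, 0.003, 0.025, 0.038],
    "blah":       [0.014, 0.010, 0.049, 0.005, 0.166, 0.359, 0.007],
    "bouncy":     [0.009, 0.012, 0.002, 0.361, 0.000, 0.001, 0.311],
    "ecstatic":   [0.000, 0.000, 0.000, 0.108, 0.000, 0.000, 0.117],
    "sleepy":     [0.007, 0.005, 0.018, 0.009, 0.172, 0.128, 0.010],
}


def map_to_category(row):
    """Argmax decoder of (2): return the category word with the highest similarity."""
    best = max(range(len(CATEGORIES)), key=lambda j: row[j])
    return CATEGORIES[best]


for word, row in TABLE1_ROWS.items():
    print(f"{word} -> {map_to_category(row)}")
```

For these rows the sketch maps aggravated to disgusted, anxious to angry, blah to sad, bouncy to happy, ecstatic to surprised, and sleepy to neutral; for aggravated (0.618 vs. 0.578) and ecstatic (0.117 vs. 0.108) the two largest entries are close.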
The second vocabulary consisted of 40 words taken from
the top 40 emotion mood labels used by the bloggers of LiveJournal (this blogging site lets users label each post with a
mood label, which has been used as an annotated corpus for
studying emotional text [50]). The words in this vocabulary are:
accomplished, aggravated, amused, angry, annoyed, anxious, awake,
blah, blank, bored, bouncy, calm, cheerful, chipper, cold, confused, contemplative, content, cranky, crazy, creative, curious, depressed, disgusted,
drained, ecstatic, excited, exhausted, fearful, frustrated, good, happy,
hopeful, hungry, neutral, okay, pissed off, sad, sick, sleepy, stressed,
thoughtful, and tired. We refer to this vocabulary as Blog Moods.
The third vocabulary was a list of 30 Spanish emotion
words that was taken from the mental health initiative of a
Southern California medical service provider. The words in
the Spanish emotion vocabulary are: aburrido, agobiado, agotado,
ansioso, apenado, asqueado, asustado, avergonzado, cauteloso, celoso,
cómodo, confiado, confundido, culpable, deprimido, enamorado, enojado, esperanzado, extático, feliz, frustrado, histérico, malicioso, pasmado, rabioso, solitario, sorprendido, sospechoso, tímido, and triste (see
Table 1 in [17] for glosses of these words from a Spanish-English dictionary). We refer to this vocabulary as Spanish
Emotion Words.
The fourth vocabulary was elicited from subjects playing
EMO20Q, both between two humans and between a
human and a computer, with the computer in the questioner
role [37], [38], [40]. These data sources resulted in a set of 105
emotion words.
B. Valence, Activation, and Dominance Model (Model 1)
The data collected from the interval surveys for the first model
consist of four experiments: three surveys of 32 subjects for
English and one survey of eight subjects for Spanish. All surveys
had a similar structure. First, the surveys gave the subject
instructions. Then the surveys sequentially presented the subject with emotion words, which we will refer to as the stimuli,
one word per page. For each stimulus there were sliders for
each of the three emotion dimensions. The sliders had two
handles, which allowed the subjects to select the lower and
upper points of ranges. The range of the sliders was 0–10. The
maximum range allowed was 10 and the minimum range was 1
because the steps were integer values and the implementation
imposed a constraint that the upper and lower endpoints could
not be the same. Above each scale was a pictorial representation known as a self-assessment manikin [31] that aimed to illustrate the scale nonverbally.
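The slider constraints described above can be checked, and a very crude footprint of uncertainty formed, with the sketch below. It is not the interval approach of [2], [3], which fits a type-1 MF to each subject's interval from survey statistics before aggregating; here each valid interval is simply treated as a crisp set, and the union is represented by taking the pointwise maximum as the upper MF and the pointwise minimum as the lower MF (an assumption). The responses and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

SCALE_MIN, SCALE_MAX = 0, 10  # slider range described in the text


def is_valid(interval: Tuple[int, int]) -> bool:
    """Integer endpoints on the 0-10 scale with lower < upper (so the width is at least 1)."""
    lo, hi = interval
    return (isinstance(lo, int) and isinstance(hi, int)
            and SCALE_MIN <= lo < hi <= SCALE_MAX)


def crude_fou(intervals: Sequence[Tuple[int, int]],
              xs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Toy FOU: pointwise max (upper) and min (lower) of crisp per-subject interval sets.

    NOT the interval approach of [2], [3]; it only illustrates the aggregation step.
    """
    valid = [iv for iv in intervals if is_valid(iv)]
    if not valid:
        raise ValueError("no valid intervals")
    indicators = [[1.0 if lo <= x <= hi else 0.0 for x in xs] for lo, hi in valid]
    upper = [max(col) for col in zip(*indicators)]
    lower = [min(col) for col in zip(*indicators)]
    return upper, lower


# Hypothetical responses from four subjects for one word on one scale.
responses = [(6, 9), (7, 10), (5, 8), (3, 3)]  # (3, 3) violates lower < upper and is dropped
upper, lower = crude_fou(responses, range(11))
print(upper)
print(lower)
```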
The overall structure of the Spanish survey was the same as
the English one, but special care was required for the translation
of the instructions and user interface elements. The first version
of the translation was done by a proficient second-language
Spanish speaker and later versions were corrected by native
Spanish speakers. The subjects of the surveys were native speakers of Spanish with Mexican and Spanish backgrounds.
In the surveys, each subject was presented with a series of
randomized stimuli from one of the emotion vocabularies. The
description of the stimuli regimen and other implementation
details for the experiments can be found in [16] for English
and [17] for Spanish. Links to the surveys can be found at
http://sail.usc.edu/~kazemzad/emotion_in_text_cgi/.
One final issue was deciding whether similarity or subsethood was best for our task and how to aggregate these metrics over three dimensions. Both similarity and subsethood can
be used as an objective function to be maximized by translation. Reference [23, Chapter 4] recommends using subsethood when the
output is a classification and similarity if the input and output
vocabularies are the same, but it was not immediately clear
which would be preferable for our tasks, so we tested the different methods empirically. Also, since this is one of the first
studies to use fuzzy sets that range over more than one
dimension, we tested several ways of combining the similarities and subsethoods of the individual scales: the average, the
product, and the linguistic weighted average, as described in Section III-C. We also tried leaving out dominance, as it is a distinguishing feature in only a few cases.
The mapping from one vocabulary to another is done by
choosing the word from the output vocabulary that has the
highest similarity or subsethood with the input word. Here,
similarity and subsethood are the aggregated scalewise similarities and subsethoods for valence, activation, and dominance.
We examined several different mappings. In [16], we examined mapping from the blog mood vocabulary to the more
controlled categorical emotion vocabulary, which simulates
the task of mapping from a large, noisy vocabulary to a more
controlled one. In this paper, we use mapping tasks that
involve translation from Spanish to English to evaluate the
estimated IT2 FSs.
To empirically evaluate the performance of the mapping,
we used a human translator to complete a similar mapping
task. We instructed the translator to choose the best word
or, if necessary, two words from the output vocabulary
that matched the input word. A predicted result was considered correct if it matched one of the output words chosen by
the evaluator.
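The scoring rule described above can be sketched as follows; the function name and the example predictions and evaluator choices are hypothetical, not data from this study.

```python
from typing import Collection, Dict


def mapping_accuracy(predicted: Dict[str, str],
                     reference: Dict[str, Collection[str]]) -> float:
    """Fraction of input words whose predicted output matches one of the evaluator's choices."""
    if not predicted:
        raise ValueError("no predictions to score")
    correct = sum(1 for w, out in predicted.items() if out in reference.get(w, ()))
    return correct / len(predicted)


# Hypothetical predictions and evaluator choices (one or two words per input), not study data.
predicted = {"word_1": "angry", "word_2": "sad", "word_3": "happy"}
reference = {"word_1": {"angry"}, "word_2": {"sad", "fearful"}, "word_3": {"surprised"}}
print(mapping_accuracy(predicted, reference))  # 0.6666666666666666
```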
We also use multidimensional scaling to visualize the
derived emotion space. Multidimensional scaling is similar to
principal component analysis except that it operates on a similarity matrix instead of a data or covariance matrix. Since it
operates directly on a similarity matrix, it is ideal for visualizing
the results of aggregating the scale-wise similarities into a single
similarity matrix.
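The text does not specify which MDS variant was used. As an assumption, the sketch below applies classical (Torgerson) MDS: it converts similarity to distance as $1 - \text{similarity}$, double-centers the squared distances, and keeps the top-$k$ eigenvectors. It requires NumPy; `classical_mds` is a hypothetical name and the similarity matrix is a toy example.

```python
import numpy as np


def classical_mds(similarity, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS on a symmetric similarity matrix with values in [0, 1].

    Assumes distance = 1 - similarity; returns an n x k array of coordinates.
    """
    s = np.asarray(similarity, dtype=float)
    d = 1.0 - s
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                           # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)                  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]                 # keep the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop tiny negative eigenvalues
    return eigvecs[:, order] * scale


# Toy similarity matrix for three hypothetical words lying on a line.
sim = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.5],
       [0.0, 0.5, 1.0]]
coords = classical_mds(sim, k=2)
print(np.round(coords, 3))
```

When the distances derived from the similarity matrix are Euclidean, as in this toy case, the returned coordinates reproduce them exactly; for real aggregated similarities the embedding is only approximate.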
C. Propositional Model (Model 2)
We devised the second model to address the results obtained
from model 1, described in Section V, where we found that
larger vocabulary sizes resulted in lower performance in the
translation tasks. Our inspiration for the second model was
that people can guess an object from a large, open-ended set
by adaptively asking a sequence of questions, as in the game
of twenty questions. The sequential questioning behavior
thus motivated our representation and experimental design
of the EMO20Q. The premiss of EMO20Q is that the
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2013
M
q
M
q
Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page
M
q
M
q
MQmags
q
THE WORLD’S NEWSSTAND®
M
q
M
q
Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page
M
q
M
q
MQmags
q
THE WORLD’S NEWSSTAND®
[Figure 3: Gradient Yes/No Answers. x-axis: answer phrases, from “YESSSSS!!!!!:):)” through “Maybe” and “It Depends” to “No!”; y-axis: Truth Degree (0 to 100).]
FIGURE 3 Fuzzy answers to yes/no questions obtained by presenting the answer phrase (x-axis labels) to users of Amazon Mechanical Turk, who
responded by using a slider interface to indicate the truth degree (y-axis). This plot was based on a single handle slider, in contrast to the interval
approach surveys, in order to show an overview of the data. The results presented below are for the double handle slider and interval approach analysis.
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE
[Figure 4: panels for Happy, Neutral, Angry, and Sad, each on Val., Act., and Dom.; x-axis: scale value (0 to 10); y-axis: membership (0 to 1).]
FIGURE 4 Example membership functions (MFs) calculated with the interval approach for
happy, neutral, angry, and sad emotions. All the membership functions shown here, except
the valence for neutral, are shoulder MFs that model the edges of the domain of n. The
region between the upper and lower MFs, the footprint of uncertainty, is shaded. The abbreviations Val., Act., and Dom. stand for valence, activation, and dominance.
[Figure 5: Multidimensional Scaling Plot of the Product of Distances. x-axis: Component 1; y-axis: Component 2; each point is labeled with an emotion word.]
FIGURE 5 Multidimensional scaling (2-D) representation of the emotion words’ similarity, obtained when the similarities of the individual
valence, activation, and dominance dimensions were combined by taking their product. The words in the categorical emotion vocabulary are
marked in bold.
twenty questions game is a way to elicit
human knowledge about emotions and that
the game can also be used to test the ability
of computer agents to simulate knowledge
about emotions. The experimental design of
the EMO20Q game was proposed in [37], and
since then we have collected data from over
100 human-human and over 300 human-computer EMO20Q games. In this paper we
focus on follow-up experiments that aim to
understand the answers in the game in terms
of fuzzy logic. More information about
EMO20Q, including demos, code, and data,
can be found at http://sail.usc.edu/emo20q.
Although the questions asked in the
EMO20Q game are required to be yes-no
questions, the answers are not just “yes” or
“no.” Often the answer contains some expression of uncertainty. Here we focus on the
fuzzy logical representation of answers to questions in the
game. Just as the first model uses valence, activation, and
dominance scales to represent emotions, the second model
uses the questions from EMO20Q as scales that can be interpreted on axes that range from “yes” to “no.” In this case, the
interval surveys we performed were not overtly about emotions; rather, they evaluated the answers on the scale from
“no” to “yes,” which we defined as a domain for fuzzy sets
that range from 0 to 100.
Using data from EMO20Q games, we collected a set of
questions and answers about emotions. We sampled a set of
answers based on frequency of occurrence and how well the
set covered the space from affirmative to negative answers. We
also included some control stimuli that were not observed in the data
but that provide insight into how people interpret
negation. For example, we included phrase groups like “certainly,” “not certainly,” and “certainly not” that would allow us
to calibrate how the subjects would interpret phrases that
might have a logical interpretation. The final set of stimuli consisted of 99 answers. These were presented to subjects along
with either a single or double handle slider. In Figure 3,
we plot the responses for single sliders, which are easier to visualize than double sliders. In what follows, however, we present
the double handle slider results, which form the input to the
interval approach methodology described above.
We conducted the interval approach survey on Amazon
Mechanical Turk (AMT), an internet marketplace for crowdsourcing tasks that can be completed
online. The survey was conducted in sets
of 30 stimuli given to each of 137 subjects on
AMT who were ostensibly English speakers from the U.S. The average number of
ratings per stimulus was 38.5.
V. Experimental Results
In this section, we present the results of
experiments that used the two models
and the survey methodology described in
Sections III-D, IV-B, and IV-C to estimate fuzzy set membership functions for
the emotion vocabularies presented in
Section IV-A, to calculate similarity and
subsethood between emotion words as
described in Section III-C, and to map
between different emotion vocabularies.
A. Valence, Activation,
and Dominance Model (Model 1)
Examples of the membership functions
that were calculated for the emotion category vocabulary can be seen in Fig. 4.
The similarities between these membership
functions and those of the blog moods
vocabulary can be seen in Table 1, as calculated using the product of the individual scale-wise similarities as the aggregation method. In Fig. 5
we visualize the similarity matrix
between the words of both vocabularies using multidimensional
scaling (MDS) [51]. MDS is a statistical approach in the same family as principal components analysis (PCA) and factor
analysis. We use MDS in this case because factor analysis has unwanted assumptions (namely, a multivariate normal distribution with linear relationships) and because PCA operates on
feature vectors as opposed to similarity matrices (and also
assumes linear relationships). We performed MDS on the
aggregated similarity measurements to qualitatively visualize
the emotion space as derived from the similarity matrix. The
result of combining the similarities of the valence, activation,
and dominance dimensions was slightly different using sum
versus product aggregation. The sum aggregation produced a
more spread-out distribution of the words in the space induced by MDS, while the product aggregation produced a
space where the emotions are more tightly clustered. This was
because the product aggregation method was less sensitive to small dissimilarities. The multidimensional scaling plot also
allows one to see which emotions are close and potentially
confusable. For example, “happy” and “surprised” are very close, as are “angry” and “disgusted.” Since the mapping between
vocabularies, like MDS, is based on similarities, these pairs are also likely to be confused by the mapping. Because the components derived
from MDS are calculated algorithmically, they are not directly
interpretable as they would be in factor analysis.
TABLE 2 Similarity between Spanish and English emotion words.
             ANGRY      DISGUSTED  FEARFUL    HAPPY      NEUTRAL    SAD        SURPRISED
ABURRIDO     0.2284     0.2335     0.6370     0.1965     0.3196     0.4610     0.1230
AGOBIADO     0.4762     0.5696     0.4611     0.3122     0.1495     0.2895     0.2175
AGOTADO      0.2250     0.2344     0.4883     0.1425     0.4081     0.5135     0.1012
ANSIOSO      0.4579     0.4748     0.2837     0.3655     0.2703     0.1728     0.3598
APENADO      0.2915     0.2928     0.7711     0.3128     0.1219     0.4065     0.1211
ASQUEADO     0.5445     0.5969     0.3885     0.4538     0.2045     0.2784     0.3199
ASUSTADO     0.4610     0.5324     0.3209     0.3508     0.2213     0.2141     0.3489
AVERGONZADO  0.2701     0.2663     0.6345     0.2393     0.0660     0.4737     0.0713
CAUTELOSO    0.0918     0.0957     0.5357     0.1848     0.3784     0.3126     0.0958
CELOSO       0.7396     0.6880     0.3335     0.1832     0.0515     0.2444     0.2390
CÓMODO       0.0436     0.0510     0.3363     0.3686     0.3963     0.3518     0.2240
CONFIADO     0.2835     0.3307     0.2382     0.4753     0.1393     0.0562     0.2821
CONFUNDIDO   0.2488     0.2531     0.7690     0.2202     0.1286     0.4498     0.0878
CULPABLE     0.3275     0.3445     0.7051     0.2916     0.1375     0.3921     0.1401
DEPRIMIDO    0.2893     0.2914     0.5585     0.1529     0.3380     0.7058     0.0978
ENAMORADO    0.4371     0.5611     0.0942     0.4572     0.1055     0.0351     0.5774
ENOJADO      0.8732     0.7125     0.3596     0.1940     0.1054     0.2654     0.3494
ESPERANZADO  0.0929     0.0987     0.4023     0.5903     0.1798     0.1625     0.3270
EXTÁTICO     0.3140     0.3108     0.0611     0.4305     0.1337     0.0268     0.7222
FELIZ        0.1329     0.1655     0.2293     0.6020     0.1796     0.0770     0.5046
FRUSTRADO    0.6414     0.7271     0.3003     0.3021     0.1677     0.3026     0.3337
HISTÉRICO    0.6522     0.6566     0.2804     0.2340     0.1550     0.1874     0.4272
MALICIOSO    0.3347     0.4270     0.3427     0.3540     0.2273     0.1322     0.2325
PASMADO      0.3102     0.3480     0.3910     0.2544     0.1931     0.3231     0.2654
RABIOSO      0.5416     0.4616     0.2190     0.0945     0.0018     0.1402     0.3598
SOLITARIO    0.2657     0.2672     0.6091     0.0904     0.2549     0.5565     0.0396
SORPRENDIDO  0.3405     0.3803     0.1229     0.3336     0.1706     0.0746     0.3675
SOSPECHOSO   0.3026     0.3497     0.5129     0.3883     0.2084     0.2425     0.2900
TIMIDO       0.0844     0.0857     0.3925     0.1092     0.3578     0.4436     0.0515
TRISTE       0.3376     0.3396     0.6502     0.1477     0.2389     0.5882     0.0852
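The aggregation step described in Section V-A can be illustrated with a short sketch. It assumes each word is stored as one IT2 FS per scale, sampled on a 0 to 10 grid as a pair of lower and upper membership arrays, and it uses one common discrete form of the Jaccard similarity for IT2 FSs (the sum of pointwise minima of the upper and lower MFs divided by the sum of pointwise maxima). The trapezoidal MFs, the input word `CONTENT`, and all helper names are hypothetical illustrations, not the membership functions estimated from the surveys; the linguistic weighted average is not shown.

```python
import numpy as np

# Each scale is discretized on the 0-10 range (assumed grid of 101 points).
X = np.linspace(0.0, 10.0, 101)
SCALES = ("valence", "activation", "dominance")


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF with support [a, d] and core [b, c] (a <= b <= c <= d)."""
    rising = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= a).astype(float)
    falling = np.clip((d - x) / (d - c), 0.0, 1.0) if d > c else (x <= d).astype(float)
    return np.minimum(rising, falling)


def it2(lower, upper):
    """An IT2 FS stored as (lower MF, upper MF) sampled on X."""
    return trapezoid(X, *lower), trapezoid(X, *upper)


def jaccard(a, b):
    """Jaccard similarity of two discretized IT2 FSs (symmetric, in [0, 1])."""
    (la, ua), (lb, ub) = a, b
    num = np.minimum(ua, ub).sum() + np.minimum(la, lb).sum()
    den = np.maximum(ua, ub).sum() + np.maximum(la, lb).sum()
    return float(num / den) if den > 0 else 0.0


def aggregate(scale_sims, method="product"):
    """Combine the three scale-wise similarities into a single score."""
    s = np.asarray(scale_sims, dtype=float)
    if method == "product":
        return float(np.prod(s))
    if method == "average":
        return float(np.mean(s))
    raise ValueError(f"unknown aggregation method: {method}")


def translate(word, vocabulary, method="product"):
    """Map a word to the output-vocabulary word with the highest aggregated similarity."""
    scores = {
        name: aggregate([jaccard(word[k], fs[k]) for k in SCALES], method)
        for name, fs in vocabulary.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical membership functions, for illustration only.
VOCAB = {
    "happy": {"valence": it2((7, 8, 9, 9.5), (6, 7, 9.5, 10)),
              "activation": it2((5, 6, 7, 8), (4, 5, 8, 9)),
              "dominance": it2((5, 6, 7, 8), (4, 5, 8, 9))},
    "sad": {"valence": it2((1, 1.5, 2.5, 3.5), (0.5, 1, 3, 4.5)),
            "activation": it2((2, 3, 4, 5), (1, 2, 5, 6)),
            "dominance": it2((2, 3, 4, 5), (1, 2, 5, 6))},
}
CONTENT = {"valence": it2((6, 7, 8, 9), (5, 6, 9, 10)),
           "activation": it2((4, 5, 6, 7), (3, 4, 7, 8)),
           "dominance": it2((5, 6, 7, 8), (4, 5, 8, 9))}

print(translate(CONTENT, VOCAB, "product")[0])  # happy
print(translate(CONTENT, VOCAB, "average")[0])  # happy
```

With product aggregation, a zero similarity on any single scale (here, the valence of the hypothetical “content” against “sad”) rules a candidate out entirely, whereas averaging lets the other two scales compensate; this is one practical difference between the two aggregation methods.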
[Figure 6: Spanish to IEMOCAP Translation Performance. Compares aggregation methods (Sum/Avg Aggregation, Product Aggregation, Sum w/Valence and Activation, Product w/Valence and Activation, Linguistic Weighted Average) and measures (VSM Similarity, Jaccard Similarity, Subsethood); y-axis from 0.0 to 0.8.]
FIGURE 6 Performance of translating from the Spanish emotion
vocabulary to the categorical emotion vocabulary, which was the set
of emotion labels used for annotating the IEMOCAP corpus [52].
To check the mapping induced by the similarity matrices,
we show in Table 1 the similarity matrix for the product aggregation of the dimension-wise similarity measures of the
valence, activation, and dominance scales. The location of
the maximum of each row (bold) shows the final translation
from the larger vocabulary (rows) to the smaller vocabulary
(columns). The most glaring error is that “fearful” is not in the
range of the mapping from the large vocabulary to the small vocabulary, due to its relatively low similarity to every word in the blog
mood vocabulary. Cases where one would expect a
mapping to “fearful” (e.g., “anxious,” “stressed”) do show elevated similarity to “fearful,” but their similarity to “angry” or “disgusted” is
higher. Observing that most of the values in the “fearful”
column are lower than those in the other columns, we normalized each
column by its maximum value. Doing this does in fact produce
the intuitive mapping of “anxious” and “stressed” to “fearful,”
but it also changes other values.
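The row-maximum mapping and the column normalization can be sketched as follows. The input words and similarity values are hypothetical, chosen so that, as with “fearful” in Table 1, one output column has uniformly lower similarities than the others.

```python
import numpy as np

OUTPUT_WORDS = ["angry", "fearful", "happy"]
INPUT_WORDS = ["anxious", "stressed", "furious", "joyful"]
# Hypothetical similarity matrix: rows are input words, columns are output words.
SIM = np.array([
    [0.50, 0.40, 0.10],
    [0.55, 0.45, 0.05],
    [0.90, 0.30, 0.05],
    [0.10, 0.05, 0.80],
])


def map_rows(sim, labels):
    """Translate each input word (row) to the output word (column) of maximal similarity."""
    return [labels[j] for j in np.argmax(sim, axis=1)]


def normalize_columns(sim):
    """Divide each column by its maximum so that every output word peaks at 1."""
    return sim / sim.max(axis=0, keepdims=True)


print(dict(zip(INPUT_WORDS, map_rows(SIM, OUTPUT_WORDS))))
# {'anxious': 'angry', 'stressed': 'angry', 'furious': 'angry', 'joyful': 'happy'}
print(dict(zip(INPUT_WORDS, map_rows(normalize_columns(SIM), OUTPUT_WORDS))))
# {'anxious': 'fearful', 'stressed': 'fearful', 'furious': 'angry', 'joyful': 'happy'}
```

In this toy matrix only the two “fearful”-like rows change; with a real similarity matrix the same rescaling can also move rows that were previously mapped correctly, which is the side effect noted above.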
To better quantify the intuitive goodness of the mapping
from one vocabulary to another, we undertook an evaluation
based on human performance on the same mapping task. We
found that at least one of the subjects’ choices matched the
predicted mapping in all but the following five cases (i.e., performance of approximately 84%): “confused,” “busy,” “anxious,”
“hungry,” and “hopeful.” Filtering out clearly non-emotion
words like “hungry” may have improved the results here, but
our aim was to use a possibly noisy large vocabulary, since the
data came from the web.
To see if the fuzzy logic approach agreed with a simpler
approach, we converted the survey interval endpoints to single
points by taking the midpoints of the subjects’ intervals and
then averaging across all subjects. Treating the words as points in the 3-D emotion
space, the mapping obtained with Euclidean distance performed
essentially the same as the mapping obtained with the fuzzy logic
similarity measures. However, a simple Euclidean distance metric loses some of the theoretical benefits we have argued for, as
it does not account for the shape of the membership functions
and cannot account for subsethood.
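The simpler baseline can be sketched as below, assuming (hypothetically) that each subject’s response for a word is stored as one interval per scale. Each interval is reduced to its midpoint, the midpoints are averaged across subjects, and an input word is mapped to the output word whose point is nearest in Euclidean distance. Only these averages survive, so the shape of the membership functions, and with it any notion of subsethood, is discarded.

```python
import numpy as np

# Hypothetical survey data: for each word, one entry per subject, each entry holding
# an interval (low, high) on the valence, activation, and dominance scales.
SURVEY = {
    "happy": [[(7, 9), (5, 8), (5, 7)], [(8, 10), (6, 8), (4, 7)]],
    "sad": [[(1, 3), (2, 4), (2, 4)], [(2, 4), (1, 3), (3, 5)]],
}


def word_point(subject_intervals):
    """Midpoint of each subject's interval, averaged over subjects, as a 3-D point."""
    arr = np.asarray(subject_intervals, dtype=float)  # (subjects, 3 scales, 2 endpoints)
    return arr.mean(axis=2).mean(axis=0)


def nearest_word(point, vocabulary_points):
    """Output word whose 3-D point is closest in Euclidean distance."""
    return min(vocabulary_points,
               key=lambda w: np.linalg.norm(point - vocabulary_points[w]))


POINTS = {w: word_point(iv) for w, iv in SURVEY.items()}
glad = word_point([[(7, 9), (6, 8), (5, 7)]])  # hypothetical input word

print(POINTS["happy"].tolist())    # [8.5, 6.75, 5.75]
print(nearest_word(glad, POINTS))  # happy
```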
Based on the membership functions from the Spanish survey
and the previous English surveys, we constructed similarity
matrices between the Spanish words as input and the English
words as output. The similarity matrix of the Spanish words and
the Emotion Category Word vocabulary is shown in Table 2.
Overall, the best performance of 86.7% came from mapping
from the Spanish vocabulary to the Emotion Category Word
vocabulary using similarity (rather than subsethood), and aggregating the scale-wise similarities using the multiplicative product
of the three scales. The performance of mapping from Spanish to
the Blog Mood vocabulary was worse than with the Emotion
Category Word vocabulary as output because the much larger
size of the Blog Mood vocabulary resulted in more confusability.
The best performance for this task was 50% using similarity and
linguistic weighted average for aggregating the similarities. A
comparison of the different similarity and aggregation methods
can be seen in Fig. 6 for mapping from Spanish to the Emotion
Category Word vocabulary and Fig. 7 for mapping from Spanish
to the Blog Moods vocabulary.
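Subsethood, the alternative to similarity compared in Figs. 6 and 7, is asymmetric. A minimal sketch, assuming a discretized IT2 FS subsethood measure that divides the pointwise minima of the upper and lower MFs by the total mass of the first set’s MFs, with hypothetical valence MFs in which “ecstatic” sits at the extreme end of “happy”:

```python
import numpy as np

X = np.linspace(0.0, 10.0, 101)  # discretized valence scale (assumed grid)


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF with support [a, d] and core [b, c]."""
    rising = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= a).astype(float)
    falling = np.clip((d - x) / (d - c), 0.0, 1.0) if d > c else (x <= d).astype(float)
    return np.minimum(rising, falling)


def it2(lower, upper):
    """IT2 FS as (lower MF, upper MF) sampled on X."""
    return trapezoid(X, *lower), trapezoid(X, *upper)


def subsethood(a, b):
    """Degree to which IT2 FS a is contained in IT2 FS b (not symmetric)."""
    (la, ua), (lb, ub) = a, b
    num = np.minimum(ua, ub).sum() + np.minimum(la, lb).sum()
    den = ua.sum() + la.sum()
    return float(num / den) if den > 0 else 0.0


# Hypothetical valence MFs: "ecstatic" lies inside the extreme end of "happy".
HAPPY = it2((6, 7, 9.5, 9.9), (5, 6, 9.9, 10))
ECSTATIC = it2((8.5, 9, 9.5, 9.8), (8, 8.5, 9.8, 10))

print(round(subsethood(ECSTATIC, HAPPY), 2))  # 1.0
print(round(subsethood(HAPPY, ECSTATIC), 2))
```

The measure is close to 1 when “ecstatic” is tested against “happy” and much lower in the reverse direction, whereas a similarity measure such as the Jaccard similarity gives the same value both ways.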
[Figure 7: Spanish to LiveJournal Translation Performance. Compares aggregation methods (Sum/Avg Aggregation, Product Aggregation, Sum w/Valence and Activation, Product w/Valence and Activation, Linguistic Weighted Average) and measures (VSM Similarity, Jaccard Similarity, Subsethood); y-axis from 0.0 to 0.5.]
FIGURE 7 Performance of translating Spanish emotion words to LiveJournal mood labels (colloquial emotion words).
B. Propositional Model (Model 2)
For the propositional model, we collected a set of 1228 question-answer pairs from 110 human-human EMO20Q
matches, in which 71 unique emotion words were chosen. In
these matches, the players successfully guessed the other players’ emotion words in 85% of the matches, requiring on average 12 turns.
In the set of question-answer pairs there were 761 unique
answer strings. We selected a set of 99 answers based on
frequency of occurrence and how well the set covered the space from affirmative to negative answers. We used the interval approach to obtain fuzzy sets for the answers to yes/no questions. A sample of these is shown in Figure 8. To evaluate these, we determined the extent to which the medians from the single handle slider survey were full or partial members of the fuzzy sets derived from the interval approach’s double handle slider survey, which used different subjects but the same stimuli. We found that the IT2 FSs from the interval approach surveys corresponded well with the single-slider data. All of the estimated IT2 FSs except one contained the median of the single-slider values, i.e., 99%. The exception, “NO!”, was a singleton IT2 FS at zero, while the median from the single slider was at one (on the scale from 0 to 100). The average value of the IT2 FS membership functions (which is an interval-valued range) at points corresponding to the medians of the single-slider values was (0.41,0.84). Evaluating the enhanced interval approach (EIA) in the same way, we found that the EIA-derived IT2 FSs performed nearly as well: they contained all but two of the single-slider medians (~98%), and the average membership at the single-slider medians was (0.12,0.89).
Beyond these quantitative measurements, the membership functions from model 2 are qualitatively tighter than those of model 1, especially with the enhanced interval approach. Though some of the membership functions span large portions of the domain, these are answers that signify uncertainty (such as “kind of,” “I think so,” and “perhaps” in Figure 8). This was in contrast to model 1, which more frequently resulted in broad membership functions with wide footprints of uncertainty. The data and code for the experiments of model 2 can be accessed at http://code.google.com/p/cwwfl/.
[Figure 8: IT2 FSs for the answers No, Maybe, Kind Of, Probably, Nope, Not Really, Sometimes, Yes Usually, Definitely Not, Possibly Not, I Think So, Certainly, No, Not Normally, Sort Of, Perhaps, and Yes; x-axis: 0 to 100; y-axis: membership, 0.0 to 1.0.]
FIGURE 8 Example IT2 FSs calculated with the enhanced interval approach for answers to yes/no questions.
VI. Discussion
Variables that range over sets and functions rather than individual numbers are important developments in modern mathematics, and variables that range over proofs, automata languages, and programs further add to the richness of objects that can be represented with variables. This paper looked at expanding the domain of variables to include emotions. To model a seemingly non-mathematical object in such a way, we use fuzzy sets, another relatively new type of variable. This paper proposed two models for emotion variables: one that represented the meaning of emotion words in a three-dimensional space of valence, activation, and dominance, and another that represented emotions as a sparse vector of truth values over propositions about emotions.
First, we examine the relative benefits and drawbacks of the two models we proposed: the first model based on valence, activation, and dominance scales, and the second model based on questions about emotions whose answers are rated on a scale from true to false.
The first model captures intuitive gradations between emotions. For example, the relation of “ecstatic” and “happy” can be seen in their values on the scales: “ecstatic” will be a subset of “happy,” with valence and activation values more toward the extreme periphery. Also, the scales used by the first model are language-independent, iconic representations of emotion, which enables researchers to use the same scales for multiple languages. However, for the first model, each word needs an interval survey on the three scales to calculate the membership function for the word, which is laborious and limits the model to words whose membership functions have been calculated already. Also, as we have seen, performance degrades with the size of the vocabulary. Some of the performance degradation can be expected due to the inherent difficulty of making a decision with more choices. However, limiting the representation to three scales also limits the resolution and expressiveness of the model.
The second model, on the other hand, gives a better resolution when there is a large number of emotions. With more emotions, more expressivity is needed than just valence, activation, and dominance. To give examples of some of the emotion words from EMO20Q that are difficult to represent with only valence,
activation, and dominance, we can see that “pride,” “vindication,”
and “confidence,” might all have similar valence, activation, and
dominance values, so it would be hard to distinguish these on
the basis of only the three scales. By representing emotions with
propositions based on questions from EMO20Q, we can use a
single fuzzy scale for any arbitrary proposition: once the scales are
established the bulk of the data can be collected purely in natural
language. Moreover, the propositional truth-value scale can be
used for other domains besides emotions.
However, with the second model there is no clear way to
compare emotions that were not asked the same set of questions. In the EMO20Q game, questions are only observed as they happen to
occur in the game. It will be necessary to collect more data
outside of the game to make sure that all the prevalent questions are asked about each emotion. Even though we can use a
single fuzzy scale for each proposition’s truth value, the set of all
propositions about emotions is a vast open set, so data collection is still an issue. Since the propositions are based on a specific human language, the equivalence of different propositions
in different languages is not as apparent as in the first model.
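To make the comparison problem concrete, here is a simplified sketch of the propositional representation as a sparse map from questions to truth degrees. It deliberately replaces each answer’s IT2 FS with a single hypothetical crisp truth degree, and the question strings, answers, and similarity (one minus the mean absolute difference over shared questions) are illustrative assumptions; the point is only that two emotions with no shared questions have no basis for comparison.

```python
# Hypothetical crisp truth degrees (0-100) for a few answer phrases; in the article,
# each answer is instead represented by an IT2 FS estimated from survey data.
ANSWER_TRUTH = {"yes": 95.0, "probably": 70.0, "maybe": 50.0, "not really": 25.0, "no": 5.0}

# Hypothetical question-answer records per emotion word: each is a sparse vector.
EMOTIONS = {
    "pride": {"is it positive?": "yes", "is it directed at yourself?": "yes"},
    "confidence": {"is it positive?": "yes", "is it directed at yourself?": "probably",
                   "does it involve other people?": "not really"},
    "nostalgia": {"is it about the past?": "yes"},
}


def truth_vector(answers):
    """Convert {question: answer phrase} into {question: truth degree}."""
    return {q: ANSWER_TRUTH[a] for q, a in answers.items()}


def compare(e1, e2):
    """Similarity in [0, 1] over shared questions, or None if nothing is shared."""
    v1, v2 = truth_vector(EMOTIONS[e1]), truth_vector(EMOTIONS[e2])
    shared = sorted(v1.keys() & v2.keys())
    if not shared:
        return None  # the two emotions were never asked a common question
    mean_gap = sum(abs(v1[q] - v2[q]) for q in shared) / len(shared)
    return 1.0 - mean_gap / 100.0, shared


print(compare("pride", "confidence"))  # (0.875, ['is it directed at yourself?', 'is it positive?'])
print(compare("pride", "nostalgia"))   # None
```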
We made several modifications to the interval approach to make it robust to the case in which all intervals are
discarded by the preprocessing. We determined that the final
removal of all intervals took place in the reasonable interval
processing stage. The modification to the original interval
approach involved keeping the intervals in this stage if all
would have been removed. This had the effect of creating a
very broad membership function with a lower membership
function that was zero at all points. The enhanced interval
approach improved the rejection of intervals in various stages
by separately considering interval endpoint criteria and interval
length criteria. For the first model, the enhanced interval
approach yielded worse results when using the translation task
as an evaluation metric. This was due to the narrower membership functions that the enhanced interval approach was
designed to produce. In the case of similarity and subsethood
calculation, the narrower membership function led to more
zero entries in the calculation of similarity and subsethood. In
the translation task, this resulted in a less robust translation
because small variations in the membership function would
yield a disproportionate change in similarity and subsethood
values. However, in the case of the second model, where the
fuzzy sets are used in a more traditional fashion, i.e., as propositional truth quantifiers, the enhanced interval approach did in
fact yield membership functions that appeared to more tightly
contain the single slider results and performed as well on the
evaluation metric we used for this task.
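The containment check used to evaluate the model 2 membership functions can be sketched as follows. It assumes that a single-slider median counts as a (full or partial) member of an answer’s IT2 FS when the upper MF is positive there, and it reports the average interval (lower MF, upper MF) at the medians. The three answers, their trapezoidal MFs, and the medians are hypothetical, with “NO!” modeled as a singleton at zero to mirror the exception reported in Section V-B.

```python
import numpy as np

X = np.linspace(0.0, 100.0, 101)  # truth-degree domain, 0 ("no") to 100 ("yes")


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF with support [a, d] and core [b, c]."""
    rising = np.clip((x - a) / (b - a), 0.0, 1.0) if b > a else (x >= a).astype(float)
    falling = np.clip((d - x) / (d - c), 0.0, 1.0) if d > c else (x <= d).astype(float)
    return np.minimum(rising, falling)


def it2(lower, upper):
    """IT2 FS as (lower MF, upper MF) sampled on X."""
    return trapezoid(X, *lower), trapezoid(X, *upper)


def evaluate(fs_by_answer, medians):
    """Fraction of FSs containing their median, and mean (lower, upper) membership there."""
    contained, lows, highs = 0, [], []
    for answer, (lmf, umf) in fs_by_answer.items():
        m = medians[answer]
        lo, hi = float(np.interp(m, X, lmf)), float(np.interp(m, X, umf))
        if hi > 0.0:  # assumed criterion: partial membership in the FOU
            contained += 1
        lows.append(lo)
        highs.append(hi)
    n = len(fs_by_answer)
    return contained / n, (sum(lows) / n, sum(highs) / n)


SINGLETON_AT_ZERO = ((X == 0.0).astype(float),) * 2
FS = {
    "yes": it2((85, 92, 98, 100), (75, 88, 99.5, 100)),
    "maybe": it2((40, 48, 52, 60), (25, 45, 55, 75)),
    "NO!": SINGLETON_AT_ZERO,
}
MEDIANS = {"yes": 95.0, "maybe": 60.0, "NO!": 1.0}

rate, (avg_lo, avg_hi) = evaluate(FS, MEDIANS)
print(round(rate, 3), round(avg_lo, 3), round(avg_hi, 3))  # 0.667 0.333 0.583
```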
The different models both use IT2 FSs, but beyond that,
they present different approaches in the representation of
emotion descriptions. Because of the difference in approach
and the resulting format of the model, they
were difficult to evaluate in the same way. For
the first model, because the fuzzy scales of
valence, activation, and dominance are
directly tied to the emotion representation
and because the scales are nonlinguistic in nature (they are
labeled with a cartoon manikin), the cross-language translation
task was a possible evaluation metric. However, the fuzzy scales
used in the second model are indirectly linked to emotions via
linguistic propositions about emotions. Since the propositions
about emotions are specific to a given language, the translation
task is not directly facilitated by this model.
From the comments given by the subjects of the survey, for
model 1, we found that subjects reported confusion with the
scale of dominance, despite the pictorial representation in the
survey. For model 2, we found that the interpretation of linguistic truth values was a source of reflection for the subjects and this
provided insight into the variation that may have otherwise been
attributed to lack of cooperation on the part of the Amazon
Mechanical Turkers. For example, from a logical point of view, the stimulus “definitely”
would be assumed to be a strong “yes.”
However, several Turkers mentioned that they realized that, when
they use the word “definitely,” they do not mean “definitely” in
the logical sense, but rather that the colloquial meaning is somewhat more relaxed. From the fuzzy set representation point of
view, it may be advantageous to recognize distinct senses for the
meaning of words and phrases. In the case mentioned, the word
“definitely” could have a colloquial sense and a logical sense.
Another example of this was in the control phrases we used in
the second model. For example “not certainly” was often confused with “certainly not.” This is not to say that all the Turkers
were cooperative and took the time to understand the task, but it
shows that there are many factors involved with measuring
uncertainty. From Figure 3, we can see that the default value of
the slider (in this case, a single slider at the middle of the scale)
attracted a noticeable cluster of outliers. Modeling the effects of uncooperative users, who may click through as quickly as possible, is one
possible improvement that could be made to the interval
approach from the data processing point of view.
Our conclusion in comparing the two models is that for
basic emotions the valence, activation, and dominance scales of
model 1 would suffice. An example use case for the first
model is converting a large, expressive set of emotion labels to a smaller set for the purpose of training a statistical classifier. However, for the class of all words used to describe
emotions in natural language, the representational power of the first
model’s valence, activation, and dominance scales is not sufficient. To fully understand what a given emotion word means to
someone, our work indicates that the second model is a better
model if the modeling goal is to represent a larger vocabulary
and finer shades of meaning.
VII. Conclusions
In this paper we presented two models to represent the meaning of emotion words. We gave an explicit description of
meaning in our models. The first model involved interpreting
the emotion words as three-dimensional IT2 FSs on the
dimensions of valence, activation, and dominance. This model
allowed us to map between emotion vocabularies of different
sizes and different languages. The mapping was induced by
picking the most similar word of the output vocabulary given
the input vocabulary word. The similarity used for this mapping was derived from similarity or subsethood measures of the
individual dimensions that were aggregated into a single measure for each pair of input and output vocabulary words. We
devised a second model that addresses the challenges that arise
when the vocabulary of emotion words is large. Instead of the
lower dimensional representation in terms of valence, activation, and dominance scales, the second model used a high
dimensional representation where the emotion words were
represented in terms of answers to questions about emotions, as
determined from data from the EMO20Q game. In the second
model, IT2 FSs were used to represent the truth values of
answers to questions about emotions. We found that the second
model was necessary to capture more highly nuanced meaning
when the vocabulary of emotion words was large.
Acknowledgment
The authors would like to thank Jerry Mendel, Dongrui Wu,
Mohammad Reza Rajati, Ozan Cakmak, and Thomas Forster for their discussion. We would also like to thank Rebeka
Campos Astorkiza, Eduardo Mendoza Ramirez, and Miguel
Ángel Aijón Oliva for helping to translate the Spanish version of our experiment.
References
[1] J. A. Russell and A. Mehrabian, “Evidence for a three-factor theory of emotions,” J.
Res. Personality, vol. 11, pp. 273–294, Sept. 1977.
[2] F. Liu and J. M. Mendel, “An interval approach to fuzzistics for interval type-2 fuzzy
sets,” in Proc. Fuzzy Systems Conf., 2007, pp. 1–6.
[3] F. Liu and J. M. Mendel, “Encoding words into interval type-2 fuzzy sets using an
interval approach,” IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1503–1521, 2008.
[4] D. W. Massaro and M. M. Cohen, “Fuzzy logical model of bimodal emotion perception: Comment on ‘The perception of emotions by ear and by eye’ by de Gelder and
Vroomen,” Cogn. Emotion, vol. 14, no. 3, pp. 313–320, 2000.
[5] M. Grimm, K. Kroschel, E. Mower, and S. Narayanan, “Primitives-based evaluation
and estimation of emotions in speech,” Speech Commun., vol. 49, pp. 787–800, Dec. 2006.
[6] C. M. Lee and S. Narayanan, “Emotion recognition using a data-driven inference
system,” in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 157–160.
[7] D. Wu, T. D. Parsons, E. Mower, and S. Narayanan, “Speech parameter estimation in
3D space,” in Proc. IEEE Int. Conf. Multimedia Expo, 2010, pp. 737–742.
[8] A. Konar, A. Chakraborty, A. Halder, R. Mandal, and R. Janarthanan, “Interval
type-2 fuzzy model for emotion recognition from facial expression,” in Proc. Perception
Machine Intelligence, 2012, pp. 114–121.
[9] M. El-Nasr, J. Yen, and T. R. Ioerger, “Flame: Fuzzy logic adaptive model of emotions,” Auton. Agents Multi-Agent Syst., vol. 3, no. 3, pp. 219–257, 2009.
[10] C. M. Whissell, The Dictionary of Affect in Language. New York: Academic Press,
1989, pp. 113–131.
[11] A. Kazemzadeh, “Précis of dissertation proposal: Natural language descriptions of
emotions,” in Proc. ACII (Doctoral Consortium), 2011, pp. 216–223.
[12] S. Kim, P. G. Georgiou, S. S. Narayanan, and S. Sundaram, “Supervised acoustic
topic model for unstructured audio information retrieval,” in Proc. Int. Conf. Acoustics,
Speech, Signal Processing, 2010, pp. 243–246.
[13] S. Sundaram and S. S. Narayanan, “Classification of sound clips by two schemes: Using onomatopoeia and semantic labels,” in Proc. IEEE Int. Conf. Multimedia Expo, 2008,
pp. 1341–1344.
[14] M. Grimm and K. Kroschel, “Rule-based emotion classification using acoustic features,” in Proc. Int. Conf. Telemedicine Multimedia Communication, 2005.
[15] Q. Liang and J. Mendel, “Interval type-2 fuzzy logic systems: theory and design,”
IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 535–550, 2000.
[16] A. Kazemzadeh, S. Lee, and S. Narayanan, “An interval type-2 fuzzy logic system to
translate between emotion-related vocabularies,” in Proc. Interspeech, 2008, pp. 2747–2750.
[17] A. Kazemzadeh, “Using interval type-2 fuzzy logic to translate emotion words from
Spanish to English,” in Proc. IEEE World Conf. Computational Intelligence FUZZ-IEEE
Workshop, 2010, pp. 1–8.
[18] O. Cakmak, A. Kazemzadeh, S. Yildirim, and S. Narayanan, “Using interval
type-2 fuzzy logic to analyze Turkish emotion words,” in Proc. APSIPA Annu. Summit
Conf., 2012, pp. 1–4.
[19] S. Coupland, J. M. Mendel, and D. Wu, “Enhanced interval approach for encoding
words into interval type-2 fuzzy sets and convergence of the word FOUs,” in Proc. FUZZ-IEEE World Congr. Computational Intelligence, 2010, pp. 1–8.
[20] J. M. Mendel, R. I. John, and F. Liu, “Computing with words and its relations with
fuzzistics,” Inform. Sci., vol. 177, no. 4, pp. 988–1006, 2007.
[21] J. M. Mendel, “Computing with words: Zadeh, Turing, Popper and Occam,” IEEE
Comput. Intell. Mag., vol. 2, no. 4, pp. 10–17, 2007.
[22] L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Trans. Fuzzy Syst., vol.
4, pp. 103–111, May 1996.
[23] J. M. Mendel and D. Wu, Perceptual Computing: Aiding People in Making Subjective Judgements. Piscataway, NJ: IEEE Press, 2010.
[24] J. M. Mendel and D. Wu, “Challenges for perceptual computer applications and how
they were overcome,” IEEE Comput. Intell. Mag., vol. 7, pp. 36–47, Aug. 2012.
[25] G. Frege, “Über Sinn und Bedeutung,” Zeitschrift für Philosophie und philosophische
Kritik, 1892, pp. 25–50.
[26] T. Forster, Logic, Induction, and Sets. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[27] B. Ganter, G. Stumme, and R. Wille, Eds., Formal Concept Analysis: Foundation and
Applications. Berlin, Germany: Springer-Verlag, 2005.
[28] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. Urbana, IL: Univ. Illinois Press, 1957.
[29] D. E. Heise, “Semantic differential profiles for 1000 most frequent English words,”
Psychol. Monographs, vol. 79, no. 8, pp. 1–31, 1965.
[30] J. A. Russell, “A circumplex model of affect,” J. Personality Soc. Psychol., vol. 39, no.
6, pp. 1161–1178, 1980.
[31] M. M. Bradley and P. J. Lang, “Measuring emotion: The self-assessment manikin and
the semantic differential,” J. Behav. Therapy Exp. Psych., vol. 25, pp. 49–59, Mar. 1994.
[32] J. R. Fontaine, K. R. Scherer, E. B. Roesch, and P. C. Ellsworth, “The world of emotions is not two-dimensional,” Psychol. Sci., vol. 18, pp. 1050–1057, Dec. 2007.
[33] I. B. Türksen, “Computing with descriptive and veristic words,” in Proc. Int. Conf.
North American Fuzzy Information Processing Society, 1999, pp. 13–17.
[34] L. A. Zadeh, “From search engines to question answering systems–the problems of
world knowledge, relevance, deduction and precisiation,” in Fuzzy Logic and the Semantic
Web, E. Sanchez, Ed. The Netherlands: Elsevier, 2006, ch. 9, pp. 163–211.
[35] M. R. Rajati, H. Khaloozadeh, and W. Pedrycz, “Fuzzy logic and self-referential
reasoning: A comparative study with some new concepts,” Artificial Intell. Rev., pp. 1–27,
Mar. 2012.
[36] T. Forster, Reasoning About Theoretical Entities. Singapore: World Scientific, 2003.
[37] A. Kazemzadeh, P. G. Georgiou, S. Lee, and S. Narayanan, “Emotion twenty questions: Toward a crowd-sourced theory of emotions,” in Proc. ACII’11, 2011, pp. 1–10.
[38] A. Kazemzadeh, J. Gibson, P. Georgiou, S. Lee, and S. Narayanan, “EMO20Q questioner agent,” in Proc. ACII (Interactive Event), 2011, pp. 313–314.
[39] A. Kazemzadeh, S. Lee, P. G. Georgiou, and S. Narayanan, “Determining what questions to ask, with the help of spectral graph theory,” in Proc. Interspeech, 2011, pp. 2053–2056.
[40] A. Kazemzadeh, J. Gibson, J. Li, S. Lee, P. G. Georgiou, and S. Narayanan, “A
sequential Bayesian agent for computational ethnography,” in Proc. Interspeech, Portland,
OR, 2012.
[41] L. F. Barrett, “Are emotions natural kinds?” Perspectives Psychol. Sci., vol. 1, pp.
28–58, Mar. 2006.
[42] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate
reasoning-I,” Inform. Sci., vol. 8, no. 3, pp. 199–249, 1975.
[43] R. John and S. Coupland, “Type-2 fuzzy logic: Challenges and misconceptions,”
IEEE Comput. Intell. Mag., vol. 7, pp. 47–52, Aug. 2012.
[44] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions.
Upper Saddle River, NJ: Prentice Hall, 2001, pp. 451–453.
[45] J. M. Mendel, R. I. John, and F. Liu, “Interval type-2 fuzzy logic systems made simple,” IEEE Trans. Fuzzy Syst., vol. 14, no. 6, pp. 808–821, 2006.
[46] D. Wu and J. Mendel, “A vector similarity measure for linguistic approximation:
Interval type-2 and type-1 fuzzy sets,” Inform. Sci., vol. 178, no. 2, pp. 381–402, 2008.
[47] B. Kosko, “Fuzziness vs. probability,” Int. J. General Syst., vol. 17, nos. 2–3, pp.
211–240, 1990.
[48] D. Wu and J. M. Mendel, “The linguistic weighted average,” in FUZZ-IEEE, Vancouver, BC, 2006, pp. 566–573.
[49] P. Ekman, “Facial expression and emotion,” Amer. Psychol., vol. 48, no. 4, pp. 384–
392, 1993.
[50] G. Mishne, “Applied text analytics for blogs,” Ph.D. dissertation, Univ. Amsterdam,
Amsterdam, The Netherlands, 2007.
[51] T. F. Cox and M. A. A. Cox, Multidimensional Scaling, 2nd ed. Boca Raton, FL: CRC
Press, 2000.
[52] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. Chang, S. Lee,
and S. Narayanan, “IEMOCAP: Interactive emotional dyadic motion capture database,”
J. Lang. Resour. Eval., vol. 42, no. 4, pp. 335–359, 2008.
Modeling Curiosity-Related Emotions for Virtual Peer Learners
Qiong Wu and Chunyan Miao
Nanyang Technological University,
SINGAPORE

Abstract—Existing Virtual Learning Environments (VLE) have two major issues: (1) students tend to spend more time playing than learning, and (2) low-functioning students often have difficulty progressing smoothly. To address these issues, we propose a virtual peer learner guided by the educational theory of peer learning. To create a human-like, naturally behaving virtual peer learner, we build a computational model of curiosity for the agent based on human psychology. Three curiosity-related emotions, namely boredom, curiosity and anxiety, are considered. The appraisal of these emotions is modeled as a two-step process: determination of the stimulation level and mapping from the stimulation level to emotions. The first step is based on Berlyne’s theory and considers three factors that contribute to the arousal of curiosity: novelty, conflict and complexity. The second step is based on Wundt’s theory and divides the spectrum of stimulation level into the three aforementioned emotion regions. Emotions derived from the appraisal process serve as intrinsic rewards for the agent’s behavior learning and influence the effectiveness of knowledge acquisition. Empirical results indicate that curiosity-related emotions can drive a virtual peer learner to learn a strategy similar to what we expect from human students. A virtual peer learner with curiosity exhibits a higher desire for exploration and achieves higher learning efficiency than one without curiosity.

© DIGITAL STOCK
Digital Object Identifier 10.1109/MCI.2013.2247826
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE

I. Introduction
With the advances in computer graphics, communication technologies and networking, virtual worlds are rapidly becoming part of the educational technology landscape [1]. Dede [2] suggests that the immersive interfaces offered by virtual worlds can promote learning by enabling the design of educational experiences that are challenging or even impossible to duplicate in the real world. In recent years, the use of virtual worlds in education has grown quickly. The New Media Consortium (NMC) Annual Survey on Second Life (SL) saw a 170% increase in its response rate between 2007 and 2008. The survey also found that many educators who had earlier used the existing SL started creating their own virtual worlds in less than a year [3].

Virtual Singapura¹ (VS) is a Virtual Learning Environment (VLE) designed to facilitate the learning of plant transport systems in lower secondary school. It has been employed in various studies, on topics such as design perspectives for learning in VLE, pre-service teachers’ perspectives on VLE in science education, product failure and the impact of structure on learning in VLE, slow pedagogy in scenario-based VLE, and what students learn in VLE [4]–[8]. To date, over 500 students in Singapore and over 300 students in Australia have played VS. During the field studies of Virtual Singapura, several issues with learning in VLE have been observed. First, students tend to spend more time exploring the landscape of the virtual world than concentrating on the learning content. Second, some low-functioning students studying alone in VLE often get confused or stuck and require constant guidance from teachers or game designers to move forward.

¹ http://virtualsingapura.com/game/project/

Based on these observations, we propose a virtual peer learner that resides in the VLE and accompanies students in learning. The idea is derived from the common educational practice of peer learning, where students learn with and from each other without the immediate intervention of a teacher [9]. The benefits of a peer learner include the following: a peer learner can present “learning triggers,” i.e., interactions or experiences that cause students to try new things or to think in novel ways; bi-directional peer relationships can facilitate professional and personal growth; and tapping into a learner’s own experience can be both affirming and motivating [10]. Hence, a virtual peer learner has the potential to engage students and motivate them to spend more time on the learning content. A virtual peer learner can also potentially help low-functioning students think and learn better in VLE.

To design a virtual peer learner that can emulate a real student and behave naturally in the learning process, we believe a psychologically inspired approach is necessary. In human psychology, studies have shown that curiosity is an important motivation that links cues reflecting novelty and challenge with natural behavior such as exploration, investigation and learning [11]. Among Reiss’s [12] 16 basic desires that motivate our actions and shape our personalities, curiosity is defined as “the need to learn.” Attempts to incorporate curiosity into Artificial Intelligence have found that curious machines show advanced behavior in exploration, autonomous development, creativity and adaptation [13]–[16]. However, although curiosity is a basic desire that motivates active human learning [12], its role in a virtual peer learner remains relatively unexplored.

In this work, we study the role of curiosity in simulating human-like behavior for virtual peer learners. To model the appraisal process of curiosity, we draw inspiration from psychological theories of human curiosity; a short review of these theories is provided in Section II. Berlyne [17] identified four factors, viz., novelty, uncertainty, conflict and complexity, that can stimulate curiosity and determine the stimulation level. Wundt [18] postulated an inverted U-shape relationship between the stimulation level and the arousal of three curiosity-related emotions: boredom, curiosity and anxiety. According to this relationship, too little stimulation results in boredom, too much stimulation results in anxiety, and only optimal stimulation results in curiosity.

Based on this psychological background, curiosity appraisal for the proposed virtual peer learner is modeled as a two-step process: (1) determination of the stimulation level and (2) mapping from the determined stimulation level to the corresponding emotions. In the decision-making system of the virtual peer learner, curiosity-related emotions act as intrinsic rewards and influence the agent’s action strengths. To demonstrate the effectiveness of curiosity-related emotions, we simulate virtual peer learners in VS and conduct two sets of experiments. The first set shows that curiosity-related emotions can drive the virtual peer learner to learn a natural behavior strategy similar to what we expect from human students. The second set shows that a curious peer learner exhibits greater exploration breadth and depth than a non-curious peer learner.

The rest of the paper is organized as follows. Section II presents the psychological background for this research. Section III provides a short review of existing curiosity modeling systems. Section IV states the key differences between our approach and the existing curiosity modeling systems. Section V presents the proposed curious peer learner. Section VI discusses the experimental process and the results obtained. Finally, Section VII summarizes the major conclusions and future work.

II. Psychological Background
In psychology, a major surge of studies on curiosity began in the 1960s. Loewenstein [19] divided theories of curiosity into three categories: incongruity theory, competence theory and
drive theory. The incongruity theory holds that curiosity is evoked by the violation of expectations [20], while the competence theory views curiosity as an intrinsic motivation to master one’s environment [21]. However, as Loewenstein noted, neither the incongruity theory nor the competence theory gives a comprehensive account of curiosity. Hence, we focus on the drive theory, which advocates the existence of a curiosity drive, either primary (homeostatically generated, as with hunger) or secondary (externally generated by stimuli), and look in depth at Berlyne’s theory.

To understand curiosity, Berlyne conducted extensive studies by observing the behavior of humans and animals [17]. Unlike traditional psychological research, which concentrated on problems of response selection (what response a human will make to one standard stimulus at a time), Berlyne interpreted curiosity as a process of stimulus selection (when several conspicuous stimuli are introduced at once, to which stimulus a human will respond). Consider a real-life scenario: when a child is given several toys at the same time, he will choose one toy out of the many to play with. The study of curiosity tries to understand the underlying mechanism that drives the child to select one stimulus (toy) when faced with many choices.

Berlyne identified four major factors, viz., novelty, uncertainty, conflict and complexity, that can lead to curiosity and determine the stimulation level. Novelty refers to something new. For instance, the child would be attracted to a toy with new features, such as a toy car with storytelling functions. Uncertainty arises when a stimulus is difficult to classify. The degree of uncertainty depends on the number of possible classes to which the stimulus may belong. For example, the child may be interested in a toy vehicle with both sails and wings, because he cannot immediately tell whether it is a ship or a plane. Conflict occurs when a stimulus simultaneously arouses two or more incompatible responses in an organism. For example, playing with a toy car may require the child to press the forward button to win a race against a friend’s toy car while, at the same time, demanding that he press the backward button to dodge a barrier. This may engage the child in playing and make him choose the car again. Complexity is roughly defined as the amount of variety or diversity in a stimulus pattern. For example, the child may choose a jigsaw puzzle with twenty pieces rather than one with only four pieces.

However, a higher level of stimulation does not necessarily lead to a higher level of curiosity. Wundt [18] introduced the theory of an “optimal level of stimulation” and postulated an inverted U-shape relationship between the stimulation level and three curiosity-related emotions. According to him, too little stimulation results in boredom, too much stimulation results in anxiety, and only optimal stimulation results in curiosity.

In this work, both Berlyne’s theory and Wundt’s theory serve as the psychological background for modeling human-like curiosity in autonomous virtual peer learners.
III. Existing Curiosity Modeling Systems
In the past two decades, curiosity has attracted the attention of numerous researchers in the field of Artificial Intelligence. In this section, we provide a short review of existing curiosity modeling systems.
From the machine learning perspective, curiosity has been proposed as an algorithmic principle to focus learning on novel and learnable regularities, in contrast to irregular noise. For example, Schmidhuber [22] introduced curiosity into model-building control systems. In his work, curiosity is modeled as the prediction improvement between successive situations and serves as an intrinsic reward value guiding the selection of training examples such that the expected performance improvement is maximized. In autonomous robotic developmental systems, Oudeyer and Kaplan [23] proposed an Intelligent Adaptive Curiosity (IAC) mechanism and modeled curiosity as the prediction improvement between similar situations instead of successive situations.
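To make the notion of “prediction improvement between similar situations” concrete, the sketch below computes a learning-progress signal in the spirit of IAC: prediction errors are grouped by region of similar situations, and the reward for a region is the drop in mean error between an older window and the most recent one. The class name, window size and error values are hypothetical; this is an illustration under those assumptions, not the implementation of [22] or [23].

```python
from collections import defaultdict
from statistics import mean


class RegionProgress:
    """Hypothetical learning-progress tracker in the spirit of IAC [23].

    Prediction errors are stored per region of similar situations; the
    intrinsic reward for a region is how much its mean error has dropped
    between an older window and the most recent window.
    """

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self.errors = defaultdict(list)  # region -> errors in chronological order

    def record(self, region: str, error: float) -> None:
        self.errors[region].append(error)

    def progress(self, region: str) -> float:
        errs = self.errors[region]
        w = self.window
        if len(errs) < 2 * w:
            return 0.0  # not enough history to compare two windows
        older = mean(errs[-2 * w:-w])
        recent = mean(errs[-w:])
        return older - recent


tracker = RegionProgress(window=3)
for e in [1.0, 0.9, 0.8, 0.5, 0.4, 0.3]:  # hypothetical region the predictor is mastering
    tracker.record("learnable", e)
for e in [0.5] * 6:                       # hypothetical region whose error never improves
    tracker.record("stagnant", e)
print(tracker.progress("learnable"), tracker.progress("stagnant"))
```

A region whose error stays flat yields no reward, which is the sense in which such a signal steers learning away from what cannot (or need not) be learned further.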
Curiosity has also been modeled in exploratory agents to
explore and learn in uncertain domains. For example, Scott and
Markovitch [16] introduced curiosity for intelligent agents to
learn unfamiliar domains. They adopted a heuristic that “what
is needed is something that falls somewhere between novelty
and familiarity,” where novelty is defined as a measure of how
uncertain the agent is about the consequence of a stimulus.
Uncertainty is implemented as the Shannon entropy of all the possible outcomes of a stimulus. The system can learn a good representation of the uncertain domain because it does not waste resources on commonly occurring cases but concentrates on less common ones. Another work is by Macedo and
Cardoso [13], who modeled curiosity in artificial perceptual
agents to explore uncertain and unknown environments. This
model relies on graph-based mental representations of objects
and curiosity is implemented as the entropy of all parts that
contain uncertainty in an object.
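For illustration only, the sketch below computes the kind of entropy-based uncertainty described above: the Shannon entropy of the empirical distribution of a stimulus’s observed outcomes. The function name and the outcome labels are hypothetical, and the exact formulations used in [16] and [13] may differ.

```python
import math
from collections import Counter


def outcome_entropy(outcomes):
    """Shannon entropy, in bits, of the empirical distribution of outcomes.

    `outcomes` lists the observed results of applying one stimulus (hypothetical
    data); higher entropy means the agent is less certain about its consequence.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    # Sum of p * log2(1/p); outcomes with zero count never appear in a Counter.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(outcome_entropy(["door opens"] * 8))    # 0.0: fully predictable stimulus
print(outcome_entropy(["a", "b", "c", "d"]))  # 2.0: four equally likely outcomes
```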
In creative agents, curiosity has been modeled as an intrinsic evaluation of novelty. For example, Saunders and Gero [24] developed a computational model of curiosity for “curious design agents” to search for novel designs and to guide design actions. A Self-Organizing Map (SOM) is employed as the “conceptual design space” for the agent. For a given input, novelty is implemented as a measure of cluster distance. This measure reflects the similarity between newly encountered design patterns and previously experienced ones. In Merrick and Maher’s model [14], an improved SOM named the Habituated Self-Organizing Map (HSOM) is used to cluster similar tasks, and novelty is calculated by a habituation function.

To summarize, in existing works, curiosity has been integrated into agents’ learning and decision modules to enhance their performance. However, these agents can hardly be perceived as believable by a human observer. There are two main reasons for this: (1) existing models lack a comprehensive psychological theory as background, and (2) agents perceive the environment at the machine language level (feature-based knowledge representation) rather than at the human language level (semantic knowledge representation). Hence, in this work, we attempt to build a computational model of curiosity based on human psychology and a semantic knowledge representation.

IV. An Overview of Our Approach
The key innovations in our approach are as follows.

First, to mimic a human student, a virtual peer learner should perceive the VLE at the same level as a human student does. Hence, instead of the feature-based knowledge representations most commonly used in existing works, we employ a semantic knowledge representation that can easily be interpreted by humans and is more suitable for designing virtual peer learners. In this work, we adopt the Concept Map (CM), a semantic knowledge representation stemming from the constructivist theory of learning. It has been widely applied in classrooms for knowledge organization [25] and in many educational software systems for modeling the minds of students [26], [27].

Second, the measurement of stimulation level incorporates three dimensions of information proposed by Berlyne [17]: novelty, conflict and complexity. The calculation of stimulation level is based on an extension and transformation of Tversky’s ratio model [28].

Third, we explicitly model three curiosity-related emotions: boredom, curiosity and anxiety. They are appraised based on Wundt’s theory [18], using two thresholds to divide the spectrum of stimulation into three emotion regions.

Finally, curiosity-related emotions are used as intrinsic reward functions to guide the virtual peer learner’s learning of a behavior strategy. This is inspired by the assumption, frequently adopted in intrinsically motivated reinforcement learning, that the human decision-making process consists of maximizing positive emotions and minimizing negative emotions [29], [30]. Another function of curiosity-related emotions is their influence on the agent’s knowledge acquisition ability. This is inspired by human nature, where our learning ability can be regulated by different emotional states [31].

V. The Curious Peer Learner
In this section, we present the proposed virtual peer learner with curiosity-related emotions, referred to as the curious peer learner. The architecture of the curious peer learner is shown in Fig. 1. The curious peer learner can sense external states (e.g., being in a learning zone) and receive external stimuli (e.g., learning tasks). The external stimuli can trigger the curious peer learner to perform curiosity appraisal. The curiosity appraisal requires learnt knowledge stored in the agent’s memory and consists of two steps: determination of stimulation level and mapping from stimulation level to emotions. Emotions derived from the curiosity appraisal process serve two functions: (1) as reinforcement values for the learning of the state-action mapping, and (2) as influences on action strengths (e.g., the depth of learning). Actions (e.g., explore) derived from the state-action mapping learning module are performed by actuators and update the intrinsic constraints of the agent (e.g., energy). In the rest of this section, the detailed working mechanism of each module is introduced.

FIGURE 1 Architecture of the curious peer learner. (Diagram components: sensors receiving the external state and external stimuli; curiosity appraisal, comprising determination of stimulation level and mapping from stimulation level to emotions, drawing on learnt knowledge in memory; emotions providing reinforcement to the learning of state-action mapping and influencing action strength; the agent’s actions performed by actuators, which update the intrinsic constraint (energy).)

A. Memory and Knowledge Representation
We adopt Concept Maps (CMs) to represent the semantic knowledge in both learning tasks (knowledge to be learnt) and the agent’s memory (knowledge already learnt).
A CM is a graph-based representation that describes semantic relationships among concepts. It can be represented by a directed graph whose nodes are concepts and whose edges interconnect the nodes. We formalize the symbolic representation of CMs as follows:
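FIGURE 2 CM for the learning task “photosynthesis.” (Concepts: 1. Water, 2. CO2, 3. Chloroplast, 4. Sugar, 5. O2, 6. Sunlight, 7. Photosynthesis. Relationships: Water and CO2 “is the material for” Photosynthesis; Chloroplast “is the location of” Photosynthesis; Sunlight “aid” Photosynthesis; Photosynthesis “produce” Sugar and O2.)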
A CM M with n concepts is defined as $M = \{C, L\}$, where:
1) $C = \{c_i \mid c_i \in P_c;\ i = 1, 2, \ldots, n\}$ represents the concepts, where $P_c$ is a set of predefined concepts in VLE;
2) $L = \{l_{ij} \mid l_{ij} \in P_l \cup \{\text{null}\};\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n\}$ represents the labels describing the relationships between two concepts, where $P_l$ is a set of predefined labels in VLE.
Based on the above definition, concepts and relationships in CMs are all semantic expressions. An example of a CM is shown in Fig. 2, wherein the concept set is {c1: water, c2: CO2, c3: chloroplast, c4: sugar, c5: O2, c6: sunlight, c7: photosynthesis} and the label set is {l17, l27: is the material for; l37: is the location of; l67: aid; l74, l75: produce}.
A relationship in M is defined as a knowledge point, denoted by $k = (c_i, c_j, l_{ij})$, where $l_{ij} \neq \text{null}$. For example, a knowledge point in Fig. 2 is (water, photosynthesis, is the material for).
Knowledge in both learning tasks and the agent’s memory is represented by CMs. Each learning task can be represented by a set of knowledge points, denoted by $T = \{k_1, k_2, \ldots, k_m\}$. For example, the CM in Fig. 2 can be designed as a learning task with six knowledge points. Knowledge related to learning task T that has been learnt by the virtual peer learner is represented by $T_s$ and is contained in the agent’s memory.
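A minimal sketch of this representation, assuming Python tuples for knowledge points (c_i, c_j, l_ij) and `None` for a null label, builds the learning task T of Fig. 2 from its label assignments. The helper `knowledge_points` and the variable names are hypothetical, introduced only for illustration.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Concept = str
Label = str
KnowledgePoint = Tuple[Concept, Concept, Label]  # (c_i, c_j, l_ij)


def knowledge_points(
    labels: Dict[Tuple[Concept, Concept], Optional[Label]]
) -> FrozenSet[KnowledgePoint]:
    """Hypothetical helper: every relationship (c_i, c_j, l_ij) whose label is not null."""
    return frozenset(
        (ci, cj, label) for (ci, cj), label in labels.items() if label is not None
    )


# Label assignments of the "photosynthesis" CM in Fig. 2, keyed by ordered concept pair.
photosynthesis_labels = {
    ("water", "photosynthesis"): "is the material for",
    ("CO2", "photosynthesis"): "is the material for",
    ("chloroplast", "photosynthesis"): "is the location of",
    ("sunlight", "photosynthesis"): "aid",
    ("photosynthesis", "sugar"): "produce",
    ("photosynthesis", "O2"): "produce",
    ("water", "sugar"): None,  # null label: no relationship, hence no knowledge point
}

T = knowledge_points(photosynthesis_labels)
print(len(T))  # 6
```

Using a frozenset for T makes the set operations needed by the appraisal below (intersection, difference) available directly.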
B. Curiosity Appraisal
Based on psychological theories, curiosity appraisal is modeled
as a two-step process: determination of stimulation level and
mapping from the stimulation level to emotions.
1) Determination of Stimulation Level
Each learning task in VLE is considered a stimulus. As defined in the previous subsection, each learning task is associated with a set of knowledge points, denoted by $T = \{k_1, k_2, \ldots, k_m\}$, which the agent is intended to learn upon finishing the task.
According to Berlyne, four factors, namely novelty, uncertainty, conflict and complexity, can stimulate curiosity. With a CM-based knowledge representation, the most salient factors that can be appraised in a learning task (stimulus) are novelty, conflict and complexity. Novelty and conflict are reflected in the dissimilarity between the knowledge points to be learnt in the learning task ($T$) and those already learnt in the agent’s memory ($T_s$). Complexity is reflected by the total number of knowledge points intended to be learnt in the learning task ($T$). The appraisal of uncertainty may require a more complex knowledge representation that contains uncertain information, and will be studied in future work. Next, the appraisal of novelty, conflict and complexity is discussed in detail.
We define a novel knowledge point in T as a knowledge point that is a member of T but has no corresponding knowledge point in $T_s$ with the same order of concepts. This indicates that the agent has not learnt the knowledge point before. All novel knowledge points in T are kept in the novelty set, denoted by $T \mathbin{\overset{\circ}{-}} T_s$. Formally,

$$T \mathbin{\overset{\circ}{-}} T_s = \{\, k \mid k \in T \wedge \neg\exists\, k' \in T_s : c_i = c'_i \wedge c_j = c'_j \,\}, \quad k = (c_i, c_j, l_{ij}),\ k' = (c'_i, c'_j, l'_{ij}). \tag{1}$$
A conflicting knowledge point in T is defined as a knowledge point that is a member of T and has a corresponding knowledge point in $T_s$ with the same order of concepts, but with a different label. This indicates that the agent understands the knowledge point differently from the learning task. All conflicting knowledge points in T are kept in the conflict set, denoted by $T \mathbin{\overset{\sim}{-}} T_s$. Formally,

$$T \mathbin{\overset{\sim}{-}} T_s = \{\, k \mid k \in T \wedge \exists\, k' \in T_s : c_i = c'_i \wedge c_j = c'_j \wedge l_{ij} \neq l'_{ij} \,\}, \quad k = (c_i, c_j, l_{ij}),\ k' = (c'_i, c'_j, l'_{ij}). \tag{2}$$
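Definitions (1) and (2) translate directly into set comprehensions over knowledge-point tuples. The sketch below is a hypothetical illustration: it reuses the Fig. 2 task and pairs it with an invented memory $T_s$ in which one point is already known, one is understood with a different label, and one is unrelated to the task.

```python
from typing import FrozenSet, Tuple

KnowledgePoint = Tuple[str, str, str]  # (c_i, c_j, l_ij)
KPSet = FrozenSet[KnowledgePoint]


def novelty_set(T: KPSet, Ts: KPSet) -> KPSet:
    """Eq. (1): points of T whose ordered concept pair never occurs in Ts."""
    learnt_pairs = {(ci, cj) for ci, cj, _ in Ts}
    return frozenset(k for k in T if (k[0], k[1]) not in learnt_pairs)


def conflict_set(T: KPSet, Ts: KPSet) -> KPSet:
    """Eq. (2): points of T whose concept pair occurs in Ts with a different label."""
    return frozenset(
        k for k in T
        if any(ci == k[0] and cj == k[1] and l != k[2] for ci, cj, l in Ts)
    )


T = frozenset({
    ("water", "photosynthesis", "is the material for"),
    ("CO2", "photosynthesis", "is the material for"),
    ("chloroplast", "photosynthesis", "is the location of"),
    ("sunlight", "photosynthesis", "aid"),
    ("photosynthesis", "sugar", "produce"),
    ("photosynthesis", "O2", "produce"),
})
# Hypothetical memory: one point already learnt, one misunderstood, one unrelated.
Ts = frozenset({
    ("water", "photosynthesis", "is the material for"),
    ("photosynthesis", "O2", "consume"),
    ("leaf", "photosynthesis", "is the location of"),
})

novel, conflicting = novelty_set(T, Ts), conflict_set(T, Ts)
print(len(novel), len(conflicting))  # 4 1
# The set difference splits into exactly these two parts.
print(T - Ts == novel | conflicting)  # True
```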
It can be deduced from the definition that the conflict set operator $\overset{\sim}{-}$ is symmetric, i.e., $T \mathbin{\overset{\sim}{-}} T_s$ and $T_s \mathbin{\overset{\sim}{-}} T$ involve exactly the same ordered concept pairs (only the labels differ). It can also be deduced that the set difference $T - T_s$ equals the union of the novelty set and the conflict set, i.e., $T - T_s = (T \mathbin{\overset{\circ}{-}} T_s) \cup (T \mathbin{\overset{\sim}{-}} T_s)$. Hence, the set difference from T to $T_s$ contains two types of information in this context: novelty and conflict. To measure the combined level of novelty and conflict, we extend Tversky’s classic set similarity measure, referred to as the ratio model [28], by introducing asymmetry between the novelty and conflict information contained in the set difference.
According to the ratio model, the similarity between two
sets A and B can be represented by [28]:
$$S(A, B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha\, f(A - B) + \beta\, f(B - A)}, \quad \alpha, \beta \ge 0, \tag{3}$$
where f is a scale function, and $\alpha$ and $\beta$ define the degree of asymmetry. According to Tversky, f is usually the cardinality of a set, reflecting the salience or prominence of the various members of the set. Also, f satisfies additivity, i.e., $f(X \cup Y) = f(X) + f(Y)$ for disjoint X and Y. In the ratio model, S(A, B) is interpreted as the degree to which A is similar to B, where A is the subject of comparison and B is the reference. One naturally focuses on the subject of comparison. Hence, the features of the subject are usually weighted more heavily than the features of the reference, i.e., $\alpha > \beta$.
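A minimal sketch of the ratio model (3), taking f to be set cardinality as Tversky suggests. Returning 1.0 when both sets are empty is our own convention (the formula is undefined there), and the example sets and weights are arbitrary. With $\alpha > \beta$, the larger set A used as the subject is judged less similar to B than B is to A.

```python
from typing import Callable, Set


def ratio_similarity(
    A: Set, B: Set, alpha: float, beta: float, f: Callable[[Set], float] = len
) -> float:
    """Tversky's ratio model, eq. (3); f defaults to set cardinality."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    common = f(A & B)
    denom = common + alpha * f(A - B) + beta * f(B - A)
    return common / denom if denom else 1.0  # assumed convention for two empty sets


A, B = {1, 2, 3, 4}, {3, 4, 5}                      # arbitrary example sets
print(ratio_similarity(A, B, alpha=0.8, beta=0.2))  # A as subject
print(ratio_similarity(B, A, alpha=0.8, beta=0.2))  # B as subject: larger value
print(ratio_similarity(A, B, alpha=1, beta=1))      # 0.4, the Jaccard index
```

Setting $\alpha = \beta = 1$ recovers the symmetric Jaccard index, which is a quick sanity check on the implementation.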
Next, we extend Tversky’s ratio model to introduce an asymmetric measure for the novelty and conflict subsets in the set difference, as follows. Let

$$g(A - B) = \delta\, f(A \mathbin{\overset{\circ}{-}} B) + \pi\, f(A \mathbin{\overset{\sim}{-}} B), \quad \delta, \pi \ge 0 \tag{4}$$

and

$$S(A, B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha\, g(A - B) + \beta\, g(B - A)}, \quad \alpha, \beta \ge 0, \tag{5}$$

where $g(A - B)$ is a function of the set difference from A to B, with asymmetry introduced between the novelty and conflict subsets. The parameters $\delta$ and $\pi$ give importance to novelty and conflict, respectively, and determine the degree of asymmetry. Thus, S(A, B) measures the similarity between sets A and B, with asymmetry between the set differences A − B and B − A (determined by $\alpha$ and $\beta$), as well as asymmetry between the two types of information contained in the set difference, novelty and conflict (determined by $\delta$ and $\pi$).

S(A, B) gives a measure of similarity between two sets. However, novelty and conflict are contained in the dissimilarity between two sets, since the union of novelty and conflict forms the set difference, i.e., $T - T_s = (T \mathbin{\overset{\circ}{-}} T_s) \cup (T \mathbin{\overset{\sim}{-}} T_s)$. Hence, to measure novelty and conflict, we must define the dissimilarity D(A, B) between two sets:

$$D(A, B) = 1 - S(A, B) = \frac{\alpha\, g(A - B) + \beta\, g(B - A)}{f(A \cap B) + \alpha\, g(A - B) + \beta\, g(B - A)}, \quad \alpha, \beta \ge 0, \tag{6}$$

where D(A, B) is the normalized value containing the dissimilarity information between sets A and B.

Based on the definition given in (6), the difference between the knowledge points in task T and the agent’s memory $T_s$ can be represented by:

$$D(T, T_s) = 1 - S(T, T_s) = \frac{\alpha\, g(T - T_s) + \beta\, g(T_s - T)}{f(T \cap T_s) + \alpha\, g(T - T_s) + \beta\, g(T_s - T)}, \quad \alpha, \beta \ge 0. \tag{7}$$

FIGURE 3 The Wundt curve. (Horizontal axis: stimulus intensity, divided by two thresholds into the regions of boredom, curiosity and anxiety; vertical axis: from unpleasantness to pleasantness.)

In the appraisal of curiosity, T is the subject of comparison and $T_s$ is the reference. Here, we give full importance to the subject T, because only the difference from T to $T_s$, i.e., $T - T_s$, reflects the stimulus’s information, consisting of novelty and conflict. The difference from $T_s$ to T, i.e., $T_s - T$, also contains two sources of information: (1) learnt knowledge points that are not given in the learning task, i.e., $T_s \mathbin{\overset{\circ}{-}} T$, and (2) conflicting knowledge points, i.e., $T_s \mathbin{\overset{\sim}{-}} T$. However, $T_s \mathbin{\overset{\circ}{-}} T$ does not reflect the stimulus’s property but rather the agent’s knowledge that is not given in task T. Also, $T_s \mathbin{\overset{\sim}{-}} T$ has already been accounted for in $T - T_s$ (due to the symmetry of the operator $\overset{\sim}{-}$). Hence, in the appraisal of curiosity, we assign $\alpha = 1$ and $\beta = 0$. As a result, the difference between T and $T_s$ can be simplified as:

$$\hat{D}(T, T_s) = 1 - S(T, T_s) = \frac{g(T - T_s)}{f(T \cap T_s) + g(T - T_s)} = \frac{\delta\, f(T \mathbin{\overset{\circ}{-}} T_s) + \pi\, f(T \mathbin{\overset{\sim}{-}} T_s)}{f(T \cap T_s) + \delta\, f(T \mathbin{\overset{\circ}{-}} T_s) + \pi\, f(T \mathbin{\overset{\sim}{-}} T_s)}, \quad \delta, \pi \ge 0. \tag{8}$$

It can be observed that $\hat{D}$ reflects the combined appraisal of novelty and conflict in a learning task T.
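As a sketch of (8), assuming f is set cardinality and each ordered concept pair carries one label per CM, the function below computes $\hat{D}(T, T_s)$ from the novelty and conflict counts; the parameter names `delta` and `pi` mirror $\delta$ and $\pi$. The example memory is hypothetical, and returning 0.0 for an empty task is our own convention.

```python
from typing import Dict, FrozenSet, Set, Tuple

KnowledgePoint = Tuple[str, str, str]  # (c_i, c_j, l_ij)


def d_hat(T: FrozenSet[KnowledgePoint], Ts: FrozenSet[KnowledgePoint],
          delta: float = 1.0, pi: float = 1.0) -> float:
    """Eq. (8): combined novelty/conflict appraisal, i.e. (7) with alpha=1, beta=0."""
    learnt: Dict[Tuple[str, str], Set[str]] = {}  # concept pair -> labels held in memory
    for ci, cj, label in Ts:
        learnt.setdefault((ci, cj), set()).add(label)
    # Eq. (1): pair never seen in memory.  Eq. (2): pair seen with another label.
    n_novel = sum(1 for ci, cj, _ in T if (ci, cj) not in learnt)
    n_conflict = sum(1 for ci, cj, label in T
                     if (ci, cj) in learnt and learnt[(ci, cj)] - {label})
    g = delta * n_novel + pi * n_conflict  # eq. (4) with f = cardinality
    denom = len(T & Ts) + g                # f(T intersect Ts) + g(T - Ts)
    return g / denom if denom else 0.0     # assumed convention for an empty task


T = frozenset({
    ("water", "photosynthesis", "is the material for"),
    ("CO2", "photosynthesis", "is the material for"),
    ("chloroplast", "photosynthesis", "is the location of"),
    ("sunlight", "photosynthesis", "aid"),
    ("photosynthesis", "sugar", "produce"),
    ("photosynthesis", "O2", "produce"),
})
Ts = frozenset({  # hypothetical memory
    ("water", "photosynthesis", "is the material for"),
    ("photosynthesis", "O2", "consume"),
    ("leaf", "photosynthesis", "is the location of"),
})

print(round(d_hat(T, Ts), 3))          # 0.833: 4 novel + 1 conflicting vs 1 shared
print(round(d_hat(T, Ts, pi=2.0), 3))  # 0.857: conflict weighted twice as heavily
print(d_hat(T, T))                     # 0.0: everything already learnt
```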
Now, let us consider the third factor that governs stimulus selection, namely complexity. In the context of VLE, the complexity of a task $T$ can be measured by the normalized salience of all knowledge points contained in the task:
$$P(T) = \frac{f(T)}{\max_{T' \in C} f(T')}, \quad C = \{T_1, T_2, \ldots, T_n\}, \tag{9}$$
where $C$ is the set of all predefined tasks in VLE.
Here, we model complexity as a scaling factor for $\hat{D}$, because the value of novelty and conflict can be amplified in very complex tasks and reduced in very simple tasks. For example, searching for an intended piece in a jigsaw puzzle with 1000 pieces is more difficult than searching in one with 10 pieces. Hence, the stimulation level of a learning task $T$, denoted by $X(T)$, is defined as:
$$X(T) = P(T) \cdot \hat{D}(T, T_s), \tag{10}$$
where $P(T)$ is the measure of complexity and $\hat{D}(T, T_s)$ reflects the combined appraisal of novelty and conflict in a stimulus, as given in (8).
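Eqs. (9) and (10) can be sketched in a few lines of Python, again assuming $f$ is set cardinality so that each task is summarized by its number of knowledge points. The value of $\hat{D}$ is passed in directly; the function names are hypothetical.

```python
from typing import Sequence


def complexity(task_size: int, all_task_sizes: Sequence[int]) -> float:
    """P(T), eq. (9): f(T) divided by the largest f(T') over all tasks T' in C."""
    largest = max(all_task_sizes)
    return task_size / largest if largest > 0 else 0.0


def stimulation_level(task_size: int, all_task_sizes: Sequence[int], d_hat: float) -> float:
    """X(T) = P(T) * D-hat(T, Ts), eq. (10)."""
    return complexity(task_size, all_task_sizes) * d_hat
```

Because the largest task in $C$ has $P(T) = 1$ and $\hat{D}$ lies in [0, 1], $X(T)$ also stays within [0, 1], which is the range on which the thresholds of the emotion mapping operate.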
2) Mapping from Stimulation Level to Emotions
In psychology, Wundt introduced the Wundt curve (Fig. 3), an
inverted “U-shape” relationship between the stimulation intensity and arousal of emotions [18]. Three emotions are associated
along the spectrum of stimulus intensity, where too little stimulation results in boredom, too much stimulation results in anxiety, and optimal stimulation results in curiosity.
Based on Wundt’s theory, the appraisal of curiosity-related emotions is modeled as follows:
$$\begin{cases} X(T) \le \theta_1 \Rightarrow \text{Boredom},\\ \theta_1 < X(T) < \theta_2 \Rightarrow \text{Curiosity},\\ X(T) \ge \theta_2 \Rightarrow \text{Anxiety}, \end{cases} \qquad 0 \le \theta_1 \le \theta_2 \le 1, \tag{11}$$
where $X(T)$ is the stimulation level of learning task $T$ obtained from (10), and $\theta_1$ and $\theta_2$ are thresholds that split the stimulus intensity axis into three emotion regions. The two thresholds determine the curious peer learner’s propensity towards each emotion. For example, if $\theta_1$ is close to 0 and $\theta_2$ is close to 1, the virtual peer learner easily becomes curious about any learning task. Conversely, if $\theta_1$ is very close to $\theta_2$, the virtual peer learner has a narrow curious region and becomes very picky about learning tasks.
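The piecewise mapping in (11) translates directly into code. This sketch uses the threshold values $\theta_1 = 0.3$ and $\theta_2 = 0.8$ from Table 1 as defaults; the no_emotion case (no task nearby) is handled outside this function, and the function name is hypothetical.

```python
def appraise_emotion(x: float, theta1: float = 0.3, theta2: float = 0.8) -> str:
    """Map a stimulation level X(T) to a curiosity-related emotion, eq. (11)."""
    if not 0.0 <= theta1 <= theta2 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= theta1 <= theta2 <= 1")
    if x <= theta1:
        return "boredom"
    if x < theta2:
        return "curiosity"
    return "anxiety"
```

Setting `theta1 == theta2` leaves no value of $X(T)$ that yields curiosity, which is the "very picky" extreme described above.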
C. Learning of State-Action Mapping
In real life, a curious student often exhibits a higher tendency to explore and a higher ability to acquire novel information [32]. By analogy, a virtual peer learner with curiosity should also exhibit a higher tendency for exploration and a higher ability for knowledge acquisition than one without curiosity. In
VLE, the believability of a virtual peer learner largely depends
on its strategy of state-action mapping. Hence, we allow the
virtual peer learner to adapt its behavior strategy based on the
mechanism of reinforcement learning [33]. In the remaining
part of this section, learning of state-action mapping for the
virtual peer learner is presented.
1) States of the Virtual Peer Learner
For the virtual peer learner, a state is the combination of inner state and external state: $\mathrm{State} = \mathrm{State}_{\mathrm{inner}} \times \mathrm{State}_{\mathrm{external}}$.
The inner state is defined as a two-tuple: $\mathrm{State}_{\mathrm{inner}} = \langle \mathit{emotion}, \mathit{energy} \rangle$. Here, emotion denotes the current emotional state of the virtual peer learner. Since we mainly focus on curiosity-related emotions, emotion can take four values: curiosity, boredom, anxiety and no_emotion. Curiosity, boredom and anxiety are the possible emotions of the virtual peer learner when learning tasks are nearby, i.e., when there are stimuli. When no stimuli are nearby, the virtual peer learner’s emotion is set to no_emotion.
The second element, energy, is an intrinsic constraint that the virtual peer learner should take into consideration when choosing actions. This is a natural constraint in any learning environment for a real human student: when a student feels fatigued, he/she needs some rest before continuing to work. Also, different learning tasks may cause different levels of fatigue. For example, browsing through the topics is much easier than studying a topic after it is chosen. Hence, a good student knows how to adjust his/her learning strategy to spend energy properly. The value of energy changes as follows:
$$\mathit{energy} = \mathit{energy} + E(a), \quad a \in \mathit{Action}, \tag{12}$$
where $E$ is a function mapping an action $a$ to its energy cost. When $\mathit{energy} < 0$, the virtual peer learner can only choose rest to recharge energy.
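Eq. (12), together with the rule that a learner with negative energy may only rest, can be sketched as a small state update. The cost values are those listed in Table 2 (explore -0.05, study -3, rest recharges fully to 10); modeling rest as a reset to the maximum rather than an additive $E(a)$, and the helper names, are our assumptions.

```python
MAX_ENERGY = 10.0
# E(a) for the two energy-consuming actions, using the values listed in Table 2.
ENERGY_COST = {"explore": -0.05, "study": -3.0}


def available_actions(energy: float) -> list[str]:
    """Once energy drops below 0, the agent may only rest."""
    return ["rest"] if energy < 0 else ["explore", "rest", "study"]


def update_energy(energy: float, action: str) -> float:
    """Apply eq. (12); rest is modeled as a full recharge, as Table 2 describes."""
    if action not in available_actions(energy):
        raise ValueError(f"action {action!r} is not allowed at energy {energy}")
    if action == "rest":
        return MAX_ENERGY
    return energy + ENERGY_COST[action]
```

With these values, three study actions from a full charge leave energy at 1, a fourth takes it to -2, and from then on only rest is available.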
The external state $\mathrm{State}_{\mathrm{external}}$ reflects the virtual peer learner’s relation to VLE and is defined as a two-tuple: $\mathrm{State}_{\mathrm{external}} = \langle \mathit{in\_learning\_zone}, \mathit{next\_to\_task} \rangle$. Here, in_learning_zone and next_to_task are binary values indicating whether the virtual peer learner is in a learning zone and whether its location is within range of a learning task, respectively.
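Tabular Q-learning, described in the learning-process subsection, needs each state to be a hashable key drawn from a finite set. The article does not say how the continuous energy value is discretized, so this sketch uses a hypothetical boolean `energy_ok` (energy >= 0), matching the only energy threshold the text mentions. It also assumes that next_to_task being false means no stimulus is nearby, which forces the emotion to no_emotion.

```python
from typing import NamedTuple

EMOTIONS = ("curiosity", "boredom", "anxiety", "no_emotion")


class State(NamedTuple):
    """State = State_inner x State_external, flattened into one hashable tuple."""
    emotion: str            # inner: current curiosity-related emotion
    energy_ok: bool         # inner: hypothetical discretization, True while energy >= 0
    in_learning_zone: bool  # external
    next_to_task: bool      # external


def make_state(emotion: str, energy: float,
               in_learning_zone: bool, next_to_task: bool) -> State:
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}")
    # Assumption: no nearby task means no stimulus, hence no_emotion.
    if not next_to_task:
        emotion = "no_emotion"
    return State(emotion, energy >= 0, in_learning_zone, next_to_task)
```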
2) Actions of the Virtual Peer Learner
In VLE, a human student can take a great variety of actions, depending on the control design. Here, we focus on three action categories related to the acquisition of knowledge: explore, rest and study.
The first category contains the actions a human student takes to explore interesting learning content in VLE, such as clicking objects and reading information. When a student is tired of learning new material, he/she will choose leisure activities such as roaming around or chatting with friends; these actions are grouped under rest. The third category, study, includes all actions related to gaining knowledge, such as reading, writing and answering questions.
Here, we stay at the level of action categories and do not specify the actions in each category. Hence, the actions of the virtual peer learner are defined as follows:
$$\mathit{Action} = \{\mathit{explore}, \mathit{rest}, \mathit{study}\}. \tag{13}$$
The action explore can update next_to_task in the agent’s external state and consumes a certain amount of energy from its inner state. The action rest recovers a certain amount of energy in the agent’s inner state.
The action study incorporates a certain amount of new knowledge from the current learning task into the agent’s memory. We define the learning efficiency $D$ of a virtual peer learner as:
$$D = \lambda, \quad 0 \le \lambda \le 1, \tag{14}$$
where $\lambda$ is the base learning ability when the virtual peer learner is in the no_emotion state. The action study is implemented by randomly selecting a fraction $D$ of the new knowledge points in the current learning task and recording them in the agent’s memory.
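The study action can be sketched as random sampling without replacement from the task’s not-yet-known knowledge points. How a fractional count such as $0.5 \times 5$ is rounded is not stated in the article; this sketch rounds down, and the function name is hypothetical.

```python
import random
from typing import Optional


def study(task_points: set, memory: set, efficiency: float,
          rng: Optional[random.Random] = None) -> set:
    """Learn a fraction `efficiency` (D in eq. (14)) of the task's new knowledge points.

    Knowledge points must be sortable (e.g., ints or tuples). `memory` is
    updated in place; the points learnt in this step are returned.
    """
    rng = rng or random.Random()
    efficiency = min(max(efficiency, 0.0), 1.0)
    new_points = sorted(task_points - memory)  # sorted so a seeded rng is reproducible
    k = int(efficiency * len(new_points))      # assumption: round the count down
    learnt = set(rng.sample(new_points, k))
    memory |= learnt
    return learnt
```

Only points absent from memory are candidates, so studying a task whose points are all known learns nothing, which is how a learner can end up with fewer than one new point per studied task.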
3) The Learning Process
In this system, the virtual peer learner knows only which actions can be taken in each state, but not the
model of the world or the predefined rules on which action to
choose. Hence, we adopt a model-free reinforcement learning mechanism, Q-learning [33], for the agent. The goal of Q-learning is to estimate the $Q(s, a)$ values, i.e., the expected rewards for executing action $a$ in state $s$. Each $Q(s, a)$ is updated based on
$$Q(s,a) = (1-\omega)\, Q(s,a) + \omega \left( r + \gamma V(s') \right), \tag{15}$$
where
$$V(s') = \max_{a \in \mathit{Action}} Q(s', a) \tag{16}$$
is the best reward the agent can expect from the new state $s'$. Here, $a$ is an action and $\mathit{Action}$ is the set of actions available to the agent; $r$ is the reinforcement value; $\gamma$ is the discount factor that defines how much expected future rewards affect the current decision; and $\omega$ is the learning rate that controls how much weight is given to the reward just experienced.
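The update in (15) and (16) is standard tabular Q-learning. A minimal sketch with the Table 1 settings ($\omega = 0.5$, $\gamma = 0.8$) follows. States can be any hashable value, Q-values start at 0 (an assumption, since the article does not state an initialization), and `next_actions` lets the caller restrict $V(s')$ to the actions allowed in $s'$, for example only rest when energy is negative. The class name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable

ACTIONS = ("explore", "rest", "study")


class QLearner:
    """Tabular Q-learning following eqs. (15) and (16)."""

    def __init__(self, omega: float = 0.5, gamma: float = 0.8):
        self.omega = omega            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(float)   # Q[(state, action)]; unseen pairs start at 0

    def value(self, state: Hashable, actions: Iterable[str] = ACTIONS) -> float:
        """V(s') = max over a of Q(s', a), eq. (16)."""
        return max(self.q[(state, a)] for a in actions)

    def update(self, state: Hashable, action: str, reward: float,
               next_state: Hashable, next_actions: Iterable[str] = ACTIONS) -> float:
        """Q(s, a) <- (1 - omega) Q(s, a) + omega (r + gamma V(s')), eq. (15)."""
        target = reward + self.gamma * self.value(next_state, next_actions)
        key = (state, action)
        self.q[key] = (1 - self.omega) * self.q[key] + self.omega * target
        return self.q[key]
```

For example, starting from an empty table, a reward of 1 moves $Q(s, a)$ halfway from 0 toward the target 1, giving 0.5.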
4) The Roles of Curiosity-Related Emotions
Emotions perform vital functions in human decision-making.
Psychological studies show that humans voluntarily seek novel
things due to the pleasure of satiating curiosity [17]. Artificial
Intelligence research frequently assumes that the human decision-making process consists of maximizing positive emotions
and minimizing negative emotions [29], [34], [30]. Based on
these observations, curiosity-related emotions are employed as
reinforcement functions to guide the virtual peer learner’s learning of behavior strategies. The emotion curiosity gives positive
reinforcement, while both boredom and anxiety give negative
reinforcement.
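The article fixes only the signs of the emotion-based reinforcement, not its magnitudes. The sketch below therefore uses hypothetical values of +1 for curiosity, -1 for boredom and anxiety, and 0 for no_emotion; the returned value would serve as $r$ in eq. (15).

```python
# Hypothetical magnitudes: the article fixes only the signs of the reinforcement.
EMOTION_REWARD = {
    "curiosity": 1.0,    # positive reinforcement
    "boredom": -1.0,     # negative reinforcement
    "anxiety": -1.0,     # negative reinforcement
    "no_emotion": 0.0,   # assumption: no stimulus, no reinforcement
}


def intrinsic_reward(emotion: str) -> float:
    """Reinforcement value r for eq. (15), derived from the appraised emotion."""
    return EMOTION_REWARD[emotion]
```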
Also, in humans, emotion can influence action strengths
[31]. For example, a student who is interested in a subject will
concentrate more and achieve higher learning efficiency than
one who is bored with the same subject. Hence, the second
role of curiosity-related emotions is to influence actions. Here,
we consider their influence on action study. The virtual peer
learner’s learning efficiency D can be influenced by emotions
as follows:
$$D = \lambda + F(\mathit{emotion}), \tag{17}$$
where $F$ is a mapping from an emotion to its influence on the agent’s learning efficiency. Here, $F(\mathit{curiosity})$ returns a positive value, while $F(\mathit{boredom})$ and $F(\mathit{anxiety})$ both return negative values. $D$ is always clipped to the range [0, 1].
VI. Experiment
In this section, we present the experimental setup and the results obtained.
A. Virtual Singapura
Virtual Singapura (VS) is a virtual learning environment designed for lower secondary school students to learn about plant transport systems. The virtual world environment in VS is illustrated in Fig. 4: Fig. 4(a) shows the landscape of VS, designed based on 19th-century Singapore, and Fig. 4(b) shows the ant hole trigger where students can shrink their avatars to go inside the tree environment.
FIGURE 4 The virtual environment in VS. (a) The VS landscape, designed based on 19th-century Singapore. (b) The ant hole trigger where students can shrink their avatars and go inside a tree.
For the proposed curious peer learner, we follow the methodology of Learner-Centered Design [35], which advocates that technology applications must focus on the needs, skills, and interests of the learner. Before developing the curious peer learner, we described the functionalities of a curious peer learner and illustrated possible interaction scenarios, some of which are shown in Fig. 5. In Fig. 5(a), the curious peer learner (embodied as a butterfly) poses questions to a student in order to stimulate his/her thinking about the learning content. Fig. 5(b) shows the curious peer learner directing the student’s attention to potentially interesting learning content.
We conducted a pre-study among several students who had played VS. Responses from questionnaires and interviews showed that students are interested in having a
curious peer learner in VS. Some of the comments from the
students include: “I prefer to have a virtual friend to play
with me,” “I will have a better knowledge if I can compare
my knowledge with a virtual friend,” “A virtual friend who
can be curious about the ant holes, water molecules, and
shooting games will make me feel more interested.”
B. Experiment Setup
In order to test how effectively curiosity-related emotions influence the virtual peer learner’s learning of behavior strategy, we built a simulation environment in VS. The simulation environment consists of two main elements:
❏ Learning Zones (LZs) that contain predefined learning tasks.
❏ Virtual peer learners that reside in the VLE to take learning tasks.
1) Generation of Learning Tasks
For each LZ, we generate learning tasks based on one expert CM with 20 concepts and 200 relationships. From this expert CM, we spawn 15500 learning tasks, each of which is a submap of the expert CM containing 5 randomly chosen concepts. Hence, one LZ contains 200 knowledge points and 15500 learning tasks in total. Some knowledge points are repeated in different learning tasks, and the agent cannot learn more than the 200 predefined knowledge points in one LZ.
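The task-generation procedure can be sketched as follows. The actual expert CM is not available here, so `make_expert_cm` builds a hypothetical stand-in with random relations; because 20 concepts admit only 190 unordered pairs, we assume the 200 relationships are directed. There are $\binom{20}{5} = 15504$ distinct 5-concept subsets, so drawing 15500 distinct subsets is one plausible reading of the setup, not a confirmed detail.

```python
import random
from itertools import combinations
from typing import Iterable


def make_expert_cm(n_concepts: int = 20, n_relations: int = 200, seed: int = 0) -> set:
    """Hypothetical stand-in for the expert CM: random directed relations (knowledge points)."""
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(n_concepts) for b in range(n_concepts) if a != b]
    return set(rng.sample(pairs, n_relations))


def submap(expert: set, concepts: Iterable[int]) -> frozenset:
    """Keep the knowledge points whose two concepts were both chosen."""
    chosen = set(concepts)
    return frozenset(r for r in expert if r[0] in chosen and r[1] in chosen)


def spawn_tasks(expert: set, n_concepts: int = 20, k: int = 5,
                n_tasks: int = 15500, seed: int = 0) -> list:
    """Draw n_tasks distinct k-concept subsets and return their submaps."""
    rng = random.Random(seed)
    subsets = list(combinations(range(n_concepts), k))   # C(20, 5) = 15504 subsets
    return [submap(expert, s) for s in rng.sample(subsets, n_tasks)]
```

Every task is a subset of the expert CM’s 200 knowledge points, which is why an agent cannot learn more than 200 points in one LZ no matter how many tasks it studies.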
2) Parameter Setting for the Virtual Peer Learners
In this experiment, we examine the effect of curiosity-related emotions by comparing the performance of virtual peer learners with and without them. Hence, we simulate two types of virtual peer learner: a curious peer learner with the curiosity appraisal process and a non-curious peer learner without it. The parameter settings for the two types of virtual peer learner are summarized in Table 1.
TABLE 1 Parameter setting.
FUNCTIONALITY | PARAMETER | VALUE
Curiosity appraisal (curious peer learner only) | $\delta$ | 1
  | $\xi$ | 1
  | $\theta_1$ | 0.3
  | $\theta_2$ | 0.8
Reinforcement learning | $\omega$ | 0.5
  | $\gamma$ | 0.8
Knowledge acquisition | $\lambda$ | 0.5
FIGURE 5 Possible interaction scenarios of a curious peer learner in VS. (a) The curious peer learner, embodied as a butterfly, stimulates students’ thinking about the learning content. (b) The curious peer learner, embodied as a butterfly, directs students’ attention to interesting learning content.
First, there are four parameters $(\delta, \xi, \theta_1, \theta_2)$ related to curiosity appraisal, a process that only the curious peer learner has. The four parameters can be understood as describing a specific personality towards curiosity. As this paper only compares agents with and without curiosity, the parameters are chosen to represent a curious peer learner with an intermediate level of curiosity. $\delta$ and $\xi$ are nonnegative real numbers that weight novelty and conflict, respectively. If $\delta$ is greater than $\xi$, the agent focuses more on novelty: it magnifies the contribution of novelty to the stimulation level and lessens the contribution of conflict. Here, we consider a curious peer learner with equal preference for novelty and conflict, and set both $\delta$ and $\xi$ to 1.
For the two parameters that split the stimulus
intensity axis (Fig. 3) into three regions, we choose intermediate levels for both $\theta_1$ and $\theta_2$, with $\theta_1 = 0.3$ and $\theta_2 = 0.8$. This is intuitive because most humans have an intermediate level of appraisal of curiosity-related emotions, and few people have extremes such as no region for curiosity ($\theta_1 = \theta_2$) or no region for negative emotions ($\theta_1 = 0$ and $\theta_2 = 1$).
Second, both the curious and the non-curious peer learners perform reinforcement learning with the same parameter settings. The difference is that the curious peer learner uses curiosity-related emotions as reinforcement functions, while the non-curious peer learner does not. $\omega$ and $\gamma$ are the two parameters that determine the reinforcement learning process in (15); both are real numbers within [0, 1]. $\omega$ is the learning rate that controls how much weight is given to the reward just experienced. A high value of $\omega$ can cause very sudden changes in the learnt Q-values, while a very low value makes the learning process slow. Hence, an intermediate value of $\omega$ is favorable; in this experiment, we set $\omega = 0.5$. The parameter $\gamma$ is the discount factor that defines how much expected future rewards affect the current decision. A high value of this parameter gives more importance to future rewards, while a low value gives more importance to current rewards. In our experiments, the agent needed a comparatively high value of $\gamma$ to learn a proper behavior strategy. Hence, we set $\gamma = 0.8$.
Third, the base learning efficiency $\lambda$ in (14) determines the knowledge acquisition ability of the virtual peer learners. $\lambda$ is a real number within [0, 1]: the higher the value, the higher the percentage of new knowledge that is accurately acquired and integrated into the agent’s memory. We set the same base learning efficiency for both the curious and the non-curious peer learners; the difference is that only the curious peer learner’s learning efficiency can be influenced by curiosity-related emotions. In this experiment, we do not consider virtual peer learners with extremely high or extremely low learning abilities. For example, a virtual peer learner with $\lambda = 1$ can learn everything in a learning task, and a virtual peer learner with $\lambda = 0$ can learn nothing. Hence, we simulate virtual peer learners with an intermediate level of learning ability and set $\lambda = 0.5$.
3) Function Mappings
The function mappings are summarized in Table 2. Function $f(A)$ is the scaling function in (8), reflecting the salience of the various members of a set. Here, we adopt the most commonly used scaling function in Tversky’s ratio model, i.e., the cardinality of a set [28].
The energy cost function in (12) determines the intrinsic constraint of the virtual peer learners. In this system, the function acts as a system rule that both the curious and the non-curious peer learners must consider when choosing action strategies. The settings are determined purely by the system designers. We set the outputs of actions as follows: rest recharges energy fully to 10, explore consumes a small amount of energy (0.05), and study consumes a large amount of energy (3). This is because browsing through topics (explore) is much easier than starting to work on a topic (study) after it is chosen.
Similarly, for the function mapping from emotions to their influence on the agent’s learning efficiency in (17), we set the influence to -0.2 for boredom, +0.3 for curiosity and -0.4 for anxiety. This reflects a natural human tendency: a student with positive emotions tends to achieve better learning efficiency due to mental excitement, whereas the learning efficiency of a student with negative emotions can be harmed by an intrinsic resistance to learning.
C. Experimental Results
In this section, two sets of experiments are conducted and their results are analyzed.
1) The Effect of Curiosity-Related Emotions on Behavior Learning
In this experiment, we analyze the effect of curiosity-related emotions on the learning of state-action mapping in the curious peer learner.
We split the simulation into two phases: a reinforcement learning phase and a steady phase. In the reinforcement learning phase, the curious peer learner learns a behavior strategy based on random exploration of state-action pairs and the intrinsic rewards generated by curiosity-related emotions. In the steady phase, the curious peer learner is put into a new LZ to exploit the behavior strategy learnt in the reinforcement learning phase. For each phase, we ran $10^4$ steps.
TABLE 2 Function mapping.
FUNCTION | INPUT | OUTPUT
$f(A)$ | $A$ | cardinality of $A$
$E(a)$ | rest | recharge fully to 10
  | explore | -0.05
  | study | -3
$F(\mathit{emotion})$ | boredom | -0.2
  | curiosity | +0.3
  | anxiety | -0.4
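Eqs. (14) and (17), with the $F(\mathit{emotion})$ values from Table 2, can be sketched as follows. Taking $F(\mathit{no\_emotion}) = 0$ follows from (14), where $\lambda$ is the efficiency in the no_emotion state; the `curious` flag reflects that only the curious peer learner’s efficiency is affected by emotions. Function and argument names are hypothetical.

```python
# F(emotion) values from Table 2; F(no_emotion) = 0 so that D = lambda, as in eq. (14).
F_EMOTION = {"boredom": -0.2, "curiosity": 0.3, "anxiety": -0.4, "no_emotion": 0.0}


def learning_efficiency(base_lambda: float = 0.5, emotion: str = "no_emotion",
                        curious: bool = True) -> float:
    """D = lambda + F(emotion), clipped to [0, 1]; eqs. (14) and (17)."""
    if not curious:
        return base_lambda  # non-curious learner: emotions do not affect efficiency
    return min(max(base_lambda + F_EMOTION[emotion], 0.0), 1.0)
```

With $\lambda = 0.5$, a curious learner studies at efficiency 0.8 when curious, 0.3 when bored and 0.1 when anxious; the clipping only matters for base values near the ends of [0, 1].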
FIGURE 6 Behavior of the curious peer learner in training and testing phases. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Horizontal axes: simulation steps, 0 to $2 \times 10^4$.]
FIGURE 7 Behavior of the curious peer learner in the testing phase. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Horizontal axes: simulation steps, 0 to 10,000.]
In Fig. 6, we plot the agent’s behavior in both the reinforcement learning
and steady phases. Fig. 6(a) shows the
number of knowledge points learnt in
each simulation step when the action
study is taken. It can be observed that
the activation pattern in Fig. 6(a) corresponds to the activation pattern of
study in Fig. 6(b). Fig. 6(b)–(d) plot the
activation pattern for action study,
explore and rest, respectively. It can be
observed from Fig. 6 that the curious peer learner learnt a strategy in the reinforcement learning phase, as shown by the markedly different behavior patterns in the two phases.
Consider Fig. 6(b), “Study.” The curious peer learner chose the action study markedly less often in the steady phase than in the reinforcement learning phase. Meanwhile, the number of knowledge points learnt per study action (Fig. 6(a), “Knowledge points learnt”) stayed at a comparatively higher level in the steady phase than in the reinforcement learning phase. This suggests that the curious peer learner learnt an “intelligent” strategy similar to what we expect from human students: they take learning tasks only when interested and then learn with high efficiency, rather than taking whichever learning task appears in front of them.
Next, consider Fig. 6(c) and Fig. 6(d). In the reinforcement learning phase, the curious peer learner chose explore or rest randomly, with no strategy. In the steady phase, however, the curious peer learner explored continuously until an interesting learning task was detected, provided that it had enough energy. This resembles the behavior of human students: in the learning process, they first search for topics and start working on one only when an interesting topic is found; when they feel tired, they take a rest before continuing to search or study.
Hence, it can be deduced that
curiosity-related emotions successfully
guide the curious peer learner to learn
a natural state-action mapping. The
positive rewards from curiosity and the negative rewards from
boredom and anxiety can guide the curious peer learner to
search for interesting learning tasks with intermediate level
of stimulation and avoid learning tasks with either too low
or too high stimulation. This phenomenon is in line with
Wundt’s theory.
2) The Comparison of Exploration Breadth and Depth Between a Curious Peer Learner and a Non-Curious Peer Learner
In this experiment, we compare the performance of two virtual peer learners: a curious peer learner with curiosity-related emotions as intrinsic rewards and a non-curious peer learner without curiosity-related emotions. To analyze the performance of the two virtual peer learners, two indicators are defined:
1) Exploration breadth: This indicator is measured by the number of learning tasks browsed through by the agent in a period of time. It indicates the virtual peer learners’ tendency to explore for interesting learning tasks.
2) Exploration depth: This indicator is measured by the average number of knowledge points learnt per learning task. It indicates the virtual peer learners’ average learning efficiency.
In this experiment, we first trained the two virtual peer learners in one LZ and then put them in another LZ for testing. We ran $10^4$ steps for both the training and the testing phase. To make the comparison fair, the two virtual peer learners always operate in the same LZs. The behavior of the two virtual peer learners in the testing phase is shown in Fig. 7 and Fig. 8, respectively.
By comparing Fig. 7 and Fig. 8, it can be observed that the curious peer learner learnt a behavior strategy resembling that of a human student, while the non-curious peer learner chose actions randomly and behaved irrationally. The curious peer learner followed a behavior pattern that shows continuity in exploring for interesting learning tasks. It only studied learning tasks when its curiosity was aroused, and it tried to minimize negative emotions by avoiding tasks whose stimulation was too low or too high. This tendency to maximize positive emotions yields a high average learning efficiency. In contrast, the non-curious peer learner randomly chose to explore or rest, without any continuity in its exploration behavior. Also, it did not choose learning tasks deliberately and spent a lot of time on learning tasks that were not interesting, learning them with very low efficiency.
FIGURE 8 Behavior of the non-curious peer learner in the testing phase. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Horizontal axes: simulation steps, 0 to 10,000.]
TABLE 3 Exploration comparison between virtual peer learners with and without curiosity.
MEASURE | CURIOSITY | NO CURIOSITY
EXPLORATION BREADTH | 949 | 386
EXPLORATION DEPTH | 5.33 | 0.74
The exploration breadth and depth of both virtual peer learners are summarized in Table 3. The exploration breadth is calculated as the total number of learning tasks browsed by the agent in the testing phase. The curious peer learner explored 949 tasks, while the non-curious peer learner explored only 386 tasks; the exploration breadth of the curious peer learner is thus about 2.5 times that of the non-curious peer learner. The exploration depth is calculated as the average number of knowledge points learnt in the learning tasks studied by the agent in the testing phase. The curious peer learner learnt 5.33 knowledge points per learning task (128 knowledge points learnt in total over 24 studied learning tasks), while the non-curious peer learner learnt only 0.74 knowledge points per learning task (138 knowledge points learnt in total over 187 studied learning tasks). The value 0.74 is less than 1 because the non-curious peer learner sometimes learnt nothing new in the learning tasks it took. This indicates that the non-curious peer learner wasted a lot of time studying learning tasks that were not interesting and gained little knowledge from them.
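The two indicators are simple aggregates over a testing run. The sketch below defines them and re-derives the Table 3 figures from the totals reported above. Counting a task browsed several times only once is our assumption, since the text does not say whether repeated browsing is counted; the function names are hypothetical.

```python
def exploration_breadth(browsed_tasks) -> int:
    """Number of learning tasks browsed in the testing phase.

    Assumption: a task browsed several times is counted once.
    """
    return len(set(browsed_tasks))


def exploration_depth(points_learnt_per_task) -> float:
    """Average number of knowledge points learnt per studied learning task."""
    points = list(points_learnt_per_task)
    return sum(points) / len(points) if points else 0.0


# Re-deriving the Table 3 figures from the totals reported in the text.
curious_depth = 128 / 24        # ≈ 5.33 knowledge points per task
non_curious_depth = 138 / 187   # ≈ 0.74 knowledge points per task
breadth_ratio = 949 / 386       # ≈ 2.46
```

The ratio of about 2.5 for breadth and about 7 for depth (5.33 / 0.74) makes the size of the two effects easy to compare.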
Hence, it can be deduced that curiosity-related emotions
can directly guide the virtual peer learner to learn strategic
behavior resembling a natural student and indirectly influence
the acquisition of knowledge. The curiosity-related emotions
drive the virtual peer learner to enhance both its exploration breadth and depth. This mimics real life, where a curious student tends to explore more and learn more deeply than a less curious student.
VII. Conclusion
In this work, we have modeled human-like curiosity for virtual
peer learners in a virtual learning environment. The proposed
model is built based on psychological theories by Berlyne and
Wundt. In this model, the curiosity appraisal is a two-step process: determination of stimulation level and mapping from the
stimulation level to emotions. The emotions serve as intrinsic reward functions for the agent’s behavior learning and also influence the agent’s knowledge acquisition efficiency.
Simulation results have shown that curiosity-related emotions can guide the virtual peer learner to behave in a way that resembles a real student. Also, the comparison between a curious peer learner and a non-curious peer learner has shown that the curious peer learner demonstrates a higher tendency for exploration (exploration breadth) as well as a higher average learning efficiency (exploration depth).
The rationale behind this research is that a virtual peer
learner may have the potential to practise “peer learning,” a
common educational practice that can benefit learning in
multiple aspects. A virtual peer learner residing in VLE can
possibly engage students and motivate them to spend more
time on the learning content. Also, a virtual peer learner can
potentially help low-functioning students to think and learn
better in VLE.
We acknowledge several limitations. First, although uncertainty is part of Berlyne’s theory, it is not modeled in this
work due to the deterministic nature of concept maps. Future
work will investigate ways to bring uncertainty into our
knowledge representation. Second, actions of the virtual peer
learner are designed on an abstracted level. In the future, we
will design more complex action sets for the virtual peer
learner so as to achieve higher believability. Third, many of
the parameters in the experiments are empirically set to demonstrate the plausibility of the proposed model. In the future,
we plan to experiment with different parameter settings to
analyze the performance. Lastly, the proposed curious peer
learner is evaluated by simulations only. Large-scale field studies deploying VS with the curious peer learners in schools are
currently being planned.
Acknowledgment
The authors would like to thank the Ministry of Education
(MOE), Singapore, for the Interactive Digital Media (IDM)
challenge funding to conduct this study.
References
[1] J. Wiecha, R. Heyden, E. Sternthal, and M. Merialdi, “Learning in a virtual world:
Experience with using Second Life for medical education,” J. Med. Internet Res., vol. 12,
no. 1, pp. 1–27, 2010.
[2] C. Dede, “Immersive interfaces for engagement and learning,” Science, vol. 323, no.
5910, pp. 66–69, 2009.
[3] A. L. Harris and A. Rea, “Web 2.0 and virtual world technologies: A growing impact
on IS education,” J. Inform. Syst. Educ., vol. 20, no. 2, pp. 137–144, 2009.
[4] M. J. Jacobson, B. Kim, C. Miao, and M. Chavez, “Design perspectives for learning
in virtual worlds,” in Design for Learning Environments of the Future, New York: Springer-Verlag, 2010, pp. 111–141.
[5] S. Kennedy-Clark, “Pre-service teachers’ perspectives on using scenario-based virtual
worlds in science education,” Comp. Educ., vol. 57, no. 4, pp. 2224–2235, 2011.
[6] S. Kennedy-Clark, “Designing failure to encourage success: Productive failure in a
multi-user virtual environment to solve complex problems,” in Proc. European Conf. Technology Enhanced Learning, 2009, pp. 609–614.
[7] M. Tanti and S. Kennedy-Clark, “MUVEing slowly: Applying slow pedagogy to a
scenario-based virtual environment,” in Proc. Ascilite Curriculum, Technology Transformation
Unknown Future, Sydney, Australia, 2010, pp. 963–967.
[8] S. Kennedy-Clark and K. Thompson, “What do students learn when collaboratively
using a computer game in the study of historical disease epidemics, and why?” Games
Culture, vol. 6, no. 6, pp. 513–537, 2011.
[9] D. Boud, R. Cohen, and J. Sampson, Peer Learning in Higher Education: Learning from and
with Each Other. London: Routledge, 2001.
[10] M. J. Eisen, “Peer-based learning: A new-old alternative to professional development,” Adult Learn., vol. 12, no. 1, pp. 9–10, 2001.
[11] T. B. Kashdan and M. F. Steger, “Curiosity and pathways to wellbeing and meaning in life: Traits, states, and everyday behaviors,” Motivation Emotion, vol. 31, no. 3, pp.
159–173, 2007.
[12] S. Reiss, Who Am I? The 16 Basic Desires That Motivate Our Actions and Define Our
Personalities. New York: The Berkley Publishing Group, 2000.
[13] L. Macedo and A. Cardoso, “The role of surprise, curiosity and hunger on exploration of unknown environments populated with entities,” in Proc. Portuguese Conf. Artificial
Intelligence, 2005, pp. 47–53.
[14] K. Merrick, “Modeling motivation for adaptive nonplayer characters in dynamic
computer game worlds,” Comput. Entertainment, vol. 5, no. 4, pp. 1–32, 2008.
[15] R. Saunders, “Towards a computational model of creative societies using curious
design agent,” in Proc. Int. Conf. Engineering Societies Agents World VII, 2007, pp. 340–353.
[16] P. D. Scott and S. Markovitch, “Experience selection and problem choice in an exploratory learning system,” Mach. Learn., vol. 12, nos. 1–3, pp. 49–67, 1993.
[17] D. E. Berlyne, Conflict, Arousal, and Curiosity. New York: McGraw-Hill, 1960.
[18] W. M. Wundt, Grundzüge der physiologischen Psychologie. Leipzig, Germany: W. Engelmann, 1874.
[19] G. Loewenstein, “The psychology of curiosity: A review and reinterpretation,” Psychol. Bull., vol. 116, no. 1, pp. 75–98, 1994.
[20] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949.
[21] R. W. White, “Motivation reconsidered: The concept of competence,” Psychol. Rev.,
vol. 66, no. 5, pp. 297–333, 1959.
[22] J. Schmidhuber, “Curious model-building control systems,” in Proc. IEEE Int. Joint
Conf. Neural Networks, 1991, pp. 1458–1463.
[23] P. Y. Oudeyer, F. Kaplan, and V. Hafner, “Intrinsic motivation systems for autonomous mental development,” IEEE Trans. Evol. Comp., vol. 11, no. 2, pp. 265–286, 2007.
[24] R. Saunders and J. S. Gero, “A curious design agent,” in Proc. Conf. Computer Aided
Architectural Design Research Asia, 2001, pp. 345–350.
[25] J. D. Novak and D. B. Gowin, Learning How to Learn. Cambridge, U.K.: Cambridge
Univ. Press, 1984.
[26] G. Biswas, K. Leelawong, D. Schwartz, and N. Vye, “Learning by teaching: A new agent paradigm for educational software,” Appl. Artif. Intell., vol. 19, nos. 3–4, pp. 363–392, 2005.
[27] Q. Wu, C. Miao, and Z. Shen, “A curious learning companion in virtual learning
environment,” in Proc. IEEE Int. Conf. on Fuzzy Systems, 2012, pp. 1–8.
[28] A. Tversky, “Features of similarity,” Psychol. Rev., vol. 84, no. 4, pp. 327–352, 1977.
[29] A. G. Barto, S. Singh, and N. Chentanez, “Intrinsically motivated learning of hierarchical collections of skills,” in Proc. Int. Conf. Development and Learning, 2004, pp. 112–119.
[30] M. Salichs and M. Malfaz, “A new approach to modeling emotions and their use on
a decision making system for artificial agents,” IEEE Trans. Affective Comput., vol. 3, no.
99, pp. 56–68, 2011.
[31] H. Hu, E. Real, K. Takamiya, M. G. Kang, J. Ledoux, R. L. Huganir, and R. Malinow, “Emotion enhances learning via norepinephrine regulation of AMPA-receptor trafficking,” Cell, vol. 131, no. 1, pp. 160–173, 2007.
[32] H. I. Day and A. Wiley, “Online library curiosity and the interested explorer,” Perform. Instruction, vol. 21, no. 4, pp. 19–22, 1982.
[33] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, 1996.
[34] E. T. Rolls, “A theory of emotion, its functions, and its adaptive value,” in Emotions
in Humans and Artifacts. Cambridge, MA: MIT Press, 2003, pp. 11–34.
[35] E. Soloway, M. Guzdial, and K. E. Hay, “Learner-centered design: The challenge for HCI in the 21st century,” Interactions, vol. 1, no. 2, pp. 36–48, 1994.
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2013
Goal-Based Denial and Wishful Thinking
© DIGITAL STOCK 1997
César F. Pimentel, INESC-ID and Instituto Superior Técnico, PORTUGAL
Maria R. Cravo, Instituto Superior Técnico, PORTUGAL
Abstract—Denial and wishful thinking are well-known affective phenomena that influence human belief processes. Put simply, they consist of tendencies to disbelieve what one would not like to be true, and to believe what one would like to be true. We present an approach to an agent's belief dynamics that simulates denial and wishful thinking, using the agent's goals as the source of affective preference. Our approach also addresses several issues concerning the use of conventional belief revision in human-like autonomous agents. In our approach, every goal produces a wishful thought about its achievement, in the form of a weak belief. Consequently, a “disliked situation” leads to inconsistent beliefs, thus triggering belief revision. Furthermore, the agent's belief set is selected based on a measure of preference that accounts for the “likeability” of beliefs (among other factors). We test an instantiation of our model to assess whether the resulting agent could produce the intended behaviors. The agent produces behaviors of denial and wishful thinking, not only as biases in belief selection, but also as the triggers to hold or reject certain beliefs.
Digital Object Identifier 10.1109/MCI.2013.2247831
Date of publication: 11 April 2013
1556-603X/13/$31.00 ©2013 IEEE
I. Introduction and Motivation
One of the aims of Affective Computing [19] is to design agents that behave according to models of human affective phenomena. Among these phenomena are the roles that emotions play in various cognitive processes, such as attention, reasoning, decision making, and belief selection. In this paper, we focus on an affective phenomenon that concerns belief selection.
Human beings are biased towards believing what they would like to be true (wishful thinking) and not believing what they would not like to be true (denial). These tendencies constitute two sides of the same coin, and can both be generally viewed as wishful thinking, a widely known coping phenomenon [6], [5], [15], [2], [11].
When one's gathered evidence leads to conflicting beliefs, wishful thinking can affect the outcome of resolving such a conflict, thus determining one's resulting beliefs. Sometimes, wishful thinking can even be responsible for a conflict among beliefs when there is no conflicting evidence.
In AI, belief revision is the process responsible for dealing with conflicting beliefs. Conventional belief revision theories aim to maintain consistency within a set of basic beliefs, called context, upon the arrival of new conflicting information. These theories always accept the new information, and inconsistencies are solved by abandoning beliefs from the context. A human user is typically responsible for defining the order(s) among beliefs that is
used to determine which beliefs are kept and which are abandoned.
Given this state of affairs, if we want to approach belief revision in the context of a human-like autonomous AI agent, some important questions arise with respect to conventional belief revision theories:
❏ Why should the agent always prefer the new information over its previous beliefs?
❏ How can the agent autonomously generate its own order(s) among beliefs?
❏ Can human-like preferences, in belief revision, be adequately expressed using an order (or orders) among beliefs?
The first question arises because it is not acceptable that human-like agents always believe the new information. In other words, we need revision to be non-prioritized, as discussed by Hansson [8].
The second question arises because, if we want our agents to be autonomous, their belief revision processes cannot depend on the external definition of orders among beliefs.
To better understand the reason for the third question, we start by distinguishing between two types of belief strength that, given their similarity, are commonly modeled as one and the same:
❏ Certainty. The certainty of a belief corresponds to how strongly one feels that the belief is true.
❏ Resistance to change. A belief's resistance to change corresponds to how strongly one feels against abandoning that belief. This concept can be extended to also represent how strongly one feels towards adding a new belief.
Isaac Levi [12] also distinguishes these two types of strength, referring to the latter (resistance to change) as unchangeability (see also [8]).
If I believe that “My mother boarded flight 17,” this belief's resistance to change can be highly dependent on whether or not I also believe that “Flight 17 crashed” (because holding both beliefs is highly undesirable). However, if the first belief is kept, its certainty is the same regardless of whether or not the second belief is kept.
Typical approaches to belief revision choose which beliefs to hold and which to reject based on the reasons that originated each belief. In our view, accounting for these reasons is enough to model certainty, but not enough to adequately guide belief revision, since it does not capture the notion of likeability illustrated in the example of the previous paragraph.
Notice that resistance to change is, by definition, what guides belief revision in humans. However, as shown in the example, a belief's resistance to change depends (among other aspects) on that belief's likeability which, in turn, depends on what the believed context is. Since belief revision aims at deciding what the believed context is, we cannot properly use a fixed order (or orders) among beliefs based on their resistance to change without falling into a circular definition. For this reason, we believe that belief revision is best guided by an order among contexts, rather than an order among beliefs.
In this paper we present Wishful Thinking Revision (WTR) [20], an approach to belief revision, and overall belief dynamics, that addresses the three issues discussed in this section. More specifically, WTR is a framework aimed at supporting, in AI agents, belief revision with the following properties:
❏ Non-prioritized. New information is not necessarily believed.
❏ Autonomous. Revision does not depend on the external definition of orders.
❏ Context-oriented. In order to properly model the influence of likeability on a belief's resistance to change, the preferred context is chosen according to an order (or orders) among contexts, instead of an order (or orders) among beliefs.
❏ Simulates wishful thinking. The wishful thinking phenomenon is modeled within the scope of goal satisfaction.¹
¹ This means that anything that the agent would like to be true must be expressed in terms of goals in order to influence its beliefs.
The main idea behind WTR is that every goal generates a tendency to believe in its achievement, and we model this tendency as a weak belief. This way, any information that contradicts the achievement of a goal (i.e., any undesirable information) gives rise to an inconsistency, thus triggering belief revision. Typically, the belief in the goal's achievement is abandoned when it is no longer supported by any evidence, but exceptions may occur, depending on various factors.
The fact that WTR generates weak beliefs from goals does not mean that our agent will believe, by default, that all of its goals are achieved. Such an approach could be acceptable only for preservation goals [16] which, by definition, start out achieved, and one hopes that their state does not change. In WTR, a weak belief is not treated as a common belief, as explained below.
Although not always viewed as such, “belief is a matter of degree” [9, p. 21]. You may believe that, by using regular mail, the letter you are about to send will be safely delivered to its destination. However, if your envelope contains money instead of just a simple letter, you will most likely not believe in that safe delivery, and will prefer to send it via registered or insured mail. In WTR we model this relative notion of belief by means of a function that determines the causal strength of a belief. This causal strength is an attempt to capture the degree of certainty of a belief. Depending on the situation, a belief may or may not be filtered out according to its value of causal strength.
This paper is organized as follows. In Section II we briefly describe the conventional approach to belief revision, and in Section III we review the wishful thinking phenomenon. Next, in Section IV, we introduce our representations and assumptions, and in Section V we formalize our approach to belief
revision, the WTR model. In Section VI we discuss our view of
the commonsense notion of belief, in light of the presented
model. In Section VII we test our model using a sequence of
scenarios, and in Section VIII, we compare our approach to
related work. Finally, in Section IX we present the conclusions.
II. Conventional Belief Revision
An essential aspect of commonsense reasoning is the ability to
revise one’s beliefs, that is, to change one’s beliefs when a new
belief is acquired that is not consistent with the existing beliefs.
In AI, belief revision theories decide which of the previous
belief(s) should be abandoned in order to incorporate the new
belief, and to keep the set of beliefs consistent. All belief revision theories try to keep the amount of change as small as possible, according to what is called the minimal change principle [9].
The reason for this principle is that beliefs are valuable, and we
do not easily give them up. However, this principle is not
enough to determine, in a unique way, the change to be made,
and so belief revision theories assume the existence of an order
among beliefs, which states that some beliefs are less valuable
than others, and should be more easily abandoned.
A number of belief revision theories have been developed
since the seminal work of Alchourrón, Gärdenfors and
Makinson [1]. An extensive overview of the past twenty-five years of research on such theories can be found in [4]. Typically, these theories assume that beliefs are represented by formulas of the language $\mathcal{L}$ of some logic, and denote the revision of a set of beliefs $\beta$, called context, with a formula $\varphi$ by $(\beta \ast \varphi)$. This represents the new set of beliefs, i.e., the new context, and must be such that: 1) it contains $\varphi$; 2) it is consistent, unless, of course, $\varphi$ is a contradiction. To ensure that the result is a unique context, these theories either assume the existence of a total order among beliefs, or allow for a partial order and abandon all conflicting beliefs whose relative value is not known, thus abandoning more beliefs than necessary.
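To make the contrast with WTR concrete, the following is a toy Python sketch of prioritized revision under a total order. It is not any specific theory from [1] or [4]: it assumes beliefs are atomic strings, that conflicts are given explicitly as hypothetical "nogood" sets (beliefs that cannot all hold together), and that the total order is a numeric rank. The new belief is always accepted, and the lowest-ranked old belief in each violated conflict is abandoned.

```python
def revise(context: set[str], new: str,
           nogoods: list[frozenset[str]], rank: dict[str, int]) -> set[str]:
    """Toy prioritized revision (context * new) under a total order.

    The new belief is always accepted. While some known conflict (a
    "nogood": beliefs that cannot all hold together) is contained in the
    result, the lowest-ranked *old* belief in that conflict is abandoned.
    """
    result = set(context) | {new}
    while True:
        violated = next((ng for ng in nogoods if ng <= result), None)
        if violated is None:
            return result
        old = violated - {new}
        if not old:  # the new belief conflicts with nothing but itself
            raise ValueError(f"{new!r} is inconsistent on its own")
        result.discard(min(old, key=rank.__getitem__))


context = {"mother_boarded_17", "flight_17_landed"}
nogoods = [frozenset({"flight_17_landed", "flight_17_crashed"})]
rank = {"mother_boarded_17": 2, "flight_17_landed": 1}  # higher = more valuable
print(sorted(revise(context, "flight_17_crashed", nogoods, rank)))
# ['flight_17_crashed', 'mother_boarded_17']
```

Note that the incoming report is accepted unconditionally and the rank has to be supplied from outside; these are exactly the two properties that the questions in Section I object to for human-like autonomous agents.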
III. Wishful Thinking
One of the most commonly known influences of affect on
human reasoning and belief is wishful thinking, that is, a bias that
shifts one’s interpretations/beliefs towards “liked” scenarios and
away from “disliked” ones. Generally speaking, what one likes
(as opposed to what one dislikes) is defined as what satisfies or
facilitates one’s current desires, goals, or commitments.
Wishful thinking is a widely known phenomenon, sometimes referred to by other names or as part of other, more general, concepts. In [6], the authors describe Motivational Force as
the phenomenon where motivation (i.e., the desire for pleasure
or for getting rid of discomfort) guides one’s thoughts and
alters one’s beliefs’ resistance to change. Quoting Frijda and
Mesquita, “The motivational source of the beliefs does much to
explain their resistance against change. Abandoning a belief
may undermine one’s readiness to act, and one may feel one
cannot afford that.” [6, p. 66]. According to Frijda’s Law of
Lightest Load [5], “Whenever a situation can be viewed in alternative ways, a tendency exists to view it in a way that
minimizes negative emotional load.” Castelfranchi discusses
how belief acceptance is influenced by likeability [3]. In [18],
Paglieri models likeability as the degree of goal satisfaction that
data represents. In his approach, likeability is one of the data
properties that interfere in the process of belief selection.
Throughout this paper we simply use the term wishful thinking, encompassing the tendency for: a) Wishful thinking (in the
strict sense), as the belief in something because it is liked; b)
Denial, as the rejection of a belief because it is disliked. Denial/
wishful thinking, as “two sides of the same coin”, are recognized as strategies of emotion-focused coping (see, e.g., [15], [2]
and [11]).
As a strategy of emotion-focused coping, wishful thinking
aims at increasing emotional satisfaction in the individual and,
most importantly, preventing extreme and/or prolonged negative affective states (such as shock or depression) that could otherwise hinder the individual’s performance in general. In this
sense, wishful thinking is a mechanism of emotional control/regulation and, like any such mechanism, it must operate within a balanced trade-off between satisfaction and realism. Favoring realism too much may lead to strong and/or long-lasting negative affective states, but favoring satisfaction too much leads to losing track of reality, and in both cases individual performance is impaired. It follows that wishful thinking, although often present and biasing belief strength and resistance to change, only in exceptional cases changes what is believed.
For the purpose of our work, we distinguish two types of
effects that wishful thinking may have on one’s beliefs:
❏ Passive effects: When one has conflicting evidence, supporting two or more alternative situations, one’s beliefs may fall
on a particular situation because it is more desirable, instead
of on the situation that is supported by stronger evidence.
❏ Active effects: One may start believing something that is not
supported by any evidence, simply because one would like it
to be true. Conversely, one may not believe in something that
is supported by evidence, even in the absence of opposing
evidence, simply because one would not like it to be true.
Active effects are rarer than passive effects, but may occur,
for example, when highly important goals are at stake.
Typically, one’s most important goals are preservation goals
[16], such as goals related to keeping oneself (and others) alive
and healthy. For instance, believing that someone close has
died may evoke a feeling of terror, and one may engage in
denial, even when there is only evidence indicating that it is
true and no evidence indicating that it is false (active effects).
If there is also evidence indicating that it is false (albeit weaker
than the evidence indicating that it is true) denial is even more
likely to occur (passive effects).
IV. Assumptions and Terminology
Our approach to belief revision, WTR, assumes an agent with
reasoning capabilities based on a monotonic logic. Given an
agent ag, we represent the language of the logic that ag uses to
represent information (including beliefs) by $\mathcal{L}_{ag}$, and the derivability relation by $\vdash_{ag}$.
We assume that all agents in the current world are uniquely
identified by their name. We represent the set of names of the
agents by $N$, where $N \cap \{Obs, WT, Der\} = \emptyset$.
If ag is the agent using WTR, our model assumes that its internal state contains, among other items, the following information:
❏ The agent’s knowledge base, represented by KB(ag)
❏ The agent’s goals, represented by Goals(ag)
❏ For each other agent agi, the subjective credibility that our
agent associates with agi, represented by Cred(ag, agi)
❏ The agent’s wishful thinking coefficient, represented
by wt(ag).
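The four items above can be held in a plain record. The following is a minimal Python sketch, assuming hypothetical names (`AgentState`, `Goal`, `RESERVED_TAGS`, `default_cred`) and an arbitrary default credibility of 0.5; the article only says that credibility "may start as a default value" and evolve through interaction. It also enforces the constraint that agent names never coincide with the origin tags Obs, WT and Der.

```python
from dataclasses import dataclass, field

# Origin tags that agent names must avoid: N ∩ {Obs, WT, Der} = ∅.
RESERVED_TAGS = frozenset({"Obs", "WT", "Der"})


@dataclass(frozen=True)
class Goal:
    description: str   # GDesc(g): formula the agent wants to be true
    importance: float  # GImp(g): weight the agent associates with the goal


@dataclass
class AgentState:
    """Hypothetical container for the WTR-relevant part of an agent's state."""
    name: str
    kb: set = field(default_factory=set)        # KB(ag): supports <phi, tau, alpha>
    goals: list = field(default_factory=list)   # Goals(ag): Goal instances
    cred: dict = field(default_factory=dict)    # Cred(ag, ag_i), keyed by ag_i's name
    wt: float = 0.0                             # wt(ag): wishful thinking coefficient
    default_cred: float = 0.5                   # hypothetical starting credibility

    def __post_init__(self) -> None:
        if self.name in RESERVED_TAGS:
            raise ValueError(f"agent name {self.name!r} clashes with an origin tag")

    def credibility(self, other: str) -> float:
        """Cred(ag, other): a default value until interactions update it."""
        return self.cred.get(other, self.default_cred)


ag = AgentState("ag", goals=[Goal("alive(mother)", 0.9)], wt=0.2)
ag.cred["Peter"] = 0.8  # e.g., after a history of reliable reports
print(ag.credibility("Peter"), ag.credibility("Susan"))  # 0.8 0.5
```

How credibility values are updated is left open here, as it is in the text.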
The knowledge base is where the agent keeps a record of
reasons to believe in propositions. This information is important because WTR abandons all beliefs that lose their justifications. Moreover, the knowledge base keeps information about
the origins of basic beliefs, which is used by WTR to measure
belief strength.
The representation we use, for the reasons stored in the
knowledge base, is based on the representation defined by
Martins and Shapiro, in the SWM logic [14], to record dependencies among formulas. Given an agent ag, KB(ag) is a set of
supports, defined as triplets of the form $\langle \varphi, \tau, \alpha \rangle$, where:
❏ $\varphi \in \mathcal{L}_{ag}$ is the support's formula and represents the proposition that is being supported;
❏ $\tau \in \{Obs, WT, Der\} \cup N$ is the support's origin tag and indicates how $\varphi$ got into the knowledge base, in other words, what kind of reason supports belief in the proposition represented by $\varphi$;
❏ $\alpha \subseteq \mathcal{L}_{ag}$ is the support's origin set and contains the formulas that $\varphi$ depends on, in this support (important for keeping track of the formulas that support derivations).
If $A = \langle \varphi, \tau, \alpha \rangle$ is a support, we define $form(A) = \varphi$, $ot(A) = \tau$, and $os(A) = \alpha$. $A$ can be of four kinds, depending on its origin tag, $\tau$:
❏ If $\tau = Obs$, $A$ is called an observation support. This means that the proposition represented by $\varphi$ was observed by the agent.
❏ If $\tau = WT$, $A$ is called a wishful thinking support. This means that the proposition represented by $\varphi$ originated, by wishful thinking, from one of the agent's goals.
❏ If $\tau = Der$, $A$ is called a derivation support. This means that the proposition represented by $\varphi$ was derived from other formulas.
❏ If $\tau \in N$, $A$ is called a communication support. This means that the proposition represented by $\varphi$ was communicated by the agent of name $\tau$.
We point out that the same formula may have more than
one support in the knowledge base. For instance, the agent may
be informed of a fact, represented by $\varphi$, by two different
agents and also observe that fact. This would correspond to
three separate supports with formula $\varphi$: two communication
supports and one observation support.
Furthermore, observation, communication and wishful
thinking supports are all called non-derivation supports. Formulas
that occur in derivation supports are known as derived formulas,
and formulas that occur in non-derivation supports are known
as hypotheses. Notice that a formula can be both a derived formula and a hypothesis, if there is at least one derivation support
and one non-derivation support with that formula.
When $A$ is a derivation support, its origin set is the set of hypotheses underlying this specific derivation of $form(A)$. If $A$ is a non-derivation support, its origin set is $\{form(A)\}$.
For instance, suppose that agent ag's knowledge base contains only three supports, as shown in (1).
$$KB(ag) = \{\langle A, Obs, \{A\} \rangle,\ \langle A \rightarrow B, Peter, \{A \rightarrow B\} \rangle,\ \langle B \rightarrow C, Susan, \{B \rightarrow C\} \rangle\}. \tag{1}$$
In other words, there are three hypotheses: $A$, $A \rightarrow B$ and $B \rightarrow C$. The first was observed by the agent, the second was communicated by agent Peter, and the third was communicated by agent Susan.
If the agent combines the first two hypotheses to derive $B$, this originates a derivation support with the origin set $\{A, A \rightarrow B\}$ (the hypotheses underlying the derivation). If the agent then combines the newly derived formula ($B$) with the third hypothesis ($B \rightarrow C$) to derive $C$, this originates another derivation support with the origin set $\{A, A \rightarrow B, B \rightarrow C\}$. After these two derivations take place, the agent's knowledge base contains five supports, shown in (2).
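$$\begin{aligned}
KB(ag) = \{ & \langle A, Obs, \{A\} \rangle, \\
& \langle A \rightarrow B, Peter, \{A \rightarrow B\} \rangle, \\
& \langle B \rightarrow C, Susan, \{B \rightarrow C\} \rangle, \\
& \langle B, Der, \{A, A \rightarrow B\} \rangle, \\
& \langle C, Der, \{A, A \rightarrow B, B \rightarrow C\} \rangle \}.
\end{aligned} \tag{2}$$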
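The bookkeeping in (1) and (2) can be replayed mechanically. The Python sketch below is an illustration only. It assumes formulas are plain strings, rules have the toy form `"X->Y"`, and the hypothetical helpers `hypothesis` and `derive_mp` perform modus ponens. A derived support's origin set is the union of its premises' origin sets, so it contains only hypotheses. Beyond matching the rule's antecedent, no further well-formedness checks are made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Support:
    form: str        # phi: the supported formula
    ot: str          # tau: "Obs", "WT", "Der", or the name of an agent in N
    os: frozenset    # alpha: origin set

    def kind(self) -> str:
        return {"Obs": "observation", "WT": "wishful thinking",
                "Der": "derivation"}.get(self.ot, "communication")


def hypothesis(form: str, tag: str) -> Support:
    """Non-derivation support: its origin set is {form(A)}."""
    return Support(form, tag, frozenset({form}))


def derive_mp(kb: set, premise: Support, rule: Support) -> Support:
    """Modus ponens on a toy rule 'X->Y'; the new origin set is the union
    of the premises' origin sets, i.e., the underlying hypotheses."""
    antecedent, consequent = rule.form.split("->")
    if premise.form != antecedent:
        raise ValueError("premise does not match the rule's antecedent")
    derived = Support(consequent, "Der", premise.os | rule.os)
    kb.add(derived)
    return derived


a, ab, bc = hypothesis("A", "Obs"), hypothesis("A->B", "Peter"), hypothesis("B->C", "Susan")
kb = {a, ab, bc}             # knowledge base (1)
b = derive_mp(kb, a, ab)     # <B, Der, {A, A->B}>
c = derive_mp(kb, b, bc)     # <C, Der, {A, A->B, B->C}>
print(len(kb), c.kind(), sorted(c.os))  # 5 derivation ['A', 'A->B', 'B->C']
```

Because supports are compared by value, adding the same derivation twice leaves the set unchanged, while a second report of `"A"` from another agent would add a distinct communication support, as described in the text.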
Now we move to the second item in the agent’s internal
state, namely the agent’s goals. We recall that WTR aims at
modeling wishful thinking within the scope of goal satisfaction.
Hence, anything the agent wants to be true that is meant to be captured by WTR must be expressed in terms of goals.
We represent by Goals(ag) the set of goals of agent ag and, for every $g \in Goals(ag)$, we write:
❏ $GDesc(g) \in \mathcal{L}_{ag}$ to represent the goal's description, that is, the formula representing the proposition that the agent wants to be true;
❏ $GImp(g) \in (0, 1)$ to represent the goal's importance, that is, the weight that the agent associates with the goal.
WTR assumes that the agent associates a value of subjective
credibility with each of the other agents in the current world.
The value of subjective credibility that an agent $ag_1$ associates with another agent $ag_2$ reflects the degree to which $ag_1$ believes in what $ag_2$ communicates, and is represented by $Cred(ag_1, ag_2) \in (0, 1)$. This may start as a default value, and evolve based on the interactions between $ag_1$ and $ag_2$.
The wishful thinking coefficient of an agent, ag, reflects the
degree to which ag is susceptible to wishful thinking, and is
represented by $wt(ag) \in [0, 1)$. This value is assumed to be related to the agent's personality traits (e.g., increasing with extraversion and decreasing with conscientiousness).
While the creation of observation and communication supports is fairly intuitive, some rules need to be made clear about the creation of derivation and wishful thinking supports.
Wishful thinking supports originate from the agent's goals. If ag is the agent considered by WTR, there is exactly one wishful thinking support in KB(ag) for each of ag's goals, according to (3).
$$\forall g \in Goals(ag):\ \langle GDesc(g), WT, \{GDesc(g)\} \rangle \in KB(ag). \tag{3}$$
The only exception to this rule is when the agent is deprived of wishful thinking (i.e., $wt(ag) = 0$), in which case no wishful thinking support exists in KB(ag).
The management of derivation supports is out of the scope of WTR, but certain rules must be followed. Given agent ag, for any derivation support $\langle \varphi, Der, \alpha \rangle \in KB(ag)$, the following conditions must hold:
1) $\alpha \vdash_{ag} \varphi$ (obviously, the formula must be derivable from the origin set)
2) $\neg\exists \alpha' \subset \alpha : \alpha' \vdash_{ag} \varphi$ (origin sets must be minimal)
3) $\varphi \notin \alpha$ (self-derivations are redundant and should not be registered).
An agent that always generates every possible derivation is known as a logically omniscient agent. WTR accounts for both logically omniscient and non-omniscient agents.
As expected, a context, in WTR, is defined as a set of hypotheses. Depending on the believed context, we can determine the valid supports for a formula according to Definition 1.
Definition 1: When $\psi$ is the context believed by agent ag, the set of valid supports for a formula, $\varphi$, is given by $Sups(\varphi, \psi, ag)$, defined as:
$$Sups(\varphi, \psi, ag) = \{A \in KB(ag) : form(A) = \varphi \wedge os(A) \subseteq \psi\}.$$
In other words, the valid supports for a formula are all the supports for that formula whose origin set is entirely believed (i.e., whose origin set is a subset of the believed context).
An agent's beliefs, at a given moment, are all the hypotheses in the context believed by that agent, and all the formulas that can be derived from that context. Put simply, an agent's beliefs are all the formulas that have at least one valid support.
Definition 2: When $\psi$ is the context believed by agent ag, ag's belief space² (i.e., the set of ag's beliefs) is given by $BS(\psi, ag)$, defined as:
$$BS(\psi, ag) = \{\varphi : Sups(\varphi, \psi, ag) \neq \emptyset\}.$$
Notice that, for logically omniscient agents, the belief space corresponds to all the logical consequences of the believed context, because the knowledge base has derivation supports for all possible derivations. In other words, the expression for determining the belief space becomes equivalent to $\{\varphi : \psi \vdash_{ag} \varphi\}$.
In WTR, we say that a context is consistent, as far as the agent knows, if and only if, for every formula in the corresponding belief space, its negation is not in that belief space. This is expressed in the definition of predicate Cons (Definition 3).³
Definition 3: When $\psi$ is the context believed by agent ag, we say that $\psi$ is consistent, as far as ag knows, if and only if $Cons(\psi, ag)$ holds, where predicate Cons satisfies the following condition:
$$Cons(\psi, ag) \Leftrightarrow \forall \varphi \in BS(\psi, ag) : \neg\varphi \notin BS(\psi, ag).$$
Notice that, for logically omniscient agents, this corresponds to saying that $\psi$ is consistent, as far as ag knows, iff $\psi \nvdash_{ag} \bot$.
As explained above, if we consider an agent that is not logically omniscient, some important properties follow:
1) A logical consequence of the believed context is not necessarily a belief.
2) A context that is consistent as far as the agent knows may be logically inconsistent.
This happens because an agent of this kind does not necessarily derive everything that is possible, and is naturally ignorant concerning what was not yet concluded. Clearly, this is the case of humans.
² The concept of belief space is also adapted from the SWM logic [14].
³ Throughout the remaining sections of this paper, wherever we write about consistency without further specifying the type of consistency, we refer to the definition of Cons (Definition 3).
V. Wishful Thinking Revision
In this section we describe how belief revision occurs in WTR. In other words, we describe the process that determines the agent's beliefs at a given moment.
If ag is our agent, we divide the hypotheses in the agent's knowledge base (i.e., in KB(ag)) into two (possibly intersecting) sets:
$\beta^0 = \{\varphi : \exists (A \in KB(ag))\ form(A) = \varphi \wedge ot(A) \in N \cup \{Obs\}\}$ is the set of collected data. It contains all the hypotheses that originate from the world (via observations and communications).
$\gamma^0 = \{\varphi : \exists (A \in KB(ag))\ form(A) = \varphi \wedge ot(A) = WT\}$ is the set of wishful thoughts. It contains all the hypotheses that originate from goals (via wishful thinking).
WTR is responsible for determining, at a given moment, what consistent subset of $\beta^0 \cup \gamma^0$ is believed. Such a subset is obviously a context, since the elements of $\beta^0 \cup \gamma^0$ are hypotheses. We represent that context, i.e., the believed context, by $\beta^c$.
We distinguish two (possibly intersecting) subsets of $\beta^c$:
$\beta = \beta^c \cap \beta^0$ is the set of base beliefs, that is, of collected data that is believed.
=
c
+ c 0 is the set of wishful beliefs, that is, of wishful
thoughts that are believed.
Figure 1 depicts the main sets involved in the revision process, and the existing support dependencies. The agent’s belief space, at a given moment, corresponds to the union of the believed context ($\beta\gamma$) with the set labeled Derived Beliefs. Derived Beliefs is the set of all derived formulas that are not hypotheses and have at least one valid support.
A belief revision theory that does not account for wishful thinking simply aims at finding a consistent context in the collected data, $\beta_0$. Since WTR aims at finding a consistent context in $\beta_0 \cup \gamma_0$, it addresses not only conflicts among collected data, but also conflicts between collected data and wishful thoughts. As such, WTR treats “disliked situations” and “conflicting collected data” in a uniform fashion. In fact, in this paradigm, a contradiction can have, on each side of the conflict, combined forces from collected data and/or from wishful thinking. Wishful thinking forces are of a weaker nature and, usually, not enough to single-handedly overthrow collected data, but they can easily be the element that “turns the tide” in a conflict among collected data.
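The sets involved can be made concrete with a small Python sketch. It is an illustration under assumptions made only for this sketch, not the authors’ implementation: formulas are plain strings, `Support` is a hypothetical triple of formula, origin type (`"Obs"`, `"WT"`, `"Der"`, or an agent name for communications) and origin set, and collected data are taken to be the hypotheses whose origin is an observation or a communication. The derived formula `travelling(Mother)` is invented purely to show a derived belief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Support:
    """Hypothetical encoding of a support <formula, origin type, origin set>."""
    formula: str
    origin: str            # "Obs", "WT", "Der", or the name of a communicating agent
    origin_set: frozenset  # hypotheses the formula depends on


def sups(phi, context, kb):
    """Valid supports for phi: those whose origin set lies inside the context."""
    return {a for a in kb if a.formula == phi and a.origin_set <= context}


def belief_space(context, kb):
    """BS(context, ag): every formula with at least one valid support."""
    return {a.formula for a in kb if a.origin_set <= context}


def collected_data(kb):
    """beta_0: hypotheses from observations or communications (assumed: any non-WT, non-Der origin)."""
    return {a.formula for a in kb if a.origin not in ("WT", "Der")}


def wishful_thoughts(kb):
    """gamma_0: hypotheses that originate from goals via wishful thinking."""
    return {a.formula for a in kb if a.origin == "WT"}


# Illustrative knowledge base; the derived formula travelling(Mother) is hypothetical.
KB = {
    Support("~have(Boat)", "Obs", frozenset({"~have(Boat)"})),
    Support("inFlight(Mother,17)", "David", frozenset({"inFlight(Mother,17)"})),
    Support("have(Boat)", "WT", frozenset({"have(Boat)"})),
    Support("alive(Mother)", "WT", frozenset({"alive(Mother)"})),
    Support("travelling(Mother)", "Der", frozenset({"inFlight(Mother,17)"})),
}

believed = frozenset({"~have(Boat)", "alive(Mother)", "inFlight(Mother,17)"})
base_beliefs = believed & collected_data(KB)       # beta  = believed & beta_0
wishful_beliefs = believed & wishful_thoughts(KB)  # gamma = believed & gamma_0
print(sorted(belief_space(believed, KB)))
```

Here `have(Boat)` is a wishful thought but not a belief, because its only support has an origin set outside the believed context.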
As discussed in Section I, WTR is context-oriented; in other words, it is guided by an ordering of contexts (instead of an ordering of beliefs). More specifically, WTR determines the believed context, $\beta\gamma$, by comparing the values of preference of several candidate contexts. This process consists of three steps:
1) Determining the candidate contexts (described in Section V-A)
2) Determining the preference of each candidate context (described in Section V-B)
3) Choosing a context (described in Section V-C).
A. Candidate Contexts
In order to determine the believed context, the first step is to determine the candidate contexts. By candidate contexts, we mean all the contexts that may potentially be chosen as the believed context, depending only on a measurement of context preference. A context is a candidate context if and only if it satisfies the three following conditions:
❏ It is a subset of $\beta_0 \cup \gamma_0$
❏ It is consistent, as far as the agent knows
❏ It is a maximal set; in other words, it is not a proper subset of another candidate context.
The first condition is necessary because, as explained in Section V, the believed context is a subset of $\beta_0 \cup \gamma_0$. Notice that $\beta_0 \cup \gamma_0$ contains the only basic formulas for which there are reasons to believe, that is, the only hypotheses.
The second condition is common in all belief revision theories, following from the fact that an agent with inconsistent beliefs is ineffective.
Finally, the third condition is also common in all belief revision theories, following the same criterion behind the minimal change principle [9] (see Section II): we do not reject a belief for which there are reasons to believe and no reasons against it. Requiring that candidate contexts be maximal ensures that, if $\Psi_1$ is a candidate context and $\Psi_2 \subset \Psi_1$, then $\Psi_2$ is not a candidate context, because choosing $\Psi_2$ corresponds to rejecting belief in $\Psi_1 \setminus \Psi_2$ for no reason.
Following these three conditions, the set of candidate contexts is determined according to Definition 4.
Definition 4: If $\beta_0$ and $\gamma_0$ are, respectively, the collected data and wishful thoughts of an agent, ag, the set of candidate contexts is given by $Cand(\beta_0, \gamma_0, ag)$, defined as:
$$Cand(\beta_0, \gamma_0, ag) = \{\Psi \subseteq \beta_0 \cup \gamma_0 : Cons(\Psi, ag) \wedge \neg\exists \Psi' \subseteq \beta_0 \cup \gamma_0 : (Cons(\Psi', ag) \wedge \Psi \subset \Psi')\}.$$
[Figure 1: diagram relating World, Collected Data ($\beta_0$), Goals, Wishful Thoughts ($\gamma_0$), the Context ($\beta\gamma$) with its Base Beliefs ($\beta$) and Wishful Beliefs ($\gamma$), and Derived Beliefs, linked by Obs/Comm supports, WT supports, and (valid) Der supports.]
FIGURE 1 WTR: Main sets and support dependencies.
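Definition 4 can be checked by brute force on small examples. The sketch below assumes string formulas with a leading `~` for negation and no derivations, so that $BS(\Psi, ag) = \Psi$ and Cons reduces to “no formula together with its negation”; the helper names are hypothetical. Enumerating every subset is exponential in $|\beta_0 \cup \gamma_0|$, so this is only suitable for toy inputs.

```python
from itertools import combinations


def neg(phi):
    """Negation on string formulas (assumed encoding: a leading '~')."""
    return phi[1:] if phi.startswith("~") else "~" + phi


def cons(context):
    """Cons under the simplifying assumption that BS(context) == context (no derivations)."""
    return all(neg(phi) not in context for phi in context)


def candidate_contexts(beta0, gamma0):
    """Definition 4 by brute force: maximal consistent subsets of beta0 | gamma0."""
    pool = sorted(beta0 | gamma0)
    consistent = [
        frozenset(subset)
        for size in range(len(pool) + 1)
        for subset in combinations(pool, size)
        if cons(frozenset(subset))
    ]
    # Keep only subsets that are not a proper subset of another consistent subset.
    return {ctx for ctx in consistent if not any(ctx < other for other in consistent)}


beta0 = {"~have(Boat)", "inFlight(Mother,17)"}
gamma0 = {"have(Boat)", "alive(Mother)"}
for ctx in sorted(candidate_contexts(beta0, gamma0), key=sorted):
    print(sorted(ctx))
```

With the collected data and wishful thoughts of Scenario 2 in Section VII-B, it yields the two candidate contexts listed there.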
B. Context Preference
As explained, WTR selects the believed context among the
candidate contexts, depending on their value of preference. In
this section we explain how the preference of a context is
determined. Note that, in the process of determining the preference of a context, we assume that context is believed, and represent it by $\beta\gamma$ (the believed context).
In order to determine the preference of a context, a more basic measure is necessary: belief certainty. As discussed in Section I, the certainty of a belief is based on the reasons that originated that belief, i.e., its causes. For this reason, we refer to belief certainty as causal strength. In WTR, the causal strength of a given belief is determined according to that belief’s valid supports.
If $\varphi$ is believed by agent ag when ag’s believed context is $\beta\gamma$ (i.e., $\varphi \in BS(\beta\gamma, ag)$), the causal strength (or, simply, strength) of $\varphi$ is given by $CauStr(\varphi, \beta\gamma, ag) \in\, ]0, 1]$. A belief with a causal strength of 1 represents a belief without doubt, and we call it an absolute certainty (or, simply, certainty). It will become clear that, in WTR, certainties can only be overthrown by other conflicting certainties.
Since the causal strength of a belief is based on that belief’s valid supports, we start by determining the causal strength conveyed by each of those supports, and then combine the
results. In this line of thought, if A is a valid support for its formula, when $\beta\gamma$ is the context believed by agent ag, we say that the (causal) strength conveyed by A is given by $SuppStr(A, \beta\gamma, ag) \in\, ]0, 1]$. So, if agent ag’s believed context is $\beta\gamma$ and $\varphi \in BS(\beta\gamma, ag)$, the value of $CauStr(\varphi, \beta\gamma, ag)$ results from combining the values of $SuppStr(A, \beta\gamma, ag)$, for each A in $Sups(\varphi, \beta\gamma, ag)$.
In our approach, every goal produces a wishful thought about its achievement, in the form of a weak belief. Consequently, a “disliked situation” leads to inconsistent beliefs, thus triggering belief revision.
WTR does not impose any particular definition for function CauStr; however, certain conditions are postulated. For any formula, $\varphi$, believed by agent ag, when ag’s believed context is $\beta\gamma$, the definition of CauStr should be such that the following conditions hold:
1) If there is a support in $Sups(\varphi, \beta\gamma, ag)$ that conveys a strength of 1, then $CauStr(\varphi, \beta\gamma, ag) = 1$;
2) Otherwise, having one more support in $Sups(\varphi, \beta\gamma, ag)$, or having a higher strength of a support in $Sups(\varphi, \beta\gamma, ag)$, increases $CauStr(\varphi, \beta\gamma, ag)$.
These postulates impose merely intuitive properties, respectively: 1) If I have a reason to be absolutely certain that $\varphi$ is true, other reasons supporting belief in $\varphi$ will not invalidate my certainty; 2) If I believe in $\varphi$ (but not with absolute certainty), having more reasons or stronger reasons to believe in $\varphi$ increases the level of certainty of my belief in $\varphi$.
Once again, WTR does not impose any particular definition for function SuppStr; however, certain conditions are postulated. For any formula, $\varphi$, believed by agent ag, when ag’s believed context is $\beta\gamma$, any other agent $ag'$, and any set of hypotheses $\alpha \subseteq \beta\gamma$, the definition of SuppStr should be such that the following conditions hold:
1) $SuppStr(\langle \varphi, ag', \{\varphi\} \rangle, \beta\gamma, ag)$ increases with $Cred(ag, ag')$
2) $SuppStr(\langle \varphi, WT, \{\varphi\} \rangle, \beta\gamma, ag)$ increases with the importance of the goal with description $\varphi$, and with $wt(ag)$
3) $SuppStr(\langle \varphi, Der, \emptyset \rangle, \beta\gamma, ag) = 1$
4) $SuppStr(\langle \varphi, Der, \alpha \rangle, \beta\gamma, ag)$ remains the same if $\alpha$ has one more certainty, decreases if $\alpha$ has one more non-certainty, and increases if the strength (considering context $\beta\gamma \setminus \{\varphi\}$, to disregard derivation cycles) of one of the beliefs in $\alpha$ increases.
These postulates impose merely intuitive properties, respectively: 1) The more credible I find someone, the more strongly I believe that person’s communications (this is, in fact, how we defined Cred in Section IV); 2) If having $\varphi$ as a goal is a reason for me to believe in $\varphi$, that reason’s strength increases with the goal’s importance and with how susceptible I am to wishful thinking (this is in accordance with the meaning of goal importance and with the definition of wt, discussed in Section IV); 3) A belief that corresponds to a tautology is obviously a certainty; 4) If I derive $\varphi$ based on a set of believed hypotheses, the strength of this derivation decreases with the overall uncertainty in that set of hypotheses.
Finally, in WTR, the preference of a context accounts for three factors, where the first two focus on the context’s base beliefs and the third focuses on the context’s wishful beliefs and their opposite counterparts (i.e., disliked beliefs):
1) The most important factor is the number of absolute certainties in the base beliefs. Given the meaning of absolute certainty, it is only natural that an agent always believes in all of its absolute certainties. The only situation where we find it conceivable not to believe in an absolute certainty is if there are other conflicting absolute certainties (an atypical situation). To achieve this behavior, in WTR, we ensure that a context with more absolute certainties (than another) among the base beliefs always has a greater value of preference.
2) Apart from certainties, the number and causal strength of the other (uncertain) base beliefs also influence the preference of the context. Obviously, the agent prefers beliefs that have a greater degree of certainty over beliefs that have a lower degree, and prefers to keep a larger number of beliefs over keeping a smaller number (according to the minimal change principle).
3) The third factor is likeability, meant to capture the influence of wishful thinking. More specifically, a context’s likeability is an assessment of the corresponding belief space in terms of: a) the number and strength of beliefs in goal achievements (wishful beliefs), in combination with the importance of the corresponding goals, and b) the number and strength of beliefs in negations of goal achievements, in combination with the importance of the corresponding goals.
The context preference (or, simply, preference) that an agent, ag, attributes to a context, $\beta\gamma$, is given by $CtxPrf(\beta\gamma, ag) \in \mathbb{R}^+$. Since we want the number of certainties among base beliefs (i.e., factor 1 discussed above) to have more weight than all other factors, we define the preference of a context as that number (of certainties), added to a value in $]0, 1[$ that accounts for the remaining factors (i.e., factors 2 and 3 discussed above). This added value is given by $LessSigPrf(\beta\gamma, ag) \in\, ]0, 1[$ (Less Significative Preference). This formulation ensures that having one more certainty always grants more context preference than any other combination of factors.
Definition 5: Given agent ag and context $\beta\gamma$ (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), function CtxPrf is defined as follows:
$$CtxPrf(\beta\gamma, ag) = \#\{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) = 1\} + LessSigPrf(\beta\gamma, ag).$$
We define $LessSigPrf(\beta\gamma, ag)$ as a mapping, to $]0, 1[$, of $LSP(\beta\gamma, ag) \in \mathbb{R}$. Zero preference is mapped to 0.5, positive
preference ($\mathbb{R}^+$) is mapped to the interval $]0.5, 1[$, and negative preference ($\mathbb{R}^-$) is mapped to the interval $]0, 0.5[$. We chose a simple function (Definition 6) that performs this mapping, which is obviously a continuous increasing function (to preserve the order of preference).
Definition 6: Given agent ag and context $\beta\gamma$, function LessSigPrf is defined as follows:
$$LessSigPrf(\beta\gamma, ag) = \frac{LSP(\beta\gamma, ag)}{1 + 2 \times \lvert LSP(\beta\gamma, ag) \rvert} + 0.5.$$
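A minimal Python sketch of Definitions 5 and 6, taking the number of certainties and the value of $LSP(\beta\gamma, ag)$ as plain inputs (the function names are hypothetical). The checks illustrate the properties the text relies on: zero maps to 0.5, the mapping stays inside $]0, 1[$ and preserves order, and therefore one extra certainty outweighs any LSP value.

```python
def less_sig_prf(lsp):
    """Definition 6: map an LSP value in R onto ]0, 1[, sending 0 to 0.5."""
    return lsp / (1 + 2 * abs(lsp)) + 0.5


def ctx_prf(n_certainties, lsp):
    """Definition 5, with the certainty count and LSP value supplied directly."""
    return n_certainties + less_sig_prf(lsp)


# The mapping is continuous and increasing, so the order of LSP values is preserved.
samples = [-1e6, -2.0, -0.5, 0.0, 0.5, 2.0, 1e6]
mapped = [less_sig_prf(x) for x in samples]
print([round(v, 4) for v in mapped])
```

Because `less_sig_prf` never reaches 0 or 1, a context with one more certainty always scores higher, however extreme the LSP values are.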
Hence, $LSP(\beta\gamma, ag)$ corresponds to a value of preference, in $\mathbb{R}$, that accounts for factors 2 and 3 already discussed, namely:
❏ The uncertain base beliefs. This measure of preference is represented by $UncertPrf(\beta\gamma, ag) \in \mathbb{R}_0^+$.
❏ Likeability. This measure of preference is represented by $LkbPrf(\beta\gamma, ag) \in \mathbb{R}$.
To this end, $LSP(\beta\gamma, ag)$ combines the values of $UncertPrf(\beta\gamma, ag)$ and $LkbPrf(\beta\gamma, ag)$ as an average that is weighted according to the agent’s wishful thinking coefficient.
Definition 7: Given agent ag and context $\beta\gamma$, function LSP is defined as follows:
$$LSP(\beta\gamma, ag) = (1 - wt(ag)) \times UncertPrf(\beta\gamma, ag) + wt(ag) \times LkbPrf(\beta\gamma, ag).$$
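Definition 7 is a convex combination controlled by $wt(ag)$. In the sketch below the component values are hypothetical, chosen only to show that a low coefficient lets the uncertain base beliefs dominate while a high one lets likeability dominate; the range check on `wt` assumes the coefficient lies in $[0, 1]$.

```python
def lsp(wt, uncert_prf, lkb_prf):
    """Definition 7: average of UncertPrf and LkbPrf weighted by wt(ag)."""
    if not 0.0 <= wt <= 1.0:
        raise ValueError("the wishful thinking coefficient is assumed to lie in [0, 1]")
    return (1 - wt) * uncert_prf + wt * lkb_prf


# Hypothetical component values: same context, two agents with different wt(ag).
realist = lsp(0.1, uncert_prf=0.5, lkb_prf=-0.4)
dreamer = lsp(0.9, uncert_prf=0.5, lkb_prf=-0.4)
print(realist, dreamer)
```

For the same context, the realist obtains a positive LSP while the agent more prone to wishful thinking obtains a negative one, because the disliked beliefs weigh more for her.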
Function UncertPrf is based on the uncertain base beliefs, determined by function Uncert.
Definition 8: When $\beta\gamma$ is the context believed by agent ag (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of uncertain base beliefs in $\beta\gamma$ is given by $Uncert(\beta\gamma, ag)$, defined as follows:
$$Uncert(\beta\gamma, ag) = \{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) \neq 1\}.$$
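As a small sketch (hypothetical names, causal strengths supplied as a dictionary), Definition 8, together with the certainty count used in Definition 5, amounts to partitioning $\beta$ by whether the strength equals 1. Exact equality with 1 is adequate for this illustration; with floating-point strengths very close to 1, a real implementation might prefer to track certainty explicitly.

```python
def split_base_beliefs(base_beliefs, strength):
    """Partition beta into absolute certainties and Uncert (Definition 8).

    `strength` maps each base belief to its causal strength CauStr(phi, beta-gamma, ag).
    """
    certain = {phi for phi in base_beliefs if strength[phi] == 1}
    return certain, set(base_beliefs) - certain


# Illustrative strengths: an observed certainty and a communication of credibility 0.5.
strength = {"~have(Boat)": 1.0, "inFlight(Mother,17)": 0.5}
certain, uncertain = split_base_beliefs(strength.keys(), strength)
print(len(certain), sorted(uncertain))
```

The size of the first set is the integer part of CtxPrf; the second set is what UncertPrf is computed over.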
WTR does not impose any particular definition for function UncertPrf; however, certain conditions are postulated. Given a context, $\beta\gamma$, believed by an agent, ag, the definition of UncertPrf should be such that the following conditions hold:
1) When $Uncert(\beta\gamma, ag) = \emptyset$, then $UncertPrf(\beta\gamma, ag) = 0$
2) The value of $UncertPrf(\beta\gamma, ag)$ increases when there is one more belief in $Uncert(\beta\gamma, ag)$ or when the strength of one of the beliefs in $Uncert(\beta\gamma, ag)$ increases (while remaining an uncertain belief).
These postulates impose merely intuitive properties, respectively: 1) When there are no uncertain base beliefs, the preference conveyed by this component is the lowest (zero); 2) As mentioned, the agent prefers beliefs that have a greater degree of certainty over beliefs that have a lower degree, and prefers to keep a larger number of beliefs over keeping a smaller number (according to the minimal change principle).
Function LkbPrf is based on the goals believed to be achieved and on the goals believed not to be achieved, determined by functions Achv and NotAchv, respectively.
Definition 9: When $\beta\gamma$ is the context believed by agent ag (where $\gamma$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of goals that ag believes to be achieved is given by $Achv(\beta\gamma, ag)$, and the set of goals ag believes not to be achieved is given by $NotAchv(\beta\gamma, ag)$, defined as follows:
$$Achv(\beta\gamma, ag) = \{g \in Goals(ag) : GDesc(g) \in \gamma\};$$
$$NotAchv(\beta\gamma, ag) = \{g \in Goals(ag) : \neg GDesc(g) \in BS(\beta\gamma, ag)\}.$$
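Definition 9 can be sketched as two set comprehensions. The encoding is assumed: goals map to their descriptions $GDesc(g)$, negation is a leading `~`, and the belief space is passed in explicitly (in this example it equals the context, as there are no derivations). The values mirror the second candidate context of Scenario 2 in Section VII-B.

```python
def neg(phi):
    """Negation on string formulas (assumed encoding: a leading '~')."""
    return phi[1:] if phi.startswith("~") else "~" + phi


def achv(goals, wishful_beliefs):
    """Achv: goals whose description is a believed wishful thought (in gamma)."""
    return {g for g, desc in goals.items() if desc in wishful_beliefs}


def not_achv(goals, belief_space):
    """NotAchv: goals whose negated description is in the belief space."""
    return {g for g, desc in goals.items() if neg(desc) in belief_space}


goals = {"g_bt": "have(Boat)", "g_al": "alive(Mother)"}  # goal -> GDesc(goal)
bs = {"~have(Boat)", "alive(Mother)", "inFlight(Mother,17)"}
gamma = {"alive(Mother)"}
print(sorted(achv(goals, gamma)), sorted(not_achv(goals, bs)))
```

Note the asymmetry in the definition: achievement is read from the wishful beliefs $\gamma$, whereas non-achievement is read from the whole belief space.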
WTR does not impose any particular definition for function LkbPrf; however, certain conditions are postulated. Given a context, $\beta\gamma$, believed by an agent, ag, the definition of LkbPrf should be such that the following conditions hold:
1) When $Achv(\beta\gamma, ag) = NotAchv(\beta\gamma, ag) = \emptyset$, then $LkbPrf(\beta\gamma, ag) = 0$
2) $LkbPrf(\beta\gamma, ag)$ increases when there is one more goal in $Achv(\beta\gamma, ag)$, or when a goal in $Achv(\beta\gamma, ag)$ has its importance increased or is believed with greater strength
3) $LkbPrf(\beta\gamma, ag)$ decreases when there is one more goal in $NotAchv(\beta\gamma, ag)$, or when a goal in $NotAchv(\beta\gamma, ag)$ has its importance increased or its negation is believed with greater strength
4) The more important the goal, the greater the impact (on likeability) of changing the strength of belief in its achievement or its negation.
These postulates impose merely intuitive properties, respectively: 1) Since likeability, in WTR, is defined in terms of goal
satisfaction, having no beliefs regarding the achievement of
goals presents neither positive nor negative likeability; 2)
Believing that more goals are achieved, believing that more
important goals are achieved, or believing more strongly that
goals are achieved, are all factors that increase likeability; 3)
Conversely, believing that more goals are not achieved, believing that more important goals are not achieved, or believing
more strongly that goals are not achieved, are all factors that
decrease likeability; 4) Changing the strength of a belief about a
goal achievement has a greater impact on likeability when that
goal is more important.
C. The Believed Context
In Section V-A we explain how WTR determines the set of candidate contexts, and in Section V-B we explain how WTR can associate a value of preference with each of those contexts. The context that is believed by the agent is now chosen as the candidate context with the highest value of preference. In other words, if $\beta_0$ and $\gamma_0$ are, respectively, agent ag’s collected data and wishful thoughts, ag’s believed context is $\beta\gamma$, as determined by (4).
$$\beta\gamma = \arg\max_{\Psi \in Cand(\beta_0, \gamma_0, ag)} CtxPrf(\Psi, ag). \qquad (4)$$
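Equation (4) is an argmax over the candidate contexts. In the hypothetical helper below, the preference function is passed in; for the example, the preferences are simply the two values quoted for Scenario 2 in Section VII-B, used as a lookup table. The text does not say how ties are broken, so the sketch just inherits Python’s `max` behavior of returning the first maximal candidate.

```python
from typing import Callable, FrozenSet, Iterable


def believed_context(
    candidates: Iterable[FrozenSet[str]],
    ctx_prf: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Equation (4): pick the candidate context with the highest preference."""
    cands = list(candidates)
    if not cands:
        # The empty set is consistent, so a maximal consistent subset always exists.
        raise ValueError("Cand(beta0, gamma0, ag) should never be empty")
    # max() keeps the first maximal element; tie-breaking is unspecified in the text.
    return max(cands, key=ctx_prf)


# Preference values quoted for Scenario 2 in Section VII-B.
prefs = {
    frozenset({"have(Boat)", "alive(Mother)", "inFlight(Mother,17)"}): 0.775,
    frozenset({"~have(Boat)", "alive(Mother)", "inFlight(Mother,17)"}): 1.749,
}
print(sorted(believed_context(prefs, prefs.__getitem__)))
```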
VI. Degrees of Belief
As explained in the previous sections, WTR considers a broad
notion of belief. Tendencies/inclinations to believe are represented as weak beliefs, that is, beliefs with a very low degree of
certainty (causal strength). Typical examples of weak beliefs, in
WTR, are those that originate from the existence of goals
(active effects of wishful thinking).
Clearly this does not correspond to the commonsense notion
of belief. If I have the goal of having a boat and somehow do not
have any information confirming or disconfirming that I have
one, I do not automatically believe that I have one just because I
would like to. In this section we explain how a belief in WTR
may or may not represent a belief for commonsense.
Labeling a piece of data as a “belief” corresponds to saying that,
for a certain purpose/aim, one will rely on the fact that that piece
of data is true. As Paglieri puts it, “beliefs are data accepted as reliable
(…) considered as ‘safe ground’ for reasoning and action” [17]. In
other words, for commonsense, a belief is only a belief if one holds
it with some minimum degree of certainty that makes it reliable
enough for reasoning and action. This degree of certainty varies,
depending on the particular reasoning/action (henceforth referred
to as aim) to which the belief is relevant. Put simply, belief is a relative notion, in the sense that a piece of data may be a belief for
some aim, and may not be a belief for another aim.
We recall that, in WTR, belief certainty is modeled with
function CauStr (defined in Section V-B). Hence, given the
appropriate thresholds that depend on the particular aims, this
relative notion of belief is modeled using a straightforward
approach. If $\beta\gamma$ is the context believed by agent ag, and $t_{aim} \in [0, 1]$ is the threshold defined by some aim (aim) that is dependent on a belief, $\varphi \in BS(\beta\gamma, ag)$, then $\varphi$ is also a belief for the purposes of aim if and only if (5) holds.
$$CauStr(\varphi, \beta\gamma, ag) \geq t_{aim}. \qquad (5)$$
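Equation (5) is a simple threshold test. The sketch assumes thresholds in $[0, 1]$; the two thresholds and the strengths looped over are illustrative values only (they happen to match those used in Section VII-B), and the function name is hypothetical.

```python
def is_belief_for(strength, threshold):
    """Equation (5): phi counts as a belief for an aim iff CauStr >= t_aim."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("t_aim must lie in [0, 1]")
    return strength >= threshold


# Illustrative thresholds for a low-stakes aim and a high-stakes aim.
t_casual, t_costly = 0.2, 0.9
for s in (0.086, 0.5, 0.887, 1.0):
    print(s, is_belief_for(s, t_casual), is_belief_for(s, t_costly))
```

The same WTR belief can therefore pass for one aim and fail for another, which is the relative notion of belief described above.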
The determination of the appropriate threshold for a given
aim is out of the scope of WTR. Intuitively, these thresholds
should be associated with an assessment of: a) The potential losses
when the aim is followed, based on a false belief; b) The potential gains when the aim is followed, based on a true belief; c)
The difficulty in acquiring more information that supports or
disconfirms the belief.
VII. Testing WTR
In the previous sections we have presented the WTR model. In
this section we test this model, by observing the belief states of
an agent that uses WTR.
In Section VII-A, we present a concrete instantiation of the
WTR model. In Section VII-B, we test this instantiation in a
sequence of scenarios that are representative of the most relevant types of possible situations.
A. An Instantiation of WTR
As we have seen, the definition of some of the functions used
by WTR is not imposed by the model; therefore, WTR may
have different instantiations. More concretely, functions CauStr,
SuppStr, UncertPrf and LkbPrf are left undefined, though their
definitions must follow the postulates presented in Section V-B.
In this section we make an instantiation of WTR by presenting
a definition for each of these four functions.
In accordance with the postulates presented in Section V-B,
we present a definition for function CauStr (Definition 10).
Definition 10: For any agent ag, any context $\beta\gamma$, and any belief $\varphi \in BS(\beta\gamma, ag)$:
$$CauStr(\varphi, \beta\gamma, ag) = 1 - \prod_{A \in Sups(\varphi, \beta\gamma, ag)} \big(1 - SuppStr(A, \beta\gamma, ag)\big).$$
Notice that the expression used in this function is equivalent to the expression that determines the probability of the
disjunction of independent events.
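Definition 10 is a one-liner in Python using `math.prod` (available since Python 3.8). The sketch takes the strengths conveyed by the valid supports as a list; the checks show that a support of strength 1 forces certainty and that extra or stronger supports increase the result, as the postulates of Section V-B require.

```python
from math import prod


def caus_str(support_strengths):
    """Definition 10: 1 - prod(1 - SuppStr) over the valid supports of a belief.

    Same form as the probability of a disjunction of independent events.
    """
    return 1 - prod(1 - s for s in support_strengths)


print(caus_str([0.5]), caus_str([0.5, 0.5]), caus_str([0.3, 1.0]))
```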
In accordance with the postulates presented in Section V-B,
we present a definition for function SuppStr (Definition 11).
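Definition 11: For any agent ag, any context $\beta\gamma$, any belief $\varphi \in BS(\beta\gamma, ag)$, and any support $\langle \varphi, T, \alpha \rangle \in Sups(\varphi, \beta\gamma, ag)$:
$$SuppStr(\langle \varphi, T, \alpha \rangle, \beta\gamma, ag) = \begin{cases} 1, & \text{if } T = Obs; \\ GImp(g)^{(1 - wt(ag))/wt(ag)}, & \text{if } T = WT, \text{ where } \varphi = GDesc(g); \\ \prod_{\varphi_i \in \alpha} CauStr(\varphi_i, \beta\gamma \setminus \{\varphi\}, ag), & \text{if } T = Der; \\ Cred(ag, T), & \text{if } T \in N. \end{cases}$$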
According to this definition, an observation support conveys a strength of 1, making every observed belief an absolute certainty. With respect to communication supports, the conveyed strength equals the credibility attributed to the communicating agent (the agent named $T$).
The expression used to determine the strength conveyed by
a wishful thinking support increases with the importance of
the corresponding goal and with the agent’s wishful thinking
coefficient, in a way that: a) The limit of the conveyed strength,
as the coefficient approaches 1, is 1; b) The limit of the conveyed strength, as the coefficient approaches 0, is 0; c) When
the coefficient is 0.5 the conveyed strength equals the goal
importance. We recall that when the coefficient is 0 there are
no wishful thinking supports, as explained in Section IV.
Finally, the expression used to determine the strength conveyed by a derivation support is the product of the causal strengths of the hypotheses in its origin set ($\alpha$). This ensures that the result decreases with the number of uncertainties in $\alpha$ and with their degree of uncertainty (i.e., the distance between their causal strength and 1). Notice that, to determine the causal strength of the beliefs in $\alpha$, we consider that the believed context does not include $\varphi$, to disregard any existing cyclical derivations.
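A sketch of Definition 11 with all the quantities it needs passed in explicitly; the keyword names are hypothetical, and any origin other than `"Obs"`, `"WT"` and `"Der"` is treated as the name of a communicating agent. For `"Der"`, the caller is assumed to have already computed the strengths of the origin-set hypotheses in $\beta\gamma \setminus \{\varphi\}$.

```python
from math import prod


def supp_str(origin, *, goal_imp=None, wt=None, origin_strengths=(), cred=None):
    """Definition 11.

    origin: "Obs", "WT", "Der", or any other string (treated as an agent name).
    origin_strengths: for "Der", causal strengths of the origin-set hypotheses,
        computed in the context without the derived formula.
    """
    if origin == "Obs":
        return 1.0
    if origin == "WT":
        if wt is None or not 0 < wt <= 1:
            raise ValueError("WT supports only exist when wt(ag) > 0")
        return goal_imp ** ((1 - wt) / wt)
    if origin == "Der":
        return prod(origin_strengths)  # empty origin set (tautology) gives 1
    return cred  # communication from the agent named `origin`


print(round(supp_str("WT", goal_imp=0.35, wt=0.3), 3))
print(round(supp_str("WT", goal_imp=0.95, wt=0.3), 3))
```

With $wt(ag) = 0.3$ and importances 0.35 and 0.95, the wishful thinking supports convey strengths of about 0.086 and 0.887, which are the values reported for Scenario 1 in Section VII-B.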
In accordance with the postulates presented in Section V-B,
we present a definition for function UncertPrf (Definition 12).
Definition 12: For any agent ag and any context $\beta\gamma$:
$$UncertPrf(\beta\gamma, ag) = \sum_{\varphi \in Uncert(\beta\gamma, ag)} CauStr(\varphi, \beta\gamma, ag).$$
Notice that the expression used in this function simply sums the causal strength of the beliefs in $Uncert(\beta\gamma, ag)$ (i.e., of the uncertain base beliefs of $\beta\gamma$). The result of this sum is the preference represented by uncertain base beliefs.
TABLE 1 Table of definitions, indexed by reference number.
| REF. | DEFINITION |
|---|---|
| 1 | When $\Psi$ is the context believed by agent ag, the set of valid supports for a formula, $\varphi$, is given by $Sups(\varphi, \Psi, ag)$, defined as: $Sups(\varphi, \Psi, ag) = \{A \in KB(ag) : form(A) = \varphi \wedge os(A) \subseteq \Psi\}$. |
| 2 | When $\Psi$ is the context believed by agent ag, ag’s belief space (i.e., the set with ag’s beliefs) is given by $BS(\Psi, ag)$, defined as: $BS(\Psi, ag) = \{\varphi : Sups(\varphi, \Psi, ag) \neq \emptyset\}$. |
| 3 | When $\Psi$ is the context believed by agent ag, we say that $\Psi$ is consistent, as far as ag knows, if and only if $Cons(\Psi, ag)$ holds, where predicate Cons satisfies the following condition: $Cons(\Psi, ag) \Leftrightarrow \forall \varphi \in BS(\Psi, ag) : \neg\varphi \notin BS(\Psi, ag)$. |
| 4 | If $\beta_0$ and $\gamma_0$ are, respectively, the collected data and wishful thoughts of an agent, ag, the set of candidate contexts is given by $Cand(\beta_0, \gamma_0, ag)$, defined as: $Cand(\beta_0, \gamma_0, ag) = \{\Psi \subseteq \beta_0 \cup \gamma_0 : Cons(\Psi, ag) \wedge \neg\exists \Psi' \subseteq \beta_0 \cup \gamma_0 : (Cons(\Psi', ag) \wedge \Psi \subset \Psi')\}$. |
| 5 | Given agent ag and context $\beta\gamma$ (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), function CtxPrf is defined as follows: $CtxPrf(\beta\gamma, ag) = \#\{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) = 1\} + LessSigPrf(\beta\gamma, ag)$. |
| 6 | Given agent ag and context $\beta\gamma$, function LessSigPrf is defined as follows: $LessSigPrf(\beta\gamma, ag) = \frac{LSP(\beta\gamma, ag)}{1 + 2 \times \lvert LSP(\beta\gamma, ag) \rvert} + 0.5$. |
| 7 | Given agent ag and context $\beta\gamma$, function LSP is defined as follows: $LSP(\beta\gamma, ag) = (1 - wt(ag)) \times UncertPrf(\beta\gamma, ag) + wt(ag) \times LkbPrf(\beta\gamma, ag)$. |
| 8 | When $\beta\gamma$ is the context believed by agent ag (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of uncertain base beliefs in $\beta\gamma$ is given by $Uncert(\beta\gamma, ag)$, defined as follows: $Uncert(\beta\gamma, ag) = \{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) \neq 1\}$. |
| 9 | When $\beta\gamma$ is the context believed by agent ag (where $\gamma$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of goals that ag believes to be achieved is given by $Achv(\beta\gamma, ag)$, and the set of goals ag believes not to be achieved is given by $NotAchv(\beta\gamma, ag)$, defined as follows: $Achv(\beta\gamma, ag) = \{g \in Goals(ag) : GDesc(g) \in \gamma\}$; $NotAchv(\beta\gamma, ag) = \{g \in Goals(ag) : \neg GDesc(g) \in BS(\beta\gamma, ag)\}$. |
| 10 | For any agent ag, any context $\beta\gamma$, and any belief $\varphi \in BS(\beta\gamma, ag)$: $CauStr(\varphi, \beta\gamma, ag) = 1 - \prod_{A \in Sups(\varphi, \beta\gamma, ag)} \big(1 - SuppStr(A, \beta\gamma, ag)\big)$. |
| 11 | For any agent ag, any context $\beta\gamma$, any belief $\varphi \in BS(\beta\gamma, ag)$, and any support $\langle \varphi, T, \alpha \rangle \in Sups(\varphi, \beta\gamma, ag)$: $SuppStr(\langle \varphi, T, \alpha \rangle, \beta\gamma, ag) = \begin{cases} 1, & \text{if } T = Obs; \\ GImp(g)^{(1 - wt(ag))/wt(ag)}, & \text{if } T = WT, \text{ where } \varphi = GDesc(g); \\ \prod_{\varphi_i \in \alpha} CauStr(\varphi_i, \beta\gamma \setminus \{\varphi\}, ag), & \text{if } T = Der; \\ Cred(ag, T), & \text{if } T \in N. \end{cases}$ |
| 12 | For any agent ag and any context $\beta\gamma$: $UncertPrf(\beta\gamma, ag) = \sum_{\varphi \in Uncert(\beta\gamma, ag)} CauStr(\varphi, \beta\gamma, ag)$. |
| 13 | For any agent ag and any context $\beta\gamma$: $LkbPrf(\beta\gamma, ag) = \big(\sum_{g \in Achv(\beta\gamma, ag)} CauStr(GDesc(g), \beta\gamma, ag) \times GImp(g)\big) - \sum_{g \in NotAchv(\beta\gamma, ag)} CauStr(\neg GDesc(g), \beta\gamma, ag) \times GImp(g)$. |
In accordance with the postulates presented in Section V-B, we present a definition for function LkbPrf (Definition 13).
Definition 13: For any agent ag and any context $\beta\gamma$:
$$LkbPrf(\beta\gamma, ag) = \Big(\sum_{g \in Achv(\beta\gamma, ag)} CauStr(GDesc(g), \beta\gamma, ag) \times GImp(g)\Big) - \sum_{g \in NotAchv(\beta\gamma, ag)} CauStr(\neg GDesc(g), \beta\gamma, ag) \times GImp(g).$$
Notice that the expression used in this function simply adds a term for every belief in the achievement of a goal, and subtracts a term for every belief in the negation of a goal achievement. Each of these terms consists of the strength of the corresponding belief multiplied by the importance of the corresponding goal.
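Definitions 12 and 13 reduce to weighted sums. In this sketch (hypothetical function names), the caller supplies (strength, importance) pairs for the goals in $Achv(\beta\gamma, ag)$ and $NotAchv(\beta\gamma, ag)$; the example values approximate the second candidate context of Scenario 2 in Section VII-B: a wishful belief of strength about 0.887 for a goal of importance 0.95, and an observed certainty negating a goal of importance 0.35.

```python
def uncert_prf(uncertain_strengths):
    """Definition 12: sum of the causal strengths of the uncertain base beliefs."""
    return sum(uncertain_strengths)


def lkb_prf(achieved, not_achieved):
    """Definition 13.

    achieved: (CauStr(GDesc(g)), GImp(g)) pairs for g in Achv.
    not_achieved: (CauStr(~GDesc(g)), GImp(g)) pairs for g in NotAchv.
    """
    gain = sum(strength * imp for strength, imp in achieved)
    loss = sum(strength * imp for strength, imp in not_achieved)
    return gain - loss


# Approximate values from Scenario 2, context {~phi_bt, phi_al, phi_17}.
print(round(uncert_prf([0.5]), 3), round(lkb_prf([(0.887, 0.95)], [(1.0, 0.35)]), 3))
```

Both functions return 0 for empty inputs, which is what the first postulate for each of UncertPrf and LkbPrf requires.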
B. Example Scenarios
In this section we show the behavior of an agent that uses
WTR, in a sequence of scenarios that capture the most relevant types of situations, with respect to WTR. The concrete
instantiation of WTR used in this section is the one presented in the previous section (Section VII-A). In Table 1 we review all the definitions presented in this article.
Favoring realism too much may lead to strong and/or long negative affective states, but favoring satisfaction too much leads to losing track of reality, and in both cases individual performance is impaired.
Each scenario corresponds to a particular stage of the agent’s collected data, wishful thoughts, and other parameters. In these scenarios we analyze the agent’s belief states, as below:
❏ What is the context believed by the agent (and the resulting belief space)?
❏ What is the certainty (causal strength) with which the agent holds a given belief? Is it enough to be a belief for the purpose of a given aim?
We are interested in showing how WTR is capable of simulating passive and active effects of the wishful thinking phenomenon, explained in Section III.
We assume an agent, ag, that reasons using first-order logic. In other words, $L_{ag} = L_{FOL}$ and $\vdash_{ag} = \vdash_{FOL}$. We also assume that the agent is not logically omniscient and that, unless specified otherwise, the agent’s wishful thinking coefficient is $wt(ag) = 0.3$.
We refer to the agent in the female form (representing a human female). As a simplification, we use answer to denote the aim of answering any uncompromising question about a belief (e.g., “do you believe you have a boat?”). Since such answers do not involve any big commitments from the agent, we define a relatively low strength threshold for the corresponding aim, more specifically $t_{answer} = 0.2$ (see Section VI).
1) Scenario 1
To begin, we assume that the agent has not collected any data; in other words, $\beta_0 = \emptyset$. The agent has, however, two goals: to have a boat (a goal of medium-low importance), and to have her mother alive (an extremely important preservation goal). More concretely, $Goals(ag) = \{g_{bt}, g_{al}\}$, where:
❏ $\varphi_{bt} = GDesc(g_{bt}) = have(Boat)$
❏ $\varphi_{al} = GDesc(g_{al}) = alive(Mother)$
❏ $GImp(g_{bt}) = 0.35$
❏ $GImp(g_{al}) = 0.95$.
Since $\beta_0 \cup \gamma_0$ is consistent, it becomes the only candidate context, that is, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}\}\}$. Given that there is only one candidate context, it becomes the believed context. Moreover, since there are no derivations, the agent’s belief space also corresponds to that context. In other words, $\beta\gamma = \{\varphi_{bt}, \varphi_{al}\}$ and $BS(\beta\gamma, ag) = \{\varphi_{bt}, \varphi_{al}\}$.
Since there is no information contradicting the agent’s wishful thoughts (that she has a boat and that her mother is alive), they become beliefs. This phenomenon corresponds to the active effects of wishful thinking, as explained in Section III. But to what degree are these beliefs reliable? For instance, let us suppose that the agent is questioned about these two beliefs. In order to determine if these are beliefs for the purpose of aim answer, we need to compare their strength with the appropriate threshold (we recall that $t_{answer} = 0.2$), as explained in Section VI:
❏ $CauStr(\varphi_{bt}, \beta\gamma, ag) \approx 0.086$. So, have(Boat) is a very weak belief. In fact, it is not a belief for the purposes of answer (since $0.086 < 0.2$).
❏ $CauStr(\varphi_{al}, \beta\gamma, ag) \approx 0.887$. So, alive(Mother) is a relatively strong belief and a belief for the purposes of answer (since $0.887 \geq 0.2$).
So, even for the purpose of giving an answer that does not involve a big commitment (reflected in the low threshold), the agent considers herself ignorant regarding her possession of a boat. On the other hand, since the goal of having her mother alive has a very high importance, the active effects of wishful thinking lead the agent to answer that her mother is alive, even though there is no evidence supporting it.
Although it might seem strange that the agent does not know anything regarding her mother’s state, one can perhaps imagine that (for some reason) the agent has not been in contact with her mother for many years. In this sense, suppose that the agent would like to buy a new expensive TV set for her mother. We refer to this aim as TV, and we assume that $t_{TV} = 0.9$, with respect to belief in $\varphi_{al}$. Consequently, the agent would not buy the new TV (at least not at the moment) because:
❏ $CauStr(\varphi_{al}, \beta\gamma, ag) \approx 0.887$. So, although alive(Mother) is a relatively strong belief, it is not a belief for the purposes of TV (since $0.887 < 0.9$).
2) Scenario 2
Here we expand the previous scenario with some collected data. What is typical of active-pursuit goals (such as the goal of having a boat) is that one already knows that the goal is not achieved when that goal is set. For instance, if I set a goal of having a boat, I typically know that I do not have it because, through regular observation and memory of the events in my life, I know that I have never bought a boat and none was ever given to me. For the purposes of this scenario, we say that the agent observes she does not have a boat ($\neg\varphi_{bt}$). In addition, agent David (another agent) tells ag that her mother boarded flight number 17, though ag thinks of David as having a not too high credibility, more specifically, $Cred(ag, David) = 0.5$. We represent the information communicated by David as:
❏ $\varphi_{17} = inFlight(Mother, 17)$.
Since $\neg\varphi_{bt}$ (the negation of one of the agent’s wishful thoughts) was observed, $\beta_0 \cup \gamma_0$ is inconsistent. As a result, there are two candidate contexts, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}\}$, with the following values of preference:
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 0.775$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 1.749$.
Notice that, while believing in φ_bt adds slightly to context preference due to likeability, believing in ¬φ_bt necessarily adds more because it is an absolute certainty. As explained in Section V-B, likeability is one of the aspects that contribute to the less significant component of preference (a value in [0, 1)), while an absolute certainty conveys a preference of 1. This is why the preference of {¬φ_bt, φ_al, φ_17} is almost 1 unit above the preference of {φ_bt, φ_al, φ_17}. The same phenomenon occurs in all the following scenarios and prevents contexts that contain φ_bt from being believed (which is natural, since ¬φ_bt is an absolute certainty).
Consequently, the believed context is β^c = {¬φ_bt, φ_al, φ_17},
and the resulting belief space is BS(β^c, ag) = {¬φ_bt, φ_al, φ_17}.
Let us focus on what changed in the agent’s beliefs relative to the previous scenario. The agent now has two new beliefs: that she does not have a boat and that her mother boarded flight 17. We look at the certainty (i.e., causal strength) of these two beliefs to decide whether or not they are beliefs for the purposes of answer:
❏ CauStr(¬φ_bt, β^c, ag) = 1. So, ¬have(Boat) is an absolute certainty and, obviously, a belief for the purposes of answer (since 1 ≥ 0.2).
❏ CauStr(φ_17, β^c, ag) = 0.5. So, inFlight(Mother, 17) is a belief of intermediate strength and a belief for the purposes of answer (since 0.5 ≥ 0.2).
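The answer test in these bullets is a plain threshold comparison. The sketch below assumes that a formula qualifies as a belief for an aim exactly when its causal strength is at least that aim's threshold. The function name and string labels are hypothetical. The 0.9 threshold for the TV aim is read off the comparison 0.887 < 0.9 in Scenario 1.

```python
# Hypothetical sketch: a formula counts as a belief for an aim when its causal
# strength is at least the aim's threshold (names and labels are illustrative).
TAU_ANSWER = 0.2  # threshold for aim "answer", as stated in the text
TAU_TV = 0.9      # threshold implied by the Scenario 1 comparison 0.887 < 0.9


def is_belief_for_aim(causal_strength: float, threshold: float) -> bool:
    return causal_strength >= threshold


# Causal strengths reported for Scenario 2.
scenario2 = {"not have(Boat)": 1.0, "inFlight(Mother, 17)": 0.5}
answers = {f: is_belief_for_aim(s, TAU_ANSWER) for f, s in scenario2.items()}
print(answers)
print(is_belief_for_aim(0.887, TAU_TV))  # alive(Mother) in Scenario 1 vs. aim TV
```

The same belief can therefore pass one aim's threshold and fail another's. This is why alive(Mother) is usable for answering but not for the TV decision.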
So, regarding these two beliefs, the agent answers that she
does not have a boat and that her mother boarded flight 17. In
other words, the agent believes her observation and David’s
communication, and the strength of these beliefs is far greater
than that of the belief of having a boat in the previous scenario.
3) Scenario 3
Continuing from the previous scenario, the agent now watches a news report announcing that flight number 17 crashed, leaving no survivors. The agent quickly concludes that this information, combined with what David has told her (φ_17), implies that her mother has died. Assume that Reporter is the agent who communicated the crash, whom our agent (ag) finds quite credible; more specifically, Cred(ag, Reporter) = 0.8. We represent (part of) the news report about the plane crash as:
❏ φ_cr = ∀(x) inFlight(x, 17) → ¬alive(x).
We consider that the agent’s knowledge base contains a derivation support, ⟨¬φ_al, Der, {φ_17, φ_cr}⟩, because the agent was able to conclude that {φ_17, φ_cr} ⊨_FOL ¬φ_al.
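A derivation support like this one extends a context into a belief space. That step can be pictured as a forward-chaining closure. This is an illustration under stated assumptions, not WTR's formal definition: a derivation support is encoded as a hypothetical (conclusion, premises) pair, formulas are plain strings with `~` standing for ¬, and the helper `belief_space` is invented for the example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# (conclusion, premises): hypothetical encoding of a derivation support.
DerivationSupport = Tuple[str, FrozenSet[str]]


def belief_space(context: Iterable[str],
                 supports: Iterable[DerivationSupport]) -> Set[str]:
    """Add every supported conclusion whose premises are all believed (to a fixpoint)."""
    supports = list(supports)
    beliefs = set(context)
    changed = True
    while changed:
        changed = False
        for conclusion, premises in supports:
            if premises <= beliefs and conclusion not in beliefs:
                beliefs.add(conclusion)
                changed = True
    return beliefs


# {phi_17, phi_cr} |= ~phi_al, encoded as a single support.
supports = [("~phi_al", frozenset({"phi_17", "phi_cr"}))]
print(sorted(belief_space({"~phi_bt", "phi_17", "phi_cr"}, supports)))
```

A context that drops either premise, for example {¬φ_bt, φ_al, φ_cr}, gains nothing from this support. Its belief space then equals the context itself.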
Now there is a second inconsistency in ⟨β_0, γ_0⟩ because φ_17 and φ_cr imply believing in ¬φ_al (hence, are inconsistent with φ_al). As a result, there are six candidate contexts, Cand(β_0, γ_0, ag) = {{φ_bt, φ_al, φ_17}, {φ_bt, φ_al, φ_cr}, {φ_bt, φ_17, φ_cr}, {¬φ_bt, φ_al, φ_17}, {¬φ_bt, φ_al, φ_cr}, {¬φ_bt, φ_17, φ_cr}}, with the following values of preference:
❏ CtxPrf({φ_bt, φ_al, φ_17}, ag) ≈ 0.775
❏ CtxPrf({φ_bt, φ_al, φ_cr}, ag) ≈ 0.811
❏ CtxPrf({φ_bt, φ_17, φ_cr}, ag) ≈ 0.808
❏ CtxPrf({¬φ_bt, φ_al, φ_17}, ag) ≈ 1.749
❏ CtxPrf({¬φ_bt, φ_al, φ_cr}, ag) ≈ 1.793
❏ CtxPrf({¬φ_bt, φ_17, φ_cr}, ag) ≈ 1.790.
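In this example, the six candidates listed above coincide with the maximal consistent subsets of the hypotheses. The brute-force sketch below assumes that inconsistency is supplied as explicit minimal inconsistent sets ("nogoods") rather than checked by a first-order prover. The string encoding and function names are hypothetical. The enumeration is exponential in the number of hypotheses, so it only suits toy examples like this one.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def candidate_contexts(hypotheses: Set[str],
                       nogoods: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Return the maximal subsets of `hypotheses` that contain no nogood."""
    def consistent(s: FrozenSet[str]) -> bool:
        return not any(ng <= s for ng in nogoods)

    items = sorted(hypotheses)
    consistent_sets = [frozenset(c)
                       for r in range(len(items) + 1)
                       for c in combinations(items, r)
                       if consistent(frozenset(c))]
    # Keep a consistent set only if no other consistent set strictly contains it.
    return [s for s in consistent_sets
            if not any(s < t for t in consistent_sets)]


# Scenario 3 (hypothetical string encoding; "~" stands for negation).
hyps = {"phi_bt", "~phi_bt", "phi_al", "phi_17", "phi_cr"}
nogoods = [frozenset({"phi_bt", "~phi_bt"}),
           frozenset({"phi_al", "phi_17", "phi_cr"})]  # {phi_17, phi_cr} entail ~phi_al
for ctx in candidate_contexts(hyps, nogoods):
    print(sorted(ctx))
```

Adding Susan's hypothesis "~phi_17" together with the nogood {phi_17, ~phi_17} yields six candidates again, matching the list given for Scenario 5.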
Consequently, the believed context is β^c = {¬φ_bt, φ_al, φ_cr},
and the resulting belief space is BS(β^c, ag) = {¬φ_bt, φ_al, φ_cr}.
Notice that the agent does not believe her mother
boarded flight 17 and, instead, believes that her mother is
alive, despite having no evidence to support it (active effects
of wishful thinking). The agent is in denial concerning her
mother’s death, mainly due to the very high goal importance
of having her mother alive (0.95) and to the low credibility
of David (who communicated φ_17). Notice that denial was
achieved simply by rejecting belief in the “weakest link,”
namely David’s communication, φ_17 (instead of the news
report, φ_cr, which has a higher causal strength). Note, however,
that the personality of the agent (more concretely, the agent’s
wishful thinking coefficient) is also responsible for
this denial.
4) Scenario 4
We now suppose that another agent, Bruno, tells ag that her
mother boarded flight 17. Our agent considers Bruno quite
credible, more specifically, Cred(ag, Bruno) = 0.8.
The candidate contexts are, as expected, the same as those
in Scenario 3, but with different values of preference:
❏ CtxPrf({φ_bt, φ_al, φ_17}, ag) ≈ 0.820
❏ CtxPrf({φ_bt, φ_al, φ_cr}, ag) ≈ 0.811
❏ CtxPrf({φ_bt, φ_17, φ_cr}, ag) ≈ 0.833
❏ CtxPrf({¬φ_bt, φ_al, φ_17}, ag) ≈ 1.804
❏ CtxPrf({¬φ_bt, φ_al, φ_cr}, ag) ≈ 1.793
❏ CtxPrf({¬φ_bt, φ_17, φ_cr}, ag) ≈ 1.819.
Consequently, the believed context is β^c = {¬φ_bt, φ_17, φ_cr}, and the resulting belief space is BS(β^c, ag) = {¬φ_bt, φ_17, φ_cr, ¬φ_al}.
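The switch from denial (Scenario 3) to acceptance (Scenario 4) follows from picking the candidate with the highest CtxPrf. The sketch below takes the preference values listed in the two scenarios as given; how CtxPrf itself is computed is not reproduced. The string encoding and helper names are hypothetical.

```python
from typing import Dict, FrozenSet


def believed_context(prefs: Dict[FrozenSet[str], float]) -> FrozenSet[str]:
    """Pick the candidate context with the highest preference value."""
    return max(prefs, key=prefs.get)


def ctx(*formulas: str) -> FrozenSet[str]:
    return frozenset(formulas)


scenario3 = {
    ctx("phi_bt", "phi_al", "phi_17"): 0.775,
    ctx("phi_bt", "phi_al", "phi_cr"): 0.811,
    ctx("phi_bt", "phi_17", "phi_cr"): 0.808,
    ctx("~phi_bt", "phi_al", "phi_17"): 1.749,
    ctx("~phi_bt", "phi_al", "phi_cr"): 1.793,
    ctx("~phi_bt", "phi_17", "phi_cr"): 1.790,
}
scenario4 = {
    ctx("phi_bt", "phi_al", "phi_17"): 0.820,
    ctx("phi_bt", "phi_al", "phi_cr"): 0.811,
    ctx("phi_bt", "phi_17", "phi_cr"): 0.833,
    ctx("~phi_bt", "phi_al", "phi_17"): 1.804,
    ctx("~phi_bt", "phi_al", "phi_cr"): 1.793,
    ctx("~phi_bt", "phi_17", "phi_cr"): 1.819,
}
print(sorted(believed_context(scenario3)))  # denial: phi_al kept, phi_17 rejected
print(sorted(believed_context(scenario4)))  # no denial: phi_17 and phi_cr kept
```

Only the contexts with the uncertain testimony change value between the two tables. The 0.029 gain from Bruno's support is enough to move the maximum.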
As explained in Scenario 3, one of the reasons why the
agent was able to be in denial was that David has a low credibility and his communication was the only support for φ_17.
Rejecting the fact that her mother boarded flight 17 was the
easiest way for the agent to deny her mother’s death. In this
scenario, this fact is no longer easy to
reject (the causal strength of φ_17 increased from 0.5 to 0.9)
because there is a second agent, Bruno, claiming that it is true
and, moreover, he is considered quite credible. Consequently,
our agent (ag) is no longer in denial, as shown above. If Bruno
had communicated, for example, φ_cr instead of φ_17,
denial would still occur because the agent would still be able to
easily reject φ_17.
5) Scenario 5
Continuing from the previous scenario, we now suppose that
agent Susan tells ag that her (ag’s) mother did not board flight
17. The credibility that our agent attributes to Susan is 0.6 (in
other words, Cred(ag, Susan) = 0.6).
Now, the six candidate contexts must take into account
the new hypothesis (¬φ_17); hence, Cand(β_0, γ_0, ag) =
{{φ_bt, φ_al, φ_17}, {φ_bt, φ_al, ¬φ_17, φ_cr}, {φ_bt, φ_17, φ_cr}, {¬φ_bt, φ_al, φ_17}, {¬φ_bt, φ_al, ¬φ_17, φ_cr}, {¬φ_bt, φ_17, φ_cr}}. The preference
for each of these contexts is:
❏ CtxPrf({φ_bt, φ_al, φ_17}, ag) ≈ 0.820
❏ CtxPrf({φ_bt, φ_al, ¬φ_17, φ_cr}, ag) ≈ 0.856
❏ CtxPrf({φ_bt, φ_17, φ_cr}, ag) ≈ 0.833
❏ CtxPrf({¬φ_bt, φ_al, φ_17}, ag) ≈ 1.804
❏ CtxPrf({¬φ_bt, φ_al, ¬φ_17, φ_cr}, ag) ≈ 1.846
❏ CtxPrf({¬φ_bt, φ_17, φ_cr}, ag) ≈ 1.819.
Consequently, the believed context is β^c = {¬φ_bt, φ_al, ¬φ_17, φ_cr}, and the resulting belief space is BS(β^c, ag) = {¬φ_bt, φ_al, ¬φ_17, φ_cr}.
Susan’s communication was “just what the agent wanted to hear” to be able to deny, once again, her mother’s death. The new hypothesis is that the agent’s mother did not board flight 17 (¬φ_17), so φ_17 once again becomes easy to reject.
This result illustrates an important aspect of WTR. Notice that the fact that the agent’s mother boarded flight 17 is supported by communications from David and Bruno, and:
❏ Without wishful thinking, the agent would never reject this belief because it is the word of both David and Bruno against the word of Susan. Even Bruno alone is considered more credible than Susan (the causal strength of φ_17 is 0.9, while the causal strength of ¬φ_17 is 0.6).
❏ Wishful thinking alone (in this situation) is also not enough to allow the agent to reject such a strong belief (φ_17), as can be concluded from the results of Scenario 4.
Therefore, the denial that occurs in this scenario was only possible because of a combination of rational and affective factors (i.e., the support from Susan’s communication combined with the agent’s desire to believe that her mother is alive). This combined effect captures, in WTR, the passive effects of wishful thinking, explained in Section III.
WTR is guided by an order (or orders) among contexts that does not necessarily correspond to any order (or orders) among hypotheses. For instance, suppose that, because ¬φ_17 is in the believed context and φ_17 is not (in this scenario), we assume an order among hypotheses where ¬φ_17 is preferred to φ_17. Notice that this preference is a consequence of the agent’s belief in φ_cr (which makes φ_17 “unwanted,” given her goals). If, for some reason, the agent rejects the belief in φ_cr, the resulting order between ¬φ_17 and φ_17 is reversed.
VIII. Comparison with Related Work
In the previous sections we have presented the WTR framework and shown how it can be used to manage an agent’s beliefs according to the aims discussed in Section I. In this section we compare WTR with two other approaches that are related to some extent.
In [17] and [18], Fabio Paglieri presents an approach to belief revision in the context of cognitive agents: Data-oriented Belief Revision (DBR, for short). More precisely, DBR is a model of cognitive agents’ epistemic dynamics (of which belief revision is a part). The model builds upon the distinction between data (information stored in the agent’s mind) and beliefs (information the agent considers reliable for further reasoning and direct action).
Both DBR and WTR are approaches to belief dynamics for autonomous agents. They both produce non-prioritized belief revision that is biased by the likeability of beliefs. Also, both models measure this likeability according to the satisfaction of goals; affective states are not modeled explicitly, but rather implicitly, through the preference conveyed by likeability.
One of the main differences between the two models is that DBR is belief-oriented while WTR is context-oriented, and this difference is especially important in the determination of likeability. As explained in Section I, a context-oriented approach is more adequate, since the preference of a belief may depend on certain other beliefs being kept or abandoned. This is most obvious when it comes to the preference conveyed by likeability because, for instance, a belief may be desired/undesired only due to the presence of another belief.
Another important difference between the two approaches is that, in WTR, we model active effects of wishful thinking, an aspect that is not modeled in DBR. More concretely, in DBR beliefs originate from data which, in turn, come from the outside world. The only internally generated beliefs are those that originate from other beliefs, through inference. In WTR, however, beliefs can also originate from goals, by means of wishful thinking supports. Any inconsistencies between these wishful thoughts and collected data trigger belief revision, in the same way that inconsistencies among collected data do.
In [7], Jonathan Gratch and Stacy Marsella start by presenting a framework that describes appraisal and coping as two strongly related operations. As the authors put it, “Appraisal characterizes the relationship between a person and their physical and social environment (...) and coping recruits resources to repair or maintain this relationship” [7].
This view is based on Smith and Lazarus’ cognitive motivational-emotive system [21]. Gratch and Marsella describe this system’s architecture, highlighting that the consequence of appraisal (the action tendencies, the affect and the physiological responses) triggers coping which, in turn, acts on the antecedents of appraisal. These antecedents may be the environment, in the case of problem-focused coping, or the evaluation of the situation, in the case of emotion-focused coping. Following the guidelines of this framework, the authors implement a specific computational model: EMA [7], [13] (named after Lazarus’ book “Emotion and Adaptation” [10]).
Note that coping commonly refers to how the individual deals with strong negative emotions. Although the authors
write “we view it as a general response to all kinds of emotions, strong and weak, negative and positive” [7, p. 287], this view does not seem to have been applied, with respect to positive emotions, to the strategy of wishful thinking in EMA. More concretely, wishful thinking is triggered only as a response to a negative appraisal and, unlike WTR, EMA does not model active effects of wishful thinking.
Another fundamental distinction is that, while WTR targets the problem of belief revision, EMA does not. WTR aims to find a consistent set of beliefs, and denial of a belief implies removing it from that set of beliefs and maintaining the relationships that exist among the remaining beliefs. In EMA, inconsistencies are allowed, each belief is associated with some probability of being true, and denial/wishful thinking consists of adjusting these probabilities in order to improve a negatively charged appraisal.
IX. Conclusions
With the aim of addressing belief revision in the context of human-like autonomous agents, we have identified the following issues concerning conventional belief revision:
❏ Why should an agent always prefer new information over its previous beliefs?
❏ How can an agent autonomously generate its own order(s) among beliefs?
❏ Can human-like preferences, in belief revision, be adequately expressed using an order (or orders) among beliefs?
To address these issues and enable the simulation of affective preferences, we propose WTR, an approach to an agent’s belief dynamics, with the following properties:
❏ Non-prioritized. New information is not necessarily believed.
❏ Autonomous. Revision is not dependent on the external definition of orders.
❏ Context-oriented. The preferred context is chosen according to an order (or orders) among contexts, instead of an order (or orders) among beliefs. As discussed in Section I, this is necessary because a belief’s resistance to change may depend on the other beliefs.
❏ Simulates wishful thinking. It simulates passive and active effects of the wishful thinking phenomenon, within the scope of goal satisfaction.
As we have shown (see Section VII-B), wishful thinking in WTR is not merely a bias in the resolution of conflicts among inconsistent data (passive effects); it can sometimes be the sole cause for having a belief, or for evoking belief revision when certain beliefs are undesirable (active effects).
Passive effects are achieved by accounting for the likeability of beliefs when measuring the preference of a context. Active effects are achieved by enabling every goal to produce a tendency to believe in its achievement (a wishful thought). In this way, any information that contradicts the achievement of a goal (i.e., any undesirable information) gives rise to an inconsistency, thus triggering belief revision. Wishful thoughts are typically too weak and are therefore abandoned or simply filtered out, but exceptions may occur, depending on various factors.
We recall that WTR addresses wishful thinking in terms of goal satisfaction. Clearly, not all of one’s desires and preferences can be reduced to goals. Consequently, WTR does not account for all forms of wishful thinking, nor does it account for the large variety of emotions that influence belief dynamics in humans. We view this work as one step toward the design of belief processes that incorporate affective phenomena and are suitable for human-like autonomous agents.
References
[1] C. E. Alchourrón, P. Gärdenfors, and D. Makinson, “On the logic of theory change: Partial meet functions for contraction and revision,” J. Symbolic Logic, vol. 50, no. 2, pp. 510–530, 1985.
[2] C. S. Carver, M. F. Scheier, and J. K. Weintraub, “Assessing coping strategies: A theoretically based approach,” J. Personality Social Psychol., vol. 56, no. 2, pp. 267–283, 1989.
[3] C. Castelfranchi, “Guarantees for autonomy in cognitive agent architecture,” in Intelligent Agents: Theories, Architectures and Languages (Lecture Notes in Artificial Intelligence 890), M. Wooldridge and N. Jennings, Eds. Berlin, Germany: Springer-Verlag, 1995, pp. 56–70.
[4] E. Fermé and S. O. Hansson, “AGM 25 years: Twenty-five years of research in belief change,” J. Philosophical Logic, vol. 40, no. 2, pp. 295–331, 2011.
[5] N. H. Frijda, Ed., The Laws of Emotion. Mahwah, NJ: Lawrence Erlbaum Associates, 2007.
[6] N. H. Frijda and B. Mesquita, “Beliefs through emotions,” in Emotions and Beliefs: How Feelings Influence Thoughts, N. H. Frijda, A. S. R. Manstead, and S. Bem, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[7] J. Gratch and S. Marsella, “A domain-independent framework for modeling emotion,” J. Cogn. Syst. Res., vol. 5, no. 4, pp. 269–306, 2004.
[8] S. O. Hansson, “Ten philosophical problems in belief revision,” J. Logic Comput., vol. 13, no. 1, pp. 37–49, 2003.
[9] G. Harman, Change in View: Principles of Reasoning. Cambridge, MA: MIT Press, 1986.
[10] R. S. Lazarus, Emotion and Adaptation. New York: Oxford Univ. Press, 1991.
[11] R. S. Lazarus and S. Folkman, Stress, Appraisal and Coping. New York: Springer, 1984.
[12] I. Levi, The Fixation of Belief and Its Undoing. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[13] S. Marsella and J. Gratch, “EMA: A computational model of appraisal dynamics,” in Proc. 18th European Meeting Cybernetics Systems Research, 2006, pp. 601–606.
[14] J. P. Martins and S. C. Shapiro, “A model for belief revision,” Artif. Intell., vol. 35, no. 1, pp. 25–79, 1988.
[15] R. R. McCrae and P. T. Costa, Jr., “Personality, coping, and coping effectiveness in an adult sample,” J. Personality, vol. 54, no. 2, pp. 385–405, 1986.
[16] A. Ortony, G. L. Clore, and A. Collins, The Cognitive Structure of Emotions. New York: Cambridge Univ. Press, 1988.
[17] F. Paglieri, “Data-oriented belief revision: Toward a unified theory of epistemic processing,” in Proc. STAIRS 2004, E. Onaindia and S. Staab, Eds. Amsterdam, The Netherlands: IOS Press, 2004, pp. 179–190.
[18] F. Paglieri, “See what you want, believe what you like: Relevance and likeability in belief dynamics,” in Proc. AISB 2005 Symp. ‘Agents That Want and Like: Motivational and Emotional Roots of Cognition and Action,’ L. Cañamero, Ed. Hatfield, U.K.: AISB, 2005, pp. 90–97.
[19] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[20] C. F. Pimentel, “Emotional reasoning in AI: Modeling some of the influences of affects on reasoning,” Ph.D. dissertation, Inst. Superior Técnico, Univ. Técnica de Lisboa, Lisbon, Portugal, Dec. 2010.
[21] C. A. Smith and R. Lazarus, “Emotion and adaptation,” in Handbook of Personality: Theory & Research, L. A. Pervin, Ed. New York: Guilford Press, 1990, pp. 609–637.
Book Review
Gouhei Tanaka
The University of Tokyo, JAPAN
Complex-Valued Neural Networks: Advances and Applications, by Akira Hirose (Wiley-IEEE Press, 2013, 320 pp.) ISBN: 978-1-118-34460-6.
Digital Object Identifier 10.1109/MCI.2013.2247895
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE
Complex-valued neural networks (CVNNs) are artificial neural networks that are based on complex numbers and complex-number arithmetic. They are particularly suited to signals and information with complex amplitude (i.e., amplitude and phase), as typically found in wave phenomena. Examples include electromagnetic waves, light waves, sonic waves, electron waves, and electroencephalograms (EEG). Over the past decade, the application fields of CVNNs have expanded considerably, together with the development of their theories and algorithms. This book covers the recent advances in CVNNs and their variants and demonstrates their applicability to a range of problems:
❏ optimization of telecommunication systems
❏ blind source separation of complex-valued signals
❏ N-bit parity problems
❏ wind prediction
❏ classification problems in the complex domain
❏ brain-computer interfaces
❏ digital predistorter design for high-power amplifiers
❏ color face image recognition.
The contents of the book include not only conventional CVNNs but also quaternion neural networks and Clifford-algebraic neural networks, which are extended neural networks utilizing hypercomplex number systems. The first-ever book on CVNNs was published ten years ago [1]. Compared with it, this book places more emphasis on the new methods for, and challenges in, establishing these hypercomplex-valued neural networks. The book reviewed here provides an excellent overview of the current trends in CVNN research for students and researchers interested in computational intelligence, and it offers up-to-date theories and applications of CVNNs for experts and practitioners. Readers can refer to the introductory textbook [2] on CVNNs for more fundamental aspects, and to the book [3] for other research topics related to CVNNs.
The entire book is organized into ten chapters. The first chapter is an overview of the methods and applications of conventional CVNNs. The nine subsequent chapters focus on different kinds of CVNNs and hypercomplex-valued neural networks. Various methods relying on CVNNs and hypercomplex-valued neural networks are developed in the individual chapters. They include learning algorithms, optimization methods, classification algorithms, system estimation methods, and system prediction methods, each obtained by extending or generalizing its counterpart in conventional artificial neural networks.
The book begins with an introduction to the theories and applications of conventional CVNNs. In the first half of this chapter, the representative application fields of CVNNs are compactly presented. The engineering applications include:
❏ antenna design
❏ beam-forming
❏ radar image processing
❏ sonic and ultrasonic processing
❏ communication signal processing
❏ image processing
❏ traffic signal control
❏ quantum computation
❏ optical information processing.
In the second half, the emphasis is placed on the difference between CVNNs and ordinary real-valued neural networks. Numerical experiments show that a feedforward layered CVNN yields better generalization ability for coherent signals than other methods do.
Chapter 2 deals with CVNNs whose adaptable parameters lie on complex manifolds. Based on differential geometric methods, efficient optimization algorithms to adapt the parameters of such CVNNs are presented and successfully applied to signal processing problems. The problems include the purely algorithmic
problem of averaging the parameters
of a pool of cooperative CVNNs,
multichannel blind deconvolution of
signals in telecommunications, and
blind source separation of complex-valued signal sources. Helpfully,
pseudocode for the learning procedures for all of these problems is
listed in this chapter.
Chapter 3 focuses on an N-dimensional vector neuron, which is a natural extension of a complex-valued neuron in two-dimensional space to N dimensions. First, relevant neuron models with high-dimensional parameters are briefly reviewed to position the N-dimensional vector neuron among them. Next, the author defines the N-dimensional vector neuron, which can represent N signals as one cluster, and shows that its decision boundary consists of N hyperplanes that intersect each other orthogonally. The generalization ability of a single N-dimensional vector neuron is demonstrated on the N-bit parity problem. Finally, the presented method is compared with other layered neural networks in terms of the number of neurons, the number of parameters, and the number of layers.
In Chapter 4, learning algorithms for feedforward and recurrent CVNNs are systematically described using Wirtinger calculus. The Wirtinger calculus, which generalizes the concept of derivatives to the complex domain, enables all the computations of well-known learning algorithms for CVNNs to be performed directly in the complex domain. For feedforward layered CVNNs, the complex gradient descent algorithm and the complex Levenberg-Marquardt algorithm are derived with the complex gradient. For recurrent CVNNs, the complex real-time recurrent learning algorithm and the complex extended Kalman filter algorithm are obtained using the Wirtinger calculus. Computer simulation results are given to verify the above four algorithms.
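A single linear complex neuron makes the Wirtinger idea concrete. For the real-valued loss L = |d - w x|^2, the Wirtinger derivative with respect to the conjugate weight is ∂L/∂w̄ = -(d - w x) x̄. Stepping against this conjugate gradient gives the familiar complex LMS update. The toy data, learning rate, and function names below are assumptions for illustration, not taken from the book.

```python
import cmath
import random


def wirtinger_grad(w: complex, x: complex, d: complex) -> complex:
    """dL/d(conj w) for L = |d - w*x|^2, treating w and conj(w) as independent."""
    e = d - w * x
    return -e * x.conjugate()


def train(samples, lr: float = 0.1, epochs: int = 50) -> complex:
    w = 0j
    for _ in range(epochs):
        for x, d in samples:
            w -= lr * wirtinger_grad(w, x, d)  # equivalent to w += lr * e * conj(x)
    return w


random.seed(0)
true_w = 0.8 * cmath.exp(0.6j)  # toy target: scale by 0.8, rotate phase by 0.6 rad
xs = [cmath.exp(1j * random.uniform(-cmath.pi, cmath.pi)) for _ in range(20)]
samples = [(x, true_w * x) for x in xs]
print(abs(train(samples) - true_w))  # distance to the target weight after training
```

The update never splits w into real and imaginary parts, which is the practical benefit the chapter attributes to working directly in the complex domain.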
Chapter 5 presents associative memory models with Hopfield-type recurrent neural networks based on quaternions, which are four-dimensional hypercomplex numbers. In the introduction to quaternion algebra, the definition of a quaternion is given and its analyticity in the quaternionic domain is described. Then, stability analysis is performed by means of energy functions for several different types of quaternion-valued neural networks. The different types of recurrent networks are constructed with bipolar-state neurons, continuous-state neurons, and multistate neurons. All of these quaternion-valued networks are shown to work well as associative memory models by implementing typical learning rules, including the Hebbian rule, the projection rule, and the local iterative learning rule.
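Quaternion-valued neurons rest on the Hamilton product, whose non-commutativity is the main departure from complex arithmetic. The sketch below covers that algebra only. The tuple encoding (a, b, c, d) for a + bi + cj + dk is an assumption, and the chapter's Hopfield networks and energy functions are not implemented here.

```python
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (a, b, c, d) means a + b*i + c*j + d*k


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(q: Quat) -> Quat:
    """Quaternion conjugate: negate the three imaginary parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))  # i*j = k but j*i = -k: multiplication is not commutative
```

Because the product is not commutative, the order of the weight-input product matters in a quaternion-valued neuron. This is one reason stability must be re-established for each network type.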
Chapter 6 concentrates on recurrent-type Clifford neural networks. This
chapter starts with the definition of
Clifford algebra and the basic properties
of the operators in hypercomplex number systems. Subsequently, a Hopfield-type recurrent Clifford neural network
is proposed as an extension of the classical real-valued Hopfield neural network,
with an appropriate definition of an
energy function for the Clifford neural
network. Finally, under several assumptions on the weight coefficients and the
activation functions, the existence of the
energy function is proved for two specific types of Clifford neural networks.
Chapter 7 provides a meta-cognitive
learning algorithm for a single hidden
layer CVNN, called Meta-cognitive Fully
Complex-valued Relaxation Network
(McFCRN), consisting of a cognitive
component and a meta-cognitive one.
First, it is explained that the learning
strategy of the neural network (cognitive
part) is controlled by a self-regulatory
learning mechanism (meta-cognitive
part) through sample deletion, sample
learning, and sample reserve. After the
drawbacks of the conventional
meta-cognitive CVNNs, such as the Meta-cognitive Fully Complex-valued Radial
Basis Function Network (McFCRBF)
and the Complex-valued Self-regulatory
Resource Allocation Network (CSRAN),
are pointed out, the learning algorithm of
McFCRN is presented as pseudocode. The performance of McFCRN is
evaluated on a synthetic complex-valued
function approximation problem and on
benchmark real-valued classification
problems, in comparison with other
existing methods.
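The delete/learn/reserve decision can be pictured as a per-sample rule. This is a hypothetical sketch. The error measure, both threshold values, and the names are illustrative assumptions; they do not reproduce McFCRN's actual criteria, which the review does not state.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"    # sample adds little new knowledge: drop it
    LEARN = "learn"      # sample is informative: update the network now
    RESERVE = "reserve"  # neither: keep the sample for a later pass


def select_action(error: float, delete_threshold: float = 0.05,
                  learn_threshold: float = 0.3) -> Action:
    """Hypothetical self-regulatory rule; thresholds are illustrative only."""
    if error < delete_threshold:
        return Action.DELETE
    if error > learn_threshold:
        return Action.LEARN
    return Action.RESERVE


for err in (0.01, 0.5, 0.2):
    print(err, select_action(err).value)
```

The idea the sketch tries to convey is that the meta-cognitive part decides *whether and when* the cognitive part learns from a sample, rather than changing how it learns.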
In Chapter 8, a multilayer feedforward neural network with multi-valued neurons (MLMVN), described in the monograph [4], is applied to brain-computer interfacing (BCI), with the aim of extracting relevant information from human brain-wave activity. The chapter opens with a general introduction to the concept of BCI using EEG recordings. It then focuses on a particular type of BCI based on the Steady-State Visual Evoked Potential (SSVEP), in which the EEG signals are obtained as responses to a target stimulus flickering at a certain frequency. Subsequently, the MLMVN is presented for decoding the phase-coded SSVEP-based BCI. The MLMVN is shown to achieve better decoding accuracy than other methods.
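The multi-valued neurons in an MLMVN are phase-based, which suits phase-coded signals. The sketch below shows the discrete k-valued activation commonly associated with MVNs: the weighted sum is mapped to the k-th root of unity whose angular sector contains its argument. The function names and the bias-first weight convention are assumptions, and the chapter's network and learning rule are not reproduced.

```python
import cmath
import math


def mvn_activation(z: complex, k: int) -> complex:
    """Map z to the k-th root of unity whose angular sector contains arg(z)."""
    angle = cmath.phase(z) % (2 * math.pi)        # arg(z) moved into [0, 2*pi)
    sector = int(k * angle / (2 * math.pi)) % k   # sector index 0..k-1
    return cmath.exp(2j * math.pi * sector / k)


def mvn_output(weights, inputs, k: int) -> complex:
    """Weighted sum with weights[0] as bias (an assumed convention), then activation."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return mvn_activation(z, k)


print(mvn_output([0, 1], [cmath.exp(2.0j)], k=4))
```

Only the phase of the weighted sum affects the output. This is why such neurons pair naturally with phase-coded stimuli like the SSVEP paradigm described above.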
In Chapter 9, complex-valued B-spline neural networks are developed
to identify a complex-valued Wiener system, which comprises a linear
dynamical model followed by a nonlinear static transformation. A CVNN
based on B-spline curves consisting of many polynomial pieces is presented to
estimate the complex-valued nonlinear function in the complex-valued Wiener
model. For identification of the system, an algorithm to estimate the parameters
is given, based on the Gauss-Newton method with the aid of the De Boor algorithm. An algorithm to compute the
inverse of the estimated nonlinear function is also presented. The performance
of the presented method is demonstrated in an application to the design
of a digital predistorter for wireless communication systems, which
compensates for the distortion caused by
high-power amplifiers with memory.
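De Boor's algorithm evaluates a B-spline curve by repeated convex combinations of control points, and it works unchanged with complex control points. The sketch below covers the textbook recursion only. The chapter's complex-valued B-spline network, its Gauss-Newton training, and the inverse computation are not reproduced, and the function signature is an assumption.

```python
from bisect import bisect_right
from typing import Sequence


def de_boor(x: float, t: Sequence[float], c: Sequence[complex], p: int) -> complex:
    """Evaluate sum_i c[i] * N_{i,p}(x) for knot vector t and degree p.

    Assumes len(t) == len(c) + p + 1 and t[p] <= x < t[len(c)].
    """
    k = bisect_right(t, x) - 1  # knot interval index: t[k] <= x < t[k+1]
    d = [c[j + k - p] for j in range(p + 1)]  # the p+1 control points that affect x
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]


# Linear B-spline: halfway between two complex control points.
print(de_boor(1.5, [0, 1, 2, 3], [0, 2 + 2j], 1))
```

Each output depends on only p + 1 control points, so a B-spline model's parameters enter locally and linearly. This property is convenient when they are fitted by Gauss-Newton iterations.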
Chapter 10 is about a quaternion fuzzy neural network for view-invariant color face image recognition. First, conventional face recognition systems are briefly reviewed. Second, several face recognition methods are introduced, including Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), and Block Diagonal Non-Negative Matrix Factorization (BDNMF). The view-invariant color face image recognition system, which combines a quaternion-based color face image correlator with a max-product fuzzy neural network classifier, is then presented. Finally, the presented method is shown to outperform conventional methods, including NMF, BDNMF, and the hypercomplex Gabor filter, in the view-invariant and scale-invariant classification of noise-affected color face images from a database.
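The review does not describe how the classifier works internally. The hypothetical sketch below illustrates only the max-product composition that the classifier's name refers to. It assumes that input features have already been mapped to fuzzy membership degrees in [0, 1] and that each class is scored by max-product composition with a weight matrix whose entries also lie in [0, 1]. The names `max_product` and `classify` and the example numbers are illustrative and do not come from the chapter.

```python
def max_product(memberships, weights):
    """Max-product composition: score_j = max_i memberships[i] * weights[i][j]."""
    if len(memberships) != len(weights):
        raise ValueError("one weight row per input membership is required")
    n_classes = len(weights[0])
    return [max(mu * row[j] for mu, row in zip(memberships, weights))
            for j in range(n_classes)]


def classify(memberships, weights):
    """Return (index of the class with the highest max-product score, scores)."""
    scores = max_product(memberships, weights)
    best = max(range(len(scores)), key=lambda j: scores[j])
    return best, scores


if __name__ == "__main__":
    # Hypothetical example: two fuzzy input features, two face classes.
    mu = [0.2, 0.9]
    w = [[1.0, 0.5],
         [0.3, 0.8]]
    label, scores = classify(mu, w)
    print(label)  # 1
```

Each class score is driven by its single strongest supporting input rather than by a sum over all inputs. This is the usual distinction between max-product composition and a weighted-sum neuron.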
In summary, this book covers a wide variety of hot topics in advanced computational intelligence methods that incorporate complex and hypercomplex number systems into the framework of artificial neural networks. In most chapters, the theoretical description of the methodology and its applications to engineering problems are well balanced.
This book suggests that better information processing methods could be obtained by choosing an information representation scheme suited to the specific problem, not only in artificial neural networks but also in other computational intelligence frameworks. The advantages of CVNNs and hypercomplex-valued neural networks over real-valued neural networks have been confirmed in some case studies but remain unclear in general. Hence, the differences between them need to be explored further from the viewpoint of nonlinear dynamical systems. Nevertheless, the applications of CVNNs and hypercomplex-valued neural networks appear very promising.
References
[1] A. Hirose, Complex-Valued Neural Networks: Theories and Applications. Singapore: World Scientific, 2003.
[2] A. Hirose, Complex-Valued Neural Networks. New York: Springer-Verlag, 2006.
[3] T. Nitta, Complex-Valued Neural Networks: Utilizing High-Dimensional Parameters. Hershey, PA: IGI Global, 2009.
[4] I. Aizenberg, N. Aizenberg, and J. Vandewalle, Multi-Valued and Universal Binary Neurons: Theory, Learning, and Applications. Norwell, MA: Kluwer, 2000.
Conference Calendar
Gary B. Fogel
Natural Selection, Inc., USA

* Denotes a CIS-Sponsored Conference
D Denotes a CIS Technical Co-Sponsored Conference

D The 4th International Conference on Intelligent Control and Information Processing (ICICIP 2013)
June 9–11, 2013
Place: Beijing, China
http://www.conference123.org/icicip2013

D The 2013 International Conference on Brain Inspired Cognitive Systems (BICS 2013)
June 9–11, 2013
Place: Beijing, China
http://www.conference123.org/bics2013/

* 2013 IEEE Congress on Evolutionary Computation (IEEE CEC 2013)
June 20–23, 2013
Place: Cancun, Mexico
General Chair: Carlos Coello Coello
http://www.cec2013.org/

D The 7th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2013)
July 3–5, 2013
Place: Taichung, Taiwan
http://voyager.ce.fit.ac.jp/conf/cisis/2013/

D International Symposium on Neural Networks (ISNN 2013)
July 4–6, 2013
Place: Dalian, China
http://isnn.mae.cuhk.edu.hk/

* 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013)
July 7–10, 2013
Place: Hyderabad, India
General Chair: Nik Pal
http://www.isical.ac.in/~fuzzieee2013/

D Ninth International Conference on Intelligent Computing (ICIC 2013)
July 28–31, 2013
Place: Nanning, China
http://www.ic-ic.org

* International Joint Conference on Neural Networks (IJCNN 2013)
August 4–9, 2013
Place: Dallas, Texas, USA
General Co-Chairs: Plamen Angelov and Daniel Levine
http://www.ijcnn2013.org

D International Conference on Image Analysis and Processing (ICIAP 2013)
September 11–13, 2013
Place: Naples, Italy
http://www.iciap2013-naples.org

D International Joint Conference on Awareness Science and Technology and Ubi-Media Computing (iCAST/UMEDIA 2013)
November 2–4, 2013
Place: Aizu, Japan
Website: TBD

D The 20th International Conference on Neural Information Processing (ICONIP 2013)
November 3–7, 2013
Place: Daegu, Korea
http://iconip2013.org/

D International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP 2013)
December 12–13, 2013
Place: Bayonne, France
http://www.smap2013.org

* 2014 IEEE Conference on Computational Intelligence in Financial Engineering and Economics
March 27–28, 2014
Place: London, United Kingdom
General Chair: Antoaneta Serguieva
Website: TBD

* 2014 Conference on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB 2014)
May 21–24, 2014
Place: Hawaii, USA
General Chair: Steven Corns
Website: TBD

* 2014 IEEE World Congress on Computational Intelligence (IEEE WCCI 2014)
July 6–14, 2014
Place: Beijing, China
General Chair: Derong Liu
http://www.ieee-wcci2014.org/

Digital Object Identifier 10.1109/MCI.2013.2247899
Date of publication: 11 April 2013
Advertisers Index
The Advertisers Index contained in this issue is compiled as a service to our readers and advertisers: the publisher is not liable for errors or omissions, although every effort is made to ensure its accuracy. Be sure to let our advertisers know you found them through IEEE Computational Intelligence Magazine.

Company | Page # | URL | Phone
IEEE Marketing Department | CVR 4 | www.ieee.org/tryieeexplore |
IEEE Media Advertising Sales Offices

James A. Vick
Sr. Director, Advertising Business
+1 212 419 7767; Fax: +1 212 419 7589

Marion Delaney
Advertising Sales Director
+1 415 863 4717; Fax: +1 415 863 4717

Susan Schneiderman
Business Development Manager
+1 732 562 3946; Fax: +1 732 981 1855

Product Advertising

Mid-Atlantic
Lisa Rinaldo
+1 732 772 0160; Fax: +1 732 772 0164
NY, NJ, PA, DE, MD, DC, KY, WV

New England/South Central/Eastern Canada
Jody Estabrook
+1 774 283 4528; Fax: +1 774 283 4527
CT, ME, VT, NH, MA, RI, AR, LA, OK, TX. CANADA: Nova Scotia, Prince Edward Island, Newfoundland, New Brunswick, Quebec

Southwest
Thomas Flynn
+1 770 645 2944; Fax: +1 770 993 4423
VA, NC, SC, GA, FL, AL, MS, TN

Midwest/Central Canada
Dave Jones
+1 708 442 5633; Fax: +1 708 442 7620
IL, IA, KS, MN, MO, NE, ND, SD, WI, OH. CANADA: Manitoba, Saskatchewan, Alberta

New England/Eastern Canada
Liza Reich
+1 212 419 7578; Fax: +1 212 419 7589
ME, VT, NH, MA, RI. CANADA: Nova Scotia, Prince Edward Island, Newfoundland, New Brunswick, Quebec

Midwest/Ontario, Canada
Will Hamilton
+1 269 381 2156; Fax: +1 269 381 2556
IN, MI. CANADA: Ontario

Southeast
Cathy Flynn
+1 770 645 2944; Fax: +1 770 993 4423
VA, NC, SC, GA, FL, AL, MS, TN

West Coast/Mountain States/Western Canada
Marshall Rubin
+1 818 888 2407; Fax: +1 818 888 4907
AZ, CO, HI, NM, NV, UT, CA, AK, ID, MT, WY, OR, WA. CANADA: British Columbia

Europe/Africa/Middle East/Asia/Far East/Pacific Rim
Heleen Vodegel
+44 1875 825 700; Fax: +44 1875 825 701
Europe, Africa, Middle East, Asia, Far East, Pacific Rim, Australia, New Zealand

Recruitment Advertising

Mid-Atlantic
Lisa Rinaldo
+1 732 772 0160; Fax: +1 732 772 0164
CT, NY, NJ, PA, DE, MD, DC, KY, WV

Midwest/South Central/Central Canada
Darcy Giovingo
+1 847 498 4520; Fax: +1 847 498 5911
AR, LA, TX, OK, IL, IN, IA, KS, MI, MN, NE, ND, SD, OH, WI, MO. CANADA: Ontario, Manitoba, Saskatchewan, Alberta

West Coast/Mountain States/Southwest/Asia
Tim Matteson
+1 310 836 4064; Fax: +1 310 836 4067
AK, AZ, CA, CO, HI, ID, MT, NM, NV, OR, UT, WA, WY. CANADA: British Columbia

Europe/Africa/Middle East
Heleen Vodegel
+44 1875 825 700; Fax: +44 1875 825 701
Europe, Africa, Middle East
Digital Object Identifier 10.1109/MCI.2013.2247900
While the world benefits from what’s new,
IEEE can focus you on what’s next.
Develop for tomorrow with
today’s most-cited research.
Over 3 million full-text technical documents
can power your R&D and speed time to market.
• IEEE Journals and Conference Proceedings
• IEEE Standards
• IEEE-Wiley eBooks Library
• IEEE eLearning Library
• Plus content from select publishing partners
IEEE Xplore® Digital Library
Discover a smarter research experience.
Request a Free Trial
www.ieee.org/tryieeexplore
Follow IEEE Xplore on