Volume 8 Number 2, May 2013
www.ieee-cis.org
Digital Object Identifier 10.1109/MCI.2013.2247904

Features
20 Learning Deep Physiological Models of Affect, by Héctor P. Martínez, Yoshua Bengio, and Georgios N. Yannakakis
34 Fuzzy Logic Models for the Meaning of Emotion Words, by Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan
50 Modeling Curiosity-Related Emotions for Virtual Peer Learners, by Qiong Wu and Chunyan Miao
63 Goal-Based Denial and Wishful Thinking, by César F. Pimentel and Maria R. Cravo

Column
77 Book Review, by Gouhei Tanaka

Departments
2 Editor’s Remarks
3 President’s Message, by Marios M. Polycarpou
4 Society Briefs
Newly Elected CIS Administrative Committee Members (2013–2015), by Marios M. Polycarpou
IEEE Fellows: Class of 2013, by Erkki Oja
IEEE CIS GOLD Report: Inaugural Elevator Pitch Competition and Other GOLD Activities, by Heike Sichtig, Stephen G. Matthews, Demetrios G. Eliades, Muhammad Yasser, and Pablo A. Estévez
12 Publication Spotlight, by Derong Liu, Chin-Teng Lin, Garry Greenwood, Simon Lucas, and Zhengyou Zhang
15 Conference Report: A Report on the IEEE Life Sciences Grand Challenges Conference, by Gary B. Fogel
17 Guest Editorial: Special Issue on Computational Intelligence and Affective Computing, by Dongrui Wu and Christian Wagner
80 Conference Calendar, by Gary B. Fogel

On the cover: ©iStockphoto.com/Yannis Ntousiopoulos

IEEE Computational Intelligence Magazine (ISSN 1556-603X) is published quarterly by The Institute of Electrical and Electronics Engineers, Inc. Headquarters: 3 Park Avenue, 17th Floor, New York, NY 10016-5997, U.S.A. +1 212 419 7900. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society, or its members. The magazine is a membership benefit of the IEEE Computational Intelligence Society, and subscriptions are included in Society fee. Replacement copies for members are available for $20 (one copy only). Nonmembers can purchase individual copies for $163.00. Nonmember subscription prices are available on request.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of the U.S. Copyright law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01970, U.S.A.; and 2) pre-1978 articles without fee. For other copying, reprint, or republication permission, write to: Copyrights and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854 U.S.A. Copyright © 2013 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY and at additional mailing offices. Postmaster: Send address changes to IEEE Computational Intelligence Magazine, IEEE, 445 Hoes Lane, Piscataway, NJ 08854-1331 U.S.A. PRINTED IN U.S.A. Canadian GST #125634188.
Digital Object Identifier 10.1109/MCI.2013.2247812
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE

CIM Editorial Board

Editor-in-Chief
Kay Chen Tan
National University of Singapore
Department of Electrical and Computer Engineering
4 Engineering Drive 3
SINGAPORE 117576
(Phone) +65-6516-2127
(Fax) +65-6779-1103
(E-mail) [email protected]

Founding Editor-in-Chief
Gary G. Yen, Oklahoma State University, USA

Editors-At-Large
Piero P. Bonissone, General Electric Global Research, USA
David B. Fogel, Natural Selection Inc., USA
Vincenzo Piuri, University of Milan, ITALY
Marios M. Polycarpou, University of Cyprus, CYPRUS
Jacek M. Zurada, University of Louisville, USA

Associate Editors
Hussein Abbass, University of New South Wales, AUSTRALIA
Cesare Alippi, Politecnico di Milano, ITALY
Oscar Cordón, European Centre for Soft Computing, SPAIN
Pauline Haddow, Norwegian University of Science and Technology, NORWAY
Hisao Ishibuchi, Osaka Prefecture University, JAPAN
Yaochu Jin, University of Surrey, UK
Jong-Hwan Kim, Korea Advanced Institute of Science and Technology, KOREA
Jane Jing Liang, Zhengzhou University, CHINA
Chun-Liang Lin, National Chung Hsing University, TAIWAN
Yew Soon Ong, Nanyang Technological University, SINGAPORE
Ke Tang, University of Science and Technology of China, CHINA
Chuan-Kang Ting, National Chung Cheng University, TAIWAN
Slawo Wesolkowski, DRDC, CANADA
Jun Zhang, Sun Yat-Sen University, CHINA

Editor’s Remarks
Kay Chen Tan, National University of Singapore, SINGAPORE

It’s Just “Emotions” Has Taken Over…

It is believed that the main difference between a machine and the human operating it is the latter’s sense of feeling, pervasiveness and ability to understand rather than to
process. Often we may be amused by the smartphone’s speech recognition ability (for example, we said “Define perception” and the phone came up with “Are you asking about ‘The Fine Person’?”) or get frustrated by it (we said “Call Billy White, not Lily is white!”). As much as technology has tried to emulate humans, there is still so much to be done before it can come anywhere close to the complexities of the human; emotions and affect are examples of the gravity of this gap. To date, the high complexities of the human mind and emotions continue to baffle researchers and scientists alike.

Affective computing is a recent phenomenon popularized by MIT’s Professor Rosalind Picard. It has led the way for us at the IEEE to develop systems that possess the capabilities to recognise, interpret, process and simulate human emotions. These systems can then be incorporated into machines that enable interaction with human subjects for various purposes including psychological analysis and educational assistance.

Can you imagine a future filled with smart phones or even smart cars that can sense and detect our moods by the tone of our voice or body gestures such that music and/or encouraging quotes that cheer us up can be automatically selected and recommended? How about having computers that are capable of recognizing students’ state of mind through their body gestures and as such adapt accordingly to enhance the learning experience? Or pushing further, how about affective marketing where an online shopping experience is modulated based on your emotions detected from your facial features? This is certainly part of what research in affective computing aspires to achieve, thus making life more fulfilling for everyone!

IEEE Periodicals/Magazines Department
Associate Editor: Laura Ambrosio
Senior Art Director: Janet Dudar
Assistant Art Director: Gail A. Schnitzer
Production Coordinator: Theresa L.
Smith
Business Development Manager: Susan Schneiderman
Advertising Production Manager: Felicia Spagnoli
Production Director: Peter M. Tuohy
Editorial Director: Dawn Melley
Staff Director, Publishing Operations: Fran Zappulla

IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

Digital Object Identifier 10.1109/MCI.2013.2247813

Delighted participants of the Ninth International Conference on Simulated Evolution and Learning (SEAL), Hanoi, December 2012.

Digital Object Identifier 10.1109/MCI.2013.2247814
Date of publication: 11 April 2013
(continued on page 11)

President’s Message
Marios M. Polycarpou, University of Cyprus, CYPRUS

Computational Intelligence in the Undergraduate Curriculum

Computational intelligence is at the heart of many new technological developments. For example, there has recently been a lot of deliberation, even in popular media such as The New York Times, about the need to handle Big Data. This is an area that the industry is particularly interested in, with huge potential in terms of creation of new jobs. Computational intelligence has a key role to play in some vital aspects of Big Data, namely the analysis, visualization and real-time decision making of Big Data. Other emerging areas where computational intelligence will play a key role include human-computer interaction, optimization in large-scale systems, nature-inspired computing, Internet-of-Things, etc.
For computational intelligence to become an integral component in new technological enterprises, it is crucial that graduating engineers and computer scientists are familiar with computational intelligence methods. Ever since I started my appointment as President of the IEEE Computational Intelligence Society, I have been promoting the need for an introductory course in computational intelligence for students graduating with a degree in Electrical/Electronic Engineering, Computer Engineering, Computer Science, and possibly other related fields. My long-term vision for such a course is that it will include not only specific techniques, such as neural network computing, fuzzy logic and evolutionary computation, but, more importantly, that it will provide students with the fundamental knowledge and motivation for computational intelligence, along with application examples that explain its practical use in real-world applications. Naturally, just one introductory course is not enough to cover everything that a student needs to know in computational intelligence; however, it plants the seed for further study and familiarizes the student with the importance of computational intelligence in new technological developments.

It is my belief that, just as graduating electrical engineers need to have taken at least one course in topics such as communications, signal processing, and automation and control, they also need to take a corresponding introductory course in computational intelligence. Of course, this will not happen overnight and it will require a major effort by academic researchers in the area of computational intelligence. It will also require the development of new textbooks with a holistic view of computational intelligence.
However, the addition of an introductory computational intelligence course to the standard undergraduate curriculum will offer a new dimension to the field, and it will equip graduating engineers and computer scientists with knowledge and skills that are essential in new technological advances. The time is ripe to pursue this ambitious goal!

Digital Object Identifier 10.1109/MCI.2013.2247815
Date of publication: 11 April 2013

CIS Society Officers
President: Marios M. Polycarpou, University of Cyprus, CYPRUS
President Elect: Xin Yao, University of Birmingham, UK
Vice President, Conferences: Gary B. Fogel, Natural Selection, Inc., USA
Vice President, Education: Cesare Alippi, Politecnico di Milano, ITALY
Vice President, Finances: Enrique H. Ruspini, SRI International, USA
Vice President, Members Activities: Pablo A. Estevez, University of Chile, CHILE
Vice President, Publications: Nikhil R. Pal, Indian Statistical Institute, INDIA
Vice President, Technical Activities: Hisao Ishibuchi, Osaka Prefecture University, JAPAN

Publication Editors
IEEE Transactions on Neural Networks and Learning Systems: Derong Liu, University of Illinois, Chicago, USA
IEEE Transactions on Fuzzy Systems: Chin-Teng Lin, National Chiao Tung University, TAIWAN
IEEE Transactions on Evolutionary Computation: Garrison Greenwood, Portland State University, USA
IEEE Transactions on Computational Intelligence and AI in Games: Simon Lucas, University of Essex, UK
IEEE Transactions on Autonomous Mental Development: Zhengyou Zhang, Microsoft Research, USA

Administrative Committee
Term ending in 2013:
Bernadette Bouchon-Meunier, University Pierre et Marie Curie, FRANCE
Janusz Kacprzyk, Polish Academy of Sciences, POLAND
Simon Lucas, University of Essex, UK
Luis Magdalena, European Centre for Soft Computing, SPAIN
Jerry M. Mendel, University of Southern California, USA
Term ending in 2014:
Pau-Choo (Julia) Chung, National Cheng Kung University, TAIWAN
David B.
Fogel, Natural Selection Inc., USA
Yaochu Jin, University of Surrey, UK
James M. Keller, University of Missouri-Columbia, USA
Jacek M. Zurada, University of Louisville, USA
Term ending in 2015:
James C. Bezdek, University of Melbourne, AUSTRALIA
Piero P. Bonissone, General Electric Co., USA
Jose C. Principe, University of Florida, USA
Alice E. Smith, Auburn University, USA
Lipo Wang, Nanyang Technological University, SINGAPORE

Digital Object Identifier 10.1109/MCI.2013.2247816

Society Briefs
Marios M. Polycarpou, University of Cyprus, CYPRUS

Newly Elected CIS Administrative Committee Members (2013–2015)

James C. Bezdek, University of Melbourne, AUSTRALIA

Jim received the Ph.D. in Applied Mathematics from Cornell University in 1973. Jim is past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association) and the IEEE CIS (Computational Intelligence Society); founding editor of the Int’l. Jo. Approximate Reasoning and the IEEE Transactions on Fuzzy Systems; Life Fellow of the IEEE and IFSA; and a recipient of the IEEE 3rd Millennium, CIS Fuzzy Systems Pioneer, and technical field award Rosenblatt medals, and the IPMU Kempe de Feret Medal. Jim’s interests: woodworking, optimization, motorcycles, pattern recognition, cigars, clustering in very large data, fishing, coclustering, blues music, wireless sensor networks, gardening, poker and visual clustering. Jim retired in 2007, and will be coming to a university near you soon.

Piero P. Bonissone, General Electric Co., USA

Piero P. Bonissone is currently a Chief Scientist at GE Global Research. Dr.
Bonissone has been a pioneer in the field of fuzzy logic, AI, soft computing, and approximate reasoning systems applications since 1979. During the eighties, he conceived and developed the Diesel Electric Locomotive Troubleshooting Aid (DELTA), one of the first fielded expert systems that helped maintenance technicians in troubleshooting diesel-electric locomotives. He was the PI in many DARPA programs, from Strategic Computing Initiative, to Pilot’s Associate, Submarine Operational Automation System, and Planning Initiative (ARPI). During the nineties, he led many projects in fuzzy control, from the hierarchical fuzzy control of turbo-shaft engines to the use of fuzzy logic in dishwashers, locomotives, and resonant converters for power supplies. He designed and integrated case-based and fuzzy-neural systems to accurately estimate the value of single-family residential properties when used as mortgage collateral. In early 2000, he designed a fuzzy-rule based classifier, trained by evolutionary algorithms, to automate the placement of insurance applications for long term care and term life, while minimizing the variance of their decisions. More recently he led a Soft Computing (SC) group in the development of SC applications to diagnostics and prognostics of processes and products, including the prediction of remaining life for each locomotive in a fleet, to perform efficient asset selection. His current interests are the development of multi-criteria decision making systems for PHM and the automation of intelligent systems life cycle to create, deploy, and maintain SC-based systems, providing customized performance while adapting to avoid obsolescence.

Digital Object Identifier 10.1109/MCI.2013.2247817
Date of publication: 11 April 2013

He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), of the Association for the Advancement of Artificial Intelligence (AAAI), of the International Fuzzy Systems Association (IFSA), and a Coolidge Fellow at GE Global Research. He is the recipient of the 2012 Fuzzy Systems Pioneer Award from the IEEE Computational Intelligence Society. Since 2010, he has been the President of the Scientific Committee of the European Centre of Soft Computing. In 2008 he received the II Cajastur International Prize for Soft Computing from the European Centre of Soft Computing. In 2005 he received the Meritorious Service Award from the IEEE Computational Intelligence Society. He has received two Dushman Awards from GE Global Research. He served as Editor in Chief of the International Journal of Approximate Reasoning for 13 years. He is on the editorial board of four technical journals and is Editor-at-Large of the IEEE Computational Intelligence Magazine. He has coedited six books and has over 150 publications in refereed journals, book chapters, and conference proceedings, with an H-Index of 31 (by Google Scholar). He has received 66 patents issued by the U.S. Patent Office (plus 19 pending patents). From 1982 until 2005 he was an Adjunct Professor at Rensselaer Polytechnic Institute, in Troy, NY, where he supervised 5 Ph.D. theses and 33 Master’s theses. He has cochaired 12 scientific conferences and symposia focused on Multi-Criteria Decision-Making, Fuzzy sets, Diagnostics, Prognostics, and Uncertainty Management in AI. Dr. Bonissone is very active in the IEEE, where he was a member of the Fellow Evaluation Committee from 2007 to 2009.
He has been an Executive Committee member of NNC/NNS/CIS society from 1993 to 2012 and an IEEE CIS Distinguished Lecturer from 2004 to 2011.

Jose C. Principe, University of Florida, USA

Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs) modeling. He is BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. His primary area of interest is processing of time varying signals with adaptive neural models. The CNEL Lab is studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information) and applying these advanced algorithms to Brain Machine Interfaces (both motor as well as somatosensory feedback). Dr. Principe is an IEEE, ABME, AIBME Fellow. He is the Past-Editor in Chief of the IEEE Transactions on Biomedical Engineering, past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, and Past-President of the International Neural Network Society. He received the IEEE EMBS Career Award, and the IEEE Neural Network Pioneer Award. He has Honorary Doctor Degrees from the U. of Reggio Calabria Italy, S. Luis Maranhao Brazil and Aalto U. in Finland. Currently he is the Editor in Chief of the IEEE Reviews in Biomedical Engineering. Dr. Principe has more than 600 publications. He directed 73 Ph.D. dissertations and 65 Master’s theses. He wrote four books: an interactive electronic book entitled “Neural and Adaptive Systems: Fundamentals through Simulation” (Wiley), “Brain Machine Interface Engineering,” “Kernel Adaptive Filtering” (Wiley), and “Information Theoretic Learning” (Springer).

Alice E. Smith, Auburn University, USA

Alice E. Smith is the W. Allen and Martha Reed Professor in the Industrial and Systems Engineering Department at Auburn University and served as department chair from 1999 to 2011. Previously, she was on the faculty of the Department of Industrial Engineering at the University of Pittsburgh, which she joined after industrial experience with Southwestern Bell Corporation. Her degrees are from Rice University, Saint Louis University and Missouri University of Science and Technology. Dr. Smith holds one U.S. patent and several international patents and has authored more than 200 publications which have garnered over 1,600 citations (ISI Web of Science). Several of her papers are among the most highly cited in their respective journals, including the 2nd most cited paper of IEEE Transactions on Reliability. Dr. Smith has served as a principal investigator on over US$6 million of sponsored research. Her research in computational intelligence has been funded by NASA, U.S. Department of Defense, NIST, Missile Defense Agency, U.S. Department of Transportation, Lockheed Martin, and U.S. National Science Foundation, from which she has been awarded 16 grants including a CAREER grant and an ADVANCE Leadership grant. International research collaborations have been sponsored by the federal governments of Japan, Turkey, United Kingdom, The Netherlands, Egypt, South Korea, Iraq, China, Algeria and the U.S., and by the Institute of International Education.

Her current service to IEEE CIS includes Associate Editor of IEEE Transactions on Evolutionary Computation (position held since 1998), Vice Chair of the IEEE Evolutionary Computation Technical Committee, Vice Chair of the IEEE Evolutionary Computation Technical Committee Task Force on Education, and Member of the Women in Computational Intelligence Committee. In past service to CIS, she was General Chair of Congress on Evolutionary Computation (CEC) 2011, Program Chair of CEC 2004, Technical Chair (Americas) of CEC 2000, Special Sessions Chair of CEC 1999, and on the program or technical committee of seven other CECs. She also served on the IEEE Evolutionary Computation Technical Committee from 1999 to 2000 and from 2007 to 2011. Dr. Smith is a Senior Member of IEEE, a fellow of the Institute of Industrial Engineers, a senior member of the Society of Women Engineers, a member of Tau Beta Pi and a Registered Professional Engineer. She is the Area Editor for Heuristic Search and Learning of INFORMS Journal on Computing and an Area Editor of Computers & Operations Research.

Lipo Wang, Nanyang Technological University, SINGAPORE

Dr. Lipo Wang’s research interest is computational intelligence with applications to bioinformatics, data mining, optimization, and image processing. He is (co)author of over 240 papers. He holds a U.S. patent in neural networks. He has coauthored 2 monographs and (co)edited 15 books. He was/will be keynote/panel speaker for several international conferences. He received the Bachelor degree from National University of Defense Technology (China) and Ph.D. from Louisiana State University (USA). He was on staff at the National Institutes of Health (USA) and Stanford University (USA). He was on the faculty of Deakin University (Australia) and is now on the faculty of Nanyang Technological University (Singapore).
He is/was Associate Editor/Editorial Board Member of 20 international journals, including IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, and IEEE Transactions on Evolutionary Computation. He is an elected member of the AdCom (2010–2015) of the IEEE Computational Intelligence Society (CIS) and served as IEEE CIS Vice President for Technical Activities (2006–2007) and Chair of Emergent Technologies Technical Committee (2004–2005). He is an elected member of the Board of Governors of the International Neural Network Society (2011–2013) and a CIS Representative to the AdCom of the IEEE Biometrics Council (2011). He serves as Chair, Education Committee, IEEE Engineering in Medicine and Biology Society (2011, 2012). He was President of the Asia-Pacific Neural Network Assembly (APNNA) in 2002/2003 and received the 2007 APNNA Excellent Service Award. He was Founding Chair of both the IEEE Engineering in Medicine and Biology Singapore Chapter and IEEE Computational Intelligence Singapore Chapter. He serves/served as IEEE CIDM 2013 Program Co-Chair, IEEE EMBC 2011 & 2010 Theme Co-Chair, IJCNN 2010 Technical Co-Chair, IEEE CEC 2007 Program Co-Chair, IJCNN 2006 Program Chair, as well as on the steering/advisory/organizing/program committees of over 200 international conferences.

IEEE Fellows: Class of 2013
Erkki Oja, Aalto University, FINLAND

Danilo Mandic, Imperial College London, UK
for contributions to multivariate and nonlinear learning systems

Dr. Mandic obtained his Ph.D.
in the area of nonlinear adaptive systems from Imperial College in 1999 and is currently a Professor of Signal Processing at the same institution. He has been working in the areas of nonlinear and multidimensional adaptive filters, complex- and quaternion-valued neural networks, time-frequency analysis and complexity science. His research has found applications in biomedical engineering (brain-computer interface), human-computer interaction (body sensor networks), and renewable energy and smart grid. He has published two research monographs: Recurrent Neural Networks for Prediction (Wiley, 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models (Wiley, 2009), and has also coedited a book on Data Fusion (Springer, 2008) and has been a part-editor for Springer Handbook on Neuro- and Bioinformatics (Springer, 2013). Dr. Mandic has held visiting positions in RIKEN (Japan), KU Leuven (Belgium) and Westminster University (UK).

Digital Object Identifier 10.1109/MCI.2013.2247818
Date of publication: 11 April 2013

Professor Mandic has been a Publicity Chair for the World Congress on Computational Intelligence (WCCI) in 2014, Plenary Talks Chair at EUSIPCO 2013, European Liaison at ISNN in 2011 and a Program Co-Chair for ICANN in 2007. He has given keynote and tutorial talks at foremost conferences in Signal Processing and Computational Intelligence (ICASSP in 2013 and 2007, IJCNN in 2010, 2011, and 2012), and has been an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems (since 2008), IEEE Signal Processing Magazine, and IEEE Transactions on Signal Processing. He is also a Co-Chair of the Task Force on Complex Neural Networks and a Member of the Task Force on Smart Grid (both within IEEE CIS), and the Signal Processing Theory and Methods technical committee within the IEEE SPS. Dr.
Mandic has won several Best Paper awards in international conferences in Computational Intelligence (2010, 2009, 2006, 2004, 2002), and was appointed by the World University Service (WUS) as a Visiting Lecturer within the Brain Gain Program (BGP). His Ear-EEG device was shortlisted for the Annual Brain Computer Interface Award in 2012. He has been granted patents and has had successful industrial collaborations in the areas of brain- and human-computer interface. Dr. Mandic takes great satisfaction in educating new generations of researchers, and his Ph.D. students and PDRAs have won Best Thesis awards at his home Department in 2007 and 2011, Best Research at the Department in 2012, and Best Student Paper awards in ISNN in 2010, MELECON 2004, and RASC in 2002.

Ron Sun, Rensselaer Polytechnic Institute, NY, USA
for contributions to cognitive architectures and computations

Ron Sun is Professor of Cognitive Sciences at RPI, and formerly the James C. Dowell Professor of Engineering and Professor of Computer Science at University of Missouri-Columbia. He heads the Cognitive Architecture Laboratory at RPI. He received his Ph.D. from Brandeis University in Computer Science. His research interests center around the study of cognition, especially in the areas of cognitive architectures, human reasoning and learning, cognitive social simulation, and hybrid connectionist-symbolic models.
He has published many papers in these areas, as well as nine books, including “Duality of the Mind” and “Cambridge Handbook of Computational Psychology.” For his paper on integrating rule-based and connectionist models to account for human everyday reasoning, he received the 1991 David Marr Award from Cognitive Science Society. For his work on understanding human skill learning, he received the 2008 Hebb Award from International Neural Network Society.

His early major contribution was in hybrid connectionist-symbolic models. His 1995 “Artificial Intelligence” paper demonstrated that the integration of symbolic and connectionist processes can capture complex human reasoning. He has furthermore made seminal contributions to advancing hybrid cognitive architectures and their applications to understanding human cognition/intelligence. His 2001 “Cognitive Science” paper addressed for the first time the cognitive phenomenon of “bottom-up learning”. His 2005 “Psychological Review” paper proposed a framework that centered on the interaction of implicit and explicit cognitive processes (computationally, with connectionist and symbolic representations). The latter article was the first successful attempt at accounting for a wide range of cognitive phenomena that up to that point had not been adequately captured either psychologically or in computational systems. His recent paper in Psychological Review presents the most comprehensive and integrative theory of human creativity based on a dual representational framework. This theory and its resulting model account for a wide variety of empirical data and phenomena, and point to future intelligent systems capable of creativity. These models, theories, and methods are of fundamental importance for understanding human cognition/intelligence, and have significant implications for developing future computational intelligence systems.
He is the founding co-editor-in-chief of the journal Cognitive Systems Research, and serves on the editorial boards of many other journals. He chaired a number of major international conferences, including CogSci and IJCNN. He is a member of the Governing Boards of Cognitive Science Society and International Neural Networks Society, and served as President of International Neural Networks Society for a two-year term (2011-2012). His Web URL is http://sites.google.com/site/drronsun, where one may find further information about his work.

Andrzej Cichocki, RIKEN Brain Science Institute, JAPAN, and Warsaw University of Technology, POLAND
for contributions to applications of blind signal processing and artificial neural networks

Prof. Andrzej Cichocki received the M.Sc. (with honors), Ph.D. and Dr.Sc. (Habilitation) degrees, all in electrical engineering, from Warsaw University of Technology (Poland). Since 1976, he has been with the Institute of Theory of Electrical Engineering, Measurement and Information Systems, Faculty of Electrical Engineering at the Warsaw University of Technology, where he became a full Professor in 1995. He spent several years at University Erlangen-Nuerenberg (Germany) as an Alexander-von-Humboldt Research Fellow and Guest Professor. In 1995-1997 he was a team leader of the Laboratory for Artificial Brain Systems, at Frontier Research Program RIKEN (Japan), in the Brain Information Processing Group. He is currently a Senior Team Leader and Head of the Laboratory for Advanced Brain Signal Processing, at RIKEN Brain Science Institute (Japan). He has given keynote and tutorial talks at international conferences in Computational Intelligence and Signal Processing and served as member of program and technical committees (EUSIPCO, IJCNN, ICA, ISNN, ICONIP, ICAISC, ICASSP).
He has coauthored more than 400 papers in international journals and conferences and four monographs in English (two of them translated into Chinese): “Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis” (John Wiley, 2009); “Adaptive Blind Signal and Image Processing” (coauthored with Professor Shun-ichi Amari; Wiley, April 2003, revised edition); “CMOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems” (coauthored with Professor Rolf Unbehauen; Springer-Verlag, 1989); and “Neural Networks for Optimization and Signal Processing” (Wiley-Teubner, 1994). He serves/served as an Associate Editor of IEEE Transactions on Neural Networks, IEEE Transactions on Signal Processing, and Journal of Neuroscience Methods, and as founding Editor-in-Chief of the journal Computational Intelligence and Neuroscience. Currently, his research focuses on tensor decompositions, multiway blind source separation, brain machine interface, human robot interactions, EEG hyper-scanning, brain to brain interface and their practical applications. His publications currently report over 18,700 citations according to Google Scholar, with an h-index of 58 and an i10-index of 250.

Zhi-Hua Zhou, Nanjing University, CHINA
for contributions to learning systems in data mining and pattern recognition

Zhi-Hua Zhou received B.Sc., M.Sc. and Ph.D. degrees in computer science from Nanjing University in 1996, 1998 and 2000, respectively, all with the highest honors. He joined the Department of Computer Science and Technology of Nanjing University in 2001, and he is currently a Professor and Deputy Director of the National Key Laboratory for Novel Software Technology. Dr.
Zhou is actively pursuing research in the fields of machine learning, data mining and pattern recognition. He has made significant contributions to theories and algorithms of ensemble learning, multi-instance learning, multi-label learning, semi-supervised learning, etc., and many of his innovative techniques have been applied to diverse areas such as computer-aided medical diagnosis, biometric authentication, bioinformatics, multimedia retrieval and annotation, mobile and network communications, circuit design, etc. He has published more than 100 papers in leading international journals and conference proceedings, and his papers have been cited more than 9,000 times according to Google Scholar, with an h-index of 52. He has authored the book “Ensemble Methods: Foundations and Algorithms” (CRC Press, 2012), and coedited eight conference proceedings. He also holds twelve patents. He is a Fellow of the International Association of Pattern Recognition, and has received many awards, including nine international journal/conference paper or competition awards. He is also the awardee of the 2013 IEEE CIS Outstanding Early Career Award. Dr. Zhou is the founder and steering committee chair of ACML, a steering committee member of PAKDD and PRICAI, and has served as General Chair, Program Committee Chair, Vice Chair or Area Chair for dozens of international conferences. He is currently the Associate Editor-in-Chief of Chinese Science Bulletin and on the Advisory Board of International Journal of Machine Learning and Cybernetics.
He serves/served as an Associate Editor or Editorial Board member of more than twenty journals, including the ACM Transactions on Intelligent Systems and Technology and the IEEE Transactions on Knowledge and Data Engineering. He is the Vice Chair of the CIS Data Mining Technical Committee, Vice Chair of the IEEE Nanjing Section, Chair of the IEEE Computer Society Nanjing Chapter, Chair of the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation, and Chair of the Machine Learning Technical Committee of the China Association of Artificial Intelligence.

Gail Carpenter, Boston University, MA, USA
for contributions to adaptive resonance theory and modeling of Hodgkin-Huxley neurons

Gail Carpenter received a B.A. in mathematics from the University of Colorado, Boulder, in 1970, and a Ph.D. in mathematics from the University of Wisconsin, Madison, in 1974. She has since been an instructor in applied mathematics at MIT, a professor of mathematics at Northeastern University, and a professor of cognitive and neural systems (CNS) and mathematics at Boston University. Gail Carpenter’s neural modeling work began with her Ph.D. thesis, “Traveling wave solutions of nerve impulse equations.” In a series of papers published in the 1970s, she defined generalized Hodgkin-Huxley models, used dynamical systems techniques to analyze their solutions, characterized the qualitative properties of the burst patterns that a typical neuron may propagate, and investigated normal and abnormal signal patterns in nerve cells. Together with Stephen Grossberg and their students and colleagues, Prof.
Carpenter has, since the 1980s, developed the Adaptive Resonance Theory (ART) family of neural networks for fast stable online learning, pattern recognition, and prediction, including both unsupervised (ART 1, ART 2, ART 2-A, ART 3, fuzzy ART, distributed ART) and supervised (ARTMAP, fuzzy ARTMAP, ARTEMAP, ARTMAP-IC, ARTMAP-FTR, distributed ARTMAP, default ARTMAP, biased ARTMAP, self-supervised ARTMAP) systems. These ART models, designed by integrating cognitive and neural principles with systems-level computational constraints, have been used in a wide range of applications, including remote sensing, medical diagnosis, automatic target recognition, mobile robots, and database management. Prof. Carpenter’s recent research has focused on questions such as: How can a neural system learning from one example at a time absorb information that is inconsistent but correct, as when a family pet is called Spot, dog, and animal, while rejecting nominally similar incorrect information, as when the same pet is called wolf? How does this system transform such scattered information into the knowledge that dogs are animals, but not conversely? How can a real-time system, initially trained with a few labeled examples and a limited feature set, continue to learn from experience when confronted with oceans of additional information, without eroding reliable early memories? How can such individual systems adapt to their unique application contexts? How can a neural system that has made an error refocus attention on environmental features that it had initially ignored? Systems based on distributed ARTMAP address these questions and their applications to technology. Other aspects of Prof. Carpenter’s research include the development, computational analysis, and application of neural models of vision, synaptic transmission, and circadian rhythms. Her work in vision has ranged from models of the retina to color processing and long-range figure completion.
Gail Carpenter has organized numerous conferences and symposia for the IEEE, the International Neural Network Society (INNS), and the American Mathematical Society (AMS). At Boston University, she has served as founder and director of the CNS Technology Lab and as a founding member of the Center for Adaptive Systems and the Department of Cognitive and Neural Systems. She received the IEEE Neural Networks Pioneer Award and the INNS Gabor Award, and is an INNS Fellow.

Hani Hagras, University of Essex, UK
for contributions to fuzzy systems

Prof. Hani Hagras is a Professor of Computational Intelligence, Director of the Computational Intelligence Centre, Head of the Fuzzy Systems Research Group and Head of the Intelligent Environments Research Group at the University of Essex, UK. He received his Ph.D. in Computer Science from the University of Essex in 2000. His major research interests are in computational intelligence, notably type-2 fuzzy systems, fuzzy logic, neural networks, genetic algorithms, and evolutionary computation. His research interests also include ambient intelligence, pervasive computing and intelligent buildings. He is also interested in embedded agents, robotics and intelligent control. He has authored more than 250 papers in international journals, conferences and books.
His work has received funding totalling about £4 million in the last five years from the European Union, the UK Technology Strategy Board (TSB), the UK Department of Trade and Industry (DTI), the UK Engineering and Physical Sciences Research Council (EPSRC), the UK Economic and Social Sciences Research Council (ESRC), as well as several industrial companies. He also holds three industrial patents in the field of computational intelligence and intelligent control. His research has won numerous prestigious international awards: most recently, the IEEE Computational Intelligence Society (CIS) awarded him the 2013 Outstanding Paper Award in the IEEE Transactions on Fuzzy Systems, and he also won the 2004 Outstanding Paper Award in the IEEE Transactions on Fuzzy Systems. He was also the Chair of the IEEE CIS Chapter that won the 2011 IEEE CIS Outstanding Chapter Award. His work with IP4 Ltd won the 2009 Lord Stafford Award for Achievement in Innovation for East of England. His work also won the 2011 Best Knowledge Transfer Partnership Project for London and the Eastern Region, as well as best paper awards at several conferences, including the 2006 IEEE International Conference on Fuzzy Systems and the 2012 UK Workshop on Computational Intelligence. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the Institution of Engineering and Technology (IET). He served as the Chair of the IEEE Computational Intelligence Society (CIS) Senior Members Sub-Committee and as the Chair of the IEEE CIS Task Force on Intelligent Agents. He is currently the Chair of the IEEE CIS Task Force on Extensions to Type-1 Fuzzy Sets and a Vice Chair of the IEEE CIS Technical Committee on Emergent Technologies. He is a member of the IEEE CIS Fuzzy Systems Technical Committee. He is an Associate Editor of the IEEE Transactions on Fuzzy Systems.
He is also an Associate Editor of the International Journal of Robotics and Automation. Prof. Hagras has chaired several international conferences; most recently he served as Co-Chair of the 2013, 2011 and 2009 IEEE Symposium on Intelligent Agents and of the 2011 IEEE International Symposium on Advances to Type-2 Fuzzy Logic Systems. He was also the General Co-Chair of the 2007 IEEE International Conference on Fuzzy Systems in London.

IEEE CIS GOLD Report: Inaugural Elevator Pitch Competition and Other GOLD Activities
Heike Sichtig, U.S. Food and Drug Administration, USA
Stephen G. Matthews, University of Bristol, UK
Demetrios G. Eliades, University of Cyprus, CYPRUS
Muhammad Yasser, Babcock-Hitachi K.K., JAPAN
Pablo A. Estévez, University of Chile, CHILE
Digital Object Identifier 10.1109/MCI.2013.2247819
Date of publication: 11 April 2013

The Computational Intelligence Society (CIS), like IEEE as a whole, strives to assist and support younger members in entering their professional careers after graduation. For this purpose, the IEEE launched the GOLD (Graduates Of the Last Decade) Program to help students transition to young professionals within the larger IEEE community. IEEE young professionals are automatically added to the GOLD member community when they graduate. GOLD benefits are available for the first ten years after graduation from university. Similarly, CIS established the GOLD subcommittee to help increase the number of activities for young IEEE professionals in the Computational Intelligence (CI) field.
The IEEE CIS GOLD subcommittee is dedicated to serving the needs of a vibrant community of CI engineers, scientists, and technical experts with member representation across the globe. In this article, we showcase some of the CIS GOLD activities in 2012.

The CIS GOLD subcommittee hosted a “Novel CI Research Idea Pitch” competition during the Student and GOLD reception at WCCI 2012 in Brisbane, Australia. The competition was a fantastic opportunity to socialize with like-minded students and GOLDs, and to relax after a long conference day. The challenge was to design a one-page research proposal for a Computational Intelligence (CI) idea, and then to pitch that idea to a panel of CI experts and to peers using an “elevator pitch” (3-minute time limit). An “elevator pitch” is a short summary of a research idea. The research area was limited to “computational intelligence,” and the participants were asked to submit a quad chart (a high-level overview of an idea) and to “sell” their idea to the judges to qualify for prizes. A panel of three CI experts selected the three best pitches, which the audience of peers then ranked by secret ballot. Prizes included an iPad 2 for the winner, plus certificates and free full-year 2013 IEEE CIS memberships for the three best pitches.

On June 12, 2012, the reception started bustling with conference attendees awaiting the start of the inaugural GOLD “elevator pitch” competition. Seven GOLDs/students had registered for the event and were ready to present their novel CI ideas to a like-minded audience and senior CI experts.
Some entrants must have felt a bit uneasy before their pitches; however, they were put at ease by the panel of senior CIS experts, Gary Fogel, Piero Bonissone and Pablo Estevez, and by the audience. The competition was a fun event and, judging by the responses, a huge success! A big thank you goes to the photographers and videographers Albert Lam and Erdal Kayacan for capturing the exhilarating moments! Ahsan Chowdhury was awarded 1st place with an Apple iPad, Stephen G. Matthews was awarded 2nd place, and Aram (Alex) Ter-Sarkissov was awarded 3rd place. All entrants demonstrated strong skills in presenting a research idea to an audience. This is invaluable experience for GOLDs/students learning to pitch ideas and sell themselves in a short period of time. Well done to all entrants!

Gary Fogel, one of the panel judges, wrote: “It was a pleasure to even be considered to serve on the panel of judges for the inaugural GOLD ‘elevator pitch’ competition. This was a very fun and interesting event and the students came up with some very creative ideas. I have to applaud all of the students that entered simply for competing, and it was clear that most of them took the task quite seriously, which was a pleasure to see. Thanks to all of the students, congratulations to the winners, and to the organizers: I hope this will be the start of a long-lasting GOLD tradition! –Gary F.”

Following the elevator pitch competition, the CIS GOLD and CIS student subcommittees distributed a survey to attendees of the reception. The feedback showed strong support for both the reception and the competition. Most respondents knew of the CIS student travel grants, but not many knew about other benefits such as CI Webinars, Summer Schools, Student Research Grants and the Ph.D. Dissertation Award. CIS GOLD will endeavor to keep you informed about what’s going on in the society and the benefits of being a member.
The “elevator pitch” competition is just one of the initiatives conducted by the CIS GOLD subcommittee. The subcommittee’s achievements from its hard work throughout 2012 include:
❏ The “elevator pitch” competition at WCCI 2012 was great fun and invaluable experience for students/GOLDs. We hope this competition runs again.
❏ The survey conducted at WCCI 2012 provided data that we analyzed to produce a report. The information from this survey helps the subcommittee to understand how best to support our CIS GOLDs in the future.
❏ We have contacted all IEEE CIS chapters, and many of them have given positive responses. Our intention is to maintain and improve our good relations and coordination with all IEEE CIS chapters.
❏ We inform CIS GOLDs about activities on our website¹, Facebook² and LinkedIn³ pages. We hope our website can be a good way to improve our interaction with all IEEE CIS GOLD members. Our aim is to provide the latest updates about our activities to all IEEE CIS GOLD members and others.
❏ Heike Sichtig, chair of the CIS GOLD subcommittee during 2011–2012, received the 2012 IEEE Members and Geographic Activities GOLD Achievement Award. She said: “I am very proud to be part of this society. It truly provides support and is the best networking tool in today’s world!”

¹ http://cis.ieee.org/gold-graduates-of-the-last-decade.html
² https://www.facebook.com/pages/IEEE-CIS-GOLD/212664895442435
³ http://www.linkedin.com/groups/IEEE-CIS-GOLD-Computational-Intelligence-438209

[Photo: 2012 CIS Student/GOLD social event organizers: CIS GOLD Chair Heike Sichtig (left), CIS Student Activities Chair Demetrios Eliades, and CIS President Marios M. Polycarpou (right).]
[Photo: Panel of “elevator pitch” judges: Pablo Estevez, Gary Fogel and Piero Bonissone (left to right).]
[Photo: Winner of the “elevator pitch” competition Ahsan Chowdhury (second from left) with the three judges Gary Fogel, Pablo Estevez and Piero Bonissone. Our videographer/photographer Albert Lam in the background!]

Through these activities, we strive to maintain a positive level of participation from IEEE CIS GOLD members, and we feel we achieved this. In all of these activities we have seen enthusiastic participation, and we expect that interaction to increase in the future. We are keen to hear your feedback and suggestions to learn what CIS GOLDs want, so please contact the CIS GOLD chair. Be sure to “like” us on Facebook and connect with us on LinkedIn, so you can meet other members! Finally, please check our website for a link to additional commentary and pictures from the “elevator pitch” competition.

Editor’s Remarks (continued from page 2)

This special issue, guest edited by Dongrui Wu and Christian Wagner, presents four featured articles that address notable areas of improvement in the field of affective computing using computational intelligence technologies. Gaming is a major area where affective computing can be helpful. By sensing and interpreting the emotions and body gestures of the gamer during game play, adaptations can be made to make the gaming experience more realistic. The first article addresses feature extraction and selection in affective modelling, especially in relation to machine learning with deep learning approaches, tested on games.
The second article attempts to model the representation of emotion words in a game such that similar emotion words and those belonging to the same subsets are classified accurately. This issue also sees a feature article that touches on an aspect close to many of us in academia who are in constant contact with students: curiosity as a motivation in learning. Using affective computing in a virtual learning environment, the authors attempt a model to enhance curiosity in order to motivate students who may require more prodding than others. Decision-making in humans is certainly not a simple black/white or yes/no binary process. It is influenced by beliefs, logic and emotions, as considered in the fourth article of this issue, which attempts to model one of the complexities of decision-making by using affective computing technologies to address belief revision in machines, comparing denial and wishful thinking with goals and objectives in a machine. Emotions are sometimes tough to pin down even in words.

In addition to a “Book Review” on complex-valued neural networks, there is also a report on the 2012 IEEE Life Sciences Grand Challenges conference and an update on the activities of IEEE CIS GOLD. In the “Society Briefs” column, we congratulate the new IEEE Fellows in the Class of 2013 elevated through CIS, and welcome our five newly elected AdCom members who will help manage and administer CIS.

As we cross the half-way point of 2013, it is also time to take stock of what we have done well and what we can do better before the end of the year comes upon us. Please let us know if you have any suggestions or comments on areas that we have done well in and that you’d like to see continue, and on areas where we can improve, by e-mailing me at [email protected]. We look forward to hearing from you and hope you will enjoy this issue as much as we’ve enjoyed putting it together for you!
Publication Spotlight
Derong Liu, Chin-Teng Lin, Garry Greenwood, Simon Lucas, and Zhengyou Zhang
CIS Publication Spotlight
Digital Object Identifier 10.1109/MCI.2013.2247820
Date of publication: 11 April 2013

IEEE Transactions on Neural Networks and Learning Systems

Low-Rank Structure Learning via Nonconvex Heuristic Recovery, by Y. Deng, Q. Dai, R. Liu, Z. Zhang, and S. Hu, IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, No. 3, March 2013, pp. 383–396.
Digital Object Identifier: 10.1109/TNNLS.2012.2235082
“A nonconvex framework is proposed for learning the essential low-rank structure from corrupted data. Different from traditional approaches, which directly utilizes convex norms to measure the sparseness, this method introduces more reasonable nonconvex measurements to enhance the sparsity in both the intrinsic low-rank structure and the sparse corruptions. It includes how to combine the widely used Lp norm (0<p<1) and log-sum term into the framework of low-rank structure learning. Although the proposed optimization is no longer convex, it still can be effectively solved by a majorization–minimization (MM)-type algorithm, with which the nonconvex objective function is iteratively replaced by its convex surrogate and the nonconvex problem finally falls into the general framework of reweighed approaches. It is proved that the MM-type algorithm can converge to a stationary point after successive iterations. The proposed model is applied to solve two typical problems: robust principal component analysis and low-rank representation. Experimental results on low-rank structure learning demonstrate that our nonconvex heuristic methods, especially the log-sum heuristic recovery algorithm, generally perform much better than the convex-norm-based method (0<p<1) for both data with higher rank and with denser corruptions.”

Qualitative Adaptive Reward Learning with Success Failure Maps: Applied to Humanoid Robot Walking, by J. Nassour, V. Hugel, F.B. Ouezdou, and G. Cheng, IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, No. 1, January 2013, pp. 81–93.
Digital Object Identifier: 10.1109/TNNLS.2012.2224370
“A learning mechanism is proposed to learn from negative and positive feedback with reward coding adaptively. It is composed of two phases: evaluation and decision making. In the evaluation phase, a Kohonen self-organizing map technique is used to represent success and failure. Decision making is based on an early warning mechanism that enables avoiding repeating past mistakes. The behavior to risk is modulated in order to gain experiences for success and for failure. Success map is learned with adaptive reward that qualifies the learned task in order to optimize the efficiency. The approach is presented with an implementation on the NAO humanoid robot, controlled by a bio-inspired neural controller based on a central pattern generator. The learning system adapts the oscillation frequency and the motor neuron gain in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.”

IEEE Transactions on Fuzzy Systems

A Novel Approach to Filter Design for T–S Fuzzy Discrete-Time Systems with Time-Varying Delay, IEEE Transactions on Fuzzy Systems, Vol. 20, No. 6, December 2012, pp. 1114–1129.
Digital Object Identifier: 10.1109/TFUZZ.2012.2196522
“In this paper, the problem of l2-l∞ filtering for a class of discrete-time Takagi-Sugeno (T-S) fuzzy time-varying delay systems is studied.
The authors focused on the design of full- and reduced-order filters that guarantee the filtering error system to be asymptotically stable with a prescribed H∞ performance. Sufficient conditions for the obtained filtering error system are proposed by applying an input-output approach and a two-term approximation method, which is employed to approximate the time-varying delay. The corresponding full- and reduced-order filter design is cast into a convex optimization problem, which can be efficiently solved by standard numerical algorithms. Finally, simulation examples are provided to illustrate the effectiveness of the proposed approaches.”

Fuzzy c-Means Algorithms for Very Large Data, IEEE Transactions on Fuzzy Systems, Vol. 20, No. 6, December 2012, pp. 1130–1146.
Digital Object Identifier: 10.1109/TFUZZ.2012.2201485
“Very large (VL) data or big data are any data that we cannot load into our computer’s working memory. This is not an objective definition, but a definition that is easy to understand and practical, because there is a dataset too big for any computer we might use; hence, this is VL data for us. Clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases (including VL images) in various applications, and so, clustering algorithms that scale well to VL data are important and useful. This paper compares the efficacy of three different implementations of techniques aimed to extend fuzzy c-means (FCM) clustering to VL data.
Specifically, we compare methods that are based on 1) sampling followed by noniterative extension; 2) incremental techniques that make one sequential pass through subsets of the data; and 3) kernelized versions of FCM that provide approximations based on sampling, including three proposed algorithms. Empirical results show that random sampling plus extension FCM, bit-reduced FCM, and approximate kernel FCM are good choices to approximate FCM for VL data. We conclude by demonstrating the VL algorithms on a dataset with 5 billion objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.”

IEEE Transactions on Evolutionary Computation

Continuous Dynamic Constrained Optimization—The Challenges, by T. Nguyen and X. Yao, IEEE Transactions on Evolutionary Computation, Vol. 16, No. 6, December 2012, pp. 769–786.
Digital Object Identifier: 10.1109/TEVC.2011.2180533
“Many real-world dynamic problems have both objective functions and constraints that can change over time. Currently no research addresses whether current algorithms work well on continuous dynamic constrained optimization problems. There also is no benchmark problem that reflects the common characteristics of continuous dynamic optimization problems. This paper attempts to close this gap. The authors present some investigations on the characteristics that might make these problems difficult to solve by some existing dynamic optimization and constraint handling algorithms. A set of benchmark problems with these characteristics is presented. Finally, a list of potential requirements that an algorithm should meet to solve these types of problems is proposed.”

The Automatic Design of Multiobjective Ant Colony Optimization Algorithms, by M. Lopez-Ibanez and T. Stutzle, IEEE Transactions on Evolutionary Computation, Vol. 16, No. 6, December 2012, pp. 861–875.
Digital Object Identifier: 10.1109/TEVC.2011.2182651
“Multiobjective optimization problems are problems with several, often conflicting, objectives that must be optimized. Without any a priori preference information, the Pareto optimality principle establishes a partial order among solutions, and the output of the algorithm becomes a set of nondominated solutions rather than a single one. Various ant colony optimization (ACO) algorithms have been proposed in recent years for solving such problems. This paper proposes a formulation of algorithmic components that suffices to describe most multiobjective ACO algorithms proposed so far. The proposed framework facilitates the application of automatic algorithm configuration techniques.”

IEEE Transactions on Computational Intelligence and AI in Games

Monte Carlo Tree Search for the Hide-and-Seek Game Scotland Yard, by Pim Nijssen and Mark H.M. Winands, IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 4, December 2012, pp. 282–294.
Digital Object Identifier: 10.1109/TCIAIG.2012.2210424
“This paper develops a strong Monte-Carlo Tree Search player for Scotland Yard, an interesting asymmetric imperfect information 2-player strategy game. The game involves one player controlling five detectives trying to capture a “hider.” A novel combination of techniques are used including determinization, location categorization and coalition reduction, the latter of which aims to optimally balance the tendencies for detectives to behave in glory hunting versus parasitic ways.”

IEEE Transactions on Autonomous Mental Development

A Unified Account of Gaze Following, by H. Jasso, J. Triesch, G. Deák, and J.M. Lewis, IEEE Transactions on Autonomous Mental Development, Vol. 4, No. 4, December 2012, pp. 257–272.
Digital Object Identifier: 10.1109/TAMD.2012.2208640
“Gaze following, the ability to redirect one’s visual attention to look at what another person is seeing, is foundational for imitation, word learning, and theory-of-mind. Previous theories have suggested that the development of gaze following in human infants is the product of a basic gaze following mechanism, plus the gradual incorporation of several distinct new mechanisms that improve the skill, such as spatial inference, and the ability to use eye direction information as well as head direction. In this paper, we offer an alternative explanation based on a single learning mechanism. From a starting state with no knowledge of the implications of another organism’s gaze direction, our model learns to follow gaze by being placed in a simulated environment where an adult caregiver looks around at objects. Our infant model matches the development of gaze following in human infants as measured in key experiments that we replicate and analyze in detail.”

CALL FOR PAPERS
General Game Systems
IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG)
Special issue: General Game Systems
Special issue editors: Cameron Browne, Nathan Sturtevant and Julian Togelius

General game playing (GGP) involves the development of AI agents for playing a range of games well, rather than specialising in any one particular game. Such systems have potential benefits for AI research, where the creation of general intelligence remains one of the open grand challenges. GGP was first proposed in the 1960s and became a reality in the 1990s with the Metagame system for general Chess-like games.
The specification of the game description language (GDL) and annual AAAI GGP competitions followed in the first decade of this century, providing a platform for serious academic study into this topic. The recent advent of Monte Carlo tree search (MCTS) methods has allowed the development of truly competitive GGP agents, and there is exciting new research into applying GGP principles to general video games. The field of general games research is now becoming fully rounded, with the development of complete general game systems (GGS) for playing, analysing and/or designing new games. These include not only GGP, but also any system that attempts to model a range of games; the definition is itself kept deliberately broad. The key feature of such systems is their generality, but the issue of representation remains an obstacle to true universality while they rely on formal descriptions of target domains. The purpose of this special issue is to draw together the various research topics related to AI and CI in general games, to give an indication of where the field currently stands and where it is likely to head. It will explore questions such as: How good and how general are existing systems, and how good and how general can they become? What have we learnt about AI and CI from studying general games? How do we apply existing GGP expertise to general video games? We invite high quality work on any aspect of general games research in any genre of game–digital or physical–including play, analysis and design. 
Topics include but are not limited to:
❑ General game playing
❑ General game description and representation
❑ General game design and optimisation
❑ Generalised Monte Carlo tree search (MCTS) approaches
❑ Real-time, nondeterministic and imperfect information extensions to GGP
❑ General video game playing
❑ Framing issues and constraints on generality
❑ Bridging the gap between academic and commercial applications

Authors should follow normal T-CIAIG guidelines for their submissions, but identify their papers for this special issue during the submission process. Submissions should be 8 to 12 pages long, but may exceed these limits in special cases. Short papers of 4 to 6 pages are also invited. See http://www.ieee-cis.org/pubs/tciaig/ for author information.

Deadline for submissions: May 3, 2013
Notification of Acceptance: July 5, 2013
Final copy due: October 4, 2013
Publication: December 6, 2013

Digital Object Identifier 10.1109/MCI.2013.2247901

Conference Report
Gary B. Fogel
Natural Selection, Inc., USA

A Report on the IEEE Life Sciences Grand Challenges Conference

On October 4–5, 2012 I had the good fortune to attend the first IEEE Life Sciences Grand Challenges Conference (IEEE LSGCC) held at the National Academy of Sciences in Washington, D.C. The two-day meeting had attendees from essentially all IEEE societies, reviewing applications and advancements of engineering in biomedicine. IEEE Life Sciences represents a new direction for IEEE, focused on the ever-increasing need for engineering solutions that deliver high-quality, lower-cost healthcare.
As biology generates larger datasets, the need for computational intelligence approaches also increases. As a biologist, I found it excellent to see presentations from both engineers and biologists, with IEEE pulling the two fields closer together.

The meeting itself was largely focused on medical applications, including improved devices, the use of robots as medical assistants, and even visualization methods for modeling biological systems, such as blood flow in the heart, so that new types of replacement valves can be tested in a realistic simulation environment in which the researcher interacts with the simulation through three-dimensional projections. IEEE CIS was mentioned in a lecture by Shangkai Gao (Tsinghua University) regarding the importance of and future directions in brain-computer interfaces. It was good to see the importance of machine learning featured, as well as a cover from a previous special issue on this topic in IEEE Computational Intelligence Magazine!

The start of the first IEEE Life Sciences Grand Challenges Conference in Washington, D.C. at the National Academy of Sciences.

Digital Object Identifier 10.1109/MCI.2013.2247821
Date of publication: 11 April 2013

A highlight for me was the lecture by Nobel Prize winner Phillip Sharp on the convergence of the life sciences, physical sciences, and engineering. It is this convergence that was the focus of the meeting, and that will allow for knowledge integration and iteration, to provide actionable insights to future clinicians. A common theme in many talks was the need to translate better modeling and engineering throughout the healthcare chain, not just to the clinician but to better informed and engaged patients. While some presentations highlighted promising advances already being made in these directions, the advent of big data in biology and the realization of the scope and size of the problems in systems biology remain daunting.
These “grand challenges” will be the reason why this new direction for IEEE will pay dividends for researchers and patients for years to come. For more information on the IEEE Life Sciences Initiative, please visit http://lifesciences.ieee.org.

IEEE Transactions on Fuzzy Systems
Special Issue on Web-Based Intelligence Support Systems using Fuzzy Set Technology

I. Aims and Scope

Web-based technology has enjoyed a tremendous growth and exhibited a wealth of development at both conceptual and algorithmic levels. In particular, there have been numerous successful realizations of Web-based support systems in various application areas, including e-learning, e-commerce, e-government, and e-market. Web-based support systems are highly visible and influential examples of user-oriented technology supporting numerous human pursuits realized across the Internet. In the two categories of decision support systems and recommender systems, the facet of user centricity and friendliness is well documented. Recent literature review demonstrates that more and more successful developments in Web-based support systems are being integrated with fuzzy sets to enhance intelligence-oriented functionality such as web search systems by fuzzy matching; Internet shopping systems using fuzzy multi-agents; product recommender systems supported by fuzzy measure algorithms; e-logistics systems using fuzzy optimization models; online customer segments using fuzzy data mining; fuzzy case-based reasoning in e-learning systems, and particularly online decision support systems supported by fuzzy set techniques.
These developments have demonstrated how the use of fuzzy set technology can benefit the implementation of Web-based support systems in business real-time decision making and government online services.

In light of the above observations, this special issue is intended to form an international forum presenting innovative developments of fuzzy set applications in Web-based support systems. The ultimate objective is to bring together well-focused, high quality research results in Web-based support systems, with the intent to identify the most promising avenues, report the main results and promote the visibility and relevance of fuzzy sets. The intent is to raise awareness of the domain of Web-based technologies as a high-potential subject area to be pursued by the fuzzy set research community.

Digital Object Identifier 10.1109/MCI.2013.2247903

II. Topics Covered

Fuzzy sets technology in
❏ Web-based group support systems
❏ Web-based decision support systems
❏ Web-based personalized recommender systems
❏ Web-based knowledge management systems
❏ Web-based customer relationship management
❏ Web-based tutoring systems
and their applications to:
❏ E-business intelligence
❏ E-commerce intelligence
❏ E-government intelligence
❏ E-learning intelligence

III. Important Dates

Aug. 1, 2013: Submission deadline
Nov. 1, 2013: Notification of the first-round review
Jan. 1, 2014: Revised submission due
Mar. 1, 2014: Final notice of acceptance/rejection

IV. Submission Guidelines

Manuscripts should be prepared according to the instructions in the “Information for Authors” section of the journal, and submissions should be made through the IEEE TFS journal website: http://mc.manuscriptcentral.com/tfs-ieee/ Clearly mark “Special Issue on Web-Based Intelligence Support Systems using Fuzzy Set Technology” in your cover letter to the Editor-in-Chief. All submitted manuscripts will be reviewed using the standard procedure that is followed for regular submissions.

V. Guest Editors

Prof.
Witold Pedrycz
Department of Electrical & Computer Engineering
University of Alberta, Canada
e-mail: [email protected]

Prof. Jie Lu
School of Software
Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
e-mail: [email protected]

Guest Editorial
Dongrui Wu
GE Global Research, USA
Christian Wagner
University of Nottingham, UK

Special Issue on Computational Intelligence and Affective Computing

Digital Object Identifier 10.1109/MCI.2013.2247822
Date of publication: 11 April 2013

Affective Computing (AC) was first introduced by Professor Picard (MIT Media Lab) in 1995 as “computing that relates to, arises from, or deliberately influences emotions.” It has been gaining popularity rapidly in the last decade, largely because of its great potential in the next generation of human-computer interfaces. Figure 1 shows the number of publications containing the phrase “affective computing” returned by Google Scholar over the last 17 years. In 2012 there were close to 2000 publications on it.

FIGURE 1 Number of Google Scholar publications on affective computing since 1995. (Axes: year, 1996 to 2012; number of publications, 0 to 2000.)

Many countries have also been very supportive of AC research, particularly in relation to priority areas such as supporting children’s social and cognitive development and the backdrop of a rapidly aging demographic, where human-tangible computing such as affective robot companions is expected to provide essential benefits. In April 2012 the United States National Science Foundation awarded $10M to a 5-year project “Socially Assistive Robotics” under the Expeditions in Computing program¹, which² “will develop the fundamental computational techniques that will enable the design, implementation, and evaluation of robots that encourage social, emotional, and cognitive growth in children, including those with social or cognitive deficits.” The European Union has funded many relevant projects under the 6th and 7th Framework Programmes. The HUMAINE³ (HUman-MAchine Interaction Network on Emotions) Network of Excellence was established in 2004 and now has 33 partners from 14 countries. The RoboCom (Robot Companions for Citizens) project⁴ is one of the six candidates for the two €1 billion 10-year Future and Emerging Technologies Flagships⁵. These robots will be able to display soft behavior based on new levels of perceptual, cognitive and emotive capabilities.

¹ http://nsf.gov/news/news_summ.jsp?cntn_id=123707
² http://www.robotshelpingkids.org/index.php
³ http://emotion-research.net/
⁴ http://www.robotcompanions.eu/
⁵ http://cordis.europa.eu/fp7/ict/programme/fet/flagship/6pilots_en.html

There are also two journals and an international conference dedicated to AC. The HUMAINE association established the biennial International Conference on Affective Computing and Intelligent Interaction (Beijing, China, 2005; Lisbon, Portugal, 2007; Amsterdam, The Netherlands, 2009; Memphis, USA, 2011; Geneva, Switzerland, 2013) in 2005, and the IEEE/ACM Transactions on Affective Computing in 2010. IGI Global established the International Journal of Synthetic Emotions in 2010.

Notably, the IEEE Computational Intelligence Society (CIS) is very active in AC research. It is a sponsor of the IEEE Transactions on Affective Computing, the Workshop on Affective Computational Intelligence in the 2011 IEEE Symposium Series on Computational Intelligence (SSCI 2011), and the Symposium on Computational Intelligence for Creativity and Affective Computing in SSCI 2013. The CIS Emergent Technologies Technical Committee has established an Affective Computing Task Force⁶, which is currently chaired by the two Guest Editors of this special issue. The Task Force organized a special session on “Affective Computing and Computational Intelligence” at the 2012 World Congress on Computational Intelligence (WCCI 2012) with a view to making it a biennial event held at WCCI.

⁶ https://sites.google.com/site/drwu09/actf

The combination of AC and computational intelligence is very natural. AC raises many new challenges for signal processing, affect recognition & modeling, and information aggregation. Physiological signals, which are frequently used as a basis for affect recognition, are very noisy and highly subject-dependent. Computational intelligence methods, including fuzzy sets and systems, neural networks, and evolutionary algorithms, provide ideal capabilities to develop intuitive and robust emotion recognition algorithms.
Further, emotions, which are intrinsic to human beings, may also inspire new CI algorithms, just as the human brain inspired neural networks and the survival of the fittest in nature inspired evolutionary computation.

AC research itself has rapidly expanded and today frequently goes beyond the initial core research challenge of mapping body signals (facial expressions, voice, gesture, physiological signals, etc.) to affective states. Because AC relies on contributions from a range of academic disciplines, including Psychology, Biology, and Computer Science, much of the research in AC is firmly grounded in a multi-disciplinary approach.

The four articles in this special issue of IEEE Computational Intelligence Magazine represent some of the latest progress on the combination of AC and computational intelligence. They were selected from 20 submissions through peer review and provide a highly interesting view of the current research and potential avenues of computational intelligence in AC. The breadth of the research captured by these articles provides an indication of the importance of affect in modern human-centric computation and indicates the potential for further development of Computational Intelligence in this space.

The first article, “Learning Deep Physiological Models of Affect,” describes the first study that applies deep learning to AC using psychophysiological signals (skin conductance and blood volume pulse). Deep learning is a very active research area in machine learning, especially for object recognition in images. In this article the authors use a deep artificial neural network for automatic feature extraction and feature selection. They adopt preference-based (or ranking-based) annotations for emotion rather than traditional rating-based annotation, as the former provides more reliable self-report data. Experiments show that deep learning can extract meaningful multimodal data attributes beyond manual ad-hoc feature design.
For some affective states, deep learning without feature selection achieved performance similar to, or even better than, that of models built on ad-hoc extracted features boosted by automatic feature selection. More importantly, the method is generic and applicable to any affective modeling task.

In the second article, the authors present two models that employ interval type-2 fuzzy sets to model the meaning of words describing emotion. The first model represents three factors for each word: dominance, valence, and activation. After describing the model, the authors deploy it in conjunction with similarity measures for the task of translating from one emotion vocabulary to another. As an initial outcome, the authors show that while the model works well with smaller vocabularies, performance (rated by comparison with human translators) decreases when larger vocabularies are used. The authors then introduce a second model, which aims to overcome this limitation by taking a different approach to modeling words: interval type-2 fuzzy sets are used to represent the truth values of answers to questions about emotion. A crowd-sourced evaluation of the latter approach is conducted and the results presented.

The third article, “Modeling Curiosity-Related Emotions for Virtual Peer Learners,” proposes a virtual peer learner with curiosity-related emotions. It represents one of the latest advances on personalized learning, which was selected by the United States National Academy of Engineering as one of its 14 Grand Challenges⁷. The idea is that “instruction can be individualized based on learning styles, speeds, and interests to make learning more reliable. ... Personal learning approaches range from modules that students can master at their own pace to computer programs designed to match the way it presents content with a learner’s personality.” Experiments show that the curiosity-related emotions can guide the curious peer learner to behave naturally in a virtual learning environment, and the curious virtual peer learner can demonstrate a higher tendency for learning in breadth and depth.

⁷ http://www.engineeringchallenges.org/cms/8996/9127.aspx

In the fourth article, “Goal-Based Denial and Wishful Thinking,” the authors propose a novel approach to model an agent’s beliefs that aims to incorporate denial and wishful thinking. While not traditionally related to AC, their work on belief revision highlights an important aspect of emotion in belief-structure with direct consequences for the design of artificial agents. They describe how traditional rational belief systems for autonomous artificial agents can be extended to capture a more human-like approach to belief creation, preservation and revision. Significantly, the authors show how their approach enables the autonomous ranking and re-ranking of beliefs subject to new evidence and changes in an agent’s goals, which in turn allow an agent to autonomously revise its beliefs without relying on their external prioritization. Using example scenarios, the authors instantiate their belief model and demonstrate the behavior of the agent, in particular the “denial and wishful thinking” belief revision driven by the context experienced by the agent.
In summary, the four selected papers for this special issue highlight a subset of the challenging and novel applications of computational intelligence to AC. We would like to express our sincere thanks to all the authors and gratitude to reviewers for extending their cooperation in preparing and revising the papers. Special thanks go to Professor Kay Chen Tan, Editor-in-Chief of IEEE Computational Intelligence Magazine, for his suggestions and advice throughout the entire process of this special issue. We hope that this issue will inspire others to work on the exciting new frontier of computational intelligence and AC.

IEEE Transactions on Autonomous Mental Development
Special Issue on Behavior Understanding and Developmental Robotics

Call for Papers

We solicit papers that inspect scientific, technological and application challenges that arise from the mutual interaction of developmental robotics and computational human behavior understanding. While some of the existing techniques of multimodal behavior analysis and modeling can be readily re-used for robots, novel scientific and technological challenges arise when one aims to achieve human behavior understanding in the context of natural and life-long human-robot interaction. We seek contributions that deal with the two sides of this problem: (i) Behavior analysis for developmental robotics; (ii) Behavior analysis through developmental robotics. Topics include the following, among others:
❏ Adaptive human-robot interaction
❏ Action and language understanding
❏ Sensing human behavior
❏ Incremental learning of human behavior
❏ Learning by demonstration
❏ Intrinsic motivation
❏ Robotic platforms for behavior analysis
❏ Multimodal interaction
❏ Human-robot games
❏ Semiotics for robots
❏ Social and affective signals
❏ Imitation

Contributions can exemplify diverse approaches to behavior analysis, but the relevance to developmental robotics should be clear and explicitly argued.
In particular, it should involve one of the following: 1) incremental and developmental learning techniques, 2) techniques that allow adapting to changes in human behavior, 3) techniques that study evolution and change in human behavior. Interested parties are encouraged to contact the editors with questions about the suitability of a manuscript.

Editors:
Albert Ali Salah, Boğaziçi University, [email protected];
Pierre-Yves Oudeyer, INRIA, [email protected];
Çetin Meriçli, Carnegie Mellon University, [email protected];
Javier Ruiz-del-Solar, Universidad de Chile, jruizd@ing.uchile.cl

Instructions for Authors: http://cis.ieee.org/ieee-transactions-on-autonomous-mental-development.html

We are accepting submissions through Manuscript Central at http://mc.manuscriptcentral.com/tamd-ieee (please select “Human Behavior Understanding” as the submission type). When submitting your manuscript, please also cc it to the editors.

Timeline:
30 April 2013: Deadline for paper submission
15 July 2013: Notification of the first round of review results
15 October 2013: Final version
20 October 2013: Electronic publication
December 2013: Printed publication

Digital Object Identifier 10.1109/MCI.2013.2247902

Learning Deep Physiological Models of Affect

Héctor P. Martínez, IT University of Copenhagen, DENMARK
Yoshua Bengio, University of Montreal, CANADA
Georgios N. Yannakakis, University of Malta, MALTA

Digital Object Identifier 10.1109/MCI.2013.2247823
Date of publication: 11 April 2013
1556-603X/13/$31.00 © 2013 IEEE

Abstract: Feature extraction and feature selection are crucial phases in the process of affective modeling. Both, however, incorporate substantial limitations that hinder the development of reliable and accurate models of affect. For the purpose of modeling affect manifested through physiology, this paper builds on recent advances in machine learning with deep learning (DL) approaches. The efficiency of DL algorithms that train artificial neural network models is tested and compared against standard feature extraction and selection approaches followed in the literature. Results on a game data corpus (containing players’ physiological signals, i.e., skin conductance and blood volume pulse, and subjective self-reports of affect) reveal that DL outperforms manual ad-hoc feature extraction as it yields significantly more accurate affective models. Moreover, it appears that DL meets and even outperforms affective models that are boosted by automatic feature selection, for several of the scenarios examined. As the DL method is generic and applicable to any affective modeling task, the key findings of the paper suggest that ad-hoc feature extraction and, to a lesser degree, feature selection could be bypassed.

I. Introduction

More than 15 years after the early studies in Affective Computing (AC) [1], the problem of detecting and modeling emotions in the context of human-computer interaction (HCI) remains complex and largely unexplored. The detection and modeling of emotion is, primarily, the study and use of artificial intelligence (AI) techniques for the construction of computational models of emotion. The key challenges one faces when attempting to model emotion [2] are inherent in the vague definitions and fuzzy boundaries of emotion, and in the modeling methodology followed. In this context, open research questions are still present in all key components of the modeling process. These include, first, the appropriateness of the modeling tool employed to map emotional manifestations and responses to annotated affective states; second, the processing of signals that express these manifestations (i.e., model input); and third, the way affective annotation (i.e., model output) is handled. This paper touches upon all three key components of an affective model (i.e., input, model, output) and introduces the use of deep learning (DL) [3], [4], [5] methodologies for affective modeling from multiple physiological signals.

Traditionally in AC research, behavioral and bodily responses to stimuli are collected and used as the affective model input. The input can be of three main types: a) behavioral responses to emotional stimuli expressed through an interactive application (e.g., data obtained from a log of actions performed in a game); b) objective data collected as bodily responses to stimuli, such as physiological signals and facial expressions; and c) the context of the interaction. Before these data streams are fed into the computational model, an automatic or ad-hoc feature extraction procedure is employed to derive appropriate signal attributes (e.g., average skin conductance) that will feed the model. It is also common to introduce an automatic or a semi-automatic feature selection procedure that picks the most appropriate of the features extracted.

While the phases of feature extraction and feature selection are beneficial for affective modeling, they inherit a number of critical limitations that make their use cumbersome in highly complex multimodal input spaces. First, manual feature extraction limits the creativity of attribute design to the expert (i.e., the AC researcher), resulting in potentially inappropriate affect detectors that might not be able to capture the manifestations of the affect embedded in the raw input signals. Second, both feature extraction and, to a larger degree, feature selection are computationally expensive phases. In particular, the computational cost of feature selection may increase combinatorially (quadratically, in the greedy case) with respect to the number of features considered [6]. In general, there is no guarantee that any search algorithm is able to converge to optimal feature sets for the model; even exhaustive search may be approximate, since models are often trained with non-deterministic algorithms.

Our hypothesis is that the use of non-linear unsupervised and supervised learning methods relying on the principles of DL [3], [4] can eliminate the limitations of the current feature extraction and feature selection practices in affective modeling. We test the hypothesis that DL could construct feature extractors that are more appropriate than selected ad-hoc features picked via automatic selection. Learning within deep artificial neural network (ANN) architectures has proven to be a powerful machine learning approach for a number of benchmark problems and domains, including image and speech recognition [7], [8]. DL allows the automation of feature extraction (and feature selection, in part) without compromising on the accuracy of the obtained computational models and the physical meaning of the data attributes extracted [9]. Using deep learning we were able to extract meaningful multimodal data attributes beyond manual ad-hoc feature design. These learned attributes led to more accurate affective models and, at the same time, can potentially save computational resources by bypassing the computationally expensive feature selection phase. Most importantly, with the use of DL we gain simplicity, as multiple signals can be fused and fed directly, with limited preprocessing, to the model for training.

Other common automatic feature extraction techniques within AC are principal component analysis (PCA) and Fisher projection. However, they are typically applied to a set of features extracted a priori [10], while we apply DL directly to the raw data signals. Moreover, DL techniques can operate with any signal type and are not restricted to discrete signals as, for example, sequential data mining techniques are [11]. Finally, compared to dynamic affect modeling approaches such as Hidden Markov Models and Dynamic Bayesian Networks, DL models are advantageous with respect to their ability to reduce signal resolution across the several layers of their architectures.

This paper focuses on developing DL models of affect using data which are annotated in a ranking format (pairwise preferences). We emphasize the benefits of preference-based (or ranking-based) annotations for emotion (e.g., X is more frustrating than Y) as opposed to rating-based annotation [12] (such as the self-assessment manikins [13], a tool to rate levels of arousal and valence in discrete or continuous scales [14]) and introduce the use of DL algorithms for preference learning, namely, preference deep learning (PDL). In this paper, the PDL algorithm proposed is tested on emotional manifestations of relaxation, anxiety, excitement, and fun, embedded in physiological signals (i.e., skin conductance and blood volume pulse) derived from a game-based user study of 36 participants.
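To make the idea of learning from pairwise preferences concrete, the following is a minimal sketch in Python with NumPy. It is not the PDL algorithm of the paper: it swaps in a plain linear scoring model trained with a standard pairwise logistic (RankNet-style) loss, and the function names, learning rate, and toy data are all illustrative assumptions. Each training pair states only that one sample was preferred over another (e.g., "X is more frustrating than Y"), with no absolute rating.

```python
import numpy as np


def pairwise_logistic_loss(score_preferred, score_other):
    """Mean logistic loss over pairs; small when preferred items score higher."""
    diff = np.asarray(score_preferred, dtype=float) - np.asarray(score_other, dtype=float)
    # log(1 + exp(-diff)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -diff)))


def train_linear_ranker(x_preferred, x_other, lr=0.1, epochs=200):
    """Fit weights w so that w @ x_preferred tends to exceed w @ x_other (hypothetical helper)."""
    x_preferred = np.asarray(x_preferred, dtype=float)
    x_other = np.asarray(x_other, dtype=float)
    delta = x_preferred - x_other
    w = np.zeros(delta.shape[1])
    for _ in range(epochs):
        diff = delta @ w
        # derivative of log(1 + exp(-diff)) with respect to diff
        grad_diff = -1.0 / (1.0 + np.exp(diff))
        w -= lr * (delta.T @ grad_diff) / len(diff)
    return w


# Toy data (made up): each row is a feature vector; row i of X_PREFERRED was
# reported as "more frustrating" than row i of X_OTHER.
X_PREFERRED = np.array([[1.0, 0.2], [0.9, 0.1], [0.8, 0.5]])
X_OTHER = np.array([[0.2, 0.3], [0.1, 0.4], [0.3, 0.6]])

weights = train_linear_ranker(X_PREFERRED, X_OTHER)
final_loss = pairwise_logistic_loss(X_PREFERRED @ weights, X_OTHER @ weights)
```

In a deep variant, the linear score would be replaced by the output of a neural network, while the loss would still compare the two outputs of each annotated pair.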
The study compares DL against ad-hoc feature extraction on physiological signals, used broadly in the AC literature, showing that DL yields models of equal or significantly higher accuracy when a single signal is used as model input. When the skin conductance and blood volume pulse signals are fused, DL outperforms standard feature extraction across all affective states examined. The supremacy of DL is maintained even when automatic feature selection is employed to improve models built on ad-hoc features; in several affective states the performance of models built on automatically selected ad-hoc features does not surpass or reach the corresponding accuracy of the PDL approach.

This paper advances the state-of-the-art in affective modeling in several ways. First, to the best of the authors’ knowledge, this is the first time deep learning is introduced to the domain of psychophysiology, yielding efficient computational models of affect. Second, the paper shows the strength of the method when applied to the fusion of different physiological signals. Third, the paper introduces PDL, i.e., the use of deep ANN architectures trained on ranked (pairwise preference) annotations of affect. Finally, the key findings of the paper show the potential of DL as a mechanism for eliminating manual feature extraction and even, on some occasions, bypassing automatic feature selection for affective modeling.

II. Computational Modeling of Affect

Emotions and affect are mental and bodily processes that can be inferred by a human observer from a combination of contextual, behavioral and physiological cues. Part of the complexity of affect modeling emerges from the challenges of finding objective and measurable signals that carry affective information (e.g., body posture, speech and skin conductance) and designing methodologies to collect and label emotional experiences effectively (e.g., induce specific emotions by exposing participants to a set of images).
Although this paper is only concerned with computational aspects of creating physiological detectors of affect, the signals and the affective target values collected shape the modeling task and, thus, influence the efficacy and applicability of different computational methods. Consequently, this section gives an overview of the field beyond the input modalities and emotion annotation protocols examined in our case study. Furthermore, the studies surveyed are representative of the two principal applications of AI for affect modeling and cover the two key research pillars of this paper: 1) defining feature sets to extract relevant bits of information from objective data signals (i.e., for feature extraction), and 2) creating models that map a feature set into predicted affective states (i.e., for training models of affect).
A. Feature Extraction
In the context of affect detection, we refer to feature extraction as the process of transforming the raw signals captured by the hardware (e.g., a skin conductance sensor, a microphone, or a camera) into a set of inputs suitable for a computational predictor of affect. The most common features extracted from unidimensional continuous signals (i.e., temporal sequences of real values such as blood volume pulse, accelerometer data, or speech) are simple statistical features, such as average and standard deviation values, calculated on the time or frequency domains of the raw or the normalized signals (see [15], [16] among others). More complex feature extractors inspired by signal processing methods have also been proposed by several authors. For instance, Giakoumis et al.
[17] proposed features extracted from physiological signals using Legendre and Krawtchouk polynomials while Yannakakis and Hallam [18] used the approximate entropy [19] and the parameters of linear, quadratic and exponential regression models fitted to a heart rate signal. The focus of this paper is on DL methods that can automatically derive feature extractors from the raw data, as opposed to a fixed set of hand-crafted extractors that represent pre-designed statistical features of the signals.
Unidimensional symbolic or discrete signals (i.e., temporal sequences of discrete labels, typically events such as clicking a mouse button or blinking an eye) are usually transformed with ad-hoc statistical feature extractors such as counts, similarly to continuous signals. Distinctively, Martínez and Yannakakis [11] used frequent sequence mining methods [20] to find frequent patterns across different discrete modalities, namely gameplay events and discrete physiological events. The count of each pattern was then used as an input feature to an affect detector. This methodology is only applicable to discrete signals: continuous signals must be discretized, which involves a loss of information. By contrast, a key advantage of the DL methodology proposed in this paper is that it can handle both discrete and continuous signals; a lossless transformation can convert a discrete signal into a binary continuous signal, which can potentially be fed into a deep network (DL has been successfully applied to classify binary images, e.g., [21]).
Affect recognition based on signals with more than one dimension typically boils down to affect recognition from images or videos of body movements, posture or facial expressions. In most studies, a series of relevant points of the face or body are first detected (e.g., right mouth corner and right elbow) and tracked along frames.
Second, the tracked points are aggregated into discrete Action Units [22], gestures [23] (e.g., lip stretch or head nod) or continuous statistical features (e.g., body contraction index), which are then used to predict the affective state of the user [24]. Both above-mentioned feature extraction steps are, by definition, supervised learning problems as the points to be tracked and action units to be identified have been defined a priori. While these problems have been investigated extensively under the name of facial expression or gesture recognition, we will not survey them broadly as this paper focuses on methods for automatically discovering new or unknown features in an unsupervised manner.
Deep neural network architectures such as convolutional neural networks (CNNs), a popular technique for object recognition in images [25], have also been applied to facial expression recognition. In [26], CNNs were used to detect predefined features such as eyes and mouth which later were used to detect smiles. Contrary to our work, in that study each of the layers of the CNN was trained independently using backpropagation, i.e., labeled data was available for training each level. More recently, Rifai et al. [27] successfully applied a variant of auto-encoders [21] and convolutional networks, namely Contractive Convolutional Neural Networks, to learn features from images of faces and predict the displayed emotion, breaking the previous state-of-the-art on the Toronto Face Database [28]. The key differences of this paper with that study reside in the nature of the dataset and the method used. While Rifai et al.
[27] used a large dataset (over 100,000 samples; 4,178 of them were labeled with an emotion class) of static images displaying posed emotions, we use a small dataset (224 samples, labeled with pairwise orders) with a set of physiological time-series recorded along an emotional experience. The reduced size of our dataset (which is of the same magnitude as datasets used in related psychophysiological studies, e.g., [29], [30]) does not allow the extraction of large feature sets (e.g., 9,000 features in [27]), which would lead to affect models of poor generalizability. The nature of our preference labels also calls for a modified CNN training algorithm for affective preference learning which is introduced in this paper. Furthermore, while the use of CNNs to process images is extensive, to the best of the authors’ knowledge, CNNs have not been applied before to process (or as a means to fuse) physiological signals.
As in many other machine learning applications, in affect detection it is common to apply dimensionality reduction techniques to the complete set of features extracted. A wide variety of feature selection (FS) methods have been used in the literature including sequential forward [31], sequential floating forward [10], sequential backwards [32], n-best individuals [33], perceptron [33] and genetic [34] feature selection. Fisher projection and Principal Component Analysis (PCA) have also been widely used as dimensionality reducers on different modalities of AC signals (e.g., see [10] among others). An auto-encoder can be viewed as a non-linear generalization of PCA [8]; however, while PCA has been applied in AC to transpose sets of manually extracted features into low dimensional spaces, in this paper auto-encoders are used to train unsupervised CNNs to transpose subsets of the raw input signals into a learned set of features.
We expect that information relevant for prediction can be extracted more effectively using dimensionality reduction methods directly on the raw physiological signals than on a set of designer-selected extracted features.
B. Training Models of Affect
The selection of a method to create a model that maps a given set of features to predictions of affective variables is strongly influenced by the dynamic aspect of the features (stationary or sequential) and the format in which training examples are given (continuous values, class labels or ordinal labels). A vast set of off-the-shelf machine learning (ML) methods have been applied to create models of affect based on stationary features, irrespective of the specific emotions and modalities involved. These include Linear Discriminant Analysis [35], Multi-layer Perceptrons [32], K-Nearest Neighbors [36], Support Vector Machines [37], Decision Trees [38], Bayesian Networks [39], Gaussian Processes [29] and Fuzzy-rules [40]. On the other hand, Hidden Markov Models [41], Dynamic Bayesian Networks [42] and Recurrent Neural Networks [43] have been applied for constructing affect detectors that rely on features which change dynamically. In the approach presented here, deep neural network architectures hierarchically reduce the resolution of temporal signals down to a set of features that can be fed to simple stateless models, eliminating the need for complex sequential predictors.
In all the above-mentioned studies, the prediction targets are either class labels or continuous values. Class labels are assigned either using an induction protocol (e.g., participants are asked to self-elicit an emotion [36], presented with stories to evoke a specific emotion [44]) or via rating- or rank-based questionnaires given to users experiencing the emotion (self-reports) or experts (third-person reports).
If ratings are used, they can be binned into discrete or binary classes (e.g., on a scale from 1 to 5 measuring stress, values above or below 3 correspond to the user at stress or not at all, respectively [45]) or used as target values for supervised learning (e.g., two experts rate the amount of sadness of a facial expression and the average value is used as the sadness intensity [46]). Alternatively, if ranks are used, the problem of affective modeling becomes one of preference learning. In this paper we use object ranking methods (a subset of preference learning algorithms [47], [48]), which train computational models using partial orders among the training samples. These methods allow us to avoid binning together ordinal labels and to work with comparative questionnaires, which provide more reliable self-report data compared to ratings, as they generate less inconsistency and order effects [12].
Object ranking methods and comparative (rank) questionnaires have been scarcely explored in the AC literature, despite their well-known advantages. For example, Tognetti et al. [49] applied Linear Discriminant Analysis to learn models of preferences over game experiences based on physiological statistical features and comparative pairwise self-reports (i.e., participants played pairs of games and ranked games according to preference). On the same basis, Yannakakis et al. [50], [51] and Martínez et al. [34], [33] trained single and multiple layer perceptrons via genetic algorithms (i.e., neuroevolutionary preference learning) to learn models for several affective and cognitive states (e.g., fun, challenge and frustration) using physiological and behavioral data, and pairwise self-reports.
In this paper we introduce a deep learning methodology for data given in a ranked format (i.e., Preference Deep Learning) for the purpose of modeling affect.
FIGURE 1 Example of structure of a deep ANN architecture. The architecture contains: (a) a convolutional neural network (CNN) with two convolutional and two pooling layers, and (b) a single-layer perceptron (SLP) predictor. In the illustrated example the first convolutional layer (3 neurons and patch length of 20 samples) processes a skin conductance signal which is propagated forward through an average-pooling layer (window length of 3 samples). A second convolutional layer (3 neurons and patch length of 11 samples) processes the subsampled feature maps and the resulting feature maps feed the second average-pooling layer (window length of 6 samples). The final subsampled feature maps form the output of the CNN which provides a number of extracted (learned) features which feed the input of the SLP predictor.
III. Deep Artificial Neural Networks
We investigate an effective method of learning models that map signals of user behavior to predictions of affective states.
To bypass the manual ad-hoc feature extraction stage, we use a deep model composed of (a) a multi-layer convolutional neural network (CNN) that transforms the raw signals into a reduced set of features that feed (b) a single-layer perceptron (SLP) which predicts affective states (see Fig. 1). Our hypothesis is that the automation of feature extraction via deep learning will yield physiological affect detectors of higher predictive power, which, in turn, will deliver affective models of higher accuracy. The advantages of deep learning techniques mentioned in the introduction of the paper have led to very promising results in computer vision as they have outperformed other state-of-the-art methods [52], [53]. Furthermore, convolutional networks have been successfully applied to dissimilar temporal datasets (e.g., [54], [25]) including electroencephalogram (EEG) signals [55] for seizure prediction.
To train the convolutional neural network (see Section III-A) we use denoising auto-encoders [56], an unsupervised learning method to train filters or feature extractors which transform the information of the input signal (see Section III-B) in order to capture a distributed representation of its leading factors of variation, but without the linearity assumption of PCA. The SLP is then trained using backpropagation [57] to map the outputs of the CNN to the given affective target values. In the case study examined in this paper, target values are given as pairwise comparisons (partial orders of length 2), making error functions commonly used with gradient descent methods, such as the sum of squared errors or cross-entropy, unsuitable for the task.
For that purpose, we use the rank margin error function for preference data [58], [59] as detailed in Section III-C below. Additionally, we apply an automatic feature selection method to reduce the dimensionality of the feature space, improving the prediction accuracy of the models trained (see Section III-D).
A. Convolutional Neural Networks
Convolutional or time-delay neural networks [25] are hierarchical models that alternate convolutional and pooling layers (see Fig. 1) in order to process large input spaces in which a spatial or temporal relation among the inputs exists (e.g., images, speech or physiological signals). Convolutional layers contain a set of neurons that detect different patterns on a patch of the input (e.g., a time window in a time-series or part of an image). The inputs of each neuron (its receptive field) determine the size of the patch. Each neuron contains a number of trainable weights equal to the number of its inputs and an additional bias parameter (also trainable); the output is calculated by applying an activation function (e.g., logistic sigmoid) to the weighted sum of the inputs plus the bias (see Fig. 2). Each neuron sequentially scans the input, assessing at each patch location the similarity to the pattern encoded on the weights. The consecutive outputs generated at every location of the input form a feature map (see Fig. 1). The output of the convolutional layer is the set of feature maps resulting from convolving each of the neurons across the input. Note that the convolution of each neuron produces the same number of outputs as the number of samples in the input signal (e.g., the sequence length) minus the size of the patch (i.e., the size of the receptive field of the neuron), plus 1 (see Fig. 1).
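As a rough illustration of the scanning procedure described above, the following NumPy sketch slides a small bank of logistic neurons over a one-dimensional signal. The function name `conv_layer_1d`, the random weights and the synthetic 140-sample input are assumptions made for this example only; they are not the trained filters or the data used in the paper. Each patch is mean-centered before it is weighted, so that the output is insensitive to the baseline level of the signal.

```python
import numpy as np

def logistic(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer_1d(signal, weights, biases):
    """Slide every neuron over the signal and return one feature map per neuron.

    signal:  shape (n_samples,)
    weights: shape (n_neurons, patch_len), one row of weights per neuron
    biases:  shape (n_neurons,)
    returns: shape (n_neurons, n_samples - patch_len + 1)
    """
    signal = np.asarray(signal, dtype=float)
    n_neurons, patch_len = weights.shape
    n_outputs = signal.shape[0] - patch_len + 1
    feature_maps = np.empty((n_neurons, n_outputs))
    for t in range(n_outputs):
        patch = signal[t:t + patch_len]
        patch = patch - patch.mean()  # zero-mean patch (baseline removed)
        feature_maps[:, t] = logistic(weights @ patch + biases)
    return feature_maps

# Hypothetical example: 3 neurons with 20 inputs each, 140-sample input.
rng = np.random.default_rng(0)
signal = rng.normal(size=140)
weights = rng.normal(scale=0.1, size=(3, 20))
biases = np.zeros(3)

feature_maps = conv_layer_1d(signal, weights, biases)
print(feature_maps.shape)  # (3, 121), i.e. 140 - 20 + 1 outputs per neuron
```

The explicit loop is kept for readability; a vectorized implementation would produce the same feature maps.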
As soon as feature maps have been generated, a pooling layer aggregates consecutive values of the feature maps resulting from the previous convolutional layer, reducing their resolution with a pooling function (see Fig. 3). The maximum or average values are the two most commonly used pooling functions providing max-pooling and average-pooling layers, respectively. This aggregation is typically done inside each feature map, so that the output of a pooling layer presents the same number of feature maps as its input but at a lower resolution (see Fig. 1).
FIGURE 2 Convolutional layer. The neurons in a convolutional layer take as input a patch on the input signal $x$. Each of the neurons calculates a weighted sum of the inputs ($x \cdot w$), adds a bias parameter $\theta$ and applies an activation function $s(x)$. The output of each neuron contributes to a different feature map. In order to find patterns that are insensitive to the baseline level of the input signal, $x$ is normalized with mean equal to 0. In this example, the convolutional layer contains 3 neurons with 20 inputs each.
FIGURE 3 Pooling layer. The input feature maps are subsampled independently using a pooling function over non-overlapping windows, resulting in the same number of feature maps with a lower temporal resolution. In this example, an average-pooling layer with a window length of 3 subsamples 3 feature maps.
FIGURE 4 Structure of an auto-encoder. The encoder generates the learned representation (extracted features) from the input signals. During training the output representation is fed to a decoder that attempts to reconstruct the input.
B. Auto-Encoders
An auto-encoder (AE) [60], [8], [21] is a model that transforms an input space into a new distributed representation (extracted features) by applying a deterministic parametrized function (e.g., single layer of logistic neurons) called the encoder (see Fig. 4). The AE also learns how to map back the output of the encoder into the input space, with a parametrized decoder, so as to have small reconstruction error on the training examples, i.e., the original and corresponding decoded inputs are similar. However, constraints on the architecture or the form of the training criterion prevent the auto-encoder from simply learning the identity function everywhere. Instead, it will learn to have small reconstruction error on the training examples (and where it generalizes) and high reconstruction error elsewhere. Regularized auto-encoders are linked to density estimation in several ways [56], [61]; see [62] for a recent review of regularized auto-encoders. In this paper, the encoder weights (used to obtain the output representation) are also used to reconstruct the inputs (tied weights). By defining the reconstruction error as the sum of squared differences between the inputs and the reconstructed inputs, we can use a gradient descent method such as backpropagation to train the weights of the model. A denoising auto-encoder (DA) [56] is a variant of the basic model that during training adds a variable amount of noise to the inputs before computing the outputs.
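The following NumPy sketch illustrates one way a tied-weight denoising auto-encoder of this kind could be trained patchwise with gradient descent on the sum of squared differences. It is a minimal illustration under stated assumptions, not the authors' implementation: the additive Gaussian corruption, the linear decoder, the learning rate, the function names and the synthetic sinusoidal signal are all choices made for this example.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(X, W, b, c):
    """Encode with W and decode with the transpose of W (tied weights)."""
    H = logistic(X @ W.T + b)   # hidden representation, shape (m, k)
    return H, H @ W + c         # reconstruction, shape (m, d); linear decoder (assumption)

def reconstruction_error(X, W, b, c):
    """Mean over examples of the sum of squared differences."""
    _, X_hat = reconstruct(X, W, b, c)
    return float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))

def train_denoising_autoencoder(X, n_hidden=5, noise_std=0.1, lr=0.01,
                                epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    m, d = X.shape
    W = rng.normal(scale=0.1, size=(n_hidden, d))
    b = np.zeros(n_hidden)
    c = np.zeros(d)
    for _ in range(epochs):
        # Corrupt the inputs (Gaussian noise is an assumption of this sketch).
        X_noisy = X + rng.normal(scale=noise_std, size=X.shape)
        H, X_hat = reconstruct(X_noisy, W, b, c)
        # The target is the clean, uncorrupted input.
        d_out = 2.0 * (X_hat - X) / m
        d_hid = (d_out @ W.T) * H * (1.0 - H)
        # Tied weights: sum the decoder and encoder contributions.
        grad_W = H.T @ d_out + d_hid.T @ X_noisy
        W -= lr * grad_W
        b -= lr * d_hid.sum(axis=0)
        c -= lr * d_out.sum(axis=0)
    return W, b, c

# Hypothetical data: every position of a synthetic signal yields one patch.
rng = np.random.default_rng(1)
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25.0) + 0.05 * rng.normal(size=t.size)
patch_len = 20
patches = np.stack([signal[i:i + patch_len]
                    for i in range(signal.size - patch_len + 1)])
patches = patches - patches.mean(axis=1, keepdims=True)  # zero-mean patches

# Same seed as the training routine, so W0 equals the initial weights.
W0 = np.random.default_rng(0).normal(scale=0.1, size=(5, patch_len))
error_before = reconstruction_error(patches, W0, np.zeros(5), np.zeros(patch_len))
W, b, c = train_denoising_autoencoder(patches)
error_after = reconstruction_error(patches, W, b, c)
print(error_before, error_after)
```

Each row of `W` plays the role of one learned filter; in a stacked setting the trained encoder of one layer would provide the inputs used to train the next.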
The resulting training objective is to reconstruct the original uncorrupted inputs, i.e., one minimizes the discrepancy between the outputs of the decoder and the original uncorrupted inputs. Auto-encoders are among several unsupervised learning techniques that have provided remarkable improvements to gradient-descent supervised learning [4], especially when the number of labeled examples is small or in transfer settings [62]. ANNs that are pretrained using these techniques usually converge to more robust and accurate solutions than ANNs with randomly sampled initial weights. In this paper, we use a DA method known as Stacked Convolutional Auto-encoders [63] to train all convolutional layers of our CNNs from bottom to top. We trained the filters of each convolutional layer patchwise, i.e., by considering the input at each position (one patch) in the sequence as one example. This allows faster training than training convolutionally, but may yield translated versions of the same filter.
C. Preference Deep Learning
The outputs of a trained CNN define a number of learned features extracted from the input signal. These, in turn, may feed any function approximator or classifier that attempts to find a mapping between the input signal and a target output (i.e., affective state in our case). In this paper, we train a single-layer perceptron to learn to predict the affective state of a user based on the learned features of her physiology (see Fig. 1). To this aim, we use backpropagation [57], which optimizes an error function iteratively across a number of epochs by adjusting the weights of the SLP proportionally to the gradient of the error with respect to the current value of the weights and current data samples.
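To make the training loop for pairwise preferences concrete, the sketch below trains a single-layer perceptron with a rank-margin (hinge-style) error of the form max{0, 1 - (f(x_P) - f(x_N))}, where x_P is the preferred and x_N the non-preferred sample. Several details are assumptions made for illustration: the linear output f(x) = w·x (a bias would cancel in the difference), the learning rate, the function names and the synthetic 15-feature data with a hidden preference direction.

```python
import numpy as np

def rank_margin_error(f_pref, f_nonpref):
    """max(0, 1 - (f(x_P) - f(x_N))) for one pair or an array of pairs."""
    return np.maximum(0.0, 1.0 - (f_pref - f_nonpref))

def train_preference_slp(X_pref, X_nonpref, lr=0.05, epochs=100):
    """Online gradient descent on the rank-margin error of a linear SLP."""
    w = np.zeros(X_pref.shape[1])
    for _ in range(epochs):
        for x_p, x_n in zip(X_pref, X_nonpref):
            if rank_margin_error(w @ x_p, w @ x_n) > 0.0:
                # Gradient of 1 - (w.x_p - w.x_n) w.r.t. w is -(x_p - x_n).
                w += lr * (x_p - x_n)
    return w

def pairwise_accuracy(w, X_pref, X_nonpref):
    """Fraction of pairs in which the preferred sample scores higher."""
    return float(np.mean(X_pref @ w > X_nonpref @ w))

# Hypothetical data: 60 preference pairs over 15 features.
rng = np.random.default_rng(0)
n_pairs, n_features = 60, 15
hidden_direction = rng.normal(size=n_features)
hidden_direction /= np.linalg.norm(hidden_direction)
X_nonpref = rng.normal(size=(n_pairs, n_features))
gap = rng.uniform(0.5, 1.5, size=(n_pairs, 1))
X_pref = (X_nonpref + gap * hidden_direction
          + 0.05 * rng.normal(size=(n_pairs, n_features)))

w = train_preference_slp(X_pref, X_nonpref)
accuracy = pairwise_accuracy(w, X_pref, X_nonpref)
mean_error = float(np.mean(rank_margin_error(X_pref @ w, X_nonpref @ w)))
print(accuracy, mean_error)
```

Pairs that already satisfy the margin contribute zero error and therefore trigger no weight update, which is why the loop only adjusts the weights for violated pairs.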
We use the Rank Margin error function [64] that, given two data samples $\{x_P, x_N\}$ such that $x_P$ is preferred over (or should be greater than) $x_N$, is calculated as follows:
$$E(x_P, x_N) = \max\{0,\ 1 - (f(x_P) - f(x_N))\}, \qquad (1)$$
where $f(x_P)$ and $f(x_N)$ represent the outputs of the SLP for the preferred and non-preferred sample, respectively. This function decreases linearly as the difference between the predicted value for preferred and non-preferred samples increases. The function becomes zero if this difference is greater than 1, i.e., there is enough margin to separate the preferred “positive example” score $f(x_P)$ from the non-preferred “negative example” score $f(x_N)$. By minimizing this function, the neural network is driven towards learning outputs separated at least by one unit of distance between the preferred and non-preferred data sample. In each training epoch, for every pairwise preference in the training dataset, the output of the neural network is computed for the two data samples in the preference (preferred and non-preferred) and the rank-margin error is backpropagated through the network in order to obtain the gradient required to update the weights. Note that while all layers of the deep architecture could be trained (including supervised fine-tuning of the CNNs), due to the small number of labeled examples available here, the Preference Deep Learning algorithm is constrained to the last layer (i.e., SLP) of the network in order to avoid overfitting.
D. Automatic Feature Selection
Automatic feature selection (FS) is an essential process towards picking those features (deep learned or ad-hoc extracted) that are appropriate for predicting the examined affective states. In this paper, we use Sequential Forward Feature Selection (SFS) for its low computational effort and demonstrated good performance compared to more advanced, nevertheless time consuming, feature subset selection algorithms such as the genetic-based FS [34]. While a number of other FS algorithms are available for comparison, in this paper we focus on the comparative benefits of learned physiological detectors over ad-hoc designed features. The impact of FS on model performance is further discussed in Section VI.
In brief, SFS is a bottom-up search procedure where one feature is added at a time to the current feature set (see e.g., [48]). The feature to be added is selected from the subset of the remaining features such that the new feature set generates the maximum value of the performance function over all candidate features for addition. Since we are interested in the minimal feature subset that yields the highest performance, we terminate the selection procedure when an added feature yields equal or lower validation performance to the performance obtained without it. The performance of a feature set selected by automatic FS is measured through the average classification accuracy of the model in three independent runs using 3-fold cross-validation. In the experiments presented in this paper, the SFS algorithm selects the input feature set for the SLP model.
IV. The Maze-Ball Dataset
The dataset used to evaluate the proposed methodology was gathered during an experimental game survey where 36 participants played four pairs of different variants of the same video-game. The test-bed game named Maze-Ball is a 3D prey/predator game that features a ball inside a maze controlled by the arrow keys. The goal of the player is to maximize her score in 90 seconds by collecting a number of pellets scattered in the maze while avoiding enemies that wander around. Eight different game variants were presented to the players. The games were different with respect to the virtual camera profile used, which determined how the virtual world was presented on screen. We expected that different camera profiles would induce different experiences and affective states, which would, in turn, reflect on the physiological state of the players, making it possible to predict the players’ affective self-reported preferences using information extracted from their physiology. Blood volume pulse (BVP) and skin conductance (SC) were recorded at 31.25 Hz during each game session.
The players filled in a 4-alternative forced choice questionnaire after completing a pair of game variants reporting whether the first or the second game of the pair (i.e., pairwise preference) felt more anxious, exciting, frustrating, fun and relaxing, with options that include equally or none at all [33]. While three additional labels were collected in the original experiment (boredom, challenge and frustration), we focus only on affective states or states that are implicitly linked to affective experiences, such as fun (thereby, removing the cognitive state of challenge), and report only results for states in which prediction accuracies of over 70% were achieved in at least one of the input feature sets examined (thereby, removing frustration).
Finally, boredom was removed due to the small number of clear preferences available (i.e., most participants reported not feeling bored during any of the games). The details of the Maze-Ball game design and the experimental protocol followed can be found in [33], [34].
A. Ad-Hoc Extraction of Statistical Features
This section lists the statistical features extracted from the two physiological signals monitored. Some features are extracted for both signals while some are signal-dependent as seen in the list below. The choice of those specific statistical features is made in order to cover a fair amount of possible BVP and SC signal dynamics (tonic and phasic) proposed in the majority of previous studies in the field of psychophysiology (e.g., see [15], [65], [51] among many).
❏ Both signals ($a \in \{BVP, SC\}$): Average $E\{a\}$, standard deviation $\sigma\{a\}$, maximum $\max\{a\}$, minimum $\min\{a\}$, the difference between maximum and minimum signal recording $\Delta^{a} = \max\{a\} - \min\{a\}$, time when maximum $a$ occurred $t_{\max}\{a\}$, time when minimum $a$ occurred $t_{\min}\{a\}$ and the difference $\Delta_t^{a} = t_{\max}\{a\} - t_{\min}\{a\}$; autocorrelation (lag equals 1) of the signal $\rho_1^{a}$ and mean of the absolute values of the first and second differences of the signal [15] ($\delta_1^{a}$ and $\delta_2^{a}$, respectively).
❏ BVP: Average inter-beat amplitude $E\{IBAmp\}$; given the inter-beat time intervals (RR intervals) of the signal, the following Heart Rate Variability (HRV) parameters were computed: the standard deviation of RR intervals $\sigma\{RR\}$, the fraction of RR intervals that differ by more than 50 msec from the previous RR interval $pRR50$ and the root-mean-square of successive differences of RR intervals $RMS_{RR}$ [65].
❏ SC: Initial, $SC_{in}$, and last, $SC_{last}$, SC recording, the difference between initial and final SC recording $\Delta^{SC}_{l-i} = SC_{last} - SC_{in}$ and Pearson’s correlation coefficient $R_{SC}$ between raw SC recordings and the time $t$ at which data were recorded.
V. Experiments
To test the efficacy of DL on constructing accurate models of affect we pretrained several convolutional neural networks (using denoising auto-encoders) to extract features for each of the physiological signals and across all reported affective states in the dataset. The topologies of the networks were selected after preliminary experiments with 1- and 2-layer CNNs and trained using the complete unlabeled dataset. In all experiments reported in this paper the final number of features pooled from the CNNs is 15, to match the number of ad-hoc extracted statistical features (see Section IV-A). Although a larger number of pooled features could potentially yield higher prediction accuracies, we restricted the size to 15 to ensure a fair comparison against the accuracies yielded by the ad-hoc extracted features.
FIGURE 5 Learned features of the best-performing convolutional neural networks. Lines are plotted connecting the values of consecutive connection weights for each neuron Nx. The x axis displays the time stamp (in seconds) of the samples connected to each weight within the input patch; the y axis displays the weight value. (a) $CNN^{SC}_{80}$ (skin conductance). (b) $CNN^{BVP}_{1 \times 45}$ (blood volume pulse).
The input signals are not normalized using global, baseline or subject-dependent constants; instead, the first convolutional layer of every CNN subtracts the mean value within each patch presented, resulting in patches with a zero mean value, which makes the learned features sensitive only to variation within the desired time window (patch) and insensitive to the baseline level (see Fig. 2). As for statistical features, we apply z-transformation to the complete dataset, so that the mean and the standard deviation of each feature in the dataset are 0 and 1, respectively. Independently of model input, the use of preference learning models, which are trained and evaluated using within-participant differences, automatically minimizes the effects of between-participants physiological differences (as noted in [33], [12] among other studies).
We present a comparison between the prediction accuracy of several SLPs trained either on the learned features of the CNNs or on the ad-hoc designed statistical features. The affective models are trained with and without automatic feature selection and compared. This section presents the key findings derived from the SC (Section V-A) and the BVP (Section V-B) signals and concludes with the analysis of the fusion of the two physiological signals (Section V-C). All the experiments presented here are run 10 times and the average (and standard error) of the resulting models’ prediction accuracies are reported. The prediction accuracy of the models is calculated as the average 3-fold cross-validation (CV) accuracy (average percentage of correctly classified pairs on each fold). While more folds in cross-validation (e.g., 10) or other validation methods such as leave-one-out cross-validation are possible, we considered the 3-fold CV as appropriate for testing the generalizability of the trained ANNs given the relatively small size of (and the high across-subject variation existent in) this dataset.
A. Skin Conductance
The focus of the paper is on the effectiveness of DL for affective modeling. While the topology of the CNNs can be critical for the performance of the model, the exhaustive empirical validation of all possible CNN topologies and parameter sets is out of the scope of this paper. For this purpose—and also due to space considerations—we have systematically tested critical parameters of CNNs (e.g., the patch length, the number of layers, and the number of neurons), we have fixed a number of CNN parameters (e.g., pooling window length) based on suggestions from the literature and we discuss results from representative CNN architectures. In particular, for the skin conductance signal we present results on two pretrained CNNs. The first, labeled CNN SC 20 # 11, contains two convolutional layers with 5 logistic neurons per patch location at each layer, as well as average-pooling over non-overlapping windows of size 3. Each of the neurons in the first and second convolutional layer has 20 and 11 inputs, respectively. The second network (labeled as CNN SC 80 ), contains one convolutional layer with 5 logistic neurons of 80 inputs each, at each patch location. Both CNNs examined here are selected based on a number of criteria. The number of inputs of the first convolutional layer of the two CNNs considered were selected to extract features at different time resolutions (20 and 80 inputs corresponding to 12.8 and 51.2 seconds, respectively) and, thereby, giving an indication of the impact the time resolution might have on performance. Extensive experiments with smaller and larger time windows did not seem to affect the model’s prediction accuracy. The small window on the intermediate pooling layer was chosen to minimize the amount of information lost from the feature maps while the number of inputs to the neurons in the next layer was adjusted to cover about a third of the pooled feature maps. 
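The layer operations described so far (per-patch mean subtraction, logistic neurons applied at every patch location, and average-pooling over non-overlapping windows) can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, a stride of one sample between patch locations and a bias term per neuron are assumptions, and the random weights merely stand in for weights that would be learned with denoising auto-encoders.

```python
import math
import random


def logistic(x):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))


def conv_layer(signal, weights, biases):
    """Apply every neuron at every patch location (stride 1 is an assumption).

    signal  : list of floats
    weights : one weight list per neuron, all of the same length (patch length)
    biases  : one bias per neuron (an assumption, not stated in the text)
    Returns one feature map (list of floats) per neuron.
    """
    patch_len = len(weights[0])
    n_locations = len(signal) - patch_len + 1
    feature_maps = [[] for _ in weights]
    for start in range(n_locations):
        patch = signal[start:start + patch_len]
        mean = sum(patch) / patch_len
        centred = [v - mean for v in patch]  # zero-mean patch: baseline level removed
        for n, (w, b) in enumerate(zip(weights, biases)):
            activation = sum(wi * xi for wi, xi in zip(w, centred)) + b
            feature_maps[n].append(logistic(activation))
    return feature_maps


def average_pool(feature_map, window):
    """Average over non-overlapping windows; a trailing partial window is dropped."""
    return [sum(feature_map[i:i + window]) / window
            for i in range(0, len(feature_map) - window + 1, window)]


if __name__ == "__main__":
    random.seed(0)
    signal = [random.random() for _ in range(100)]  # placeholder signal, not real SC data
    # 5 neurons with 20 inputs each; random placeholders for learned weights
    weights = [[random.uniform(-0.5, 0.5) for _ in range(20)] for _ in range(5)]
    biases = [0.0] * 5
    maps = conv_layer(signal, weights, biases)
    pooled = [average_pool(m, 3) for m in maps]
    print(len(maps), len(maps[0]), len(pooled[0]))  # 5 81 27
```

Because each patch is centred before the weights are applied, adding a constant offset to the whole signal leaves the feature maps unchanged, which is the baseline insensitivity the text describes.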
Finally, we selected 5 neurons in the first convolutional layer as a good compromise between expressivity and dissimilarity among the features learned: a low number of neurons derived features with low expressivity, while a large number of neurons generally resulted in features being very similar. Both topologies are built on top of an average-pooling layer with a window length of 20 samples and are topped up with an average-pooling layer that pools 3 outputs per neuron. Although SC is usually sampled at high frequencies (e.g., 256 Hz), we believe that the most affect-relevant information contained in the signal can be found at lower time resolutions, as even rapid arousal changes (i.e., a phasic change of SC) can be captured with a lower resolution and at a lower computational cost [66], [33]. For that purpose, the selection of this initial pooling stage aims to facilitate feature learning at a resolution of 1.56 Hz. Moreover, experiments with dissimilar pooling layers showed that features extracted on higher SC resolutions do not necessarily yield models of higher accuracy. The selection of 5 neurons for the last convolutional layer and the following pooling layer was made to achieve the exact number of ad-hoc statistical features of SC (i.e., 15).

1) Deep Learned Features
Figure 5(a) depicts the values of the 80 connection weights of the five neurons in the convolutional layer of CNN^SC_{80}, which cover 51.2 seconds of the SC signal (0.64 seconds per weight) on each evaluation. The first neuron (N1) outputs a maximal value for areas of the SC signal in which a long decay is followed by 10 seconds of an incremental trend and a final decay. The second neuron (N2) shows a similar pattern, but the increment is detected earlier in the time window and the follow-up decay is longer. A high output of these neurons would suggest that a change in the experience elicited a heightened level of arousal that decayed naturally seconds after. The fourth neuron (N4), in contrast, detects a second incremental trend in the signal that elevates the SC level even further. The fifth neuron (N5) also detects two increments but several seconds further apart. Finally, the third neuron (N3) detects three consecutive SC increments. These last three neurons could detect changes in the level of arousal caused by consecutive stimuli presented a few seconds apart. Overall, this convolutional layer captures long and slow changes (10 seconds or more) of skin conductance. These local patterns cannot be modeled with the same precision using standard statistical features related to variation (such as standard deviation and average first/second absolute differences), which further suggests that dissimilar aspects of the signal are extracted by learned and ad-hoc features.

2) DL vs. Ad-Hoc Feature Extraction
Figure 6(a) depicts the average prediction accuracies (3-fold CV) of SLPs trained on the outputs of the CNNs compared to the corresponding accuracies obtained by SLPs trained on the ad-hoc extracted statistical features. Both CNN topologies yield predictors of relaxation with accuracies over 60% (66.07% and 65.38% for CNN^SC_{20×11} and CNN^SC_{80}, respectively), which are significantly higher than the models built on statistical features. Given the performance differences among these networks, it appears that learned local features could detect aspects of SC that were more relevant to the prediction of this particular affective state than the set of ad-hoc statistical features proposed. Models trained on automatically selected features further validate this result [see Fig. 6(b)], showing differences with respect to statistical features above 5%. Furthermore, the relaxation models trained on selected ad-hoc features, despite the benefits of FS, yield accuracies lower than the models trained on the complete sets of learned features. This suggests that CNNs can extract general information from SC that is more relevant for affect modeling than statistical features selected specifically for the task. An alternative interpretation is that the feature space created by CNNs allows backpropagation to find more general solutions than the greedy-reduced (via SFS) space of ad-hoc features. For all other emotions considered, neither the CNNs nor the ad-hoc statistical features lead to models that can significantly improve on chance prediction (see [67] for random baselines on this dataset). When feature selection is used [see Fig. 6(b)], CNN-based models outperform statistical-based models on the prediction of every affective state, with accuracies above 60% with at least one topology.

FIGURE 6 Skin conductance: average accuracy of SLPs trained on statistical features (statistical), and features pooled from each of the SC CNN topologies (CNN^SC_{20×11} and CNN^SC_{80}), for relaxation, anxiety, excitement and fun. The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.
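As an illustration of how the reported figures could be aggregated, the sketch below computes the percentage of correctly classified preference pairs and the mean and standard error over repeated runs. The function names are hypothetical, ties are counted as errors by assumption, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of runs; the text does not specify either choice. The per-run values in the demo are made up for illustration.

```python
import math


def pairwise_accuracy(scores_preferred, scores_other):
    """Percentage of pairs in which the preferred item receives the higher model output.

    Ties are counted as misclassified (an assumption).
    """
    correct = sum(1 for p, o in zip(scores_preferred, scores_other) if p > o)
    return 100.0 * correct / len(scores_preferred)


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


if __name__ == "__main__":
    # Made-up accuracies for three runs; the experiments above use 10 runs.
    run_accuracies = [64.1, 66.0, 65.2]
    mean, se = mean_and_standard_error(run_accuracies)
    print(round(mean, 2), round(se, 2))
```

In a full pipeline, `pairwise_accuracy` would be evaluated on each held-out fold, averaged over the 3 folds, and the resulting per-run averages passed to `mean_and_standard_error`.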
Despite the difficulty of predicting complex affective states based solely on SC, these results suggest that unsupervised CNNs trained as a stack of denoising auto-encoders form a promising method to automatically extract features from this modality, as higher prediction accuracies were achieved when compared against a well-defined set of ad-hoc statistical features. Results also show that there are particular affective states (relaxation and, to a lesser degree, anxiety) in which DL is able to automatically extract features that are beneficial for their prediction. On the other hand, it appears that DL has a lesser effect in predicting some affective states (fun and excitement) based on the SC signal compared to models built on the ad-hoc designed features. Prediction accuracies in those affective states for both types of features (ad-hoc or CNN-extracted) are rather low, suggesting that SC is not an appropriate signal for their modeling in this dataset.

FIGURE 7 Blood volume pulse: average accuracy of SLPs trained on statistical features (statistical), and features pooled from each of the BVP CNN topologies (CNN^BVP_{1×45} and CNN^BVP_{30×45}), for relaxation, anxiety, excitement and fun. The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.
It is worth mentioning that earlier studies on this dataset [67] report higher accuracies on the ad-hoc statistical features than those reported here. In that study, however, two different signal components were extracted from the SC signal, leading to three times the number of features examined in this paper (i.e., 45 features). Given the results obtained in this paper, it is anticipated that by using more learned features (for example, combining CNNs with different input lengths that would capture information from different time resolutions) DL can reach and surpass those baseline accuracies.

B. Blood Volume Pulse
Following the same systematic approach for selecting CNN topology and parameter sets, we present two convolutional networks for the experiments on the Blood Volume Pulse (BVP) signal. The CNN architectures used in the experiments feature the following: 1) one max-pooling layer with non-overlapping windows of length 30 followed by a convolutional layer with 5 logistic neurons per patch location and 45 inputs at each neuron (CNN^BVP_{1×45}); and 2) two convolutional layers with 10 and 5 logistic neurons per patch location, respectively, and an intermediate max-pooling layer with a window of length 30. The neurons of each layer contain 30 and 45 inputs, respectively (CNN^BVP_{30×45}). As in the CNNs used in the SC experiments, both topologies are topped up with an average-pooling layer that reduces the length of the outputs from each of the 5 output neurons down to 3; i.e., the CNNs output 5 feature maps of length 3, which amounts to 15 features. The initial pooling layer of the first network collects the maximum value of the BVP signal every 0.96 seconds, which results in an approximation of the signal's upper envelope, that is, a smooth line joining the extremes of the signal's peaks.
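The upper-envelope approximation obtained by the initial max-pooling stage can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sampling rate is not stated in the text but is inferred here from 30 samples spanning 0.96 seconds, and the synthetic wave is a toy stand-in for a BVP recording.

```python
import math


def max_pool(signal, window):
    """Maximum over non-overlapping windows; a trailing partial window is dropped."""
    return [max(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]


WINDOW = 30             # samples per pooling window (from the text)
WINDOW_SECONDS = 0.96   # duration of one pooling window (from the text)
SAMPLING_RATE_HZ = WINDOW / WINDOW_SECONDS  # implied rate, about 31.25 Hz


def synthetic_pulse(n_samples, rate_hz):
    """Toy pulse-like wave: a 1.2 Hz oscillation with slowly drifting amplitude.

    Purely illustrative; not real BVP data.
    """
    out = []
    for k in range(n_samples):
        t = k / rate_hz
        amplitude = 1.0 + 0.3 * math.sin(2 * math.pi * 0.05 * t)
        out.append(amplitude * math.sin(2 * math.pi * 1.2 * t))
    return out


if __name__ == "__main__":
    bvp = synthetic_pulse(3000, SAMPLING_RATE_HZ)
    envelope = max_pool(bvp, WINDOW)
    print(len(envelope))        # 100 envelope points, one per 0.96 s
    print(45 * WINDOW_SECONDS)  # span of a 45-input neuron, about 43.2 s
```

Each pooled value keeps only the highest sample within its 0.96-second window, so the pooled sequence traces the peaks of the pulse wave while discarding the oscillation itself; a neuron with 45 inputs on this sequence therefore spans roughly 43.2 seconds.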
Decrements in this upper envelope are directly linked with increments in heart rate (HR), and further connected with increased arousal and corresponding affective states (e.g., excitement and fun [33], [18]). Neurons with 45 inputs were selected to capture long patterns (i.e., 43.2 seconds) of variation, as sudden and rapid changes in heart rate were not expected during the experiment game survey. The second network follows the same rationale, but the first pooling layer, instead of collecting the maximum of the raw BVP signal, processes the outputs of 10 neurons that analyze signal patches of 0.96 seconds, which could operate as a beat detector mechanism.

1) Deep Learned Features
Figure 5(b) depicts the 45 connection weights of each neuron in CNN^BVP_{1×45}, which cover 43.2 seconds of the BVP signal's upper envelope. Given the negative correlation between the trend of the BVP's upper envelope and heart rate, neurons produce maximal output when consecutive decreasing weight values are aligned with a time window containing an HR increment and consecutive increasing weight values with HR decays. On that basis, the second (N2) and fifth (N5) neurons detect two 10-second-long periods of HR increments, which are separated by an HR decay period. The first (N1) and the fourth (N4) neurons detect two overlapping increments in HR, followed by a decay in N4. The third neuron (N3), on the other hand, detects a negative trend in HR with a small peak in the middle. This convolutional layer appears to capture dissimilar local complex patterns of BVP variation which are, arguably, not available through common ad-hoc statistical features.

2) DL vs. Ad-Hoc Feature Extraction
Predictors of excitement and fun trained on features extracted with CNN^BVP_{1×45} outperformed the ad-hoc feature sets, both the complete [see Fig. 7(a)] and the automatically selected feature sets [see Fig. 7(b)]. It is worth noting that no other model improved baseline accuracy using all features [see Fig. 7(a)]. In particular, excitement and fun models based on statistical features achieved performances of 61.1% and 64.3%, respectively, which are significantly lower than the corresponding accuracies of CNN^BVP_{1×45} [68.0% and 69.7%, respectively; see Fig. 7(b)] and not significantly different from the accuracies of CNN^BVP_{1×45} with the complete set of features [57.3% and 63.0%, respectively; see Fig. 7(a)]. Given the reported links between fun and heart rate [18], this result suggests that CNN^BVP_{1×45} effectively extracted HR information from the BVP signal to predict reported fun. The efficacy of CNNs is further supported by the results reported in [67], where SLP predictors of fun trained on statistical features of the HR signal (in the same dataset examined here) do not outperform the DL models presented in this paper. For reported fun and excitement, CNN-based feature extraction demonstrates a great advantage in extracting affect-relevant information from BVP while bypassing beat detection and heart rate estimation. Models built on selected features for relaxation and anxiety yielded low accuracies around 60%, showing small differences between learned and ad-hoc features, which suggests that BVP-based emotional manifestations are not the most appropriate predictors for those two states in this dataset. Despite the challenges that the periodicity of blood volume pulse generates in affective modeling, CNNs managed to extract powerful features to predict two affective states, outperforming the statistical features proposed in the literature and matching more complex data processing methods used in similar studies [67].

C. Fusion of SC and BVP
To test the effectiveness of learned features in fused models, we combined the outputs of the BVP and SC CNN networks presented earlier into one SLP and compared its performance against a combination of all ad-hoc BVP and SC features. For space considerations we only present the combination of the best performing CNNs trained on each signal individually, i.e., CNN^SC_{80} and CNN^BVP_{1×45}. The fusion of CNNs from both signals generates models that yield higher prediction accuracies than models built on ad-hoc features across all affective states, using both all features and subsets of selected features (see Fig. 8). This result further validates the effectiveness of CNNs for modeling affect from physiological signals, as models trained on automatically selected learned features from the two signals yield prediction accuracies around 70-75%. In all cases but one (i.e., anxiety prediction with SFS) these performances are significantly higher than the performances of corresponding models built on commonly used ad-hoc statistical features.

FIGURE 8 Fusion of SC and BVP signals: average accuracy of SLPs trained on blood volume pulse and skin conductance using statistical features on the raw signal (statistical) and features pooled from CNN^SC_{80} and CNN^BVP_{1×45} (CNN^{SC+BVP}_{80+1×45}), for relaxation, anxiety, excitement and fun. The black bar displayed on each average value represents the standard error (10 runs). (a) All features. (b) Features selected via SFS.

VI. Discussion
Even though the results obtained are more than encouraging with respect to the applicability and efficacy of DL for affective modeling, there are a number of research directions that should be considered in future research. While the Maze-Ball game dataset includes key components for affective modeling and is representative of a typical affective modeling scenario, our PDL approach needs to be tested on diverse datasets. The reduced size of the dataset limited the number of features that could be learned. Currently, deep architectures are widely used to extract thousands of features from large datasets, which yields models that outperform other state-of-the-art classification or regression methods (e.g., [27]). We expect that the application of DL to model affect in large physiological datasets would show larger improvements with respect to statistical features and provide new insights on the relationship between physiology and affect. Moreover, to be able to demonstrate robustness of the algorithm, more and dissimilar modalities of user input need to be considered, and different domains (beyond games) need to be explored. To that end, different approaches to multimodal fusion in conjunction with DL need to be investigated. The accuracies obtained across different affective states and modalities of user input, however, already provide sufficient evidence that the method would generalize well in dissimilar domains and modalities.

The paper did not provide a thorough analysis of the impact of feature selection on the efficiency of DL, as the focus was put on feature extraction. To that end, more feature selection methods will need to be investigated and compared to SFS. While ad-hoc feature performances might be improved with more advanced FS methods, such as genetic-search-based FS [34], the obtained results already show that DL matches and even beats a rather effective and popular FS mechanism without the use of feature selection in several experiments.

Although in this paper we have compared DL to a complete and representative set of ad-hoc features, a wider set of features could be explored in future work. For instance, heart rate variability features derived from the Fourier transformation of BVP (see [33]) could be included in the comparison. However, it is expected that CNNs would be able to extract relevant frequency-based features, as their successful application in other domains already demonstrates (e.g., music sample classification [54]). Furthermore, other automatic feature extraction methods, such as principal component analysis, which is common in domains such as image classification [68], will be explored for psycho-physiological modeling and compared to DL in this domain.

Despite the good results reported in this paper on the skin conductance and blood volume pulse signals, we expect that certain well-designed ad-hoc features can still outperform automatically learned features. Within playing behavioral attributes, for example, the final score of a game (which is highly correlated to reported fun in games [69]) may not be captured by convolutional networks, which tend to find patterns that are invariant with respect to the position in the signal. Such an ad-hoc feature, however, may carry information of high predictive power for particular affective states. We argue that DL is expected to be of limited use in low resolution signals (e.g., player score over time) which could generate well-defined feature spaces for affective modeling.

An advantage of ad-hoc extracted statistical features resides in the simplicity of interpreting the physical properties of the signal, as they are usually based on simple statistical metrics. Therefore, prediction models trained on statistical features can be analyzed with low effort, providing insights into affective phenomena. Artificial neural networks have traditionally been considered black boxes that trade their high prediction power for a more difficult interpretation of what has been learned by the model. We have shown, however, that appropriate visualization tools can ease the interpretation of neural-network based features. Moreover, learned features derived from DL architectures may define data-based extracted patterns, which could lead to the advancement of our understanding of emotion manifestations via physiology (and beyond).

Finally, while DL can automatically provide a more complete and appropriate set of features when compared to ad-hoc feature extraction, parameter tuning is a necessary phase in (and a limitation of) the training process. This paper introduced a number of CNN topologies that performed well on the SC and BVP signals, while empirical results showed that, in general, the performance of the CNN topologies is not affected significantly by parameter tuning. Future work, however, would aim to further test the sensitivity of CNN topologies and parameter sets as well as the generality of the extracted features across physiological datasets, reducing the experimentation effort required for future applications of DL to psychophysiology.

VII. Conclusions
This paper introduced the application of deep learning (DL) to the construction of reliable models of affect built on physiological manifestations of emotion. The algorithm proposed employs a number of convolutional layers that learn to extract relevant features from the input signals. The algorithm was tested on two physiological signals (skin conductance and blood volume pulse) individually and on their fusion for predicting the reported affective states of relaxation, anxiety, excitement and fun (given as pairwise preferences). The dataset is derived from 36 players of a 3D prey/predator game. The proposed preference deep learning (PDL) approach outperforms standard ad-hoc feature extraction used in the affective computing literature, as it manages to yield models of equal or significantly higher prediction accuracy across all affective states examined. The increase in performance is more evident when automatic feature selection is employed.

Results, in general, suggest that DL methodologies are highly appropriate for affective modeling and, more importantly, indicate that ad-hoc feature extraction can be redundant for physiology-based modeling. Furthermore, in some affective states examined (e.g., relaxation models built on SC; fun and excitement models built on BVP; relaxation models built on fused SC and BVP), DL without feature selection manages to reach or even outperform the performances of models built on ad-hoc extracted features which are boosted by automatic feature selection. These findings showcase the potential of DL for affective modeling, as both manual feature extraction and automatic feature selection could be ultimately bypassed.

With small modifications, the methodology proposed can be applied for affect classification and regression tasks across any type of input signal. Thus, the method is directly applicable for affect detection in one-dimensional time-series input signals such as electroencephalograph (EEG), electromyograph (EMG) and speech, but also in two-dimensional input signals such as images [27] (e.g., for facial expression and head pose analysis). Finally, results suggest that the method is powerful when fusing different types of input signals and, thus, it is expected to perform equally well across multiple modalities. Learned features derived from DL architectures may define data-based extracted patterns, which could lead to the advancement of our understanding of emotion manifestations via physiology.

Acknowledgment
The authors would like to thank Tobias Mahlmann for his work on the development and administration of the cluster used to run the experiments. Special thanks for proofreading go to Yana Knight.
Thanks also go to the Theano development team, to all participants in our experiments, and to Ubisoft, NSERC and Canada Research Chairs for funding. This work is funded, in part, by the ILearnRW (project no. 318803) and the C2Learn (project no. 318480) FP7 ICT EU projects.

References
[1] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 2000. [2] R. Calvo and S. D’Mello, “Affect detection: An interdisciplinary review of models, methods, and their applications,” IEEE Trans. Affective Comput., vol. 1, no. 1, pp. 18–37, 2010. [3] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006. [4] Y. Bengio, “Learning deep architectures for AI,” Found. Trends® Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009. [5] I. Arel, D. Rose, and T. Karnowski, “Deep machine learning–A new frontier in artificial intelligence research [Research Frontier],” IEEE Comput. Intell. Mag., vol. 5, no. 4, pp. 13–18, Nov. 2010. [6] M. Dash and H. Liu, “Feature selection for classification,” Intell. Data Anal., vol. 1, nos. 1-4, pp. 131–156, 1997. [7] M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun, “Unsupervised learning of invariant feature hierarchies with applications to object recognition,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, 2007, pp. 1–8. [8] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006. [9] Y. Bengio and O. Delalleau, “On the expressive power of deep architectures,” in Algorithmic Learning Theory. Berlin, Germany: Springer-Verlag, 2011, pp. 18–36. [10] E. Vyzas and R.
Picard, “Affective pattern classification,” in Proc. AAAI 1998 Fall Symp. Emotional Intelligent: The Tangled Knot Cognition, 1998, pp. 176–182. [11] H. P. Martínez and G. N. Yannakakis, “Mining multimodal sequential patterns: A case study on affect detection,” in Proc. 13th Int. Conf. Multimodal Interfaces, 2011, pp. 3–10. [12] G. N. Yannakakis and J. Hallam, “Ranking vs. preference: A comparative study of self-reporting,” in Proc. 4th Int. Conf. Affective Computing Intelligent Interaction, 2011, pp. 437–446. [13] J. Morris, “Observations: SAM: The self-assessment manikin; an efficient cross-cultural measurement of emotional response,” J. Advertising Res., vol. 35, no. 6, pp. 63–68, 1995. [14] J. Russell, “A circumplex model of affect,” J. Personality Social Psychol., vol. 39, no. 6, p. 1161, 1980. [15] R. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: Analysis of affective physiological state,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, 2001. [16] D. Ververidis and C. Kotropoulos, “Automatic speech classification to five emotional states based on gender information,” in Proc. Eusipco, Vienna, 2004, pp. 341–344. [17] D. Giakoumis, D. Tzovaras, K. Moustakas, and G. Hassapis, “Automatic recognition of boredom in video games using novel biosignal moment-based features,” IEEE Trans. Affective Comput., vol. 2, no. 3, pp. 119–133, July-Sept. 2011. [18] G. N. Yannakakis and J. Hallam, “Entertainment modeling through physiology in physical play,” Int. J. Human-Comput. Stud., vol. 66, no. 10, pp. 741–755, Oct. 2008. [19] S. Pincus, “Approximate entropy as a measure of system complexity,” in Proc. National Academy Sciences, 1991, vol. 88, no. 6, pp. 2297–2301. [20] N. Lesh, M. Zaki, and M. Ogihara, “Mining features for sequence classification,” in Proc. 5th ACM Int. Conf. Knowledge Discovery Data Mining, 1999, pp. 342–346. [21] Y. Bengio, P. Lamblin, D. Popovici, and H.
Larochelle, “Greedy layerwise training of deep networks,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press 2007, vol. 19, p. 153. [22] P. Ekman and W. Friesen, “Facial action coding system: A technique for the measurement of facial movement,” in From Appraisal to Emotion: Differences Among Unpleasant Feelings, Motivation and Emotion, P. C. Ellsworth, and C. A. Smith, Eds. Palo Alto, CA: Consulting Psychologists Press, 1988, vol. 12, pp. 271–302. [23] G. Caridakis, S. Asteriadis, K. Karpouzis, and S. Kollias, “Detecting human behavior emotional cues in natural interaction,” in Proc. 17th Int. Conf. Digital Signal Processing, July 2011, pp. 1–6. [24] A. Kleinsmith and N. Bianchi-Berthouze, “Affective body expression perception and recognition: A survey,” IEEE Trans. Affective Comput., vol. PP, no. 99, p. 1, 2012. [25] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” in The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press 1995, vol. 3361, pp. 255–258. [26] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, “Subject independent facial expression recognition with robust face detection using a convolutional neural network,” Neural Netw., vol. 16, no. 5, pp. 555–559, 2003. [27] S. Rifai, Y. Bengio, A. Courville, P. Vincent, and M. Mirza, “Disentangling factors of variation for facial expression recognition,” in Proc. European Conf. Computer Vision, 2012, pp. 802–822. [28] J. Susskind, A. Anderson, and G. E. Hinton, “The Toronto face dataset,” U. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-001, 2010. [29] A. Kapoor, W. Burleson, and R. Picard, “Automatic prediction of frustration,” Int. J. Human-Comput. Stud., vol. 65, no. 8, pp. 724–736, 2007. [30] S. Tognetti, M. Garbarino, A. Bonarini, and M. Matteucci, “Modeling enjoyment preference from physiological responses in a car racing game,” in Proc. IEEE Conf. Computational Intelligence Games, 2010, pp. 321–328. [31] C. Lee and S. 
Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE Trans. Speech Audio Processing, vol. 13, no. 2, pp. 293–303, 2005. [32] J. Wagner, J. Kim, and E. André, “From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2005, pp. 940–943. [33] G. N. Yannakakis, H. P. Martínez, and A. Jhala, “Towards affective camera control in games,” User Model. User-Adapted Interact., vol. 20, no. 4, pp. 313–340, 2010. [34] H. P. Martínez and G. N. Yannakakis, “Genetic search feature selection for affective modeling: A case study on reported preferences,” in Proc. 3rd Int. Workshop Affective Interaction Natural Environments, 2010, pp. 15–20. [35] D. Giakoumis, A. Drosou, P. Cipresso, D. Tzovaras, G. Hassapis, T. Zalla, A. Gaggioli, and G. Riva, “Using activity-related behavioural features towards more effective automatic stress detection,” PLoS ONE, vol. 7, no. 9, p. e43571, 2012. [36] O. AlZoubi, R. Calvo, and R. Stevens, “Classification of EEG for affect recognition: An adaptive approach,” in AI 2009 Proc. 22nd Australasian Joint Conf. Advances in Artificial Intelligence, pp. 52–61. 2009. [37] M. Soleymani, M. Pantic, and T. Pun, “Multimodal emotion recognition in response to videos,” IEEE Trans. Affective Comput., vol. 3, no. 2, pp. 211–223, 2012. [38] S. Mcquiggan, B. Mott, and J. Lester, “Modeling self-efficacy in intelligent tutoring systems: An inductive approach,” User Model. User-Adapted Interact., vol. 18, no. 1, pp. 81–123, 2008. [39] H. Gunes and M. Piccardi, “Bi-modal emotion recognition from expressive face and body gestures,” J. Netw. Comput. Appl., vol. 30, no. 4, pp. 1334–1345, 2007. [40] R. Mandryk and M. Atkins, “A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies,” Int. J. Human-Comput. Stud., vol. 65, no. 4, pp. 329–347, 2007. [41] J. F. Grafsgaard, K. E. Boyer, and J. C. 
Lester, “Predicting facial indicators of confusion with hidden Markov models,” in Affective Computing and Intelligent Interaction, (Series Lecture Notes in Computer Science), S. D’Mello, A. Graesser, B. Schuller, and J.-C. Martin, Eds. Berlin, Germany: Springer-Verlag, 2011, vol. 6974, pp. 97–106. [42] R. Kaliouby and P. Robinson, “Real-time inference of complex mental states from facial expressions and head gestures,” in Real-Time Vision Human-Computer Interaction. New York: Springer-Verlag, 2005, pp. 181–200. [43] H. Kobayashi and F. Hara, “Dynamic recognition of basic facial expressions by discrete-time recurrent neural network,” in Proc. Int. Joint Conf. Neural Networks, Oct. 1993, vol. 1, pp. 155–158. [44] K. Kim, S. Bang, and S. Kim, “Emotion recognition system using short-term monitoring of physiological signals,” Med. Biol. Eng. Comput., vol. 42, no. 3, pp. 419–427, 2004. [45] J. Hernandez, R. R. Morris, and R. W. Picard, “Call center stress recognition with person-specific models,” in Affective Computing and Intelligent Interaction, (Series Lecture Notes in Computer Science), S. D’Mello, A. Graesser, B. Schuller, and J.-C. Martin, Eds. Berlin, Germany: Springer-Verlag, 2011, vol. 6974, pp. 125–134. [46] J. Bailenson, E. Pontikakis, I. Mauss, J. Gross, M. Jabon, C. Hutcherson, C. Nass, and O. John, “Real-time classification of evoked emotions using facial feature tracking and physiological responses,” Int. J. Human-Computer Stud., vol. 66, no. 5, pp. 303–317, 2008. [47] J. Fürnkranz and E. Hüllermeier, “Preference learning,” Künstliche Intell., vol. 19, no. 1, pp. 60–61, 2005. [48] G. N. Yannakakis, “Preference learning for affective modeling,” in Proc. Int. Conf. Affective Computing Intelligent Interaction, Amsterdam, The Netherlands, Sept. 2009, pp. 126–131. [49] S. Tognetti, M. Garbarino, A. Bonanno, M. Matteucci, and A. Bonarini,“Enjoyment recognition from physiological data in a car racing game,” in Proc. 3rd Int. 
Workshop Affective Interaction Natural Environments, 2010, pp. 3–8. [50] G. N. Yannakakis, J. Hallam, and H. H. Lund, “Entertainment capture through heart rate activity in physical interactive playgrounds,” User Model. User-Adapted Interact., vol. 18, no. 1, pp. 207–243, 2008. [51] G. N. Yannakakis and J. Hallam, “Entertainment modeling through physiology in physical play,” Int. J. Human-Comput. Stud., vol. 66, no. 10, pp. 741–755, 2008. [52] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25. Cambridge, MA: MIT Press, 2012. [53] C. Farabet, C. Couprie, L. Najman, Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., p. 1–15, 2013. [54] P. Hamel, S. Lemieux, Y. Bengio, and D. Eck, “Temporal pooling and multiscale learning for automatic annotation and ranking of music audio,” in Proc. 12th Int. Conf. Music Information Retrieval, 2011, pp. 729–734. [55] P. Mirowski, Y. LeCun, D. Madhavan, and R. Kuzniecky, “Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG,” in Proc. IEEE Workshop Machine Learning Signal Processing, 2008, pp. 244–249. [56] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proc. Int. Conf. Machine Learning, 2008, pp. 1096–1103. [57] D. Rumelhart, Backpropagation: Theory, Architectures, and Applications. Hillsdale, NJ: Lawrence Erlbaum, 1995. [58] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger, “Learning to rank with (a lot of ) word features,” Inform. Retrieval, vol. 13, no. 3, pp. 291–314, 2010. [59] D. Grangier and S. Bengio, “Inferring document similarity from hyperlinks,” in Proc. ACM Int. Conf. Information Knowledge Management, 2005, pp. 359–360. [60] G. E. Hinton and R. S. 
Zemel, “Autoencoders, minimum description length, and Helmholtz free energy,” in Proc. Neural Information Processing System NIPS’1993, 1994, pp. 3–10. [61] G. Alain, Y. Bengio, and S. Rifai, “Regularized auto-encoders estimate local statistics,” Dept. IRO, Université de Montréal, Montreal, QC, Canada, Tech. Rep. Arxiv Report 1211.4246, 2012. [62] Y. Bengio, A. Courville, and P. Vincent, “Unsupervised feature learning and deep learning: A review and new perspectives,” Université de Montréal, Tech. Rep. Arxiv Report 1206.5538, 2012. [63] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional autoencoders for hierarchical feature extraction,” in Proc. Int. Conf. Artificial Neural Networks and Machine Learning, pp. 52–59, 2011. [64] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. Int. Conf. Machine Learning, 2008, pp. 160–167. [65] J. Goldberger, S. Challapalli, R. Tung, M. Parker, and A. Kadish, “Relationship of heart rate variability to parasympathetic effect,” Circulation, vol. 103, no. 15, p. 1977, 2001. [66] N. Ravaja, T. Saari, M. Salminen, J. Laarni, and K. Kallinen, “Phasic emotional reactions to video game events: A psychophysiological investigation,” Media Psychol., vol. 8, no. 4, pp. 343–367, 2006. [67] H. P. Martínez, M. Garbarino, and G. N. Yannakakis, “Generic physiological features as predictors of player experience,” Affective Comput. Intell. Interact., pp. 267–276, 2011. [68] W. Zhao, R. Chellappa, and A. Krishnaswamy, “Discriminant analysis of principal components for face recognition,” in Proc 3rd IEEE Int. Conf. IEEE Automatic Face Gesture Recognition, 1998, pp. 336–341. [69] H. P. Martínez, K. Hullett, and G. N. Yannakakis, “Extending neuro-evolution preference learning through player modeling,” in Proc. IEEE Conf. Computational Intelligence and Games, Copenhagen, Denmark, Aug. 2010, pp. 313–320. 
MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 33

Fuzzy Logic Models for the Meaning of Emotion Words

Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan
University of Southern California, USA

Digital Object Identifier 10.1109/MCI.2013.2247824
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE

Abstract—This paper presents two models that use interval type-2 fuzzy sets (IT2 FSs) for representing the meaning of words that refer to emotions. In the first model, the meaning of an emotion word is represented by IT2 FSs on valence, activation, and dominance scales. In the second model, the meaning of an emotion word is represented by answers to an open-ended set of questions from the game of Emotion Twenty Questions (EMO20Q). The notion of meaning in the two proposed models is made explicit using the Fregean framework of extensional and intensional components of meaning. Inter- and intra-subject uncertainty is captured by using IT2 FSs learned from interval approach surveys. Similarity and subsethood operators are used for comparing the meaning of pairs of words. For the first model, we apply similarity and subsethood operators for the task of translating one emotional vocabulary, represented as a computing with words (CWW) codebook, to another. This act of translation is shown to be an example of CWW that is extended to use the three scales of valence, activation, and dominance to represent a single variable. We experimentally evaluate the use of the first model for translations and mappings between vocabularies. Accuracy is high when using a small emotion vocabulary as an output, but performance decreases when the output vocabulary is larger. The second model was devised to deal with larger emotion vocabularies, but presents interesting technical challenges in that the set of scales underlying two different emotion words may not be the same. We evaluate the second model by comparing it with results from a single-slider survey. We discuss the theoretical insights that the two models allow and the advantages and disadvantages of each.

I. Introduction

Words and natural language play a central role in how we describe and understand emotions. One can learn about emotions first-hand by observing physiological or behavioral data, but to communicate emotional information to others who are not first-hand observers, one must use natural language descriptions. The field of affective computing deals with creating computer systems that can recognize and understand human emotions. To realize the goals of affective computing, it is necessary not only to recognize and model emotional behavior, but also to understand the language that is used to describe such emotional behavior. For example, a computer system that recognizes a user’s emotion from speech should not only recognize the user’s emotion from expressive speech acoustics, but also understand when the user says “I am beginning to feel X,” where “X” is a variable representing some emotion word or description. The ability to understand descriptions of emotions is important not only for human-computer interaction, but also in deliberative decision making activities where deriving behavioral analytics is based on natural language (for example, in mental health assessments). Such analytics often rely on abstract scales that are defined in terms of natural language.

This paper looks at the problem of creating a computational model for the conceptual meaning of words used to name and describe emotions. To do this, we represent the meaning of emotion words as interval type-2 fuzzy sets (IT2 FSs) that constrain an abstract emotion space. We present two models that represent different views of what this emotion space might be like. The first model consists of the Cartesian product of the abstract scales of valence, activation, and dominance. These scales have been postulated to represent the conceptual meaning of emotion words [1]. The second model is based on scales derived from answers to yes/no questions, where each scale can be seen as the truth value of a proposition. In each model, the meaning of an emotion word is represented as a fuzzy set in an emotion space, but the two models represent different theoretical organizations of emotion concepts. In the first, a spatial metaphor is used to organize emotion concepts on valence, activation, and dominance scales. In the second model, emotion concepts are represented as lists of propositions and associated truth values. In both models, the algebraic properties of fuzzy sets can be used as a computational model for the meaning of an emotion word.

We outline the properties of these models and describe the methodology that estimates the fuzzy set shape parameters from data collected in interval approach surveys [2], [3]. In an interval approach survey, subjects rate words on abstract scales, but instead of picking a single value on the scales (as in a Likert scale survey), they select interval ranges on these scales. In the two models we present, the survey results are aggregated into fuzzy sets for words in an emotion vocabulary. The fuzzy set representation allows one to compute logical relations among these emotion words. By using the relations of similarity and subsethood as measures of mappings between items of two vocabularies, one can translate between these vocabularies. This allows us to use our first model for several applications that involve mapping between vocabularies of emotion words: converting emotion labels from one codebook to another, both when the codebooks are in the same language (for example, when using different emotion annotation schemes) and when they are in different languages, such as when translating emotion words from one language to another (here, Spanish and English). These applications show one way our proposed model may be used and provide experimental evidence by which we can evaluate the model. For evaluation of the first model, we compare the translation applications with human performance in these tasks as a benchmark.

Our results show that performance of the first model decreases when the vocabulary size gets larger, which indicates that a three-scale representation for emotions is ideal only for small vocabularies. To address this limitation, our second model uses inspiration from the game of twenty questions, where players can identify a large set of objects using question-asking. Because people’s beliefs about emotions can be subjective, many of the answers to questions about emotions are vague and can be represented as fuzzy sets. For evaluation of this model, we test the estimated IT2 FS on data from different subjects who took a single-value survey by finding the membership of these points in the estimated IT2 FS.

Other research has presented related methodologies: using fuzzy logic for affective computing, emotion lexical resource development, and representing emotions using valence, activation, and dominance dimensions. We will commence by describing some of these works and the novelties that will be introduced by our paper.

There are many examples where fuzzy logic has been applied to the task of recognizing and representing observed emotional behavior. [4] gives an example where fuzzy logic is applied to multimodal emotion recognition. Other examples of fuzzy logic in emotion recognition are [5]–[7], which use fuzzy logic rules to map acoustic features to a dimensional representation in valence, activation, and dominance. [8] uses an IT2 FS model for emotion recognition from facial expressions. The model of [9] uses fuzzy logic for emotional behavior generation.

Another related trend of research is the development of lexical resources. Our work can be seen as a lexical resource framework like the Dictionary of Affective Language (DAL) [10]. In the DAL, 8745 common English words were evaluated for valence and activation (as well as a third dimension, imagery). The methodology for collecting the DAL data was similar to our survey in presenting subjects with words as stimuli, but in the DAL the values of each word’s dimensions are the mean across all subjects, so there is no estimate of the intra-subject variation. Also, compared with the DAL, we focus on words that are names of emotions, rather than words that might have emotional connotation. As such, our approach is more geared toward analyzing the meaning of short utterances explicitly referring to emotions, which we call natural language descriptions of emotion [11], while the dictionary of affect would be more appropriate for characterizing the emotional tone at the document-level.

Another related research trend outside the domain of affective computing is the study of linguistic description of signals [12], [13], which aims to associate words with the signals they describe.

One of the contrastive traits of this research is that we try to use the dimensional approach and fuzzy logic to model emotion concepts used in natural language descriptions of emotions [11], rather than characterizing data from emotional human behavior [14], [4]–[7]. Focusing on the conceptual meaning of emotion words allows us to consider cases where emotion is communicated through linguistic meaning, as opposed to paralinguistics or body language. The dimensional approach has been used to describe both emotional data and emotion concepts, but more often than not this distinction is not made clear. By describing our model of the meaning of emotion words in terminology established by the philosophy of language, we hope to clarify this issue. Furthermore, by rigorously defining the idea of an emotional variable and operations on such variables in terms of fuzzy logic, we can establish general relations such as similarity and subsethood that can be applied even if the underlying representation of valence, activation, and dominance is changed.

Another contrast between this work and other research using fuzzy logic to represent emotional dimensions is that we use IT2 FSs [15] and the interval approach [3]. This allows our model to account for both inter- and intra-subject variability. Compared with the earlier developments of [16]–[18], this paper offers a more detailed description of the theoretical framework and analysis of experimental results by incorporating subsethood and applying newer developments to the interval approach [19] (Section III-D). This paper also extends these results by proposing a second model to deal with larger emotion vocabularies (Section IV-C).

By constraining our focus to a conceptual level, we focus on input/output relations whose objects are words, rather than observations of stimuli and behavior. As such, this work can be seen as an instance of Computing with Words (CWW) [20], [21], [22]. CWW is a paradigm that considers words as the input and output objects of computation. Perceptual computing [23], [24] is an implementation of the CWW paradigm that we draw upon in this work.

The rest of the paper is organized as follows. In Section II, we describe what we mean by the “meaning” of emotion words. This is an important topic on its own, but we give an introduction that we deem sufficient for the purposes of this article. In Section III, we describe the fuzzy logic framework and the proposed computational models for emotion words. In Section IV, we describe the experimental implementation and testing of the models. The results are presented in Section V. We discuss advantages and disadvantages of these models in Section VI and conclude in Section VII.

II. The Meaning of Meaning

What does it mean to say that our model represents the meaning of emotion words? We believe this is an important question and therefore we will briefly discuss meaning in general in Section II-A and then explain how it relates to the meaning of emotion words in Section II-B.

A. Meaning in General

In an influential paper around the end of the 19th century, the philosopher of language Gottlob Frege described two components of meaning: extension and intension [25]. The extensional component of meaning is a mapping from words to things in the world, whereas the intensional meaning is a mapping from words to concepts. The stereotypical example of this is illustrated by the terms “morning star,” “evening star,” and “Venus.” The extensional meaning of these three terms is the same, namely the second planet in the solar system. However, the intensional meaning of these three terms is different, which explains why the three terms cannot be freely substituted in an arbitrary sentence without changing the meaning of the sentence. In this paper, we focus on the meaning of individual words, but we touch upon the topic of the meaning of phrases in the second model.

Although the notions of extension and intension are most frequently associated with the field of philosophy of language, the idea can also be described in mathematical terms [26]. One can think of the extension of a function as a set of ordered pairs, where the first item of the pair is an input to the function and the second item in the pair is the corresponding output. The intensions of a function are described by their symbolic or algorithmic representations. Therefore we can have “$f(x) = x^2$” or “$f(x) = x * x$” as intensions of the extensional set of pairs “$\{\langle 1, 1\rangle, \langle 2, 4\rangle, \langle 3, 9\rangle, \ldots\}$.” Extension and intension have been formally described in the study of formal concept analysis [27].

We believe that by defining meaning in this way, we can describe our model more precisely. Without explicitly describing “meaning,” whether in terms of extension and intension or otherwise, this important concept tends to get blurred. Although this topic is complex, the intuition behind it is rather simple: similar, intuitive distinctions along the lines of intension and extension are common.
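The function analogy used in this section can be made concrete with a short Python sketch. It is only an illustration: the helper name `extension`, the dictionary `intensions`, and the finite domain {1, 2, 3} are assumptions introduced here, not part of the cited formalism. Two different symbolic definitions (intensions) are shown to produce the same set of input/output pairs (extension).

```python
# Two intensions: different symbolic definitions of the same function.
# The string keys stand for the symbolic form; the lambdas compute it.
intensions = {
    "x ** 2": lambda x: x ** 2,
    "x * x": lambda x: x * x,
}


def extension(f, domain):
    """Extension of f on a finite domain: the set of (input, output) pairs."""
    return {(x, f(x)) for x in domain}


domain = range(1, 4)  # illustrative finite domain {1, 2, 3}
extensions = {expr: extension(f, domain) for expr, f in intensions.items()}

# The symbolic forms differ, but the extensions coincide.
print(extensions["x ** 2"] == extensions["x * x"])
print(sorted(extensions["x ** 2"]))
```

On an infinite domain the extension cannot be enumerated, which is one reason the intensional (symbolic) description is the more practical object to compute with.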
Extension-related terms include: referent, percept, object, empirical data, Aristotelian world view, and stimulus meaning. Intension-related terms include: signified, concept, subject, knowledge structure, schema, Platonic world view, and linguistic meaning. The process of understanding a word is a mapping, or interpretation, from the word itself to the word’s meaning, whether it be intensional or extensional. We argue that, when understanding natural language in the absence of first-hand, perceptual evidence, people refer to intensional meaning rather than extensional meaning. It is intensional meaning that we focus on in this paper.

B. The Meaning of Emotion Words

According to the definition of meaning described above, the extensional meaning of an emotion word is the set of human behaviors and states of the world that the word refers to. The intensional meaning of an emotion word is the concept that people have when using it to communicate. Although most other examples of emotion research do not make an explicit distinction between intensional and extensional meaning, it seems that many tend towards extensional meaning, especially those that deal with the analysis of emotional data that has been annotated with emotional labels. In this view, the extensional meaning of an emotion word used as an annotation label refers to the set of all data to which it has been applied. The focus on intensional meaning in this work therefore can be seen as one of its distinguishing features, though it could be said that machine learning that generalizes from training data is in fact a way to infer intensional meaning.

The question then arises about the form of this intensional meaning, in particular, how we can simulate this subjective form of meaning, with respect to emotion words, in a computer. The two computational models we describe mirror two different theoretical views of intensional meaning.
One view seeks to represent the intensional meaning of emotion words as points or regions of an abstract, low-dimensional semantic space of valence, activation, and dominance. The other view seeks to represent the intensional meaning of emotion words in relation to other propositions. This latter perspective is exemplified in the Emotion Twenty Questions (EMO20Q) game. EMO20Q is played like the normal twenty questions guessing game except that the objects to be guessed are emotions. One player, the answerer, picks an emotion word and the other player, the questioner, tries to guess the emotion word by asking twenty or fewer yes-no questions. Each question can be seen as a proposition about the emotion word, which prompts an answer that ranges on a scale from assent to dissent.

Scale-based models of emotion have an interesting history that goes back to Spearman’s attempts to measure general intelligence using factor analysis. At first, Spearman hypothesized that there was one underlying scale that could represent a person’s intelligence, but later it came to be realized that intelligence was a complex concept that required multiple scales. Factor analysis was the method used to isolate these scales, and in turn factor analysis was used in the pioneering work [28] that first identified valence, activation, and dominance as factors in the connotative meanings of words. In [28], psychologists, aided by one of the early computers, conducted semantic differential surveys that tried to measure the meaning of words on Likert scales whose endpoints were defined by thesaurus antonyms. Valence, activation, and dominance were identified as interpretations of the factors that were encountered. Some of the early applications of this emotional model to language are [29], [1], [30], [10]. The pictorial representation of these dimensions, which we use in the interval surveys, was developed by [31].
It should be noted that the valence, dominance, and activation representation is merely a model for emotional meaning and these scales most likely do not exhaustively describe all emotional concepts. In [32] it is argued that four dimensions are required: “unpredictability” in addition to the three scales we use. The approach we advocate here is based on an algebraic model that is generalizable to any scales. Our choice of the three scales for this model was motivated by their wide usage and to balance theoretical and practical concerns.

The second model we propose takes a different perspective. Rather than having theoretically motivated scales for various characteristics of emotions, the second model aims to represent the intensional meaning of emotion words in terms of natural language propositions that can be assented to or dissented from. This view could also be construed as an abstract scale of truth with respect to various propositions (which has been considered in the study of veristic fuzzy sets [33]–[35]), but we see this view as qualitatively different from the first model.

The reason why we see the propositional model as different from the scale-based model is that, first, the number of propositions about emotions will generally be larger than the number of emotion words, whereas in the case of the scale-based representation the number of scalar dimensions will be smaller than the emotion vocabulary size. Another reason that the propositional model can be considered qualitatively different from the scale-based model is that propositions can be verbally (or orthographically) expressed as linguistic stimuli, whereas abstract scales carry more cognitive implications and are language independent. Some questions from EMO20Q closely correspond to the scales in the first model, e.g., “is it positive?” is similar to valence, “is it a strong emotion?” is similar to activation, and “is it related to another person?” hints at dominance. However, model 2 contains many questions that are very specific, such as “would you feel this emotion on your birthday?”.

III. IT2 FS Model for the Meaning of Emotion Words

FIGURE 1 Translation as a perceptual computer. (Diagram: Word in Language 1 → Encoder → Fuzzy Sets → Computing with Words (CWW) Engine → Fuzzy Sets → Decoder → Word in Language 2.)

The models we propose can be seen as an algebraic representation where theoretical entities like emotion concepts are considered virtual objects [36] with abstract scales. In this view, a collection of scales that describe an object can be seen as a suite of congruence relations. Recall that a congruence relation $\equiv \pmod{P}$ is an equivalence relation that holds given some property or function $P$. A suite of congruence relations is a bundle of equivalence relations $\{\sim_i : i \in I\}$, again, given some property $P$. In both of the models we present, $P$ are fuzzy sets in an emotion space. In the case of the first model we present, $I$ is a set which can contain valence, activation, and/or dominance. In the case of the second model, $I$ is a set of propositions derived from the EMO20Q game [37]–[40]. For example, for the statement that “$\varepsilon$ makes you smile,” we can say that happy and amused are congruent given this statement about smiling behavior.
In terms of the scales, the equivalence relations on each scale divide the scale space into equivalence classes. In the next section, we describe this space of emotions in more detail.

A. Emotion Space and Emotional Variables

Let $E$ be an emotion space, an abstract space of possible emotions (this will be explained later in terms of valence, activation, and dominance, but for the time being we will remain agnostic about the underlying representation). An emotion variable $\varepsilon$ represents an arbitrary region in this emotion space, i.e., $\varepsilon \subset E$, with the subset symbol $\subset$ used instead of set membership ($\in$) because we wish to represent regions in this emotion space in addition to single points. The intensional meaning of an emotion word can be represented by a region of the emotion space that is associated with that word.

An emotion codebook $C = (W_C, \mathrm{eval}_C)$ is a set of words $W_C$ and a function $\mathrm{eval}_C$ that maps words of $W_C$ to their corresponding region in the emotion space, $\mathrm{eval}_C : W_C \to E$. Thus, an emotion codebook can be seen as a dictionary for looking up the meaning of words in a vocabulary. Words in an emotion codebook can also be seen as constant emotion variables. The region of the emotion space that $\mathrm{eval}_C$ maps words to is determined by interval surveys, as described in Section III-D.

We consider two basic relations on emotion variables: similarity and subsethood. Similarity, $sm : E \times E$, is a binary equivalence relation between two emotion variables (we will see that the fuzzy logic interpretation of similarity will actually be a function, $sm : E \times E \to [0, 1]$, which measures the amount of similarity between the variables rather than being true or false). Subsethood, $ss : E \times E$, is a binary relation between two emotion variables that is true if the first variable of the relation is contained in the second. Like similarity, the fuzzy logic interpretation of subsethood is a value between zero and one.
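As a rough illustration of the codebook and the two relations just defined, the following Python sketch treats each word's region as a crisp interval on a single scale. The 0 to 10 valence scale, the two words, their interval values, and the ratio-based `similarity` and `subsethood` functions are all assumptions made for illustration; they are crisp stand-ins, not the IT2 FS measures that the paper defines in Section III-C.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # crisp region on one scale: (low, high)


def length(iv: Interval) -> float:
    """Length of an interval (zero if it is empty or inverted)."""
    return max(0.0, iv[1] - iv[0])


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def similarity(a: Interval, b: Interval) -> float:
    """Graded similarity in [0, 1]: intersection length over union length."""
    inter = overlap(a, b)
    union = length(a) + length(b) - inter
    return inter / union if union > 0 else 0.0


def subsethood(a: Interval, b: Interval) -> float:
    """Graded subsethood in [0, 1]: the fraction of a that lies inside b."""
    return overlap(a, b) / length(a) if length(a) > 0 else 0.0


# Hypothetical codebook: eval_C as a dictionary from words to regions.
codebook: Dict[str, Interval] = {
    "happy": (6.0, 10.0),     # made-up valence interval
    "ecstatic": (8.0, 10.0),  # made-up valence interval
}

print(similarity(codebook["ecstatic"], codebook["happy"]))
print(subsethood(codebook["ecstatic"], codebook["happy"]))
print(subsethood(codebook["happy"], codebook["ecstatic"]))
```

The sketch shows the intended asymmetry: similarity is symmetric in its arguments, whereas subsethood is not, so a narrow region can be fully contained in a broad one without the converse holding.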
Further details are provided in Section III-C, where we will define the fuzzy logic interpretation of these relations.

Finally, a translation is a mapping from the words of one vocabulary to another, as determined by the corresponding codebooks:

$$\mathrm{translate} : W_1 \times C_1 \times C_2 \to W_2, \tag{1}$$

which is displayed schematically in Figure 1. This can be decomposed by thinking of $C_1 \times C_2$ as a similarity or subsethood matrix, which is denoted as the CWW engine in the figure. Translation can be seen as selecting the word from the output language $w_{\mathrm{output}} \in W_2$ such that the similarity or subsethood is maximized for a given $w_{\mathrm{input}} \in W_1$. In the case of similarity, the translation output is

$$w_{\mathrm{output}} = \arg\max_{w_2 \in W_2} sm\big(\mathrm{eval}_{C_2}(w_2), \mathrm{eval}_{C_1}(w_{\mathrm{input}})\big), \tag{2}$$

where the argmax functions as the decoder in Figure 1. The formulation of similarity and subsethood in terms of IT2 FSs will be described in Section III-C and we will empirically evaluate the use of similarity and subsethood for use in translation in Section V.

B. Fuzzy Logic and Emotion Concepts

In Section III-A, the definition of an emotion space $E$ followed a traditional set theoretic formulation. Traditional, nonfuzzy sets have crisp boundaries, which means that we can precisely determine whether a region in the emotion space is a member of any given set representing an emotion word. However, this seems to contradict the intuition and evidence that emotion concepts are somewhat vague and not precisely defined sets [41]. There are several sources of uncertainty that theoretically preclude precise set boundaries in either of the two models we present.
There is modeling uncertainty because a computational model is necessarily an approximation of human thought processes. There is measurement uncertainty because the precision on these scales may be limited by the perceptual processes of mapping sensory data to concepts and of distinguishing between concepts. Finally, there is uncertainty due to inter- and intra-subject variation. Postulating a blurred boundary between emotion concepts leads us to use fuzzy logic, in particular IT2 FSs.

If we deem that emotion concepts can be represented as fuzzy sets in either of these two models, then how do we determine the shapes of sets in this space? As we describe later in Section III-D, we use the interval approach survey methodology. One can imagine a Likert-type survey in which the scales represent valence, activation, and dominance and users are queried with emotion words as stimuli; however, subjects may be unsure about picking a specific point on the scale due to vagueness in the meaning of emotion words, especially broadly defined emotion words like those typically used as primary emotions. To deal with this intra-subject uncertainty, we turn to interval surveys and IT2 FSs.

Just as type-1 fuzzy sets extend classical sets by postulating set membership grade to be a point in [0, 1], type-2 fuzzy sets further extend this generalization by defining a membership function’s membership grade at a given point in the domain to be a distribution in [0, 1] rather than a single point, which allows for uncertainty in the membership grade [42]. The rationale for type-2 fuzzy logic is that even if a membership function takes a value between 0 and 1, there is still no uncertainty being represented because the membership value is a fixed point. What is represented by type-1 fuzzy sets is partial membership, not uncertainty. Whenever there is uncertainty, type-2 fuzzy logic is motivated on theoretical grounds [21].
The region of uncertainty in the membership grade with respect to the domain is known as the footprint of uncertainty. While general type-2 fuzzy logic systems account for uncertainty, they are more conceptually and computationally complex, and methods to estimate them directly from human input are still ongoing areas of research [43]. IT2 FSs use intervals to capture uncertainty of the membership grade [15]. Instead of an arbitrary distribution in [0, 1] as is the case for general type-2 fuzzy sets, IT2 FSs use an interval $[l, u]$ in [0, 1] to represent an area of uniform uncertainty in the membership function’s value, where $0 \le l \le u \le 1$ are the lower and upper bounds of the uncertainty interval, respectively. IT2 FSs can be regarded as a first-order representation of uncertainty because they are the simplest type of fuzzy set that will account for uncertainty in the membership function. Also, as will be discussed in Section III-D, there is a method for constructing IT2 FSs from human input, which makes the use of IT2 FSs practical for human-computer interaction.

IT2 FSs have been widely used because they approximate the capability to represent the uncertainty of general type-2 fuzzy set models while still using many of the same techniques used for type-1 fuzzy sets. IT2 FSs can be represented as two type-1 membership functions: an upper membership function, which defines the upper bound of membership, and a lower membership function, which represents the lower bound on membership. When these coincide, the IT2 FS reduces to a type-1 fuzzy set [44], [45].

FIGURE 2 Example of a trapezoidal interval type-2 membership function (IT2 MF). A normalized trapezoidal IT2 MF can be specified with nine parameters, (a, b, c, d, a’, b’, c’, d’, e’). The trapezoidal height of the upper membership function (e) can be omitted in normalized IT2 FSs because it is always equal to 1.
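To make the nine-parameter description in Figure 2 concrete, the following Python sketch evaluates a normalized trapezoidal IT2 MF at a point. It is illustrative only: the function names `trapezoid` and `it2_membership` and the example parameter values are hypothetical, and it assumes that (a, b, c, d) are the x-values of the left base, left top, right top, and right base of the upper MF, that the primed values play the same roles for the lower MF, and that e’ is the height of the lower MF.

```python
from typing import Sequence, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float,
              height: float = 1.0) -> float:
    """Type-1 trapezoidal membership grade at x (requires a <= b <= c <= d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return height          # flat top, also covers vertical (shoulder) edges
    if x < b:
        return height * (x - a) / (b - a)   # rising left edge
    return height * (d - x) / (d - c)       # falling right edge


def it2_membership(x: float, params: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) membership grades of a normalized IT2 MF at x.

    params is the 9-tuple (a, b, c, d, a', b', c', d', e'); the upper MF
    height is fixed at 1 because the set is assumed to be normalized.
    """
    a, b, c, d, a2, b2, c2, d2, e2 = params
    upper = trapezoid(x, a, b, c, d, 1.0)
    lower = trapezoid(x, a2, b2, c2, d2, e2)
    return lower, upper


# Hypothetical parameters on a 0-10 scale, for illustration only.
example = (2.0, 4.0, 6.0, 8.0, 3.0, 4.5, 5.5, 7.0, 0.6)
print(it2_membership(5.0, example))  # (0.6, 1.0)
print(it2_membership(3.0, example))  # (0.0, 0.5)
```

The gap `upper - lower` at each x is the vertical extent of the footprint of uncertainty at that point.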
If the difference between the upper and lower membership function is wide, this means that we have much uncertainty about the membership grade. An example of an interval type-2 membership function can be seen in Fig. 2. The area between the upper and lower membership functions is the footprint of uncertainty.

In this paper, as an engineering decision we have restricted ourselves to trapezoidal membership functions, which can be specified in a concise way using a 5-tuple (a, b, c, d, e). The first number of the tuple, a, represents the x-value of the left side point of the base of the trapezoid, b represents the x-value of the left side point of the top of the trapezoid, c represents the x-value of the right side point of the top of the trapezoid, d represents the x-value of the right side point of the base of the trapezoid, and e represents the height of the trapezoid (i.e., the y-value of the top of the trapezoid). Since IT2 FSs consist of an upper and lower membership function, they can be represented as a 10-tuple. However, in the case of normalized interval type-2 membership functions, those whose upper membership function reaches 1, we can leave out the height of the upper membership function and specify the fuzzy set as a 9-tuple consisting of a 4-tuple for the upper membership function, with the fifth value assumed to be equal to 1, and a 5-tuple for the lower membership function (we must include the fifth value, e’, as described above, because in general the height of the lower membership function can be anywhere between 0 and 1).

C. Similarity and Subsethood

Similarity and subsethood form important parts of our model of emotions.
The notion of similarity allows us to indicate that some pairs of emotion concepts are more or less similar. For example, we would say that angry is more similar to frustrated than it is to happy. When we make this judgment, we do not explicitly consider specific experiential examples of angry, frustrated, and happy data. Rather, we argue that one can make similarity judgments based on a mental representation of emotions. Two people could have disjoint sets of formative emotional stimuli, but still largely agree on the emotion concepts which form the intensional meaning of emotion words. In the fuzzy logic interpretation, similarity ranges from 0 to 1, where 1 is equality of two membership functions, and 0 indicates that the membership functions have no overlap.

The notion of subsethood allows us to capture that some general emotions might encompass other emotions. For example, “amused” might be a subset of “happy.” The notion of subsethood is defined for traditional sets as being a Boolean value, but for fuzzy sets it takes a value between 0 and 1.

Similarity and subsethood are closely related. For clarity, we present the definitions of similarity and subsethood in terms of crisp sets, then type-1 and type-2 fuzzy sets. The definitions of the fuzzy set similarity and subsethood follow naturally from crisp sets. The general form of similarity is based on the Jaccard Index, which states that the similarity of two sets is the cardinality of the intersection divided by the cardinality of the union, i.e.,

$$\mathrm{sm}_J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \tag{3}$$

For fuzzy sets, the set operations of intersection and union ($\cap$ and $\cup$) are realized by the min and max functions and the cardinality operator ($|\cdot|$) is realized by summing along the domain of the variable. Thus for type-1 fuzzy sets,

$$\mathrm{sm}_J(A, B) = \frac{\sum_{i=1}^{N} \min\big(\mu_A(x_i), \mu_B(x_i)\big)}{\sum_{i=1}^{N} \max\big(\mu_A(x_i), \mu_B(x_i)\big)}. \tag{4}$$

For IT2 FSs, the right hand side of this equation becomes

$$\frac{\sum_{i=1}^{N} \min\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \min\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}{\sum_{i=1}^{N} \max\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \max\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}, \tag{5}$$

where $\overline{\mu}(x)$ and $\underline{\mu}(x)$ are the upper and lower membership functions, respectively. The formulas for similarity are symmetric ($\mathrm{sm}_J(A, B) = \mathrm{sm}_J(B, A)$) and reflexive ($\mathrm{sm}_J(A, A) = 1$) [23].

We also examined a different, earlier similarity method called the Vector Similarity Method (VSM) [46]. This method was used in earlier experiments [16], so we tested it in addition to the newer Jaccard-based method. The VSM rests on the intuition that the similarity of two fuzzy sets is based on two notions: similarity of shape and similarity of proximity. Thus, the similarity of two fuzzy sets can be seen as a two element vector: $\mathrm{ss}_V(A, B) = \big(\mathrm{ss}_{\mathrm{shape}}(A, B), \mathrm{ss}_{\mathrm{proximity}}(A, B)\big)^T$. The similarity measure of proximity is based on the Euclidean distance between the fuzzy set centroids. The similarity measure of shape is based on the Jaccard similarity between the two fuzzy sets once their centroids have been aligned. To convert the vector similarity to a single scalar, the product of $\mathrm{ss}_{\mathrm{shape}}$ and $\mathrm{ss}_{\mathrm{proximity}}$ is taken.

The subsethood measure is closely related to similarity and is based on Kosko’s subsethood [47] for type-1 fuzzy sets. The measure of subsethood of a set A in another set B is defined as:

$$\mathrm{ss}_K(A, B) = \frac{|A \cap B|}{|A|}. \tag{6}$$

As with the similarity metric, when the set and cardinality operators are replaced by their fuzzy logic realizations, one obtains

$$\mathrm{ss}_K(A, B) = \frac{\sum_{i=1}^{N} \min\big(\mu_A(x_i), \mu_B(x_i)\big)}{\sum_{i=1}^{N} \mu_A(x_i)} \tag{7}$$

for the case of type-1 fuzzy sets, and for type-2 fuzzy sets the right hand side of the equation becomes

$$\frac{\sum_{i=1}^{N} \min\big(\overline{\mu}_A(x_i), \overline{\mu}_B(x_i)\big) + \sum_{i=1}^{N} \min\big(\underline{\mu}_A(x_i), \underline{\mu}_B(x_i)\big)}{\sum_{i=1}^{N} \overline{\mu}_A(x_i) + \sum_{i=1}^{N} \underline{\mu}_A(x_i)}. \tag{8}$$

As opposed to similarity, subsethood is asymmetrical, i.e., $\mathrm{ss}_K(A, B) \neq \mathrm{ss}_K(B, A)$.

These equations give the similarity and subsethood measures for fuzzy variables of one dimension. To aggregate the similarity of the three dimensions of valence, activation, and dominance, we tried several methods: averaging the similarity of the individual dimensions,

$$\mathrm{sm}_{\mathrm{avg}}(A, B) = \frac{1}{3} \sum_{i \in \{\mathrm{Val.}, \mathrm{Act.}, \mathrm{Dom.}\}} \mathrm{sm}_i(A_i, B_i),$$

taking the product of the similarity of the individual dimensions,

$$\mathrm{sm}_{\mathrm{prod}}(A, B) = \prod_{i \in \{\mathrm{Val.}, \mathrm{Act.}, \mathrm{Dom.}\}} \mathrm{sm}_i(A_i, B_i),$$

and taking the linguistic weighted average [48],

$$\mathrm{sm}_{\mathrm{lwa}}(A, B) = \frac{\sum_{i \in \{\mathrm{Val.}, \mathrm{Act.}, \mathrm{Dom.}\}} \mathrm{sm}_i(A_i, B_i)\, w_i}{\sum_{i \in \{\mathrm{Val.}, \mathrm{Act.}, \mathrm{Dom.}\}} w_i}.$$

The results of these different choices are described in Section V.

D. Interval Surveys Using the Interval Approach

To estimate the interval type-2 fuzzy sets over the valence, activation, and dominance scales, we used the interval approach [2], [3]. This survey methodology uses a Likert-like scale but the subjects select interval ranges instead of single numbers on the scale, which results in IT2 FSs. One of the novelties of our work that adds to [2], [3] is that we look at modeling a phenomenon where the underlying variable is composed of multiple scales: three separate scales (valence, activation, and dominance) in the case of our first model, and an open-ended number of scales in our second model.

The interval approach assumes that most people will be able to describe words on a scale, similar to a Likert scale. However, while the Likert scale approach allows the subject to choose only a single point on the scale, the interval approach allows the subject to select an interval that encloses the range on the scale that the word applies to. Thus, while a Likert scale can capture direction and intensity on a scale, the interval approach also captures uncertainty. This uncertainty that an individual user has about a word can be thought of as intra-user uncertainty. The user does not need to know about the details of interval type-2 fuzzy logic; they can indicate their uncertainty as an interval, which is then aggregated by the interval approach into IT2 FSs that represent inter-user uncertainty.

After collecting a set of intervals from an interval approach survey, the interval approach estimates an IT2 FS that takes into account the collective uncertainty of a group of subjects. This type of uncertainty can be thought of as inter-user uncertainty. The interval approach consists of a series of steps to learn the fuzzy sets from the survey data, which can broadly be grouped into the data part and the fuzzy set part. The data part takes the survey data, preprocesses it, and computes statistics for it. The fuzzy set part creates type-1 fuzzy sets for each subject, and then aggregates them with the union operation to form IT2 FSs. A new version of the interval approach, the enhanced interval approach, was proposed in [19]. This enhancement aims to produce tighter membership functions by placing new constraints on the overlapping of subject-specific membership functions in the reasonable interval processing stage. We tested this method as well as the original interval approach and found that the enhanced interval approach did in fact yield tighter membership functions, but that this did not necessarily improve the overall performance measures when compared with the original method (cf. Section VI).

IV. Methodology

This section describes the experimental methodologies that were used to create the two models for emotion codebooks. In the first, we use an interval approach survey for emotion words and we adapt the CWW paradigm to account for 3-dimensional fuzzy scales, specifically, by implementing similarity and subsethood measures for fuzzy sets that have 3 dimensions. In the case of the second model, the interval survey is separate from the elicitation of emotional information. The emotional information is collected from the EMO20Q game and thereafter the fuzzy sets are calculated from the answers to the questions in the game.

A. Emotion Vocabularies

In our experiments, we examined four different emotion vocabularies. The first vocabulary consisted of seven emotion category words: angry, disgusted, fearful, happy, neutral, sad, and surprised. These emotion categories are commonly used for labeling emotional data. We refer to this vocabulary as Emotion Category Words. These emotions are posited to be basic in that they are reliably distinguishable from facial expressions [49].

TABLE 1 Similarity between words of the Blog Moods vocabulary and the Emotion Category Word vocabulary.

               ANGRY  DISGUSTED  FEARFUL  HAPPY  NEUTRAL  SAD    SURPRISED
AMUSED         0.004  0.003      0.005    0.060  0.004    0.005  0.053
TIRED          0.006  0.003      0.034    0.001  0.038    0.196  0.001
CHEERFUL       0.003  0.003      0.003    0.109  0.001    0.002  0.088
BORED          0.015  0.012      0.075    0.004  0.064    0.335  0.004
ACCOMPLISHED   0.015  0.013      0.008    0.151  0.006    0.008  0.139
SLEEPY         0.007  0.005      0.018    0.009  0.172    0.128  0.010
CONTENT        0.005  0.004      0.007    0.044  0.015    0.012  0.040
EXCITED        0.015  0.017      0.006    0.255  0.002    0.002  0.213
CONTEMPLATIVE  0.006  0.004      0.012    0.006  0.161    0.075  0.007
BLAH           0.014  0.010      0.049    0.005  0.166    0.359  0.007
AWAKE          0.020  0.017      0.016    0.061  0.015    0.014  0.068
CALM           0.003  0.002      0.011    0.007  0.137    0.069  0.008
BOUNCY         0.009  0.012      0.002    0.361  0.000    0.001  0.311
CHIPPER        0.002  0.002      0.001    0.066  0.002    0.003  0.059
ANNOYED        0.393  0.380      0.080    0.041  0.002    0.023  0.076
CONFUSED       0.026  0.020      0.064    0.014  0.046    0.170  0.017
BUSY           0.068  0.079      0.049    0.111  0.013    0.012  0.116
SICK           0.008  0.004      0.032    0.001  0.023    0.204  0.001
ANXIOUS        0.207  0.181      0.091    0.028  0.003    0.025  0.038
EXHAUSTED      0.015  0.011      0.048    0.003  0.046    0.298  0.004
DEPRESSED      0.008  0.005      0.050    0.001  0.015    0.218  0.001
CURIOUS        0.038  0.042      0.014    0.203  0.011    0.006  0.176
DRAINED        0.009  0.007      0.039    0.002  0.061    0.280  0.003
AGGRAVATED     0.578  0.618      0.114    0.047  0.002    0.020  0.087
ECSTATIC       0.000  0.000      0.000    0.108  0.000    0.000  0.117
BLANK          0.006  0.004      0.017    0.005  0.133    0.137  0.006
OKAY           0.016  0.013      0.035    0.017  0.076    0.057  0.020
HUNGRY         0.084  0.082      0.029    0.045  0.013    0.034  0.052
HOPEFUL        0.009  0.007      0.007    0.047  0.010    0.009  0.050
COLD           0.005  0.003      0.026    0.001  0.047    0.123  0.002
CREATIVE       0.027  0.037      0.007    0.524  0.001    0.002  0.462
PISSED_OFF     0.383  0.363      0.052    0.016  0.000    0.008  0.035
GOOD           0.004  0.003      0.004    0.067  0.005    0.006  0.060
THOUGHTFUL     0.005  0.003      0.004    0.011  0.079    0.029  0.012
FRUSTRATED     0.186  0.233      0.068    0.022  0.001    0.012  0.030
CRANKY         0.325  0.351      0.099    0.045  0.002    0.022  0.060
STRESSED       0.288  0.304      0.158    0.044  0.003    0.026  0.053

The second vocabulary consisted of 40 words taken from the top 40 emotion mood labels used by the bloggers of LiveJournal (this blogging site lets users label each post with a mood label, which has been used as an annotated corpus for studying emotional text [50]).
The words in this vocabulary are: accomplished, aggravated, amused, angry, annoyed, anxious, awake, blah, blank, bored, bouncy, calm, cheerful, chipper, cold, confused, contemplative, content, cranky, crazy, creative, curious, depressed, disgusted, drained, ecstatic, excited, exhausted, fearful, frustrated, good, happy, hopeful, hungry, neutral, okay, pissed off, sad, sick, sleepy, stressed, thoughtful, and tired. We refer to this vocabulary as Blog Moods.

The third vocabulary was a list of 30 Spanish emotion words that was taken from the mental health initiative of a Southern California medical service provider. The words in the Spanish emotion vocabulary are: aburrido, agobiado, agotado, ansioso, apenado, asqueado, asustado, avergonzado, cauteloso, celoso, cómodo, confiado, confundido, culpable, deprimido, enamorado, enojado, esperanzado, extático, feliz, frustrado, histérico, malicioso, pasmado, rabioso, solitario, sorprendido, sospechoso, timido, and triste (see Table 1 in [17] for glosses of these words from a Spanish-English dictionary). We refer to this vocabulary as Spanish Emotion Words.

The fourth vocabulary was elicited from subjects playing EMO20Q, both between two humans and between a human and a computer, with the computer in the questioner role [37], [38], [40]. These data sources resulted in a set of 105 emotion words.

B. Valence, Activation, and Dominance Model (Model 1)

The data collected from the interval surveys for the first model come from four experiments: three surveys of 32 subjects for English and one survey of eight subjects for Spanish. All surveys had a similar structure. First, the surveys gave the subject instructions. Then the surveys sequentially presented the subject with emotion words, which we will refer to as the stimuli, one word per page. For each stimulus there were sliders for each of the three emotion dimensions. The sliders had two handles, which allowed the subjects to select the lower and upper points of ranges.
The range of the sliders was 0–10. The widest interval allowed was 10 and the narrowest was 1 because the steps were integer values and the implementation imposed a constraint that the upper and lower endpoints could not be the same. Above each scale was a pictorial representation known as a self-assessment manikin [31] that aimed to illustrate the scale non-verbally.

The overall structure of the Spanish survey was the same as the English one, but special care was required for the translation of the instructions and user interface elements. The first version of the translation was done by a proficient second-language Spanish speaker and later versions were corrected by native Spanish speakers. The subjects of the surveys were native speakers of Spanish with Mexican and Spanish backgrounds.

In the surveys, each subject was presented with a series of randomized stimuli from one of the emotion vocabularies. The description of the stimuli regimen and other implementation details for the experiments can be found in [16] for English and [17] for Spanish. Links to the surveys can be found at http://sail.usc.edu/~kazemzad/emotion_in_text_cgi/.

One final issue was deciding whether similarity or subsethood was best for our task and how to aggregate these metrics for three dimensions. Both similarity and subsethood can be used as an objective function to be maximized by translation. Reference [23, Chapter 4] recommends using subsethood when the output is a classification and similarity if the input and output vocabularies are the same, but it was not immediately clear what would be preferable for our tasks, so we tested the different methods empirically. Also, since this is one of the first studies that uses fuzzy sets that range over more than one dimension, we tested several ways of combining the similarities and subsethoods of the individual scales using the average, product, and linguistic weighted average as described in Section III-C.
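As a rough illustration of the options compared here, the sketch below computes scale-wise Jaccard similarity and Kosko-style subsethood for IT2 FSs sampled on a common grid, aggregates the per-scale scores, and picks the output word by argmax as in Eq. (2). It is a minimal sketch under stated assumptions, not the implementation used in the experiments: all function names, the grid, and the toy codebooks are hypothetical; the Jaccard denominator uses max for both the upper and lower terms, following the union-as-max convention of Section III-C; the `weighted` option uses crisp weights, which simplifies the linguistic weighted average of [48]; and passing the output word first to the measure follows Eq. (2), which is an assumption in the subsethood case.

```python
from math import prod


def trap(x, a, b, c, d, h=1.0):
    """Trapezoidal membership grade at x (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return h * (x - a) / (b - a)
    if x > c:
        return h * (d - x) / (d - c)
    return h


def jaccard_it2(A, B):
    """Jaccard similarity; A and B are (lower, upper) grade lists on one grid."""
    num = sum(map(min, A[1], B[1])) + sum(map(min, A[0], B[0]))
    den = sum(map(max, A[1], B[1])) + sum(map(max, A[0], B[0]))
    return num / den if den else 0.0


def subsethood_it2(A, B):
    """Degree to which A is contained in B (asymmetric)."""
    num = sum(map(min, A[1], B[1])) + sum(map(min, A[0], B[0]))
    den = sum(A[1]) + sum(A[0])
    return num / den if den else 0.0


def aggregate(scores, method="product", weights=None):
    """Combine per-scale scores into a single value."""
    if method == "average":
        return sum(scores) / len(scores)
    if method == "product":
        return prod(scores)
    if method == "weighted":  # crisp-weight simplification
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    raise ValueError(method)


def translate(word, cb_in, cb_out, measure=jaccard_it2, method="product"):
    """Return the output word maximizing the aggregated measure."""
    def score(w_out):
        per_scale = [measure(o, i) for o, i in zip(cb_out[w_out], cb_in[word])]
        return aggregate(per_scale, method)
    return max(cb_out, key=score)


# Toy data: three identical scales per word, sampled on 0..10 in steps of 0.1.
GRID = [i / 10 for i in range(101)]


def toy_word(center):
    lower = [trap(x, center - 1, center - 0.5, center + 0.5, center + 1, 0.5) for x in GRID]
    upper = [trap(x, center - 2, center - 1, center + 1, center + 2) for x in GRID]
    return [(lower, upper)] * 3


cb_spanish = {"feliz": toy_word(7.5)}
cb_english = {"happy": toy_word(8.0), "sad": toy_word(2.0)}
print(translate("feliz", cb_spanish, cb_english))  # prints: happy
```

Swapping `measure=subsethood_it2` or `method="average"` into the `translate` call reproduces the other combinations discussed above on the same toy data.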
We also tried leaving dominance out as it is a distinguishing feature in only a few cases.

The mapping from one vocabulary to another is done by choosing the word from the output vocabulary that has the highest similarity or subsethood with the input word. Here, similarity and subsethood are the aggregated scale-wise similarities and subsethoods for valence, activation, and dominance. We examined several different mappings. In [16], we examined mapping from the blog mood vocabulary to the more controlled categorical emotion vocabulary, which simulates the task of mapping from a large, noisy vocabulary to a more controlled one. In this paper, we use mapping tasks that involve translation from Spanish to English to evaluate the estimated IT2 FSs.

To empirically evaluate the performance of the mapping, we used a human translator to complete a similar mapping task. We instructed the translator to choose the best word or, if necessary, two words from the output vocabulary that matched the input word. A predicted result was considered correct if it matched one of the output words chosen by the evaluator.

We also use multidimensional scaling to visualize the derived emotion space. Multidimensional scaling is similar to principal component analysis except that it operates on a similarity matrix instead of a data or covariance matrix. Since it operates directly on a similarity matrix, it is ideal for visualizing the results of aggregating the scale-wise similarities into a single similarity matrix.

C. Propositional Model (Model 2)

We devised the second model to address the results obtained from Model 1, described in Section V, where we found that larger vocabulary sizes resulted in lower performance in the translation tasks. Our inspiration for the second model was that people can guess an object from a large, open-ended set by adaptively asking a sequence of questions, as in the game of twenty questions.
The sequential questioning behavior thus motivated our representation and experimental design of EMO20Q. The premise of EMO20Q is that the twenty questions game is a way to elicit human knowledge about emotions and that the game can also be used to test the ability of computer agents to simulate knowledge about emotions. The experimental design of the EMO20Q game was proposed in [37] and since then we have collected data from over 100 human-human and over 300 human-computer EMO20Q games. In this paper we focus on follow-up experiments that aim to understand the answers in the game in terms of fuzzy logic. More information about EMO20Q, including demos, code, and data, can be found at http://sail.usc.edu/emo20q.

Although the questions asked in the EMO20Q game are required to be yes-no questions, the answers are not just “yes” or “no.” Often the answer contains some expression of uncertainty. Here we focus on the fuzzy logical representation of answers to questions in the game. Just as the first model uses valence, activation, and dominance scales to represent emotions, the second model uses the questions from EMO20Q as scales that can be interpreted on axes that range from “yes” to “no.” In this case, the interval surveys we performed were not overtly about emotions; rather, they evaluated the answers on a scale from “no” to “yes,” which we defined as a domain for fuzzy sets that ranges from 0 to 100.

Using data from EMO20Q games, we collected a set of questions and answers about emotions. We sampled a set of answers based on frequency of occurrence and how well the set covered the space from affirmative to negative answers. We also included some control stimuli not observed in the data but included to provide insight on how people would interpret negation. For example, we included phrase groups like “certainly,” “not certainly” and “certainly not” that would allow us to calibrate how the subjects would interpret phrases that might have a logical interpretation. The final set of stimuli consisted of 99 answers. These were presented to subjects along with either a single or double handle slider. In Figure 3, we plot the responses for single sliders, which are easier to visualize than double sliders. In what follows, however, we present the double handle slider results, which form the input to the interval approach methodology described above.

We conducted the interval approach survey on Amazon Mechanical Turk (AMT), an internet marketplace for crowdsourcing tasks that can be completed online. The survey was conducted in sets of 30 stimuli to each of 137 subjects on AMT who were ostensibly English speakers from the U.S. The average number of ratings per stimulus was 38.5.

FIGURE 3 Fuzzy answers to yes/no questions obtained by presenting the answer phrase (x-axis labels) to users of Amazon Mechanical Turk, who responded by using a slider interface to indicate the truth-degree (y-axis). This plot was based on a single handle slider, in contrast to the interval approach surveys, in order to show an overview of the data. The results presented below are for the double handle slider and interval approach analysis. (Plot title: Gradient Yes/No Answers. Axes: answer phrases; truth degree, 0 to 100.)

V. Experimental Results

In this section, we present the results of experiments that used the two models and the survey methodology described in Sections III-D, IV-B, and IV-C to estimate fuzzy set membership functions for the emotion vocabularies presented in Section IV-A, to calculate similarity and subsethood between emotion words as described in Section III-C, and to map between different emotion vocabularies.

A. Valence, Activation, and Dominance Model (Model 1)

Examples of the membership functions that were calculated for the emotion category vocabulary can be seen in Fig. 4. The similarities between these membership functions and those of the blog moods vocabulary can be seen in Table 1, as calculated using the product of the individual scale-wise similarities as the aggregation method. In Fig. 5 we display the results of calculating a similarity matrix between the words of both vocabularies using multidimensional scaling (MDS) [51]. MDS is a statistical approach in the same family as principal components analysis (PCA) and factor analysis. We use MDS in this case because factor analysis has unwanted assumptions (namely, a multivariate normal distribution with linear relationships) and because PCA operates on feature vectors as opposed to similarity matrices (and also assumes linear relationships). We performed MDS on the aggregated similarity measurements to qualitatively visualize the emotion space as derived from the similarity matrix.

The result of combining the similarities of the valence, activation, and dominance dimensions was slightly different using sum versus product aggregation. The sum aggregation produced a more spread out distribution of the words in the space induced by MDS, while the product aggregation produced a space where the emotions are more tightly clustered. This was because the product aggregation method was less sensitive to small dissimilarities. The multidimensional scaling plot also allows one to see which emotions are close and potentially confusable. For example, “happy” and “surprised” are very close, as are “angry” and “disgusted.” Since mapping between vocabularies, like MDS, is done using similarities, this implies that these pairs are confusable. Since the components derived from MDS are calculated algorithmically, they are not directly interpretable as in the case of factor analysis.

FIGURE 4 Example membership functions (MFs) calculated with the interval approach for happy, neutral, angry, and sad emotions. All the membership functions shown here, except the valence for neutral, are shoulder MFs that model the edges of the domain of $\mu$. The region between the upper and lower MFs, the footprint of uncertainty, is shaded. The variables of Val., Act., and Dom. stand for valence, activation, and dominance. (Panels: valence, activation, and dominance for each of the four emotions; horizontal axis 0 to 10, vertical axis 0 to 1.)

FIGURE 5 Multidimensional scaling (2-D) representation of the emotion words’ similarity. This visualization was obtained when the similarities of the individual valence, activation, and dominance dimensions were combined by taking their product. The words in the categorical emotion vocabulary are marked in bold. (Plot title: Multidimensional Scaling Plot of the Product of Distances. Axes: Component 1, Component 2.)

TABLE 2 Similarity between Spanish and English emotion words. ANGRY DISGUSTED FEARFUL HAPPY NEUTRAL SAD SURPRISED ABURRIDO 0.2284 0.2335 0.6370 0.1965 0.3196 0.4610 0.1230 AGOBIADO AGOTADO ANSIOSO APENADO ASQUEADO ASUSTADO AVERGONZADO CAUTELOSO CELOSO CÓMODO CONFIADO CONFUNDIDO CULPABLE DEPRIMIDO ENAMORADO ENOJADO ESPERANZADO EXTÁTICO FELIZ FRUSTRADO HISTÉRICO MALICIOSO PASMADO RABIOSO SOLITARIO SORPRENDIDO SOSPECHOSO TIMIDO TRISTE 0.4762 0.2250 0.4579 0.2915 0.5445 0.4610 0.2701 0.0918 0.7396 0.0436 0.2835 0.2488 0.3275 0.2893 0.4371 0.8732 0.0929 0.3140 0.1329 0.6414 0.6522 0.3347 0.3102 0.5416 0.2657 0.3405 0.3026 0.0844 0.3376 0.5696 0.2344 0.4748 0.2928 0.5969 0.5324 0.2663 0.0957 0.6880 0.0510 0.3307 0.2531 0.3445 0.2914 0.5611 0.7125 0.0987 0.3108 0.1655 0.7271 0.6566 0.4270 0.3480 0.4616 0.2672 0.3803 0.3497 0.0857 0.3396 0.4611 0.4883 0.2837 0.7711 0.3885 0.3209 0.6345 0.5357 0.3335 0.3363 0.2382 0.7690 0.7051 0.5585 0.0942 0.3596 0.4023 0.0611 0.2293 0.3003 0.2804 0.3427 0.3910 0.2190 0.6091 0.1229 0.5129 0.3925 0.6502 0.3122 0.1425 0.3655 0.3128 0.4538 0.3508 0.2393 0.1848 0.1832 0.3686 0.4753 0.2202 0.2916 0.1529 0.4572 0.1940 0.5903 0.4305 0.6020 0.3021 0.2340 0.3540 0.2544 0.0945 0.0904 0.3336 0.3883 0.1092 0.1477 0.1495 0.4081 0.2703 0.1219 0.2045 0.2213 0.0660 0.3784 0.0515 0.3963 0.1393 0.1286 0.1375 0.3380 0.1055 0.1054 0.1798 0.1337 0.1796 0.1677 0.1550 0.2273 0.1931 0.0018 0.2549 0.1706 0.2084 0.3578 0.2389 0.2895 0.5135 0.1728 0.4065 0.2784 0.2141 0.4737 0.3126 0.2444 0.3518 0.0562 0.4498 0.3921 0.7058 0.0351 0.2654 0.1625 0.0268 0.0770 0.3026 0.1874 0.1322 0.3231 0.1402 0.5565 0.0746 0.2425 0.4436 0.5882 0.2175 0.1012 0.3598 0.1211 0.3199 0.3489 0.0713 0.0958 0.2390 0.2240 0.2821 0.0878 0.1401 0.0978 0.5774 0.3494 0.3270 0.7222 0.5046 0.3337 0.4272 0.2325 0.2654 0.3598 0.0396 0.3675 0.2900
0.0515 0.0852 MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 45 M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® To check the mapping induced by the similarity matrices, we show in Table 1 the similarity matrix for the product aggregation of the dimension-wise similarity measures of the valence, activation, and dominance scales. The location of the maximum of each row (bold) shows the final translation from the larger vocabulary (rows) to the smaller vocabulary (columns). The most glaring error is that “fearful” is not in the range of the mapping from large vocabulary to small vocabulary due to relatively low similarity to any word in the blog mood vocabulary. Cases where one would expect to have a mapping to “fearful” (e.g., “anxious,” “stressed”) do show ele- Spanish to IEMOCAP Translation Performance 0.8 0.6 0.4 0.2 0.0 Sum/Avg Aggregation Product Aggregation Sum w/Valence and Activation Product w/Valence and Activation Linguistic Weighted Average VSM Similarity Jaccard Similarity Subsethood FIGURE 6 Performance of translating from the Spanish emotion vocabulary to the categorical emotion vocabulary, which was the set of emotion labels used for annotating the IEMOCAP corpus [52]. 0.5 Spanish to LiveJournal Translation Performance 0.4 0.3 vated similarity to “fearful” but “angry” or “disgusted” are higher. The observation that most of the values in the “fearful” column are lower than the other columns, we normalized each column by its maximum value. Doing this does in fact produce the intuitive mapping of “anxious” and “stressed” to “fearful,” but also changed other values. To better quantify the intuitive goodness of the mapping from one vocabulary to another, we undertook an evaluation based on human performance on the same mapping task. 
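The row-maximum translation rule and the column normalization described here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the four rows are the ENOJADO, FELIZ, DEPRIMIDO, and ASUSTADO rows of Table 2, and the function names `translate` and `normalize_columns` are hypothetical.

```python
# Illustrative sketch: translate each input word to the output word with the
# highest similarity (the row-wise maximum of the similarity matrix).
# Row values are taken from Table 2; all function names are hypothetical.

ENGLISH = ["angry", "disgusted", "fearful", "happy", "neutral", "sad", "surprised"]

SIMILARITY = {
    "enojado":   [0.8732, 0.7125, 0.3596, 0.1940, 0.1054, 0.2654, 0.3494],
    "feliz":     [0.1329, 0.1655, 0.2293, 0.6020, 0.1796, 0.0770, 0.5046],
    "deprimido": [0.2893, 0.2914, 0.5585, 0.1529, 0.3380, 0.7058, 0.0978],
    "asustado":  [0.4610, 0.5324, 0.3209, 0.3508, 0.2213, 0.2141, 0.3489],
}


def translate(similarity, columns):
    """Map each row label to the column label with the largest similarity."""
    return {
        word: columns[max(range(len(row)), key=row.__getitem__)]
        for word, row in similarity.items()
    }


def normalize_columns(similarity):
    """Divide every column by its maximum so that each column peaks at 1."""
    rows = list(similarity.values())
    col_max = [max(col) for col in zip(*rows)]
    return {
        word: [value / m for value, m in zip(row, col_max)]
        for word, row in similarity.items()
    }


mapping = translate(SIMILARITY, ENGLISH)
normalized = normalize_columns(SIMILARITY)
```

With these four rows, `mapping` sends “enojado” to “angry,” “feliz” to “happy,” and “deprimido” to “sad,” while the largest entry in the “asustado” row falls in the “disgusted” column. Column normalization is only meaningful on a full similarity matrix; applying it to a four-row excerpt would distort the result, so the sketch computes it without drawing conclusions from it.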
We found that at least one of the subjects’ choices matched the predicted mapping except in the following five cases (i.e., performance of approximately 84%): “confused,” “busy,” “anxious,” “hungry,” and “hopeful.” Filtering out clearly nonemotion words like “hungry” may have improved the results here, but our aim was to use a possibly noisy large vocabulary, since the data came from the web.

To see if the fuzzy logic approach agreed with a simpler approach, we converted the survey interval end-points to single points by taking the midpoints of the subjects’ intervals and then averaging across all subjects. With the words treated as points in the 3-D emotion space, the mapping performance of Euclidean distance was essentially the same as that of the fuzzy logic similarity measures. However, a simple Euclidean distance metric loses some of the theoretical benefits we have argued for, as it does not account for the shape of the membership functions and cannot account for subsethood.

Based on the membership functions from the Spanish survey and the previous English surveys, we constructed similarity matrices between the Spanish words as input and the English words as output. The similarity matrix of the Spanish words and the Emotion Category Word vocabulary is shown in Table 2. Overall, the best performance of 86.7% came from mapping from the Spanish vocabulary to the Emotion Category Word vocabulary using similarity (rather than subsethood), and aggregating the scale-wise similarities using the multiplicative product of the three scales. The performance of mapping from Spanish to the Blog Mood vocabulary was worse than with the Emotion Category Word vocabulary as output because the much larger size of the Blog Mood vocabulary resulted in more confusability. The best performance for this task was 50% using similarity and linguistic weighted average for aggregating the similarities. A comparison of the different similarity and aggregation methods can be seen in Fig.
6 for mapping from Spanish to the Emotion Category Word vocabulary and Fig. 7 for mapping from Spanish to the Blog Moods vocabulary.

FIGURE 7 Performance of translating Spanish emotion words to LiveJournal mood labels (colloquial emotion words). (Plot labels: Sum/Avg Aggregation, Product Aggregation, Sum w/Valence and Activation, Product w/Valence and Activation, Linguistic Weighted Average, VSM Similarity, Jaccard Similarity, Subsethood.)

B. Propositional Model (Model 2)

For the propositional model, we collected a set of 1228 question-answer pairs from 110 human-human EMO20Q matches, in which 71 unique emotion words were chosen. In these matches, the players successfully guessed the other players’ emotion words in 85% of the matches, requiring on average 12 turns. In the set of question-answer pairs there were 761 unique answer strings. We selected a set of 99 answers based on frequency of occurrence and how well the set covered the space from affirmative to negative answers. We used the interval approach to obtain fuzzy sets for the answers to yes/no questions. A sample of these is shown in Figure 8.

To evaluate these, we determined the extent to which the medians from the single handle slider survey were full or partial members of the fuzzy sets derived from the interval approach’s double handle slider survey, which used different subjects but the same stimuli. We found that the IT2 FSs from the interval approach surveys corresponded well with the single-slider data. All of the estimated IT2 FSs except one contained the median of the single-slider values, i.e., 99%. This word, “NO!”, was a singleton IT2 FS at zero, while the median from the single slider was at one (on the scale from 0 to 100). The average value of the IT2 FS membership functions (which is an interval-valued range) at points corresponding to the median of the single-slider values was (0.41,0.84). For the enhanced interval approach (EIA), we found that the EIA-derived IT2 FSs performed nearly as well. The IT2 FSs contained all but two of the median single-slider values (~98%) and the average membership of the median single-slider values was (0.12,0.89).

FIGURE 8 Example IT2 FSs calculated with the enhanced interval approach for answers to yes/no questions. (Panels: “No,” “Maybe,” “Kind Of,” “Probably,” “Nope,” “Not Really,” “Sometimes,” “Yes Usually,” “Definitely Not,” “Possibly Not,” “I Think So,” “Certainly,” “No, Not Normally,” “Sort Of,” “Perhaps,” “Yes”; horizontal axes from 0 to 100, vertical axes from 0.0 to 1.0.)

Beyond these quantitative measurements, the membership functions from model 2 are qualitatively tighter than those of model 1, especially with the enhanced interval approach. Though some of the membership functions span large portions of the domain, these are answers that signify uncertainty (such as “kind of,” “I think so,” and “perhaps” in Figure 8). This was in contrast to model 1, which more frequently resulted in broad membership functions with wide footprints of uncertainty. The data and code for the experiments of model 2 can be accessed at http://code.google.com/p/cwwfl/.

VI. Discussion

Variables that range over sets and functions rather than individual numbers are important developments for modern mathematics, and further, variables that range over proofs, automata languages, and programs further add to the richness of objects that can be represented with variables. This paper looked at expanding the domain of variables to include emotions. To model a seemingly non-mathematical object in such a way, we use fuzzy sets, another relatively new type of variable. This paper proposed two models for emotion variables, one that represented the meaning of emotion words on three-dimensional axes of valence, activation, and dominance, and another that represented emotions as a sparse vector of truth values over propositions about emotions.

First, we examine the relative benefits and drawbacks of the two models we proposed: the first model based on valence, activation, and dominance scales, and the second model based on questions about emotions whose answers are rated on a scale from true to false.

The first model captures intuitive gradations between emotions. For example, the relation of “ecstatic” and “happy” can be seen in their values on the scales: “ecstatic” will be a subset of “happy” with valence and activation values more to the extreme periphery. Also, the scales used by the first model are language-independent, iconic representations of emotion, which enables researchers to use the same scales for multiple languages.

However, for the first model, each word needs an interval survey on the three scales to calculate the membership function for the word, which is laborious and limits the model to words whose membership functions have been calculated already. Also, as we have seen, performance degrades with the size of the vocabulary. Some of the performance degradation can be expected due to the inherent difficulty of making a decision with more choices. However, limiting the representation to three scales does also limit the resolution and expressiveness of the model.

The second model, on the other hand, gives a better resolution when there is a large number of emotions. With more emotions, more expressivity is needed than just valence, activation, and dominance. As examples of emotion words from EMO20Q that are difficult to represent with only valence, activation, and dominance, we can see that “pride,” “vindication,” and “confidence” might all have similar valence, activation, and dominance values, so it would be hard to distinguish these on the basis of only the three scales. By representing emotions with propositions based on questions from EMO20Q, we can use a single fuzzy scale for any arbitrary proposition: once the scales are established, the bulk of the data can be collected purely in natural language. Moreover, the propositional truth-value scale can be used for other domains besides emotions.

However, with the second model there is no clear way to compare emotions that were not asked the same set of questions. In the EMO20Q game, the questions are seen as they occur in the game. It will be necessary to collect more data outside of the game to make sure that all the prevalent questions are asked about each emotion. Even though we can use a single fuzzy scale for each proposition’s truth-value, the set of all propositions about emotions is a vast open set, so data collection is still an issue. Since the propositions are based on a specific human language, the equivalence of different propositions in different languages is not as apparent as in the first model.
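The containment check used to evaluate the propositional model (whether a single-slider median is a full or partial member of an IT2 FS, and with what membership interval) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes trapezoidal upper and lower membership functions, the parameter values for “kind of” are invented for illustration, and the names `trapezoid`, `membership_interval`, and `contains` are hypothetical.

```python
# Illustrative sketch: does a single-slider median fall inside an IT2 FS?
# Trapezoidal shapes and all parameter values below are assumptions.

def trapezoid(x, a, b, c, d, height=1.0):
    """Trapezoidal membership with support [a, d] and plateau [b, c]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return height
    if x < b:
        return height * (x - a) / (b - a)
    return height * (d - x) / (d - c)


def membership_interval(x, lower, upper):
    """Return (lower membership, upper membership) of x in an IT2 FS."""
    return trapezoid(x, *lower), trapezoid(x, *upper)


def contains(x, lower, upper):
    """True if x is at least a partial member (upper membership above zero)."""
    return membership_interval(x, lower, upper)[1] > 0.0


# Hypothetical IT2 FS for an uncertain answer such as "kind of" (scale 0-100).
KIND_OF_UPPER = (20, 40, 60, 80, 1.0)
KIND_OF_LOWER = (35, 50, 50, 65, 0.6)

# A singleton at zero, the shape reported for the answer "NO!".
NO_UPPER = (0, 0, 0, 0, 1.0)
NO_LOWER = (0, 0, 0, 0, 1.0)

kind_of_interval = membership_interval(45, KIND_OF_LOWER, KIND_OF_UPPER)
no_contains_one = contains(1, NO_LOWER, NO_UPPER)
```

In this sketch a median of 45 is a partial member of the hypothetical “kind of” set, whereas a median of 1 lies outside a singleton at zero, which mirrors the single exception reported for “NO!”.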
We made several modifications to the interval approach to make it more robust in cases where all intervals are discarded by the preprocessing. We determined that the final removal of all intervals took place in the reasonable interval processing stage. The modification to the original interval approach involved keeping the intervals in this stage if all would otherwise have been removed. This had the effect of creating a very broad membership function with a lower membership function that was zero at all points. The enhanced interval approach improved the rejection of intervals in various stages by separately considering interval endpoint criteria and interval length criteria.

For the first model, the enhanced interval approach yielded worse results when using the translation task as an evaluation metric. This was due to the narrower membership functions that the enhanced interval approach was designed to produce. In the case of similarity and subsethood calculation, the narrower membership functions led to more zero entries in the calculation of similarity and subsethood. In the translation task, this resulted in a less robust translation because small variations in the membership function would yield a disproportionate change in similarity and subsethood values. However, in the case of the second model, where the fuzzy sets are used in a more traditional fashion, i.e., as propositional truth quantifiers, the enhanced interval approach did in fact yield membership functions that appeared to more tightly contain the single slider results and performed as well on the evaluation metric we used for this task.

The different models both use IT2 FSs, but beyond that, they present different approaches to the representation of emotion descriptions. Because of the difference in approach and the resulting format of the model, they were difficult to evaluate in the same way.
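The effect of narrower membership functions on similarity can be illustrated with a small Python sketch. It is a deliberate simplification, not the measure used in the experiments: it applies a Jaccard-style ratio to ordinary (type-1) triangular membership functions sampled on the 0 to 100 scale, and the shapes and the names `triangle` and `jaccard` are assumptions made for illustration.

```python
# Illustrative sketch: narrower membership functions can drive a Jaccard-style
# similarity to zero. Type-1 triangles stand in for IT2 FSs (an assumption).

def triangle(x, left, peak, right):
    """Triangular membership function that is zero outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def jaccard(mu_a, mu_b, grid):
    """Ratio of summed pointwise minima to summed pointwise maxima."""
    num = sum(min(mu_a(x), mu_b(x)) for x in grid)
    den = sum(max(mu_a(x), mu_b(x)) for x in grid)
    return num / den if den > 0 else 0.0


GRID = range(0, 101)

# Two broad words with peaks at 40 and 60: their supports overlap.
wide = jaccard(lambda x: triangle(x, 10, 40, 70),
               lambda x: triangle(x, 30, 60, 90), GRID)

# The same peaks with narrow supports: no overlap remains.
narrow = jaccard(lambda x: triangle(x, 30, 40, 50),
                 lambda x: triangle(x, 50, 60, 70), GRID)
```

Here `wide` is positive while `narrow` is exactly zero, even though the peaks are unchanged. Once many pairs score zero, the row-wise maximum of a similarity matrix is decided by only a few nonzero entries, which is one way to read the reduced robustness described for the translation task.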
For the first model, because the fuzzy scales of valence, activation, and dominance are directly tied to the emotion representation and because the scales are nonlinguistic in nature (they are labeled with a cartoon manikin), the cross-language translation task was a possible evaluation metric. However, the fuzzy scales used in the second model are indirectly linked to emotions via linguistic propositions about emotions. Since the propositions about emotions are specific to a given language, the translation task is not directly facilitated by this model.

From the comments given by the subjects of the survey, for model 1, we found that subjects reported confusion with the scale of dominance, despite the pictorial representation in the survey. For model 2, we found that the interpretation of linguistic truth values was a source of reflection for the subjects and this provided insight into the variation that may have otherwise been attributed to lack of cooperation on the part of the Amazon Mechanical Turkers. For example, the stimulus “definitely,” from a logical point of view, would be assumed to be a strong “yes.” However, several Turkers mentioned that they realized that, when they use the word “definitely,” they do not mean “definitely” in the logical sense, but rather that the colloquial meaning is somewhat more relaxed. From the fuzzy set representation point of view, it may be advantageous to recognize distinct senses for the meaning of words and phrases. In the case mentioned, the word “definitely” could have a colloquial sense and a logical sense. Another example of this was in the control phrases we used in the second model. For example, “not certainly” was often confused with “certainly not.” This is not to say that all the Turkers were cooperative and took the time to understand the task, but it shows that there are many factors involved with measuring uncertainty.
From Figure 3, we can see that the default value of the slider (in this case, a single slider at the middle of the scale) was a salient point of outliers. Modeling the effects of uncooperative users who may click through as quickly as possible is one possible improvement that could be made to the interval approach from the data processing point of view.

Our conclusion in comparing the two models is that for basic emotions the valence, activation, and dominance scales of model 1 would suffice. An example use case for the first model would be converting a large, expressive set of emotion labels to a smaller set for the purpose of training a statistical classifier. However, for the class of all words used to describe emotions in natural language, the representational power of the first model’s valence, activation, and dominance scales is not sufficient. To fully understand what a given emotion word means to someone, our work indicates that the second model is a better model if the modeling goal is to represent a larger vocabulary and finer shades of meaning.

VII. Conclusions

In this paper we presented two models to represent the meaning of emotion words. We gave an explicit description of meaning in our models. The first model involved interpreting the emotion words as three-dimensional IT2 FSs on the dimensions of valence, activation, and dominance. This model allowed us to map between emotion vocabularies of different sizes and different languages. The mapping was induced by picking the most similar word of the output vocabulary given the input vocabulary word.
The similarity used for this mapping was derived from similarity or subsethood measures of the individual dimensions that were aggregated into a single measure for each pair of input and output vocabulary words.

We devised a second model that addresses the challenges that arise when the vocabulary of emotion words is large. Instead of the lower dimensional representation in terms of valence, activation, and dominance scales, the second model used a high dimensional representation where the emotion words were represented in terms of answers to questions about emotions, as determined from data from the EMO20Q game. In the second model, IT2 FSs were used to represent the truth values of answers to questions about emotions. We found that the second model was necessary to capture more highly nuanced meaning when the vocabulary of emotion words was large.

Acknowledgment

The authors would like to thank Jerry Mendel, Dongrui Wu, Mohammad Reza Rajati, Ozan Cakmak, and Thomas Forster for their discussion. We would also like to thank Rebeka Campos Astorkiza, Eduardo Mendoza Ramirez, and Miguel Ángel Aijón Oliva for helping to translate the Spanish version of our experiment.

References

[1] J. A. Russell and A. Mehrabian, “Evidence for a three-factor theory of emotions,” J. Res. Personality, vol. 11, pp. 273–294, Sept. 1977.
[2] F. Liu and J. M. Mendel, “An interval approach to fuzzistics for interval type-2 fuzzy sets,” in Proc. Fuzzy Systems Conf., 2007, pp. 1–6.
[3] F. Liu and J. M. Mendel, “Encoding words into interval type-2 fuzzy sets using an interval approach,” IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1503–1521, 2008.
[4] D. W. Massaro and M. M. Cohen, “Fuzzy logical model of bimodal emotion perception: Comment on ‘The perception of emotions by ear and by eye’ by de Gelder and Vroomen,” Cogn. Emotion, vol. 14, no. 3, pp. 313–320, 2000.
[5] M. Grimm, K. Kroschel, E. Mower, and S. Narayanan, “Primitives-based evaluation and estimation of emotions in speech,” Speech Commun., vol. 49, pp. 787–800, Dec. 2006.
[6] C. M. Lee and S. Narayanan, “Emotion recognition using a data-driven inference system,” in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 157–160.
[7] D. Wu, T. D. Parsons, E. Mower, and S. Narayanan, “Speech parameter estimation in 3D space,” in Proc. IEEE Int. Conf. Multimedia Expo, 2010, pp. 737–742.
[8] A. Konar, A. Chakraborty, A. Halder, R. Mandal, and R. Janarthanan, “Interval type-2 fuzzy model for emotion recognition from facial expression,” in Proc. Perception Machine Intelligence, 2012, pp. 114–121.
[9] M. El-Nasr, J. Yen, and T. R. Ioerger, “Flame: Fuzzy logic adaptive model of emotions,” Auton. Agents Multi-Agent Syst., vol. 3, no. 3, pp. 219–257, 2009.
[10] C. M. Whissell, The Dictionary of Affect in Language. New York: Academic Press, 1989, pp. 113–131.
[11] A. Kazemzadeh, “Précis of dissertation proposal: Natural language descriptions of emotions,” in Proc. ACII (Doctoral Consortium), 2011, pp. 216–223.
[12] S. Kim, P. G. Georgiou, S. S. Narayanan, and S. Sundaram, “Supervised acoustic topic model for unstructured audio information retrieval,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 2010, pp. 243–246.
[13] S. Sundaram and S. S. Narayanan, “Classification of sound clips by two schemes: Using onomatopoeia and semantic labels,” in Proc. IEEE Int. Conf. Multimedia Expo, 2008, pp. 1341–1344.
[14] M. Grimm and K. Kroschel, “Rule-based emotion classification using acoustic features,” in Proc. Int. Conf. Telemedicine Multimedia Communication, 2005.
[15] Q. Liang and J. Mendel, “Interval type-2 fuzzy logic systems: theory and design,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 535–550, 2000.
[16] A. Kazemzadeh, S. Lee, and S. Narayanan, “An interval type-2 fuzzy logic system to translate between emotion-related vocabularies,” in Proc. Interspeech, pp. 2747–2750, 2008.
[17] A. Kazemzadeh, “Using interval type-2 fuzzy logic to translate emotion words from Spanish to English,” in Proc. IEEE World Conf. Computational Intelligence FUZZ-IEEE Workshop, 2010, pp. 1–8.
[18] O. Cakmak, A. Kazemzadeh, S. Yildirim, and S. Narayanan, “Using interval type-2 fuzzy logic to analyze Turkish emotion words,” in Proc. APSIPA Annu. Summit Conf., 2012, pp. 1–4.
[19] S. Coupland, J. M. Mendel, and D. Wu, “Enhanced interval approach for encoding words into interval type-2 fuzzy sets and convergence of the word FOUs,” in FUZZ-IEEE World Cong. Computational Intelligence, 2010, pp. 1–8.
[20] J. M. Mendel, R. I. John, and F. Liu, “Computing with words and its relations with fuzzistics,” Inform. Sci., vol. 177, no. 4, pp. 988–1006, 2007.
[21] J. M. Mendel, “Computing with words: Zadeh, Turing, Popper and Occam,” IEEE Comput. Intell. Mag., vol. 2, no. 4, pp. 10–17, 2007.
[22] L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Trans. Fuzzy Syst., vol. 4, pp. 103–111, May 1996.
[23] J. M. Mendel and D. Wu, Perceptual Computing: Aiding People in Making Subjective Judgements. Piscataway, NJ: IEEE Press, 2010.
[24] J. M. Mendel and D. Wu, “Challenges for perceptual computer applications and how they were overcome,” IEEE Comput. Intell. Mag., vol. 7, pp. 36–47, Aug. 2012.
[25] G. Frege, “Über Sinn und Bedeutung,” in Zeitschrift für Philosophie und Philosophische Kritik, 1892, pp. 25–50.
[26] T. Forster, Logic, Induction, and Sets. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[27] B. Ganter, G. Stumme, and R. Wille, Eds., Formal Concept Analysis: Foundation and Applications. Berlin, Germany: Springer-Verlag, 2005.
[28] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. Urbana, IL: Univ. Illinois Press, 1957.
[29] D. E. Heise, “Semantic differential profiles for 1000 most frequent English words,” Psychol. Monographs, vol. 79, no. 8, pp. 1–31, 1965.
[30] J. A. Russell, “A circumplex model of affect,” J. Personality Soc. Psychol., vol. 39, no. 6, pp. 1161–1178, 1980.
[31] M. M. Bradley and P. J. Lang, “Measuring emotion: The self-assessment manikin and the semantic differential,” J. Behav. Therapy Exp. Psych., vol. 25, pp. 49–59, Mar. 1994.
[32] J. R. Fontaine, K. R. Scherer, E. B. Roesch, and P. C. Ellsworth, “The world of emotions is not two-dimensional,” Psychol. Sci., vol. 18, pp. 1050–1057, Dec. 2007.
[33] I. B. Türksen, “Computing with descriptive and veristic words,” in Proc. Int. Conf. North American Fuzzy Information Processing Society, 1999, pp. 13–17.
[34] L. A. Zadeh, “From search engines to question answering systems–the problems of world knowledge, relevance, deduction and precisiation,” in Fuzzy Logic and the Semantic Web, E. Sanchez, Ed. The Netherlands: Elsevier, 2006, ch. 9, pp. 163–211.
[35] M. R. Rajati, H. Khaloozadeh, and W. Pedrycz, “Fuzzy logic and self-referential reasoning: A comparative study with some new concepts,” Artificial Intell. Rev., pp. 1–27, Mar. 2012.
[36] T. Forster, Reasoning About Theoretical Entities. Singapore: World Scientific, 2003.
[37] A. Kazemzadeh, P. G. Georgiou, S. Lee, and S. Narayanan, “Emotion twenty questions: Toward a crowd-sourced theory of emotions,” in Proc. ACII’11, 2011, pp. 1–10.
[38] A. Kazemzadeh, J. Gibson, P. Georgiou, S. Lee, and S. Narayanan, “EMO20Q questioner agent,” in Proc. ACII (Interactive Event), 2011, pp. 313–314.
[39] A. Kazemzadeh, S. Lee, P. G. Georgiou, and S. Narayanan, “Determining what questions to ask, with the help of spectral graph theory,” in Proc. Interspeech, pp. 2053–2056, 2011.
[40] A. Kazemzadeh, J. Gibson, J. Li, S. Lee, P. G. Georgiou, and S. Narayanan, “A sequential Bayesian agent for computational ethnography,” in Proc. Interspeech, Portland, OR, 2012.
[41] L. F. Barrett, “Are emotions natural kinds?” Perspectives Psychol. Sci., vol. 1, pp. 28–58, Mar. 2006.
[42] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning-I,” Inform. Sci., vol. 8, no. 3, pp. 199–249, 1975.
[43] R. John and S. Coupland, “Type-2 fuzzy logic: Challenges and misconceptions,” IEEE Comput. Intell. Mag., vol. 7, pp. 47–52, Aug. 2012.
[44] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Upper Saddle River, NJ: Prentice Hall, pp. 451–453, 2001.
[45] J. M. Mendel, R. I. John, and F. Liu, “Interval type-2 fuzzy logic systems made simple,” IEEE Trans. Fuzzy Syst., vol. 14, no. 6, pp. 808–821, 2006.
[46] D. Wu and J. Mendel, “A vector similarity measure for linguistic approximation: Interval type-2 and type-1 fuzzy sets,” Inform. Sci., vol. 178, no. 2, pp. 381–402, 2008.
[47] B. Kosko, “Fuzziness vs. probability,” Int. J. General Syst., vol. 17, nos. 2–3, pp. 211–240, 1990.
[48] D. Wu and J. M. Mendel, “The linguistic weighted average,” in FUZZ-IEEE, Vancouver, BC, pp. 566–573, 2006.
[49] P. Ekman, “Facial expression and emotion,” Amer. Psychol., vol. 48, no. 4, pp. 384–392, 1993.
[50] G. Mishne, “Applied text analytics for blogs,” Ph.D. dissertation, Univ. Amsterdam, Amsterdam, The Netherlands, 2007.
[51] T. F. Cox and M. A. A. Cox, Multidimensional Scaling, 2nd ed. Boca Raton, FL: CRC Press, 2000.
[52] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. Chang, S. Lee, and S. Narayanan, “IEMOCAP: Interactive emotional dyadic motion capture database,” J. Lang. Resour. Eval., vol. 42, no. 4, pp. 335–359, 2008.

Modeling Curiosity-Related Emotions for Virtual Peer Learners

Qiong Wu and Chunyan Miao
Nanyang Technological University, SINGAPORE

Digital Object Identifier 10.1109/MCI.2013.2247826
Date of publication: 11 April 2013

Abstract: Existing Virtual Learning Environments (VLE) have two major issues: (1) students tend to spend more time playing than learning and (2) low-functioning students often face difficulty progressing smoothly. To address these issues, we propose a virtual peer learner, which is guided by the educational theory of peer learning. To create a human-like, naturally behaving virtual peer learner, we build a computational model of curiosity for the agent based on human psychology. Three curiosity-related emotions, namely boredom, curiosity and anxiety, are considered. The appraisal of these emotions is modeled as a two-step process: determination of stimulation level and mapping from the stimulation level to emotions. The first step is modeled based on Berlyne’s theory, by considering three factors that contribute to the arousal of curiosity: novelty, conflict and complexity. The second step is modeled based on Wundt’s theory, by dividing the spectrum of stimulation level into three aforementioned emotion regions. Emotions derived from the appraisal process serve as intrinsic rewards for the agent’s behavior learning and influence the effectiveness of knowledge acquisition. Empirical results indicate curiosity-related emotions can drive a virtual peer learner to learn a strategy similar to what we expect from human students. A virtual peer learner with curiosity exhibits higher desire for exploration and achieves higher learning efficiency than one without curiosity.

I. Introduction

With the advances in computer graphics, communication technologies and networking, virtual worlds are rapidly becoming part of the educational technology landscape [1]. Dede [2] suggests that the immersive interfaces offered by virtual worlds can promote learning, by enabling the design of educational experiences that are challenging or even impossible to duplicate in the real world. In recent years, the usage of virtual worlds within the educational context has been growing quickly. The New Media Consortium (NMC) Annual Survey on Second Life (SL) saw a 170% increase in response rate between 2007 and 2008. They also found that many of the educators who earlier used the existing SL have started creating their own virtual worlds in less than a year’s time [3].

Virtual Singapura¹ (VS) is a Virtual Learning Environment (VLE) designed to facilitate the learning of plant transport systems in lower secondary school. It has been employed in various studies, such as design perspectives for learning in VLE, pre-service teachers’ perspectives on VLE in science education, product failure and impact of structure on learning in VLE, slow pedagogy in scenario-based VLE, and what students learn in VLE [4]–[8]. To date, over 500 students in Singapore and over 300 students in Australia have played VS. During the field studies of Virtual Singapura, several issues with learning in VLE have been observed. First, students tend to spend more time exploring the landscape of the virtual world rather than concentrating on the learning content. Second, some low-functioning students studying alone in VLE often get confused or stuck, and require constant guidance from teachers or game designers to move forward.

¹ http://virtualsingapura.com/game/project/

Based on these observations, we propose a virtual peer learner to reside in VLE and accompany students in learning. The idea is derived from the common educational practice of peer learning, where students learn with and from each other without the immediate intervention of a teacher [9]. Benefits of a peer learner include: a peer learner can present “learning triggers,” which are interactions or experiences causing students to try new things or to think in novel ways; bi-directional peer relationships can facilitate professional and personal growth; and tapping into a learner’s own experience can be both affirming and motivating [10]. Hence, a virtual peer learner has the potential to engage students and motivate them to spend more time on the learning content. Also, a virtual peer learner can potentially help low-functioning students to think and learn better in VLE.

In order to design a virtual peer learner that can emulate a real student and behave naturally in the learning process, we believe a psychologically inspired approach is necessary. In human psychology, studies have shown that curiosity is an important motivation that links cues reflecting novelty and challenge with natural behavior such as exploration, investigation and learning [11]. In Reiss’s [12] 16 basic desires that motivate our actions and shape our personalities, curiosity is defined as “the need to learn.” Attempts to incorporate curiosity into Artificial Intelligence find that curious machines have advanced behavior in exploration, autonomous development, creativity and adaptation [13]–[16]. However, as a basic desire that motivates human active learning [12], the role of curiosity in a virtual peer learner is relatively unexplored.

In this work, we study the role of curiosity in simulating human-like behavior for virtual peer learners. To model the appraisal process of curiosity, we draw inspiration from psychological theories on human curiosity. […] the same is provided in Section II. Berlyne [17] identified four factors, viz., novelty, uncertainty, conflict and complexity, that can stimulate curiosity and determine the stimulation level. Wundt [18] postulated an inverted U-shape relationship between stimulation level and the arousal of three curiosity-related emotions: boredom, curiosity and anxiety. This relationship demonstrates that too little stimulation results in boredom, too much stimulation results in anxiety and only optimal stimulation can result in curiosity.

Based on this psychological background, curiosity appraisal for the proposed virtual peer learner is modeled as a two-step process: (1) determination of stimulation level and (2) mapping from the determined stimulation level to the corresponding emotions. In the decision-making system of the virtual peer learner, curiosity-related emotions act as intrinsic rewards and influence the agent’s action strengths. In order to demonstrate the effectiveness of curiosity-related emotions, we simulate virtual peer learners in VS and conduct two sets of experiments. The first set of experiments shows that curiosity-related emotions can drive the virtual peer learner to learn a natural behavior strategy similar to what we expect from human students. The second set of experiments shows that a curious peer learner exhibits a higher level of exploration breadth and depth than a non-curious peer learner.

The rest of the paper is organized as follows: Section II presents the psychological background for this research. Section III provides a short review on existing curiosity modeling systems. Subsequently, in Section IV, we state the key differences between our approach and the existing curiosity modeling systems. Next, we present the proposed curious peer learner in Section V. Section VI discusses the experimental process and the results obtained. Finally, the major conclusions and future works are summarized in Section VII.

II. Psychological Background

In psychology, a major surge of study on curiosity began in the 1960s. Loewenstein [19] divided theories on curiosity into
A short review on three categories: incongruity theory, competence theory and MAY 2013 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 51 M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® postulated an inverted U-shape relationship between the stimulation level and three curiosity-related emotions. According to him, too little stimulation results in boredom while too much stimulation results in anxiety, and only optimal stimulation results in curiosity. In this work, both Berlyne’s theory and Wundt’s theory serve as the psychological background for modeling humanlike curiosity in autonomous virtual peer learners. Artificial Intelligence research frequently assumes that the human decision-making process consists of maximizing positive emotions and minimizing negative emotions. drive theory. The incongruity theory holds on the idea that curiosity is evoked by violation of expectations [20], while the competence theory views curiosity as an intrinsic motivation to master one’s environments [21]. However, as Loewenstein noted, both the incongruity theory and competence theory fail to give a comprehensive account of curiosity. Hence, we focus on the drive theory, advocating the existence of a curiosity drive, either primary (homeostatic generated as hunger) or secondary (externally generated by stimuli) and look in depth at Berlyne’s theory. In order to understand curiosity, Berlyne conducted extensive studies by observing the behavior of humans and animals [17]. 
Different from traditional psychological researches that concentrated on problems of response selection (what response human will make to one standard stimulus at a time), Berlyne interpreted curiosity as a process of stimulus selection (when several conspicuous stimuli are introduced at once, to which stimulus will human respond). Consider a real life scenario, when a child is given several toys at the same time, he will choose one toy out of the many to play with. The study of curiosity tries to understand the underlying mechanism that drives the child to select one stimulus (toy) when faced with many choices. Berlyne identified four major factors, viz., novelty, uncertainty, conflict and complexity, that can lead to curiosity and determine the stimulation level. Novelty refers to something new. For instance, the child would be attracted to a toy with new features, such as a toy car with story telling functions. Uncertainty arises when a stimulus is difficult to classify. The likelihood or degree of uncertainty depends on the number of possible classes that the particular stimulus belongs to. For example, the child may be interested in a toy vehicle with both sails and wings, because he cannot immediately tell if it is a ship or plane. Conflict occurs when a stimulus arouses two or more incompatible responses simultaneously in an organism. For example, the experience of playing with a toy car, requires the child to press the forward button to win a race with a friend’s toy car, while at the same time demands the child to press the backward button to dodge a barrier. This may engage the child in playing and make him decide to choose the car again. Complexity is roughly defined as the amount of variety or diversity in a stimulus pattern. For example, the child may choose a jigsaw puzzle with twenty pieces rather than one with only four pieces. However, a higher level of stimulation does not necessarily lead to a higher level of curiosity. 
Wundt [18] introduced the theory of “optimal level of stimulation” and 52 III. Existing Curiosity Modeling Systems In the past two decades, curiosity has successfully attracted attention of numerous researchers in the field of Artificial Intelligence. In this section, we will provide a short review on existing curiosity modeling systems. From the machine learning perspective, curiosity has been proposed as algorithm principles to focus learning on novel and learnable regularities, in contrast to irregular noise. For example, Schmidhuber [22] introduced curiosity into modelbuilding control systems. In his work, curiosity is modeled as the prediction improvement between successive situations and is an intrinsic reward value guiding the selection of training examples such that the expected performance improvement is maximized. In autonomous robotic developmental systems, Oudeyer and Kaplan [23] proposed an Intelligent Adaptive Curiosity (IAC) mechanism and modeled curiosity as the prediction improvement between similar situations instead of successive situations. Curiosity has also been modeled in exploratory agents to explore and learn in uncertain domains. For example, Scott and Markovitch [16] introduced curiosity for intelligent agents to learn unfamiliar domains. They adopted a heuristic that “what is needed is something that falls somewhere between novelty and familiarity,” where novelty is defined as a measure of how uncertain the agent is about the consequence of a stimulus. Uncertainty is implemented as Shannon’s entropy of all the possible outcomes to a stimulus. The system can learn a good representation of the uncertain domain because it will not waste resources on commonly occurred cases but concentrate on less common ones. Another work is done by Macedo and Cardoso [13], who modeled curiosity in artificial perceptual agents to explore uncertain and unknown environments. 
This model relies on graph-based mental representations of objects, and curiosity is implemented as the entropy of all parts of an object that contain uncertainty.

In creative agents, curiosity has been modeled as an intrinsic evaluation of novelty. For example, Saunders and Gero [24] developed a computational model of curiosity for "curious design agents" to search for novel designs and to guide design actions. A Self-Organizing Map (SOM) is employed as the "conceptual design space" for the agent. For a given input, novelty is implemented as a measure of cluster distance. This measure reflects the similarity between newly encountered design patterns and previously experienced ones. In Merrick and Maher's model [14], an improved SOM named the Habituated Self-Organizing Map (HSOM) is used to cluster similar tasks, and novelty is calculated by a habituation function.

To summarize, in existing works, curiosity has been integrated into agents' learning and decision modules to enhance their performance. However, these agents can hardly be perceived as believable by a human observer. There are two main reasons for this: (1) existing models lack a comprehensive psychological theory as background, and (2) agents perceive the environment at the machine language level (feature-based knowledge representation) rather than at the human language level (semantic knowledge representation). Hence, in this work, we attempt to build a computational model of curiosity based on human psychology and by adopting a semantic knowledge representation method.

IV. An Overview of Our Approach

An overview of the key innovations in our approach is given as follows:

First, to mimic a human student, a virtual peer learner should perceive the VLE at the same level as a human student does. Hence, instead of the feature-based knowledge representations most commonly utilized in existing works, we employ a semantic knowledge representation, which can easily be interpreted by humans and is more suitable for designing virtual peer learners. In this work, we adopt the Concept Map (CM), a semantic knowledge representation stemming from the learning theory of constructivism. It has been widely applied in classrooms for knowledge organization [25] and in much educational software for modeling the mind of students [26], [27].

Second, the measurement of stimulation level incorporates three dimensions of information proposed by Berlyne [17]: novelty, conflict and complexity. The calculation of stimulation level is based on an extension and transformation of Tversky's ratio model [28].

Third, we explicitly model three curiosity-related emotions: boredom, curiosity and anxiety. They are appraised based on Wundt's theory [18], by adopting two thresholds to divide the spectrum of stimulation into three emotion regions.

Finally, curiosity-related emotions are utilized as intrinsic reward functions to guide the virtual peer learner's learning of behavior strategy. This is inspired by the frequently adopted assumption in intrinsically motivated reinforcement learning that the human decision-making process consists of maximizing positive emotions and minimizing negative emotions [29], [30]. Another function of curiosity-related emotions is their influence on the agent's knowledge acquisition ability. This is inspired by human nature, where our learning ability can be regulated by different emotion states [31].

V. The Curious Peer Learner

In this section, we present the proposed virtual peer learner with curiosity-related emotions, referred to as the curious peer learner. The architecture of the curious peer learner is shown in Fig. 1. The curious peer learner can sense external states (e.g., in a learning zone) and receive external stimuli (e.g., learning tasks). The external stimuli can trigger the curious peer learner to perform curiosity appraisal. The curiosity appraisal requires learnt knowledge stored in the agent's memory and consists of two steps: determination of stimulation level and mapping from stimulation level to emotions. Emotions derived from the curiosity appraisal process serve two functions: (1) as reinforcement values for the learning of state-action mapping, and (2) as an influence on action strengths (e.g., the depth of learning). Actions (e.g., explore) derived from the learning of state-action mapping module are performed by actuators, and update the intrinsic constraints of the agent (e.g., energy). In the rest of this section, the detailed working mechanism of each module is introduced.

FIGURE 1 Architecture of the curious peer learner.

A. Memory and Knowledge Representation

We adopt Concept Maps (CMs) to represent the semantic knowledge in both learning tasks (knowledge to be learnt) and the agent's memory (knowledge already learnt).

A CM is a graph-based representation that describes semantic relationships among concepts. It can be represented by a directed graph with nodes and edges interconnecting nodes. We formalize the symbolic representation of CMs as follows:

A CM $M$ with $n$ concepts is defined as $M = \{C, L\}$:
1) $C = \{c_i \mid c_i \in P_c;\ i = 1, 2, \ldots, n\}$ represents the concepts, where $P_c$ is a set of predefined concepts in VLE;
2) $L = \{l_{ij} \mid l_{ij} \in P_l \cup \{\text{null}\};\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n\}$ represents the labels describing the relationships between two concepts, where $P_l$ is a set of predefined labels in VLE.

Based on the above definition, concepts and relationships in CMs are all semantic expressions. An example of a CM is shown in Fig. 2, wherein the concept set is $\{c_1$: water, $c_2$: CO$_2$, $c_3$: chloroplast, $c_4$: sugar, $c_5$: O$_2$, $c_6$: sunlight, $c_7$: photosynthesis$\}$, and the label set is $\{l_{17}, l_{27}$: is the material for, $l_{37}$: is the location of, $l_{67}$: aid, $l_{74}, l_{75}$: produce$\}$.

FIGURE 2 CM for the learning task "photosynthesis."

A relationship in $M$ is defined as a knowledge point, denoted by $k = (c_i, c_j, l_{ij})$, where $l_{ij} \neq \text{null}$. For example, a knowledge point in Fig. 2 is (water, photosynthesis, is the material for).

Knowledge in both learning tasks and the agent's memory is represented by CMs. Each learning task can be represented by a set of knowledge points, denoted by $T = \{k_1, k_2, \ldots, k_m\}$. For example, the CM in Fig. 2 can be designed to be a learning task with six knowledge points. Knowledge related to learning task $T$ that has been learnt by the virtual peer learner is represented by $T_s$, contained in the agent's memory.
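As an illustration only, the knowledge-point representation defined above could be held in code as a set of labelled triples. The class name `KnowledgePoint`, the constant `PHOTOSYNTHESIS_TASK`, and the helper `concepts_of` are hypothetical names chosen for this sketch; they are not part of the system described in this paper.

```python
from typing import FrozenSet, NamedTuple, Set


class KnowledgePoint(NamedTuple):
    """One labelled, directed edge k = (c_i, c_j, l_ij) of a concept map."""
    source: str  # concept c_i
    target: str  # concept c_j
    label: str   # relationship label l_ij (never null for a knowledge point)


# The learning task T for "photosynthesis", transcribed from the concept
# and label sets listed for Fig. 2 (six knowledge points).
PHOTOSYNTHESIS_TASK: FrozenSet[KnowledgePoint] = frozenset({
    KnowledgePoint("water", "photosynthesis", "is the material for"),
    KnowledgePoint("CO2", "photosynthesis", "is the material for"),
    KnowledgePoint("chloroplast", "photosynthesis", "is the location of"),
    KnowledgePoint("sunlight", "photosynthesis", "aid"),
    KnowledgePoint("photosynthesis", "sugar", "produce"),
    KnowledgePoint("photosynthesis", "O2", "produce"),
})


def concepts_of(task: FrozenSet[KnowledgePoint]) -> Set[str]:
    """Collect the concept set C spanned by a set of knowledge points."""
    return {c for k in task for c in (k.source, k.target)}
```

Storing a task as a set of triples means the agent's memory $T_s$ can use the same type, so the set operations used in the appraisal step apply directly.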
B. Curiosity Appraisal

Based on psychological theories, curiosity appraisal is modeled as a two-step process: determination of stimulation level and mapping from the stimulation level to emotions.

1) Determination of Stimulation Level

Each learning task in VLE is considered as a stimulus. As defined in the previous section, each learning task has an associated set of knowledge points, denoted by $T = \{k_1, k_2, \ldots, k_m\}$. This set of knowledge points is intended to be learnt by the agent upon finishing the learning task.

According to Berlyne, four factors can stimulate curiosity: novelty, uncertainty, conflict and complexity. With CM-based knowledge representation, the most salient factors that can be appraised in a learning task (stimulus) are novelty, conflict and complexity. Novelty and conflict are reflected in the dissimilarity between the knowledge points to be learnt in the learning task ($T$) and the learnt ones in the agent's memory ($T_s$). Complexity is reflected by the total number of knowledge points intended to be learnt in the learning task ($T$). The appraisal of uncertainty may require a more complex knowledge representation that contains uncertain information and will be studied in future work. Next, the appraisal of novelty, conflict and complexity is discussed in detail.

We define a novel knowledge point in $T$ as a knowledge point that is a member of $T$ but does not have a corresponding knowledge point in $T_s$ with the same order of concepts. This indicates that the agent has not learnt the knowledge point before. All novel knowledge points in $T$ are kept in the novelty set, denoted by $T \mathbin{\dot{-}} T_s$. Formally,

\[
T \mathbin{\dot{-}} T_s = \{\, k \mid k \in T \wedge \neg\exists k' \in T_s,\ c_i = c_i' \wedge c_j = c_j' \,\}, \quad k = (c_i, c_j, l_{ij}),\ k' = (c_i', c_j', l_{ij}'). \tag{1}
\]

A conflicting knowledge point in $T$ is defined as a knowledge point that is a member of $T$ and has a corresponding knowledge point in $T_s$ with the same order of concepts, but with a different label. This indicates that the agent understands the knowledge point differently from the learning task. All conflicting knowledge points in $T$ are kept in the conflict set, denoted by $T \mathbin{\tilde{-}} T_s$. Formally,

\[
T \mathbin{\tilde{-}} T_s = \{\, k \mid k \in T \wedge \exists k' \in T_s,\ c_i = c_i' \wedge c_j = c_j' \wedge l_{ij} \neq l_{ij}' \,\}, \quad k = (c_i, c_j, l_{ij}),\ k' = (c_i', c_j', l_{ij}'). \tag{2}
\]

It can be deduced from the definition that the conflict set operator $\tilde{-}$ is symmetric, i.e., $T \mathbin{\tilde{-}} T_s = T_s \mathbin{\tilde{-}} T$.

It can also be deduced that the set difference $T - T_s$ equals the union of the novelty set and the conflict set, i.e., $T - T_s = (T \mathbin{\dot{-}} T_s) \cup (T \mathbin{\tilde{-}} T_s)$. Hence, the set difference from $T$ to $T_s$ contains two types of information in this context: novelty and conflict. In order to measure the level of combined novelty and conflict, we extend Tversky's classic set similarity measurement, referred to as the ratio model [28], by introducing asymmetry to the novelty and conflict information contained in the set difference.

According to the ratio model, the similarity between two sets $A$ and $B$ can be represented by [28]:

\[
S(A, B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha f(A - B) + \beta f(B - A)}, \quad \alpha, \beta \geq 0, \tag{3}
\]

where $f$ is a scale function, and $\alpha$, $\beta$ define the degree of asymmetry. According to Tversky, $f$ is usually the cardinality of a set, reflecting the salience or prominence of various members in the set. Also, $f$ satisfies additivity, i.e., $f(X \cup Y) = f(X) + f(Y)$ for disjoint $X$ and $Y$. In the ratio model, $S(A, B)$ is interpreted as the degree to which $A$ is similar to $B$, where $A$ is the subject of comparison and $B$ is the reference. One naturally focuses on the subject of comparison. Hence, the features of the subject are usually weighted more heavily than the features of the reference, i.e., $\alpha > \beta$.
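The set definitions in (1) and (2) and the ratio model in (3) can be sketched as follows. This is a minimal illustration rather than the implementation used in our system: knowledge points are assumed to be plain `(source, target, label)` tuples, $f$ is assumed to be set cardinality, the function names are hypothetical, and returning 1.0 for two empty sets is a convention chosen here to avoid division by zero.

```python
def novelty_set(task, memory):
    """Eq. (1): points of `task` whose ordered concept pair never occurs in `memory`."""
    known_pairs = {(src, dst) for src, dst, _ in memory}
    return {k for k in task if (k[0], k[1]) not in known_pairs}


def conflict_set(task, memory):
    """Eq. (2): points of `task` whose concept pair occurs in `memory` under a different label."""
    return {
        k for k in task
        if any(m[0] == k[0] and m[1] == k[1] and m[2] != k[2] for m in memory)
    }


def ratio_similarity(a, b, alpha=1.0, beta=1.0):
    """Eq. (3): Tversky's ratio model with f taken as set cardinality."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Assumed convention: two empty sets are treated as fully similar.
    return common / denom if denom else 1.0


# A small task T and memory Ts: one shared point, one novel, one conflicting.
task = {
    ("water", "photosynthesis", "is the material for"),
    ("CO2", "photosynthesis", "is the material for"),
    ("sunlight", "photosynthesis", "aid"),
}
memory = {
    ("water", "photosynthesis", "is the material for"),
    ("sunlight", "photosynthesis", "produce"),
}
```

For this example the novelty set holds the CO2 point, the conflict set holds the sunlight point, and their union equals the plain set difference `task - memory`, as stated in the text.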
Next, we extend Tversky's ratio model to introduce an asymmetric measure for the novelty and conflict subsets in the set difference as follows. Let

\[
g(A - B) = \delta f(A \mathbin{\dot{-}} B) + \xi f(A \mathbin{\tilde{-}} B), \quad \delta, \xi \geq 0, \tag{4}
\]

and

\[
S(A, B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha g(A - B) + \beta g(B - A)}, \quad \alpha, \beta \geq 0, \tag{5}
\]

where $g(A - B)$ is a function of the set difference from $A$ to $B$, with asymmetry introduced to the novelty and conflict subsets. The parameters $\delta$ and $\xi$ give importance to novelty and conflict, respectively, and determine the degree of asymmetry. Thus, $S(A, B)$ measures the similarity between sets $A$ and $B$, with asymmetry between the set differences $A - B$ and $B - A$ (determined by $\alpha$ and $\beta$), as well as asymmetry between the two types of information contained in the set difference, novelty and conflict (determined by $\delta$ and $\xi$).

$S(A, B)$ gives the measure of similarity between two sets. However, novelty and conflict are contained in the dissimilarity between two sets, as the union of novelty and conflict forms the set difference, i.e., $T - T_s = (T \mathbin{\dot{-}} T_s) \cup (T \mathbin{\tilde{-}} T_s)$. Hence, in order to measure novelty and conflict, we must define the dissimilarity $D(A, B)$ between two sets:

\[
D(A, B) = 1 - S(A, B) = \frac{\alpha g(A - B) + \beta g(B - A)}{f(A \cap B) + \alpha g(A - B) + \beta g(B - A)}, \quad \alpha, \beta \geq 0, \tag{6}
\]

where $D(A, B)$ is the normalized value containing the dissimilarity information between sets $A$ and $B$.

Based on the definition given in (6), the difference between the knowledge points in task $T$ and in the agent's memory $T_s$ can be represented by:

\[
D(T, T_s) = 1 - S(T, T_s) = \frac{\alpha g(T - T_s) + \beta g(T_s - T)}{f(T \cap T_s) + \alpha g(T - T_s) + \beta g(T_s - T)}, \quad \alpha, \beta \geq 0. \tag{7}
\]

In the appraisal of curiosity, $T$ is the subject of comparison and $T_s$ is the reference. Here, we give full importance to the subject $T$, because only the difference from $T$ to $T_s$, i.e., $T - T_s$, reflects the stimulus's information, consisting of novelty and conflict. The difference from $T_s$ to $T$, i.e., $T_s - T$, also contains two sources of information: (1) learnt knowledge points that are not given in the learning task, i.e., $T_s \mathbin{\dot{-}} T$, and (2) conflicting knowledge points, i.e., $T_s \mathbin{\tilde{-}} T$. However, $T_s \mathbin{\dot{-}} T$ does not reflect the stimulus's property but rather the agent's knowledge not given in task $T$. Also, $T_s \mathbin{\tilde{-}} T$ has already been considered in $T - T_s$ (due to the symmetry of the operator $\tilde{-}$). Hence, in the appraisal of curiosity, we assign $\alpha = 1$ and $\beta = 0$. As a result, the difference between $T$ and $T_s$ can be simplified as:

\[
\hat{D}(T, T_s) = 1 - S(T, T_s) = \frac{g(T - T_s)}{f(T \cap T_s) + g(T - T_s)} = \frac{\delta f(T \mathbin{\dot{-}} T_s) + \xi f(T \mathbin{\tilde{-}} T_s)}{f(T \cap T_s) + \delta f(T \mathbin{\dot{-}} T_s) + \xi f(T \mathbin{\tilde{-}} T_s)}, \quad \delta, \xi \geq 0. \tag{8}
\]

It can be observed that $\hat{D}$ reflects the combined appraisal of novelty and conflict in a learning task $T$.

FIGURE 3 The Wundt curve. (Horizontal axis: stimulus intensity; vertical axis: pleasantness to unpleasantness; the regions of boredom, curiosity and anxiety are separated by two thresholds.)

Now, let us consider the third factor that governs stimulus selection: complexity. In the context of VLE, the complexity of a task $T$ can be measured by the normalized salience of all knowledge points contained in the task, represented by:

\[
P(T) = \frac{f(T)}{\max_{T' \in C} f(T')}, \quad C = \{T_1, T_2, \ldots, T_n\}, \tag{9}
\]

where $C$ is the set of all the predefined tasks in VLE.

Here, we model complexity as a scaling factor for $\hat{D}$, because the value of novelty and conflict can be amplified in very complex tasks and reduced in very simple tasks. For example, searching for an intended piece in a jigsaw puzzle with 1000 pieces is more difficult than searching in one with 10 pieces. Hence, the stimulation level of a learning task $T$, denoted by $X(T)$, is defined as:

\[
X(T) = P(T) \cdot \hat{D}(T, T_s), \tag{10}
\]

where $P(T)$ is the measure of complexity and $\hat{D}(T, T_s)$ reflects the combined appraisal of novelty and conflict in a stimulus as given in (8).
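To make the computation in (8)–(10) concrete, the following sketch evaluates the stimulation level of a small task. It is an illustration under stated assumptions, not the code of our system: knowledge points are `(source, target, label)` tuples, $f$ is cardinality, the memory holds at most one label per ordered concept pair, an empty comparison yields a dissimilarity of 0, and all function and parameter names are hypothetical.

```python
def dissimilarity_hat(task, memory, delta=1.0, xi=1.0):
    """Eq. (8): combined novelty and conflict of `task` against `memory`."""
    # Assumption: at most one remembered label per ordered concept pair.
    known = {(src, dst): label for src, dst, label in memory}
    novel = sum(1 for src, dst, _ in task if (src, dst) not in known)
    conflicting = sum(
        1 for src, dst, label in task
        if (src, dst) in known and known[(src, dst)] != label
    )
    common = len(set(task) & set(memory))
    g = delta * novel + xi * conflicting          # Eq. (4) applied to T - Ts
    denom = common + g
    return g / denom if denom else 0.0            # assumed value for an empty comparison


def complexity(task, all_tasks):
    """Eq. (9): task size normalised by the largest predefined task."""
    return len(task) / max(len(t) for t in all_tasks)


def stimulation(task, memory, all_tasks, delta=1.0, xi=1.0):
    """Eq. (10): complexity scales the combined novelty and conflict."""
    return complexity(task, all_tasks) * dissimilarity_hat(task, memory, delta, xi)


task = {
    ("water", "photosynthesis", "is the material for"),
    ("CO2", "photosynthesis", "is the material for"),
    ("sunlight", "photosynthesis", "aid"),
}
memory = {
    ("water", "photosynthesis", "is the material for"),
    ("sunlight", "photosynthesis", "produce"),
}
larger_task = task | {
    ("chloroplast", "photosynthesis", "is the location of"),
    ("photosynthesis", "sugar", "produce"),
    ("photosynthesis", "O2", "produce"),
}
all_tasks = [task, larger_task]
```

Here the task has one shared, one novel, and one conflicting point, so with $\delta = \xi = 1$ the dissimilarity is 2/3; the task is half the size of the largest task, giving a stimulation level of 1/3. Raising $\xi$ to 2 weights the conflicting point more heavily and lifts the dissimilarity to 3/4.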
2) Mapping from Stimulation Level to Emotions

In psychology, Wundt introduced the Wundt curve (Fig. 3), an inverted "U-shape" relationship between the stimulation intensity and the arousal of emotions [18]. Three emotions are associated along the spectrum of stimulus intensity, where too little stimulation results in boredom, too much stimulation results in anxiety, and optimal stimulation results in curiosity.

Based on Wundt's theory, the appraisal of curiosity-related emotions is modeled as follows:

\[
\begin{aligned}
&\text{If } X(T) \leq \theta_1 \Rightarrow \text{Boredom},\\
&\text{If } \theta_1 < X(T) < \theta_2 \Rightarrow \text{Curiosity},\\
&\text{If } X(T) \geq \theta_2 \Rightarrow \text{Anxiety},
\end{aligned}
\qquad 0 \leq \theta_1 \leq \theta_2 \leq 1, \tag{11}
\]

where $X(T)$ is the stimulation level of learning task $T$, obtained from (10), and $\theta_1$, $\theta_2$ are thresholds that split the stimulus intensity axis into three emotion regions. The two thresholds determine the curious peer learner's propensity towards each emotion. For example, if $\theta_1$ is close to 0 and $\theta_2$ is close to 1, then the virtual peer learner will easily become curious about any learning task. On the contrary, if $\theta_1$ is very close to $\theta_2$, then the virtual peer learner will have a narrow curious region and become very picky about learning tasks.

C. Learning of State-Action Mapping

In real life, a curious student often exhibits a higher tendency to explore and a higher ability to acquire novel information [32]. Theoretically, a virtual peer learner with curiosity should also exhibit a higher tendency for exploration and a higher ability for knowledge acquisition than one without curiosity.

In VLE, the believability of a virtual peer learner largely depends on its strategy of state-action mapping. Hence, we allow the virtual peer learner to adapt its behavior strategy based on the mechanism of reinforcement learning [33]. In the remaining part of this section, learning of state-action mapping for the virtual peer learner is presented.

1) States of the Virtual Peer Learner

For the virtual peer learner, a state is the combination of an inner state and an external state: $State = State_{inner} \times State_{external}$.

The inner state is defined as a two-tuple: $State_{inner} = \langle emotion, energy \rangle$. Here, emotion denotes the current emotion state of the virtual peer learner. Since we mainly focus on the curiosity-related emotions, emotion can take four values: curiosity, boredom, anxiety and no_emotion. Curiosity, boredom and anxiety are the possible emotions for the virtual peer learner when learning tasks are nearby, i.e., when stimuli exist. When no stimuli are nearby, the virtual peer learner's emotion is set to no_emotion.

The second element, energy, is an intrinsic constraint that the virtual peer learner should take into consideration when choosing actions. This is a natural constraint in any learning environment for a real human student. When a student feels fatigued, he/she will need some rest before continuing to work. Also, different learning tasks may cause different levels of fatigue. For example, browsing through the topics is much easier than studying a topic after it is chosen. Hence, a good student knows how to adjust his/her learning strategy to spend energy properly. The value of energy changes as follows:

\[
energy = energy + E(a), \quad a \in Action, \tag{12}
\]

where $E$ is a function mapping an action $a$ to its cost of energy. When $energy < 0$, the virtual peer learner can only choose rest to recharge energy.

The external state $State_{external}$ reflects the virtual peer learner's relation to VLE, and is defined as a two-tuple: $State_{external} = \langle in\_learning\_zone, next\_to\_task \rangle$. Here, in_learning_zone and next_to_task are binary values indicating whether the virtual peer learner is in a learning zone and whether its location is within range of a learning task, respectively.

2) Actions of the Virtual Peer Learner

In VLE, a human student can take a great variety of actions, depending on the design of the controls. Here, we focus on three action categories relating to the acquisition of knowledge: explore, rest and study. The first category contains the actions a human student takes to explore interesting learning content in VLE. Examples include clicking objects, reading information, etc. When a student is tired of learning new material, he/she will choose some leisure activities such as roaming around, chatting with friends, etc. These actions are categorized under the group rest. The third category is study, including all actions related to gaining knowledge, such as reading, writing, answering questions, etc. Here, we stay at the level of action categories and do not specify the actions in each category. Hence, the actions of the virtual peer learner are defined as follows:

\[
Action = \{explore, rest, study\}. \tag{13}
\]

The action explore can update next_to_task in the agent's external state and decreases the energy in its inner state by a certain amount. The action rest recovers a certain amount of energy in the agent's inner state. The action study incorporates a certain amount of new knowledge from the current learning task into the agent's memory. We define the learning efficiency $D$ of a virtual peer learner as:

\[
D = l, \quad 0 \leq l \leq 1, \tag{14}
\]

where $l$ is the base learning ability when the virtual peer learner is in the no_emotion state. The action study is implemented by randomly selecting a fraction $D$ of the new knowledge points in the current learning task and recording them into the agent's memory.
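The threshold rule in (11), the energy update in (12), and the study action described above can be sketched together as follows. This is a hedged illustration, not the code of our system: the energy costs in `ENERGY_COST` are made-up placeholder values, "new knowledge points" is assumed to mean the points of the task that are not yet in memory, the number of points learnt is rounded with Python's `round`, and every function name is hypothetical.

```python
import random

ACTIONS = ("explore", "rest", "study")

# Hypothetical energy changes E(a); the paper does not give numeric values here.
ENERGY_COST = {"explore": -1.0, "rest": 3.0, "study": -2.0}


def appraise_emotion(stimulation, theta1, theta2):
    """Eq. (11): map a stimulation level in [0, 1] to an emotion region."""
    if not 0.0 <= theta1 <= theta2 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= theta1 <= theta2 <= 1")
    if stimulation <= theta1:
        return "boredom"
    if stimulation >= theta2:
        return "anxiety"
    return "curiosity"


def update_energy(energy, action):
    """Eq. (12): add the (signed) energy change of the chosen action."""
    return energy + ENERGY_COST[action]


def available_actions(energy):
    """With negative energy the learner may only rest."""
    return ["rest"] if energy < 0 else list(ACTIONS)


def study(task, memory, efficiency, rng):
    """Record a random fraction `efficiency` of the task's unlearnt points into memory."""
    new_points = sorted(set(task) - set(memory))   # sorted for reproducible sampling
    n_learnt = round(efficiency * len(new_points))  # assumed rounding rule
    return set(memory) | set(rng.sample(new_points, n_learnt))
```

As a usage example, `appraise_emotion(0.5, 0.3, 0.7)` falls in the curiosity region, while a learner whose energy has dropped below zero is offered only `rest` by `available_actions`.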
3) The Learning Process

In this system, the virtual peer learner only knows what actions can be taken in each state; it has neither a model of the world nor predefined rules on which action to choose. Hence, we adopt a model-free reinforcement learning mechanism, Q-learning [33], for the agent. The goal of Q-learning is to estimate the $Q(s, a)$ values, the expected rewards for executing action $a$ in state $s$. Each $Q(s, a)$ is updated based on

\[
Q(s, a) = (1 - \omega) \cdot Q(s, a) + \omega \cdot \bigl( r + \gamma V(s') \bigr), \tag{15}
\]

where

\[
V(s') = \max_{a \in Action} Q(s', a) \tag{16}
\]

is the best reward the agent can expect from the new state $s'$. Here, $a$ is an action and $Action$ is the set of actions available to the agent; $r$ is the reinforcement value; $\gamma$ is the discount factor that defines how much expected future rewards affect the current decision; and $\omega$ is the learning rate that controls how much weight is given to the reward just experienced.

4) The Roles of Curiosity-Related Emotions

Emotions perform vital functions in human decision-making. Psychological studies show that humans voluntarily seek novel things due to the pleasure of satiating curiosity [17]. Artificial Intelligence research frequently assumes that the human decision-making process consists of maximizing positive emotions and minimizing negative emotions [29], [30], [34]. Based on these observations, curiosity-related emotions are employed as reinforcement functions to guide the virtual peer learner's learning of behavior strategies. The emotion curiosity gives positive reinforcement, while both boredom and anxiety give negative reinforcement.

Also, in humans, emotion can influence action strengths [31].
For example, a student who is interested in a subject will concentrate more and achieve higher learning efficiency than one who is bored with the same subject. Hence, the second role of curiosity-related emotions is to influence actions. Here, we consider their influence on the action study. The virtual peer learner's learning efficiency D can be influenced by emotions as follows:

$$D = l + F(\mathit{emotion}), \tag{17}$$

where F is a mapping from the emotion to its influence on the agent's learning efficiency. Here, F(curiosity) returns a positive value, while F(boredom) and F(anxiety) both return negative values. D is always capped between 0 and 1.

VI. Experiment
In this section, we present the experimental details and the results obtained.

A. Virtual Singapura
Virtual Singapura (VS) is a virtual learning environment designed for lower secondary school students to learn about plant transport systems. The virtual world environment in VS is illustrated in Fig. 4, where Fig. 4(a) shows the landscape of VS, designed based on 19th century Singapore, and Fig. 4(b) shows the ant hole trigger, where students can shrink their avatars to go inside the tree environment.

FIGURE 4 The virtual environment in VS. (a) The VS landscape, designed based on 19th century Singapore. (b) The ant hole trigger where students can shrink their avatars and go inside a tree.

For the proposed curious peer learner, we follow the methodology of Learner-Centered Design [35], which advocates that technology applications must focus on the needs, skills, and interests of the learner. Before developing the curious peer learner, we described its functionalities and illustrated possible interaction scenarios. Some examples are shown in Fig. 5. In Fig. 5(a), the curious peer learner (embodied by a butterfly) poses questions to a student in order to stimulate his/her thinking about the learning content. In Fig. 5(b), the curious peer learner directs the student's attention to potentially interesting learning content.

We conducted a pre-study among several students who had played VS. Responses from questionnaires and interviews showed that students are interested in having a curious peer learner in VS. Comments from the students include: “I prefer to have a virtual friend to play with me,” “I will have a better knowledge if I can compare my knowledge with a virtual friend,” and “A virtual friend who can be curious about the ant holes, water molecules, and shooting games will make me feel more interested.”

[Pull quote] Concept Map is a semantic knowledge representation method stemming from the learning theory of constructivism.

B. Experiment Setup
In order to test the effectiveness of curiosity-related emotions in influencing the virtual peer learner's learning of a behavior strategy, we built a simulation environment in VS. The simulation environment consists of two main elements:
❏ Learning Zones (LZs) that contain predefined learning tasks.
❏ Virtual peer learners that reside in VLE to take learning tasks.

1) Generation of Learning Tasks
For each LZ, we generate learning tasks based on one expert CM with 20 concepts and 200 relationships. From this expert CM, we spawn 15500 learning tasks. Each learning task is a submap of the expert CM with 5 randomly chosen concepts. Hence, one LZ contains 200 knowledge points and 15500 learning tasks in total.
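The task-generation step can be illustrated with a small Python sketch. Everything beyond the three figures quoted above (20 concepts, 200 relationships, 5 concepts per task) is an assumption made for illustration: the expert CM is a toy map of randomly drawn directed concept pairs, each relationship is treated as one knowledge point, and a submap keeps the relationships whose two endpoints were both chosen. The text does not say how the 15500 tasks were drawn; exhaustive enumeration is used here only because the number of 5-concept subsets of 20 concepts, C(20, 5) = 15504, happens to be close to that figure.

```python
import itertools
import math
import random


def make_expert_cm(n_concepts=20, n_relationships=200, seed=0):
    """Build a toy expert concept map as a set of directed (source, target) pairs."""
    rng = random.Random(seed)
    concepts = [f"c{i}" for i in range(n_concepts)]
    # All ordered pairs of distinct concepts: 20 * 19 = 380 candidates.
    candidates = list(itertools.permutations(concepts, 2))
    relationships = set(rng.sample(candidates, n_relationships))
    return concepts, relationships


def submap(relationships, chosen):
    """Keep the relationships whose two endpoints are both among the chosen concepts."""
    chosen = set(chosen)
    return {(a, b) for (a, b) in relationships if a in chosen and b in chosen}


concepts, relationships = make_expert_cm()
# One learning task per 5-concept subset (an assumption, see the note above).
tasks = [submap(relationships, subset)
         for subset in itertools.combinations(concepts, 5)]
print(len(tasks), math.comb(20, 5))  # 15504 15504
```

Because every relationship joins two concepts, it appears in many different 5-concept submaps, so knowledge points overlap heavily between tasks while the total in one LZ stays at 200.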
Some of the knowledge points are repeated in different learning tasks, and the agent cannot learn more than the number of predefined knowledge points (200) in one LZ.

2) Parameter Setting for the Virtual Peer Learners
In this experiment, we examine the effect of curiosity-related emotions by comparing the performance of virtual peer learners with and without them. Hence, we simulate two types of virtual peer learner: a curious peer learner with a curiosity appraisal process and a non-curious peer learner without one. The parameter setting for the two types of virtual peer learner is summarized in Table 1.

TABLE 1 Parameter setting.
FUNCTIONALITY                                          RELATED PARAMETER   VALUE
Curiosity appraisal (only for curious peer learner)    d                   1
                                                       p                   1
                                                       i_1                 0.3
                                                       i_2                 0.8
Reinforcement learning                                 ω                   0.5
                                                       c                   0.8
Knowledge acquisition                                  l                   0.5

First, there are four parameters $(d, p, i_1, i_2)$ for curiosity appraisal. Only the curious peer learner has a curiosity appraisal process. The four parameters can be understood as describing a specific personality towards curiosity. As this paper only compares agents with and without curiosity, the parameters are chosen to represent a curious peer learner with an intermediate level of curiosity. d and p are non-negative real numbers that weight novelty and conflict, respectively. If d is greater than p, the agent focuses more on novelty: it magnifies the contribution of novelty to the stimulation level and lessens the contribution of conflict. Here, we consider a curious peer learner with equal preference for novelty and conflict, and set both d and p to 1. For the two parameters that split the stimulus intensity axis (Fig. 3) into three regions, we choose intermediate values, $i_1 = 0.3$ and $i_2 = 0.8$. This is intuitive because most humans have an intermediate level of appraisal of curiosity-related emotions, and few people show extremes such as no region for curiosity ($i_1 = i_2$) or no region for negative emotions ($i_1 = 0$ and $i_2 = 1$).

[Pull quote] In psychology, Berlyne identified four factors, viz., novelty, uncertainty, conflict and complexity, that can lead to curiosity and determine the stimulation level. Wundt postulated an inverted U-shape relationship between stimulation level and three curiosity-related emotions: boredom, curiosity and anxiety.

FIGURE 5 Possible interaction scenarios of a curious peer learner in VS. (a) The curious peer learner, embodied as a butterfly, stimulates students' thinking about the learning content. (b) The curious peer learner, embodied as a butterfly, directs students' attention to interesting learning content.

Second, both the curious and the non-curious peer learner perform reinforcement learning with the same parameter settings. The difference is that the curious peer learner has curiosity-related emotions as reinforcement functions, while the non-curious peer learner does not. ω and c are the two parameters that determine the reinforcement learning process in (15). Both are real numbers in the range [0, 1]. ω is the learning rate, which controls how much weight is given to the reward just experienced. A high value of ω can cause very sudden changes in the learnt Q-value, while a very low value makes the learning process slow. Hence, an intermediate value is favorable, and we set ω = 0.5. The parameter c is the discount factor, which defines how much expected future rewards affect the current decision. A high value gives more importance to future rewards, while a low value gives more importance to current rewards. Our experiments showed that the agent needs a comparatively high value of c to learn a proper behavior strategy. Hence, we set c = 0.8.

Third, the base learning efficiency l in (14) determines the knowledge acquisition ability of the virtual peer learners. l is a real number in the range [0, 1]. The higher the value, the higher the percentage of new knowledge that is accurately acquired and integrated into the agent's memory. We set the same base learning efficiency for both the curious and the non-curious peer learner. The difference is that only the curious peer learner's learning efficiency can be influenced by curiosity-related emotions. In this experiment, we do not consider virtual peer learners with extremely high or extremely low learning abilities. For example, a virtual peer learner with l = 1 can learn everything in a learning task, and a virtual peer learner with l = 0 can learn nothing. Hence, we simulate virtual peer learners with an intermediate level of learning ability and set l = 0.5.

3) Function Mappings
The function mappings are summarized in Table 2. Function f(A) is the scaling function in (8), reflecting the salience of the various members of a set. Here, we adopt the most commonly used scale function in Tversky's ratio model, i.e., the cardinality of a set [28].

The energy cost function in (12) determines the intrinsic constraint of the virtual peer learners. In this system, the function acts as a system rule that both the curious and the non-curious peer learner must consider when choosing action strategies. Here, the settings are determined purely by the system designers. We set the output of actions as follows: rest recharges energy fully to 10, explore consumes a small amount of energy (0.05), and study consumes a large amount of energy (3). This is because browsing through topics (explore) is much easier than starting to work on a topic (study) after it is chosen.

Similarly, for the function mapping from emotions to their influence on the agent's learning efficiency in (17), we set the influence to -0.2 for boredom, +0.3 for curiosity, and -0.4 for anxiety. This follows natural human tendencies. A student with positive emotion tends to achieve better learning efficiency due to mental excitement. On the contrary, the learning efficiency of a student with negative emotion can be harmed by intrinsic resistance to learning.

C. Experimental Results
In this section, two sets of experiments are conducted and the results are analyzed.

1) The Effect of Curiosity-Related Emotions on Behavior Learning
In this experiment, we analyze the effect of curiosity-related emotions on the learning of state-action mapping in the curious peer learner.

We split the simulation into two phases: a reinforcement learning phase and a steady phase. In the reinforcement learning phase, the curious peer learner is allowed to learn a behavior strategy based on random exploration of state-action pairs and the intrinsic rewards generated by curiosity-related emotions. In the steady phase, the curious peer learner is put into a new LZ to exploit the behavior strategy learnt in the reinforcement learning phase. For each phase, we ran $10^4$ steps.

TABLE 2 Function mapping.
FUNCTION      INPUT       OUTPUT
f(A)          A           |A|
E(a)          rest        recharge fully to 10
              explore     -0.05
              study       -3
F(emotion)    boredom     -0.2
              curiosity   +0.3
              anxiety     -0.4

FIGURE 6 Behavior of the curious peer learner in training and testing phases. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Four panels plotted against simulation steps, 0 to 2 × 10^4.]

FIGURE 7 Behavior of the curious peer learner in the testing phase. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Four panels plotted against simulation steps, 0 to 10,000.]

In Fig. 6, we plot the agent's behavior in both the reinforcement learning and steady phases. Fig. 6(a) shows the number of knowledge points learnt in each simulation step when the action study is taken. It can be observed that the activation pattern in Fig. 6(a) corresponds to the activation pattern of study in Fig. 6(b). Fig. 6(b)–(d) plot the activation pattern for action study, explore and rest, respectively. It can be observed from Fig. 6 that the curious peer learner learnt a strategy in the reinforcement learning phase. This is shown by the significantly different behavior patterns in the two phases.
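The two-phase protocol can be sketched as follows: random action selection while the Q-values are updated with (15) and (16), then greedy selection in the steady phase. The learning rate 0.5 and discount factor 0.8 are the Table 1 settings. The state labels and the reward magnitude attached to each emotion are hypothetical; the text only states that curiosity gives positive reinforcement and that boredom and anxiety give negative reinforcement.

```python
import random
from collections import defaultdict

ACTIONS = ("explore", "rest", "study")


def q_update(Q, s, a, r, s_next, omega=0.5, c=0.8, actions=ACTIONS):
    """Apply update (15): Q(s,a) = (1 - omega) * Q(s,a) + omega * (r + c * V(s'))."""
    v_next = max(Q[(s_next, b)] for b in actions)  # V(s') as defined in (16)
    Q[(s, a)] = (1 - omega) * Q[(s, a)] + omega * (r + c * v_next)
    return Q[(s, a)]


def choose_action(Q, s, learning_phase, rng, actions=ACTIONS):
    """Pick randomly while learning; pick the highest-valued action when steady."""
    if learning_phase:
        return rng.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])


# Hypothetical reward per emotion: the signs follow the text, the magnitudes are made up.
REWARD = {"curiosity": 1.0, "boredom": -1.0, "anxiety": -1.0, "no_emotion": 0.0}

Q = defaultdict(float)                 # unseen state-action pairs start at 0
s, s_next = "near_task", "studying"    # hypothetical state labels
first = q_update(Q, s, "study", REWARD["curiosity"], s_next)
Q[(s_next, "study")] = 0.5             # pretend the next state already has some value
second = q_update(Q, s, "study", REWARD["curiosity"], s_next)
print(first, second)                   # roughly 0.5, then roughly 0.95
print(choose_action(Q, s, learning_phase=False, rng=random.Random(0)))  # study
```

With a learning rate of 0.5, each update moves the stored value halfway towards the new target, which is why a single curiosity reward already lifts the value of study to 0.5 in this toy run.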
Let us consider Fig. 6(b), “Study.” The curious peer learner took the action study significantly less often in the steady phase than in the reinforcement learning phase. At the same time, the number of knowledge points learnt (Fig. 6(a), “Knowledge points learnt”) stayed at a comparatively higher level in the steady phase than in the reinforcement learning phase. It can be deduced that the curious peer learner learnt an “intelligent” strategy similar to what we expect from human students: it takes learning tasks only when interested and learns with high efficiency, rather than taking whichever learning task appears in front of it. Next, let us analyze Fig. 6(c) and Fig. 6(d). In the reinforcement learning phase, the curious peer learner randomly chose explore or rest with no strategy. In the steady phase, however, it explored continuously until an interesting learning task was detected, provided that it had enough energy. This resembles the behavior of human students: in the learning process, they first search for topics, and only when an interesting topic is found do they start working on it. When they feel tired, they take a rest before continuing to search or study. Hence, it can be deduced that curiosity-related emotions successfully guide the curious peer learner to learn a natural state-action mapping.
The positive rewards from curiosity and the negative rewards from boredom and anxiety guide the curious peer learner to search for interesting learning tasks with an intermediate level of stimulation and to avoid learning tasks whose stimulation is either too low or too high. This phenomenon is in line with Wundt's theory.

2) The Comparison of Exploration Breadth and Depth Between a Curious Peer Learner and a Non-Curious Peer Learner
In this experiment, we compare the performance of two virtual peer learners: one curious peer learner with curiosity-related emotions as intrinsic rewards and one non-curious peer learner without curiosity-related emotions. In order to analyze the performance of the two virtual peer learners, two indicators are defined:
1) Exploration breadth: This indicator is measured by the number of learning tasks browsed through by the agent in a period of time. It indicates the virtual peer learner's tendency to explore for interesting learning tasks.
2) Exploration depth: This indicator is measured by the average number of knowledge points learnt per learning task. It indicates the virtual peer learner's average learning efficiency.

In this experiment, we first trained the two virtual peer learners in one LZ. Then, we put them in another LZ for testing. We ran $10^4$ steps for both the training and the testing phase. So that their performance can be compared, the two virtual peer learners always operate in the same LZs. The behavior of the curious and the non-curious peer learner in the testing phase is shown in Fig. 7 and Fig. 8, respectively.

By comparing Fig. 7 and Fig. 8, it can be observed that the curious peer learner learnt a behavior strategy resembling that of a human student, while the non-curious peer learner chose actions randomly and behaved irrationally. The curious peer learner followed a behavior pattern that shows continuity in exploring for interesting learning tasks. It only studied learning tasks when its curiosity was aroused and tried to minimize negative emotions by avoiding tasks that are too low or too high in stimulation. The tendency to maximize positive emotions leads to a high average learning efficiency. The non-curious peer learner, however, randomly chose to explore or rest without showing continuity in its exploration behavior. Also, it did not choose learning tasks deliberately and spent a lot of time on learning tasks that were not interesting and were learnt with very low efficiency.

FIGURE 8 Behavior of the non-curious peer learner in the testing phase. (a) Knowledge points learnt. (b) Study. (c) Explore. (d) Rest. [Four panels plotted against simulation steps, 0 to 10,000.]

The exploration breadth and depth of both virtual peer learners are summarized in Table 3.

TABLE 3 Exploration comparison between virtual peer learners with curiosity and without.
MEASURE               CURIOSITY   NO CURIOSITY
Exploration breadth   949         386
Exploration depth     5.33        0.74

The exploration breadth is calculated as the total number of learning tasks browsed by the agent in the testing phase. The curious peer learner explored 949 tasks, while the non-curious peer learner explored only 386 tasks. The exploration breadth of the curious peer learner is thus about 2.5 times that of the non-curious peer learner. The exploration depth is calculated as the average number of knowledge points learnt in the learning tasks studied by the agent in the testing phase. The curious peer learner learnt 5.33 knowledge points per learning task (128 knowledge points learnt in total over 24 learning tasks studied), while the non-curious peer learner learnt only 0.74 knowledge points per learning task (138 knowledge points learnt in total over 187 learning tasks studied). The value 0.74 is less than 1 because the non-curious peer learner sometimes learnt nothing new in the learning tasks it took. This indicates that the non-curious peer learner wasted a lot of time studying learning tasks that were not interesting and gained little knowledge from them.

Hence, it can be deduced that curiosity-related emotions can directly guide the virtual peer learner to learn strategic behavior resembling that of a natural student and indirectly influence the acquisition of knowledge. The curiosity-related emotions drive the virtual peer learner to enhance both its exploration breadth and depth. This mimics real life, where a curious student tends to explore more as well as learn more deeply than a student who is less curious.

VII. Conclusion
In this work, we have modeled human-like curiosity for virtual peer learners in a virtual learning environment. The proposed model is built on psychological theories by Berlyne and Wundt.
In this model, the curiosity appraisal is a two-step process: determination of stimulation level and mapping from the stimulation level to emotions. The emotions serve as intrinsic reward functions for the agent’s behavior learning, as well as influence the agent’s knowledge acquisition efficiency. Simulation results have shown that curiosity-related emotions can guide the virtual peer learner to behave as naturally as a real student does. Also, the comparison between a curious peer learner and a non-curious peer learner has shown that the curious peer learner can demonstrate higher tendency for exploration (exploration breadth), as well as higher average learning efficiency (exploration depth). The rationale behind this research is that a virtual peer learner may have the potential to practise “peer learning,” a common educational practice that can benefit learning in multiple aspects. A virtual peer learner residing in VLE can possibly engage students and motivate them to spend more time on the learning content. Also, a virtual peer learner can potentially help low-functioning students to think and learn better in VLE. We acknowledge several limitations. First, although uncertainty is part of Berlyne’s theory, it is not modeled in this work due to the deterministic nature of concept maps. Future work will investigate ways to bring uncertainty into our knowledge representation. Second, actions of the virtual peer learner are designed on an abstracted level. In the future, we will design more complex action sets for the virtual peer learner so as to achieve higher believability. Third, many of the parameters in the experiments are empirically set to demonstrate the plausibility of the proposed model. In the future, we plan to experiment with different parameter settings to analyze the performance. Lastly, the proposed curious peer learner is evaluated by simulations only. 
Large-scale field studies to deploy VS with the curious peer learners in schools are currently being planned.

Acknowledgment
The authors would like to thank the Ministry of Education (MOE), Singapore, for the Interactive Digital Media (IDM) challenge funding to conduct this study.

References
[1] J. Wiecha, R. Heyden, E. Sternthal, and M. Merialdi, “Learning in a virtual world: Experience with using Second Life for medical education,” J. Med. Internet Res., vol. 12, no. 1, pp. 1–27, 2010.
[2] C. Dede, “Immersive interfaces for engagement and learning,” Science, vol. 323, no. 5910, pp. 66–69, 2009.
[3] A. L. Harris and A. Rea, “Web 2.0 and virtual world technologies: A growing impact on IS education,” J. Inform. Syst. Educ., vol. 20, no. 2, pp. 137–144, 2009.
[4] M. J. Jacobson, B. Kim, C. Miao, and M. Chavez, “Design perspectives for learning in virtual worlds,” in Design for Learning Environments of the Future, New York: Springer-Verlag, 2010, pp. 111–141.
[5] S. Kennedy-Clark, “Pre-service teachers perspectives on using scenario-based virtual worlds in science education,” Comp. Educ., vol. 57, no. 4, pp. 2224–2235, 2011.
[6] S. Kennedy-Clark, “Designing failure to encourage success: Productive failure in a multi-user virtual environment to solve complex problems,” in Proc. European Conf. Technology Enhanced Learning, 2009, pp. 609–614.
[7] M. Tanti and S. Kennedy-Clark, “MUVEing slowly: Applying slow pedagogy to a scenario-based virtual environment,” in Proc. Ascilite Curriculum, Technology Transformation Unknown Future, Sydney, Australia, 2010, pp. 963–967.
[8] S. Kennedy-Clark and K. Thompson, “What do students learn when collaboratively using a computer game in the study of historical disease epidemics, and why?” Games Culture, vol. 6, no. 6, pp. 513–537, 2011.
[9] D. Boud, R. Cohen, and J. Sampson, Peer Learning in Higher Education: Learning from and with Each Other. London: Routledge, 2001.
[10] M. J. Eisen, “Peer-based learning: A new-old alternative to professional development,” Adult Learn., vol. 12, no. 1, pp. 9–10, 2001.
[11] T. B. Kashdan and M. F. Steger, “Curiosity and pathways to wellbeing and meaning in life: Traits, states, and everyday behaviors,” Motivation Emotion, vol. 31, no. 3, pp. 159–173, 2007.
[12] S. Reiss, Who Am I? The 16 Basic Desires That Motivate Our Actions and Define Our Personalities. New York: The Berkley Publishing Group, 2000.
[13] L. Macedo and A. Cardoso, “The role of surprise, curiosity and hunger on exploration of unknown environments populated with entities,” in Proc. Portuguese Conf. Artificial Intelligence, 2005, pp. 47–53.
[14] K. Merrick, “Modeling motivation for adaptive nonplayer characters in dynamic computer game worlds,” Comput. Entertainment, vol. 5, no. 4, pp. 1–32, 2008.
[15] R. Saunders, “Towards a computational model of creative societies using curious design agent,” in Proc. Int. Conf. Engineering Societies Agents World VII, 2007, pp. 340–353.
[16] P. D. Scott and S. Markovitch, “Experience selection and problem choice in an exploratory learning system,” Mach. Learn., vol. 12, nos. 1–3, pp. 49–67, 1993.
[17] D. E. Berlyne, Conflict, Arousal, and Curiosity. New York: McGraw-Hill, 1960.
[18] W. M. Wundt, Grundzüge der physiologischen Psychologie. Leipzig, Germany: W. Engelmann, 1874.
[19] G. Loewenstein, “The psychology of curiosity: A review and reinterpretation,” Psychol. Bull., vol. 116, no. 1, pp. 75–98, 1994.
[20] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949.
[21] R. W. White, “Motivation reconsidered: The concept of competence,” Psychol. Rev., vol. 66, no. 5, pp. 297–333, 1959.
[22] J. Schmidhuber, “Curious model-building control systems,” in Proc. IEEE Int. Joint Conf. Neural Networks, 1991, pp. 1458–1463.
[23] P. Y. Oudeyer, F. Kaplan, and V. Hafner, “Intrinsic motivation systems for autonomous mental development,” IEEE Trans. Evol. Comp., vol. 11, no. 2, pp. 265–286, 2007.
[24] R. Saunders and J. S. Gero, “A curious design agent,” in Proc. Conf. Computer Aided Architectural Design Research Asia, 2001, pp. 345–350.
[25] J. D. Novak and D. B. Gowin, Learning How to Learn. Cambridge, U.K.: Cambridge Univ. Press, 1984.
[26] G. Biswas, K. Leelawong, D. Schwartz, and N. Vye, “Learning by teaching: A new agent paradigm for educational software,” Appl. Artif. Intel., vol. 19, nos. 3–4, pp. 363–392, 2005.
[27] Q. Wu, C. Miao, and Z. Shen, “A curious learning companion in virtual learning environment,” in Proc. IEEE Int. Conf. on Fuzzy Systems, 2012, pp. 1–8.
[28] A. Tversky, “Features of similarity,” Psychol. Rev., vol. 84, no. 4, pp. 327–352, 1977.
[29] A. G. Barto, S. Singh, and N. Chentanez, “Intrinsically motivated learning of hierarchical collections of skills,” in Proc. Int. Conf. Development Learn, 2004, pp. 112–119.
[30] M. Salichs and M. Malfaz, “A new approach to modeling emotions and their use on a decision making system for artificial agents,” IEEE Trans. Affective Comput., vol. 3, no. 99, pp. 56–68, 2011.
[31] H. Hu, E. Real, K. Takamiya, M. G. Kang, J. Ledoux, R. L. Huganir, and R. Malinow, “Emotion enhances learning via norepinephrine regulation of AMPA-receptor trafficking,” Cell, vol. 131, no. 1, pp. 160–173, 2007.
[32] H. I. Day and A. Wiley, “Online library curiosity and the interested explorer,” Perform. Instruction, vol. 21, no. 4, pp. 19–22, 1982.
[33] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, 1996.
[34] E. T. Rolls, “A theory of emotion, its functions, and its adaptive value,” in Emotions in Humans and Artifacts. Cambridge, MA: MIT Press, 2003, pp. 11–34.
[35] E. Soloway, M. Guzdial, and K. E. Hay, “Learner-centered design: The challenge for HCI in the 21st century,” Interactions, vol. 1, no. 2, pp. 36–48, 1994.
Goal-Based Denial and Wishful Thinking
César F. Pimentel, INESC-ID and Instituto Superior Técnico, PORTUGAL
Maria R. Cravo, Instituto Superior Técnico, PORTUGAL

Digital Object Identifier 10.1109/MCI.2013.2247831
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE

Abstract: Denial and wishful thinking are well known affective phenomena that influence human belief processes. Put simply, they consist of tendencies to disbelieve what one would not like to be true, and believe what one would like to be true. We present an approach to belief dynamics that simulates denial and wishful thinking, using the agent's goals as the source of affective preference. Our approach also addresses several issues concerning the use of conventional belief revision in human-like autonomous agents. In our approach, every goal produces a wishful thought about its achievement, in the form of a weak belief. Consequently, a “disliked situation” leads to inconsistent beliefs, thus triggering belief revision. Furthermore, the agent's belief set is selected based on a measure of preference that accounts for the “likeability” of beliefs (among other factors). We test an instantiation of our model to assess whether the resulting agent could produce the intended behaviors. The agent produces behaviors of denial and wishful thinking, not only as biases in belief selection, but also as the triggers to hold or reject certain beliefs.

I. Introduction and Motivation
One of the aims of Affective Computing [19] is to design agents that behave according to models of human affective phenomena. Among these phenomena are the roles that emotions play in various cognitive processes, such as attention, reasoning, decision making, and belief selection. In this paper, we focus on an affective phenomenon that concerns belief selection.

Human beings are biased towards believing in what they would like to be true (wishful thinking) and not believing in what they would not like to be true (denial). These tendencies constitute two sides of the same coin, and can both be generally viewed as wishful thinking, a widely known coping phenomenon [6], [5], [15], [2], [11].

When one's gathered evidence leads to conflicting beliefs, wishful thinking can affect the outcome of resolving such a conflict, thus determining one's resulting beliefs. Sometimes, an agent's wishful thinking can even be responsible for a conflict among beliefs, when there is no conflicting evidence.

In AI, belief revision is the process responsible for dealing with conflicting beliefs. Conventional belief revision theories aim to maintain consistency within a set of basic beliefs, called context, upon the arrival of new conflicting information. These theories always accept the new information, and inconsistencies are solved by abandoning beliefs from the context. A human user is typically responsible for defining the order(s) among beliefs that is used to determine which beliefs are kept and which are abandoned.

Given this state of affairs, if we want to approach belief revision in the context of a human-like autonomous AI agent, some important questions arise with respect to conventional belief revision theories:
❏ Why should the agent always prefer the new information over its previous beliefs?
❏ How can the agent autonomously generate its own order(s) among beliefs?
❏ Can human-like preferences, in belief revision, be adequately expressed using an order (or orders) among beliefs?

The first question arises because it is not acceptable that human-like agents always believe the new information. In other words, we need revision to be non-prioritized, as discussed by Hansson [8]. The second question arises because, if we want our agents to be autonomous, their belief revision processes cannot be dependent on the external definition of orders among beliefs.

To better understand the reason for the third question, we start by making a distinction between two types of belief strength that, given their similarity, are commonly modeled as one and the same:
❏ Certainty. The certainty of a belief corresponds to how strongly one feels that belief is true.
❏ Resistance to change. A belief's resistance to change corresponds to how strongly one feels against abandoning that belief. This concept can be extended to represent also how strongly one feels towards adding a new belief.

Isaac Levi [12] also distinguishes these two types of strength, referring to the latter (resistance to change) as unchangeability (see also [8]).

If I believe that “My mother boarded flight 17,” this belief's resistance to change can be highly dependent on whether or not I also believe that “Flight 17 crashed” (because keeping these two beliefs is highly undesirable). However, if the first belief is kept, then its certainty is the same regardless of whether or not the second belief is kept.

Typical approaches to belief revision choose which beliefs to hold and which to reject based on the reasons that originated each belief. In our view, accounting for the reasons that originated each belief is enough to model certainty, but is not enough to adequately guide belief revision, since it does not capture the notion of likeability, illustrated in the example of the previous paragraph.

Notice that resistance to change is, by definition, what guides belief revision in humans. However, as shown in the example, a belief's resistance to change depends (among other aspects) on that belief's likeability which, in turn, depends on what is the believed context. Since belief revision aims at deciding what is the believed context, we cannot properly use a fixed order (or orders) among beliefs based on their resistance to change without falling into a cyclical definition. For this reason, we believe that belief revision is best guided by an order among contexts, rather than an order among beliefs.

In this paper we present Wishful Thinking Revision (WTR) [20], an approach to belief revision, and overall belief dynamics, that addresses the three issues discussed in this section. More specifically, WTR is a framework aimed at supporting, in AI agents, belief revision with the following properties:
❏ Non-prioritized. New information is not necessarily believed.
❏ Autonomous. Revision is not dependent on the external definition of orders.
❏ Context-oriented. In order to properly model the influence of likeability in a belief's resistance to change, the preferred context is chosen according to an order (or orders) among contexts, instead of an order (or orders) among beliefs.
❏ Simulates wishful thinking. The wishful thinking phenomenon is modeled with respect to goal satisfaction.1

1 This means that anything that the agent would like to be true must be expressed in terms of goals, in order to influence its beliefs.

The main idea behind WTR is that every goal generates a tendency to believe in its achievement, and we model this tendency as a weak belief. This way, any information that contradicts the achievement of a goal (i.e., any undesirable information) gives rise to an inconsistency, thus triggering belief revision. Typically, the belief in the goal achievement is abandoned when not further supported by any evidence, but exceptions may occur, depending on various factors.

The fact that WTR generates weak beliefs from goals does not mean that our agent will believe that all of its goals are, by default, achieved. Such an approach could be acceptable only for preservation goals [16] that, by definition, start out achieved, and one hopes that their state does not change. In WTR, a weak belief is not treated as a common belief, as explained below.

Although not always viewed as such, “belief is a matter of degree” [9, p. 21]. You may believe that by using regular mail, the letter you are about to send will be safely delivered to its destination. However, if your envelope contains money instead of just a simple letter, you will most likely not believe in that safe delivery, and prefer to send it via registered or insured mail. In WTR we model this relative notion of belief by means of a function that determines the causal strength of a belief. This causal strength is an attempt to capture the degree of certainty of a belief. Depending on the situation, a belief may or may not be filtered out according to its value of causal strength.

[Pull quote] Put simply, belief is a relative notion, in the sense that a piece of data may be a belief for some aim, and may not be a belief for another aim.

This paper is organized as follows. In Section II we briefly describe the conventional approach to belief revision, and in Section III we review the wishful thinking phenomenon. Next, in Section IV, we introduce our representations and assumptions, and in Section V we formalize our approach to belief
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2013 M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® revision, the WTR model. In Section VI we discuss our view of the commonsense notion of belief, in light of the presented model. In Section VII we test our model using a sequence of scenarios, and in Section VIII, we compare our approach to related work. Finally, in Section IX we present the conclusions. II. Conventional Belief Revision An essential aspect of commonsense reasoning is the ability to revise one’s beliefs, that is, to change one’s beliefs when a new belief is acquired that is not consistent with the existing beliefs. In AI, Belief Revision theories decide which of the previous belief(s) should be abandoned in order to incorporate the new belief, and to keep the set of beliefs consistent. All belief revision theories try to keep the amount of change as small as possible, according to what is called the minimal change principle [9]. The reason for this principle is that beliefs are valuable, and we do not easily give them up. However, this principle is not enough to determine, in a unique way, the change to be made, and so belief revision theories assume the existence of an order among beliefs, which states that some beliefs are less valuable than others, and should be more easily abandoned. A number of belief revision theories have been developed since the seminal work of Alchourrón, Gärdenfors and Makinson [1]. An extensive overview of the past twentyfive years of research on such theories can be found in [4]. Typically, these theories assume that beliefs are represented by formulas of the language L of some logic, and represent the revision of a set of beliefs b, called context, with a formula U, by (b ) U). 
This represents the new set of beliefs, i.e., the new context, and must be such that: 1) It contains the new formula $\varphi$; 2) It is consistent, unless, of course, $\varphi$ is a contradiction. To ensure that the result is a unique context, these theories either assume the existence of a total order among beliefs, or allow for a partial order, and abandon all conflicting beliefs whose relative value is not known, thus abandoning more beliefs than necessary.

III. Wishful Thinking

One of the most commonly known influences of affect on human reasoning and belief is wishful thinking, that is, a bias that shifts one’s interpretations/beliefs towards “liked” scenarios and away from “disliked” ones. Generally speaking, what one likes (as opposed to what one dislikes) is defined as what satisfies or facilitates one’s current desires, goals, or commitments.

Wishful thinking is a widely known phenomenon, sometimes referred to by other names or as part of other, more general, concepts. In [6], the authors describe Motivational Force as the phenomenon where motivation (i.e., the desire for pleasure or for getting rid of discomfort) guides one’s thoughts and alters one’s beliefs’ resistance to change. Quoting Frijda and Mesquita, “The motivational source of the beliefs does much to explain their resistance against change. Abandoning a belief may undermine one’s readiness to act, and one may feel one cannot afford that.” [6, p. 66]. According to Frijda’s Law of Lightest Load [5], “Whenever a situation can be viewed in alternative ways, a tendency exists to view it in a way that minimizes negative emotional load.” Castelfranchi discusses how belief acceptance is influenced by likeability [3]. In [18], Paglieri models likeability as the degree of goal satisfaction that data represents. In his approach, likeability is one of the data properties that interfere in the process of belief selection.
Throughout this paper we simply use the term wishful thinking, encompassing the tendency for: a) Wishful thinking (in the strict sense), as the belief in something because it is liked; b) Denial, as the rejection of a belief because it is disliked.

Denial/wishful thinking, as “two sides of the same coin”, are recognized as strategies of emotion-focused coping (see, e.g., [15], [2] and [11]). As a strategy of emotion-focused coping, wishful thinking aims at increasing emotional satisfaction in the individual and, most importantly, preventing extreme and/or prolonged negative affective states (such as shock or depression) that could otherwise hinder the individual’s performance in general. In this sense, wishful thinking is a mechanism of emotional control/regulation and, like any such mechanism, it must operate within a balanced trade-off between satisfaction and realism. Favoring realism too much may lead to strong and/or long negative affective states, but favoring satisfaction too much leads to losing track of reality, and in both cases individual performance is impaired. It is therefore evident that wishful thinking, although often present, biasing belief strength and resistance to change, only in exceptional cases causes a change of what is believed.

For the purpose of our work, we distinguish two types of effects that wishful thinking may have on one’s beliefs:
❏ Passive effects: When one has conflicting evidence, supporting two or more alternative situations, one’s beliefs may fall on a particular situation because it is more desirable, instead of on the situation that is supported by stronger evidence.
❏ Active effects: One may start believing something that is not supported by any evidence, simply because one would like it to be true. Conversely, one may not believe in something that is supported by evidence, even in the absence of opposing evidence, simply because one would not like it to be true.
Active effects are rarer than passive effects, but may occur, for example, when highly important goals are at stake. Typically, one’s highest importance goals are preservation goals [16], such as goals related to keeping oneself (and others) alive and healthy. For instance, believing that someone close has died may evoke a feeling of terror, and one may engage in denial, even when there is only evidence indicating that it is true and no evidence indicating that it is false (active effects). If there is also evidence indicating that it is false (albeit weaker than the evidence indicating that it is true) denial is even more likely to occur (passive effects).

IV. Assumptions and Terminology

Our approach to belief revision, WTR, assumes an agent with reasoning capabilities based on a monotonic logic. Given an agent $ag$, we represent the language of the logic that $ag$ uses to represent information (including beliefs) by $\mathcal{L}_{ag}$, and the derivability relation by $\vdash_{ag}$.

We assume that all agents in the current world are uniquely identified by their name. We represent the set of names of the agents by $N$, where $N \cap \{Obs, WT, Der\} = \emptyset$.

If $ag$ is the agent using WTR, our model assumes that its internal state contains, among other items, the following information:
❏ The agent’s knowledge base, represented by $KB(ag)$
❏ The agent’s goals, represented by $Goals(ag)$
❏ For each other agent $ag_i$, the subjective credibility that our agent associates with $ag_i$, represented by $Cred(ag, ag_i)$
❏ The agent’s wishful thinking coefficient, represented by $wt(ag)$.

The knowledge base is where the agent keeps a record of reasons to believe in propositions.
This information is important because WTR abandons all beliefs that lose their justifications. Moreover, the knowledge base keeps information about the origins of basic beliefs, which is used by WTR to measure belief strength. The representation we use, for the reasons stored in the knowledge base, is based on the representation defined by Martins and Shapiro, in the SWM logic [14], to record dependencies among formulas.

Given an agent $ag$, $KB(ag)$ is a set of supports, defined as triplets with the form $\langle \varphi, \tau, \alpha \rangle$, where:
❏ $\varphi \in \mathcal{L}_{ag}$ is the support’s formula and represents the proposition that is being supported;
❏ $\tau \in \{Obs, WT, Der\} \cup N$ is the support’s origin tag and indicates how $\varphi$ got in the knowledge base, in other words, what kind of reason supports belief in the proposition represented by $\varphi$;
❏ $\alpha \subseteq \mathcal{L}_{ag}$ is the support’s origin set and contains the formulas that $\varphi$ depends on, in this support (important for keeping track of the formulas that support derivations).

If $A = \langle \varphi, \tau, \alpha \rangle$ is a support, we define $form(A) = \varphi$, $ot(A) = \tau$, and $os(A) = \alpha$. $A$ can be of four kinds, depending on its origin tag, $\tau$:
❏ If $\tau = Obs$, $A$ is called an observation support. This means that the proposition represented by $\varphi$ was observed by the agent.
❏ If $\tau = WT$, $A$ is called a wishful thinking support. This means that the proposition represented by $\varphi$ originated, by wishful thinking, from one of the agent’s goals.
❏ If $\tau = Der$, $A$ is called a derivation support. This means that the proposition represented by $\varphi$ was derived from other formulas.
❏ If $\tau \in N$, $A$ is called a communication support. This means that the proposition represented by $\varphi$ was communicated by the agent of name $\tau$.

We point out that the same formula may have more than one support in the knowledge base. For instance, the agent may be informed of a fact, represented by $\varphi$, by two different agents and also observe that fact. This would correspond to three separate supports with formula $\varphi$: two communication supports and one observation support.
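The support triple and its four kinds can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `Support`, its field names, the helper `kind`, and the representation of formulas as plain strings are all hypothetical and not part of WTR or the SWM logic.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Reserved origin tags; agent names are assumed never to collide with them,
# mirroring the requirement that N and {Obs, WT, Der} are disjoint.
RESERVED_TAGS = frozenset({"Obs", "WT", "Der"})


@dataclass(frozen=True)
class Support:
    formula: str                 # form(A): the supported formula (a string here)
    tag: str                     # ot(A): "Obs", "WT", "Der", or an agent name
    origin_set: FrozenSet[str]   # os(A): formulas this support depends on

    def kind(self) -> str:
        """Classify the support by its origin tag."""
        return {
            "Obs": "observation",
            "WT": "wishful thinking",
            "Der": "derivation",
        }.get(self.tag, "communication")


# The same formula may have several supports:
# here, two communication supports and one observation support.
kb = {
    Support("rain", "Peter", frozenset({"rain"})),
    Support("rain", "Susan", frozenset({"rain"})),
    Support("rain", "Obs", frozenset({"rain"})),
}
kinds = sorted(s.kind() for s in kb)
```

Making the class frozen keeps supports hashable, so a knowledge base can be an ordinary set of supports, as in the text.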
Furthermore, observation, communication and wishful thinking supports are all called non-derivation supports. Formulas that occur in derivation supports are known as derived formulas, and formulas that occur in non-derivation supports are known as hypotheses. Notice that a formula can be both a derived formula and a hypothesis, if there is at least one derivation support and one non-derivation support with that formula.

When $A$ is a derivation support, its origin set is the set of hypotheses underlying this specific derivation of $form(A)$. If $A$ is a non-derivation support, its origin set is $\{form(A)\}$. For instance, suppose that agent $ag$’s knowledge base contains only three supports, as shown in (1).

$$KB(ag) = \{\langle A, Obs, \{A\} \rangle, \langle A \to B, Peter, \{A \to B\} \rangle, \langle B \to C, Susan, \{B \to C\} \rangle\}. \tag{1}$$

In other words, there are three hypotheses: $A$, $A \to B$ and $B \to C$. The first was observed by the agent, the second was communicated by agent Peter, and the third was communicated by agent Susan. If the agent combines the first two hypotheses to derive $B$, this originates a derivation support with the origin set $\{A, A \to B\}$ (the hypotheses underlying the derivation). If, then, the agent combines the newly derived formula ($B$) with the third hypothesis ($B \to C$) to derive $C$, this originates another derivation support with the origin set $\{A, A \to B, B \to C\}$. After these two derivations take place, the agent’s knowledge base contains five supports, shown in (2).

$$KB(ag) = \{\langle A, Obs, \{A\} \rangle, \langle A \to B, Peter, \{A \to B\} \rangle, \langle B \to C, Susan, \{B \to C\} \rangle, \langle B, Der, \{A, A \to B\} \rangle, \langle C, Der, \{A, A \to B, B \to C\} \rangle\}. \tag{2}$$

Now we move to the second item in the agent’s internal state, namely the agent’s goals. We recall that WTR aims at modeling wishful thinking within the scope of goal satisfaction. Hence, whatever the agent wants to be true, that is meant to be captured by WTR, must be expressed in terms of goals.
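The bookkeeping that turns knowledge base (1) into knowledge base (2) can be reproduced with a short Python sketch. Supports are plain tuples `(formula, tag, origin_set)`, implications are encoded as `("->", antecedent, consequent)`, and `modus_ponens` is a hypothetical helper; all of these are assumptions made for illustration, not the derivation machinery of any particular logic.

```python
def imp(a, b):
    """Encode the implication a -> b as a tuple (an assumed encoding)."""
    return ("->", a, b)


def modus_ponens(kb, antecedent_support, implication_support):
    """Return kb extended with a derivation support for the consequent.

    The new origin set is the union of the origin sets of the two supports
    used, i.e. the hypotheses underlying this specific derivation.
    """
    a_form, _, a_os = antecedent_support
    (op, ante, cons), _, i_os = implication_support
    if op != "->" or ante != a_form:
        raise ValueError("supports do not match the modus ponens pattern")
    return kb | {(cons, "Der", a_os | i_os)}


# Knowledge base (1): three hypotheses, each with origin set {itself}.
s_a = ("A", "Obs", frozenset({"A"}))
s_ab = (imp("A", "B"), "Peter", frozenset({imp("A", "B")}))
s_bc = (imp("B", "C"), "Susan", frozenset({imp("B", "C")}))
kb = frozenset({s_a, s_ab, s_bc})

# Derive B from A and A -> B, then C from B and B -> C, which yields (2).
kb = modus_ponens(kb, s_a, s_ab)
s_b = ("B", "Der", frozenset({"A", imp("A", "B")}))
kb = modus_ponens(kb, s_b, s_bc)
s_c = ("C", "Der", frozenset({"A", imp("A", "B"), imp("B", "C")}))
```

Because origin sets are propagated by union, the support for `C` records all three hypotheses even though `C` was derived from the intermediate formula `B`.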
We represent by $Goals(ag)$ the set of goals of agent $ag$ and, for every $g \in Goals(ag)$, we write:
❏ $GDesc(g) \in \mathcal{L}_{ag}$ to represent the goal’s description, that is, the formula representing the proposition that the agent wants to be true
❏ $GImp(g) \in \,]0, 1[$ to represent the goal’s importance, that is, the weight that the agent associates with the goal.

WTR assumes that the agent associates a value of subjective credibility with each of the other agents in the current world. The value of subjective credibility that an agent $ag_1$ associates with another agent $ag_2$ reflects the degree to which $ag_1$ believes in what $ag_2$ communicates, and is represented by $Cred(ag_1, ag_2) \in \,]0, 1[$. This may start as a default value, and evolve based on the interactions between $ag_1$ and $ag_2$.

The wishful thinking coefficient of an agent, $ag$, reflects the degree to which $ag$ is susceptible to wishful thinking, and is represented by $wt(ag) \in [0, 1[$. This value is assumed to be related to the agent’s personality traits (e.g., increasing with extraversion and decreasing with conscientiousness).

While the creation of observation and communication supports is fairly intuitive, some rules need to be made clear about the creation of derivation and wishful thinking supports.

Wishful thinking supports originate from the agent’s goals. If $ag$ is the agent considered by WTR, there is exactly one wishful thinking support in $KB(ag)$ for each of $ag$’s goals, according to (3).

$$\forall g \in Goals(ag): \langle GDesc(g), WT, \{GDesc(g)\} \rangle \in KB(ag). \tag{3}$$

The only exception to this rule is when the agent is deprived of wishful thinking (i.e., $wt(ag) = 0$), in which case no wishful thinking support exists in $KB(ag)$.

The management of derivation supports is out of the scope of WTR, but certain rules must be followed. Given agent $ag$, for any derivation support, $\langle \varphi, Der, \alpha \rangle \in KB(ag)$, the following conditions must hold:
1) $\alpha \vdash_{ag} \varphi$ (obviously, the formula must be derivable from the origin set)
2) $\neg\exists \alpha' \subset \alpha: \alpha' \vdash_{ag} \varphi$ (origin sets must be minimal)
3) $\varphi \notin \alpha$ (self derivations are redundant and should not be registered).

An agent that always generates every possible derivation is known as a logically omniscient agent. WTR accounts for both logically omniscient and non-omniscient agents.

As expected, a context, in WTR, is defined as a set of hypotheses. Depending on the believed context, we can determine the valid supports for a formula according to Definition 1.

Definition 1: When $\psi$ is the context believed by agent $ag$, the set of valid supports for a formula, $\varphi$, is given by $Sups(\varphi, \psi, ag)$, defined as:

$$Sups(\varphi, \psi, ag) = \{A \in KB(ag): form(A) = \varphi \wedge os(A) \subseteq \psi\}.$$

In other words, the valid supports for a formula are all the supports for that formula where the origin set is entirely believed (i.e., where the origin set is a subset of the believed context).

An agent’s beliefs, at a given moment, are all the hypotheses in the context believed by that agent, and all the derived formulas that can be derived from that context. Put simply, an agent’s beliefs are all the formulas that have at least one valid support.

Definition 2: When $\psi$ is the context believed by agent $ag$, $ag$’s belief space² (i.e., the set with $ag$’s beliefs) is given by $BS(\psi, ag)$, defined as:

$$BS(\psi, ag) = \{\varphi: Sups(\varphi, \psi, ag) \neq \emptyset\}.$$

² The concept of belief space is also adapted from the SWM logic [14].

Notice that, for logically omniscient agents, the belief space corresponds to all the logical consequences of the believed context, because the knowledge base has derivation supports for all possible derivations. In other words, the expression for determining belief space becomes equivalent to $\{\varphi: \psi \vdash_{ag} \varphi\}$.

In WTR, we say that a context is consistent, as far as the agent knows, if and only if, for every formula in the corresponding belief space, its negation is not in that belief space. This is expressed in the definition of predicate $Cons$ (Definition 3).³

Definition 3: When $\psi$ is the context believed by agent $ag$, we say that $\psi$ is consistent, as far as $ag$ knows, if and only if $Cons(\psi, ag)$ holds, where predicate $Cons$ satisfies the following condition:

$$Cons(\psi, ag) \Leftrightarrow \forall \varphi \in BS(\psi, ag): \neg\varphi \notin BS(\psi, ag).$$

³ Throughout the remaining sections of this paper, wherever we write about consistency, without further specifying the type of consistency, we refer to the definition of $Cons$ (Definition 3).

Notice that, for logically omniscient agents, this corresponds to saying that $\psi$ is consistent, as far as $ag$ knows, iff $\psi \nvdash_{ag} \bot$.

As explained above, if we consider an agent that is not logically omniscient, some important properties follow:
1) A logical consequence of the believed context is not necessarily a belief
2) A context that is consistent, as far as the agent knows, may be logically inconsistent.

This happens because an agent of this kind does not necessarily derive everything that is possible, and is naturally ignorant concerning what was not yet concluded. Clearly, this is the case of humans.

V. Wishful Thinking Revision

In this section we describe how belief revision occurs in WTR. In other words, we describe the process that determines the agent’s beliefs at a given moment.

If $ag$ is our agent, we divide the hypotheses in the agent’s knowledge base (i.e., in $KB(ag)$) into two (possibly intersecting) sets:

$\beta_0 = \{\varphi: \exists (A \in KB(ag))\; form(A) = \varphi \wedge ot(A) \in N \cup \{Obs\}\}$ is the set of collected data. It contains all the hypotheses that originate from the world (via observations and communications).

$\gamma_0 = \{\varphi: \exists (A \in KB(ag))\; form(A) = \varphi \wedge ot(A) = WT\}$ is the set of wishful thoughts. It contains all the hypotheses that originate from goals (via wishful thinking).

WTR is responsible for determining, at a given moment, what consistent subset of $\beta_0 \cup \gamma_0$ is believed. Such a subset is obviously a context, since the elements of $\beta_0 \cup \gamma_0$ are hypotheses. We represent that context, i.e., the believed context, by $\beta^{\gamma}$. We distinguish two (possibly intersecting) subsets of $\beta^{\gamma}$:

$\beta = \beta^{\gamma} \cap \beta_0$ is the set of base beliefs, that is, of collected data that is believed.

$\gamma = \beta^{\gamma} \cap \gamma_0$ is the set of wishful beliefs, that is, of wishful thoughts that are believed.

Figure 1 depicts the main sets involved in the revision process, and the existing support dependencies. The agent’s belief space, at a given moment, corresponds to the union of the believed context ($\beta^{\gamma}$) with the set labeled Derived Beliefs. Derived Beliefs is the set of all derived formulas that are not hypotheses and have at least one valid support.

FIGURE 1 WTR: Main sets and support dependencies. [Diagram: the World yields Collected Data ($\beta_0$) through Obs/Comm supports, and Goals yield Wishful Thoughts ($\gamma_0$) through WT supports; Base Beliefs ($\beta$) and Wishful Beliefs ($\gamma$) make up the Context ($\beta^{\gamma}$), from which Derived Beliefs follow through (valid) Der supports.]

A belief revision theory that does not account for wishful thinking simply aims at finding a consistent context in the collected data, $\beta_0$. Since WTR aims at finding a consistent context in $\beta_0 \cup \gamma_0$, it addresses, not only conflicts among collected data, but also conflicts between collected data and wishful thoughts. As such, WTR treats, in a uniform fashion, “disliked situations” and “conflicting collected data.” In fact, in this paradigm, a contradiction can have, on each side of the conflict, combined forces from collected data and/or from wishful thinking. Wishful thinking forces are of a weaker nature and, usually, not enough to singlehandedly overthrow collected data, but can easily be the element that “turns the tide” in a conflict among collected data.

As discussed in Section I, WTR is context-oriented, in other words, it is guided by an ordering of contexts (instead of an ordering of beliefs). More specifically, WTR determines the believed context, $\beta^{\gamma}$, by comparing the values of preference of several candidate contexts. This process consists of three steps:
1) Determining the candidate contexts (described in Section V-A)
2) Determining the preference of each candidate context (described in Section V-B)
3) Choosing a context (described in Section V-C).

A. Candidate Contexts

In order to determine the believed context, the first step is to determine the candidate contexts. By candidate contexts, we mean all the contexts that may potentially be chosen as the believed context, depending only on a measurement of context preference. A context is a candidate context, if and only if it satisfies the three following conditions:
❏ It is a subset of $\beta_0 \cup \gamma_0$
❏ It is consistent, as far as the agent knows
❏ It is a maximal set, in other words, it is not a proper subset of another candidate context.

The first condition is necessary because, as explained in Section V, the believed context is a subset of $\beta_0 \cup \gamma_0$. Notice that $\beta_0 \cup \gamma_0$ contains the only basic formulas for which there are reasons to believe, that is, the only hypotheses. The second condition is common in all belief revision theories, following from the fact that an agent with inconsistent beliefs is ineffective. Finally, the third condition is also common in all belief revision theories, following the same criterion behind the minimal change principle [9] (see Section II): We do not reject a belief for which there are reasons to believe and no reasons against it. Requiring that candidate contexts be maximal ensures that, if $\psi_1$ is a candidate context and $\psi_2 \subset \psi_1$, then $\psi_2$ is not a candidate context, because choosing $\psi_2$ corresponds to rejecting belief in $\psi_1 \setminus \psi_2$, for no reason.

Following these three conditions, the set of candidate contexts is determined according to Definition 4.

Definition 4: If $\beta_0$ and $\gamma_0$ are, respectively, the collected data and wishful thoughts of an agent, $ag$, the set of candidate contexts is given by $Cand(\beta_0, \gamma_0, ag)$, defined as:

$$Cand(\beta_0, \gamma_0, ag) = \{\lambda \subseteq \beta_0 \cup \gamma_0: Cons(\lambda, ag) \wedge (\neg\exists \mu \subseteq \beta_0 \cup \gamma_0: Cons(\mu, ag) \wedge (\lambda \subset \mu))\}.$$

B. Context Preference

As explained, WTR selects the believed context among the candidate contexts, depending on their value of preference. In this section we explain how the preference of a context is determined. Note that, in the process of determining the preference of a context, we assume that context is believed, and represent it by $\beta^{\gamma}$ (the believed context).

In order to determine the preference of a context, a more basic measure is necessary: belief certainty. As discussed in Section I, the certainty of a belief is based on the reasons that originated that belief, i.e., its causes. For this reason, we refer to belief certainty as causal strength. In WTR, the causal strength of a given belief is determined according to that belief’s valid supports.

If $\varphi$ is believed by agent $ag$, when $ag$’s believed context is $\beta^{\gamma}$ (i.e., $\varphi \in BS(\beta^{\gamma}, ag)$), the causal strength (or, simply, strength) of $\varphi$ is given by $CauStr(\varphi, \beta^{\gamma}, ag) \in \,]0, 1]$. A belief with a causal strength of 1 represents a belief without doubt, and we call it an absolute certainty (or, simply, certainty). It will become clear that, in WTR, certainties can only be overthrown by other conflicting certainties.
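Looking back at Definitions 1 to 4, the construction of candidate contexts can be illustrated with a brute-force Python sketch. The encoding is an assumption made only for this example: formulas are strings, negation is a leading `~`, supports are `(formula, tag, origin_set)` tuples, and the exhaustive enumeration of subsets (exponential in the number of hypotheses) is a stand-in, not the authors' procedure. The wished-for formula `~crashed` is likewise a hypothetical goal description.

```python
from itertools import combinations


def neg(f):
    """Negation under the assumed string encoding: '~p' negates 'p'."""
    return f[1:] if f.startswith("~") else "~" + f


def sups(f, ctx, kb):
    """Definition 1: supports for f whose origin set is entirely believed."""
    return {s for s in kb if s[0] == f and s[2] <= ctx}


def belief_space(ctx, kb):
    """Definition 2: formulas that have at least one valid support."""
    return {s[0] for s in kb if sups(s[0], ctx, kb)}


def cons(ctx, kb):
    """Definition 3: no believed formula has its negation believed."""
    bs = belief_space(ctx, kb)
    return all(neg(f) not in bs for f in bs)


def cand(b0, c0, kb):
    """Definition 4: maximal consistent subsets of b0 | c0."""
    hyps = sorted(b0 | c0)
    consistent = [
        frozenset(c)
        for r in range(len(hyps) + 1)
        for c in combinations(hyps, r)
        if cons(frozenset(c), kb)
    ]
    return {c for c in consistent if not any(c < d for d in consistent)}


# Collected data says the flight crashed; a goal wishes that it had not.
kb = {
    ("crashed", "Obs", frozenset({"crashed"})),
    ("boarded", "Peter", frozenset({"boarded"})),
    ("~crashed", "WT", frozenset({"~crashed"})),
}
b0 = {s[0] for s in kb if s[1] not in ("WT", "Der")}   # collected data
c0 = {s[0] for s in kb if s[1] == "WT"}                 # wishful thoughts
candidates = cand(b0, c0, kb)
```

With this knowledge base there are two candidate contexts, one keeping the observation and one keeping the wishful thought; which of them is believed depends on the context preference discussed in Section V-B.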
Since causal strength of a belief is based on that belief ’s valid supports, we start by determining the causal strength conveyed by each of those supports, and then combine the IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2013 M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® M q M q Previous Page | Contents | Zoom in | Zoom out | Front Cover | Search Issue | Next Page M q M q MQmags q THE WORLD’S NEWSSTAND® results. In this line of thought, if A is a valid In our approach, every goal produces a wishful support for its formula, when b c is the context believed by agent ag, we say that the thought about its achievement, in the form of a weak (causal) strength conveyed by A is given by belief. Consequently, a “disliked situation” leads to SuppStr(A, b c, ag) d @ 0, 1@ . c inconsistent beliefs, thus triggering belief revision. So, if agent ag’s believed context is b and U d BS (b c, ag), the value of CauStr beliefs and the third focuses on the context’s wishful beliefs and (U, b c, ag) results from combining the values of SuppStr their opposite counterparts (i.e., disliked beliefs): (A, b c, ag), for each A in Sups(U, b c, ag). 1) The most important factor is the number of absolute cerWTR does not impose any particular definition for functainties in the base beliefs. Given the meaning of absolute tion CauStr, however, certain conditions are postulated. For any certainty, it is only natural that an agent always believes in formula, U , believed by agent ag, when ag’s believed context is c all of its absolute certainties. The only situation where we b , the definition of CauStr should be such that the following find conceivable not to believe in an absolute certainty is conditions hold: if there are other conflicting absolute certainties (an atyp1) If there is a support, in Sups(U, b c, ag) , that conveys a ical situation). 
strength of 1, then $CauStr(\varphi, \beta\gamma, ag) = 1$;
2) Otherwise, having one more support in $Sups(\varphi, \beta\gamma, ag)$, or having a higher strength of a support in $Sups(\varphi, \beta\gamma, ag)$, increases $CauStr(\varphi, \beta\gamma, ag)$.

These postulates impose merely intuitive properties, respectively: 1) If I have a reason to be absolutely certain that $\varphi$ is true, other reasons supporting belief in $\varphi$ will not invalidate my certainty; 2) If I believe in $\varphi$ (but not with absolute certainty), having more reasons or stronger reasons to believe in $\varphi$ increases the level of certainty of my belief in $\varphi$.

Once again, WTR does not impose any particular definition for function SuppStr; however, certain conditions are postulated. For any formula, $\varphi$, believed by agent ag, when ag's believed context is $\beta\gamma$, any other agent $ag'$, and any set of hypotheses $\alpha \subseteq \beta\gamma$, the definition of SuppStr should be such that the following conditions hold:
1) $SuppStr(\langle \varphi, ag', \{\varphi\} \rangle, \beta\gamma, ag)$ increases with $Cred(ag, ag')$
2) $SuppStr(\langle \varphi, WT, \{\varphi\} \rangle, \beta\gamma, ag)$ increases with the importance of the goal with description $\varphi$, and with $wt(ag)$
3) $SuppStr(\langle \varphi, Der, \emptyset \rangle, \beta\gamma, ag) = 1$
4) $SuppStr(\langle \varphi, Der, \alpha \rangle, \beta\gamma, ag)$ remains the same if $\alpha$ has one more certainty, decreases if $\alpha$ has one more non-certainty, and increases if the strength (considering context $\beta\gamma \setminus \{\varphi\}$, to disregard derivation cycles) of one of the beliefs in $\alpha$ increases.

These postulates impose merely intuitive properties, respectively: 1) The more credible I find someone, the stronger I believe that person's communications (this is, in fact, how we defined Cred, in Section IV); 2) If having $\varphi$ as a goal is a reason for me to believe in $\varphi$, that reason's strength increases with the goal's importance and with how susceptible I am to wishful thinking (this is in accordance with the meaning of goal importance and with the definition of wt, discussed in Section IV); 3) A belief that corresponds to a tautology is obviously a certainty; 4) If I derive $\varphi$, based on a set of believed hypotheses, the strength of this derivation decreases with the overall uncertainty in that set of hypotheses.

Finally, in WTR, the preference of a context accounts for three factors, where the first two focus on the context's base

To achieve this behavior, in WTR, we ensure that a context with more absolute certainties (than another), among the base beliefs, always has a greater value of preference.
2) Apart from certainties, the number and causal strength of the other (uncertain) base beliefs also influences the preference of the context. Obviously, the agent prefers beliefs that have a greater degree of certainty over beliefs that have a lower degree, and prefers to keep a larger number of beliefs over keeping a smaller number (according to the minimal change principle).
3) The third factor is likeability, meant to capture the influence of wishful thinking. More specifically, a context's likeability is an assessment of the corresponding belief space, in terms of: a) the number and strength of beliefs in goal achievements (wishful beliefs), in combination with the importance of the corresponding goals, and b) the number and strength of beliefs in negations of goal achievements, in combination with the importance of the corresponding goals.

The context preference (or, simply, preference) that an agent, ag, attributes to a context, $\beta\gamma$, is given by $CtxPrf(\beta\gamma, ag) \in \mathbb{R}^{+}$. Since we want the number of certainties, among base beliefs, (i.e., factor 1. discussed above) to have more weight than all other factors, we define the preference of a context as that number (of certainties), added to a value in $]0, 1[$ that accounts for the remaining factors (i.e., factors 2. and 3. discussed above). This added value is given by $LessSigPrf(\beta\gamma, ag) \in\ ]0, 1[$ (Less Significative Preference). This formulation ensures that having one more certainty always grants more context preference than any other combination of factors.

Definition 5: Given agent ag and context $\beta\gamma$ (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), function CtxPrf is defined as follows:
$$CtxPrf(\beta\gamma, ag) = \#\{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) = 1\} + LessSigPrf(\beta\gamma, ag).$$

We define $LessSigPrf(\beta\gamma, ag)$ as a mapping, to $]0, 1[$, of $LSP(\beta\gamma, ag) \in \mathbb{R}$. Zero preference is mapped to 0.5, positive preference ($\mathbb{R}^{+}$) is mapped to the interval $]0.5, 1[$, and negative preference ($\mathbb{R}^{-}$) is mapped to the interval $]0, 0.5[$. We chose a simple function (Definition 6) that performs this mapping, obviously a continuous increasing function (to preserve the order of preference).

Definition 6: Given agent ag and context $\beta\gamma$, function LessSigPrf is defined as follows:
$$LessSigPrf(\beta\gamma, ag) = \frac{LSP(\beta\gamma, ag)}{1 + 2 \times |LSP(\beta\gamma, ag)|} + 0.5.$$

Hence, $LSP(\beta\gamma, ag)$ corresponds to a value of preference, in $\mathbb{R}$, that accounts for factors 2. and 3. already discussed, namely:
❏ The uncertain base beliefs. This measure of preference is represented by $UncertPrf(\beta\gamma, ag) \in \mathbb{R}_{0}^{+}$.
❏ Likeability. This measure of preference is represented by $LkbPrf(\beta\gamma, ag) \in \mathbb{R}$.

To this end, $LSP(\beta\gamma, ag)$ combines the values of $UncertPrf(\beta\gamma, ag)$ and $LkbPrf(\beta\gamma, ag)$, as an average that is weighted according to the agent's wishful thinking coefficient.

Definition 7: Given agent ag and context $\beta\gamma$, function LSP is defined as follows:
$$LSP(\beta\gamma, ag) = (1 - wt(ag)) \times UncertPrf(\beta\gamma, ag) + wt(ag) \times LkbPrf(\beta\gamma, ag).$$

Function UncertPrf is based on the uncertain base beliefs, determined by function Uncert.

Definition 8: When $\beta\gamma$ is the context believed by agent ag (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of uncertain base beliefs in $\beta\gamma$ is given by $Uncert(\beta\gamma, ag)$, defined as follows:
$$Uncert(\beta\gamma, ag) = \{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) \neq 1\}.$$

WTR does not impose any particular definition for function UncertPrf; however, certain conditions are postulated. Given a context, $\beta\gamma$, believed by an agent, ag, the definition of UncertPrf should be such that the following conditions hold:
1) When $Uncert(\beta\gamma, ag) = \emptyset$, then $UncertPrf(\beta\gamma, ag) = 0$
2) The value of $UncertPrf(\beta\gamma, ag)$ increases when there is one more belief in $Uncert(\beta\gamma, ag)$ or when the strength of one of the beliefs in $Uncert(\beta\gamma, ag)$ increases (remaining an uncertain belief).

These postulates impose merely intuitive properties, respectively: 1) When there are no uncertain base beliefs, the preference conveyed by this component is the lowest (zero); 2) As mentioned, the agent prefers beliefs that have a greater degree of certainty over beliefs that have a lower degree, and prefers to keep a larger number of beliefs over keeping a smaller number (according to the minimal change principle).
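As an illustration of how Definitions 5 to 7 compose, the following minimal Python sketch takes the number of certainties and the values of UncertPrf and LkbPrf as plain numbers. The function and parameter names are illustrative assumptions and are not part of WTR; only the arithmetic follows the definitions.

```python
def less_sig_prf(lsp: float) -> float:
    """Definition 6: map LSP (any real) into the open interval ]0, 1[."""
    return lsp / (1 + 2 * abs(lsp)) + 0.5


def ctx_prf(n_certainties: int, uncert_prf: float, lkb_prf: float, wt: float) -> float:
    """Definitions 5 and 7, given precomputed component values (assumed inputs)."""
    # Definition 7: average weighted by the wishful thinking coefficient.
    lsp = (1 - wt) * uncert_prf + wt * lkb_prf
    # Definition 5: number of certainties plus the less significative part.
    return n_certainties + less_sig_prf(lsp)


# One extra certainty outweighs any combination of the other factors.
low = ctx_prf(1, uncert_prf=0.0, lkb_prf=-100.0, wt=0.3)
high = ctx_prf(0, uncert_prf=100.0, lkb_prf=100.0, wt=0.3)
```

Because the second term always stays strictly between 0 and 1, `low` is greater than `high` here, which is the ordering property the formulation is meant to guarantee.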
Function LkbPrf is based on the goals believed to be achieved and on the goals believed not to be achieved, determined by functions Achv and NotAchv, respectively.

Definition 9: When $\beta\gamma$ is the context believed by agent ag (where $\gamma$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of goals that ag believes to be achieved is given by $Achv(\beta\gamma, ag)$, and the set of goals ag believes not to be achieved is given by $NotAchv(\beta\gamma, ag)$, defined as follows:
$$Achv(\beta\gamma, ag) = \{g \in Goals(ag) : GDesc(g) \in \gamma\};$$
$$NotAchv(\beta\gamma, ag) = \{g \in Goals(ag) : \neg GDesc(g) \in BS(\beta\gamma, ag)\}.$$

WTR does not impose any particular definition for function LkbPrf; however, certain conditions are postulated. Given a context, $\beta\gamma$, believed by an agent, ag, the definition of LkbPrf should be such that the following conditions hold:
1) When $Achv(\beta\gamma, ag) = NotAchv(\beta\gamma, ag) = \emptyset$, then $LkbPrf(\beta\gamma, ag) = 0$
2) $LkbPrf(\beta\gamma, ag)$ increases when there is one more goal in $Achv(\beta\gamma, ag)$, or when a goal in $Achv(\beta\gamma, ag)$ has its importance increased or is believed with greater strength
3) $LkbPrf(\beta\gamma, ag)$ decreases when there is one more goal in $NotAchv(\beta\gamma, ag)$, or when a goal in $NotAchv(\beta\gamma, ag)$ has its importance increased or its negation is believed with greater strength
4) The more important the goal, the greater the impact (on likeability) of changing the strength of belief in its achievement or its negation.

These postulates impose merely intuitive properties, respectively: 1) Since likeability, in WTR, is defined in terms of goal satisfaction, having no beliefs regarding the achievement of goals presents neither positive nor negative likeability; 2) Believing that more goals are achieved, believing that more important goals are achieved, or believing more strongly that goals are achieved, are all factors that increase likeability; 3) Conversely, believing that more goals are not achieved, believing that more important goals are not achieved, or believing more strongly that goals are not achieved, are all factors that decrease likeability; 4) Changing the strength of a belief about a goal achievement has a greater impact on likeability when that goal is more important.

C. The Believed Context
In Section V-A we explain how WTR determines the set of candidate contexts, and in Section V-B we explain how WTR can associate a value of preference with each of those contexts. The context that is believed by the agent is now chosen as the candidate context with the highest value of preference. In other words, if $\beta_0$ and $\gamma_0$ are, respectively, agent ag's collected data and wishful thoughts, ag's believed context is $\beta\gamma$, as determined by (4).
$$\beta\gamma = \arg\max_{\psi \in Cand(\beta_0, \gamma_0, ag)} CtxPrf(\psi, ag). \quad (4)$$

VI. Degrees of Belief
As explained in the previous sections, WTR considers a broad notion of belief. Tendencies/inclinations to believe are represented as weak beliefs, that is, beliefs with a very low degree of certainty (causal strength). Typical examples of weak beliefs, in WTR, are those that originate from the existence of goals (active effects of wishful thinking). Clearly this does not correspond to the commonsense notion of belief. If I have the goal of having a boat and somehow do not have any information confirming or disconfirming that I have one, I do not automatically believe that I have one just because I would like to. In this section we explain how a belief in WTR may or may not represent a belief for commonsense.

Labeling a piece of data as a "belief" corresponds to saying that, for a certain purpose/aim, one will rely on the fact that that piece of data is true. As Paglieri puts it, "beliefs are data accepted as reliable (…) considered as 'safe ground' for reasoning and action" [17]. In other words, for commonsense, a belief is only a belief if one holds it with some minimum degree of certainty that makes it reliable enough for reasoning and action. This degree of certainty varies, depending on the particular reasoning/action (henceforth referred to as aim) to which the belief is relevant. Put simply, belief is a relative notion, in the sense that a piece of data may be a belief for some aim, and may not be a belief for another aim.

We recall that, in WTR, belief certainty is modeled with function CauStr (defined in Section V-B). Hence, given the appropriate thresholds that depend on the particular aims, this relative notion of belief is modeled using a straightforward approach. If $\beta\gamma$ is the context believed by agent ag, and $t_{aim} \in [0, 1]$ is the threshold defined by some aim (aim) that is dependent on a belief, $\varphi \in BS(\beta\gamma, ag)$, then $\varphi$ is also a belief for the purposes of aim if and only if (5) holds.
$$CauStr(\varphi, \beta\gamma, ag) \geq t_{aim}. \quad (5)$$

The determination of the appropriate threshold for a given aim is out of the scope of WTR.
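Condition (5) amounts to a single comparison, which the short Python sketch below makes explicit. The function name is an illustrative assumption, and the strengths and thresholds in the usage lines are example values of the kind that appear in the scenarios of Section VII-B, not outputs of any WTR implementation.

```python
def is_belief_for_aim(causal_strength: float, threshold: float) -> bool:
    """Condition (5): phi counts as a belief for an aim iff CauStr >= t_aim."""
    return causal_strength >= threshold


# The same piece of data can be a belief for one aim and not for another.
weak_for_low_threshold = is_belief_for_aim(0.086, 0.2)
strong_for_low_threshold = is_belief_for_aim(0.887, 0.2)
strong_for_high_threshold = is_belief_for_aim(0.887, 0.9)
```

With these example numbers, only the second call returns `True`: a strength of 0.887 passes a threshold of 0.2 but not a threshold of 0.9.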
Intuitively, these thresholds should be associated with an assessment of: a) The potential losses when the aim is followed, based on a false belief; b) The potential gains when the aim is followed, based on a true belief; c) The difficulty in acquiring more information that supports or disconfirms the belief.

VII. Testing WTR
In the previous sections we have presented the WTR model. In this section we test this model, by observing the belief states of an agent that uses WTR. In Section VII-A, we present a concrete instantiation of the WTR model. In Section VII-B, we test this instantiation in a sequence of scenarios that are representative of the most relevant types of possible situations.

A. An Instantiation of WTR
As we have seen, the definition of some of the functions used by WTR is not imposed by the model; therefore WTR may have different instantiations. More concretely, functions CauStr, SuppStr, UncertPrf and LkbPrf are left undefined, though their definitions must follow the postulates presented in Section V-B. In this section we make an instantiation of WTR by presenting a definition for each of these four functions.

In accordance with the postulates presented in Section V-B, we present a definition for function CauStr (Definition 10).

Definition 10: For any agent ag, any context $\beta\gamma$, and any belief $\varphi \in BS(\beta\gamma, ag)$:
$$CauStr(\varphi, \beta\gamma, ag) = 1 - \prod_{A \in Sups(\varphi, \beta\gamma, ag)} (1 - SuppStr(A, \beta\gamma, ag)).$$

Notice that the expression used in this function is equivalent to the expression that determines the probability of the disjunction of independent events.

In accordance with the postulates presented in Section V-B, we present a definition for function SuppStr (Definition 11).
Definition 11: For any agent ag, any context $\beta\gamma$, any belief $\varphi \in BS(\beta\gamma, ag)$, and any support $\langle \varphi, T, \alpha \rangle \in Sups(\varphi, \beta\gamma, ag)$:
$$SuppStr(\langle \varphi, T, \alpha \rangle, \beta\gamma, ag) = \begin{cases} 1, & \text{if } T = Obs; \\ GImp(g)^{(1 - wt(ag))/wt(ag)}, & \text{if } T = WT, \text{ where } \varphi = GDesc(g); \\ \prod_{\varphi_i \in \alpha} CauStr(\varphi_i, \beta\gamma \setminus \{\varphi\}, ag), & \text{if } T = Der; \\ Cred(ag, T), & \text{if } T \in N. \end{cases}$$

According to this definition, an observation support conveys a strength of 1, making every observed belief an absolute certainty. With respect to communication supports, the conveyed strength equals the credibility attributed to the communicating agent (the agent of name T). The expression used to determine the strength conveyed by a wishful thinking support increases with the importance of the corresponding goal and with the agent's wishful thinking coefficient, in a way that: a) The limit of the conveyed strength, as the coefficient approaches 1, is 1; b) The limit of the conveyed strength, as the coefficient approaches 0, is 0; c) When the coefficient is 0.5 the conveyed strength equals the goal importance. We recall that when the coefficient is 0 there are no wishful thinking supports, as explained in Section IV. Finally, the expression used to determine the strength conveyed by a derivation support consists of a multiplication of the causal strengths of the hypotheses in its origin set ($\alpha$). This ensures that the result decreases with the number of uncertainties in $\alpha$ and with their degree of uncertainty (i.e., the distance between their causal strength and 1). Notice that to determine the causal strength of the beliefs in $\alpha$ we consider that the believed context does not include $\varphi$, to disregard any existing cyclical derivations.

In accordance with the postulates presented in Section V-B, we present a definition for function UncertPrf (Definition 12).

Definition 12: For any agent ag, and any context $\beta\gamma$:
$$UncertPrf(\beta\gamma, ag) = \sum_{\varphi \in Uncert(\beta\gamma, ag)} CauStr(\varphi, \beta\gamma, ag).$$

Notice that the expression used in this function simply sums the causal strength of the beliefs in $Uncert(\beta\gamma, ag)$ (i.e., of the uncertain base beliefs of $\beta\gamma$). The result of this sum consists of the preference represented by uncertain base beliefs.

In accordance with the postulates presented in Section V-B, we present a definition for function LkbPrf (Definition 13).

Definition 13: For any agent ag, and any context $\beta\gamma$:
$$LkbPrf(\beta\gamma, ag) = \Bigg( \sum_{g \in Achv(\beta\gamma, ag)} CauStr(GDesc(g), \beta\gamma, ag) \times GImp(g) \Bigg) - \sum_{g \in NotAchv(\beta\gamma, ag)} CauStr(\neg GDesc(g), \beta\gamma, ag) \times GImp(g).$$

Notice that the expression used in this function simply adds a term for every belief in the achievement of a goal, and subtracts a term for every belief in the negation of a goal achievement. Each of these terms consists of the strength of the corresponding belief multiplied by the importance of the corresponding goal.

B. Example Scenarios
In this section we show the behavior of an agent that uses WTR, in a sequence of scenarios that capture the most relevant types of situations, with respect to WTR. The concrete instantiation of WTR used in this section is the one presented in the previous section (Section VII-A). In Table 1 we review all the definitions presented in this article.

TABLE 1 Table of definitions, indexed by reference number.
REF. DEFINITION
1. When $\psi$ is the context believed by agent ag, the set of valid supports for a formula, $\varphi$, is given by $Sups(\varphi, \psi, ag)$, defined as: $Sups(\varphi, \psi, ag) = \{A \in KB(ag) : form(A) = \varphi \wedge os(A) \subseteq \psi\}$.
2. When $\psi$ is the context believed by agent ag, ag's belief space (i.e., the set with ag's beliefs) is given by $BS(\psi, ag)$, defined as: $BS(\psi, ag) = \{\varphi : Sups(\varphi, \psi, ag) \neq \emptyset\}$.
3. When $\psi$ is the context believed by agent ag, we say that $\psi$ is consistent, as far as ag knows, if and only if $Cons(\psi, ag)$ holds, where predicate Cons satisfies the following condition: $Cons(\psi, ag) \Leftrightarrow \forall \varphi \in BS(\psi, ag) : \neg\varphi \notin BS(\psi, ag)$.
4. If $\beta_0$ and $\gamma_0$ are, respectively, the collected data and wishful thoughts of an agent, ag, the set of candidate contexts is given by $Cand(\beta_0, \gamma_0, ag)$, defined as: $Cand(\beta_0, \gamma_0, ag) = \{\mu \subseteq \beta_0 \cup \gamma_0 : Cons(\mu, ag) \wedge (\neg\exists \nu \subseteq \beta_0 \cup \gamma_0 : Cons(\nu, ag) \wedge (\mu \subset \nu))\}$.
5. Given agent ag and context $\beta\gamma$ (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), function CtxPrf is defined as follows: $CtxPrf(\beta\gamma, ag) = \#\{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) = 1\} + LessSigPrf(\beta\gamma, ag)$.
6. Given agent ag and context $\beta\gamma$, function LessSigPrf is defined as follows: $LessSigPrf(\beta\gamma, ag) = \frac{LSP(\beta\gamma, ag)}{1 + 2 \times |LSP(\beta\gamma, ag)|} + 0.5$.
7. Given agent ag and context $\beta\gamma$, function LSP is defined as follows: $LSP(\beta\gamma, ag) = (1 - wt(ag)) \times UncertPrf(\beta\gamma, ag) + wt(ag) \times LkbPrf(\beta\gamma, ag)$.
8. When $\beta\gamma$ is the context believed by agent ag (where $\beta$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of uncertain base beliefs in $\beta\gamma$ is given by $Uncert(\beta\gamma, ag)$, defined as follows: $Uncert(\beta\gamma, ag) = \{\varphi \in \beta : CauStr(\varphi, \beta\gamma, ag) \neq 1\}$.
9. When $\beta\gamma$ is the context believed by agent ag (where $\gamma$ is obtained from $\beta\gamma$ and ag, as defined in the beginning of Section V), the set of goals that ag believes to be achieved is given by $Achv(\beta\gamma, ag)$, and the set of goals ag believes not to be achieved is given by $NotAchv(\beta\gamma, ag)$, defined as follows: $Achv(\beta\gamma, ag) = \{g \in Goals(ag) : GDesc(g) \in \gamma\}$; $NotAchv(\beta\gamma, ag) = \{g \in Goals(ag) : \neg GDesc(g) \in BS(\beta\gamma, ag)\}$.
10. For any agent ag, any context $\beta\gamma$, and any belief $\varphi \in BS(\beta\gamma, ag)$: $CauStr(\varphi, \beta\gamma, ag) = 1 - \prod_{A \in Sups(\varphi, \beta\gamma, ag)} (1 - SuppStr(A, \beta\gamma, ag))$.
11. For any agent ag, any context $\beta\gamma$, any belief $\varphi \in BS(\beta\gamma, ag)$, and any support $\langle \varphi, T, \alpha \rangle \in Sups(\varphi, \beta\gamma, ag)$: $SuppStr(\langle \varphi, T, \alpha \rangle, \beta\gamma, ag)$ equals $1$ if $T = Obs$; $GImp(g)^{(1 - wt(ag))/wt(ag)}$ if $T = WT$, where $\varphi = GDesc(g)$; $\prod_{\varphi_i \in \alpha} CauStr(\varphi_i, \beta\gamma \setminus \{\varphi\}, ag)$ if $T = Der$; $Cred(ag, T)$ if $T \in N$.
12. For any agent ag, and any context $\beta\gamma$: $UncertPrf(\beta\gamma, ag) = \sum_{\varphi \in Uncert(\beta\gamma, ag)} CauStr(\varphi, \beta\gamma, ag)$.
13. For any agent ag, and any context $\beta\gamma$: $LkbPrf(\beta\gamma, ag) = \big( \sum_{g \in Achv(\beta\gamma, ag)} CauStr(GDesc(g), \beta\gamma, ag) \times GImp(g) \big) - \sum_{g \in NotAchv(\beta\gamma, ag)} CauStr(\neg GDesc(g), \beta\gamma, ag) \times GImp(g)$.

Each scenario corresponds to a particular stage of the agent's collected data, wishful thoughts, and other parameters. In these scenarios we analyze the agent's belief states, as below:
❏ What is the context believed by the agent (and resulting belief space)?
❏ What is the certainty (causal strength) with which the agent holds a given belief? Is it enough to be a belief for the purpose of a given aim?

We are interested in showing how WTR is capable of simulating passive and active effects of the wishful thinking phenomenon, explained in Section III.

We assume an agent, ag, that reasons using first order logic. In other words, $\mathcal{L}_{ag} = \mathcal{L}_{FOL}$ and $\vdash_{ag} = \vdash_{FOL}$. We also assume that the agent is not logically omniscient and that, unless specified otherwise, the agent's wishful thinking coefficient is $wt(ag) = 0.3$.

We refer to the agent in the female form (representing a human female). As a simplification, we call answer the aim of answering any uncompromising question about a belief (e.g., "do you believe you have a boat?"). Since such answers do not involve any big commitments from the agent, we define a relatively low strength threshold for the corresponding aim, more specifically $t_{answer} = 0.2$ (see Section VI).

1) Scenario 1
To begin, we assume that the agent has not collected any data, in other words, $\beta_0 = \emptyset$. The agent has, however, two goals: To have a boat (a goal of medium-low importance), and to have her mother alive (an extremely important preservation goal). More concretely, $Goals(ag) = \{g_{bt}, g_{al}\}$, where:
❏ $\varphi_{bt} = GDesc(g_{bt}) = have(Boat)$
❏ $\varphi_{al} = GDesc(g_{al}) = alive(Mother)$
❏ $GImp(g_{bt}) = 0.35$
❏ $GImp(g_{al}) = 0.95$.

Since $\beta_0 \cup \gamma_0$ is consistent, it becomes the only candidate context, that is, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}\}\}$. Given that there is only one candidate context, it becomes the believed context. Moreover, since there are no derivations, the agent's belief space also corresponds to that context. In other words, $\beta\gamma = \{\varphi_{bt}, \varphi_{al}\}$ and $BS(\beta\gamma, ag) = \{\varphi_{bt}, \varphi_{al}\}$.

Since there is no information contradicting the agent's wishful thoughts (that she has a boat and that her mother is alive), they become beliefs. This phenomenon corresponds to the active effects of wishful thinking, as explained in Section III. But to what degree are these beliefs reliable? For instance, let us suppose that the agent is questioned about these two beliefs. In order to determine if these are beliefs for the purpose of aim answer, we need to compare their strength with the appropriate threshold (we recall that $t_{answer} = 0.2$), as explained in Section VI:
❏ $CauStr(\varphi_{bt}, \beta\gamma, ag) \approx 0.086$. So, have(Boat) is a very weak belief. In fact, it is not a belief for the purposes of answer (since $0.086 < 0.2$).
❏ $CauStr(\varphi_{al}, \beta\gamma, ag) \approx 0.887$. So, alive(Mother) is a relatively strong belief and a belief for the purposes of answer (since $0.887 \geq 0.2$).

So, even for the purpose of giving an answer that does not involve a big commitment (reflected in the low threshold), the agent considers herself ignorant regarding her possession of a boat. On the other hand, since the goal of having her mother alive has a very high importance, the active effects of wishful thinking lead the agent to answer that her mother is alive, even though there is no evidence supporting it.

Although it might seem strange that the agent does not know anything regarding her mother's state, one can perhaps imagine that (for some reason) the agent has not been in contact with her mother for many years. In this sense, suppose that the agent would like to buy a new expensive TV set for her mother. We refer to this aim as TV, and we assume that $t_{TV} = 0.9$, with respect to belief in $\varphi_{al}$. Consequently, the agent would not buy the new TV (at least not at the moment) because:
❏ $CauStr(\varphi_{al}, \beta\gamma, ag) \approx 0.887$. So, although alive(Mother) is a relatively strong belief, it is not a belief for the purposes of TV (since $0.887 < 0.9$).

2) Scenario 2
Here we expand the previous scenario with some collected data. What is typical of active-pursuit goals (such as the goal of having a boat) is that one already knows that the goal is not achieved when that goal is set. For instance, if I set a goal of having a boat, I typically know that I do not have it because, through regular observation and memory of the events in my life, I know that I have never bought a boat and none was ever given to me. For the purposes of this scenario, we say that the agent observes she does not have a boat ($\neg\varphi_{bt}$). In addition, agent David (another agent) tells ag that her mother boarded flight number 17, though ag does not consider David very credible, more specifically, $Cred(ag, David) = 0.5$. We represent the information communicated by David as:
❏ $\varphi_{17} = inFlight(Mother, 17)$.

Since $\neg\varphi_{bt}$ (the negation of one of the agent's wishful thoughts) was observed, $\beta_0 \cup \gamma_0$ is inconsistent. As a result, there are two candidate contexts, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}\}$, with the following values of preference:
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 0.775$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 1.749$.

Notice that, while believing in $\varphi_{bt}$ slightly adds to context preference due to likeability, believing in $\neg\varphi_{bt}$ necessarily adds more because it consists of an absolute certainty. As explained in Section V-B, likeability is one of the aspects that contributes to the less significative preference (a value in $]0, 1[$), while an absolute certainty conveys a preference of 1. This is why the preference of $\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}$ is almost 1 unit above the preference of $\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}$. The same phenomenon occurs in all the following scenarios, preventing contexts that contain $\varphi_{bt}$ from being believed (which is natural, since $\neg\varphi_{bt}$ is an absolute certainty).

Consequently, the believed context is $\beta\gamma = \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}$, and the resulting belief space is $BS(\beta\gamma, ag) = \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}$.

Let us focus on what changed in the agent's beliefs, in relation to the previous scenario. Now the agent has two new beliefs: That she does not have a boat and that her mother boarded flight 17. We look at the certainty (i.e., causal strength) of these two beliefs, to conclude whether or not they are beliefs for the purpose of answer:
❏ $CauStr(\neg\varphi_{bt}, \beta\gamma, ag) = 1$. So, $\neg$have(Boat) is an absolute certainty and, obviously, a belief for the purposes of answer (since $1 \geq 0.2$).
❏ $CauStr(\varphi_{17}, \beta\gamma, ag) = 0.5$. So, inFlight(Mother, 17) is a belief of intermediate strength and a belief for the purposes of answer (since $0.5 \geq 0.2$).

So, regarding these two beliefs, the agent answers that she does not have a boat and that her mother boarded flight 17. In other words, the agent believes her observation and David's communication, and the strength of these beliefs is far greater than that of the belief of having a boat in the previous scenario.

3) Scenario 3
Continuing from the previous scenario, the agent now watches a news report, announcing that flight number 17 crashed, leaving no survivors. The agent quickly concludes that this information, combined with what David has told her ($\varphi_{17}$), implies that her mother has died. Assume that Reporter is the agent who communicated the crash, whom our agent (ag) finds quite credible, more specifically, $Cred(ag, Reporter) = 0.8$. We represent (part of) the news report about the plane crash as:
❏ $\varphi_{cr} = \forall(x)\ inFlight(x, 17) \rightarrow \neg alive(x)$.

We consider that the agent's knowledge base contains a derivation support, $\langle \neg\varphi_{al}, Der, \{\varphi_{17}, \varphi_{cr}\} \rangle$, because the agent was able to conclude that $\{\varphi_{17}, \varphi_{cr}\} \vdash_{FOL} \neg\varphi_{al}$.

Now there is a second inconsistency in $\beta_0 \cup \gamma_0$ because $\varphi_{17}$ and $\varphi_{cr}$ imply believing in $\neg\varphi_{al}$ (hence, are inconsistent with $\varphi_{al}$).
As a result, there are six candidate contexts, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, \{\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, \{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}\}$, with the following values of preference:
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 0.775$
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, ag) \approx 0.811$
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, ag) \approx 0.808$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 1.749$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, ag) \approx 1.793$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, ag) \approx 1.790$.

Consequently, the believed context is $\beta\gamma = \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}$, and the resulting belief space is $BS(\beta\gamma, ag) = \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}$.

Notice that the agent does not believe her mother boarded flight 17 and, instead, believes that her mother is alive, despite having no evidence to support it (active effects of wishful thinking). The agent is in denial concerning her mother's death, mainly due to the very high goal importance of having her mother alive (0.95) and to the low credibility of David (who communicated $\varphi_{17}$). Notice that denial was achieved simply by rejecting belief in the "weakest link," namely David's communication, $\varphi_{17}$ (instead of the news report, $\varphi_{cr}$, that has a higher causal strength). Note, however, that the personality of the agent (more concretely, the agent's wishful thinking coefficient) is also responsible for this denial.

4) Scenario 4
We now suppose that another agent, Bruno, tells ag that her mother boarded flight 17. Our agent considers Bruno quite credible, more specifically, $Cred(ag, Bruno) = 0.8$. The candidate contexts are, as expected, the same as those in Scenario 3, but with different values of preference:
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 0.820$
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, ag) \approx 0.811$
❏ $CtxPrf(\{\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, ag) \approx 0.833$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, ag) \approx 1.804$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{al}, \varphi_{cr}\}, ag) \approx 1.793$
❏ $CtxPrf(\{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, ag) \approx 1.819$.
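The six Scenario 4 values listed above can be re-derived from Definitions 5 to 13 with a few lines of Python. This is only a sketch under stated assumptions: the atom names (`bt`, `al`, `f17`, `cr`), the `not_` prefix for negation, the dictionaries describing the knowledge base, and the explicit split of each context into its collected-data part and its wishful-thought part are illustrative encodings chosen here, not part of WTR.

```python
from math import prod

WT = 0.3                                    # wt(ag), Section VII-B
GIMP = {"bt": 0.35, "al": 0.95}             # goal description -> GImp
CRED = {"David": 0.5, "Bruno": 0.8, "Reporter": 0.8}
COMMS = {"f17": ["David", "Bruno"], "cr": ["Reporter"]}  # formula -> informants
OBSERVED = {"not_bt"}                       # formulas with an Obs support
DERIVED = {"not_al": [{"f17", "cr"}]}       # formula -> origin sets of Der supports


def caustr(phi, ctx):
    """Definition 10, with support strengths taken from Definition 11."""
    strengths = []
    if phi in ctx:                          # supports whose origin set is {phi}
        if phi in OBSERVED:
            strengths.append(1.0)
        strengths += [CRED[name] for name in COMMS.get(phi, [])]
        if phi in GIMP:                     # wishful thinking support
            strengths.append(GIMP[phi] ** ((1 - WT) / WT))
    for origin in DERIVED.get(phi, []):     # derivation supports
        if origin <= ctx:
            strengths.append(prod(caustr(h, ctx - {phi}) for h in origin))
    return 1 - prod(1 - s for s in strengths)


def ctx_prf(beta, gamma):
    """Definitions 5 to 7, 12 and 13 for the context beta | gamma."""
    ctx = beta | gamma
    base = [caustr(phi, ctx) for phi in beta]
    certainties = sum(1 for s in base if s == 1)
    uncert_prf = sum(s for s in base if s != 1)
    lkb_prf = sum(caustr(g, ctx) * imp for g, imp in GIMP.items() if g in gamma)
    # An unsupported negation has causal strength 0, so it adds nothing here.
    lkb_prf -= sum(caustr("not_" + g, ctx) * imp for g, imp in GIMP.items())
    lsp = (1 - WT) * uncert_prf + WT * lkb_prf
    return certainties + lsp / (1 + 2 * abs(lsp)) + 0.5


# Candidate contexts of Scenario 4 as (collected data, wishful thoughts) pairs.
CANDIDATES = [
    ({"f17"}, {"bt", "al"}),
    ({"cr"}, {"bt", "al"}),
    ({"f17", "cr"}, {"bt"}),
    ({"not_bt", "f17"}, {"al"}),
    ({"not_bt", "cr"}, {"al"}),
    ({"not_bt", "f17", "cr"}, set()),
]
prefs = [round(ctx_prf(beta, gamma), 3) for beta, gamma in CANDIDATES]
believed = max(CANDIDATES, key=lambda c: ctx_prf(*c))   # equation (4)
```

Under this encoding, `prefs` agrees with the listed values to three decimals, and the maximizing candidate is the one containing `not_bt`, `f17` and `cr`. Removing `"Bruno"` from `COMMS` gives the Scenario 3 setting instead.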
Consequently, the believed context is $\beta\gamma = \{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}$, and the resulting belief space is $BS(\beta\gamma, ag) = \{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}, \neg\varphi_{al}\}$.

As explained in Scenario 3, one of the reasons why the agent was able to be in denial was that David has a low credibility and his communication was the only support for $\varphi_{17}$. This was the easiest way that the agent could deny her mother's death, that is, by rejecting the fact that she (her mother) boarded flight 17. In this scenario, this fact is no longer easy to reject (the causal strength of $\varphi_{17}$ increased from 0.5 to 0.9) because there is a second agent, Bruno, claiming that it is true and, moreover, he is considered quite credible. Consequently, our agent (ag) is no longer in denial, as shown above. If Bruno had communicated, for example, $\varphi_{cr}$ instead of $\varphi_{17}$, denial would still occur because the agent would still be able to easily reject $\varphi_{17}$.

5) Scenario 5
Continuing from the previous scenario, we now suppose that agent Susan tells ag that her (ag's) mother did not board flight 17. The credibility that our agent attributes to Susan is 0.6 (in other words, $Cred(ag, Susan) = 0.6$).

Now, the six candidate contexts must take into account the new hypothesis ($\neg\varphi_{17}$), hence, $Cand(\beta_0, \gamma_0, ag) = \{\{\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\varphi_{bt}, \varphi_{al}, \neg\varphi_{17}, \varphi_{cr}\}, \{\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}, \{\neg\varphi_{bt}, \varphi_{al}, \varphi_{17}\}, \{\neg\varphi_{bt}, \varphi_{al}, \neg\varphi_{17}, \varphi_{cr}\}, \{\neg\varphi_{bt}, \varphi_{17}, \varphi_{cr}\}\}$.
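The candidate set just given can be checked by brute force against the definition of Cand (maximal consistent subsets of the collected data and wishful thoughts). The Python sketch below assumes that the agent's known inconsistencies can be encoded as a list of jointly inconsistent sets; that list, the atom names and the `not_` prefix are illustrative assumptions, not part of WTR.

```python
from itertools import combinations

# Hypotheses of Scenario 5 (collected data plus wishful thoughts).
HYPOTHESES = ["bt", "not_bt", "al", "f17", "not_f17", "cr"]
# Sets the agent knows to be jointly inconsistent (assumed encoding).
NOGOODS = [{"bt", "not_bt"}, {"f17", "not_f17"}, {"al", "f17", "cr"}]


def consistent(ctx):
    """Consistent, as far as the agent knows: contains no known nogood."""
    return not any(nogood <= ctx for nogood in NOGOODS)


def candidates(hypotheses):
    """Consistent subsets that have no consistent proper superset."""
    cons = [frozenset(c)
            for r in range(len(hypotheses) + 1)
            for c in combinations(hypotheses, r)
            if consistent(set(c))]
    return [c for c in cons if not any(c < other for other in cons)]


cand = candidates(HYPOTHESES)
```

With this encoding, `cand` contains six contexts, matching the six listed above. Enumerating all subsets is exponential in the number of hypotheses, so this only suits small illustrations.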
The preference for each of these contexts is:

\begin{align*}
\mathrm{CtxPrf}(\{U_{bt}, U_{al}, U_{17}\}, ag) &\approx 0.820\\
\mathrm{CtxPrf}(\{U_{bt}, U_{al}, \neg U_{17}, U_{cr}\}, ag) &\approx 0.856\\
\mathrm{CtxPrf}(\{U_{bt}, U_{17}, U_{cr}\}, ag) &\approx 0.833\\
\mathrm{CtxPrf}(\{\neg U_{bt}, U_{al}, U_{17}\}, ag) &\approx 1.804\\
\mathrm{CtxPrf}(\{\neg U_{bt}, U_{al}, \neg U_{17}, U_{cr}\}, ag) &\approx 1.846\\
\mathrm{CtxPrf}(\{\neg U_{bt}, U_{17}, U_{cr}\}, ag) &\approx 1.819
\end{align*}

Consequently, the believed context is $c_b = \{\neg U_{bt}, U_{al}, \neg U_{17}, U_{cr}\}$, and the resulting belief space is $BS(c_b, ag) = \{\neg U_{bt}, U_{al}, \neg U_{17}, U_{cr}\}$.

Susan’s communication was “just what the agent wanted to hear” to be able to deny, once again, her mother’s death. The new hypothesis is that the agent’s mother did not board flight 17 ($\neg U_{17}$), so $U_{17}$ becomes, once again, easy to reject.

This result illustrates an important aspect of WTR. Notice that the fact that the agent’s mother boarded flight 17 is supported by communications from David and Bruno, and:
❏ Without wishful thinking, the agent would never reject this belief because it is the word of both David and Bruno against the word of Susan, and even Bruno, alone, is considered more credible than Susan (the causal strength of $U_{17}$ is 0.9 while the causal strength of $\neg U_{17}$ is 0.6).
❏ Wishful thinking alone (in this situation) is also not enough to allow the agent to reject such a strong belief ($U_{17}$), as can be concluded from the results of Scenario 4.

Therefore, the denial that occurs in this scenario was only possible because of a combination of rational and affective factors (i.e., the support from Susan’s communication combined with the agent’s desire to believe that her mother is alive). This combined effect captures, in WTR, the passive effects of wishful thinking, explained in Section III.

WTR is guided by an order (or orders) among contexts that does not necessarily correspond to some order (or orders) among hypotheses. For instance, suppose that, because $\neg U_{17}$ is in the believed context and $U_{17}$ is not (in this scenario), we assume an order among hypotheses where $\neg U_{17}$ is preferred to $U_{17}$. Notice that this preference is a consequence of the agent’s belief in $U_{cr}$ (which makes $U_{17}$ “unwanted,” given her goals). If, for some reason, the agent rejects the belief in $U_{cr}$, the resulting order between $\neg U_{17}$ and $U_{17}$ is reversed.

VIII. Comparison with Related Work

In the previous sections we have presented the WTR framework, and shown how it can be used to manage an agent’s beliefs according to the aims discussed in Section I. In this section we compare WTR with two other approaches that are related to some extent.

In [17] and [18], Fabio Paglieri presents an approach to belief revision in the context of cognitive agents: Data-oriented Belief Revision (in short, DBR). More precisely, DBR is a model of cognitive agents’ epistemic dynamics (of which belief revision is a part). The model builds upon the distinction between data (information stored in the agent’s mind) and beliefs (information the agent considers reliable for further reasoning and direct action).

Both DBR and WTR are approaches to belief dynamics for autonomous agents. They both produce non-prioritized belief revision that is biased by the likeability of beliefs. Also, both models measure this likeability according to the satisfaction of goals; affective states are not modeled explicitly, but rather implicitly, through the preference conveyed by likeability.

One of the main differences between the two models is that DBR is belief-oriented while WTR is context-oriented, and this difference is especially important in the determination of likeability.

As explained in Section I, a context-oriented approach is more adequate, since the preference of a belief may be dependent on certain other beliefs being kept or abandoned. This is most obvious when it comes to the preference conveyed by likeability because, for instance, a belief may be desired/undesired only due to the presence of another belief.

Another important difference between the two approaches is that, in WTR, we model active effects of wishful thinking, an aspect that is not modeled in DBR. More concretely, in DBR beliefs originate from data which, in turn, comes from the outside world. The only internally generated beliefs are those that originate from other beliefs, through inference. In WTR, however, beliefs can also originate from goals, by means of wishful thinking supports. Any inconsistencies between these wishful thoughts and collected data trigger belief revision, the same way inconsistencies among collected data do.

In [7], Jonathan Gratch and Stacy Marsella start by presenting a framework that describes appraisal and coping as two strongly related operations. As the authors put it, “Appraisal characterizes the relationship between a person and their physical and social environment (...) and coping recruits resources to repair or maintain this relationship” [7].

This view is based on Smith and Lazarus’ cognitive motivational-emotive system [21]. Gratch and Marsella describe this system’s architecture, highlighting that the consequence of appraisal (the action tendencies, the affect and the physiological responses) triggers coping which, in turn, acts on the antecedents of appraisal. These antecedents may be the environment, in the case of problem-focused coping, or the evaluation of the situation, in the case of emotion-focused coping. Following the guidelines of this framework, the authors implement a specific computational model: EMA [7], [13] (named after Lazarus’ book “Emotion and Adaptation” [10]).

Note that coping commonly refers to how the individual deals with strong negative emotions. Although the authors write “we view it as a general response to all kinds of emotions, strong and weak, negative and positive” [7, p. 287], this view does not seem to have been applied, with respect to positive emotions, to the strategy of wishful thinking in EMA. More concretely, wishful thinking is triggered only as a response to a negative appraisal and, unlike WTR, EMA does not model active effects of wishful thinking.

Another fundamental distinction is that, while WTR targets the problem of belief revision, EMA does not. WTR aims to find a consistent set of beliefs, and denial of a belief implies removing it from that set of beliefs and maintaining the relationships that exist among the remaining beliefs. In EMA, inconsistencies are allowed, each belief is associated with some probability of being true, and denial/wishful thinking consists of adjusting these probabilities in order to improve a negatively charged appraisal.
IX. Conclusions

With the aim of addressing belief revision, in the context of human-like autonomous agents, we have identified the following issues concerning conventional belief revision:
❏ Why should an agent always prefer new information over its previous beliefs?
❏ How can an agent autonomously generate its own order(s) among beliefs?
❏ Can human-like preferences, in belief revision, be adequately expressed using an order (or orders) among beliefs?

To address these issues and enable the simulation of affective preferences, we propose WTR, an approach to an agent’s belief dynamics, with the following properties:
❏ Non-prioritized. New information is not necessarily believed.
❏ Autonomous. Revision is not dependent on the external definition of orders.
❏ Context-oriented. The preferred context is chosen according to an order (or orders) among contexts, instead of an order (or orders) among beliefs. As discussed in Section I, this is necessary because a belief’s resistance to change may depend on the other beliefs.
❏ Simulates wishful thinking. It simulates passive and active effects of the wishful thinking phenomenon, with respect to goal satisfaction. As we have shown (see Section VII-B), wishful thinking in WTR is not merely a bias in the resolution of conflicts among inconsistent data (passive effects); it can sometimes be the sole cause for having a belief, or for evoking belief revision when certain beliefs are undesirable (active effects). Passive effects are achieved by accounting for the likeability of beliefs when measuring the preference of a context. Active effects are achieved by enabling every goal to produce a tendency to believe in its achievement (a wishful thought). This way, any information that contradicts the achievement of a goal (i.e., any undesirable information) gives rise to an inconsistency, thus triggering belief revision. Wishful thoughts are typically too weak and are therefore abandoned or just filtered out, but exceptions may occur, depending on various factors.

We recall that WTR addresses wishful thinking in terms of goal satisfaction. Clearly, one’s desires and preferences cannot all be reduced to goals. Consequently, WTR does not account for all forms of wishful thinking, nor does it account for the large variety of emotions that influence belief dynamics in humans. We view this work as one step toward the design of belief processes that incorporate affective phenomena and are suitable for human-like autonomous agents.

References

[1] C. E. Alchourrón, P. Gärdenfors, and D. Makinson, “On the logic of theory change: Partial meet functions for contraction and revision,” J. Symbolic Logic, vol. 50, no. 2, pp. 510–530, 1985.
[2] C. S. Carver, M. F. Scheier, and J. K. Weintraub, “Assessing coping strategies: A theoretically based approach,” J. Personality Social Psychol., vol. 56, no. 2, pp. 267–283, 1989.
[3] C. Castelfranchi, “Guarantees for autonomy in cognitive agent architecture,” in Intelligent Agents: Theories, Architectures and Languages (Lecture Notes on Artificial Intelligence 890), M. Wooldridge and N. Jennings, Eds. Berlin, Germany: Springer-Verlag, 1995, pp. 56–70.
[4] E. Fermé and S. O. Hansson, “AGM 25 years: Twenty-five years of research in belief change,” J. Philosophical Logic, vol. 40, no. 2, pp. 295–331, 2011.
[5] N. H. Frijda, Ed., The Laws of Emotion. Mahwah, NJ: Lawrence Erlbaum Associates, 2007.
[6] N. H. Frijda and B. Mesquita, “Beliefs through emotions,” in Emotions and Beliefs: How Feelings Influence Thoughts, N. H. Frijda, A. S. R. Manstead, and S. Bem, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[7] J. Gratch and S. Marsella, “A domain-independent framework for modeling emotion,” J. Cogn. Syst. Res., vol. 5, no. 4, pp. 269–306, 2004.
[8] S. O. Hansson, “Ten philosophical problems in belief revision,” J. Logic Comput., vol. 13, no. 1, pp. 37–49, 2003.
[9] G. Harman, Change in View: Principles of Reasoning. Cambridge, MA: MIT Press, 1986.
[10] R. S. Lazarus, Emotion and Adaptation. New York: Oxford Univ. Press, 1991.
[11] R. S. Lazarus and S. Folkman, Stress, Appraisal and Coping. New York: Springer, 1984.
[12] I. Levi, The Fixation of Belief and Its Undoing. Cambridge, MA: Cambridge Univ. Press, 1991.
[13] S. Marsella and J. Gratch, “EMA: A computational model of appraisal dynamics,” in Proc. 18th European Meeting Cybernetics Systems Research, 2006, pp. 601–606.
[14] J. P. Martins and S. C. Shapiro, “A model for belief revision,” Artif. Intell., vol. 35, no. 1, pp. 25–79, 1988.
[15] R. R. McCrae and P. T. Costa Jr., “Personality, coping, and coping effectiveness in an adult sample,” J. Personality, vol. 54, no. 2, pp. 385–405, 1986.
[16] A. Ortony, G. L. Clore, and A. Collins, The Cognitive Structure of Emotions. New York: Cambridge Univ. Press, 1988.
[17] F. Paglieri, “Data-oriented belief revision: Toward a unified theory of epistemic processing,” in Proc. of STAIRS 2004, E. Onaindia and S. Staab, Eds. Amsterdam, The Netherlands: IOS Press, 2004, pp. 179–190.
[18] F. Paglieri, “See what you want, believe what you like: Relevance and likeability in belief dynamics,” in Proc. of AISB 2005 Symp. ‘Agents That Want and Like: Motivational and Emotional Roots of Cognition and Action,’ L. Cañamero, Ed. Hatfield, U.K.: AISB, 2005, pp. 90–97.
[19] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[20] C. F. Pimentel, “Emotional reasoning in AI: Modeling some of the influences of affects on reasoning,” Ph.D. dissertation, Inst. Superior Técnico, Univ. Técnica de Lisboa, Lisbon, Portugal, Dec. 2010.
[21] C. A. Smith and R. Lazarus, “Emotion and adaptation,” in Handbook of Personality: Theory & Research, L. A. Pervin, Ed. New York: Guilford Press, 1990, pp. 609–637.
Book Review

Gouhei Tanaka
The University of Tokyo, JAPAN

Complex-Valued Neural Networks: Advances and Applications, by Akira Hirose (Wiley-IEEE Press, 2013, 320 pp.) ISBN: 978-1-1183-4460-6.

Digital Object Identifier 10.1109/MCI.2013.2247895
Date of publication: 11 April 2013
1556-603X/13/$31.00©2013IEEE

Complex-valued neural networks (CVNNs) are artificial neural networks that are based on complex numbers and complex number arithmetic. They are particularly suited to signals and information with complex amplitude, i.e., amplitude and phase, as typically found in wave phenomena including electromagnetic waves, light waves, sonic waves, electron waves, and electroencephalograms (EEG). Over the past decade, the application fields of CVNNs have expanded considerably together with the development of their theories and algorithms. This book covers the recent advances in CVNNs and their variants, demonstrating their applicability to optimization of telecommunication systems, blind source separation of complex-valued signals, N-bit parity problems, wind prediction, classification problems in the complex domain, brain-computer interfaces, digital predistorter design for high-power amplifiers, and color face image recognition. The contents of the book include not only conventional CVNNs but also quaternion neural networks and Clifford-algebraic neural networks, which are extended neural networks utilizing hypercomplex number systems. The new methods and challenges in establishing these hypercomplex-valued neural networks receive more emphasis than in the first-ever book on CVNNs, published ten years ago [1]. The book reviewed here provides an excellent overview of the current trends in CVNN research for students and researchers interested in computational intelligence, and offers up-to-date theories and applications of CVNNs for experts and practitioners. Readers can refer to the introductory textbook [2] on CVNNs for more fundamental aspects and to the book [3] for other research topics related to CVNNs.

The entire book is organized into ten chapters. The first chapter is an overview of the methods and applications of conventional CVNNs. The nine subsequent chapters focus on different kinds of CVNNs and hypercomplex-valued neural networks. Various methods relying on CVNNs and hypercomplex-valued neural networks, including learning algorithms, optimization methods, classification algorithms, system estimation methods, and system prediction methods, are developed in the individual chapters by extending or generalizing their counterparts in conventional artificial neural networks.

The book begins with an introduction to the theories and applications of conventional CVNNs. In the former half, the representative application fields of CVNNs are compactly presented. The engineering applications include antenna design, beam-forming, radar image processing, sonic and ultrasonic processing, communication signal processing, image processing, traffic signal control, quantum computation, and optical information processing. In the latter half, the emphasis is placed on the difference between CVNNs and ordinary real-valued neural networks. Numerical experiments show that a feedforward layered CVNN yields better generalization ability for coherent signals than other methods.

Chapter 2 deals with CVNNs whose adaptable parameters lie on complex manifolds. Based on differential geometrical methods, efficient optimization algorithms to adapt the parameters of such CVNNs are presented and successfully applied to signal processing problems. The problems include the purely algorithmic problem of averaging the parameters of a pool of cooperative CVNNs, multichannel blind deconvolution of signals in telecommunications, and blind source separation of complex-valued signal sources. Helpfully, pseudocode for the learning procedures for all of these problems is listed in this chapter.

Chapter 3 focuses on an N-dimensional vector neuron, which is a natural extension of a complex-valued neuron in two-dimensional space to its N-dimensional version. First, relevant neuron models with high-dimensional parameters are briefly reviewed to locate the N-dimensional vector neuron. Next, the author defines the N-dimensional vector neuron, which can represent N signals as one cluster, and shows that its decision boundary consists of N hyperplanes that intersect each other orthogonally. The generalization ability of a single N-dimensional vector neuron is demonstrated for the N-bit parity problem.
Finally, the presented method is compared with other layered neural networks in terms of the number of neurons, the number of parameters, and the number of layers.

In Chapter 4, learning algorithms for feedforward and recurrent CVNNs are systematically described using Wirtinger calculus. The Wirtinger calculus, which generalizes the concept of derivatives in the complex domain, makes it possible to perform all the computations of well-known learning algorithms for CVNNs directly in the complex domain. For feedforward layered CVNNs, the complex gradient descent algorithm and the complex Levenberg-Marquardt algorithm are derived with the complex gradient. For recurrent-type CVNNs, the complex real-time recurrent learning algorithm and the complex extended Kalman filter algorithm are obtained using the Wirtinger calculus. Computer simulation results are given to verify these four algorithms.

Chapter 5 presents associative memory models with Hopfield-type recurrent neural networks based on quaternions, which are four-dimensional hypercomplex numbers. In the introduction to quaternion algebra, the definition of a quaternion is given and analyticity in the quaternionic domain is described. Then, stability analysis is performed by means of energy functions for several different types of quaternion-valued neural networks. The different types of recurrent networks are constructed with bipolar state neurons, continuous state neurons, and multistate neurons. All of these quaternion-valued networks are shown to work well as associative memory models by implementing typical learning rules including the Hebbian rule, the projection rule, and the local iterative learning rule.

Chapter 6 concentrates on recurrent-type Clifford neural networks. This chapter starts with the definition of Clifford algebra and the basic properties of the operators in hypercomplex number systems.
Subsequently, a Hopfield-type recurrent Clifford neural network is proposed as an extension of the classical real-valued Hopfield neural network, with an appropriate definition of an energy function for the Clifford neural network. Finally, under several assumptions on the weight coefficients and the activation functions, the existence of the energy function is proved for two specific types of Clifford neural networks.

Chapter 7 provides a meta-cognitive learning algorithm for a single hidden layer CVNN, called the Meta-cognitive Fully Complex-valued Relaxation Network (McFCRN), consisting of a cognitive component and a meta-cognitive one. First, it is explained that the learning strategy of the neural network (cognitive part) is controlled by a self-regulatory learning mechanism (meta-cognitive part) through sample deletion, sample learning, and sample reserve. After the drawbacks of conventional meta-cognitive CVNNs such as the Meta-cognitive Fully Complex-valued Radial Basis Function Network (McFCRBF) and the Complex-valued Self-regulatory Resource Allocation Network (CSRAN) are pointed out, the learning algorithm of McFCRN is presented in pseudocode. The performance of McFCRN is evaluated on a synthetic complex-valued function approximation problem and on benchmark real-valued classification problems, in comparison with other existing methods.

In Chapter 8, a multilayer feedforward neural network with multi-valued neurons (MLMVN), described in the monograph [4], is applied to brain-computer interfacing (BCI), which aims at extracting relevant information from human brain wave activity. Following a general introduction to the concept of BCI using EEG recordings, the chapter focuses on a particular type of BCI based on the Steady-State Visual Evoked Potential (SSVEP), in which the EEG signals are obtained as responses to a target stimulus flickering at a certain frequency. Subsequently, the MLMVN is presented to decode the phase-coded SSVEP-based BCI.
The MLMVN is shown to achieve better decoding accuracy than other methods.

In Chapter 9, complex-valued B-spline neural networks are developed to identify a complex-valued Wiener system, which comprises a linear dynamical model followed by a nonlinear static transformation. A CVNN based on B-spline curves consisting of many polynomial pieces is presented to estimate the complex-valued nonlinear function in the complex-valued Wiener model. For identification of the system, an algorithm to estimate the parameters is given based on the Gauss-Newton method with the aid of the De Boor algorithm. An algorithm to compute the inverse of the estimated nonlinear function is also presented. The performance of the presented method is demonstrated on the design of a digital predistorter for wireless communication systems, which compensates for the distortion caused by high-power amplifiers with memory.

Chapter 10 is about a quaternion fuzzy neural network for view-invariant color face image recognition. First, conventional face recognition systems are briefly reviewed. Second, several face recognition systems are introduced, including Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), and Block Diagonal Non-Negative Matrix Factorization (BDNMF). The view-invariant color face image recognition system combining a quaternion-based color face image correlator and a max-product fuzzy neural network classifier is then presented.
Finally, the presented method is shown to outperform conventional methods including NMF, BDNMF, and the hypercomplex Gabor filter in classifying view-invariant, noise-influenced, and scale-invariant color face images from a database.

In summary, this book contains a wide variety of hot topics on advanced computational intelligence methods which incorporate the concept of complex and hypercomplex number systems into the framework of artificial neural networks. In most chapters, the theoretical descriptions of the methodology and its applications to engineering problems are excellently balanced. This book suggests that a better information processing method could be brought about by selecting a more appropriate information representation scheme for specific problems, not only in artificial neural networks but also in other computational intelligence frameworks. The advantages of CVNNs and hypercomplex-valued neural networks over real-valued neural networks are confirmed in some case studies but remain unclear in general. Hence, there is a need to further explore the difference between them from the viewpoint of nonlinear dynamical systems. Nevertheless, the applications of CVNNs and hypercomplex-valued neural networks appear very promising.

References

[1] A. Hirose, Complex-Valued Neural Networks: Theories and Applications. Singapore: World Scientific, 2003.
[2] A. Hirose, Complex-Valued Neural Networks. New York: Springer-Verlag, 2006.
[3] T. Nitta, Complex-Valued Neural Networks: Utilizing High-Dimensional Parameters. Hershey, PA: IGI Global, 2009.
[4] I. Aizenberg, N. Aizenberg, and J. Vandewalle, Multi-Valued and Universal Binary Neurons: Theory, Learning, and Applications. Norwell, MA: Kluwer, 2000.
Conference Calendar

Gary B. Fogel
Natural Selection, Inc., USA

Digital Object Identifier 10.1109/MCI.2013.2247899
Date of publication: 11 April 2013

* Denotes a CIS-Sponsored Conference
D Denotes a CIS Technical Co-Sponsored Conference

D The 4th International Conference on Intelligent Control and Information Processing (ICICIP 2013)
June 9–11, 2013
Place: Beijing, China
http://www.conference123.org/icicip2013

D The 2013 International Conference on Brain Inspired Cognitive Systems (BICS 2013)
June 9–11, 2013
Place: Beijing, China
http://www.conference123.org/bics2013/

* 2013 IEEE Congress on Evolutionary Computation (IEEE CEC 2013)
June 20–23, 2013
Place: Cancun, Mexico
General Chair: Carlos Coello Coello
http://www.cec2013.org/

D The 7th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2013)
July 3–5, 2013
Place: Taichung, Taiwan
http://voyager.ce.fit.ac.jp/conf/cisis/2013/

D International Symposium on Neural Networks (ISNN 2013)
July 4–6, 2013
Place: Dalian, China
http://isnn.mae.cuhk.edu.hk/

* 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013)
July 7–10, 2013
Place: Hyderabad, India
General Chair: Nik Pal
http://www.isical.ac.in/~fuzzieee2013/

D Ninth International Conference on Intelligent Computing (ICIC 2013)
July 28–31, 2013
Place: Nanning, China
http://www.ic-ic.org

* International Joint Conference on Neural Networks (IJCNN 2013)
August 4–9, 2013
Place: Dallas, Texas, USA
General Co-Chairs: Plamen Angelov and Daniel Levine
http://www.ijcnn2013.org

D International Conference on Image Analysis and Processing (ICIAP 2013)
September 11–13, 2013
Place: Naples, Italy
http://www.iciap2013-naples.org

D International Joint Conference on Awareness Science and Technology and Ubi-Media Computing (iCAST/UMEDIA 2013)
November 2–4, 2013
Place: Aizu, Japan
Website: TBD

D The 20th International Conference on Neural Information Processing (ICONIP 2013)
November 3–7, 2013
Place: Daegu, Korea
http://iconip2013.org/

D International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP 2013)
December 12–13, 2013
Place: Bayonne, France
http://www.smap2013.org

* 2014 IEEE Conference on Computational Intelligence in Financial Engineering and Economics
March 27–28, 2014
Place: London, United Kingdom
General Chair: Antoaneta Serguieva
Website: TBD

* 2014 Conference on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB 2014)
May 21–24, 2014
Place: Hawaii, USA
General Chair: Steven Corns
Website: TBD

* 2014 IEEE World Congress on Computational Intelligence (IEEE WCCI 2014)
July 6–14, 2014
Place: Beijing, China
General Chair: Derong Liu
http://www.ieee-wcci2014.org/

Advertisers Index

Digital Object Identifier 10.1109/MCI.2013.2247900

The Advertisers Index contained in this issue is compiled as a service to our readers and advertisers: the publisher is not liable for errors or omissions although every effort is made to ensure its accuracy. Be sure to let our advertisers know you found them through IEEE Computational Intelligence Magazine.

Company: IEEE Marketing Department
Page: CVR 4
URL: www.ieee.org/tryieeexplore

IEEE Media Advertising Sales Offices

James A. Vick, Sr. Director, Advertising Business. +1 212 419 7767; Fax: +1 212 419 7589
Marion Delaney, Advertising Sales Director. +1 415 863 4717; Fax: +1 415 863 4717
Susan Schneiderman, Business Development Manager. +1 732 562 3946; Fax: +1 732 981 1855

Product Advertising

Mid-Atlantic: Lisa Rinaldo. +1 732 772 0160; Fax: +1 732 772 0164. NY, NJ, PA, DE, MD, DC, KY, WV
New England/South Central/Eastern Canada: Jody Estabrook. +1 774 283 4528; Fax: +1 774 283 4527. CT, ME, VT, NH, MA, RI, AR, LA, OK, TX. CANADA: Nova Scotia, Prince Edward Island, Newfoundland, New Brunswick, Quebec
Southwest: Thomas Flynn. +1 770 645 2944; Fax: +1 770 993 4423. VA, NC, SC, GA, FL, AL, MS, TN
Midwest/Central Canada: Dave Jones. +1 708 442 5633; Fax: +1 708 442 7620. IL, IA, KS, MN, MO, NE, ND, SD, WI, OH. CANADA: Manitoba, Saskatchewan, Alberta
New England/Eastern Canada: Liza Reich. +1 212 419 7578; Fax: +1 212 419 7589. ME, VT, NH, MA, RI. CANADA: Nova Scotia, Prince Edward Island, Newfoundland, New Brunswick, Quebec
Midwest/Ontario, Canada: Will Hamilton. +1 269 381 2156; Fax: +1 269 381 2556. IN, MI. CANADA: Ontario
Southeast: Cathy Flynn. +1 770 645 2944; Fax: +1 770 993 4423. VA, NC, SC, GA, FL, AL, MS, TN
West Coast/Mountain States/Western Canada: Marshall Rubin. +1 818 888 2407; Fax: +1 818 888 4907. AZ, CO, HI, NM, NV, UT, CA, AK, ID, MT, WY, OR, WA. CANADA: British Columbia
Europe/Africa/Middle East/Asia/Far East/Pacific Rim: Heleen Vodegel. +1 44 1875 825 700; Fax: +1 44 1875 825 701. Europe, Africa, Middle East, Asia, Far East, Pacific Rim, Australia, New Zealand

Recruitment Advertising

Mid-Atlantic: Lisa Rinaldo. +1 732 772 0160; Fax: +1 732 772 0164. CT, NY, NJ, PA, DE, MD, DC, KY, WV
Midwest/South Central/Central Canada: Darcy Giovingo. +1 847 498 4520; Fax: +1 847 498 5911. AR, LA, TX, OK, IL, IN, IA, KS, MI, MN, NE, ND, SD, OH, WI, MO. CANADA: Ontario, Manitoba, Saskatchewan, Alberta
West Coast/Mountain States/Southwest/Asia: Tim Matteson. +1 310 836 4064; Fax: +1 310 836 4067. AK, AZ, CA, CO, HI, ID, MT, NM, NV, OR, UT, WA, WY. CANADA: British Columbia
Europe/Africa/Middle East: Heleen Vodegel. +1 44 1875 825 700; Fax: +1 44 1875 825 701. Europe, Africa, Middle East