Digital Convergence in a Knowledge Society
The 7th Information Technology and Telecommunication Conference, IT&T 2007
Institute of Technology Blanchardstown, Dublin, Ireland, 25-26 October 2007
Gabriel-Miro Muntean, Markus Hofmann, Brian Nolan (Eds)

IT&T 2007 General Chair's Letter

As the General Chair of the 2007 Information Technology and Telecommunications (IT&T) Conference, it gives me great pleasure to introduce you to this year's conference. This IT&T Conference, held over the 25th and 26th of October 2007 at the Institute of Technology Blanchardstown, Dublin, Ireland, has as its major focus "Digital Convergence in a Knowledge Society" and welcomed papers on various themes such as wired and wireless networks, next generation web, games and entertainment, health informatics, and security and forensics. We have collected over 20 papers from academics across Ireland and the UK within this peer-reviewed book of proceedings. A doctoral consortium session will also be held as part of the conference, involving researchers soon to complete their PhD studies. These sessions will be preceded by plenary talks given by ICT experts from Irish academia and industry.

I would like to take this opportunity to thank our sponsors for their generous support. These include: IBM, Ericsson, the Council of Directors of the Institutes of Technology, the Institution of Engineering and Technology, the IEEE – CC Ireland Chapter and IRCSET. Special thanks go to the IT&T 2007 Conference organisational team (consisting of academics from across the University and Institute of Technology sectors), the Technical Programme Committee, the Technical Chairs and also the Financial Chair for the marvelous work, high standards and excellent results achieved. I also extend a very warm welcome to all the attendees of this IT&T 2007 conference at the Institute of Technology Blanchardstown, Dublin. The conference website is: http://www.ittconference.ie

I wish you a wonderful conference!

Dr. Brian Nolan
General Chair of IT&T 2007
Head of the Department of Informatics
School of Informatics and Engineering
Institute of Technology Blanchardstown
Blanchardstown Road North, Dublin 15, Ireland

Technical Programme Committee Chairs' Letter

Dear Colleagues,

As Technical Programme Chairs, we would like to welcome you to the Seventh Information Technology and Telecommunications Conference (IT&T 2007) hosted by the Institute of Technology Blanchardstown, Dublin, Ireland. IT&T is an annual international conference which not only publishes research in the areas of information technologies and telecommunications, but also brings together researchers, developers and practitioners from the academic and industrial environments, enabling research interaction and collaboration. The focus of the seventh IT&T is "Digital Convergence in a Knowledge Society". We welcomed research papers with topics in e-learning technologies, Web 2.0 and next generation web, ubiquitous and distributed computing, adaptive computing, health informatics, wired and wireless networks, sensor networks, network management, quality of experience and quality of service, digital signal processing, speech and language processing, games and entertainment, computer vision, security and forensics, and open source developments. All submitted papers were peer-reviewed by the Technical Programme Committee members and we would like to express our sincere gratitude to all of them for their help in the reviewing process.
After the review process, twenty-two papers were accepted and will be presented during six technical sessions spanning the two days of the conference. A doctoral consortium session will also be held with researchers who are nearing completion of their PhDs. These sessions will be preceded by plenary talks given by ICT experts from Irish academia and industry. We hope you will have a very interesting and enjoyable conference.

Dr. Gabriel-Miro Muntean, Dublin City University, Ireland
Nick Timmons, Letterkenny Institute of Technology, Ireland

IT&T 2007 Chairs and Committees

Conference General Chair
Brian Nolan, Institute of Technology Blanchardstown

Technical Programme Committee Chairs
Gabriel-Miro Muntean, Dublin City University
Nick Timmons, Letterkenny Institute of Technology

Doctoral Symposium Committee Chair
Declan O'Sullivan, Trinity College Dublin

Doctoral Symposium Technical Committee
Brian Nolan, Institute of Technology Blanchardstown
Cristina Hava Muntean, National College of Ireland
Dave Lewis, Trinity College Dublin
John Keeney, Trinity College Dublin
Matt Smith, Institute of Technology Blanchardstown

Patronage & Sponsor Chair
Dave Denieffe, Institute of Technology Carlow

Proceedings Editors
Gabriel-Miro Muntean, Dublin City University
Markus Hofmann, Institute of Technology Blanchardstown
Brian Nolan, Institute of Technology Blanchardstown

Organising Committee
Brian Nolan, Institute of Technology Blanchardstown
Dave Denieffe, Institute of Technology Carlow
David Tracey, Salix
Declan O'Sullivan, Trinity College Dublin
Enda Fallon, Athlone Institute of Technology
Gabriel-Miro Muntean, Dublin City University
Jeanne Stynes, Cork Institute of Technology
John Murphy, University College Dublin
Mairead Murphy, Institute of Technology Blanchardstown
Markus Hofmann, Institute of Technology Blanchardstown
Matt Smith, Institute of Technology Blanchardstown
Nick Timmons, Letterkenny Institute of Technology

Technical Programme Committee
Anthony Keane, Institute of Technology Blanchardstown
Arnold Hensman, Institute of Technology Blanchardstown
Brian Crean, Cork Institute of Technology
Brian Nolan, Institute of Technology Blanchardstown
Cormac J. Sreenan, University College Cork
Cristina Hava Muntean, National College of Ireland
Dave Denieffe, Institute of Technology Carlow
Dave Lewis, Trinity College Dublin
David Tracey, Salix
Declan O'Sullivan, Trinity College Dublin
Dirk Pesch, Cork Institute of Technology
Enda Fallon, Athlone Institute of Technology
Gabriel-Miro Muntean, Dublin City University
Hugh McCabe, Institute of Technology Blanchardstown
Ian Pitt, University College Cork
Jeanne Stynes, Cork Institute of Technology
Jim Clarke, TSSG, Waterford Institute of Technology
Jim Morrison, Letterkenny Institute of Technology
John Murphy, University College Dublin
Kieran Delaney, Cork Institute of Technology
Larry McNutt, Institute of Technology Blanchardstown
Liam Kilmartin, National University of Ireland Galway
Mark Davis, Dublin Institute of Technology
Mark Riordan, Institute of Art Design and Technology Dun Laoghaire
Markus Hofmann, Institute of Technology Blanchardstown
Martin McGinnity, University of Ulster Belfast
Matt Smith, Institute of Technology Blanchardstown
Michael Loftus, Cork Institute of Technology
Nick Timmons, Letterkenny Institute of Technology
Nigel Whyte, Institute of Technology Carlow
Paddy Nixon, University College Dublin
Pat Coman, Institute of Technology Tallaght
Paul Walsh, Cork Institute of Technology
Richard Gallery, Institute of Technology Blanchardstown
Ronan Flynn, Athlone Institute of Technology
Sean McGrath, University of Limerick
Stephen Sheridan, Institute of Technology Blanchardstown
Sven van der Meer, TSSG, Waterford Institute of Technology

Table of Contents

Session 1: Trust & Security (Chaired by: John Murphy, University College Dublin)
Trust Management In Online Social Networks (Bo Fu, Declan O'Sullivan)
Irish Legislation regarding Computer Crime (Anthony Keane)
Distributed Computing for Massively Multiplayer Online Games (Malachy O'Doherty, Jonathan Campbell)
A Comparative Analysis of Steganographic Tools (Abbas Cheddad, Joan Condell, Kevin Curran, Paul McKevitt)

Session 2: Computing Systems (Chaired by: Matt Smith, Institute of Technology Blanchardstown)
A Review of Skin Detection Techniques for Objectionable Images (Wayne Kelly, Andrew Donnellan, Derek Molloy)
Optical Reading and Playing of Sound Signals from Vinyl Records (Arnold Hensman)
Optimisation and Control of IEEE 1500 Wrappers and User Defined TAMs (Michael Higgins, Ciaran MacNamee, Brendan Mullane)

Session 3: Applications (Chaired by: Stephen Sheridan, Institute of Technology Blanchardstown)
MemoryLane: An Intelligent Mobile Companion for Elderly Users (Sheila Mc Carthy, Paul Mc Kevitt, Mike McTear, Heather Sayers)
Using Scaffolded Learning for Developing Higher Order Thinking Skills (Cristina Hava Muntean, John Lally)
Electronic Monitoring of Nutritional Components (Zbigniew Fratczak, Gabriel-Miro Muntean, Kevin Collins)
A Web2.0 & Multimedia solution for digital music (Helen Sheridan, Margaret Lonergan)

Session 4: Algorithms (Chaired by: Brian Nolan, Institute of Technology Blanchardstown)
Adaptive ItswTCM for High Speed Cable Networks (Mary Looney, Susan Rea, Oliver Gough, Dirk Pesch)
Distributed and Tree-based Prefetching Scheme for Random Seek Support in P2P Streaming (Changqiao Xu, Enda Fallon, Paul Jacob, Yuansong Qiao, Austin Hanley)
Parsing Student Text using Role and Reference Grammar (Elizabeth Guest)
A Parallel Implementation of Differential Evolution for Weight Adaptation in Artificial Neural Networks (Stephen Sheridan)

Session 5a: Wired & Wireless (Chaired by: David Tracey, Salix)
The Effects of Contention between stations on Video Streaming Applications over Wireless Local Area Networks - an experimental approach (Nicola Cranley, Tanmoy Debnath, Mark Davis)
An Investigation of the Effect of Timeout Parameters on SCTP Performance (Sheila Fallon, Paul Jacob, Yuansong Qiao, Enda Fallon)
Performance Analysis Of Multi-hop Networks (Xiaoguang Li, Robert Stewart, Sean Murphy, Sumit Roy)

Session 5b: Wired & Wireless (Chaired by: Nick Timmons, Letterkenny Institute of Technology)
EmNets - Embedded Networked Sensing (Ken Murray, Dirk Pesch, Zheng Liu, Cormac Sreenan)
Dedicated Networking Solutions for Container Tracking System (Daniel Rogoz, Fergus O'Reilly, Kieran Delaney)
Handover Strategies in Multi-homed Body Sensor Networks (Yuansong Qiao, Xinyu Yan, Enda Fallon, Austin Hanley)

Session 6: Doctoral Symposium (Chaired by: Declan O'Sullivan, Trinity College Dublin)
Hierarchical Policy-Based Autonomic Replication (Cormac Doherty, Neil Hurley)
Sensemaking for Topic Comprehension (Brendan Ryder, Terry Anderson)
A Pedagogical-based Framework for the Delivery of Educational Material to Ubiquitous Devices (Caoimhin O'Nuallain, Sam Redfern)

ITT07 Author Index
[The multi-column author index did not survive extraction; authors are listed with their papers in the Table of Contents above.]

Session 1: Trust & Security

Trust Management in Online Social Networks
Bo Fu, Declan O'Sullivan
School of Computer Science & Statistics, Trinity College, Dublin
[email protected], [email protected]

Abstract
The concept of trust has been studied extensively by researchers in philosophy, psychology and sociology; research in these fields shows that trust is a subjective view that varies greatly among people, situations and environments. This very subjective characteristic of trust, however, has been largely overlooked within the trust management used in the online social network (OSN) scenario. To date, trust management mechanisms in OSNs have been limited to access control methods that take a very simplified view of trust and ignore various fundamental characteristics of trust. Hence they fail to provide a personalized manner of managing trust, offering instead a "one size fits all" style of management.
In this paper we present findings which indicate that trust management for OSNs needs to be modified and enriched, and we outline the main issues that are being addressed in our current implementation work.

Keywords: Trust, Online Social Networks, multi-faceted, personalisation, ratings.

1 Introduction

The concept of social networking dates back to the 1930s, when Vannevar Bush first introduced his idea of the "memex" [Vannevar, 1996], a "device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility", and predicted that "wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified." Since the launch of the first online social networking service, USENET [USENET], in 1979, we have seen a dramatic increase in online social networks such as Bebo [Bebo], Facebook [Facebook] and MySpace [MySpace], to name just a few. These OSNs allow users to discover, extend, manage, and leverage their personal as well as professional networks online.

OSNs serve various purposes, mostly centred around the following topics: business, education, socializing and entertainment. Business oriented OSNs help registered individuals make connections, build business contacts and maintain professional networks for potential career opportunities, as well as allowing organizations to advertise their products and services. Examples of such OSNs are LinkedIn [LinkedIn], Ecademy [Ecademy], Doostang [Doostang], XING [XING] and Plaxo [Plaxo]. Educational OSNs usually focus on groups of people who wish to gain knowledge in the same field through blogs and link sharing, covering a great variety of subject matters. Examples of such networks can be found in many institutions, where intranets are set up for specific schools, faculties, or classes. Socializing OSNs aim to provide users with a virtual environment in which online communities can exchange news, keep in touch with friends and family, and make new connections. Usually, various features are implemented which allow users to keep journals, post comments and news, upload pictures and videos, and send each other messages. Such OSNs tend to centre around themes, such as music, movies or personal life, and are designed to be either user-centric or topic-centric, where online communities can focus on developing profiles all about oneself or on developing particular hobbies. Several examples of this type of OSN are 43 Things [43Things], CarDomain [CarDomain], Friendster [Friendster], Hi5 [Hi5], and MOG [MOG]. Closely associated with socializing OSNs are entertainment OSNs, where the focus on personal aspects of the online communities is less visible compared to the entertainment these communities may offer to a network. For example, on YouTube [YouTube], focus is shifted away from personal profiles and the video sharing feature is greatly valued. Since its launch in early 2005, YouTube has quickly become the home of video clip entertainment; it now accounts for 29% of the U.S. multimedia entertainment market [USA Today, 2006].

Based on registration requirements, OSNs can be grouped into two main categories: sites that are open to anyone and sites that are invitation only. Anyone is welcome to set up an account and put up a representation of oneself in open invite OSNs, such as Graduates.com [Graduates] and Friends Reunited [Friends Reunited].
However, in order to join some sites, you need to be invited by a trusted member; aSmallWorld [aSmallWorld] is an example of such an OSN, where high profile celebrities are among its registered members. The predominant business model for most OSNs is advertising: it is free for anyone to join, and revenue is made by selling online advertising on these websites. However, a number of OSNs charge their members for the information or services they provide, such as LinkedIn, where employers can advertise their vacancies when looking for suitable candidates.

The remainder of this paper is organized as follows. We first examine the state of the art in trust management mechanisms deployed in OSNs in section 2, which has led to our belief that very little attention is being paid to personalized trust management in OSNs. Next, in order to explore this, we designed an online questionnaire to determine if our initial belief was well founded. The design and execution of the survey is presented in section 3, followed by the findings in section 4. These findings have helped us identify issues, discussed in section 5, that users have with current trust management in OSNs. Finally, these identified issues have provided a backdrop for the prototyping of our solution that is currently underway and briefly described in section 6.

2 State of the Art

2.1 Trust – Definitions and Characteristics

Trust is an elusive notion that is hard to define; the term "trust" stands for a diversity of concepts depending on the person you approach. To some, trust is predictability, where evidence of one's reputation suggests a most-likely outcome; to others, trust is dependability, where one truly believes in and depends upon another; yet, to many, trust is simply letting others make decisions for you and knowing that they would act in your best interest. Several notable definitions of trust are presented below. Grandison and Sloman [Grandison & Sloman, 2000] defined trust as "the firm belief in the competence of an entity to act dependably, securely, and reliably within a specified context." Mui et al. [Mui et al., 2002] defined trust as "a subjective expectation an agent has about another's future behaviour based on the history of their encounters." Olmedilla et al. [Olmedilla et al., 2005] stated that "Trust of a party A to a party B for a service X is the measurable belief of A in that B behaves dependably for a specified period within a specified context (in relation to service X)."

In summary, trust cannot be defined by a single consensus; there is a wide and varied range of synonyms for trust, and the answer to "what is trust" cannot be easily provided. Hence, significant challenges are presented for modelling trust in the semantic Web. It is therefore important for us to concentrate on the core characteristics of trust [Golbeck, 2005; Dey, 2001], which remain true regardless of how trust is modelled.

Trust is Asymmetric. Trust levels between two parties are not necessarily identical in both directions. A may trust B 100%; however, B may not necessarily feel the same way about A, and may only trust A 50% in return, for example.

Arguably, trust can be transitive. Let us say that A and B know each other very well and are best friends, and B has a friend named C whom A has not met. Since A knows B well and trusts B's choices in making friends, A may trust C to a certain extent even though they have never met. Now let us say C has a friend named D whom neither A nor B knows well; A could find it hard to trust D.
Hence, it is reasonable to state that as the chain of links between nodes grows longer, the trust level decreases. However, others [Grandison, 2003; Abdul-Rahman, 2004] disagree and argue that trust is non-transitive. [Zimmermann, 1994] asks: if I have a good friend whom I trust dearly, who also trusts that the president would not lie, does that mean that I would therefore trust that the president would not lie either?

Trust is personalised. Trust is a subjective point of view; two parties can have very different opinions about the trustworthiness of the same person. For example, a nation may be divided into groups who strongly support the political party in charge and groups who strongly disagree.

Trust is context-dependent. Trust is closely associated with the overall context; in other words, trust is context-specific [Gray, 2006]. One may trust another enough to lend that person a pencil, but may find the person hard to trust with a laptop, for instance.

2.2 Current Trust Mechanisms in Online Social Networks

Current trust mechanisms used in OSNs have been limited to simple access control mechanisms, where authorization is required to contact a user and to write on or read all or part of a user's profile, given that blogging and commenting features are enabled. Communities in OSNs are usually categorized into groups, i.e., one's family, friends, neighbours, etc., with all or limited access to one's photos, blogs and other resources presented. In Bebo, for instance, a user can acquire a URL for his/her profile, which is then viewable by anyone with a browser, or he/she can set the profile "private", which means that only the friends connected to this user are authorized to view the profile and everything presented in it. In Yahoo! 360° [Yahoo!360], the access control mechanism is refined by letting users set their profiles and blogs viewable to the general public, their friends, friends of their friends, or just the users themselves. The site allows users the freedom to create specific friend categories, such as friends in work, friends met while travelling, etc. Users can then control whether they can be contacted via email or messenger by anyone in the Yahoo! 360° network, by people whom they are connected to, or only by those in the defined categories. In Facebook, the privacy settings of a profile are further refined by allowing the owner of a profile to grant different levels of access to sections of a profile such as contact information, groups, wall, photos, posted items, online status, and status updates. Also, users can decide whether they would like the search engine to list their profiles in search results, as well as whether they would like to notify friends of their latest activities. Finally, a user can select which parts of the profile are to be displayed to a person who tries to contact him/her through a poke, message, or friend request.

Among many notable OSNs, we have found that controlling access seems to be the only way to express trust, where users group their connections into categories and grant all or limited access to these specified categories. Studies [Ralph, Alessandro et al. 2005] of Facebook have shown that many people who are connected to a person are not necessarily "friends" as such, but simply people whom this person does not dislike. Hence, there is a great variety in the levels of trust among these connected "friends" of a person.
However, this variety of trust levels has not been captured in OSNs: users cannot annotate their varying trust in a person, nor can they personalise that trust depending on the situation. In some cases, we want private information to be known only by a small group of people and not by random strangers. Such information may be where you live, how much money you make, etc.; in an OSN environment, you probably would dislike the idea of letting random strangers read comments left by your friends detailing a trip you are about to take, for safety reasons. In other instances, we are willing to reveal personal information to anonymous strangers, but not to those who know us better. For example, if desired, one can state one's sexuality on a profile page and broadcast that to the world; however, one may not be ready to reveal that very piece of information to the family and friends whom one trusts most.

2.3 Related Work

Much research has been carried out in the field of computer science in relation to trust management, and various algorithms, systems and models have been produced, such as PGP [Zimmerman, 1995], REFEREE [Chu et al, 1997], SULTAN [Grandison et al, 2001], FOAF [Dumbill et al, 2002], TRELLIS [Gil et al, 2002], Jøsang's trust model [Jøsang A., 1996], Marsh's trust model [Marsh, 1994] and many more. In particular, a multi-faceted model of trust that is personalisable and specialisable [Quinn, 2006] has been designed in the Knowledge and Data Engineering Group (KDEG) [KDEG] of the Computer Science Department in Trinity College Dublin. While reviewing trust management systems in computer science, Quinn found that current methods "tend to use a single synonym, or definition in the use of trust… such approaches can only provide a generic, non-personalised trust management solution". To address this lack of potential for personalizing trust management, a multi-faceted model of trust that is both personalisable and specialisable was proposed, implemented and evaluated. In the proposed model, trust is divided into a concrete concept and an abstract concept, each with attributes of its own, where the former includes credibility, honesty, reliability, reputation and competency attributes, and the latter belief, faith and confidence attributes. Ratings are then given to each of the eight attributes, and trust is calculated as the weighted average of these ratings. The claim for this model is that it has "the ability to capture an individual's subjective views of trust, also, capture the variety of subjective views of trust that are exhibited by individuals over a large and broad population", which in turn provides "a tailored and bespoke model of trust". In addition to demonstrating its personalization capabilities, Quinn demonstrated how the model could be specialised to any application domain. The two applications that were used to trial the model and approach were web services composition and access control in a ubiquitous computing environment. However, Quinn did speculate in his conclusions that the model would be suitable for use in the OSN domain.

3 Survey Design and Execution

Given the lack of trust management features within OSNs and our belief that such features would be welcomed, we decided to explore with users whether Quinn's multi-faceted model of trust, which enables personalization and provides the freedom of annotating trust subjectively, would be welcomed in OSNs, and what the desired functionalities would be if such a trust model were integrated into OSNs.
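To make the aggregation step of this model concrete before turning to the survey, the following is a minimal sketch, in Python, of how an overall trust value could be derived as a weighted average of the eight attribute ratings named above. The 0-10 rating scale and the particular weights shown are illustrative assumptions, not values taken from Quinn's model.

```python
# Illustrative sketch only: overall trust as the weighted average of the eight
# attribute ratings of a multi-faceted trust model. The 0-10 scale and the
# example weights below are assumptions made for this sketch.

ATTRIBUTES = ["credibility", "honesty", "reliability", "reputation",
              "competency", "belief", "faith", "confidence"]

def overall_trust(ratings: dict, weights: dict) -> float:
    """Weighted average of the eight attribute ratings (each assumed 0-10)."""
    total_weight = sum(weights[a] for a in ATTRIBUTES)
    if total_weight == 0:
        raise ValueError("At least one attribute must carry a non-zero weight")
    return sum(ratings[a] * weights[a] for a in ATTRIBUTES) / total_weight

# Example: a rater who cares most about honesty and reliability.
ratings = {"credibility": 7, "honesty": 9, "reliability": 8, "reputation": 6,
           "competency": 5, "belief": 7, "faith": 6, "confidence": 7}
weights = {"credibility": 2, "honesty": 3, "reliability": 3, "reputation": 1,
           "competency": 1, "belief": 1, "faith": 1, "confidence": 1}

print(round(overall_trust(ratings, weights), 2))  # 7.38
```

Because the weights are supplied by the individual rater, the same attribute ratings can produce different overall trust values for different people, which is the kind of subjectivity the model is intended to capture.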
With these questions in mind, A Survey of Online Social Networks was designed. The questionnaire groups participants into three categories: people who are currently using OSNs, people who have used OSNs in the past but are no longer active, and people who have never used OSNs. With the former two categories, the survey aimed to find out user behaviour in relation to trust management in OSNs and to gather user experience with existing trust mechanisms. With the last category, we aimed to find out why some people have not or will not use OSNs. Most importantly, without excluding anyone, regardless of participants' experience with OSNs and current trust mechanisms, we asked for their desired trust features as well as their opinions on the multi-faceted model of trust.

A trial questionnaire was first designed and road tested in a computer science postgraduate class, where a group of thirteen people took part in the survey; this helped refine the official questionnaire. Considering flexibility, feasibility and ease of data gathering, an online questionnaire was convenient as we were aiming at a large audience; therefore, SurveyMonkey [SurveyMonkey] was chosen to host the online survey, which opened on the 27th of May 2007 and ran for a period of two weeks. Invitations to take part in the survey were sent out via email to targeted third level institutions in Ireland, and interested parties were encouraged to distribute the questionnaire further.

4 Findings

In total, 393 people took part in answering the online questionnaire. Of these, 59% were male and 41% were female. 68% of respondents were undergraduate students, 21% were postgraduate students, and the remainder were college employees. Most survey participants come from a science related background, with 70% of people either studying for or having a degree in engineering, computer science or information technology related fields.

4.1 Category One – Active OSN users

Among the 243 respondents who are currently using OSNs, the majority of profiles are set to be viewable by the general public, while less than 20% of people allow only directly linked friends to view their profiles, as Figure 1 shows.

[Figure 1: Access settings of user profiles – Category One]

We asked whether these users are happy with the available ways of controlling access to their profiles. As Figure 2 shows, most people are pleased with current access control methods, while around 20% of the respondents are not concerned with it and less than 10% of people are not pleased with it. Among the reasons given for their dissatisfaction, almost every comment from those 10% of people related to the lack of better access controls to user profiles. For example, many mentioned that in Bebo, despite having a private profile, others can still send emails to the profile owner.

[Figure 2: User satisfaction towards current access control methods]

Since the majority of this category has public profiles, we asked whether they trust random strangers to view their profiles, as well as whether access control really is necessary. As Figure 3 shows, despite having publicly viewable profiles, only 25% of these people actually stated that they do trust anyone and everyone to view their profiles.
Most people, however, claimed that they do not, while at the same time a large number of people are not bothered by it. We found a similar contradictory response regarding the necessity of access control in OSNs: as Figure 4 shows, less than 20% of these people think it is not necessary, while most people, nearly 55% of the respondents, believe that controlling access is necessary, and around 25% of people are not concerned.

[Figure 3: Would you trust random strangers to view your profile?]

[Figure 4: Is it necessary that only certain people can view certain parts of your profile?]

4.2 Category Two – No Longer Active OSN users

Of the 50 respondents in this category, during their memberships 46% had set their profiles accessible by anyone, as Figure 5 shows, while 26% allowed only directly linked people to view their profiles.

[Figure 5: Access settings of user profiles – Category Two]

When asked why they had stopped using OSNs, this category of people gave several interesting reasons. For instance, a lot of people lost interest in OSNs, sometimes due to an unpleasant personal experience, the completion of research or work related projects, or simply no longer having time for them. In our survey, 5% of people in category two view OSNs as a rather sad way of replacing real life associations, particularly since a lot of sites keep records of the number of visits a profile gets, turning OSNs into forms of popularity contests. However, at the same time, many acknowledge that OSNs are cheap alternatives for keeping updated with others, but believe that a refinement in their structure is needed. In particular, privacy concerns were top of the list, with individuals mentioning unpleasant experiences during their membership. For example, on some sites, comments left by close friends are displayed to everyone connected to an individual or, sometimes, to anyone with a browser; others complained of being contacted unwillingly by random strangers or by friends of a connected friend whom they barely knew. Unfortunately, the ways to stop these things from happening do not always seem to work, and distress and frustration had been caused by the limited methods that are available. When asked whether they think access control of profiles is necessary in OSNs, this group of people had a similar response to category one. Among the 47 participants who answered this question, 66% believe that it is necessary, only 6% disagree, and the remainder do not care.

4.3 Category Three – Not Users of OSNs as yet

We were interested to find out why this group of people have never used OSNs. Among the 57 respondents, some had no interest, some had no time, others dislike the idea of having private information on the Internet, and a small number of people have not heard of OSNs, as Figure 6 shows. Again, privacy concerns and the lack of freedom in controlling access to information were mentioned by the 21.05% of people who stated otherwise when answering this question.

[Figure 6: Why have you never used OSNs?]
Among 52 participants from this category, we asked whether they are likely to use OSNs in the future and whether they believe controlling access to profiles is necessary: 44% stated that they would start using OSNs in the future, 69% think it is necessary to control access, only 4% disagree and 27% say that they do not care.

4.4 Desired Trust Features and Opinions on a Proposed Solution

If a multi-faceted model of trust with the eight trust attributes (credibility, honesty, reliability, reputation, competency, belief, faith and confidence) were to be integrated into OSNs, would that be welcomed? Would ratings of these eight attributes of a person portray subjective views of trust in OSNs? With the aim of finding out more about our proposed solution, we asked our participants' views on desired trust features in OSNs as well as their feelings towards a rating feature. We asked 315 participants which of those eight attributes of trust are most important in their opinions; as Figure 7 shows, honesty appears to be the most important factor, closely followed by credibility and reliability, as well as reputation.

[Figure 7: Views on the eight attributes of trust]

When asked if a user would like to see the ratings given by others, 44% of participants said yes, 36% said no, and the remainder did not care. However, when asked whether they would like to rate others, 67% of people think it is unnecessary, only 9% of respondents believe that it would be helpful, another 10% of people do not care, and the remainder were unable to decide on the subject.

5 Analysis

Several issues were discovered during the survey, as discussed below.

Current trust mechanisms need to be refined. Most of the unpleasant experiences mentioned relate to a lack of, or unsatisfying, privacy control, while a large number of OSNs fail to allow users to express their various degrees of trust in a person, or a group of people, in a context-specific way. Hence, refinement of current trust mechanisms would be welcomed in OSNs.

Personalisation is not provided in current trust mechanisms. Users cannot currently personalise trust with their subjective views in OSNs; important trust characteristics, as mentioned in section 2, are not captured in OSNs. Even though trust levels vary among members of defined groups, users cannot adjust their levels of trust among their connected friends using the current trust mechanisms deployed in OSNs.

Users are unsure about a multi-faceted model of trust with rating features. Contradictory findings in relation to a trust rating feature suggest that, on one hand, users think that such facilities would help in gaining better control of online profiles, but on the other hand, they find it hard to rate someone they know personally. Such opinions could be the result of a lack of understanding of the proposed solution: for a large percentage of candidates, the word "rating" is very open to interpretation, and it would be hard for them to imagine what ratings could be like without any idea of how to go about assigning them. Also, we need to recognise the limitations of the questionnaire; the phrasing of the questions and the limited number of open-ended questions in the survey may have restricted the amount of quality data.
6 Current Work

In order to find out whether the proposed multi-faceted model of trust would truly satisfy user requirements regarding trust management in OSNs, implementation of a small scale OSN named miniOSN is currently in progress, powered by Ruby on Rails [RoR] and a trust management approach strongly influenced by Quinn's multi-faceted model of trust. miniOSN has the functionality of a basic online social networking website: it allows users to create accounts for themselves with a username, password and a valid email address. Users of miniOSN can then set up representations of themselves, upload photos, post blog entries, and leave comments on connected friends' profiles. The trust management approach implemented in miniOSN aims to capture the fundamental characteristics of trust found in the literature review and has the following main features:

• Each user holds ratings of his/her connected friends in the database, which are only viewable to this particular owner and can be adjusted at any time
• Ratings can be given to the credibility, honesty, reliability, reputation, competency, belief, faith and confidence attributes of a person
• The owner of a resource - be it a picture, a blog, or a comment - can set trust requirements before distributing that resource
• All users and resources have the highest ratings by default unless specified otherwise
• Users decide whether to transfer the same set of trust values to all other friends of a connected friend
• Users decide which connected friends should start with what ratings

Profile owners can then express trust in a personalised manner, adjusting minimum trust rating requirements when granting access to certain resources in their profiles. For example, a family member with a high rating in honesty but a low rating in competency cannot read a certain blog entry, while a work colleague with high ratings in reputation and competency but a low rating in reliability cannot see a particular group of photos. Evaluating such an OSN integrated with the multi-faceted model of trust is part of our continuing research agenda.
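To illustrate the behaviour just described, the sketch below gives a minimal, hypothetical version of the access check in Python (miniOSN itself is built with Ruby on Rails, and the function and variable names here, such as meets_requirements, are invented for illustration): a resource owner's minimum per-attribute requirements are compared against the ratings held for a connected friend, with unrated attributes defaulting to the highest rating, as listed above.

```python
# Hypothetical sketch of the trust check described for miniOSN (not the actual
# implementation, which is built with Ruby on Rails). A resource owner sets
# minimum ratings per attribute; a friend may view the resource only if every
# required attribute rating they hold meets or exceeds the requirement.

MAX_RATING = 10  # assumed: all users and resources start at the highest rating by default

def meets_requirements(friend_ratings: dict, resource_requirements: dict) -> bool:
    """Return True if the friend's ratings satisfy every minimum set on the resource."""
    for attribute, minimum in resource_requirements.items():
        # Unrated attributes default to the highest rating, per the rules above.
        if friend_ratings.get(attribute, MAX_RATING) < minimum:
            return False
    return True

# A family member rated highly for honesty but low for competency cannot read
# a blog entry that requires competency of at least 7.
family_member = {"honesty": 9, "competency": 3}
blog_requirements = {"honesty": 6, "competency": 7}
print(meets_requirements(family_member, blog_requirements))  # False

# A colleague with high reputation and competency but low reliability cannot
# see photos that require reliability of at least 6.
colleague = {"reputation": 8, "competency": 9, "reliability": 2}
photo_requirements = {"reliability": 6}
print(meets_requirements(colleague, photo_requirements))  # False
```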
References

[43Things] 43Things website, http://www.43things.com
[Abdul-Rahman, 2004] Abdul-Rahman, A., (2004). "A Framework for Decentralised Trust Reasoning", Ph.D. thesis, University of London, UK.
[aSmallWorld] aSmallWorld website, http://www.asmallworld.net
[Bebo] Bebo website, http://www.bebo.com
[CarDomain] CarDomain website, http://www.cardomain.com
[Chu et al, 1997] Chu, Y., Feigenbaum, J., LaMacchia, B., Resnick, P., and Strauss, M., (1997). "REFEREE: Trust Management for Web Applications", The World Wide Web Journal, 2(3), pp. 127-139.
[Dey, 2001] Dey, A., (2001). "Understanding and Using Context", Personal and Ubiquitous Computing, 5(1): 4-7.
[Doostang] Doostang website, http://www.doostang.com
[Dumbill et al, 2002] Dumbill, E., (2002). "XML Watch: Finding friends with XML and RDF", IBM Developer Works, June 2002. Last retrieved from http://www-106.ibm.com/developerworks/xml/library/xfoaf.html
[Ecademy] Ecademy website, http://www.ecademy.com
[Facebook] Facebook website, http://www.facebook.com
[Friends Reunited] Friends Reunited website, http://www.friendsreunited.com
[Friendster] Friendster website, http://www.friendster.com
[Gil et al, 2002] Gil, Y., Ratnakar, V., (2002). "Trusting Information Sources One Citizen at a Time", Proceedings of the First International Semantic Web Conference (ISWC), Sardinia, Italy, June 2002.
[Golbeck, 2005] Golbeck, J. A., (2005). "Computing and Applying Trust in Web-Based Social Networks", Ph.D. thesis, University of Maryland.
[Graduates] Graduates website, http://graduates.com
[Grandison, 2003] Grandison, T., (2003). "Trust Management for Internet Applications", Ph.D. thesis, University of London, UK.
[Grandison et al, 2001] Grandison, T., Sloman, M., (2001). "SULTAN - A Language for Trust Specification and Analysis", Proceedings of the 8th Annual Workshop of the HP OpenView University Association (HP-OVUA), Berlin, Germany, June 24-27, 2001.
[Grandison & Sloman, 2000] Grandison, T., and Sloman, M., (2000). "A survey of trust in internet applications", IEEE Communications Surveys and Tutorials, 4(4): 2-16.
[Gray, 2006] Gray, E. L., (2006). "A Trust-Based Management System", Ph.D. thesis, Department of Computer Science and Statistics, Trinity College, Dublin.
[Hi5] Hi5 website, http://www.hi5.com
[Jøsang A., 1996] Jøsang, A., (1996). "The right type of trust for distributed systems", Proceedings of the 1996 Workshop on New Security Paradigms, Lake Arrowhead, California, United States, ACM Press.
[KDEG] Knowledge and Data Engineering Group website, http://kdeg.cs.tcd.ie
[LinkedIn] LinkedIn website, http://www.linkedin.com
[Marsh, 1994] Marsh, S., (1994). "Formalising Trust as a Computational Concept", Ph.D. thesis, Department of Mathematics and Computer Science, University of Stirling.
[MOG] MOG website, http://mog.com
[Mui et al., 2002] Mui, L., Mohtashemi, M., and Halberstadt, A., (2002). "A computational model of trust and reputation", Proceedings of the 35th International Conference on System Science, pp. 280-287.
[MySpace] MySpace website, http://www.myspace.com
[Olmedilla et al., 2005] Olmedilla, D., Rana, O., Matthews, B., and Nejdl, W., (2005). "Security and trust issues in semantic grids", Proceedings of the Dagstuhl Seminar, Semantic Grid: The Convergence of Technologies, volume 05271.
[Plaxo] Plaxo website, http://www.plaxo.com
[Quinn, 2006] Quinn, K., (2006). "A Multi-faceted Model of Trust that is Personalisable and Specialisable", Ph.D. thesis, Department of Computer Science and Statistics, Trinity College, Dublin.
[Ralph, Alessandro et al. 2005] Ralph, Alessandro et al., (2005). "Information revelation and privacy in online social networks", Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society, Alexandria, VA, USA, ACM Press.
[RoR] Ruby on Rails project homepage, http://www.rubyonrails.org
[SurveyMonkey] SurveyMonkey website, http://www.surveymonkey.com
[USA Today, 2006] USA Today, (2006). "YouTube serves up 100 million videos a day online", last retrieved from http://www.usatoday.com/tech/news/2006-07-16-youtubeviews_x.htm
[USENET] USENET website, http://www.usenet.com
[Vannevar, 1996] Vannevar, B., (1996). "As we may think", interactions, 3(2): 35-46.
[XING] XING website, http://www.xing.com
[Yahoo!360] Yahoo! 360° website, http://360.yahoo.com
[YouTube] YouTube website, http://youtube.com
[Zimmermann, 1994] Zimmermann, P., (1994). "PGP(tm) User's Guide", October 1994.
[Zimmerman, 1995] Zimmerman, P.R., (1995). "The Official PGP Users Guide", MIT Press, Cambridge, MA, USA.

Irish Legislation regarding Computer Crime
Anthony J. Keane
Department of Informatics, School of Informatics and Engineering, Institute of Technology Blanchardstown, Dublin 15
[email protected]

Abstract
Most people that use computers, whether for personal or work related activities, do so oblivious of the general legalities of their actions in terms of the enacted legislation of the State.
Of course, we all have a good idea of the obvious illegal activities, like using computers to commit criminal fraud or theft or to view child pornography, especially where high profile cases of these crimes appear in the news media from time to time. This paper is an overview of the Irish legislation regarding computer crime and examines how the wording of these laws is interpreted by the legal community when identifying what could be considered a computer crime.

Keywords: Computer Forensics, Computer Crime, Irish Law

1 Introduction

The Internet is a global communications system that allows easy access to resources and people. It has been adopted by business as a means of increasing their customer base and improving their ability to provide their services. Criminals have also adopted it as another means of committing crime. The mechanisms of the Internet are based on technology protocols, many of which are open standards and easily available, so with a little effort in educating oneself on the inner workings of the Internet, the criminal mind can conceive many imaginative ways of misrepresenting themselves and tricking the remote user into divulging their personal details, financial details, user access codes and passwords. Who hasn't received a spam email asking for bank details, offering to get large amounts of money for a small deposit, or promoting similar get-rich-quick schemes? Other tricks are not illegal but border on being so and are definitely unethical, such as the selling of so-called special drugs claimed to satisfy some social desire on the part of the customer.

The commercialisation of the Internet is a relatively recent phenomenon, and it is only since the late 1990s that companies have wanted to do business on the Web and users to communicate via email. It is estimated today that there are over one billion users with a presence on the Internet, and as such it has attracted the attention of criminals, who actively target Internet users every day to relieve them of their cash and identities. Their primary means of contacting individuals is by spamming users with email messages (MessageLabs[1] have reported detecting over 83% of email traffic as being spam). Other approaches involve hacking into networks, and as early as 1999 a number of high profile hacks were reported: the Hotmail email service was broken into so that user accounts could be accessed without their owners' passwords, the New York stock exchange was attacked, and Microsoft is constantly attacked, as are NASA and the Pentagon. Closer to home, a recent survey of Irish businesses was conducted, and according to the results of "The ISSA/UCD Irish Cybercrime Survey 2006: The Impact of Cybercrime on Irish Organisations" report[2], Irish organisations are significantly affected by cybercrime: virtually all (98%) of respondents indicated that they had experienced some form of cybercrime, with losses of productivity and data being the main consequences. One high profile attack targeted the Department of Finance, where the phone system was hijacked and used to run up a bill of thousands of euros over one weekend.

It is from this explosion of growth in computer use that a new field of computer science, called Computer Forensics, has emerged to deal with computer related crime.
Computer forensics was initially developed by law enforcement agencies such as the FBI[3], where techniques, tools and best practices were needed so that information relating to a crime could be extracted from computer storage devices and used as evidence in the prosecution of the case. Today the Computer Forensics field has many contributors, from academic research groups to professional companies specialising in security and forensics, as well as the law enforcement agencies. There is also a variety of proprietary application tool kits for analysing storage media, together with a growing array of free open source tools. There are three areas of demand for the services of a computer forensics professional: the criminal area, the corporate requirement, and the private/civil area. Here we look at the criminal area and concentrate on the legislation in force in Ireland that is available for the prosecution of computer related crimes. We ask the following questions: how is the legislation framed, and what computer related activities are considered illegal?

2 Irish Legislation

In Irish law there is no individual legislative Act that is specifically targeted at computer crime; instead, computer crime has been treated as an afterthought and incorporated into Acts whose primary focus is elsewhere. As such, most of the computer crime related offences can be found in section 5 of the Criminal Damage Act, 1991[4] and section 9 of the Criminal Justice (Theft and Fraud Offences) Act 2001[5].

2.1 The Criminal Damage Act 1991

Section 5 of the Criminal Damage Act 1991 states that:

(1) A person who without lawful excuse operates a computer—
(a) within the State with intent to access any data kept either within or outside the State, or
(b) outside the State with intent to access any data kept within the State,
shall, whether or not he accesses any data, be guilty of an offence and shall be liable on summary conviction to a fine not exceeding €634 or imprisonment for a term not exceeding 3 months or both.
(2) Subsection (1) applies whether or not the person intended to access any particular data or any particular category of data or data kept by any particular person.

We need to look at other sections of the Criminal Damage Act 1991 to get the definition of criminal damage and to see how it applies to data. The offence of criminal damage is identified in section 2, part 1, which states that "A person who without lawful excuse damages any property belonging to another intending to damage any such property or being reckless as to whether any such property would be damaged shall be guilty of an offence". In relation to data, to damage is defined as follows:

(i) to add to, alter, corrupt, erase or move to another storage medium or to a different location in the storage medium in which they are kept (whether or not property other than data is damaged thereby), or
(ii) to do any act that contributes towards causing such addition, alteration, corruption, erasure or movement.

Note also that the term "data" is defined as information in a form in which it can be accessed by means of a computer, and includes a program.

At first glance, the Criminal Damage Act appears to create an offence of unauthorised operation of a computer and unauthorised access to data. However, the wording of the Act is sufficiently loose to raise some comments from the legal community as regards its meaning and how it would apply in court.
The following points have been made by McIntyre[6] regarding the Criminal Damage Act:

• it creates an offence for the modification of any information stored on a computer, whether or not it has an adverse effect;
• the Act doesn't differentiate between less serious offences of unauthorised access and more serious offences of actual damage;
• it has undefined terms like "operate" and "computer";
• section 5 creates an offence of operating a computer without lawful excuse, but section 6 discusses lawful excuse in terms of authority to access data and not to operate a computer.

McIntyre uses two examples to illustrate the problems of interpreting the offence:

Example 1: Suppose that X sends an email to Y, which travels via Z's computer. X will be seen to have "operated" Y's and Z's computers since he has caused them to execute programs to deliver and process his email. If Y indicated that the email was unwelcome, then X could be charged with operating Y's computer without lawful excuse and be guilty of an offence under section 5 of the Act.

Example 2: X uses Y's computer without Y's permission, to access data he is entitled to access. This is unauthorised operation but not unauthorised access. This is an offence under section 5, but section 6 suggests that X has lawful excuse and so no offence has occurred. Conversely, X uses Z's computer with permission to access data he is not entitled to access. This is authorised operation but unauthorised access. This is not an offence under section 5, but with section 6 taken into account, an offence of unlawful access has occurred.

Unauthorised access to information is thus handled by the Criminal Damage Act, 1991, which is supposed to cover the possibility where a "hacker" has not committed any damage, fraud or theft but has tried or succeeded in gaining access to a computer system. When a system is damaged, then section 2 of the Criminal Damage Act, 1991 is used. This creates the offences of intentional or reckless damage to property. While the wording of the Act does not explicitly use computer terms like "virus", the offence could be applied to damage caused to a computer system by a virus or a similar computer generated attack. Reckless damage under section 2 carries penalties of a fine of up to €12,700, imprisonment for a term not exceeding 10 years, or both.

One of the problems with the legislation is the poor definition of computer terms, for example "data" and "computer"; the reason given for this approach is to prevent the legislation from becoming obsolete through the rapid advancement of technology. However, the range of meaning of data could lead to a scenario outlined by Karen Murray[7]: "The Criminal Damage Act 1991 has sought to avoid ambiguous definitions by avoiding a definition at all. This may have bizarre results; the human memory is undoubtedly a 'storage medium' for 'data'; if a hypnotist causes a person to forget something, have they committed criminal damage?" Murray argues that such vagueness may be subject to a Constitutional challenge in Ireland under the doctrine that "the principle that no one may be tried or punished except for an offence known to the law is a fundamental element of the Irish and common-law system and essential security against arbitrary prosecution".
In other words, "if there is no way of determining what the law is, there is no crime".

2.2 The Criminal Justice (Theft and Fraud Offences) Act 2001

Section 9 of the Criminal Justice (Theft and Fraud Offences) Act 2001 states that:

(1) A person who dishonestly, whether within or outside the State, operates or causes to be operated a computer within the State with the intention of making a gain for himself or herself or another, or of causing loss to another, is guilty of an offence.
(2) A person guilty of an offence under this section is liable on conviction on indictment to a fine or imprisonment for a term not exceeding 10 years or both.

The Criminal Justice Act 2001 followed the Electronic Commerce Act 2000 and provides for the offence of dishonestly operating, or causing to be operated, a computer. This was seen as a safeguard for the new age of electronic commerce. The term "dishonestly" is defined in section 2 of the Act as meaning "without a claim of right made in good faith". This definition and wording of the Act could be interpreted in a broad sense to mean that if someone honestly uses a computer, i.e. does so with a claim of right made in good faith, there is no offence no matter what they did with the computer. Kelleher-Murray[8] argues that the Act appears to cover almost any use of a computer which could be considered to be dishonest. McIntyre's[6] observations are as follows:

• The Act does not differentiate between the gain being dishonest or honest.
• If dishonesty is considered in a wider context, then the severity of the offence is not in line with similar offences committed without a computer; for example, the maximum penalty for the sale of copyright material out of a suitcase in the street is 5 years, while selling similar material over the Internet has a 10 year maximum penalty.
• The Act does not apply where a computer is misused for an improper purpose, for example the collection of information to commit a crime.

Many Irish businesses have suffered from Denial of Service (DoS) attacks, and T.J. McIntyre uses denial of service attacks as an example to show the difficulty in applying the Irish legislation to an actual offence. The difficulty lies with what law has actually been broken. In a denial of service attack, no unauthorised access has been made and no data or information has been moved or modified; the perpetrator has used his own computer, to which he has authorised access, and operated it honestly. McIntyre argues that an indirect prosecution may be possible where unauthorised access to, and criminal damage of, other computers used to commit the denial of service attack would apply, and he cites the UK case R v Aaron Caffrey (2003)[9] as an example of an unsuccessful attempt to obtain a prosecution for data modification under the UK Computer Misuse Act 1990.

3 Developments in Computer Law

Irish computer law is currently under review, mainly due to Ireland signing up to the European Convention on Cyber-Crime and the adoption of the Council Framework Decision. McIntyre expects new changes to Irish legislation over the next few years and urges the legislators not to adopt a minimalist approach to reform by tinkering around the edges of existing laws, but to give the area of computer crime the special attention it requires in its own right and engage in a comprehensive reform programme. However, any new computer crime laws should be balanced so as not to exclude research and testing of live systems, lest such work be taken as an attack and prosecuted as an offence.
An example of such tight legislation is the recent amendments to the UK Computer Misuse Act, which could possibly criminalise legitimate security researchers; guidelines are being urgently sought to clarify the law[10].
3.1 The European Convention on Cyber-Crime[11]
Since 1995 the EU has been trying to reach a consensus on how to tackle cross-border Internet-related criminal activities. In 2001 it finally reached agreement on what has become known as the Convention on Cybercrime. Ireland became a signatory in 2002, but the Convention only came into force on 1st July 2004. The cybercrime convention represents the first international attempt to legislate for cross-border criminal activity involving computers. The Convention covers offences against the confidentiality, integrity and availability of computer data and systems. It also covers the computer-related offences of forgery and fraud, content-related offences, and offences related to infringement of copyright and related rights, such as illegal file sharing. It also covers rules for the interception, collection, preservation and disclosure of computer data and information. In the broad definition of computer crime, the term cybercrime is viewed as a subcategory and is generally associated with the Internet. The Convention on Cybercrime covers the following three broad areas:
• All signatories must criminalise certain online activities. This will require changes to Irish legislation, since some of the offences do not currently exist in Irish law.
• States should require operators of telecommunications networks and ISPs to institute more detailed surveillance of network traffic and to support real-time analysis.
• States should cooperate with each other in investigations of cybercrime by allowing data to be shared among them, "but with an opt-out clause if investigations of its essential interests are threatened".
As the legislation reflects the needs of law enforcement rather than those of public interest groups, opponents of the Convention have cited the lack of privacy safeguards and the forced-cooperation clause as endangering the right to privacy for citizens in the EU.
3.2 Council Framework Decision 2005/222/JHA of 24 February 2005 on attacks against information systems[12]
The Council Framework Decision on attacks against information systems defines the following as punishable criminal offences:
• illegal access to information systems;
• illegal system interference (the intentional serious hindering or interruption of the functioning of an information system by inputting, transmitting, damaging, deleting, deteriorating, altering, suppressing or rendering inaccessible computer data);
• illegal data interference.
In all cases the criminal act must be intentional; instigating, aiding, abetting and attempting to commit any of the above offences will also be liable to punishment. The Member States will have to make provision for such offences to be punished by effective, proportionate and dissuasive criminal penalties.
3.3 Other Irish Laws of Interest
• The Child Trafficking and Pornography Act 1998 makes it an offence to traffic in children for sexual exploitation or to allow a child to be used for child pornography. It also makes it an offence to knowingly produce, distribute, print or publish, import, export, sell, show or possess an item of child pornography. The Act contains penalties of up to 14 years in prison.
The Act makes it an offence to participate in or facilitate the distribution of child pornography, an issue that, for Internet Service Providers in particular, may give rise to potential criminal liability under the Act.
• Irish data protection law is contained in the Data Protection Act 1988[13] and the Data Protection (Amendment) Act 2003, together with the EC Regulations 2003 (Directive 2000/31/EC) and the EC Electronic Privacy Regulations 2003 (SI 535/2003). The objective is to protect the privacy of an individual by controlling how data relating to that person is processed. The Data Protection Act creates an offence of gaining unauthorised access to personal data, and the data protection rules go well beyond any Constitutional or European Convention on Human Rights (ECHR) right to privacy.
• Right to Privacy: Computer intruders with less than honest motivations might not have much expectation of privacy, but the honest computer user would assume their privacy was protected by the law. It is interesting to note that the Irish Constitution does not explicitly protect this right; it gives only an implied right. The Supreme Court has ruled that an individual may invoke the personal rights provision in Article 40.3.1 to establish an implied right to privacy. This article provides that "The State guarantees in its laws to respect, and, as far as practicable, by its laws to defend and vindicate the personal rights of the citizens". The Irish Supreme Court recognised the existence of the right in the case of Kennedy and Arnold v. Ireland, where it ruled that the illegal wiretapping of two journalists was a violation of the Constitution, stating: "The right to privacy is one of the fundamental personal rights of the citizen which flow from the Christian and democratic nature of the State… The nature of the right to privacy is such that it must ensure the dignity and freedom of the individual in a democratic society. This can not be insured if his private communications, whether written or telephonic, are deliberately and unjustifiably interfered with."
The European Convention on Human Rights gives stronger protection to the individual's right to privacy. Under Article 8 of the Convention, "everyone has the right to respect for his private and family life, his home and correspondence". This was relied on in a recent UK case which highlighted the effectiveness of the Convention: an employee's email and Internet access was monitored in a college and, while she lost the case in the UK, she won it in Europe when the employer (the UK government) was found by the European Court of Human Rights to be in breach of the Convention and had to pay damages and legal costs. The ruling implies that employers can only monitor business communications, and not the private use of a telecommunications system, unless such monitoring has been provided for in an acceptable use policy. In September 2006 the Irish civil rights group Digital Rights Ireland (DRI) started a High Court action against the Irish Government challenging new European and Irish laws requiring mass surveillance. DRI Chairman TJ McIntyre said: "These laws require telephone companies and internet service providers to spy on all customers, logging their movements, their telephone calls, their emails, and their internet access, and to store that information for up to three years. This information can then be accessed without any court order or other adequate safeguard. We believe that this is a breach of fundamental rights.
We have written to the Government raising our concerns but, as they have failed to take any action, we are now forced to start legal proceedings". [14]
4 Concluding Comments
It is evident from this paper that cyber criminals and ordinary computer users can be prosecuted for computer crimes under various Acts, but the success of a case may depend on the interpretation of the law as applied to that particular crime, at that time: "Laws which are not specifically written to prohibit criminal acts using computers are rarely satisfactory"[15]. This review of computer crime articles and papers from members of the legal profession has shown that Irish computer legislation has been written in a manner that allows various interpretations to be taken, and suffers from an effort to remain sufficiently vague to encompass future technological crimes. The confusion could be resolved once the legislation is tried and tested in the courts, but the author was unable to find any examples of this in the Irish courts. Internationally there are many examples of computer crimes being prosecuted only to have the verdicts overturned at a higher level or through the European courts. Two examples, given in the appendices below, caught the author's attention and may be of interest to other academics: the Tsunami case and the Copland case. While these are not applications of Irish criminal law, they do demonstrate the type of restrictions that could be applied in the future if the reforms to Irish computer crime laws are not drafted in a skilled and knowledgeable fashion.
Appendix 1: Tsunami Case[16]
In 2005 a college lecturer, Daniel Cuthbert, donated money via a charity website to the relief effort for the Asian Tsunami disaster. He entered his personal and credit card details but, when he did not receive a response after a few days, he became concerned that he had given his details to a spoof phishing site. In an attempt to find out more about the site he carried out a couple of very basic penetration tests. Had these indicated that the site was insecure, he would have contacted the authorities; but after a few basic attempts failed to gain entry into the site he was satisfied that it was secure and assumed the site was legitimate. There were no warning messages showing that he had tried to access an unauthorised area, but he had triggered an internal intrusion detection system (IDS) at the company that ran the site and they notified the police. He was later arrested and prosecuted under the UK Computer Misuse Act 1990[17]. The relevant section of the Act, Section 1, states that a person is guilty of an offence if: "he causes a computer to perform any function with intent to secure access to any program or data held in any computer; the access he intends to secure is unauthorised; and he knows at the time when he causes the computer to perform the function that that is the case." Due to the wide scope of the Act, the Judge, with 'some considerable regret', had no option but to find Daniel Cuthbert guilty under the Computer Misuse Act 1990, and he was fined. While this is English law and we do not yet have an equivalent Irish case, it does highlight the care needed when performing a penetration test (ethical hacking) if you are to be confident that you are not acting illegally.
Appendix 2: Copland v UK Case[18]
Ms. Lynette Copland worked in a Welsh college as a personal assistant and discovered that the college deputy principal was secretly monitoring her telephone, email and internet use.
The College has no policy in place for informing employees that their communications might be monitored. She claimed that this amounted to a breach of her right to privacy under Article 8 of the European Convention on Human Rights[19] . The UK government admitted that monitoring took place, but claimed that this did not amount to an interference where there was no actual listening in on telephone calls or reading of emails. Although there had been some monitoring of the applicant’s telephone calls, e-mails and internet usage, this did not extend to the interception of telephone calls or the analysis of the content of websites visited by her. The UK Government argued that the monitoring thus amounted to nothing more than the analysis of automatically generated information which, of itself, did not constitute a failure to respect private life or correspondence. However, the European Court disagreed, holding that this monitoring and storage of details of telephone and internet use was itself an interference under Article 8. The Court considered that the collection and storage of personal information relating to the applicant’s telephone, as well as to her e-mail and internet usage, without her knowledge, amounted to an interference with her right to respect for her private life and correspondence within the meaning of Article 8. 19 References [1] Messagelabs; http://www.messagelabs.com/intelligence.aspx [2] “The ISSA/UCD Irish Cybercrime Survey 2006: The Impact of Cybercrime on Irish Organisations”, http://www.issaireland.org/cybercrime [3] FBI Computer Forensics; http://www.fbi.gov/hq/lab/fsc/backissu/oct2000/computer.htm [4] Criminal Damage Act 1991; http://www.irishstatutebook.ie/1991/en/act/pub/0031/index.html [5] Criminal Justice (Theft and Fraud) Act 2001 http://www.irishstatutebook.ie/2001/en/act/pub/0050/index.html [6] McIntyre T.J., “Computer Crime in Ireland”; Publ. in Irish Criminal Law Journal, vol. 15, no.1 2005 http://www.tjmcintyre.com/resources/computer_crime.pdf [7] Karen Murray, “Computer Misuse Law in Ireland”, May 1995 Irish Law Times 114 [8] D. Kelleher “Cracking down on the hack-pack” 23rd Oct 2000 Irish Times p8. [9] R v Aaron Caffrey 2003 http://www.computerevidence.co.uk/Cases/CMA.htm [10] Ethical hacker protection and security-breach notification law http://www.out-law.com/page-8374 [11] European Convention on Cybercrime http://conventions.coe.int/Treaty/en/Treaties/Html/185.htm [12] Council Framework Decision on Attacks against Information Systems http://europa.eu.int/eur-lex/en/com/pdf/2002/com2002_0173en01.pdf [13] Irish Data Protection Act 1988 http://www.dataprotection.ie/docs/Data_Protection_Act_1988/64.htm [14] Data Retention in Ireland by TJ McIntyre 2006 http://www.tjmcintyre.com/2007/02/data-retention-in-ireland-stealth-bad.html [15] Dennis Kelleher and Karen Murray, “Information Technology Law in Ireland” 1997 Dublin ; Sweet & Maxwell p.253 [16] Tsunami Case http://www.theregister.co.uk/2005/10/06/tsunami_hacker_convicted/ [17] Computer Misuse Act 1990 http://www.opsi.gov.uk/acts/acts1990/Ukpga_19900018_en_1.htm [18] Case of Copland v. The United Kingdom 2007 http://www.bailii.org/eu/cases/ECHR/2007/253.html [19] European Convention on Human Rights http://www.bailii.org/eu/cases/ECHR/2007/253.html 20 Distributed Computing for Massively Multiplayer Online Games Malachy O’Doherty 1 , Dr. 
Jonathan Campbell 2 1 Letterkenny Institute of Technology, [email protected] 2 Letterkenny Institute of Technology, [email protected] Abstract This paper discusses a novel approach to distributing work amongst peers in Massively Multiplayer Online Games (MMOGs). MMOGs cater for thousands of players each providing a node. Traditionally, the networking approach taken is that of client-server. This paper examines previous approaches taken to distribute server workload. Then by analysing the problem domain puts forward an approach which aims to maximise the amount of distribution, in a secure manner, by concentrating on distributing tasks as opposed to distributing data. Furthermore the approach uses techniques to reduce the instability caused by the wide variance in node latency which exists in this type of scenario. The result is that the client nodes act as processing nodes for specific job types. The central server becomes a job controller, receiving connections and organising which nodes are used, which of the tasks they perform and when. By doing this server bandwidth is reduced and hence costs. Keywords: Games, Distributed 1. Introduction Multiplayer games are becoming an ever more popular aspect of computer gaming. They come in different forms, LAN based (MMG), internet based i.e. online (MMOG), including the following different genres, role playing (MMORPG) and first person shooter (MMOFPS). Even online casinos are a form of multiplayer online game (e.g. http://www.casino.com/). The expansion of this sector of the market can be of no real surprise. From a commercial point of view it is of great benefit. It provides both the once-off profit from the purchase of the disc product and a continuing revenue stream from the monthly subscription fee. Added to this is the fact that for many recent games, (real) money can be earned in various ways. Some examples being: • By auctioning off artifacts, which exist in the game world, on the likes of eBayT M . This is external to the game. • Newer games such as Second LifeT M enable the creation of artifacts by the players in the game world and sale of such items. 21 • Latterly there is yet another possible revenue stream, commission on the exchange of real money to game world currency. The down side of providing the servers to host the game world is the extra cost. This includes the cost of the support staff, heating and electricity but especially the cost of the bandwidth needed to enable gameplay. One of the main reasons for the client-server architecture being used is the online gaming axiom “Never trust the client”. This comes about because of the fact that there always seems to be a proportion of the clients who will try to gain an unfair advantage over other players, that is cheat. By shifting as much of the workload as possible onto the client machines a reduction in the bandwidth requirement for the server can be achieved. It will also lead to the situation where the server(s) becomes more of a central control with little involvement in the gameplay. The client machines become a network of machines which collectively host the gameplay for the game world with the server acting as a job controller. This is ambitious given that: • the client should not be trusted • the makeup and size of the set of connected clients can vary quite dramatically • there is always the unreliability of the network to contend with 2. Review of Problem Area 2.1. Deployment Structures Currently most Massively Multiplayer Games (MMGs) follow the server client model. 
This is where a central server is the sole arbitrator as to the game state [4][7]. The rationale for this is an online gaming axiom: "never trust the client". As the scale of the game increases to massively multiplayer, one server is not powerful enough to cope with the required number of simultaneous clients. Thus many game worlds use a number of servers, each dedicated to a region within the game world, which collectively govern the whole game world. These server 'shards', as they are called [4], are networked together and keep themselves synchronized and consistent, acting as a grid [9]. It is important to note, however, that this grid acts as one server and so still follows the client-server model. In addition there is the effect that increasing participant numbers have on bandwidth. The more successful the game is, the greater the bandwidth requirement for the server becomes. This means an increasing cost base, which is a limiting factor for profits.
The goal of reducing server element numbers and bandwidth requirements has led to a number of approaches being tried in the past. They all look to achieve their aim by introducing varying degrees and methods of reducing the centralized nature of the client-server paradigm, the aim being to distribute one or more aspects of the multiplayer game, not to shards, but to the computers of players. A number of topologies have been explored: server-client (see Fig.1), cluster (under various names) and, lastly, fully distributed. The cluster approach is sometimes referred to as a tiered structure. It entails a number of client nodes being designated to adopt both the server and client roles. We will refer to such a node as a cluster server. Each cluster server is a client to either the main (central) server or to another cluster server, and acts as a server to a group of other clients. Fig.2 shows two clusters attached to the main server. This represents two tiers of client node: the nodes operating as client/server form one tier and those operating as client only a second tier. This approach is used with a slight twist in [2], in that each cluster is configured as a fully distributed subnetwork (see Fig.3), with any one of the nodes acting as the link to the main server. This may be a direct link or via another node. This hybrid structure should be able to reduce the server bandwidth requirement; however, the lower-level servers are in a trusted position, being the active connection to the central server. Since such a server is hosted on a client machine, this arrangement is susceptible to cheating as the game state can be tampered with easily. Peer-to-peer overlays, typically used for file sharing (for example BitTorrent™), have been applied in [9][11] and also in [8][5]. Typical problems with using peer-to-peer overlays are that they are susceptible to cheating and to churn (the process of many players logging on and off at the same time).
Figure 1. Network topology: server-client. Figure 2. Network topology: cluster. Figure 3. Network topology: full distribution.
One example of a game which is fully distributed is "Age of Empires"™ [13], which has the full game on each player's machine and uses a star configuration in the network to connect all the player machines to all the others (see Fig.3). This leads to a situation where every move/event which occurs in the gameplay is broadcast to every other machine. The consequence is that a lot of redundant information is being passed around the network and essentially the same work is being done on every machine.
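To see why the broadcast-everything arrangement scales poorly compared with the client-server model, a rough per-tick message-count comparison can be sketched as follows. This is an illustrative back-of-the-envelope calculation only, not a measurement from any of the cited systems, and the function name is ours.

```python
def messages_per_tick(players: int) -> dict:
    """Rough per-tick message counts for two topologies.

    Client-server: each player sends one update to the server and receives
    one aggregated state update back, so the total grows linearly.
    Full distribution: every event is broadcast to every other machine,
    so the total grows quadratically with the number of players.
    """
    return {
        "client_server": 2 * players,
        "fully_distributed": players * (players - 1),
    }

for n in (8, 100, 1000):
    print(n, messages_per_tick(n))
# At 1000 players the broadcast model needs roughly 999,000 messages per tick,
# versus 2,000 for client-server, which is why it precludes massive scale.
```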
Other interesting approaches to applying distributed technologies to the area of MMOGs have been explored. The idea of using software agents was considered by [4]. In this framework a player's character would have to move to another player's machine which is acting as a region server. The question (apart from those relating to bandwidth and latency) is: could a cheating player hijack the agent, or replace it with a suicidal version, before it moves back? After all, the region server is deemed to have the 'original' version of the agent. Ultimately, what is not addressed is the need for a secure environment for a mobile agent to work in, that is, a tamperproof region of memory and tamperproof access rights, such that the agent could confirm that the local application is unaltered and that data flow with it is secure.
2.2. Synchronization
All distributed systems are concerned with synchronization. Each networked machine has its own timers and will suffer from drift (i.e. the presented time will gradually vary from the true time due to inaccuracies in the time-keeping process). Thus even if a number of machines all start with the same time, over a period each will drift at a differing rate and lose synchronicity. Furthermore, latency makes it difficult (for obvious reasons) to, for instance, compare timestamps (if a high degree of accuracy is required). Many approaches to synchronization have been adopted: Lamport's clock (logical clock), vector clocks and matrix clocks. For a discussion of these see [3]. However, timekeeping for a multiplayer game is a rather special case. There is no need to keep track of an absolute time. Indeed, since one of the things a game should aim to achieve is a disassociation from reality and an attachment to the game world, the link with time is gone and is replaced with a sequence of events. This leads on to the next point, namely that each player does not have the same view of the game world. This is true not just because of position and viewing angle, but also because of latency. Each player 'sees' the result of their last update, which is the sum of the events which had been registered up to that point. Another player with a different latency on their connection will have their update occur at a slightly different point and thus with a different set of events that have been registered. It is because of this that it is said that clients 'see' only an approximation of the true game world. The aim of reducing this lack of consistency (and of avoiding some forms of cheating) led to the concept of "bucket synchronization" being put forward [6]. This is where game events received over a period of time are, metaphorically, placed in a bucket. The game then proceeds as a sequence of these buckets, thus providing synchronization, as a game event can be considered to have occurred at a point in the sequence. This, however, does not address the problem of cheating. In an attempt to address the issue of cheating, lockstep was introduced [1] and later modified to asynchronous synchronization [8]. Lockstep uses a two-phase commit approach: each node (player) first presents a hash of their move; then, once all have presented, each reveals the move. This allows verification that the move was not altered after the start of the reveal phase. Asynchronous synchronization is an extension of lockstep. It allows the use of the proximity of game characters (i.e. in the game world space) to decide which nodes must be involved in the lockstep process.
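To make the lockstep commit-reveal exchange concrete, the following is a minimal illustrative sketch using a SHA-256 hash. It is not taken from any of the cited systems; the salting step and the function names are our own additions, included only to keep small move spaces from being guessed from the commitment.

```python
import hashlib
import os

def commit(move: str) -> tuple[bytes, bytes]:
    """Commit phase: publish a salted hash of the move, keep the salt secret."""
    salt = os.urandom(16)                      # prevents guessing small move spaces
    digest = hashlib.sha256(salt + move.encode()).digest()
    return digest, salt                        # digest is broadcast, salt kept locally

def verify(move: str, salt: bytes, digest: bytes) -> bool:
    """Reveal phase: every peer checks the revealed move against the commitment."""
    return hashlib.sha256(salt + move.encode()).digest() == digest

# Usage: each player broadcasts commit(...) first; only after all commitments
# have been received do players broadcast (move, salt), and each peer verifies.
d, s = commit("move north")
assert verify("move north", s, d)      # honest reveal passes
assert not verify("move south", s, d)  # an altered move is detected
```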
These measures improved the situation and countered some cheats but is susceptible to others such as collusion. 2.3. Cheat Categorization When considering the requirements for any type of MMG it is important to consider how to ensure (as much as is feasibly possible) fair play. This can in a general way be considered as software security which is in turn linked to cryptography. One excellent text on this subject is [12] which shows how to approach the whole idea of security of data. This type of approach has been taken by [7] when considering the type of cheats typically used in games. Recategorizing these we can apply areas of susceptability: game logic; application; network. Now if we look more closely at these categories discussed in [7] and the type of cheats in each we see the following. 1. Game logic. This covers cheats such as when a player discovers that by using an item and dropping it at the same time they create a replica. While this is undoubtedly unfair to other players and may cost the game company lost revenue, it is actually a legal action in the game. The problem is that it was an unintentional ability that the game designer(s) never guarded against. Thus it is debatable if this is really a classification of cheat or of application error (i.e. designers should not blame the player for their mistakes). 2. Application. This is the home of the dedicated and knowledgable cheat. It covers activities such as decompiling the code and altering it to modify various attributes, for example making walls on the local node see-through, thus giving the cheat an advantage. It need not be so complicated a task. Some games give the player character a plain text file which specifies the character abilities which can easily be altered to the players advantage. 3. Network. Here we are concerned with types of cheat which do not depend on the particular game being played but rather on the interworking of the various nodes in the game. A number of these exist. (a) Infrastructure. Not really a cheat, this is where game play is disrupted by an attack on the basic infrastructure e.g. a denial of service attack. (b) Fixed-delay. Here the cheat delays their outgoing network packets. This results in the other players receiving events later and the cheat receiving the other player’s events and having some extra time to react. 24 (c) Time-stamp. Consider the situation where the cheat alters his local code so that when his character is, say, hit by a bullet when he is trying to dodge and the code changes the time-stamp on the movement event to be in advance of the movement of the bullet. When the other player nodes reconcile the event timings it will appear as if the cheat managed to dodge the bullet. (d) Suppressed update. This one enables a form of hiding. The cheat stops their gameplay updates from being sent. As far as the other player nodes are concerned they have no position and are not rendered. Of course the cheat has to ensure that periodic updates are sent to ensure they are not dropped from the game and that they can recommence standard updates when it suits them. (e) Inconsistency. Rather than skipping update messages this cheat involves sending false data to one or more players. This allows the cheat to, say, appear to be some distance to the left of their true position. The cheat has to be careful that the false data and true data merge at a later time in a graceful way or it will be obvious that something is amiss. (f) Collusion. 
Here a number of cheaters help each other by detecting specific information of interest and passing it on. This is useful to them when the receiver is not supposed to be privy to the information. For example, cheat A tells fellow cheat B the location of player C, although B should not know that given the current state of the game.
3. Design
At the low level of implementation all computer games follow (for general play) the same basic approach: they flow through a game loop. Generally, multiplayer games running over some sort of network have client code which follows a modified game loop as in Fig. 4a. Comparing this with the game loop of Fig. 4b, we can see that two extra phases have been introduced for the approach described here. In particular, the added activity "net job(s)" in this figure is key to this approach. It is the use of this game loop that allows the gameplay logic to be executed utilizing client processor cycles instead of just the server's. Having reviewed the available literature describing previous approaches, a number of considerations arise. First, the core idea of bucket synchronization (as described in section 2.2) is appropriate for games as it synergizes with the notion of animation. Next, it would seem that peer-to-peer mechanisms for persistency are insufficient for a gaming network where every node must be considered untrustworthy. Therefore game state persistency must remain the remit of a centralized server. A fully distributed architecture is only applicable to games where all of the players can accommodate the bandwidth requirements, which will increase proportionately with the number of players. This precludes massively multiplayer games. The degree of distribution possible inevitably depends on the number of players. Taking the extreme situation of having just two players connected, one might think that the workload could be shared between them, ensuring that each does the work pertinent to the other. But this does not guard against cheating, in particular collusion cheating, so in principle the server should not off-load any of the work unless there is an excess of player nodes available. This way it can rotate the work around. Considering this, the following fundamental design principles have emerged:
• A client node should not know which player the work relates to.
• A client node should not be given the same task repeatedly.
• A client node should not be given tasks pertaining to the same game world geographical region sequentially.
• A client node where calculations are done should not know the final destination of the results.
• Bucket synchronization should be used.
• Persistency of the game state is a responsibility of the main server.
• Distribution of the workload can only take place when there are more client nodes than roles to be assigned.
Figure 4. Game loops: (a) the typical game loop; (b) the game loop used in this approach. Figure 5. Frame participation sequences.
With the above in mind, the most basic decision is to utilize bucket synchronization [6]. Henceforth this will be referred to as frame synchronization. Having decided on a frame structure for synchronizing network updates, it is appropriate to look at how the requirement to allow divergent latencies can be accommodated. Looking at Fig. 5 one can see how different clients can opt into different frequencies of update; a simplified sketch of collecting events into frames is shown below.
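The sketch below illustrates the frame (bucket) synchronization idea just described: incoming events are grouped into fixed-length frames and applied together when a frame closes. It is a simplified illustration of the concept, not code from the prototype; the class name and the 100 ms frame length are assumptions of ours.

```python
from collections import defaultdict

FRAME_LENGTH_MS = 100  # assumed frame duration; the paper does not fix a value

class FrameBuffer:
    """Groups incoming game events into numbered frames (buckets)."""

    def __init__(self, start_ms: int):
        self.start_ms = start_ms
        self.frames = defaultdict(list)   # frame index -> list of events

    def add_event(self, recv_ms: int, event: dict) -> None:
        """File an event under the frame in which it was received."""
        frame = (recv_ms - self.start_ms) // FRAME_LENGTH_MS
        self.frames[frame].append(event)

    def close_frame(self, frame: int) -> list:
        """Return every event registered for a frame, in arrival order.
        The game advances one closed frame at a time, so an event is ordered
        by the frame it fell into rather than by a precise timestamp."""
        return self.frames.pop(frame, [])

# Usage: events arriving during frame 0 are applied together when frame 0 closes.
buf = FrameBuffer(start_ms=0)
buf.add_event(30, {"player": "A", "move": "north"})
buf.add_event(80, {"player": "B", "move": "fire"})
print(buf.close_frame(0))   # both events belong to frame 0
```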
It is important that whatever the rate of update opted for remains consistent, otherwise cheating is made more possible (it may be possible to allow reductions in the frequency of update if it is sanctioned by the server). Also there has to be a minimum number of clients in each frame so that the server can allocate jobs to minimise cheating potential. The potential disturbances caused by players leaving (either gracefully from the game or a dropped connection) needs to be compensated for and requires that each task be replicated. As an initial estimated figure the replication factor will be three. Three is chosen as it is the lowest number that provides both replication and can indicate a majority in the case of a disputed result (see [10] for a proof of this). 26 A case exists for using four as the minimum. This is based on the idea that in the case of three, if one node disconnects then only two remain and no majority decision can be made. In the case of using four as a minimum, one is designated as a backup. It performs the same calculations but is not included in the decision making. Furthermore the latency for each client node must be taken into consideration. For example only nodes with a relatively low latency should be allowed to undertake tasks. This effectively increases the number of players that are required to be connected before the server can distribute workload. The overall result is a hybrid model which starts out as server client but with increasing numbers of players will transform to a more distributed form. In the new scenario the server acts as a (login) gateway to the game and as a controller - allocating jobs/roles to varying client nodes. So then what type of jobs/roles are required. Well there has to be a role which is responsible for receiving player updates (e.g. character movement). Then there is a role which is responsible for calculating the effect of all the various updates. Lastly there is the task of informing all concerned players of the results of the cumulative updates. With these roles on the client nodes the server, in conjunction with the processes running these roles, will act as one computer. Note that what is of interest is the distribution of workload, thus no consideration has been given to the changeover from client-server to distributed server. 4. Conclusions The prototype which was developed demonstrates that it is possible to redeploy game processing from a centralized server structure to a more de-centralized structure. In order to quantify the performance of the overall system, it would be necessary to conduct a large scale simulation and/or mathematical analysis. This would need to measure the difference between the client-server approach and the approach outlined here in these regards: 1. Required server processing power. 2. Required client processing power. 3. Server bandwidth requirement. 4. Client bandwidth requirement. 5. Range of client bandwidths that can be accommodated. In each case the measurements would need to use the same game scenario and to be on the basis of a defined number of client nodes being connected. Indeed they should be repeated for each of a range of such deployments. As referred to in 2.1 the notion of using mobile agents [4] could be particularly useful for autonomous characters. These need to be controlled by the server even in the distributed model presented. With a secure environment to relocate to this responsibility could be de-centralized. References [1] N. E. Baughman and B. N. Levine. 
Cheat-proof playout for centralized and distributed online games. INFOCOM 2001, pages 104–113, 2001. [2] K. chui Kim, I. Yeom, and J. Lee. A hybrid mmog server architecture. IEICE TRANSACTIONS on Information and Systems, E87-D(12):2706–2713, December 2004. [3] G. F. Coulouris, J. Dollimore, and T. Kindberg. Distributed systems : concepts and design. Addison-Wesley, Wokingham, England ; Reading, Mass., 3rd edition, 2001. 27 [4] A. ElRhalibi and M. Merabti. Agents-based modeling for a peer-to-peer mmog architecture. ACM Computers in Entertainment, 3(2), April 2005. Article 3B. [5] C. Gauthier-Dickey, D. Zappala, and V. Lo. A fully distributed architecture for massively multiplayer online games. ACM SIGCOMM 2004 Workshop on Network and System Support for Games, pages 2706–2713, September 2003. Year of Publication: 2004. [6] C. Gauthier-Dickey, D. Zappala, and V. Lo. A fully distributed architecture for massively multiplayer online games. ACM SIGCOMM 2004 Workshop on Network and System Support for Games, pages 2706–2713, September 2003. Year of Publication: 2004. [7] C. Gauthier-Dickey, D. Zappala, V. Lo, and J. Marr. Low latency and cheat-proof event ordering for peer-to-peer games. Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video table of contents, June 2004. [8] A. S. John and B. N. Levine. Supporting p2p gaming when players have heterogeneous resources. Proceedings of the international workshop on Network and operating systems support for digital audio and video, pages 1–6, 2005. NOSSDAV’05. [9] B. Knutsson, H. Lu, W. Xu, and B. Hopkins. Peer-to-peer support for massively multiplayer games. Proceedings of IEEE INFOCOM’04, March 2004. [10] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, pages 382–401, July 1982. [11] H. Lu, B. Knutsson, M. Delap, J. Fiore, and B. Wu. The design of synchronization mechanisms for peer-to-peer massively multiplayer games. Technical Report Penn CIS Tech Report MS-CIS-04-xy.pdf, Computer and Information Science, University of Pennsylvania, 2004. [12] B. Schneier. Applied Cryptography. John Wiley & Sons, Inc, second edition, 1996. [13] M. Terrano and P. Bettner. 1500 archers on a 28.8: Network programming in age of empires and beyond. Proceedings of the 15th Games Developers Conference, March 2001. 28 A Comparative Analysis of Steganographic Tools Abbas Cheddad, Joan Condell, Kevin Curran and Paul McKevitt School of Computing and Intelligent Systems, Faculty of Engineering University of Ulster. Londonderry, Northern Ireland, United Kingdom Emails: {cheddad-a, j.condell, kj.curran, p.McKevitt}@ulster.ac.uk Abstract Steganography is the art and science of hiding data in a transmission medium. It is a sub-discipline of security systems. In this paper we present a study carried out to compare the performance of some common Steganographic tools distributed online. We focus our analysis on systems that use digital images as transmission carriers. A number of these systems exceptionally do not support embedding images rather they allow text embedding; therefore, we constrained the tools to those which embed images files. Visual inspection and statistical comparison methods are the main performance measurements we rely on. This study is an introductory part of a bigger research project aimed at introducing a robust and high payload Steganographic algorithm. Keywords: Steganography, Image Processing, Security Systems, Statistics. 
1 Introduction In the realm of this digital world Steganography has created an atmosphere of corporate vigilance that has spawned various interesting applications. Contemporary information hiding is due to the author Simmons [Simmons, 1984] for his article titled “The prisoners’ Problem and the Subliminal Channel”. More recently Kurak and McHugh [Kurak and McHugh, 1992] published their work which resembles embedding into the 4LSBs (Least Significant Bits). They discussed image downgrading and contamination which is known now as Steganography. Steganography is employed in various useful applications e.g., copyright control of materials, enhancing robustness of image search engines and Smart IDs where individuals’ details are embedded in their photographs. Other applications are videoaudio synchronization, companies’ safe circulation of secret data, TV broadcasting, Transmission Control Protocol and Internet Protocol (TCP/IP) packets [Johnson and Jajodia, 1998], embedding Checksum [Bender et al., 2000]…etc. In a very interesting way Petitcolas [Petitcolas, 2000] demonstrated some contemporary applications. One of these was in Medical Imaging Systems where a separation is considered necessary for confidentiality between patients’ image data or DNA sequences and their captions e.g., Physician, Patient’s name, address and other particulars. A link however, must be maintained between the two. Thus, embedding the patient’s information in the image could be a useful safety measure and helps in solving such problems. For the sake of providing a fair evaluation of the selected software tools, we restricted our experiments to embedding images rather than text. The location of the message in the image can vary. The message may be spread evenly over the entire image or may be introduced into areas where it may be difficult to detect a small change such as a complex portion in the image. A complex area is also known as an area of high frequency in which there are considerable changes in colour intensity. Embedding can be performed in the image spatial domain or in the frequency domain. Embedding in the spatial domain can be achieved through altering the least significant bits of the bytes of image pixel values. This process can be in a sequential fashion or in a randomised form. Algorithms based on this method have a high payload, however the method is fragile, prone to statistical attacks and sometimes visual attacks can suffice. The second type of 29 method, the frequency domain method, is based on the embedding in the coefficient in the frequency domain (i.e., Discrete Cosine Transformation (DCT), Discrete Wavelet Transformation (DWT)). This type of technique is more robust with regard to common image processing operations and lossy compression. Another type of method is that of adaptive Steganography which adapts the message embedding technique to the actual content and features of the image. These methods can for example avoid areas of uniform colour and select pixels with large local standard deviation. Edge embedding can also be used alongside adaptive Steganography. 2. Steganographic Tools The tools that we used for this study are detailed and discussed below. Sources are shown as necessary. 2.1 Hide and Seek (V. 4.1) Hide and Seek is one of the older methods of Steganography [Wayner, 2002]. It uses a common approach and is relatively easy to apply in image and audio files [Johnson and Katzenbeisser, 2000]. 
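Before looking at the individual tools in detail, the least-significant-bit substitution on which Hide and Seek and several of the other tools rely can be sketched as follows. This is a generic, minimal illustration of spatial-domain LSB embedding, not the code of any of the reviewed tools (which add headers, encryption and keyed dispersion); the function names and the flat greyscale pixel list are assumptions of ours.

```python
def embed_lsb(pixels: list[int], message_bits: list[int]) -> list[int]:
    """Replace the least significant bit of each pixel value with one message bit."""
    if len(message_bits) > len(pixels):
        raise ValueError("cover image too small for the message")
    stego = pixels[:]
    for i, bit in enumerate(message_bits):
        stego[i] = (stego[i] & ~1) | bit     # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(pixels: list[int], n_bits: int) -> list[int]:
    """Read the message back by collecting the LSB of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

# Usage with 8-bit greyscale values: at most one grey level of change per pixel.
cover = [120, 121, 37, 240, 15, 88, 201, 54]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, bits)
assert extract_lsb(stego, len(bits)) == bits
```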
Steganography by this method is carried out by taking the low order bit of each pixel and using it to encode one bit of a character. It creates some noise in the image unless a greyscale image is used. When using Hide and Seek with colour GIFs noise is very obvious. Although it has been asserted that greyscale GIFs do not display any of the artefacts or bad image effects associated with 8-bit colour images which have undergone Steganography, our experiment shows an obvious random salt and pepper like noise on the cover image. Hide and Seek can be used on 8-bit colour or 8-bit black and white GIF files that are 320 by 480 pixels in size (the standard size of the oldest GIF format) [Wayner, 2002]. There are 19200 (320*480/8) bytes of space available in this GIF image which gets rounded down in practice to 19000 for safe dispersion. In version 4.1 if the cover image is larger than allowed the stego-image is cropped or cut to fit the required size [Johnson et al, 2001]. When an image contains a message it should not be resized because if it has to be reduced part of the message bits will be lost. If the image is too small it is padded with black space. There is also a version 5.0. It works with a wider range of image sizes. However, this version of Hide and Seek also uses a restricted range of image sizes. The images must fit to one of these sizes exactly (320* 200, 320* 400, 320* 480, 640* 400 and 1024* 768) [Johnson et al, 2001]. In version 5 if the image exceeds the maximum allowed which is 1024*768 an error message is returned. If the image is smaller than the minimum size necessary the image containing the message is padded out with black space. The padded areas are added before the message is embedded and are therefore also used as areas in which to hide the message [Johnson et al, 2001]. But if the padded area is removed the message cannot be recovered fully. These characteristics of Hide and Seek stego-images lead searchers/crackers to the fact that a hidden message exists. Hide and Seek 1.0 for Windows 95 has no size limit restrictions and uses an improved technique for information hiding however it can still only be used on 8-bit images with 256 colours or greyscale [Johnson et al, 2001]. BMP images are used with this version instead of GIF images because of licensing issues with GIF image compression [Johnson et al, 2001]. A user chosen key can be inserted into a pseudo random number generator which will determine random number which indicate bytes in the image where the least significant bit is to be changed [Wayner, 2002]. This makes the system more secure as it has two layers of security. The positions in which the message bits are hidden are not in fact random but do follow some sort of pattern. An 8-byte header on the message controls how the message data is dispersed. The first two bytes indicate the length of the message. The second two are a random number key. The key is chosen at random when the message is inserted into the image. The key is firstly inserted into the random number generator [Wayner, 2002]. In Hide and Seek 4.1 there is a built in C code random number generator. A cryptographically secure random number generator could also be used to increase security or IDEA could be used to encrypt the random numbers using a special key. The third pair of bytes is the version of Hide and Seek used. The fourth pair of bytes is used to complete the eight byte block which is necessary for the IDEA cipher [Wayner, 2002]. 
The 8-byte block is encrypted using the IDEA cipher 30 which has an optional key and is then stored in the first 8 bytes of the image. If the key is not known the header information cannot be understood and the dispersion of the data in the image cannot be found. Stego-images will have different properties depending on the version of Hide and Seek used. In version 4.1 and version 5 all palette entries in 256 colour images are divisible by four for all bit values [Johnson et al, 2001]. Greyscale stego-images have 256 triples. They range in sets of four triples from 0 to 252 with incremental steps of 4 (0, 4, 8,…, 248, 252). This can be detected by looking at the whitish value which is 252 252 252. This signature is unique to Hide and Seek [Johnson et al, 2001], [Johnson and Jajodia, 1998]. Later versions of Hide and Seek do not produce the same predictable type of palette patterns as versions 4.1 and 5.0 [Johnson et al, 2001; Johnson and Jajodia, 1998]. The DOS command for the Hide and Seek software is as follows: x hide <infile.ext> <Cover.gif> [key] x seek <Stego.gif> <outfile.ext> [key] 2.2 S-Tools (V. 4) The S-Tools package was written by Andy Brown [Wayner, 2002]. Version 4 can process image or sound files using a single program (S-TOOLS.EXE). S-Tools involves changing the least significant bit of each of the three colours in a pixel in a 24-bit image [Wayner, 2002] for example a 24-bit BMP file [Johnson et al, 2001]. The problem with 24-bit images is that they are not commonly used on the web and tend to stand out (unlike GIF, JPEG, and PNG). This feature is not helpful to Steganography. It involves a pre-processing step to reduce the number of colour entries by using a distance measurement to identify neighbour colours in terms of intensity. After this stage each colour of the dithered image would be associated with two palette entries one of which will carry the hidden data. The software for S-Tools can reduce the number of colours in the image to 256 [Wayner, 2002]. The software uses the algorithm developed by Heckbert [Heckbert, 1982] to reduce the number of colours in an image in a way that will not visually disrupt the image [Wayner, 2002; Martin et al, 2005]. The algorithm plots all the colours in three dimensions (RGB). It searches for a collection of n boxes, which contains all of the colours in one of the boxes. The process starts with the complete 256*256*256 space as one box. The boxes are then recursively subdivided by splitting them in the best possible way [Wayner, 2002]. Splitting continues until there are n boxes representing the space. When it is finished the programme chooses one colour to represent all the colours in each box. The colour may be chosen in different ways: the centre of the box, the average box colour or the average of the pixels in the box. S-Tools as well as other tools based on LSBs in the spatial domain take for granted that least significant bits of image data are uncorrelated noise [Westfield and Pfitzmann, 1999]. The system interface is easy to use. It supports a drag and drop method to load images. Once the cover image is dragged in; the system will advise the user on how much data in bytes the image can hold. 2.3 Stella (V. 1.0) The Stella program derives its name from the Steganography Exploration Lab, which is located at the University of Rostock. Its embedding process exploits the visually low prioritised chrominance channels; the YUV-colour system is used. Here, the embedding algorithm considers only one channel and works as follows: 1. 
Consider the chrominance value of a given pixel. 2. Read a bit from the secret message. 3. To embed a “0”, decrease the chrominance value of the pixel by one. 4. To embed a “1”, increase the chrominance value of the pixel by one. 5. Go to the next pixel. It can be assumed that these slight changes of the chrominance values are smaller than a given JND (Just Noticeable Difference). 31 2.4 Hide in Picture (HIP) HIP (v 2.1) was created by Davi Tassinari de Figueiredo in 2002. Hide In Picture (HIP) uses bitmap images. If the file to be hidden is large, it may be necessary to modify more than a single bit (LSB) from each byte of the image, which can make this difference more visible. With 8-bit pictures, the process is a little more complicated, because the bytes in the picture do not represent colour intensities, but entries in the palette (a table of at most 256 different colours). HIP chooses the nearest colour in the palette whose index contains the appropriate least-significant bits. The HIP header (containing information for the hidden file, such as its size and filename) and the file to be hidden are encrypted with an encryption algorithm, using the password given, before being written in the picture. Their bits are not written in a linear fashion; HIP uses a pseudo-random number generator to choose the place to write each bit. The values given by the pseudo-random number generator depend on your password, so it is not possible for someone trying to read your secret data to get the hidden file (not even the encrypted version) without knowing the password. 2.5 Revelation Revelation was launched in 2005 by Sean Hamlin. It was entirely coded in Java, developed in the Eclipse IDE. It operates in the same manner as the previous methods in terms of LSB embedding. The basic logic behind their technique is the matching LSB coding which leaves a gray value not altered if its LSB matches the bit to be hidden. Otherwise a colour indexed as 2i will be changed to 2i+1 if the embedded bit is 1, or 2i+1 is shifted back to 2i in case of embedding a 0. Although the software’s author claimed the use of a smart embedding method and the Minimum Error Replacement (MER) algorithm to obtain a more natural Stego Image, the latter is prone to first order statistical attack as shown in Figure 3. 3 Steganalysis There are two stages involved in breaking a Steganographic system: detecting that Steganography has been used and reading the embedded message [Zollner et al, 1998]. Steganalysis methods should be used by the Steganographer in order to determine whether a message is secure and consequently whether a Steganographic process has been successful. Statistical attacks can be carried out using automated methods. A stego-image should have the same statistical characteristics as the carrier so that the use of a stenographic algorithm can not be detected [Westfield and Pfitzmann, 1999]. Therefore a potential message can be read from both the stego-image and the carrier and the message should not be statistically different from a potential message read from a carrier [Westfield and Pfitzmann, 1999]. If it were statistically different the Steganographic system would be insecure. Automation can be used to investigate pixel neighbourhoods and determine if an outstanding pixel is common to the image, follows some sort of pattern or resembles noise. A knowledge base of predictable patterns can be compiled and this can assist in automating the detection process [Johnson and Jajodia, 1998]. 
Steganalysis tools can determine the existence of hidden messages and even the tools used for embedding. Attacks on Steganography can involve detection and/or destruction of the embedded message. Multiple tools have been introduced to perform Steganalysis; among them are the Chi-Square statistical test, the ANOVA test, StegSpy (www.spy-hunter.com), StegDetect (the algorithm described in http://www.citi.umich.edu/u/provos/papers/detecting.pdf, with software available at http://www.outguess.org/), higher-level statistical tests (http://www.cs.dartmouth.edu/~farid/publications/tr01.html), etc. The methods of evaluation adopted in this study are as follows.
3.1 Visual Inspection of the Image
This evaluation method is based on visual inspection of the image. The question is how vulnerable the different Steganographic techniques are to detection through visual inspection of the image for telltale distortion.
3.2 Statistical Analysis
There are two main types of statistical analysis method investigated here for comparative analysis: the peak signal-to-noise ratio and image histograms. They are outlined below.
Peak-Signal-to-Noise Ratio: As a performance measurement for image distortion, the well-known Peak-Signal-to-Noise Ratio (PSNR), which is classified under the difference distortion metrics, can be applied to the stego images. It is defined as

PSNR = 10 \log_{10}\left( \frac{C_{\max}^{2}}{MSE} \right)    (1)

where MSE denotes the Mean Square Error, given as

MSE = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( S_{xy} - C_{xy} \right)^{2}    (2)

and C_{\max} holds the maximum value in the image, for example C_{\max} = 1 in double-precision intensity images and C_{\max} \le 255 in 8-bit unsigned integer intensity images; x and y are the image coordinates, M and N are the dimensions of the image, S_{xy} is the generated stego image and C_{xy} is the cover image. PSNR is often expressed on a logarithmic scale in decibels (dB). PSNR values falling below 30 dB indicate fairly low quality (i.e., the distortion caused by embedding can be obvious); a high-quality stego image should strive for 40 dB and above.
Image Histogram: Histograms are graphics commonly used to display the distribution of quantitative variables. In the case of an image, these variables (frequencies) are the image intensity values. In this study they allow us to trace any abnormalities in the stego image's histogram.
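The PSNR and MSE defined in Equations (1) and (2) can be computed directly, as in the sketch below. It is an illustration of the formulas only, not the authors' evaluation scripts, and it assumes 8-bit greyscale images stored as nested lists so that C_max = 255.

```python
import math

def mse(cover: list[list[int]], stego: list[list[int]]) -> float:
    """Mean Square Error between a cover image and its stego version (Eq. 2)."""
    m, n = len(cover), len(cover[0])
    total = sum((stego[x][y] - cover[x][y]) ** 2 for x in range(m) for y in range(n))
    return total / (m * n)

def psnr(cover: list[list[int]], stego: list[list[int]], c_max: int = 255) -> float:
    """Peak-Signal-to-Noise Ratio in decibels (Eq. 1); higher means less distortion."""
    error = mse(cover, stego)
    if error == 0:
        return float("inf")            # identical images
    return 10 * math.log10(c_max ** 2 / error)

# A stego image differing by one grey level in a single pixel scores very highly.
cover = [[10, 10], [10, 10]]
stego = [[10, 11], [10, 10]]
print(round(psnr(cover, stego), 2))    # roughly 54 dB, well above the 40 dB quality mark
```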
3.3 Comparative Analysis and Results
Table 1 tabulates the PSNR values produced by the aforementioned software when applied to the images shown in Figure 1. Figure 2 and Figure 3 show the output of each of the tools, and Figure 4 depicts the pair effect that appears on stego images generated by the Revelation software. The authors suspect that the pair effect was caused by adopting the sequential embedding concept, which creates new colours close to the existing ones or reduces the frequency difference between adjacent colours.
Figure 1: Images used to generate Table 1. (Left to right) Set A: cover image Boat (321x481) and the secret image Tank (155x151). Set B: cover image Lena (320x480) and secret image (77x92).
Table 1: Summary of the performance of the different software tools reported in this study (PSNR for Set A and Set B, with visual inspection notes).
Software | PSNR Set A | PSNR Set B | Visual Inspection
Hide and Seek | 18.608 | 22.7408 | Very clear grainy noise in the stego image, which renders it the worst performer in this study.
Hide-in-Picture | 23.866 | 28.316 | Little noise; accepts only 24-bit BMP files. Creates additional colour palette entries: in this case the original Boat image has 32 colours and the generated stego image augmented the number to 256 by creating new colours.
Stella | 26.769 | 16.621 | Little noise; works only with 24-bit images.
S-Tools | 37.775 | 25.208 | No visual evidence of tampering.
Revelation | 23.892 | 24.381 | No visual evidence of tampering, but a pair effect appears on the histogram of some outputs.
Figure 2: Set A. Stego images of each tool (Hide and Seek, Hide-in-Picture, Stella, S-Tools, Revelation) and the original.
Figure 3: Set B. Stego images of each tool (Hide and Seek, Hide-in-Picture, Stella, S-Tools, Revelation) and the original.
Figure 4: Revelation leaves a very obvious pair effect on the histogram (frequency against grey value). (Top) Original image histogram; (bottom) histogram of the stego image generated by the Revelation tool.
4. Conclusion
We have presented a comparative study of some Steganographic tools distributed online. The Revelation tool seems to do a good job of hiding any visual tampering on the cover image, but the histogram of its generated output reveals traces left by the tool which draw suspicion to the stego image. Hide-In-Picture earned a good PSNR value but the cover image was distorted slightly by the embedding. Hide and Seek shows very clear grainy salt-and-pepper noise on the stego image; the noise appears random. S-Tools shows better performance taking into consideration the two factors (i.e., PSNR and visual inspection). Stella leaves prints at the end of the stego file which can be picked up easily by some Steganalysis software, such as the open-source application we tried (ImageHide Hidden Data Finder v0.2; see "Internet Resources" in the References section). None of the above tools can resist image compression, which is why all their input image files were of a lossless type (i.e., BMP, GIF). The authors affirm that the S-Tools algorithm has the highest performance and that its software provides a better graphical interface. It is worth noting here that Steganographic tools have undergone significant improvement by exploiting the JPEG compression method. For example, the F5 and OutGuess algorithms (see references) are strong enough to resist major attacks, and their resulting stego images are of high quality (i.e., high PSNR). This study is intended to evaluate tools that act in the spatial domain; thus discussion of F5 and OutGuess is outside the scope of this evaluation.
References
[Simmons, 1984] Simmons, G. J., (1984). The Prisoners' Problem and the Subliminal Channel. Proceedings of CRYPTO83 - Advances in Cryptology, August 22-24, 1984. pp. 51-67.
[Kurak and McHugh, 1992] Kurak, C. and McHugh, J., (1992). A cautionary note on image downgrading. Proceedings of the Eighth Annual Computer Security Applications Conference, 30 Nov-4 Dec 1992. pp. 153-159.
[Johnson and Jajodia, 1998] Johnson, N. F. and Jajodia, S., (1998). Exploring Steganography: Seeing the Unseen. IEEE Computer, 31 (2): 26-34, Feb 1998.
[Bender et al., 2000] Bender, W., Butera, W., Gruhl, D., Hwang, R., Paiz, F.J. and Pogreb, S., (2000). Applications for Data Hiding. IBM Systems Journal, 39 (3&4): 547-568.
[Petitcolas, 2000] Petitcolas, F.A.P., (2000). "Introduction to Information Hiding". In: Katzenbeisser, S. and Petitcolas, F.A.P. (eds.) (2000) Information Hiding Techniques for Steganography and Digital Watermarking. Norwood: Artech House, INC.
[Wayner, 2002] Wayner, P. (2002). Disappearing Cryptography. 2nd ed. USA: Morgan Kaufmann Publishers.
[Johnson and Katzenbeisser, 2000] Johnson and Katzenbeisser, (2000). "A survey of Steganographic techniques". In: Katzenbeisser, S. and Petitcolas, F.A.P. (eds.) (2000) Information Hiding Techniques for Steganography and Digital Watermarking.
Norwood: Artech House, INC. [Johnson et al., 2001] Johnson Neil F., Zoran Duric, Sushil Jajodia, Information Hiding, Steganography and Watermarking - Attacks and Countermeasures, Kluwer Academic Publishers, 2001. [Heckbert, 1982] Heckbert Paul, Colour Image Quantization for Frame Buffer Display. In Proceedings of SIGGRAPH 82, 1982. [Martin et al, 2005] Martin, A., Sapiro, G. and Seroussi, G., (2005). Is Image Steganography natural?. IEEE Trans on Image Processing, 14(12): 2040-2050, December 2005. [Westfield and Pfitzmann, 1999] Andreas Westfield and Andreas Pfitzmann, Attacks on Steganographic Systems Breaking the Steganography Utilities EzStego, Jsteg, Steganos and S-Tools and Some Lessons Learned. Dresden University of Technology, Department of Computer Science, Information Hiding, Third International Workshop, IH'99 Dresden Germany, September / October Proceedings, Computer Science 1768. pp. 61- 76, 1999. [Zollner et al, 1998] Zollner J., H. Federrath, H. Klimant, A. Pfitzmann, R. Piotraschke, A. Westfeld, G. Wicke, G. Wolf, Modelling the Security of Steganographic Systems, Information Hiding, Second International Workshop, IH'98 Portland, Oregon, USA, Proceedings, Computer Science 1525. pp. 344354, April 1998. Internet Resources [Hide and Seek]: ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/cypherpunks/steganography/hdsk41b.zip [S-Tools]: ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/s-tools4.zip [Stella]: http://wwwicg.informatik.uni-rostock.de/~sanction/stella/ [Hide in Picture]: http://sourceforge.net/projects/hide-in-picture/ [Revelation]: http://revelation.atspace.biz/ [ImageHide Hidden Data Finder]: http://www.guillermito2.net [F5]: http://wwwrn.inf.tu-dresden.de/~westfeld/f5.html [OutGuess]: http://www.outguess.org/ 37 Session 2 Computing Systems 39 A Review of Skin Detection Techniques for Objectionable Images Wayne Kelly1, Andrew Donnellan1, Derek Molloy2 1 Department of Electronic Engineering, Institute of Technology Tallaght Dublin, Tallaght, Dublin 24, Ireland. [email protected] 2 School of Electronic Engineering, Dublin City University. [email protected] Abstract With the advent of high speed Internet connections and 3G mobile phones, the relative ease of access to unsuitable material has become a major concern. Real time detection of unsuitable images communicated by phone and Internet is an interesting academic and commercial problem. This paper is in two parts. Part I compares and contrasts the most significant skin detection techniques, feature extraction techniques and classification methods. Part II gives an analysis of the significant test results. This paper examines twenty-nine of the most recent techniques along with their specific conditions, mathematical foundations and their pros and cons. Finally, this paper concludes by identifying future challenges and briefly summarizes the proposed features of an optimal system for future implementation. Keywords: Skin Detection, Objectionable Image, Feature Extraction, Texture Analysis. 1 Introduction In early 2004 the Irish government demanded that its mobile phone networks must take responsibility for material transmitted across their systems and implement security precautions to prevent the distribution of objectionable material to minors. This was after two cases concerning the transmission of pornographic images to teenagers. The first incident was when sexually explicit images, showing a 14 year old girl, were found to be circulating amongst school students [43]. 
The second, when a teenage girl received pornographic images from an unidentified phone number [42]. The development of objectionable image detection systems has been instigated throughout the world by events such as the incidents which took place in Ireland in 2004. The identification processes generally follow the format, outlined in Figure 1, with each paper varying in its implementation technique. The structure of this report is as follows: Section 2 gives a comparison of the most significant skin detection techniques, while Section 3 and 4 give comparisons of the feature extraction techniques and classification methods respectively. Section 5 analyses the most significant test results. In Section 6 a discussion on future challenges and a summarized proposal for an implementation technique is given. Objectionable Image Input Image Skin Detection Feature Extraction Image Classification Benign Image Figure 1: General Objectionable Image Detection System 40 2 Skin Detection The detection of skin is an indication of the presence of a human limb or torso within a digital image. In recent times various methods of identifying skin within images have been developed. This section gives an overview of the main skin detection methods implemented for the detection of objectionable images. 2.1 Colour Spaces for Skin Detection Colour space can be described as the various ways to mathematically represent, or store, colours. Choosing a colour space for skin detection has become a contentious issue within the image processing world. Shin [32] found that colour space transformation was unnecessary in skin detection as RGB gave the best results and that the luminance component gave no improvement so could be ignored. Jayaram [35] found that the best performance was obtained by converting the pixels to SCT, HSI or CIE-LAB and using the luminance component did improve the results. Albiol [36] declares that if an optimum skin detector is designed for every colour space, then their performance will be the same. Gomez [34] states that for pixel based skin detection there is seldom an appropriate colour model for indoor and outdoor images, but does show that a combination of colour spaces can improve the performance (E of YES, red/green and H of HSV). 2.1.1 Basic Colour Space (RGB, Log-opponent RGB) One of the most commonly used methods for representing pixel information of a digital image is the RGB (Red, Green, Blue) colour space. In this colour space levels of red, green and blue light are combined to produce various colours. Jones and Rehg [5] identified 88% of pixels correctly while using RGB for simplicity and speed, as most web images use RGB colour space. It is also stated here that the accuracy could be increased if another colour space was used. RGB colour space has been used extensively in the detection of objectionable images [15][16][21]. Another form of RGB colour space is log-opponent RGB (IRgBy) [31] which is logarithmic transform of the RGB colour space. IRgBy relies on the blood (red) and melanin (yellow, brown) properties of human skin for detection, which is based around hue and saturation components of the colour space [1]. IRgBy does not contain a hue or saturation component so it must be calculated separately. As HSV and HSI colour spaces contain these components the IRgBy colour space has been largely ignored as a skin detection colour space but has been used in objectionable image detection and has showed poor results (70% accuracy with colour histogram) [7]. 
Comparisons of its skin detection capabilities has been made by Chan [3] showing it to give less accuracy compared to HSV. 2.1.2 Perception Colour Space (HSV, HSI) The HSV (Hue, Saturation, Value), also referred to as HSB (Hue, Saturation, Brightness), colour space is a nonlinear transform of RGB and can be referred to as being a perceptual colour space due to its similarity to the human perception of colour. Hue is a component that describes pure colour (e.g. pure yellow, orange or red), whereas saturation gives a measure of the degree to which a pure colour diluted by white Light [33]. Value attempts to represent brightness along the grey axis (e.g. white to black), but as brightness is subjective it is therefore difficult to measure [33]. Along with RGB, HSV is one of the most commonly used colour spaces for skin detection [10], although sometimes said to give better results [3][6][25]. Q. Zhu et al [20] also notes that dropping the Value component and only using the Hue and Saturation components, can still allow for the detection 96.83% of the skin pixels. HSI (Hue, Saturation, Intensity), also referred to as HSL (Hue, Saturation, Luminance), is another perceptual colour space that gives good skin detection results. Like Value in HSV, Intensity is another representative of grey level, but decoupled from the colour components (Hue and Saturation). The 41 HSI colour space was used by Wang [24] as part of a content-based approach, stating that the skin and background pixels can be better differentiated using HSI rather than RGB. 2.1.3 Orthogonal Colour Space (YCbCr, YIQ, YUV) Often associated with digital videos the YCbCr colour space is one of the most popular colour spaces for skin detection [17][18][29]. YCbCr is a colour space where the luminance (Y) component and the two chrominance components (Cb Cr) are stored separately. Luminance is a representation of brightness in an image and chrominance defines the two attributes of a colour hue and saturation. Y is another representation of brightness and is obtained with a weighted sum of RGB, whereas Cb and Cr are obtained by subtracting the luminance (Y) from the Blue and Red components of RGB [29]. Due to the fact that the luminance and chrominance components are stored separately, YCbCr is greatly suited to skin detection and Shin [32] found that YCbCr gives the best skin detection results compared to seven other colour space transformations. YUV and YIQ are colour spaces normally associated with television broadcasts however they have been used in digital image processing. Similar to YCbCr the three components are in the form of one luminance (Y) and two chrominance (UV or IQ), where IQ and UV represent different coordinate systems on the same plane. Although both colour spaces have been used independently for skin detection [30], a combination of both YUV and YIQ together is used in objectionable image detection [4][8], giving poor results compared to the original RGB colour space. 2.2 Skin Detection by Colour Pixel colour classification can be complicated and there have been many suggested methods for classifying pixels as skin or non-skin colour in an attempt to achieve the optimum performance. Fleck et al [1] says that skin colours lie within a small region (red, yellow and brown) of the colour spectrum regardless of the ethnicity of the person within an image. Although this is a small region within the colour spectrum, it also incorporates other, easily identifiable, non-skin objects such as wood. 
Furthermore, human skin under a significant amount of light can appear as a different colour. Colour detection methods can be classed as physical based, parametric or non-parametric. The choice of colour space can greatly affect the performance of both the physical based and parametric approaches; however, the influence of the colour space choice is said to reduce greatly in the non-parametric approaches [36][37].

2.2.1 Physical Based Approaches
Using explicit threshold values in a colour space to detect skin is one of the most simplistic ways of detecting skin pixels. A physical based approach using thresholds is often referred to as a colour model. This is the creation of parameters to stipulate the values a pixel can take if it is to be considered as skin. For example, Jiao et al [4] found that 94.4% of adult images could be detected using only the YUV and YIQ colour spaces, in which a pixel can be considered to be skin if

(20 ≤ I ≤ 90) ∩ (100 ≤ θ ≤ 150)    (Eq.1)

where

θ = tan⁻¹( |V| / |U| ).    (Eq.2)

This method of skin detection can be used with a single colour space [1][17] for simplicity or multiple colour spaces [7][24] to increase accuracy. Related to the explicit threshold is the skin probability ratio (also known as skin likelihood). This is where a pixel is classified as skin using various probability theories to create a skin likelihood map. Ye et al [9] use Bayes' theorem to reduce the effect of variations in light while detecting skin.

2.2.2 Parametric Approaches
As previously discussed, skin colours lie within a small region of the colour spectrum; within this colour cluster skin is normally distributed, i.e. it follows a Gaussian distribution. The Gaussian joint probability distribution function (pdf), a parametric approach, is a measure of skin likeness and is defined [30] as

p(c) = 1 / ( (2π)^(1/2) |Σ|^(1/2) ) · exp( −(1/2) (c − μ)ᵀ Σ⁻¹ (c − μ) ),    (Eq.3)

where c is the colour vector, μ is the mean vector and Σ is the diagonal covariance matrix. The Gaussian mixture model is a combination of Gaussian functions. The number of Gaussian functions used is critical and the choice of colour space is also of great importance [30]. It is widely regarded [5][37][38] that the Gaussian mixture model gives inferior results to those of systems such as the colour histogram, yet it has been extensively used for skin colour segmentation in objectionable image detection systems [27][28], showing surprisingly high sensitivity (92.2%) and specificity (97.9%) [12].

2.2.3 Non-Parametric Approaches
Colour histograms are a statistical method for representing the distribution of colour in an image and are constructed by counting the number of pixels of each colour. Jayaram et al [35] show that the number of bins used in the histogram is a large factor in the performance of the skin detection. Duan [8] created a colour histogram of an image and used a support vector machine to classify it, with 80.7% sensitivity and 90% specificity. Wang [2] uses weighted threshold values to create a skin colour histogram of an image, and then sums the entire histogram to establish the total skin within the image. Another use of the colour histogram is the likelihood histogram [6][22], created with the skin colour likelihood algorithm which establishes the probability of a pixel being a skin pixel. Jones and Rehg [5] used a set of training images to create two colour histograms of skin and non-skin pixels; maximum entropy modelling was then used to train a Bayes' classifier with 88% accuracy.
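To make the histogram-based Bayes idea concrete, the following is a minimal sketch (an illustration of the general approach, not the implementation in [5]); the pre-computed skin/non-skin histograms, the bin count, the prior and the threshold are all assumptions.

import numpy as np

def skin_probability(image_rgb, skin_hist, nonskin_hist, bins_per_channel=32):
    """Per-pixel P(skin | colour) from skin / non-skin colour histograms via Bayes' rule.

    image_rgb    : HxWx3 uint8 array
    skin_hist    : histogram of skin-pixel colours,     shape (b, b, b)
    nonskin_hist : histogram of non-skin-pixel colours, shape (b, b, b)
    """
    b = bins_per_channel
    # Quantise each 8-bit channel into b bins to index the histograms.
    idx = (image_rgb.astype(np.uint32) * b) // 256
    r, g, bl = idx[..., 0], idx[..., 1], idx[..., 2]

    p_c_skin = skin_hist[r, g, bl] / max(skin_hist.sum(), 1)          # P(colour | skin)
    p_c_nonskin = nonskin_hist[r, g, bl] / max(nonskin_hist.sum(), 1)  # P(colour | non-skin)

    prior_skin = 0.5                                   # assumed prior; tune on training data
    num = p_c_skin * prior_skin
    den = num + p_c_nonskin * (1.0 - prior_skin) + 1e-12
    return num / den                                   # P(skin | colour)

# Hypothetical usage: pixels with probability above a threshold form the skin likelihood map.
# prob = skin_probability(img, skin_hist, nonskin_hist)
# skin_mask = prob > 0.4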
This model has been repeatedly used as part of other objectionable image detection systems [15][19]. A major issue with colour histograms is they only measure colour density, this means that two images, although completely unrelated, can have very similar histograms. A solution to this issue is the colour coherence vectors (CCV). CCV establishes the relevance (coherence) or irrelevance (incoherence) of a pixel to the region in which the pixel is situated, where a pixel’s colour coherence is the degree to which pixels of that colour are members of large similarly-coloured regions [39]. Jiao et al [4] found that using CCV along with a colour histogram improved specificity (87.7% to 90.4%) but decreased sensitivity (91.3% to 89.3%) 2.3 Skin Detection by Texture Although the texture of skin is quite distinct from a close range, skin texture appears smooth within most images. One of the biggest problems with skin colour modelling is falsely detecting non-skin regions as skin (false/positive) due to similar colour. Skin texture methods are principally used to boost the results of the skin colour modelling by reducing this false/positive rate. 2.3.1 Gabor Filter Gabor filters are band-pass filters that select a certain wavelength range around a centre wavelength using the Gaussian function. Gabor filters measure by performing image analysis in the space/wave number domain. Jiao [4] used a Gabor filter along with a Sobel edge operator to simply boost the performance of the skin colour detection finding that specificity was improved (63.3% to 87.7%) but sensitivity was decreased (94.4% to 91.3%). Whereas Wang [24] and Xu [27] use a Gabor filter to train a Gaussian mixture model to recognise skin and non-skin texture features. 43 2.3.2 Co-Occurrence Matrix The two-dimensional co-occurrence matrix measures the repetitive changes in the grey level (brightness) to measure texture. The matrix records the simultaneous occurrence of two values in a certain relative position. After the co-occurrence matrix has been constructed, the entropy, energy, contrast, correlation and homogeneity features of the image can be calculated. The co-occurrence matrix is used as a good trade off between accuracy and computation time [7][13]. 2.3.3 Neighbourhood Gray Tone Difference Matrix The neighbourhood grey tone difference matrix (NGTDM) is another texture feature analysis method very similar to the co-occurrence matrix as it measures the changes in intensity and dynamic range per unit area. NGTDM extracts the visual texture features such as Coarseness, Contrast, Busyness, Complexity and Strength. Cusano [10] used NGTDM with Daubechies' wavelets to extract the texture features of skin regions to boost the classification of skin. Other methods to help in skin classification include: region-growing algorithm [6], maximum entropy modelling [28], morphological operations [19], Bethe Tree Approximation and Belief Propagation [14], extreme density algorithm [29], entropy of intensity histogram [28] and median filters [1]. 3 Feature Extraction The classification of digital images is a memory hungry and computationally complex process. The solution for this is a process called feature extraction. Feature extraction is a form of dimension reduction, where resources used to describe large sets of data are simplified with as little loss to accuracy as possible. The colour and texture methods discussed previously are forms of feature extraction, but they are used solely in the classification of skin. 
This section discusses the features used in the classification of the objectionable image, predominately geometric and dimensional. 3.1 Skin Features After skin has been detected various features can be extracted. The skin area/image ratio is the percentage ratio of the image which is covered by skin. As most objectionable images would be predominately skin, the skin area/image ratio is used by most, if not all, the reviewed systems. This ratio does not depend on the method of skin classification and can be used as an input to the classifier [15][16] or as an early filtering system [2]. The amount [10], position [14], orientation [28], height and width [13], shape [17][20], eccentricity [21], solidity [21], compactness [19], rectangularity [19] and location [27][29] of skin regions are features used as input components to the machine learning classifiers. Liang [13] found that the height feature was the most important feature for the detection of objectionable images. The choice and implementation of classifier would stipulate the influence of the skin features, however it has been shown that skin features can improve accuracy [29]. The ability to extract these skin features depends on the method used in skin detection, if colour histograms are used then only the skin area/image ratio can be used, whereas using a skin likelihood map could allow the use of skin features such as position, orientation, height and width of skin regions. 3.2 Moments Moments are commonly used in shape and pattern recognition because a moment-based measure of shape can be derived that is invariant to translation, rotation and scale [2]. A descriptor, moments can be either geometric (Hu moments, Zenike moments) or statistical (mean, variance). Geometric moments are the product of a quantity and its perpendicular distance from a reference point or the tendency to cause rotation about a point or an axis. Statistical moments are the expected value of a 44 positive integral power of a random variable. Liang [13] found that the Hu moments are of less importance than the height skin feature and the skin area/image ratio, but of more importance than most of the other skin features when used with the Multi-Layer Perception classifier (NN). 3.3 Face Detection If it was assumed that all images with large areas of skin are objectionable, then a perfectly acceptable portrait image would be classed as objectionable. Face detection algorithms are used to filter any images whose skin pixels are mainly occupied by a face or faces. The face detection algorithms proposed by Viola and Jones [40] and Leinhart [41] give good trade offs between accuracy and computational speed, for this reason they have become popular methods of face detection in objectionable image detection systems [22][28]. 3.4 MPEG-7 Descriptors The eXperimentation Model (XM) software is used to access the MPEG-7 descriptors, which describe the basic characteristics of audio or visual features such as colour, texture and audio energy of media files. The descriptors that are available to image processing which have proven useful to objectionable image detection include the colour layout [11], colour structure [25], homogenous texture [11], edge histogram [11], region and contour shape [25] and dominant colour descriptor [26]. Kim [26] achieved high levels of accuracy with the colour structure descriptor used with the neural network classifier. 
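To illustrate the kind of skin features listed in Section 3.1, here is a minimal sketch (not taken from any of the reviewed systems) that derives the skin area/image ratio and a few simple geometric properties of the largest skin region from a binary skin mask; the use of SciPy's connected-component labelling and all names are assumptions.

import numpy as np
from scipy import ndimage

def skin_features(skin_mask: np.ndarray) -> dict:
    """Geometric features of a binary skin mask (True = skin pixel)."""
    h, w = skin_mask.shape
    features = {"skin_ratio": skin_mask.mean()}          # skin area / image area

    labels, n = ndimage.label(skin_mask)                 # connected skin regions
    if n == 0:
        return features

    # Largest skin region (label 0 is the background, so it is ignored).
    sizes = np.bincount(labels.ravel())[1:]
    largest = int(np.argmax(sizes)) + 1
    ys, xs = np.nonzero(labels == largest)

    region_h = ys.max() - ys.min() + 1
    region_w = xs.max() - xs.min() + 1
    features.update({
        "region_height": region_h / h,                   # normalised height of the region
        "region_width": region_w / w,
        "centroid": (ys.mean() / h, xs.mean() / w),      # relative position in the image
        "rectangularity": sizes[largest - 1] / (region_h * region_w),
    })
    return features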
4 Classifiers A classifier is a mathematical method of grouping the images based on the results from the feature extraction and skin detection. Most of the systems class the images as benign or objectionable, however some have various levels such as topless, nude or sex image [25]. 4.1 Supervised Machine Learning Machine Learning is a field in artificial intelligence that develops algorithms to allow a computer to use past experience to improve performance. Supervised learning is when the algorithm learns from training data that shows desired outputs for various possible inputs and is the most used form of classification in the objectionable image detection field, with 22 of the reviewed publications using at least one of four various methods: Support Vector Machine (SVM), Neural Networks (NN), Decision Tree (DT) and k-Nearest Neighbour (k-NN). 4.1.1 Support Vector Machine The SVM is a kernel based classifier, that is relatively easy to train (compared to neural networks). The trade-off between accuracy and classifier complexity is controlled by the choice of an appropriate kernel function. Given a training set of benign and objectionable images the SVM will find the hyperplane between the two sets that will result in the highest number of benign images together and objectionable images together. The distance between the hyperplane and both sets must also be at its maximum. The SVM has been shown to be able to give high performance when used with the Gaussian mixture model (92.2% sensitivity and 97.9% specificity) [12], skin probability map (97.6% sensitivity and 91.5% specificity) [23] and colour histogram (89.3% sensitivity and 90.6% specificity) [4]. R. Cusano [10] found that the SVM gave better results than multiple decision trees. 4.1.2 Neural Networks NN are a machine learning algorithm based on how a biological brain learns by example. Classification is performed by a large number of interconnected neurones working simultaneously to process the image features and decide if the image is benign or objectionable. NN can implicitly detect 45 complex nonlinear relationships between independent and dependent variables, but can be computationally complex compared to SVM and can be difficult to train. Bosson [6] found that neural networks (83.9% sensitivity and 89.1% specificity) gave slightly better results to that of k-NN and SVM. Kim [26] attained 94.7% sensitivity and 95.1% specificity using NN with MPEG-7 Descriptors. 4.1.3 Decision Tree (DT) A DT is a classifier in the form of a tree structure, where each leaf node indicates the value of a target class and each internal node specifies a test to be carried out on a single attribute, with one branch and sub-tree for each possible outcome of the test. The classification of an instance is performed by starting at the root of the tree and moving through it until a leaf node is reached, which provides the classification of the instance. Zheng [18] shows that a DT can give 91.35% sensitivity and 92.3% specificity in detecting objectionable images. Zheng [19] also found that the DT (C4.5 method) gave higher accuracy than NN and SVM. 4.1.4 k-Nearest Neighbour The k-NN is based on finding the closest examples from the training data to classify an image as objectionable or benign. The training of the k-NN is very fast and Xu et al [27] found that the k-NN (81% sensitivity and 94% specificity) outperformed the NN (79% sensitivity and 91% specificity). 
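Bringing Sections 3 and 4.1.1 together, a minimal sketch of training an SVM on extracted feature vectors might look as follows; scikit-learn, the RBF kernel, the hypothetical feature/label files and the split ratio are assumptions for illustration, not the setup of any reviewed paper.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# X: one row of extracted features per image (skin ratio, region height, moments, ...)
# y: 1 = objectionable, 0 = benign
X = np.load("features.npy")      # hypothetical feature matrix
y = np.load("labels.npy")        # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale the features, then fit an RBF-kernel SVM; the kernel choice controls the
# accuracy/complexity trade-off mentioned in Section 4.1.1.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
sensitivity = np.mean(pred[y_test == 1] == 1)   # objectionable images caught
specificity = np.mean(pred[y_test == 0] == 0)   # benign images passed
print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")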
4.2 Statistical Classifier
The Generalized Linear Model (GLM) extends standard Gaussian (linear) regression techniques to models with a non-Gaussian response. GLMs do not force data into unnatural scales, allowing for nonlinearity and non-constant variance structures in the data. Bosson [6] shows that the GLM can be used to detect objectionable images; the results acquired (83.9% sensitivity and 87.5% specificity) indicate that the NN, k-NN and SVM perform considerably better.

4.3 Geometric Classifier
Fleck [1] used an Affine Imaging Model to identify limbs and torsos from detected skin regions, and then established whether the limb and torso arrangement matches a geometric skeletal structure. Affine geometry is the geometry of vectors, which does not involve length or angle. Fleck achieved 52.2% sensitivity and 96.6% specificity using the Affine Imaging Model, which is poor compared to the machine learning classifiers.

4.4 Boosting Classification
Boosting is the use of an algorithm to increase the accuracy of the learning classifiers and has been performed in two ways in the reviewed publications: Adaboost and Bootstrapping. Adaboost repeatedly calls weak classifiers, learning from each correct and incorrect classification, but this process can be vulnerable to noise. Bootstrapping is where one is given a small set of labelled data and a large set of unlabelled data, and the task is to induce a classifier. Lee [29] shows that the addition of a boosting algorithm increases sensitivity from 81.74% to 86.29%.

5 Results
The test results given are in the form of sensitivity and specificity, where sensitivity is defined as the ratio of the number of objectionable images identified to the total number of objectionable images tested, and specificity is defined as the ratio of the number of benign images passed to the total number of benign images tested [2]. Due to space constraints, Table 1 only shows the results of the reviewed publications whose sensitivity and specificity are both above 90%. As can be seen from this table, the detection systems appear to give extremely high sensitivity and specificity.

Publication | Sensitivity | Specificity | Dataset Source | Ethnicity | Illumination Conditions
Wang et al 1997 [2] | 91% | 96% | Internet, Corel Library | Not Provided | Not Provided
Yoo et al 2003 [11] | 93.47% | 91.61% | Internet | Not Provided | Not Provided
Jeong et al 2004 [12] | 92.2% | 97.9% | Internet | Not Provided | Not Provided
Zheng et al 2004 [18] | 91.35% | 92.3% | Internet, Corel Library | Not Provided | Not Provided
Zhu et al 2004 [20] | 92.75% | 92.81% | Internet | Not Provided | Not Provided
Belem et al 2005 [23] | 97.6% | 91.5% | Internet | Not Provided | Not Provided
Kim et al 2005 [26] | 94.7% | 95.1% | Not Provided | Not Provided | Not Provided

Table 1: Top 7 results from reviewed publications

Table 1 also shows that very little information is given about the training and testing datasets used. If little information on the testing methods or images used is given, it is hard to accept the results presented. This is a frequent problem throughout most of the publications reviewed here: as there is no standard objectionable image dataset, there is no sure way of adequately comparing all systems. Not all the publications omit the details of their datasets; Table 2 shows 5 publications which do give reasonable amounts of information on the datasets used to test their respective systems. This table illustrates that as the sensitivity increases from system to system the specificity decreases; this would suggest a more realistic set of results.
Publication | Sensitivity | Specificity | Dataset Source | Ethnicity | Illumination Conditions
Fleck et al 1996 [1] | 52.2% | 96.6% | Internet, CDs, Magazines | Caucasians | Various
Jiao et al 2003 [4] | 89.3% | 90.6% | Internet, Corel Library | Caucasians, Asian | Not Provided
Duan et al 2002 [8] | 80.7% | 90% | Internet, Corel Library | Caucasians, Asian, European | Various
Cusano et al 2004 [10] | 90.4% | 88.4% | Not Provided | Caucasians, African, Indian | Various
Lee et al 2004 [29] | 86.4% | 94.8% | Not Provided | Caucasians, African, Asian | Controlled

Table 2: Publications that give adequate dataset information

6 Conclusion
To reduce false positives some papers have added various steps such as face detection and swimsuit detection. Generally the techniques have implemented a skin detection method, as large amounts of skin are generally a sign of the presence of naked people, followed by a feature extraction method, to identify features such as shape and location, and finally classification from the results of the two previous steps. The right choice of method to perform colour analysis in the skin identification process directly determines the features that can be extracted from the image. The use of colour histograms to find the colour density of an image may identify whether large skin areas are present; however, they do not allow for features such as shape and location to be found. Nevertheless, using colour histograms to train a Bayes' probability algorithm has been proven to give good results [5]; note that this is an old method and newer adaptive methods of skin detection have since been developed [12]. Many of the datasets used are described as being gathered randomly from the Internet (some papers count logos as images from the Internet, thus boosting their results); nevertheless, they do not state from what domain (Asian, American or European) the images were gathered or what the images depict (indoor, outdoor, professional, amateur, etc.). Both of these issues can affect the accuracy, as the ethnicity of the persons within the images changes with the domain, and the variations in quality and lighting could reduce the skin identification performance. An academically available dataset is essential; however, due to the nature of the images needed, this may be impossible. There are legal and ethical issues surrounding the distribution of such images which prevent the creation of a dataset, as no academic institute wishes to be perceived as a distributor of pornographic material. After careful examination of the published papers it was decided that an optimal system would consist of:
1. HSV/HSI or YCbCr should be the choice of colour space for accuracy; RGB gives good results and can reduce computational complexity assuming most of the images are originally in RGB (e.g. TIFF, GIF, PNG).
2. An adaptive skin colour technique should be used to eliminate variations in image quality and lighting. None of the reviewed publications gives an adequate solution. This system must retain the ability to perform all feature extraction, so the use of a type of skin likelihood map may be preferable.
3. Gabor filters have the greatest effect on increasing the specificity as a texture analysis method.
4. The analysis of skin features such as location and orientation should be utilised along with a face detector to reduce false positives.
5. NN and SVM consistently give high levels of accuracy (they need large datasets to train, which may be an issue for some).
6. A boosting algorithm such as Adaboost [29] should be used to boost the classification process.
This paper has reviewed the best performing techniques used in skin detection for objectionable images. It has evaluated the best of the current techniques used in skin classification and feature extraction. Future challenges have been identified, and the proposed features of an optimal implementation technique are provided. References [1] M. Fleck, D.A. Forsyth, C. Bregler. “Finding naked people”, Proc. 4th European Conf. on Computer Vision, vol. 2, 1996, pages 593-602. [2] J.Z. Wang, J. Li, G. Wiederhold, O. Firschein, “System for screening objectionable images”, Computer Communications, Vol.21, No. 15, pages 1355-1360, Elsevier, 1998. [3] Y. Chan, R. Harvey, D. Smith, “Building systems to block pornography”, In Challenge of Image Retrieval, BCS Electronic Workshops in Computing series, 1999, pages 34-40. [4] F. Jiao, W. Gao, L. Duan, G. Cui, “Detecting Adult Image using Multiple Features” ICII 2001. [5] M. Jones, J. M. Rehg, “Statistical colour models with application to skin detection”, Int. J. of Computer Vision, 46(1), Jan 2002, pages 81-96. [6] Bosson, G.C. Cawley, Y. Chan, R. Harvey, “Non-Retrieval: blocking pornographic images”, ACM CIVR, Lecture Notes in Computer Science, Vol.2383, 2002, pages 60-69. [7] L.L. Cao, X.L. Li, N.H. Yu, Z.K. Liu, “Naked People Retrieval Based on Adaboost Learning”, International Conference on Machine learning and Cybernetics Vol. 2, pages 1133 - 1138, 2002. [8] L. Duan, G. Cui, W. Gao, H. Zhang, “Adult image detection method base-on skin colour model and support vector machine”, Asian Conference on Computer Vision, pages 797-800, 2002. [9] Q. Ye, W. Gao, W. Zeng, T. Zhang, W. Wang, Y. Liu, “Objectionable Image Recognition System in Compression Domain”, IDEAL 2003, pages 1131-1135. [10] C. Cusano, C. Brambilla, R. Schettini, G. Ciocca, “On the Detection of pornographic digital images”, VCIP, 2003, pages 2105-2113. [11] SJ.Yoo, MH.Jung, HB.Kang, CS.Won, SM.Choi, "Composition of MPEG-7 Visual Descriptors for Detecting Adult Images on the Internet", LNCS 2713, Springer-Verlag, pg 682-687, 2003. [12] C. Jeong, J. Kim, K. Hong, “Appearance-based nude image detection”, ICPR2004, pg 467–470. [13] K.M. Liang, S.D. Scott, M. Waqas, “Detecting pornographic images”, ACCV2004, pg 497-502. [14] H. Zheng, M. Daoudi, B. Jedynak, “Blocking Adult Images Based on Statistical Skin Detection”, Electronic Letters on Computer Vision and Image Analysis, Volume 4, Number 2, pages 1-14, 2004. 48 [15] W. Zeng, W. Gao, T. Zhang, Y. Liu, “Image Guarder: An Intelligent Detector for Adult Images”, ACCV2004, pg 198-203. [16] Y. Liu, W. Zeng, H. Yao, “Online Learning Objectionable Image Filter Based on SVM”, PCM, 2004, pg 304-311. [17] W.Arentz, B.Olstad, “Classifying offensive sites based on image content”CVIU2004, pg 295-310. [18] QF.Zheng, MJ.Zhang, WQ.Wang, “A Hybrid Approach to Detect Adult Web Images”, PCM2004, pg 609-616. [19] QF.Zheng, MJ.Zhang, WQ.Wang “Shape-based Adult Image Detection”, ICIG2004, pg 150-153. [20] Q. Zhu, C-T. Wu, K-T. Cheng, Y-L. Wu, “An adaptive skin model and its application to objectionable image filtering”, ACM Multimedia, 2004, pages 56-63. [21] J. Ruiz-del-Solar, V. Cataneda, R. Verschae, R. Baeza-Yates, F. Ortiz, “Characterizing Objectionable Image Content (Pornography and Nude Images) of Specific Web Segments: Chile as a Case Study”, LA-WEB, 2005, pages 269-278. [22] Y. Wang, W. Wang, W. Gao “Research on the Discrimination of Pornographic and Bikini Images” ISM, 2005, pg 558-564. [23] R. Belem, J. Cavalcanti, E. Moura, M. 
Nascimento, “SNIF: A Simple Nude Image Finder”, LAWeb, 2005, pages 252-258. [24] S-L. Wang, H. Hu, S-H. Li, H. Zhang, “Exploring Content-Based and Image-Based Features for Nude Image Detection” FSKD (2), 2005, pages 324-328. [25] W. Kim, S.J. Yoo, J-s. Kim, T.Y. Nam, K. Yoon, “Detecting Adult Images Using Seven MPEG-7 Visual Descriptors”, Human.Society@Internet, 2005, pages 336-339. [26] W. Kim, H-K. Lee, S-J. Yoo, S.W. Baik, “Neural Network Based Adult Image Classification” ICANN (1), 2005, pages 481-486 [27] Y. Xu, B. Li, X. Xue, H, Lu, “Region-based Pornographic Image Detection”, MMSP, 2005. [28] H.Rowley, Y.Jing, S.Baluja, “Large scale image-based adult-content filtering”, VISAPP2006, pg 290-296. [29] J.-S. Lee, Y.-M. Kuo, P.-C. Chung, E.-L. Chen, “Naked image detection based on adaptive and extensible skin colour model”, Pattern Recognition (2006), doi: 10.1016/j.patcog.2006.11.016. [30] P. Kakumanu, S. Makrogiannis, N. Bourbakis, “A survey of skin-colour modelling and detection methods”, Pattern Recognition, Volume 40, Issue 3, March 2007, Pages 1106-1122. [31] R. Gershon, A.D. Jepson, J.K. Tsotsos, “Ambient illumination and the determination of material changes”, J. Opt. Soc. Am. A vol. 3, 1986, pages 1700–1707. [32] M.C. Shin, K.I. Chang, L.V. Tsap, “Does Colour space Transformation Make Any Difference on Skin Detection?” IEEE Workshop on Applications of Computer Vision, Dec 2002, page 275-279. [33] R.Gonzalez, R.Woods, S.Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2004. [34] G. Gomez, M. Sanchez, L.E. Sucar, “On Selecting an Appropriate Colour Space for Skin Detection”, MICAI, 2002, pages 69-78. [35] S. Jayaram, S. Schmugge, M.C. Shin, L.V. Tsap, “Effect of Colour space Transformation, the Illuminance Component, and Colour Modelling on Skin Detection”, CVPR, 2004, pages 813-818. [36] A.Albiol, L.Torres, E.Delp, “Optimum colour spaces for skin detection”, ICIP2001, pg 122-124. [37] S.L. Phung, A. Bouzerdoum, D. Chai, “Skin segmentation using colour pixel classification: analysis and comparison”, IEEE Trans. Pattern Anal. Mach. Intell, 2005, pages 148-154. [38] V. Vezhnevets, V. Sazonov, A. Andreeva, "A Survey on Pixel-Based Skin Colour Detection Techniques". Proc. Graphicon, 2003, pages 85-92. [39] G. Pass, R. Zabih, J. Miller, “Comparing Images Using Colour Coherence Vectors”, ACM Multimedia, 1996, pages 65-73. [40] P.A. Viola, M.J. Jones, “Robust Real-Time Object Detection”, Tech report COMPAQ CRL, 2001. [41] R. Lienhart, A. Kuranov, V. Pisarevsky. “Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection”, DAGM, Pattern Recognition Symposium 2003, pg 297304. [42] A. Healy, “Call for mobile phone security”, The Irish Times, 17th February, 2004. [43] A. Healy, “Gardai seek distributor of explicit image of girl on phone”, The Irish Times, 23rd January, 2004. 49 Optical Reading and Playing of Sound Signals from Vinyl Records Arnold Hensman Department of Informatics, School of Informatics and Engineering Institute of Technology Blanchardstown, Dublin 15, Ireland Email: [email protected] Kevin Casey Faculty of Computing, Griffith College Dublin South Circular Road Dublin 8, Ireland Email: [email protected] Abstract While advanced digital music systems such as compact disk players and MP3 have become the standard in sound reproduction technology, critics claim that conversion to digital often results in a loss of sound quality and richness. 
For this reason, vinyl records remain the medium of choice for many audiophiles involved in specialist areas. The waveform cut into a vinyl record is an exact replica of the analogue version from the original source. However, while some perceive this media as reproducing a more authentic quality then its digital counterpart, there is an absence a safe playback system. Contact with the stylus provided by a standard turntable causes significant wear on the record (or phonograph) over time, eventually rendering it useless. Couple this with the historic value and an abundance of such vinyl media, and the need for a non-contact playback system becomes evident. This paper describes a non-contact method of playback for vinyl records which uses reconstruction of microscopic images of the grooves rather than physical contact with the stylus. Keywords: Waveform Reproduction, Image Stitching, Vinyl Record, Groove Tracking, 78rpm 1 Introduction Since a vinyl record is an analogue recording, many claim that the application of a sample rate when making digital recordings for CDs and DVDs results in too great a loss in sound quality. Natural sound waves are analogue by definition. A digital recording takes snapshots of the analogue signal at a certain sample rate and measures each snapshot with a certain accuracy. For CDs the sample rate is 44.1 kHz (44,100 times per second at 16-bit). The sample rate for DVD audio is 96 kHz or 192 kHz for HighDefinition DVD (HD –DVD). A digital recording, however, cannot capture the complete sound wave; at best it is a close approximation and many claim that, although high quality, it still cannot fully reproduce the fidelity of sound that vinyl records can. Figure 1 illustrates the application of a sample rate upon a simple sound wave. Sounds that have fast transitions, such as drum beats or a trumpet's tone, will be distorted because they change too quickly for the sample rate. Historical recordings archived for posterity are often fragile with owners not wanting to risk the use of conventional stylus playback. The development of a non-contact player that carefully reconstructs a restored image of the original analogue groove would not only remove this risk, but it would make safe playback possible for records that are severely damaged. Normally the downside of playing an analogue signal is the fact that all noise and other imperfections are also heard. So, if there is a period of silence on a record you hear background noise. With the proposed system any background noise that was present could be removed since the optical player would detect this noise in advance and simply ignore it. Figure 1: Comparison of CD and DVD sample rates upon an analogue waveform 50 1.1 Conventional playback methods for vinyl records The first method of recording sound was the phonograph created by Thomas Edison in 1877. He used a mechanism consisting of a needle and collection horn to store an analogue wave mechanically by cutting a waveform directly onto a cylindrical sheet of tin. The use of flat records with a spiral groove was an improvement on Edison’s machine by Emil Berliner in 1887. The stereo evolution of this method - High Fidelity (Hi Fi) - didn’t lose popularity until compact disks revolutionised the consumer market in the early 1980’s. Only the most advanced digital systems can rival its fidelity. At microscopic level this groove resembles an undulating track with varying width. 
A turntable stylus follows these undulations along the groove towards the centre thus following the waveform along the way. Records skip when the bass information is too loud and the stylus is thrown into a neighbouring section of the groove. Vocal sibilance and sudden loud symbol crashes can also cause a rapid increase in frequency so the stylus faces a pronounced ‘S’ effect in the groove that could potentially cause it to skip. The traditional method of a diamond shaped stylus running along a V-shaped groove also applies constant weight and pressure to the groove and results in a increase in temperature causing further damage. A system that views the waveform up close without any contact would completely remove this problem. Skipping of grooves would be eliminated along with unwanted background noise, scratches and distortion from tiny particles. 1.2 Assumptions The terms LP record (LP, 33, or 33-1/3 rpm record), EP, 16-2/3 rpm record (16), 45 rpm record (45), and 78 rpm record (78) all refer to different phonographs for playback on a turntable system. The rpm designator refers to the rotational speed in revolutions per minute. They are often made of polyvinyl chloride (PVC), hence the term vinyl record. For the purposes of this study, monaural signals only are processed, i.e. vinyl 78rpm records. The groove may be viewed clearly in two dimensions so image acquisition may be performed more easily. Since most historic recording are stored on 78s that are now considered antiques, it makes sense to restrict this study to that media. 1.3 Evaluation of Existing Non-Contact Systems The main technology currently using a method for non-contact playing of vinyl records is the Japanese ELP corporation’s laser turntable ™ [ELP, 2003]. This impressive system can play grooved, analogue 33.3, 45 or 78 RPM discs by illuminating the walls of each groove with five laser beams [Smart, 2003]. In essence, to play the sound, a laser views the image by reflecting back the amplitudes of the waveform. At a basic price over US$10,000 it will hardly ease into the mass production market. In fact to play 78s it requires the advanced model with extra sensors to monitor speed. The laser of the basic model cannot accurately track the groove at such speed without the risk of intermittent pauses throughout playback. The cost of the advanced model is almost twice the basic price plus additional costs for the patching of scratches and noise. Any noise, damage or dirt will be picked up as the laser cannot overcome serious flaws on the disk. If the disk is slightly bent or warped in any way, the laser turntable will reject it. It works best with black records rather than coloured or vinyl with added graphics. The system was invented by Robert E. Stoddard, a graduate student at Stanford University in 1986 [USPO, 1986]. The dual beam model was patented in 1989 [USPO, 1989]. The same result could be achieved by a microscopic imaging system at a fraction of the cost. Since the objective is to play the sound data without contact with a stylus, a microscopic imaging camera could be used to replace the stylus. The added advantage of this method is that an image could easily be enhanced to smooth out noise at source and overcome damage. Ofer Springer proposed the idea of the virtual gramophone [Springer, 2002]. Springer’s system scans the record as an image and applies a virtual needle following the groove spiral form. P. 
Olsson's Swedish team developed this to use digital signal processing methods such as FIR Wiener filtering and spectral subtraction to reduce noise levels [Olsson et al., 2003]. These two systems however only used a basic scanner, limiting the resolution to a maximum of 2400 dpi, or 10 μm per pixel. At this resolution, quantisation noise is high because the maximum lateral travel of the groove is limited. Fadeyev and Haber's 2D method reconstructs mechanically recorded sound by image processing [Fadeyev and Haber, 2003]. The resolution was much higher due to the use of micro-photography. Baozhang Tian proposed a 3D scene reconstruction method using optical flow methods to generate a virtual 3D image of the entire groove valley [Tian, Bannon, 2006]. But as we shall see, quality results can be achieved by processing 2D images of the 2D sound signals from 78rpm records.

2 Specification of Proposed Methods
The objective of this system is a process for optical playing of vinyl records (for the purposes of this system, 78rpm phonographs will be used) by reconstructing a restored image of the original analogue groove. The surface reconstruction will be two dimensional, as that is all that is necessary in the case of 78s. Image analysis techniques will be used to stitch together a longitudinal image of the overall groove. There are four stages of implementation in order to achieve this. This section will briefly describe each of those stages in sequence, the problems encountered at each, and how they are overcome. The four main sections in this playback process are:
(i) Image collection
(ii) Stitching of overlapped images
(iii) Groove tracking
(iv) Waveform creation from the tracked groove and sound file creation
The following chart (Figure 2) indicates the flow of each phase.

Figure 2: Stages in development of the optical playback process (Stage 1: image collection, generating bitmap files; Stage 2: stitching algorithm applied to selected images; Stage 3: groove tracking on the single stitched image; Stage 4: waveform and sound file creation, generating a .wav file)

2.1 Image Collection Phase – Stage 1
Perhaps the most important and significant stage in the process is the image collection phase. If proper images of high quality are not collected initially, it is inevitable that further problems will occur in the stitching and groove tracking phases. Potential hazards to the collection of data will now be explored so that they may be foreseen and overcome in advance. Figure 3 illustrates how the images are retrieved using a computer microscope and a stepper motor to turn the record. The stepper motor takes the place of a turntable and is connected to the record in such a manner that any partial rotations cause it to move in discrete steps. For each step the motor takes (the exact distance of which is controlled by software), the microscope will in turn take an image of a small section of the groove. On subsequent movements, the stepper motor will position the next section of the groove for imaging and the microscope will take further pictures. Each image is saved in bitmap format and converted to greyscale to optimise the stitching algorithm at a later stage by keeping the images as simple as possible.

Figure 3: Image Collection Stage

One of the most important factors in maintaining consistency of image quality is to ensure that the microscope remains in focus.
Any tiny variation in the record from its level position will move the lens slightly out of focus. That is, the focal length (the distance between the record and the microscope lens) will increase or decrease. For this reason, a focus detection algorithm is incorporated so as to warn the user when re-focusing is required. A second motor may be added to automatically readjust the focus, which would slightly add to the system's cost. Warped or slightly bent disks will also gain from this feature as they are prone to going out of focus. Figure 4 shows a set of sample images maintaining a consistency of three groove sections per frame, all to be stitched together into a panoramic view. There is a deliberate overlap, most evident in pictures (c) and (d). This overlap can easily be controlled by the stepper motor which acts as the turntable. It is incorporated to aid the stitching process and to ensure that the correct panoramic image has been obtained.

Figure 4: Sample set of four microscopic groove images, (a)-(d) (x200 magnification)

Approximately 60 to 100 slightly overlapping images will be required to capture one revolution of a 78rpm groove. A simple algorithm to detect de-focus is added to automatically pause the image collection phase and re-adjust the microscope focal length using a second stepper motor before resuming image taking. The MilShaf ltd stepper control motor that was used has a minimum shift capability of 0.25° [MilShaf SMC]. This means it takes exactly 60 images (6° x 60 = 360°) to capture one complete revolution of a groove image, i.e. it will take 24 steps of the stepper motor before the microscope should take the next image.

1 step = 0.25°
Frame width = 320 pixels
1 image shift = 24 steps = 0.25° x 24 = 6°
Pixel shift per step = 320 / 24 ≈ 13 pixels

This high level of accuracy means that the potential overlap can be set to within 13 pixels. Although the 6° movement will be the same throughout the collection process, the above calculations only consider the outer circumference of the record. A revolution of the groove will become smaller as it gets closer to the centre of the record. The angular speed of vinyl is constant, so the speed of the needle at the centre is lower than at the outside; therefore the information density is lower on the outside. So it is beneficial if there is a lot of overlap towards the centre. The overlap can always be changed as the central grooves are captured, to better contribute towards capturing the higher density data there more accurately. In the case of a 6° set of steps at the outer circumference, the image will perform a 13 pixel shift. Take this outer circumference as being at radius r, the radius of the record; as the centre is approached the radius becomes r'. To obtain the number of pixels for a 6° set of steps at radius r' the following equations are formed:

(a) circumference movement at radius r:  (2πr) x (a/360) = 13 pixels
(b) circumference movement at radius r': (2πr') x (a/360) = pixel movement at r' (unknown)

where a is the movement in degrees. Dividing (a) by (b) gives r/r' = 13 / (pixel movement at r'), hence pixel movement at r' = 13r'/r.
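The relationship just derived can be expressed as a small helper. The figures below (0.25° steps, 320-pixel frames, about 13 pixels of shift at the outer radius) come from the text above; the function, the variable names and the example radii are illustrative assumptions only.

STEP_DEG = 0.25          # minimum stepper motor shift
FRAME_WIDTH_PX = 320     # width of one microscope frame
STEPS_PER_IMAGE = 24     # 24 steps of 0.25 degrees = 6 degrees per image shift
SHIFT_AT_OUTER_PX = FRAME_WIDTH_PX / STEPS_PER_IMAGE   # ~13 pixels per step at radius r

def pixel_shift(r_outer_mm: float, r_current_mm: float) -> float:
    """Pixel movement for one 0.25 degree step at radius r' (pixel movement at r' = 13 r'/r)."""
    return SHIFT_AT_OUTER_PX * r_current_mm / r_outer_mm

# Hypothetical example: a disc with a 150 mm outer groove radius, groove currently at 60 mm.
print(round(pixel_shift(150.0, 60.0), 1))   # roughly 5.3 pixels, so frames overlap far more near the centre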
Image collection was performed using oVWF.ocx, a Video for Windows ActiveX control that required no external interaction or use of a DLL. It could easily be incorporated into our software, providing the required versatility in taking images. It was created by Ofer LaOr, director of Design by Vision [LaOr 1997]. This control has all the convenience of any ActiveX control and can easily be used in the Microsoft suite of programming languages. It can prompt dialogs to allow the user to change the settings of the bitmap being imaged (i.e. 16 colour, 256 colour, etc.). Hence our need to obtain greyscale images was met.

2.1.1 Overcoming Potential Hazards in Image Collection
(a) Image drift: In a sample containing three grooves as in Figure 5, one of the grooves would often move to the top of the video window after several shots were taken and eventually go out of sight as illustrated in Figure 6(b). This was due to the way in which the record was first centred upon the turntable. Any tiny variation off-centre at this level of magnification would be noticed. However, even if the geometric centre of the record is chosen with perfect accuracy, the spiral centre cut onto the groove may be slightly different.
(b) De-focus: Figure 5(b) is clearly not as sharp as Figure 5(a), and this is only after 25 frames have been taken in a particular set. The focal length (the distance between the record and the microscope lens) would increase or decrease due to a slightly unlevelled record, causing the image to go out of focus. A simple algorithm to give a deterministic value for focus is used, based on the contrast of images.

Figure 5: Contrast within frames (a) and (b) is used to gauge the focus level

Since the image is converted to greyscale, every pixel has a value between 0 and 255. The deterministic value obtained is the average contrast per adjacent pixel of any image. Converting these values to a percentage value between 0 and 1 will allow us to apply the algorithm. For example, consider the following 3 by 4 bitmap matrix A:

A = [ a11 a12 a13 a14 ]
    [ a21 a22 a23 a24 ]
    [ a31 a32 a33 a34 ]

The absolute row-contrast differences may be determined by:

[ |a11 − a12|  |a12 − a13|  |a13 − a14| ]
[ |a21 − a22|  |a22 − a23|  |a23 − a24| ]
[ |a31 − a32|  |a32 − a33|  |a33 − a34| ]

It is possible to approximate the row contrast R by calculating the sum of these absolute differences. The absolute column-contrast differences may be determined by:

[ |a11 − a21|  |a12 − a22|  |a13 − a23|  |a14 − a24| ]
[ |a21 − a31|  |a22 − a32|  |a23 − a33|  |a24 − a34| ]

It is possible to approximate the column contrast C by calculating the sum of these absolute differences.

Total Contrast = R + C

If this is applied to the initial frame to get a focus standard, and further applied to all adjacent elements of a bitmap matrix, the average contrast per pixel can be used as the deterministic coefficient for determining whether or not an image is in focus.
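The contrast-based focus measure just described can be sketched in a few lines. This is an illustrative NumPy version rather than the system's actual (ActiveX-based) code; the threshold factor in the usage comment is an assumption.

import numpy as np

def average_contrast(frame: np.ndarray) -> float:
    """Average absolute contrast per adjacent pixel pair of a greyscale frame (values 0-255)."""
    a = frame.astype(np.float64)
    row_diffs = np.abs(a[:, 1:] - a[:, :-1])      # |a[i][j] - a[i][j+1]| : row contrast R
    col_diffs = np.abs(a[1:, :] - a[:-1, :])      # |a[i][j] - a[i+1][j]| : column contrast C
    total = row_diffs.sum() + col_diffs.sum()     # Total Contrast = R + C
    return total / (row_diffs.size + col_diffs.size)

# Hypothetical focus check against the first (reference) frame:
# focus_standard = average_contrast(first_frame)
# if average_contrast(frame) < 0.8 * focus_standard:   # 0.8 is an assumed threshold factor
#     pause_and_refocus()                              # hypothetical helper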
2.2 Image Stitching – Stage 2
Stitching of the images collected from Stage 1 is based on determining the exact positions where frames overlap. Image mosaics are collections of overlapping images that are transformed in order to result in a complete image of a wide-angle scene. Two frames at a time were stitched together to build up the mosaic image. The simplest algorithm for stitching simple images together is the least squares algorithm, which determines a set of overlap errors for various overlapping positions. The position with the lowest error will be taken as the point where the two images overlap. These positions will then be saved in a text file in order to reproduce the stitched image at any time. For every subsequent pair of images, the overlap positions will thus be recorded. Baozhang Tian suggests a similar method of image acquisition involving complex use of surface orientation analysis to record the image [Tian, Bannon, 2007]. Stitching algorithms however provide a simpler approach, since a delay will be present anyway in both methods. The images are converted to greyscale in the capture process so we will get a bitmap matrix of values between 0 and 255. The images will be taken under similar lighting conditions and will not vary greatly in content. When dealing with panoramic images, there is usually the source position issue to contend with. Our images will be taken from a new point of reference above the vinyl record at each instance to suit the microscope as the record moves on the turntable, so this angular distortion is not an issue, as illustrated in Figure 6.

Figure 6: Angular positioning problem will not be present

The stitching algorithm may be defined as follows:

sum = 0
For all overlapping pixels
    pixel1 = numeric pixel value in the first image
    pixel2 = corresponding numeric pixel value in the second image
    sum = sum + (pixel1 - pixel2)²
End For
error = sum / X, where X is the total number of overlapping pixels

The minimum overlap can be chosen by the user in the form that executes the stitching, but essentially every test of the least squares algorithm operates in a similar way. A column-by-column test is performed for every overlap, beginning with the overlap of the upper midway position of height and width of the first frame of a pair. Frame 2 is then tested at the same column position, but on the next row down, and so on, until it overlaps midway with the lower half of frame 1. At this point the column shift should take place towards the left and the procedure begins again. It will stop when the column position of frame 2 reaches the minimum overlap point. The basic process used is outlined below. The height and width will be the same for both frames.

For column = Width/2 To MinOverlap
    For row = (Height + Height/2) To (Height - Height/2)
        Perform the least squares algorithm upon the overlap
    Next row
Next column

Two 2D arrays, pic1[][] and pic2[][], are passed to the function image() contained within a file. They will contain integer values between 0 and 255 since the images are converted to greyscale. Each array is essentially a bitmap matrix of the two frames being stitched together.

Figure 7: Dimensions of two overlapping images

The values indicated in Figure 7 represent the dimensions processed within the algorithm. The values for rm and cm will change continually as the least squares error is determined for each position. The values for the number of rows (r), the number of columns (c) and the least_overlap field (representing the minimum overlap input by the user) are also passed as parameters. The least squares algorithm is conducted for each of these changing values to test whether the correct stitching position has been found.
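The following is a minimal NumPy sketch of the least squares overlap search described above. It illustrates the approach rather than the authors' implementation; the brute-force loop structure, the minimum overlap value and the allowed vertical shift range are assumptions.

import numpy as np

def overlap_error(frame1: np.ndarray, frame2: np.ndarray, col: int, row_shift: int) -> float:
    """Mean squared difference over the region where frame2 (offset) overlaps frame1."""
    h, w = frame1.shape
    # frame2's left edge sits at column `col` of frame1 and its top edge is displaced
    # vertically by `row_shift` pixels relative to frame1.
    a = frame1[max(0, row_shift):min(h, h + row_shift), col:]
    b = frame2[max(0, -row_shift):min(h, h - row_shift), : w - col]
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def best_overlap(frame1, frame2, min_overlap=40, max_row_shift=20):
    """Search column and row offsets, returning the position with the lowest error."""
    h, w = frame1.shape
    best = (None, None, np.inf)
    for col in range(w // 2, w - min_overlap):          # larger col = smaller overlap
        for row_shift in range(-max_row_shift, max_row_shift + 1):
            err = overlap_error(frame1, frame2, col, row_shift)
            if err < best[2]:
                best = (col, row_shift, err)
    return best   # (column offset, vertical shift, least squares error)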
This is called errorfile.txt and it contains a list of all the errors calculated for every X and Y value tested for the pair of overlapping images. It is used to confirm that the correct overlap position has in fact been obtained. Graphically viewing this error data (when capturing three grooves per frame) reveals that there are usually three possible overlaps to consider (Figure 8).

Figure 8: Potential overlap positions (a) Case 1 (b) Case 2 (c) Case 3

On observation, case three appears to be the correct version, but a more structured testing approach is adopted. A surface plot is generated from the error file containing all error values for the overlaps, at each X and Y position. The three potential overlaps can be seen in Figure 8. The x and y co-ordinates indicated above correspond to those shown in the images of Figure 9. The lowest error value (z-axis) can be seen in the centre surface plot. Note the protruding lower value of the central set of errors. According to the graph, this lowest error, in the range of 0 to 200, is of the order of 4 times lower than the closest match in either of the other two error sets, which begin at 800. By including a minimum overlap in the algorithm, the number of positions tested can be calculated as follows:

    Positions tested = 240 * (320 - minimum overlap)

Figure 9: The lowest error value as seen on the surface plot

2.3 Groove Tracking

The next stage of the system deals with the tracking of a groove across the stitched images in order to create a waveform of sound data. As stated, the record groove is in fact the representation of the sound waveform. Our system allows the user to select the first image from which to begin the tracking process. The image focus algorithm will determine whether or not this is an appropriate frame to begin with.

Figure 10: Groove tracking process

Once the initial frame is loaded the user simply clicks on a pixel slightly above the groove they wish to track. The test track button performs the tracking algorithm and displays a sample of exactly which range of pixels will be considered. Figure 10 illustrates how this is achieved by displaying the groove outline in red. If the user is satisfied with this groove, and the default settings appear to trace it accurately, the accept groove button is chosen and a set of text files is generated containing the groove information: the upper and lower tracks of the groove. Two ranges of pixel values are considered when tracking the groove:
1. Those considered to be in the range of the groove colour (white)
2. Those considered to be in the range of the non-groove colour (black)

The default settings for groove/non-groove ranges are based on the first frame that is loaded. Every pixel is then compared to this range so the algorithm can make an accurate estimation of which pixels belong to the groove. The ranges for groove/non-groove values may also be chosen manually for better accuracy in the case where the defaults are unsatisfactory. This is done by sampling a pixel in a typical position on the groove (white) and another which is not part of the groove (black). The focus threshold value may also be set by the user by accepting the average contrast per pixel of the selected image as the optimum focus value. All other frames will be tested relative to this focus range.

3 Sound file creation – Stage 4

The data saved in the waveform.txt file is prepared for transfer to a .wav file. This file contains a series of undulating vertical values (y-axis) along a horizontal plane (x-axis).
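The normalisation and .wav conversion described in the following paragraph, shifting the traced values so that the lowest becomes 0 and the highest 255 and writing them out as 8-bit samples, might be sketched as follows. The file names and the sampling rate here are illustrative assumptions only, not values taken from the original system.

    import wave

    def waveform_to_wav(txt_path="waveform.txt", wav_path="groove.wav", sample_rate=8000):
        # Read the traced groove heights, one value per line.
        values = [float(line) for line in open(txt_path) if line.strip()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        # Scale so the minimum maps to 0 and the maximum to 255 (8-bit unsigned PCM).
        samples = bytes(int(255 * (v - lo) / span) for v in values)
        with wave.open(wav_path, "wb") as wav:
            wav.setnchannels(1)            # mono, as on a 78rpm record
            wav.setsampwidth(1)            # one byte per sample
            wav.setframerate(sample_rate)
            wav.writeframes(samples)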
The graphic of this file may be displayed in our system's software. The minimum vertical value is calculated so that the relative offsets from this minimum can be manipulated such that the lowest value becomes 0 and the upper limit of the waveform becomes 255. This is merely a manipulation to enable the data to be placed into a .wav file. There are several methods of .wav file creation once the waveform has been stored in a text file. There are also a number of off-the-shelf packages that will create the image of any sound signal you record through a microphone, and allow you to modify the image for playback. Such programs could also be incorporated into this system to play back the image of the record groove. The simplest way to ensure the correct sound signal has been recorded is to test it by creating a .wav file from the data. A sampling rate is applied, but this is merely for testing purposes. Many off-the-shelf products will play such a waveform once created.

4 Conclusions

This paper described a method for non-contact playing of vinyl records by stitching together smaller microscopic images of the waveform into one larger panoramic view. This enables playback of the waveform from the image rather than through contact with a stylus. Previous attempts did not consider the added simplicity involved in restricting the image to a two-dimensional greyscale format. The grooves of 78rpm phonographs can be seen clearly in a two-dimensional format with relatively inexpensive equipment. Stitching algorithms, although simpler in approach than computer vision techniques such as optical flow and surface orientation, will perform competitively with higher processing speeds. Serious scratches and even broken records may be played and enhanced by image manipulation. The laser turntable cannot do this and has problems with even slightly warped disks. The disadvantage of this method is in the timescale required to fully execute the image stitching process. This renders instant playback impossible since the image must first be collected. However, once the image is in fact collected, and a mosaic stitched together, virtual real-time playback is possible with the added features of noise reduction, no skipping and broken segments causing no obstructions. Since all other proposed methods, apart from the expensive laser turntable, have similar delays in image acquisition, it can be argued that it is the image collection process that is most critical to the system's success.

The image stitching method proposed in this paper used 2D greyscale microscopic pictures rather than a full 3D groove reconstruction, which would obtain a large amount of redundant data. Since older 78s contain mono signals only, the grooves can be viewed in two dimensions. If a delay is inevitable, then the time required for image acquisition in such systems becomes secondary to the quality of the images taken. Further enhancements would include ways to more accurately obtain this overall record image while minimising the hazards outlined in section 2.1.1 above, namely image drift and de-focus. Image drift could be instantly eliminated by keeping the record stationary and instead moving the microscope over it in grid-like movements through two by two sections rather than following the groove specifically. The fact that no specific groove is followed would mean the image drift problem disappears. The same stitching methods could be used to stitch grid sections.
This sectioning of the image would also give more control and flexibility than one long groove image. The fact that the record is stationary would mean that de-focus is less of a problem as zero movement of the disc will not create an unlevelled surface (unless of course the disc is warped) and thus focal length should remain more or less consistent. To make real time playback possible while the imaging is taking place a system of buffered images might be incorporated where there would only be one initial short delay. With the current system, image analysis methods to dismiss the majority of unsuccessful overlaps would have to be developed to speed up the process. Essentially however, it does indeed appear possible for this system to be used as an inexpensive way to safely transfer the information from rare, antique or damaged records so they can be played in analogue form. References [ELP, 2003] ELP Laser Turntable ™, No needle No wear. Web reference www.elpj.com [Fadeyev and Haber, 2003] Fadeyev, V. and Haber, C, 2003. Reconstruction of Mechanically Recorded Sound by Image Processing. Journal of Audio Engineering. Society. Pgs: 1172–1185. [LaOr 1999] Ofer LaOr, 1999. A Video for Windows ActiveX control. Dr Dobbs Programmers Journal. June 1999. [MilShaf SMC] StepperControl.com, a division of MilShaf technologies inc. Web reference: www.steppercontrol.com/motors.html [Olsson et al., 2003] Olsson, P, Ohlin. R. Olofsson, D., Vaerlien, R., and Ayrault, C, 2003. The digital needle project - group light blue. Technical report, KTH Royal Institute of Technology, Stockholm, Sweden. Web reference: www.s3.kth.se/signal/edu/projekt/students/03/lightblue/ [Smart, 2003] The amazing laser turntable. Smart Devices Journal, Aug 2003. Web reference www.smartdev.com/LT/laserturntable.html [USPO 86] Stoddard, Robert E, Finial Technology Inc, 1986. United States Patent Office. Number 4,870,631. Optical turntable system with reflected spot position detection, [USPO 89] Stoddard, Robert E et al, Finial Technology Inc. 1989. United States Patent Office, Number 4,972,344. Dual beam optical turntable, [Springer, 2002] Springer, O, 2002. Digital needle - a virtual gramophone. Web reference: www.cs.huji.ac.il/~springer/ [Tian, Bannon, 2006] Baozhang Tian and John L.Barron, 2006. Reproduction of Sound Signals from Gramophone Records using 3D Scene Reconstruction. Irish Machine Vision and Image Processing Conference. [Tian, Bannon, 2007] Baozhang Tian and John L.Barron, 2007. Sound from Gramophone Record Groove Surface Orientation. 14th IEEE-Intl Conferece on Image Processing. 59 Optimisation and Control of IEEE 1500 Wrappers and User Defined TAMs Michael Higgins, Ciaran MacNamee, Brendan Mullane. Circuits and Systems Research Centre (CSRC), Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland. [email protected] Abstract: With the adoption of the IEEE 1500 [1] Standard, the opportunity exists for System on Chip (SoC) designers to specify test systems in a generic way. As the IEEE 1500 Standard does not address the specification and design of the on-chip Test Access Mechanism (TAM), considerable effort may still be required if test engineers are to optimise testing SoCs with IEEE 1500 Wrapped Cores. 
This paper describes novel research activity based on the design of TAMs that are compatible with IEEE 1500 wrapped cores and once a Test Resource Partitioning (TRP) scheme has been adopted it is shown that multiple TAM sections and Core Wrappers on a SoC can be controlled through the use of an intelligent test controller. Taking into account previous work on TRP, functional testing using the system bus and TAM architectures, a novel approach is introduced that allows some elements of the system bus to be used as part of the TAM while retaining compatibility with the IEEE 1500 wrapped cores. A small micro-controller SoC design based on the AMBA APB bus is used to investigate this approach. A crucial element of this approach involves interfacing the combined TAM to the mandatory Wrapper Serial Port (WSP) and the optional Wrapper Parallel Port (WPP) of the IEEE 1500 wrapped cores in the chip. Test Application Time (TAT) results are presented that establish the viability of the ideas described, as well as comparative analysis of TAT results derived from a number of test structures based on these techniques. Keywords: IEEE 1500, TAT, TAM, SoC, Intelligent Test Controller. 1. Introduction The ever-increasing SoC test problem has been well documented in recent years [2]. The many factors that contribute to the overall problem can be broken down as follows: low Test Access Port (TAP) bandwidth, limited embedded core accessibility, large volumes of test data, deep sub-micron effects not covered by standard fault models and undefined TAM structures. The contributing factors are by no means limited to the above list but can be more accurately defined by the test objectives or constraints such as cost, time or test coverage. The purpose of this paper is to investigate how an IEEE 1500 wrapper can be configured and combined with TAM optimisation techniques to reduce overall test time. A novel on-chip test controller is also presented that has the ability to manage multiple TAM sections and wrapper configurations. The concept of a bus-based TAM used for both functional and structural testing is also introduced to allow for further reduction of resources i.e. silicon. Section 2 gives an overview and the history of TAM types and structures, namely bus and non-bus based TAMs. The concept of TRP is covered in section 3. An overview of the IEEE 1500 standard is covered in section 4. The bottom up optimisation model [3] is analysed in section 5. The TRP solution for the benchmark circuit is discussed in section 6. TRP can be used to find the trade-off between TAM width and Test Application Times (TATs) and results are presented later in section 6 showing TATs based on TAM structures with and without the application of TRP. The effect of 3 different wrapper configurations: WSP Only, WPP Only and WSP and WPP combined, on overall TATs are also considered in section 6. Section 7 60 introduces and describes the novel test controller. Section 8 details future work and conclusions. 2. TAM Types A TAM is an on-chip mechanism that is used to transport test vectors and test responses from cores to an on-chip test controller or an off-chip test manager (Automated Test Equipment (ATE)). The TAM is user definable and is generally based on one of the architectures described below. The earliest TAMs were categorised into 3 distinct types [4]: daisy-chain architecture, distributed architecture and multiplexed architecture. 
Figure 1: (a) Daisy-chain, (b) Distributed, (c) Multiplexed [4]

• A daisy-chain architecture (Figure 1(a)) is where the input TAM for one core is the output TAM from the previous core, i.e. if there are 10 cores in the SoC, before the 10th core can be tested the first 9 cores must have been tested; this is therefore a sequential testing scheme. A parallel core-testing scheme is one where each of the cores can be tested in parallel.
• A distributed architecture (Figure 1(b)) is one where the TAM lines are divided between all of the cores, so that core testing can occur in parallel.
• A multiplexed architecture (Figure 1(c)) is one where each core within the SoC has access to the whole TAM, but only one core can ever use the TAM at any given time, so testing is sequential.

Previous work [5] on TAM assignment techniques has been based partially on these 3 architectures mentioned above. TAM structures can use a non-bus based or a bus based strategy.

• A non-bus based strategy is where extra interconnections are added to a design to facilitate a TAM. These extra interconnections will only ever transport test stimuli and test responses to and from the cores. There are 2 main disadvantages to a non-bus based TAM: the extra area required for the TAM and the extra complexity added to the layout stage of the SoC design flow. These 2 disadvantages can have an impact on the SoC, possibly resulting in increased time to silicon and increased overall cost in terms of silicon area.
• A bus-based strategy is one where the system bus is reused to transport test stimuli and test responses across the chip. For a full scan design, using the stuck-at fault (SAF) model, the test vectors can be separated into 2 main categories: the scan vectors and the functional vectors. Under the SAF model one of the signal lines within the digital core is stuck at a fixed logic value, regardless of which inputs are applied to the core. Previous approaches [6-8] using bus based TAM strategies have mainly focused on carrying only functional vectors; therefore only functional tests are implemented. Functional testing on its own can only produce a limited test coverage, where in most cases it would be less than the recommended test coverage of 95% - 99.9% [9]. A bus based TAM strategy delivering both structural and functional test vectors [10] has been investigated, but core test data had to be buffered before it could be applied. Bus-based TAM strategies have had 2 distinct disadvantages: the width of the TAM is limited by the existing system bus structure and the types of test methods supported have been restricted. We address both of these points in our proposal.

There is no universal TAM scheme that can be applied to all SoC digital designs as each design has different requirements depending on its target market, fabrication process used, design flow used and budget and overhead allowances. The recently accepted IEEE 1500 Standard for Embedded Core Test (SECT) has an optional WPP which can be connected to a user defined TAM for faster test vector application, but the TAM cannot be defined in the standard as it is design dependent. The TAM design can be one of the most important aspects of a SoC test structure as it can have severe impacts on the overall TAT and the silicon area if not designed with careful planning and consideration.

3. Test Resource Partitioning

Many resources are required to execute a SoC test.
The amount of each resource used is dictated by the test resource-partitioning scheme in operation. Examples of test resources are as follows: test cost, test time, test power, test interface bandwidth, and TAM width. The above list is not exhaustive and may be different depending on the SoC design and test environment.

• Test cost may contain many different elements such as test engineer costs, additional silicon area, and ATE cost, which may all have an impact on the end cost of the SoC to the customer. Increasing overall test costs by even a small fraction may give a competitor added advantages.
• Test time is the amount of time that it takes to achieve reasonable test coverage of the SoC.
• When a SoC is placed in test mode, more power may be dissipated than in normal operation.
• The test interface bandwidth limits the amount of test data that can be transported on and off chip at any given time. An example of the importance of test interface bandwidth is the IEEE 1149.1 TAP, where all test data had to be serialised and de-serialised due to restrictive test interface bandwidth limits. The IEEE 1500 has built in a WPP to accommodate a higher bandwidth test interface if required.
• A wider TAM may in certain circumstances introduce additional interconnections and routing complexity.

Each of these resources is closely associated with the others, and placing a constraint on one resource will have direct consequences on other resources. Table 1 shows possible outcomes of increasing and decreasing certain test resources.

                                 Test Cost   Test Time   Test Power   Test I/F Bandwidth   TAM Width
    Test Cost Increase               ↑           ↓           ↑                ↑                ↑
    Test Cost Decrease               ↓           ↑           ↓                ↓                ↓
    Test Time Increase               ↓           ↑           ↓                ↓                ↓
    Test Time Decrease               ↑           ↓           ↑                ↑                ↑
    Test Power Increase              ↓           ↓           ↑                ↑                ↑
    Test Power Decrease              ↑           ↑           ↓                ↓                ↓
    Test I/F Bandwidth Increase      ↑           ↓           ↑                ↑                ↑
    Test I/F Bandwidth Decrease      ↓           ↑           ↓                ↓                ↓
    TAM Width Increase               ↑           ↓           ↑                ↑                ↑
    TAM Width Decrease               ↓           ↑           ↓                ↓                ↓

Table 1: Test Resource Comparisons

The test resource partitioning scheme can only be decided when the SoC design is known and the test requirements and constraints are decided.

4. IEEE 1500 Overview

The IEEE 1500 [1] provides a scalable test architecture for embedded digital cores within a SoC. Access is provided to these embedded digital cores using the IEEE 1500 wrapper for controllability and observability of those cores. An IEEE 1500 wrapper can be used as a bridge between core users and core providers. A standard IEEE 1500 wrapped core is shown in Figure 2.

Figure 2: Standard IEEE 1500 Wrapped Core [1]

The main building blocks of the 1500 wrapper are shown in Figure 2. The WIR (Wrapper Instruction Register) enables all of the IEEE 1500 operations. The IEEE 1500 wrapper has several modes of operation. There are modes for functional (non-test) operation, inward facing (IF) test operation, and outward facing (OF) test operation. Different test modes also determine whether the serial test data mechanism (WSI–WSO) or the parallel test data mechanism (WPI–WPO), if present, is being utilised. The WBY (Wrapper Bypass Register) provides a bypass path for the WSI–WSO terminals of the WSP (Wrapper Serial Port). The WBR (Wrapper Boundary Register) is the data register through which test data stimuli are applied and pattern responses are captured. The WPP is used for increased data bandwidth to the wrapped core.

5. TRP Utilisation

There has been much previous work [5] in the area of TRP for SoC designs. The resource that this paper has concentrated on is efficient TAM allocation to reduce the total test time.
An approach used by [3], TR_ARCHITECT, has been the basis for the TRP for TAM allocation in this work. Two steps from TR_ARCHITECT are used: CreateStartSolution and OptimiseBottomUp. The main constraint that has to be decided before the TAM allocation can be computed is the total TAM width.

• T = TAM width

The total number of cores in the SoC also needs to be known.

• C = total number of cores

For each core 'i' (1 ≤ i ≤ C) in the SoC the number of primary inputs that will have a WBR (wrapper boundary register) cell when the IEEE 1500 wrapper is in place must be known.

• ni = Core 'i' primary inputs with WBR cell

The number of scan flip-flops (fi) contained in each core is required to calculate a test time for each core, along with the number of test patterns (tpi) for that core.

• fi = Core 'i' number of scan flip-flops
• tpi = Core 'i' number of test patterns

To calculate the test time for each core, several assumptions are made: each core is wrapped with an IEEE 1500 compliant wrapper; the amount of time that it takes to set up the wrapper for test is 7 cycles (4 cycles for the instruction op-code and 3 cycles setup) plus 1 cycle per pattern to apply the test patterns to the core in normal functional mode; the scan chains contained in each of the cores are balanced so that test time can be reduced further; and each of the scan chains has a dedicated input via the WPP or WSP of the IEEE 1500 wrapper. WPP denotes the width of the WPP in bits. When only the WPP is used for test vector loading and unloading, each individual core test time (ti) can be calculated as follows:

• WPP = WPP width (bits)
• ti = Core 'i' test time
• ti = ((fi + ni) / WPP) * tpi + 7 + tpi

The CreateStartSolution [3] step of the TR_ARCHITECT algorithm first allocates the TAM bits one at a time, giving each core access to one TAM line initially, where T > C. If a core has access to only one TAM line, then there is only one scan chain in that core, whereas if a core has access to three TAM lines then the core would have 3 balanced scan chains. The test time (ti) for each core changes according to the amount of access it has to the TAM (i.e. more access to the TAM leads to more scan chains, therefore lowering test time). After the first allocation of TAM lines, the remaining TAM lines are allocated to the cores with the largest test times. Each time a core is allocated another TAM line, its test time is reduced. When all TAM lines have been allocated, the OptimiseBottomUp [3] step from the TR_ARCHITECT algorithm is applied to distribute the TAM lines evenly between cores. After the initial allocation of the TAM lines some cores have a larger test time than others. The goal of the OptimiseBottomUp algorithm is to make the test time on each TAM line as even as possible. To achieve this, the 2 TAM lines with the smallest test times are found, and the test(s) performed on the TAM line with the smallest test time are added to the TAM line with the 2nd smallest test time, thereby freeing up the TAM line that previously had the smallest test time. The freed-up TAM line is allocated to the core that currently has the largest test time, thus reducing that test time. One TAM line may service several different cores for test if required. This process repeats until the test times on each of the TAM lines are equal (or as close as possible).

6. Optimisation of the IEEE 1500 Wrapper and user defined TAM

In this novel IEEE 1500 and TAM optimisation a bus based TAM is used.
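As a rough illustration of the test-time expression and the CreateStartSolution-style allocation described in section 5, the sketch below computes per-core test times and hands out spare TAM lines greedily to the slowest cores. It is only a simplified sketch under the stated assumptions: it is not the TR_ARCHITECT implementation from [3], the data structures are our own, and the OptimiseBottomUp merging of lightly loaded TAM lines is omitted.

    def core_test_time(f_i, n_i, tp_i, tam_bits):
        # Test time for one core: (fi + ni) bits shifted over tam_bits balanced
        # scan chains per pattern, plus 7 wrapper set-up cycles and 1 cycle per pattern.
        return -(-(f_i + n_i) // tam_bits) * tp_i + 7 + tp_i   # ceiling division

    def create_start_solution(cores, total_width):
        # 'cores' is a list of (f_i, n_i, tp_i) tuples; assumes total_width > len(cores).
        width = [1] * len(cores)                   # every core gets one TAM line first
        for _ in range(total_width - len(cores)):  # spare lines go to the slowest core
            times = [core_test_time(f, n, tp, w) for (f, n, tp), w in zip(cores, width)]
            width[times.index(max(times))] += 1
        return width                               # TAM lines allocated to each core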
The advantages of the bus based TAM have been described in section 2. The benchmark test circuits that were used for this paper can be found at [11]. The system bus that is incorporated in this benchmark circuit is the AMBA APB bus [12]. The Test Interface Controller (TIC) and re-use of the AMBA bus previously implemented by ARM [6-8] is similar to the novel TAM architecture presented in this paper, but the architecture presented here delivers not only the functional test vectors but also the scan vectors. The benchmark circuit has been calculated to have a complexity factor of 131 according to the naming format stipulated by [13]; therefore this circuit is one of low complexity and ideal to demonstrate the advantages of a bus based TAM, where additional overheads required for test must be kept to a minimum. An overview of the proposed novel test architecture is shown in Figure 3.

Figure 3: Proposed Debug/Test Architecture

The test structure consists of each core being wrapped using an IEEE 1500 compliant wrapper, an 'input TAM' to deliver the test vectors and an 'output TAM' to collect the test responses. Each scan chain has an input and output, therefore the number of bits required for the 'output TAM' must equal the number of bits for the 'input TAM'. In a worst case scenario, where an 8 bit system is in operation using the AMBA APB bus, calculations have been made to determine which signals could be reused from the system for a bus based TAM. It has been calculated that in the worst-case scenario, 15 signals could be re-used from the AMBA APB bus for the 'input TAM' and 8 signals could be re-used for the 'output TAM'. Therefore an additional 7 bits would have to be added to the 'output TAM' so that the number of bits of the 'input TAM' would equal the 'output TAM'. These additional bits are not part of the bus based TAM as they are not system bus signals that are being re-used. The WPP of each IEEE 1500 wrapper is used to deliver test vectors and collect test responses, as a higher bandwidth is provided by the WPP than the WSP. The wrappers also implement the WSP to comply with the IEEE 1500 standard and to control the WIR, WBY and the WBR. The 1500 wrapper can be configured in 3 different ways for test vector loading and unloading: 1. WSP only mode, 2. WPP only mode and 3. WSP & WPP hybrid mode. To illustrate the different wrapper configurations, a core (SPI Core) from an in-house benchmark SoC is considered. The test characteristics of the SoC can be found at [11].

In WSP only mode (Figure 4), all test vectors and wrapper instructions are delivered via the WSP. In this mode the core can only have one scan chain and the test data is delivered in a serialised format.

Figure 4: Wrapper in WSP Only Configuration

In WPP only mode (Figure 5), the test vectors are delivered via the WPP, but instructions are still delivered via the WSP. The number of balanced scan chains that the core contains determines the width of the WPP: if the core has 3 balanced scan chains, the WPP is 3 bits wide.

Figure 5: Wrapper in WPP Only Configuration

The final mode of operation combines the WSP and WPP (Figure 6); the wrapper instructions are delivered via the WSP, and the test vectors are then delivered via the WSP and WPP to the balanced scan chains, making full use of the available TAM resources.
Figure 6: Wrapper in WSP & WPP Configuration 66 6.1 Wrapper/TAM Optimisation Comparison The theoretical experimental results for the in-house SoC design, that are shown in Table 2 are for the case of using; WPP to load and unload test vectors or WSP to load and unload test vectors, or a combination of WSP and WPP using both distributed and multiplexed TAM structures. Table 2 gives a summary of the 5 different IEEE 1500 wrapper/ TAM optimisation investigated A: B: C: D: E: Test Architecture WSP Only (Distributed TAM) WPP Only (Distributed TAM) WPP Only (Multiplexed TAM) WSP & WPP (Multiplexed TAM) WSP & WPP (Distributed TAM) TAT 697261 87697 85758 80792 68548 % TAT decrease 87.423 87.701 88.413 90.169 Table 2: Wrapper and TAM mode comparisons The TAT for the WSP only (Distributed TAM) architecture is used as a lower bound for the TAT and column 3 of Table 2 represents the percentage decrease that each test architecture has on the overall TAT. The % decrease ranges from 87.423% to 90.169%. The TAT can be calculated for each core in the SoC for all test architecture shown in Table 2 using equations 6.1.1 – 6.1.5: ti = ((fi + ni) * tpi) + 7 + tpi (Test Architecture A TAT calculation) (6.1.1) ti = ((fi + ni)/ WPP) * tpi) + 7 + tpi (Test Architecture B TAT calculation) (6.1.2) ti = ((fi + ni)/ WPP) * tpi) + 7 + tpi (Test Architecture C TAT calculation) (6.1.3) ti = ((fi + ni)/ (WPP + 1)) * tpi) + 7 + tpi (Test Architecture D TAT calculation) (6.1.4) ti = ((fi + ni)/ (WPP + 1)) * tpi) + 7 + tpi (Test Architecture E TAT calculation) (6.1.5) 7. Control of the IEEE 1500 Wrappers and user defined TAMs Figure 7: Test Controller Figure 3 shows the proposed novel SoC debug/test architecture. In this architecture an intelligent test controller is required to control the TAM and also the WSC (wrapper serial control) ports of each core’s 1500 wrapper. A block diagram of the intelligent test controller is shown in Figure 7. 67 The test controller is based on the well known IEEE 1149.1 TAP state machine[14]. A conventional 1149.1 TAP state machine has an instruction register and a data register whereas this test controller has an instruction register and control and status registers instead of the data register but is still utilizing the 16 state state-machine. An additional PTDI (parallel test data in) and PTDO (parallel test data out) is provided to allow for a higher bandwidth port for test vector application and test vector response for all cores in the SoC. The total TAM within the SoC is divided up into TAM sections. Each TAM section has its own TAM section state machine to control the TAM and WSC ports associated with it. Each TAM section is operated independently of each other. Each TAM section also has its own WSC port for the cores on that TAM section. If the there is more than one core on a TAM section then further control signals are needed for that TAM section i.e. CoreSelect, to enable the individual selection of a core’s wrapper. The test vectors are transmit to the cores on a TAM section using the appropriate PTAM_ip (parallel TAM input) lines and the test responses are transported via the appropriate PTAM_op (parallel TAM output) lines. The number of PTAM_ip and PTAM_op lines associated with each TAM section is derived from the TAM optimisation scheme used. Each TAM section state machine also has a small piece of memory that has the necessary core test information stored to carry out tests on the cores associated with the cores on that TAM section. 
This information includes: the number of cores, length of the longest scan chain in each core, number of WBR cells for each core and also the number of test patterns for each core. There is section of memory for the core test data when the cores only have a mandatory serial test interface and there is a section of memory that contains information about each core when it has a hybrid interface, i.e. a serial and parallel interface combined. Each TAM section state machine supports all of the IEEE 1500 mandatory test modes and also some additional hybrid test modes with higher bandwidth interfaces to reduce TATs. The intelligent test controller is required as there is no mechanism specified by the IEEE 1500 standard for the control of multiple wrappers in a digital SoC. Not including an intelligent test controller would require bringing all WSC signals for each wrapper to the SoC primary inputs and primary outputs for external control, resulting in the addition of physical pins which are already under tight constraints. 8. Future Work & Conclusion The results in Table 2 show that combining WSP and WPP for test vector application and test response collection provides the best time for overall TAT. This is based on a distributed TAM approach with the TRP TR_ARCHITECT algorithm applied. The bus based approach introduced in this paper is based on adding an additional seven lines to the ‘output TAM’ that is not part of the system bus. If a multiplexed TAM approach is used, each core in the system requires access to these additional seven TAM lines, introduces interconnection complexity (i.e. additional silicon) and routing overheads. Using the TAM TRP approach the overall TAT decreases by 90.169% compared to a WSP only approach with a distributed TAM. In addition to this considerable reduction in TAT, the silicon cost may also be reduced due to the lower interconnection complexity of the bus-based TAM and by carefully planning the layout, placing the additional seven ‘output TAM’ lines closest to the test data source/sink (test controller or TAP). The addition of the intelligent test controller IP to a digital SoC design will allow each of the digital cores within the system to be managed according to the TAM allocation scheme. The test controller could be added to any digital design, where all the cores are wrapped with an 68 IEEE 1500 compliant wrapper alleviating the need for an expensive piece of ATE to control each of the core’s wrappers within the digital SoC. This TRP approach only focuses on the TAM resource, further analysis of this TAM allocation technique would have to take power into consideration; for example activating too many cores in parallel may increase power consumption considerably, perhaps to an unacceptable level. Future work involves applying this novel approach to other benchmark circuits using a system bus architecture and eventually bringing the L131 (in-house SoC design) circuit to fabrication, so that the viability of the approach can be verified on silicon and with external tester technology. Before fabrication the approach has to be verified on a FPGA. The authors have already shown that it is possible to validate a full scan design on FPGA [15] To further validate this novel architecture it would also be necessary to replace the bus based TAM with a traditional non-bus based TAM to generate figures for silicon overheads and interconnection complexity. 
Acknowledgement

The authors acknowledge the support of the CSRC and the Department of Electronic and Computer Engineering at the University of Limerick. This project has been funded under an Enterprise Ireland Commercialisation Fund - Technology Development Grant: CFTD/05/315 (GENISIT).

References

[1] IEEE, "IEEE Standard Testability Method for Embedded Core-based Integrated Circuits," IEEE Std 1500-2005, 2005, pp. 0_1-117.
[2] E. Larsson, K. Arvidsson, H. Fujiwara, and Z. Peng, "Efficient test solutions for core-based designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 758-775, 2004.
[3] S. K. Goel and E. J. Marinissen, "Effective and efficient test architecture design for SOCs," presented at the International Test Conference, 2002.
[4] J. Aerts and E. J. Marinissen, "Scan chain design for test time reduction in core-based ICs," presented at the International Test Conference, 1998.
[5] K. Chakrabarty, V. Iyengar, and A. Chandra, Test Resource Partitioning for System-on-a-Chip. Kluwer Academic Publishers, 2002.
[6] D. Flynn, "AMBA: enabling reusable on-chip designs," IEEE Micro, vol. 17, pp. 20-27, 1997.
[7] P. Harrod, "Testing reusable IP - a case study," presented at the International Test Conference, 1999.
[8] A. Burdass, G. Campbell, R. Grisenthwaite, D. Gwilt, P. Harrod, and R. York, "Microprocessor cores," presented at the IEEE European Test Workshop, 2000.
[9] A. L. Crouch, Design-for-Test for Digital IC's and Embedded Core Systems. New Jersey: Prentice Hall, 1999.
[10] A. Larsson, E. Larsson, P. Eles, and Z. Peng, "Optimization of a bus-based test data transportation mechanism in system-on-chip," presented at the 8th Euromicro Conference on Digital System Design, 2005.
[11] CSRC, http://www.csrc.ie/Documents/tabid/141/Default.aspx, 2006.
[12] ARM, AMBA Specification, http://www.arm.com/products/solutions/AMBAHomePage.html, n.d.
[13] ITC, http://www.hitech-projects.com/itc02socbenchm/, 2002.
[14] IEEE, "IEEE standard test access port and boundary-scan architecture," IEEE Std 1149.1-2001, 2001, pp. i-200.
[15] B. Mullane, C. H. Chiang, M. Higgins, C. MacNamee, T. J. Chakraborty, and T. B. Cook, "FPGA Prototyping of a Scan Based System-On-Chip Design," presented at Reconfigurable Communication-Centric SoCs 2007, Montpellier, France, 2007.

Session 3
Applications

MemoryLane: An Intelligent Mobile Companion for Elderly Users

Sheila Mc Carthy 1, Paul Mc Kevitt 1, Mike McTear 2 and Heather Sayers 1

1 Intelligent Systems Research Centre, School of Computing and Intelligent Systems, Faculty of Computing & Engineering, University of Ulster, Magee, Derry/Londonderry BT48 7JL, Northern Ireland, {McCarthy-S2, p.mckevitt, hm.sayers}@ulster.ac.uk

2 School of Computing and Mathematics, Faculty of Computing & Engineering, University of Ulster, Jordanstown, Newtownabbey BT48 7JL, Northern Ireland, [email protected]

Abstract

Mobile technologies have the potential to enhance the lives of elderly users, especially those who experience a decline in cognitive abilities. However, diminutive devices often perplex the aged and many HCI problems exist. This paper discusses the development of a mobile intelligent multimodal storytelling companion for elderly users. The application, entitled MemoryLane, composes excerpts selected from a lifetime's memories and conveys these past memories in a storytelling format.
MemoryLane aims to possess the capability to produce bespoke stories that are both appropriate and pleasing to the user; this paper documents the proposed methodology and system design to accomplish this. As MemoryLane is expected to be deployed on a Personal Digital assistant (PDA), the preliminary field work to date investigating the usability of PDAs by elderly users is also discussed. Keywords: Digital Storytelling, Multimodal, Elderly, Usability, MemoryLane. 1 Introduction The elderly population is dramatically increasing, especially in the more economically developed countries of the world and Ireland is no exception, according to the 2002 census [Department of Health and Children, 2007] there are 436,001 people aged 65 and over living in Ireland, an increase of 22,119 since the previous census of 1996. It is well accepted that with age there is often an associated cognitive decline, which varies among individuals, affecting abilities such as memory and planning. For example, severe cognitive decline in the form of dementia currently affects 1 in 20 over the age of 65, 1 in 5 over the age of 80, and over 750,000 people in the UK [Alzheimers Society, 2006]. Cognitive decline is an inherent part of the natural ageing process ensuring that the numbers of sufferers increase steadily as the elderly population grows. Catering for such a diverse sector requires detailed analysis. 72 Reminiscence plays an important role in the lives of elderly people; many perfect the art of storytelling and enjoy its social benefits. The telling of stories of past events and experiences defines family identities and is an integral part of most cultures. Losing the ability to recollect past memories is not only disadvantageous, but can prove quite detrimental, especially to many older people. Ethnographical studies rely on participants’ powers of recall to successfully conduct their research, and often bear witness to the intangibility of precious memories. Considerable research is being conducted into how technology can best serve and assist the elderly. Pervasive environments (smart homes with smart appliances) are being developed to assist elderly users to remain living independently in their own homes while maintaining a high quality of life. This, in turn, minimises the emotional and financial strain often caused by nursing home accommodation. Memory prompts have been developed to remind users to perform imminent activities and the prospect of personal artificial companions has often been proposed [Wilks, 2005]. Mobile technology is commonplace and offers the potential to be harnessed as a tool to assist many of these elderly people. However, diminutive devices often perplex the aged and many usability problems exist. Consequently this potential is very often not maximised. The aim of this research is to develop a usable, mobile intelligent multimodal companion for elderly users. Due to the known benefits of reminiscence among the elderly, the objective of the companion will be to assist the elderly in recalling their own past life events and memories as they experience the natural cognitive declines associated with the ageing process. The application is entitled MemoryLane and will employ digital storytelling techniques to relay the memories to the user. MemoryLane will be deployed on a Personal Digital Assistant (PDA) which will equip users with the ability to re-live bygone days, and the portability to relay them to others. 
The application will also address the usability problems encountered by the elderly when using mobile devices. In addition to this, it is envisaged that MemoryLane could posthumously be inherited by family members and drawn on to revive the memory of a loved one. This paper will discuss the background areas and related work to the research, the system design, the work accomplished to date, and the remaining challenges. 2 Background and Related Research The focus of this research is underpinned by several distinct research areas including gerontechnology, HCI, usability studies, memory, reminiscence, life-caching, pervasive computing, mobile companions, ethnography, digital storytelling, artificial intelligence and multimodality. A background to these areas is now provided. 2.1 Intelligent Storytelling Traditionally, intelligence is perceived as problem solving techniques, where composing and listening to ‘stories’ may be construed as a peripheral aspect of intelligence. However the term ‘intelligent’ implies having the ability to relay appropriate information, of particular relevance to the user, in a suitable context and format [Schank, 1995], such an ability is also a critical feature of intelligent storytelling. Humans possess an intrinsic desire to both tell and hear stories. It is widely accepted that children are especially fond of stories yet adults too love to read or watch stories in various formats. Schank [1995] observes that it is essential for people to discuss what has happened to them and to hear about what has happened to others, especially when such experiences directly affect the hearer, or the teller is known personally. Schank [1995] considers the connotations of how recalling past stories shape the way in which new ones are heard and interpreted, he also endeavors to develop storytelling systems which not only have appealing stories to relay, but encompass the awareness to know when to tell the stories. Indeed Schank’s work [Schank, 1995] forms the basis of various other storytelling systems. Intelligent storytelling systems very often incorporate multimodality and interactivity for a rich user experience. Larsen & Petersen [1999] developed multimodal storytelling environment in which the 73 user traverses a virtual location in subjective camera view and is both active story-hear and storyteller. Similarly, the Oz project [Loyall, 1997] also allows the user to interact with a virtual environment called ‘The Edge of Intention’, a peculiar world populated by 4 ellipsoidal creatures called Woggles. The user embodies one of the Woggles, the remaining 3 being controlled by the computer. KidsRoom by Bobick et al. [1996] is also typical of interactive multimodal storytelling systems. KidsRoom is a fully-automated, interactive narrative play-space for children. Images, lights, sound, and computer vision action recognition technology are combined to transform a child's bedroom into a curious world for interactive play. Such storytelling systems enable the user to dynamically interact during storytelling, allowing them to play pivotal roles in the proceedings. However, in contrast to this genre of storytelling systems, which focus largely on story scripts, Okada [1996] developed AESOPWORLD. This storytelling system is not interactive, moreover it aims to model the mind, developing human-like intelligence, and modelling the activities of the central character accordingly. 
STORYBOOK by Callaway & Lester [2002] uses a narrative plan to convert logical representations of the characters, props and actions of a story into prose. MemoryLane will draw on the intelligent storytelling techniques discussed in this section to relay memories to the user. 2.2 Gerontechnology Due to the increasing numbers of the elderly population they have become the focus of much research designed to improve, prolong and enhance their lives. Gerontology is the study of elderly people and of the social, psychological and biological aspects of the ageing process itself, as distinct from the term Geriatrics, the study of the diseases which afflict the elderly. Gerontechnology, the merger between gerontology and technology is a newer genus, concerning itself with the utilisation of technological advancements to improve the health, mobility, communication, leisure and environment of elderly people, effectively allowing them to remain living independently in their own homes for longer. Stanley & Cheek [2003] discuss what is understood by the ‘well-being’ of the elderly in their comprehensive literature review. Therefore gerontechnology is heavily concerned with the ways in which elderly people interact with computers and technology, and substantial research is being conducted in this area. Willis [1996] discusses cognitive competence in elderly persons, while Melenhorst et al. [2004] investigated the use of communication technologies by elderly people and explored their perceived and expected benefits. Fisk & Rogers [2002] discuss how psychological science might assist with the issues of age-related usability, and Van Gerven et al. [2006] formulates recommendations for designing computer-based training materials aimed at elderly learners. In a recent paper, Zajicek [2006] reflects upon established HCI research processes and identifies certain areas in which this type of research differs significantly from other research disciplines. Pervasive environments designed to assist older people to live independently and maintain a high quality of life have been developed. Search engines have been specifically designed for elderly users [Aula & Kaki, 2006], and many pervasive gadgets are evident, including a meal preparation system [Helal et al., 2003], a self monitoring teapot [AARP, 2005] and a hand held personal home assistant capable of controlling a range of electronic devices in the home [Burmester et al., 1997]. By implementing MemoryLane, we hope to add to the large body of gerontechnology research. 2.3 Digital Memories Digital memory aids have been designed to assist users in various ways, acting as digital companions, especially in later life. The value of such devices was initially debated by Bush [1945], and has since been deliberated and discussed by Wilks [2005]. In addition to digital memory aids, memories themselves are being digitalised. Nokia provide a digital photo album, often utilised by the blog community to organise photos and videos to a timeline. Kelliher [2004] discuss an online weblog populated by the daily submissions of events experienced by a group of camera phone using participants. An experiment which digitalises and stores the lifetime memories of one man is being conducted by Gemmell et al. [2006], and another of the UKCRC’s Grand Challenges is focused in this area (GC3 project). The GC3 project aims to gain an insight into the workings of human memory and 74 develop enhancing technologies. 
Incidentally, this project also envisages featuring personal companions in the next 10 to 20 years, using information extracted from memories to aid elderly persons as senior companions for reminders. SenseCam [Hodges et al., 2006] is a revolutionary pervasive device, which aims to be a powerful, retrospective memory aid. SenseCam is a sensor augmented, wearable, stills camera, worn around the neck, which is designed to record a digital account of the wearer’s day. SenseCam will take (wide-angle) photographs automatically every 30 seconds, without user intervention, and also when triggered by a change in the in-built sensors, such as a change in light or body heat. The rationale behind SenseCam is that having captured a digital record of an event, it can subsequently be reviewed by the wearer to stimulate memories. Dublin City University’s Centre for Digital Video Processing (CDVP) is currently using two sensecams in their Microsoft funded ‘personal life recording’ research project. MemoryLane will use similar ‘lifecached’ data to compose personal digital memories for output. 2.4 Usability Studies Myriad HCI usability studies are being conducted in the area of computers and the elderly, but substantially less are being conducted into the specifics of how the elderly interact with pervasive devices, despite the fact that active researchers within this area have discussed the benefits of mobile devices to the elderly, and have highlighted the need to learn more to design for this genre [Goodman et al., 2004]. An initial PDA usability study conducted by Siek et al. [2005] compared differences in the interaction patterns of older and younger users. This work attempted to ascertain whether older people, who may be subject to reduced cognitive abilities, could effectively use PDAs. However, this initial research was conducted with a small sample of 20 users, made up from a control group of 10 younger users aged 25-35, and 10 elderly users aged 75-85 years. The study was restricted to the monitored analysis of the participants’ abilities to perform 5 controlled interactive tests using a ‘Palm Tungsten T3 PDA’. The findings of this basic study failed to identify any major differences in the performance of the two groups which could be due to the fact that the elderly group was extended extra practice time privileges. Siek et al. work [Siek et al., 2005] offers an early insight into the nature of the proposed field work for this research. A study conducted into determining the effects of age and font size on the readability of text on handheld computers is also of particular interest [Darroch et al., 2005]. Additional research has been conducted into mobile phone usage by the elderly; usability issues identified include displays that are too small and difficult to see, buttons and text that are too small causing inaccurate dialling, non user-friendly menus, complex functions and unclear instructions resulting in limited usage, usually reserved for emergencies [Kurniawan et al., 2006]. Research shows that mobile devices that are not designed to include the needs of the elderly have the potential to exclude them from using the device, therefore it is imperative that MemoryLane be developed using a user-centred approach. 2.5 Ethnographical Studies Cultural probes and props such as photographs and memorabilia are often used in ethnographical studies to prompt participants. 
The benefits of photo elicitation have been widely acclaimed by Quigley & Risborg [2003] who document tremendous success with the elderly users of their digital scrapbook. The work conducted by Wyche et al. [2006] also employs cultural probes in a ‘historicallygrounded’ research approach to designing pervasive systems and assistive home applications which present findings from an ethnographic study which examined ageing and housework. The study employed a physical ‘memory scrapbook’ as seen in Fig. 1, and used photo elicitation to provoke responses from elderly participants. The memory scrapbook was constructed from an 8.5 x 11 inch, fabric bound volume and was filled with dated images and memorabilia applicable to the focus of the study. Approximately 100 photos, greeting cards, magazine snippets, advertisements and other mementos were displayed. Wyche et al. [2006] found that the images contained in the memory scrapbook stimulated the memories of participants and evoked deep elements of human consciousness which yielded rich user experiences. It is envisaged that cultural probes be used in a similar way during subsequent ethnographical studies for MemoryLane to elicit oral histories from participants. 75 Fig. 1. The Memory Scrapbook [Wyche et al., 2006] 3 MemoryLane Design and Architecture MemoryLane will accept various media objects as input, personal items applicable to the history of the user such as photographs, video clips, favourite songs or even a favoured poem. These objects together with personal details and preferences of the user will be intelligently utilised in the composition of a story told for the pleasure of the user. MemoryLane needs to mimic the notion of understanding to compose appropriate and interesting stories and respond effectively to the user. People have a memory full of experiences that they may wish to recount and relay to others. MemoryLane needs to create an account of the right ones to tell in anticipation of their eventual use. The platform for deployment is a PDA, which would enable the users to carry their memories in a mobile companion. A visual concept of MemoryLane is depicted Fig. 2. Fig. 2. Concept of MemoryLane The need for multimodal intelligent user interfaces has been identified and embodied in various applications such as landmark project SmartKom [Wahlster, 2006]. In accordance with this requirement, it is envisaged that MemoryLane be designed to support multimodal input via a touch screen and possible use of simple voice control commands. The benefits of multimodal interaction are widely discussed by López Cózar Delgado & Araki [2005], and the design of MemoryLane will assure a multimodal interface which will accommodate elderly users with different capabilities, expertise or expectations. MemoryLane will also provide multimodal output in the form of images, video, audio and text to speech synthesis. There are several security and privacy aspects of MemoryLane which will require definition during MemoryLane’s development phase such as, ownership of the media and the rights of individuals present in other people’s memories. The Unified Modelling Language (UML) will be used as a method for designing the application incorporating use cases and the standardised graphical notation to create an abstract high level model of MemoryLane as a whole. 
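Purely as an illustration of the kind of data objects and selection step section 3 describes, the sketch below models a personal media item and a naive composition function. None of these class names, fields or rules are taken from the MemoryLane design itself, which at the time of writing existed only as a UML-level model; they are assumptions used to make the data flow concrete.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class MemoryItem:
        # One personal media object plus the metadata needed to place it in a story.
        media_type: str                                # "image", "audio", "video" or "text"
        path: str                                      # location of the media file on the device
        when: date                                     # approximate date of the memory
        people: list = field(default_factory=list)     # who features in the memory
        sensitive: bool = False                        # flagged items are excluded by default

    def compose_story(items, requested_people, max_items=5):
        # Stand-in for the decision-making module: pick a small, chronologically
        # ordered set of non-sensitive items featuring the requested people.
        chosen = [m for m in items
                  if not m.sensitive and set(requested_people) & set(m.people)]
        return sorted(chosen, key=lambda m: m.when)[:max_items]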
3.1 Artificial Intelligent Techniques for Storytelling MemoryLane will incorporate Artificial Intelligence (AI) techniques to compose life-caching data into appropriate and pleasing ‘stories’ for the user. It is vital that stories are constructed in an intelligent way, so that they (a) make sense, and (b) don’t include erroneous data objects that do not belong to the history of the current user. Case-Based Reasoning (CBR) and Rule-Based Reasoning (RBR) will be employed for the decision making in MemoryLane. Decision making will be necessary to 76 appropriately compose the various input data objects into personalised stories. MemoryLane needs to be aware of sensitive data, how to handle it, and be able to accommodate the preferences of the users. Speech processing can be divided into several categories, two of which are related to this research: speech recognition, which analyses the linguistic content of a speech signal, and speech synthesis, the artificial production of human speech. Speech recognition will be investigated as a possible user input mode, however speech recognition is notoriously difficult, the main problem being that speech recognition systems cannot guarantee as accurate an interpretation of their input as systems whose input is via mouse and keyboard [McTear, 2004], and the varying speech abilities of the elderly may cause problems in this area. MemoryLane will employ Text to Speech (TTS) to convert normal language text into speech for both verbal directions to guide user interaction and as part of the memories output to the user. Speech synthesis systems allow people with visual impairments or reading disabilities to listen to written works which will prove beneficial in systems designed for the elderly, however, speech synthesis systems are often judged on intelligibility, and their similarity to the human voice [McTear, 2004]. 3.2 MemoryLane Architecture The architecture as depicted in Fig. 3 visually represents the data flow of MemoryLane. To begin, the elderly user interacts with the AI multimodal interface and inputs a request to view a memory. This request is transmitted to the AI decision making module, which uses RBR and CBR to interpret the user’s request. The decision making module will first establish if the request is for a previously viewed memory (saved as a favourite) or for a new, (previously un-composed) memory. The decision making module will then either retrieve a complete previously seen ‘favourite’ memory, or the data objects required to compose a new one from storage. The decision making module will also commit favourite memories to file for future viewing. The user’s previously input personal data objects (images, audio, video and text) are stored on the storage module and are made available to the decision making module. The decision making module uses its rule bases to compose a memory for output in association with the personal user information stored by MemoryLane. This memory transcript is transmitted to the memory composition module which will design the memory output in a ‘storytelling’ format, using speech processing if required. The formatted memory is then relayed to the multimodal interface which will output the memory to the user. The multimodal interface also transmits and records user information during user interaction, for example, MemoryLane may record the preferences of the user for subsequent usage. Fig. 3. 
3.3 Software Analysis It is envisaged that MemoryLane will be coded using the Visual Studio developer suite. The use of X+V, a recent addition to the XML family of technologies for user interface development, will be investigated for its usefulness to the project, as will the various development platforms discussed by McTear [2004]. It is also envisaged that SPSS (Statistical Package for the Social Sciences) be used in the statistical modelling of the data. A variety of handheld devices, such as smart phones and tablet PCs, may be investigated for their usefulness to the project; however, the preferred hardware is a Dell Axim™ X51v (624 MHz) handheld PDA, which runs the Windows Mobile 5.0 operating system. The Axim has a colour touch screen, stylus, and navigational input buttons. 3.4 Usability Evaluation The completed MemoryLane application will be deployed on a PDA device for testing and evaluation. The preferred deployment platform, a Dell Axim X51v PDA device, is pictured beside an impression of the proposed MemoryLane prototype in Fig. 4. In the final phase of the project it is hoped to conduct a usability evaluation of the PDA-based MemoryLane prototype with a selection of the original participants from the field study which evaluated the usability of a PDA. Fig. 4. Dell Axim X51v PDA 4 Usability of PDAs The initial stage of this research began with a preliminary HCI pilot study conducted with a sample of elderly users and aimed at investigating the usability of a PDA. Prior to conducting interviews, many preliminary visits were required to gain trust and build a rapport with the elderly participants. The pilot study sample comprised 15 participants in apparent good health. The sample was aged between 55 and 82 years and included 6 males and 9 females. Participants were selected from four different sources: 6 attended an Age Concern centre, 3 were members of The University of the 3rd Age, 2 were day patients of a local Nursing Unit and the remaining 4 were selected at random from responses received from volunteers. Each participant was interviewed separately in a one-to-one structured interview format in familiar surroundings. The interviews involved completion of a detailed questionnaire, a demonstration by the researcher of how to interact with a PDA, followed by observation of participants’ capability in attempting to complete pre-set interactive PDA tasks. Initial research for the questionnaire design discovered that questions requiring prose-type answers took participants too long to complete, during which they often became frustrated, and they seemed to prefer yes/no or tick-box answers. Prose answers also proved ambiguous and often difficult to quantify; therefore the questionnaire followed a 5-point Likert-type scale, giving participants 5 optional answers. The ensuing questionnaire was divided into sections A and B. Section A of the questionnaire was designed to acquire background information regarding participants’ physical characteristics, socio-economic factors, perceived technical abilities, prior exposure to technology and personal opinions of modern-day technology. Section B of the questionnaire was designed to be completed in conjunction with undertaking the interactive PDA tasks; this section determined the participant’s ability to complete the set tasks and ascertained their HCI preferences. This section centred on questions regarding preferred interaction modalities and aspects and elements of the PDA hardware and software.
As part of section B, participants were asked to attempt 6 basic tasks on the PDA as 78 illustrated in Fig. 5. This section of the interview was videotaped where possible, in conjunction with the participant’s approval. Fig. 5. Participant Interacting with PDA It was clear from the outset that the participants found the PDA extremely complicated to use and had difficulty even knowing where to start; no one found the interface instinctive or intuitive. This was evidenced by the level of assistance requested and given. Despite the functionality of a PDA being demonstrated beforehand, not one of the participants could carry out even the most basic of tasks unaided. There was also a noticeable level of general disinterest in applications hosted on the PDA; none were of particular personal appeal to the participants. For example most thought that its functions as a calendar or diary were of little interest as they preferred a pen and diary. When asked, many agreed that they would certainly be more interested, and inclined to engage with the PDA if it provided an application of personal interest, such as MemoryLane. However, despite participants initially expressing concern about being unable to partake in the study due to their lack of computer knowledge, and the difficulties incurred during the tasks, many participants said they actually enjoyed the experience of PDA interaction. Most felt that their skills would improve if they had more time with the PDA and some expressed a desire to learn more about a PDA given the desired surroundings and instructor. The portability of a PDA appealed to the majority of participants who remarked on it being ‘small enough’ to fit into a handbag or breast pocket. This would imply that many elderly users possess a genuine interest in engaging with mobile technologies and that a PDA has a certain appeal to many elderly people, however, due to complex interfaces and interactions, many choose not to experiment with such devices. These findings suggest that the interface for MemoryLane must strive to be simplistic, usable and intuitive to be successfully deployed on a PDA. 5 Relation to Other Work Mobile devices that are not designed to include the needs of the elderly users have the potential to exclude them from using such devices. Technologies are often developed for elderly users without specific usability studies having been conducted with target users, and are typically based on generic HCI guidelines. Minimal usability studies focus on elderly users’ interaction with mobile devices [Goodman et al., 2004] and those that have are small scale [Siek et al., 2005]. This research aims to incorporate a large sample and perform a detailed analysis in a bespoke usability study using the intended hardware conducted with the target audience prior to developing the application. MemoryLane will then be designed and implemented in a storytelling format based on the specific findings of the study. This research also aims to deploy MemoryLane to a PDA - rarely used in Gerontechnology, and as yet no PDA based multimodal storytelling companion, which takes existing memory data and builds it into a coherent story for users, exists. Most existing memory assistive devices are prompts for current or future events [Morrison et al., 2004]; MemoryLane will be a multimodal reminder of memories and past events. 
Therefore the contributions of this research are a set of design guidelines for PDA based applications for the elderly users and multimodal storytelling of memories and past events. 79 6 Conclusion & Future Work This paper provides a summary of issues relating to the development of MemoryLane. The objectives of MemoryLane, in providing a usable, intelligent mobile companion for elderly users have been defined, and the importance of reminiscence to the elderly clearly stated. The work completed to date has largely centred on requirements gathering, the first stage of which took the form of an investigative study into the usability of PDAs by the elderly and the second phase of requirements gathering, a field study which will investigate reminiscence patterns among the elderly is currently underway. This next phase of requirements gathering is concerned with eliciting the user requirements for MemoryLane. In order to develop a system which presents users with digital accounts of their memories, it is first important to see how people reminisce and recall their episodic memories. This study will establish what the users require from such an application and will form the basis of the design and implementation of MemoryLane. The study will also initiate storytelling and reminiscence to elicit oral histories of the past lives and experiences of the elderly participants. Video-taped informal focus groups will be conducted, at which, there will be guided open discussion. Questionnaires will not be used at this point to avoid incorporating bias and inhibiting the flow of conversation. Participants will be observed to ascertain how well they remember, and the manner in which they recount their memories. The participants will also be observed to elicit the emotions and feelings that reminiscence evokes, to note if the experiences are pleasant or uncomfortable; MemoryLane can then incorporate procedures to handle sensitive data. The focus sessions will also aim to establish any omissions, similarities, patterns or trends in the discourse of participants. A bespoke ‘memory scrapbook’ will be constructed and used in the next phase in this research. Photographs and mementos of by-gone eras, applicable to the socio-economic climate of the area will be included in the scrapbook. Cultural probes, everyday artefacts from bygone days, will also be used in the study to provoke responses from participants. Participants will be asked about their ability to recall memories prior to using the scrapbook and then, in contrast, whilst using the scrapbook as a visual aid and prompt. The hypothesis is that the latter discussions, with the scrapbook, will elicit far richer oral histories than discussion based on recollect alone. The remaining challenges of the research will be to implement the design for MemoryLane while adopting a user-centred methodology. The development process will be iterative in nature, requiring repeated evaluations with the elderly sample, and will incorporate the findings of the two field studies. Acknowledgments: The authors would like to express gratitude to Dr. Norman Alm for his input and to Dr. Kevin Curran and Professors Bryan Scotney and Sally McClean for their valuable advice and guidance. The authors would also like to extend appreciation to the pilot study participants who took the time to contribute to the research. 7 References [AARP, 2005] AARP (2005). Japan: i-pot—A Virtual Tea for Two [Homepage of AARP], [Online]. 
Available at: www.aarp.org/international/agingadvances/innovations/Articles/06_05_japan_ipot.html [Alzheimers Society, 2006] Alzheimers Society, (2006). Facts about Dementia [Online] Available at: http://www.alzheimers.org.uk/ [Aula & Kaki, 2006] Aula, A. & Kaki, M. (2006). Less is more in Web search interfaces for older adults, First Monday, [Online], vol. 10, no. 7. Available at: http://www.firstmonday.org/issues/issue10_7/aula/ [Bobick et al., 1996] Bobick, A., S. Intille, J. Davis, F. Baird, C. Pinhanez, L. Campbell, Y. Ivanov, A. Schtte & A.Wilson (1996). The KidsRoom: A Perceptually-Based Interactive and Immersive Story Environment. In PRESENCE: Teleoperators and Virtual Environments, 8(4): 367-391 [Burmester et al., 1997] Burmester, M., Machate, J. & Klein, J. (1997). Access for all: HEPHAISTOS - A Personal Home Assistant, Conference on Human Factors in Computing Systems, CHI '97 extended 80 abstracts on Human factors in computing systems: looking to the future, Atlanta, Georgia, USA, ACM Press, New York, USA, 36 - 37. [Bush, 1945] Bush, V. (1945), The Atlantic Monthly Group, Boston, USA, As We May Think, The Atlantic Monthly. [Callaway & Lester, 2002] Callaway, C. & Lester, J.C (2002). Narrative Prose Generation. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence. Seattle, USA. [Darroch et al., 2005] Darroch, I., Goodman, J., Brewster, S. & Gray, P. (2005). The Effect of Age and Font Size on Reading Text on Handheld Computers, Proceedings of Interact 2005, Rome, September 2005. Springer Berlin, Heidelberg, 253-266. [Department of Health and Children, 2007] Department of Health and Children. (2007). Population of Ireland: summary statistics for census years 1961-2002 [Online] Available at: http://www.dohc.ie/statistics/health_statistics/table_a1.html [Fisk & Rogers, 2002] Fisk, A.D., & Rogers, W.A. (2002). Psychology and aging: Enhancing the lives of an aging population. Current Directions in Psychological Science, 11, 107–110 [Gemmell et al., 2006] Gemmell, J., Bell, G. & Lueder, R. (2006). MyLifeBits - A Personal Database for Everything, Communications of the ACM, vol. 49, Issue 1, Microsoft Research Technical Report MSR-TR-2006-23, San Francisco, USA, 88-95 [Goodman et al., 2004] Goodman, J., Brewster, S. & Gray, P. (2004). Older People, Mobile Devices and Navigation, HCI and the Older Population. Workshop at the British HCI 2004, Leeds, UK. [Helal et al., 2003] Helal, S., Winkler, B., Lee, C., Kaddourah, Y., Ran, L., Giraldo, C. & Mann, W. (2003). Enabling Location-Aware Pervasive Computing Applications for the Elderly, 1st IEEE Conference on Pervasive Computing and Communications (Percom) Fort Worth [Hodges et al., 2006] Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth, G., Kapur, N., and Wood, K. (2006). SenseCam: A retrospective memory aid. Proc. Ubicomp 2006. [Kelliher , 2004] Kelliher, A. October (2004). Everyday Cinema, SRMC 2004, New York, USA, ACM Press [Kurniawan et al., 2006] Kurniawan, S., Mahmud, M. & Nugroho, Y. (2006). A Study of the Use of Mobile Phones by Older Persons, CHI 2006, 989 - 994. [Larsen & Petersen, 1999] Larsen, P.B. & Petersen, B.C. (1999). Interactive StoryTelling in a Multimodal Environment, Institute of Electronic Systems, Aalborg University, Denmark [López Cózar Delgado & Araki, 2005] López Cózar Delgado, R. & Araki, M. (2005). Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Wiley & Sons, Hoboken, N.J., U.S.A. [Loyall, 1997] Loyall, A. 
B.(1997). Believable agents: building interactive personalities. Ph.D. thesis, CMUCS-97-123, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. [McTear, 2004] McTear, M. F. (2004). Spoken Dialogue Technology: Toward the Conversational User Interface, Berlin, Germany: Springer-Verlag [Morrison et al., 2004] Morrison, K., Szymkowiak, A. & Gregor, P. (2004). Memojog – An Interactive Memory Aid Incorporating Mobile Based Technologies, in Lecture Notes in Computer Science, Volume 31, Springer Berlin, Heidelberg, 481-485. [Melenhorst et al., 2004] Melenhorst, A.S., Fisk, A.D., Mynatt, E.D. & Rogers, W.A. (2004). Potential Intrusiveness of Aware Home Technology: Perceptions of Older Adults. Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting 2004. Santa Monica, CA: Human Factors and Ergonomics Society [Okada, 1996] Okada, N. (1996). Integrating Vision, Motion and Language through Mind. In Artificial Intelligence Review, Vol. 10, Issues 3-4, 209-234. [Quigley & Risborg, 2003] Quigley, A. & Risborg, P. (2003). Nightingale: Reminiscence and Technology – From a user perspective, OZeWAI 2003, Australian Web Accessibility Initiative, Latrobe University, Victoria, Australia [Schank, 1995] Schank, R.C. (1995). Tell me a story: narrative and intelligence. Evanston, Ill.: North WesternUniversity Press [Siek et al., 2005] Siek, K.A., Rogers, Y. & Connelly, K.H. (2005). Fat Finger Worries: How Older and Younger Users Physically Interact with PDAs, INTERACT 2005, eds. M.F. Costabile & F. Paterno, Springer Berlin, Heidelberg 267 – 280 81 [Stanley & Cheek, 2003] Stanley, M. & Cheek. J. (2003). Well-being and older people: a review of the literature: A Review of the Literature. Canadian Journal of Occupational Therapy 70(1):51-9 [Van Gerven et al., 2006] Van Gerven, P.W.M., Paas, F. & Tabbers, H.K. (2006). Cognitive Aging and Computer-Based Instructional Design: Where Do We Go From Here? Educational Psychology Review, Springer Netherlands, Volume 18, Number 2 [Wahlster, 2006] Wahlster, W. (2006). Smartkom: Foundations of Multimodal Dialogue Systems, Springer Berlin, Heidelberg, New York [Wilks, 2005] Wilks, Y. (2005), Artificial Companions in Lecture Notes in Computer Science Machine Learning for Multimodal Interaction, Volume 3361/2005 edn, Springer Berlin, Heidelberg, 36 -45. [Willis, 1996] Willis, S. L. (1996). Everyday Cognitive Competence in Elderly Persons: Conceptual Issues and Empirical Findings. The Gerontologist. 36, 59 [Wyche et al., 2006] Wyche, S., Sengers, P. & Grinter, R.E. (2006). Historical Analysis: Using the Past to Design the Future. Ubicomp 2006, LNCS 4206 , pp. 35 – 51, Springer-Verlag Berlin Heidelberg 2006 [Zajicek, 2006] Zajicek, M. (2006). Aspects of HCI research for elderly people, Universal Access in the Information Society, Volume 5, Number 3, 279 – 286 82 Using Scaffolded Learning for Developing Higher Order Thinking Skills Cristina Hava Muntean and John Lally School of Informatics, National College of Ireland, Mayor Street, Dublin 1, Ireland [email protected], [email protected] Abstract This paper presents a research study that investigates whether a scaffolded learning structure such as a WebQuest can be used to effectively develop higher order thinking. The results from this study proved that through the use of scaffolded support and collaboration, teachers can effectively direct students learning and help them to gain higher order thinking skills moving beyond simple rote learning and towards the higher levels of Bloom’s Taxonomy. 
Keywords: scaffolded learning, WebQuest, higher order thinking, educational content delivery 1 Introduction Military forces develop skills based on the principles of drill and practice. From a civilian perspective, this type of rote learning can also be seen in education, where students are taught the skills to successfully pass examinations and not necessarily the skills required to develop a deeper understanding of a subject. Therefore, the majority of students learn to become capable of effectively solving problems which relate to individual areas in a logical, sequential manner. However, problems are rarely so simple and straightforward. There is a strong need to develop educational methods which encourage students to move beyond rote learning and develop higher order thinking skills, such that principles learnt in the traditional manner can be concurrently applied to multiple areas and to solving non-linear problems. This paper will review scaffolded learning and will investigate the effectiveness of WebQuest as a learning support tool to develop students’ level of knowledge and their thinking skills beyond basic rote learning towards the higher levels of Bloom’s taxonomy, such as Analysis, Synthesis and Evaluation. 1.1 Bloom’s Taxonomy and Higher Order Thinking Rote learning is a learning technique that focuses on memorizing the material, or learning “off-by-heart”, often without an understanding of the reasoning or relationships involved in the material that is learned, and afterwards simply remembering or recalling the facts. Higher-order thinking involves engaging students at the highest levels of thinking and allowing them to become creators of new ideas, analysers of information and generators of knowledge. If we wish to achieve higher order thinking we need to do something with the information available. We need to categorise the information and connect it to pre-existing information already stored in the memory as a model, enhancing it. Using this internal model, we can then attempt to develop new solutions to existing real-world problems. Higher-order thinking is represented in Bloom's Revised Taxonomy (led by Lorin Anderson in the 1990s) by the top three levels: Analysing, Evaluating and Creating. Bloom’s taxonomy (proposed in 1956) classifies educational goals and objectives and provides a way to organise thinking skills in six levels, from the most basic to the higher order levels of thinking (Figure 1). Each subsequent level is built on the skills developed during the previous stage. Creating (designing, planning, constructing, inventing); Evaluating (checking, critiquing, experimenting, judging); Analysing (comparing, organising, interrogating, finding); Applying (implementing, using, executing); Understanding (explaining, interpreting, summarising, classifying); Remembering (recognising, describing, naming). Figure 1. The thinking levels of Bloom’s Revised Taxonomy. Nowadays we live in a digital world where access to a large quantity of information is only a click away. The key issue is how to sift through this volume of information and find the answers required. Due to the potential for information overload, students are required to possess higher order thinking skills such as analysis, evaluation and creation. By designing education courses which require the development and use of higher order thinking skills, we can provide learners with opportunities to critically assess and transform their experiences into authentic learning experiences [1].
1.2 Scaffolded Learning In education, scaffolding is a structure which supports learning and problem solving. Scaffolding can include helpful instructor comments, self-assessment quizzes, practice problems, collections of related resources, a help desk, etc. The original term “scaffolding” was developed by Wood et al. in their 1976 study [2] and is described as a metaphor for an instructional technique where the teacher provides assistance for the student to reach a goal or complete a task which they could not complete independently. The key element of the scaffolded support is that the student is only assisted to complete the tasks which are currently beyond their capabilities. One problem with scaffolding is finding the right balance of scaffolding required. Lipscomb et al. [3] suggest that requiring students to complete tasks too far out of their reach can lead to frustration, while tasks which are too easy can also lead to the same frustration. It is therefore important that teachers understand the current level of knowledge of the students so that their interests can be “hooked”, or connected to the new information being presented and made relevant to the students, so that the motivation to learn is increased [3]. A key element in the development of scaffolded learning is structure: without a clear structure and precisely stated expectations for the exercise, many students are vulnerable to distraction and disorientation and effectively become lost in the volume of available information. Based on his study, McKenzie [4] suggested the following eight guidelines to be followed in educational scaffolding: • Provide clear directions – The goal is to develop a set of user-friendly instructions which will minimise confusion and help move students towards the learning outcome. • Clarify the purpose of the scaffolded lesson – Students are told early in the lesson why the studied issues are important and given the bigger picture so that they may see the connections in their own lives. This enables them to view the lesson as a worthwhile study and one where they should apply their talents. • Keep students on task – Students are provided with “the guard rail of a mountain highway” [4]. This enables the students and the teacher to ensure that although the students may be researching for information under their own direction, they do not stray too far off the predefined path and do not waste valuable lesson time. • Offer assessments to clarify expectations – From the beginning students are made aware of the requirements and standard expected by the teacher at the end of the assignment. This guide helps students to aim at a target of quality and to understand the important areas of the study. • Point students to worthy sources – The Internet has proven itself as a valuable source of information for both formal and informal research. Information overload can be greatly reduced or eliminated by providing relevant data sources for the students. • Reduce uncertainty, surprise and disappointment – The ultimate goal of the teacher is to maximise the learning and efficiency of the lesson. Therefore the various elements of the lesson should be tested for problems and alternative solutions should be considered. A review of the success of the lesson should also help to refine the lessons for future students. • Deliver efficiency – If done successfully, a scaffolded lesson should “distil” the work effort required for both student and teacher, showing obvious signs of efficiency.
• Create momentum – The momentum is used by the students to find out more about the subject and therefore increase their understanding of the topic being researched. 2 What exactly is a WebQuest? Traditional teaching methods have relied on the principle of the transmission of knowledge through word of mouth. With the explosion of information available on the Internet, many see it as an online library that requires teachers to think more creatively about how they may employ these information sources while also providing engaging material for their learners through the use of guided activities, self-discovery and reflection, both individually and in collaboration with other students [5, 6]. However, the study by Reynolds et al. [7] found that simple exposure to Internet resources is not enough to significantly improve student learning. Surfing the web can lead to the loss of precious time and can also, if not monitored, lead to access to inappropriate material. A WebQuest offers a structured format which enables students to gather information and construct new knowledge and learning. WebQuests were first developed by Bernie Dodge and Tom March at San Diego State University in 1995 and are defined by Dodge [8] as “an inquiry-oriented activity in which most or all of the information used by learners is drawn from the web. WebQuests are designed to use learners’ time well, to focus on using information rather than looking for it, and to support learners’ thinking at the levels of analysis, synthesis and evaluation”. This structured approach to using the Internet as a learning resource helps to focus those involved on suitable areas of the web: “otherwise, the World Wide Web becomes similar to having 500 TV channels” [9]. Since the original development and definition of a WebQuest, Dodge and March have developed and refined the original framework. The following can be seen as a more concrete definition of a WebQuest: “scaffolded learning structure that uses links to essential resources on the World Wide Web and an authentic task to motivate students’ investigation of a central, open-ended question, development of individual expertise and participation in a final group process that attempts to transform newly acquired information into a more sophisticated understanding.” Two types of WebQuests were proposed by Dodge according to their duration. A short-term WebQuest has the instructional goal of knowledge acquisition and integration, where a learner can be made aware of a significant amount of information and make sense of it, similar to the lower levels of Bloom’s Taxonomy. This type of WebQuest would typically last from one to three class sessions. A long-term WebQuest has the instructional goal of extending and refining knowledge by requiring the learner to demonstrate the higher levels of Bloom’s Taxonomy, analysing the information and using this deep understanding to create something which others can respond to. This type of WebQuest would typically last from one week to one month in a classroom setting. In conclusion, the main purpose of the WebQuest model is to harness the advantages of the resources available on the Internet while also focusing students on completing the task.
In order to achieve this efficiency and clarity of purpose, the following six sections are critical attributes of a WebQuest and are required for both short-term and long-term WebQuests: • Introduction – This section provides an overview of the learning objectives and attempts to motivate the students to begin the WebQuest. • Task – The task is a clear, formal description of what the students are required to accomplish by the end of the exercise. • Process – Explicit details of the various steps required to be accomplished in order to achieve the stated task are given. • Resources – Sources of information which the teacher has deemed appropriate and relevant are given to the students. • Evaluation – The evaluation tool used is a rubric that presents a defined set of criteria against which submissions can be clearly and consistently measured. • Conclusion – At this stage students are given an opportunity to reflect on the exercise. In addition to the critical attributes of a WebQuest there are three additional non-critical attributes which may also be included if required. • Group Activities – Students can share their knowledge and experience, helping each other while also reinforcing their own understanding. • Role Playing – In order to increase the motivation of the students, the learners are encouraged to adopt a role to play during the exercise. • Single Discipline or Interdisciplinary – Students can try real-world problems and solutions while gaining an understanding of how their choices and decisions can affect other areas. 3 Preliminary Research Findings 3.1 Study background In this study the development of a scaffolded learning strategy using WebQuest is investigated to determine its level of success when trying to encourage students to develop higher order thinking skills. During current military career courses the students are required to conduct individual study and presentations on a particular topic of interest (referred to as a “test talk”) which is closely aligned to the course objectives. A typical test talk in this area would require the students to review specific areas of a major battle, for example Operation Market Garden, the battle for Arnhem, and discuss how the logistics and resource management of this battle were conducted and, more importantly, what lessons can be learnt and applied to today’s military operations. A solution to this type of study method is the development of a group WebQuest where students are required to collaborate in small groups on the development of a final product and where the various groups are designed to combine with the other groups to develop a larger body of research. The design of the WebQuest could follow along the lines of the chapters of a book, where each group of students is required to develop a specific section. This initial research would fulfill the role of the initial “background for all” stage of a WebQuest for each group. When this has been finished the students would be required to answer an open-ended question relating to this section and to develop a solution which requires them to transform the research developed into a more sophisticated understanding of the topic being learnt. When each group has completed their work it can be compiled to form a larger piece of research which would be stored or published and used by future students as research material. Therefore students are required to apply critical thinking skills to develop their final solution.
3.2 Research Procedure The sample used for this study consisted of a group of approximately 9 students who were undergoing an Officer promotion course. At this level of the military, students are expected to use their initiative and be constructive in their problem-solving abilities. A WebQuest would offer a good foundation in the development of these necessary skills. These students came from a number of different units and military trades, with various levels of prior training and educational backgrounds. Although this group is a convenience sample, the various levels of training and prior education helped to ensure a varied selection of test subjects and skill levels and gave some representation of the overall population. The measurement and analysis for this study were carried out through a number of means, both quantitative (using an assessment rubric) and qualitative (through the comparison of a pre-study and post-study survey and informal interviews). The data was analysed at each stage to identify any trends or issues; for example, the individual WebQuest results were compared against the group WebQuest results to determine if any noticeable improvement was apparent. 3.2.1 Stage 1: Pre-Study Survey This first survey was designed to capture general information from the students such as age, gender, computer experience, etc. The students were also questioned on their current preferences when working on assignments and on their preference with regard to the use of technology. We tried to assess if students had a preference for traditional classroom delivery, an online delivery preference or a blended preference of e-learning delivery supplemented by face-to-face training. The results from this survey were compared to the final survey (see Stage 4) to determine if a change in student opinions had developed through their participation in this study. 3.2.2 Stage 2: Individual WebQuest After students had completed their pre-study survey the individual WebQuest could begin. This WebQuest was designed to capture the attention and interest of the students by providing them with an authentic, open-ended task, and the role-play technique was used as a means of motivating the students while completing the WebQuest assignments. It was decided to focus on the current Iraq conflict as the main topic for this WebQuest. The students were required to take on the role of an Officer serving in Iraq. In order to provide the necessary background for the students, they were exposed to a number of documentaries and discussions on this topic that later put them in a better position to adopt the role of a serving soldier. The students were given the task of researching the resources presented to them as part of the WebQuest, gathering the basic information of the battle and highlighting the valuable lessons learnt from this conflict. As part of the scaffolding structure, and in order to help the students to complete this task, they were required to prepare a presentation on the findings of their research. The submitted assignments were marked against a rubric designed to facilitate both the individual and group WebQuests. Rubrics are used to assess the submitted assignments because they help to make the expectations of the teacher clearer and also offer the students targets to achieve [10]. The rubric was generated following the suggested three-stage format of Dodge [11]. 3.2.3 Stage 3: Group WebQuest At the completion of the individual WebQuest the students were randomly assigned to three groups.
Students were asked to communicate with the other members of their group using the Moodle Learning Content Management System (LCMS). Since the students used for this study came from a number of different locations throughout the country, and in order to facilitate group work, Moodle was considered an appropriate tool because of its available features such as discussion forums, chat rooms, private messaging and the use of a wiki. Building on the individual WebQuest and the background developed on the training day, the group WebQuest was again focused on the Iraq war. The students were required to adopt the role of a person assigned to a working board tasked with the development of a report investigating the problems faced by reserve members and with formulating preventative measures. As in the individual WebQuest, the submitted assignments were marked against a rubric. The students used the wiki as the delivery method for the assignment. During the analysis of the group collaboration it was envisioned that the initial responses on the discussion boards would be at a low level of critical thinking. However, as the study developed, responses were expected to show increasing signs of critical thinking, going from an “I agree” type of post to posts applying learned content and then even to posts which show obvious synthesis of learned content. The analysis of the study attempted to highlight this progression of critical thinking on the discussion boards. 3.2.4 Stage 4: Post-Study Survey and Interview A post-study survey was presented to the students after the successful completion of the group WebQuest. The survey was based on the pre-study test and included the general areas of the use of technology for learning, for military training and for the development of higher order thinking skills in the Defence Forces. The results from both surveys were brought together and, in the analysis of the data, an attempt was made to determine how the students’ initial understanding of the subject and their impressions of the value and use of technology had changed during the course of the study. 3.3 Pre and Post Study Survey Results This paper presents in detail only the results of the two surveys; other papers will address the individual and group study outcomes when using WebQuest. As mentioned before, one survey evaluated the students’ current thinking in relation to study, the use of IT and critical thinking, before the experimental study. The second survey was given to the students after the study and was designed to assess the changes which had occurred to the students during the WebQuest-based course of the study. Each survey consisted of twenty-two core questions plus questions that were developed to gather further information useful for the design and development of additional courses. There was originally a strong trend towards individual learning (55%), but this changed significantly by the end of the study. Only 22% still maintained this original point of view and 77% of students were now comfortable with group learning, up from the original 44%. Students also commented that they found the group study easier because they were able to discuss ideas and they felt less pressure knowing that if they were unable to finish a task someone else in the group would be able to compensate. Which of the following best describes your preference with regard to the use of technology?
[Figure 2 shows pre-study and post-study percentages of students for each response option, ranging from “I prefer taking classes that use no information technology” through limited, moderate and extensive use of technology to “I prefer taking classes that are delivered entirely online”.] Figure 2. Students’ opinions on the usage of technology during a study. The question reported in Figure 2, which assessed the students’ preference regarding the use of technology for educational content delivery, showed a strong and positive reaction from the students towards receiving electronic lectures instead of traditional delivery methods, i.e. an instructor in a classroom. The survey also aimed at determining how students’ study habits were affected by the use of technology. The majority of e-learning courses developed for the Irish Defence Forces are CD-based materials, which are a huge improvement over the issue of paper-based manuals. The problem with CD-based training materials is the slow development time of the material and the lack of resources/funding. Using the Internet in a similar manner as in this study reduces some of the burden on the development team, in that the Internet can be used to disseminate the material to those who require it. In addition, the material is always available wherever there is an Internet connection and can be easily updated and made available very quickly. [Figure 3 shows pre-study and post-study responses, on a scale from “strongly disagree” to “strongly agree”, to two statements: (a) “The instructor's use of technology in my classes can increase my interest in the subject matter” and (b) “I spend more time engaged in course activities in those courses that require me to use technology”.] Figure 3 a and b. The influence of the technology-based study on the students’ study habits. [Figure 4 shows pre-study and post-study answers to the question “How would you most like to receive training for future courses?”, with the options: printed notes, online, initially online but followed with a traditional class to reinforce the material, in a traditional class, and other.] Figure 4. Teaching techniques preferred by the students. It was found from the pre-study survey that for the vast majority of courses run in the Irish Defence Forces there is very little (if any) use of technology other than PowerPoint. However, there is a tendency to develop e-learning material that is more like e-reading than e-learning. This lack of interactivity is unfortunately forced upon the developers of these e-learning packages because of delivery constraints placed upon them. It was satisfying to see from the results in the post-study (Figure 3) that when students were given the opportunity to interact with the material presented, with their colleagues or with the instructor, they availed of this opportunity. This interactivity was possible due to the use of Internet-based technology that enables forums, chat rooms or private messaging. It can be seen from the responses given to the questions presented in Figure 3 (a and b) that in general students were in favour of technology being used in the classroom and 77% had formed the opinion that their interests could be increased through the proper use of technology.
As already mentioned, all students were required to use the Moodle LCMS, which served as the main collaboration tool for the study. After the study had finished and all students had experienced the advantages and disadvantages of using an LCMS, the results from both the pre- and post-surveys were compared. It could be clearly seen that there was a vast improvement in the opinions of the students towards the use of a course management system (44% in the pre-survey, 88% in the post-survey). The surveys also included a question (Figure 4) that assessed the students’ own preferred method for the receipt of training material. There can sometimes be a tendency to push technology towards the students without actually consulting with them on their preference. The results shown here offer no real surprises. In the pre-study survey the students were mostly inclined towards traditional classroom delivery, with 44% preferring this method, 22% willing to take a blend of online and traditional learning, 22% favouring a completely online delivery and finally 11% preferring a printed version of future courses. These kinds of results would normally be expected from students who have not used an LCMS or have had a bad experience of e-learning in the past. After the study was completed and this question was asked again, the results were in favour of an e-learning solution, but now 55% were in favour of a blend of the new and the old, using the technology available on the Internet to introduce the students to the material. As in the pre-study survey, there was an 11% preference for the print-based option of delivery. 4 Discussions and Conclusion This research has developed a WebQuest-based scaffolded learning strategy that encourages students to move beyond rote learning and develop higher order thinking skills, such that principles learnt in the traditional manner can be concurrently applied to multiple areas and to solving non-linear problems. The results analysis indicated that although initially students preferred to study and work on projects individually, by the end of the study the vast majority of the students felt that the ability to discuss problems within a group was beneficial when problem solving. Students were also of the opinion that the use of the Internet as the main delivery tool offered them a much greater level of control over their own learning. However, they questioned the ability to communicate effectively via the Internet through the use of discussion forums and chat. For the initial stage of the project, the discussion forums provided on the site were considered useful. However, as the discussion progressed the students needed real-time discussions to develop a deeper understanding of the subject. Since the Moodle LCMS does not currently permit the use of voice communications, a suitable solution using third-party software called “iVocalize Web Conference” has been used. References [1] O’Murchu, D. and Muirhead, B. (2005). Insights into promoting critical thinking in online classes. International Journal of Instructional Technology and Distance Learning, 2(6):3-14. [2] Wood, D., Bruner, J., and Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17:89-100. [3] Lipscomb, L., Swanson, J., and West, A. (2004). Scaffolding. In M. Orey (Ed.), Emerging perspectives on learning, teaching, and technology. [4] McKenzie, J. (1999). Scaffolding for success. The Educational Technology Journal, 9(4). [5] Oliver, R. and Omari, A. (2001).
Exploring student responses to collaborating and learning in a Web-based environment. Journal of Computer Assisted Learning, 17(1):34-47. [6] Leahy, M. and Twomey, D. (2005). Using web design with pre-service teachers as a means of creating a collaborative learning environment. Educational Media International, 42(2):143-151. [7] Reynolds, D., Treharne, D., and Tripp, H. (2003). ICT - the hopes and the reality. British Journal of Educational Technology, 34(2):151-167. [8] Dodge, B. (1995). Some thoughts about WebQuests. The Distance Educator Journal, 1(3):12-15. [9] Erthal, M.J. (2002). Developing a WebQuest. Book of Readings, Delta Pi Epsilon National Conference, Cleveland, OH, USA. [10] Whittaker, C., Salend, S. and Duhaney, D. (2001). Creating instructional rubrics for inclusive classrooms. Teaching Exceptional Children Journal, 34(2):8-13. [11] Dodge, B. (2001). Creating a rubric for a given task. http://webquest.sdsu.edu/rubrics/rubrics.html Electronic Monitoring of Nutritional Components for a Healthy Diet Zbigniew Frątczak 1,2, Gabriel-Miro Muntean 2, Kevin Collins 2. 1 International Faculty of Engineering, Technical University of Lodz, Skorupki 10/12, Łódź, 90-924, Poland, [email protected]. 2 Performance Engineering Laboratory, School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland, {munteang, collinsk}@eeng.dcu.ie Abstract Obesity and other diseases related to an unhealthy diet are problems of near-epidemic proportion and become a greater issue every year. This paper presents a solution to this issue by proposing the use of a computer application that is able to suggest the appropriate products related to one’s diet, and to keep track of nutritional intake. The paper also describes the principle of the solution, the system architecture and implementation, and presents testing results. If the application’s instructions are followed by users, it is expected that an optimal diet will be achieved, resulting in users’ good health. Keywords: Healthy diet, e-health, utility function, nutrition control 1 Introduction Some of the most serious social issues of our time are obesity and dietary problems. Approximately 39% of Irish adults are overweight and 18% are obese [1]. Approximately two thousand premature deaths are attributed to obesity annually, at an estimated cost of €4bn to the Irish State, expressed in economic terms [1]. People are not conscious of the gravity of these issues and consequently the situation is worsening. In order to combat this growing problem it is necessary to bring it to the attention of society. One way to achieve this is an application that enables people to monitor and control nutritional values in a fast and simple way while shopping. The aim of this research is to propose a computer-based solution which will assist users in controlling the nutrition values of the food products they buy. The application will include several diet plans suitable for potential users, from simple ones which focus on the energy values of the products (expressed in calories), to more complex ones which also consider other nutritional components such as proteins, carbohydrates, sugars and fats. By using a utility function, the proposed solution will select a set of products from a range of products considered by the user for purchasing, based on their nutritional values and the user’s selected diet plan. An important goal was also to build as usable and portable an application as possible.
In order to achieve this, an application was developed to be used not only on a laptop or desktop PC but also on smart phones, PDAs and gaming consoles. Consequently, a web browser accessible application was designed, implemented and tested. It uses a server-located database to minimize the memory consumption on the client devices and give higher flexibility. With this approach users may work with information held in an in-shop database, which is customized for each individual shop to reflect the products available there. This paper is structured as follows: Section 2 summarizes related work, section 3 describes the design of the proposed solution as well as the algorithm, whereas section 4 presents the testing process and related results. The paper finishes with section 5, which focuses on conclusions and future work. 2 Related Work The diet monitoring problem is not a new one, as software for computing calories or keeping nutrition diaries has been developed since the 1980s. There were many such applications, such as “The diet balancer” [2] or “MacDine II” [3], but they differ in approach and target audience. In 1999, diet calculation software called FUEL Nutrition Software was released [4]. This application was capable of calculating the nutrition values for professional athletes. FUEL allows access to applied sports nutrition information on topics such as nutrition during regular training, food appropriate for pre- and post-exercise meals, eating for recovery, hydration, eating strategies during trips or in foreign countries, and vitamin and mineral supplements. The program is suitable only for fit and healthy individuals; anyone with special health conditions such as diabetes, osteoporosis, etc. will require individualized professional advice [4]. The program itself offered many interesting solutions but was targeted at professionals and was developed for stationary computers. Another electronic system is eHDNAS – the electronic Healthy Diet and Nutrition Assessment System [6]. This recently developed software was created to fight malnutrition and other nutrition-related diseases over a sustained period of time. Its aim was to inform people about the nutrients of certain foods in restaurants and it is mainly based on the food pyramid described in [5]. The system specifically targeted elderly people. This is a major limitation, as such applications should take into account people of all ages. Another drawback of the system is that it operates at a full-meal level, rather than a product level, which makes it very inflexible with regard to individuals’ eating habits. The report on “The Food We Eat” [7] found that barcode scanners are more user-friendly than voice recording when using an electronic self-monitoring application. This observation influenced the decision to use barcode scanning for this application. Those results were gathered through tests carried out on a group of participants with an average age of 52, using the DietMatePro [8] and BalanceLog [9] applications. DietMatePro is a commercial web application designed specifically for PDAs that uses the expandable USDA-based nutrient database [10] and supplemental databases for restaurant and brand-name foods. It addresses the needs of researchers and dietitians. While it is a very powerful dietary tool, a major drawback of this application is that it was developed for scientific purposes, and as such lacks the simplicity needed for more general use.
3 Design and Solution 3.1 Architectural Design The main aim when designing this application was to create user-friendly, portable software for calculating nutrition values. It is also supposed to be flexible and customizable. To fulfill these requirements it must offer different diet plans and must enable the creation of user-specific diets. Many previous attempts to solve the diet monitoring problem resulted in diet diaries or calorie counters. While it was desirable to include calorie counter functionality, the aim was also to go a step further and create a diet validator: i.e. given a set diet plan to which an individual is to adhere, the application can verify whether a user’s food shopping falls within the nutritional parameters of this plan. Another design prerequisite was to enable the user to run the application not only on a PC, but also on mobile devices such as smart phones, PDAs and portable game consoles. A MySQL database, Java Server Pages and the Tomcat web server were used in order to achieve these goals. Information about the products and diet plans is stored in the database. There is a server-side administration interface enabling the modification of the database to reflect the products available. In order to provide a degree of flexibility to users, the solution was deployed as a web application which can be accessed using any web browser. This makes the application accessible to any owner of a networked mobile device. The application was placed in a Tomcat web container which enables multithreading, allowing multiple users to access the application simultaneously. Figure 1 illustrates the proposed system architecture. It can be seen clearly that the user connects to an in-store Wi-Fi network and then, by means of a web browser on their mobile device, can communicate with the Tomcat web server that hosts the web application, which in turn communicates with the database in order to retrieve the data. It is believed that the best solution is to have a separate database in every shop, so a user entering the shop would use that shop’s database, which contains only the products available there. Alternatively, a shop ID can be used to select the products within a particular shop from a larger, centrally-located database. Figure 1. Architectural Design 3.2 Algorithm Description A novel algorithm is used for verifying the compliance of products with users’ diet plans. The algorithm is based on a modified Knapsack problem, which takes into consideration all nutritional values: energy, as well as carbohydrates (including sugars), proteins and fats. The algorithm’s goal is to optimize the selection of products in order to maximize their utility to users, according to their diet plan. A novel utility function was introduced to describe the usefulness of a product to users. This utility considers grades computed for each nutrition component, weighted according to the importance of that particular component for a user’s diet plan. Equation (1) below presents the function for calculating the utility of a product i: Utility_i = [w1 * G_i^proteins + w2 * (G_i^carbohydrates − G_i^sugars) + w3 * G_i^fats] / (w1 + w2 + w3)   (1) In equation (1), w1, w2 and w3 are weights which depend on the diet type and express the importance of the nutrients in any specific diet plan. G_i^proteins, G_i^carbohydrates, G_i^sugars and G_i^fats represent the grades computed based on the quantity of a particular nutrient and are expressed in the [0, 1] interval. Equation (2) presents the formula for calculating the individual grades.
G_i^nutrient = Q_i^nutrient / (Q_i^proteins + Q_i^carbohydrates + Q_i^fats)   (2) In equation (2), Q_i^proteins, Q_i^carbohydrates, Q_i^sugars and Q_i^fats represent the quantities with which each individual nutrient component is present in the product i. The nutrient component grade G_i^nutrient describes the ratio of a certain nutrient in comparison to all nutrients within the given food item. The equation for calculating the utility parameter of the product was based on the healthy diet pyramid as presented in Figure 2. It states that the healthiest products are those which contain the smallest possible amount of fats and sugars. This equation gives the highest values to products containing the most protein and carbohydrates (excluding sugars) and the lowest to those with high levels of fats and sugars. Figure 2. Healthy diet pyramid. Having calculated the utility of every product, the benefit of the product in terms of value to a particular user is computed as the ratio between the utility and the calorie amount suggested to the user by their diet plan. Next, all the products are sorted in descending order based on their value to the user. Products whose energy values exceed the user’s calorie limit, which is computed based on their physical parameters (weight, height, age, gender), are discarded and are not shown. The Knapsack problem uses as its limit a daily energy requirement (DER) expressed as a calorie amount, but this number is different for every user, as people are characterized by different physical parameters. To calculate the amount of calories to be “spent” by each user, the Mifflin formula was used [11]. This equation expresses Resting Daily Energy Expenditure (RDEE) and uses parameters such as weight, height, age and gender. These formulae were used as they give a very high accuracy (over 80%) [12]. They are presented in equation (3): RDEE_male = (10 * weight) + (6.25 * height) − (5 * age) + 5, RDEE_female = (10 * weight) + (6.25 * height) − (5 * age) − 161   (3) In order to calculate users’ DER, a formula that factors in the so-called activity factor was used. This is essentially a number based on the level of physical activity the user has interactively selected. The user can choose between: sedentary, lightly active, moderately active, and extremely active. This activity factor is multiplied by the RDEE value and the result expresses the DER. The DER is used as the limit by the Knapsack problem. 4 Testing The proposed algorithm was deployed in a system that conforms to the description made in section 3. The application was tested in a number of different settings, which included variable diets and different user parameters (weight, height, gender, etc.). The application provides the user with a choice between several diet plans and different diet consideration modes. After running the application in the web browser, the user may choose one of the following options: Weekly Shopping or Diet Check. Weekly Shopping is an option enabling the user to do the shopping for a specified number of days. It uses the calculated DER as a daily calorie limit: the application adds up the energy values of each product in the cart and, if the limit is reached, prints a notification. The other option is Diet Check, where the application uses the algorithm described in section 3.2. The system creates a list of the most diet-suitable products from those in the user’s cart.
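To make the computation concrete, the following is a minimal sketch in Java (matching the JSP/Tomcat stack described in Section 3.1) of equations (1)-(3) and of the value-based ordering used in Diet Check mode. All class, method and field names are illustrative assumptions rather than the actual implementation, and the activity factor values mentioned are examples only.

```java
// Illustrative sketch of the diet-check computation (equations 1-3); all names are assumptions.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class Product {
    final String name;
    final double calories, proteins, carbohydrates, sugars, fats; // as stored in the shop database

    Product(String name, double calories, double proteins,
            double carbohydrates, double sugars, double fats) {
        this.name = name; this.calories = calories; this.proteins = proteins;
        this.carbohydrates = carbohydrates; this.sugars = sugars; this.fats = fats;
    }

    // Equation (2): grade of one nutrient as its share of all nutrients in the product.
    double grade(double nutrientQuantity) {
        double total = proteins + carbohydrates + fats;
        return total == 0 ? 0 : nutrientQuantity / total;
    }

    // Equation (1): weighted utility; w1-w3 express the importance of each nutrient in the diet plan.
    double utility(double w1, double w2, double w3) {
        double weighted = w1 * grade(proteins)
                        + w2 * (grade(carbohydrates) - grade(sugars))
                        + w3 * grade(fats);
        return weighted / (w1 + w2 + w3);
    }
}

class DietCheck {
    // Equation (3): Mifflin resting daily energy expenditure (weight in kg, height in cm, age in years).
    static double rdee(double weightKg, double heightCm, double ageYears, boolean male) {
        double base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
        return male ? base + 5 : base - 161;
    }

    // DER = RDEE multiplied by the activity factor chosen by the user (e.g. an assumed 1.2 for sedentary).
    static double der(double rdee, double activityFactor) {
        return rdee * activityFactor;
    }

    // Diet Check: discard products whose energy exceeds the limit, then sort the rest in descending
    // order of value (utility divided by the diet plan's calorie amount, as described in Section 3.2).
    static List<Product> rank(List<Product> cart, double der, double w1, double w2, double w3) {
        return cart.stream()
                .filter(p -> p.calories <= der)
                .sorted(Comparator.comparingDouble(
                        (Product p) -> p.utility(w1, w2, w3) / der).reversed())
                .collect(Collectors.toList());
    }
}
```

As a worked example of equation (3), a 40-year-old male weighing 80 kg and measuring 180 cm has an RDEE of 10*80 + 6.25*180 - 5*40 + 5 = 1730 kcal; with an assumed sedentary activity factor of 1.2 this gives a DER of roughly 2076 kcal, which then acts as the Knapsack limit.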
Table 1: Test 1 - Input products
Table 2: Test 1 - Sorted products

Currently the application offers two predefined diet plans: a normal diet plan and a protein diet plan. The normal plan is suitable for most healthy people and assigns a higher level of importance to products that are considered significant for every healthy person. The second diet plan is based on a protein diet, in which proteins are the most valuable nutrient. This diet plan could be aimed at athletes wishing to build muscle mass. The first test used the diet check mode and the normal diet type; the utility value of the products was calculated and is presented in Table 1. As presented, the above algorithm works successfully on the chosen group of products. It can clearly be seen that Table 1 lists the products in the order in which they were added to the cart. Table 2 includes the same products sorted in the order of their significance to the user's diet. On the bottom of the table there are flour products with high values of carbohydrates and proteins, while on the top are products with sugars and fats. The results are correct as the layout of the table corresponds to the healthy diet pyramid presented in Figure 2. The second test involved the diet check mode and the protein diet type, where the most valuable products are those with significant amounts of proteins and the smallest amount of fat. The test produced the following results, as shown in Table 3 and Table 4.

Table 3: Test 2 - Input products
Table 4: Test 2 - Sorted products

In this test the same products were used, but the results presented in Table 4 correspond to a different diet type, which places the products rich in proteins at the top of the table. It can be clearly seen that there is a distinct difference between the arrangement of the products in the results for the normal and protein diets. While the normal diet selects mainly products full of carbohydrates, the protein diet gives precedence to products with high protein values. At the same time it is possible to observe that most of the high-energy products are at the top of the table.

5 Conclusion and Future Work

This paper proposes an intelligent system which will assist users while shopping by suggesting the appropriate products for their diets, and by keeping track of their nutritional intake. The system is capable of verifying the chosen products and includes a calorie counter option. Simple navigation and the use of a web browser minimize maintenance difficulties. An in-store database with a clear administration interface enables user-friendly management of in-stock products. Future extensions may allow the addition of new diet plans which require parameters other than those used at the moment. The application needs further testing with different diet types and different user parameters. Verification by medical staff of the correctness of the approach and the exactness of the results is also envisaged. Medical approval is crucial because it may have a high influence on the future of the proposed solution. The application may also be extended to make use of a barcode scanner.

Acknowledgement

This paper presents work performed within the ODCSSS (Online Dublin Computer Science Summer School) 2007. The support provided by Science Foundation Ireland is gratefully acknowledged.
References: [1] Obesity (2005), Obesity the Policy Challenges- the Report of the National Taskforce on Obesity, Department of Health and Children, Ireland, [Online] Accessed: August 2007 Available at: http://www.dohc.ie/publications/pdf/report_taskforce_on_obesity.pdf [2] Marecic, M., Bagby, R. (1989). The diet balancer, Nutrition Today, 1989; 24-45 [3] Crisman M., Crisman, D., (1991). MacDine II – Evaluation, Nutrition Today, 1991. [4] Durepos, A. L. (1999), FUEL Nutrition Software and User Manual, Canadian Journal of Dietetic Practice and Research. Markham: Summer 1999, 60: 111-113 [5] Russell, R.M., Rasmussen, H., Lichtenstein, A. H. (1999), Modified Food Guide Pyramid for People over Seventy Years of Age, USDA Human Nutrition Research Center on Aging, Tufts University, Boston, USA, Journal of Nutrition. 1999; 129: 751-753 [6] Hung, L. H., Zhang, H. W., Lin, Y. J., Chang, I. W., Chen H. S. (2007), A Study of the Electronic Healthy Diet and Nutrition Assessment System Applied in a Nursing House”, 9th International Conference on e-Health Networking, Application and Services; 64-67 [7] Siek, K. A., Connelly, K.H., Rogers, Y., Rohwer, P., Lambert, D., Welch, J. L. (2006), The Food We Eat: An Evaluation of Food Items Input into an Electronic Food Monitoring Application, Proc. of the First International Conference on Pervasive Computing Technologies for Healthcare (Pervasive Health), Innsbruck, Austria, November 2006 [8] DietMatePro, PICS (Personal Improvement Computer Systems), Accessed: August 2007, Available at: http://www.dietmatepro.com [9] BalanceLog, HealtheTech, http://www.healthetech.com, Accessed: August 2007 [10] USDA Palm OS Search, I. H. Tech, USDA, [Online] Accessed: August 2007 Available at: http://www.nal.usda.gov/fnic/foodcomp/srch/search.htm [11] Miffin M. D. (1990). A new predictive equation for resting energy expenditure in healthy individuals. American Journal Clinical Nutrition, 1990; 51: 241-247 [12] What is Normal? Predictive Equations for Resting Energy Expenditure (REE/RMR), [Online] Accessed: August 2007 Available at: http://www.korr.com/products/predictive_eqns.htm#ref_miffin 97 A Web2.0 & Multimedia solution for digital music Helen Sheridan & Margaret Lonergan National College of Art & Design, 100 Thomas Street, Dublin 8, Ireland [email protected], [email protected] Abstract Presented are a number of solutions utilizing multimedia and Web 2.0 for the sale, playing and promotion of digital music files. Sales of CDs still greatly out perform those of digital music files. We find out why and present a number of solutions that will enhance users digital music experience. Web 2.0 has dramatically changed the way we use, collaborate and interact using the World Wide Web and this interactivity will play a vital role in the future of digital music. Keywords: Web 2.0, Digital Music, Multimedia 1 Introduction The introduction of digital music to the online market place has revolutionised how we buy, sell, distribute and listen to music. Since EMI released the first ever album to be offered as a digital download, David Bowie’s ‘Hours’ in 1999, the digital music marketplace has evolved and grown at a rapid rate [1]. In May 2007 Apple Inc. and EMI began to sell Digital Rights Management free (DRM free) music files to iTunes customers. Now iTune’s digital music purchasers can buy higher quality and DRM free music that can be played on multiple devices and shared freely with friends. 
It is inevitable that other music companies and sellers of digital music will follow Apple's lead, allowing more online sharing and swapping of digital music files. With these legal and ethical issues removed, the way is open for the development of a Web 2.0 application that allows users not only to store and buy digital music but also to share music with friends. The principal aim of this paper is to discuss how people are now buying and listening to music, how technology has changed to meet the demands of the digital user, how the role of multimedia designers has also changed and ultimately how changes in technology, including Web 2.0, will be used to promote music in this new environment. We will begin by presenting a number of key developments in the technology sector that have not only influenced the growth in sales of digital music, but also the way we buy, listen to and share music files. We then discuss our methodology and explain in detail the outcomes of our research questionnaire. We present a summary of findings from this research and conclude by describing our future work based on our findings.

2 New technologies that are affecting how we listen to music

2.1 Visual Radio from Nokia

In recent years new technologies have been launched onto the market that combine visual graphics/multimedia content with digital music and offer a more enhanced experience for the user. Visual Radio from Nokia streams synchronised and live graphics to users' mobile phones via a GPRS connection, using an "interactive visual channel" that streams visual and interactive content alongside audio content. Nokia has described Visual Radio as not just what the listener hears but also what they see and read. As a result, radio has become a more valuable promotional tool as listeners know and see what they are hearing [2]. At present these graphics are static, but the next logical step would be to introduce enhanced multimedia content.

2.2 Music players for mobile phones

A version of iTunes has been developed for Motorola phones where users can synchronise their desktop computer's iTunes library with their mobile phone. With this addition to the mobile phone's usability, online purchases of music via over-the-air (OTA) downloads are predicted to rise greatly over the next 5 years. The International Data Corporation (IDC) is anticipating U.S. sales of full-track downloads to surge to $1.2 billion by 2009. This figure stood at zero in 2004 [3]. If developers combine the multimedia possibilities of Visual Radio with the functionality of iTunes, the results could significantly increase the sales of digital music files. Mobile phone users will be able to view moving, static or interactive graphics on their mobile phone that relate to the music they have just purchased and downloaded from iTunes or other digital music stores. There are also great possibilities for marketing messages directly to potential customers using this method of communication [4]. Apple Inc. have also very recently launched the much hyped iPhone. With over a quarter of a million units sold in its launch weekend in June '07, Apple have managed to capture consumers' consciousness through clever use of advertising and press releases even before the product had launched. For Apple the 'mobile-phone-meets-music-player' market was an obvious step to take. The iPod, now considered Apple's iconic product, has reached the height of its functionality with the addition of larger colour screens.
The additions to the iPod suite of players, including the iPod nano and iPod shuffle, are basically reductions in size and functionality compared to the original iPod. The iPod suite encompasses the iPod family, and major design changes from now on would most likely be in storage capacity, battery life or combining the player with mobile phone technology. The iPhone combines a lot of what consumers love about Apple's products with mobile phone technology. It is essentially a PDA, mobile phone and iPod in one product. Nokia, Motorola and Sony Ericsson have all launched a number of multipurpose devices since 2005, but none combine the functionality of the iPhone. Another factor which influences the digital music user is the idea of a digital music package which includes portable and non-portable players. Combinations like iTunes - iPod - iPhone are hard to compete with, and with a large number of both Mac and PC users using iTunes on their desktop computers, the progression to using the iPhone is an easy step to take. Ted Schadler of Forrester Research maintains that iPod and iPhone competitors are failing to utilise the main selling point of Apple's music playing products. For youthful digital music purchasers the personal computer still plays a critical role, as Forrester's research discovered that 27% of online youth said that they can't live without their PC while only 4% said that they can't live without their MP3 player [5]. Another significant advancement in the mobile phone market is OpenMoko, "the World's first integrated open source mobile communications platform" [6]. Currently in its alpha stage and not available for use by the general public, OpenMoko is more a project than a product, with contributions and participation from the development community. This open source mobile phone will free the end user from the traditional constraints associated with mobile phone software. Sean Moss-Pultz of First International Computer (FIC) and the OpenMoko team claims that this open source mobile phone can and will become the portable computer of the future, with the potential to be a platform that can do anything that a computer with broadband access can [6]. With this level of control over a mobile phone's functionality it will be interesting to see what programmers develop for this platform in relation to digital music.

2.3 Portable games consoles with music playing capabilities

The PlayStation Portable (PSP) introduced its Media Manager software in November 2006, creating one of the new media players on the market. PSP, in collaboration with Sony, has also developed LocationFree, a means of accessing your home entertainment system wirelessly from any location. The addition of media software to games consoles brings the digital music market to a different audience than the iPod or PC markets, and with multiple enhancements the PSP is fast becoming an all-round portable multimedia entertainment system. This is significant for the digital music market as the opportunities to design and develop sophisticated graphics and multimedia for games consoles are yet to be fully exploited.

2.4 Agreements between mobile phone companies & music companies

In 2004 Vodafone and Sony Music announced "the world's largest single mobile operator/music company content distribution agreement" [7]. This agreement establishes Vodafone and its Vodafone Live! 3G services as the global leader in bringing enhanced multimedia content to its users worldwide.
99 This content will intially consist of “real music ringtones, polyphonic ringtones, artist images, video streaming and short video downloads” [7]. More recently Vodafone has signed a similar deal with Universal Music Group bringing their music catalogue to over 600,000 tracks. Ceo / Chairman of Universal Music Group International Lucian Grainge has commented that the scale of this agreement shows that both industries, the music industry and the mobile phone industry, are committed to providing a vast range of multimedia content to its customers. Inevitably the music industry has had to embrace the new digital methods of distribution or face huge losses in revenue [8]. 2.5 Media centre systems controlled from one computer via your TV Dell and Apple have also developed media center systems (Dell media centre PC using Microsoft Windows XP Media Centre and Apple Front Row) where users can control all of their home entertainment including music, video, DVD, TV, internet and photo albums from one computer. More often the computer moniter is being replaced by the LCD or Plasma TV where people can view TV, DVDs, listen to music and look at their photo albums all from the comfort of their sofa on a 50” Plasma screen. The computer is becoming the heart of the home entertainment system and people can now purchase music directly from their TV, via the internet, within minutes and play it using a music player such as iTunes over their surround sound audio system. Motion graphics and multimedia content to accompany this music would be an obvious enhancement that has not been fully exploited yet. These are just a few of the many new advances in technology that are effecting the way that we listen to and purchase music digitally. However, to really identify how the digital music industry will change over the next 5 years we carried out primary research and analyis of peoples attitudes towards listening to and buying digital and non-digital music. 3 Methodology 3.1 Aim of the questionnaire We began our research by designing a short questionnaire that asked questions about peoples buying and listening habits in relation to music. The results showed a huge bias towards purchasing CDs over digital music with 56% perferring to buy CDs. The most popular place to purchase CDs was from music shops with HMV gaining 24% of the 54% of people that bought CDs from music shops. The favourite place to listen to music was at home on a CD/Record player. This short questionnaire was used to develop a more comprehensive second questionnaire that looked at a number of key research sections. These sections covered both digital and non-digital music. The first section gathered data relating to peoples personal information such as age, gender and nationality. The second area concentrated on peoples attitudes to CDs and covered topics like buying, listening to and burning of CDs. The third section concentrated on digital music and also covered topics such as buying habits, listening habits and technology associated with digital music. Section four covered peer-to-peer (P2P) downloads and questions asked researched technology, fear of prosecution and the convenience of P2P software. The final section concentrated on over the air (OTA) music purchases on users mobile phones and topics covered included frequency of use, network used and model of phone used. The questionnaire had 62 questions and was distributed face-to-face as a printed hard copy to a sample amount of 50 people. 
Questions were presented using closed dichotomous and multiple-choice formats. The Likert scale was also used to rate a person's level of agreement or disagreement with a given statement. We used the following scale: Strongly agree, Agree, Disagree, Strongly disagree and Undecided. We positioned Undecided as the last option as opposed to positioning it third. This was to avoid the common tendency of respondents to choose Undecided for large percentages of answers simply because it sits in the middle of the options. Final questionnaires were analysed using SPSS (Statistical Package for the Social Sciences). This programme allows researchers to input and analyse data and to output graphs and charts that represent the data. Cross tabulations of data can also be carried out with this programme. Through this research we hope to identify what types of music format people buy and listen to, what people's attitudes are to CDs and digital music, and why CD sales still outnumber digital sales. In the IFPI Digital Music Report 2007, research showed that digital music sales accounted for 10% of all music sales in 2006. This means that 90% of sales were from non-digital formats including CDs [9].

4 Results

4.1 Personal Information

The age range was mainly concentrated in the 21 to 25 age bracket with 44% in this range. The next highest concentrations were in the 31-35 bracket with 24% and 14% in the 26-30 bracket. The remaining 18% were spread over the remaining age brackets. There were almost equal numbers of male and female respondents, with 56% being male and 44% being female. It was important to try to get equal numbers, as we did not want the results to be biased towards any one gender. The main nationality represented was Irish with 66% in this area. 26% of respondents did not specify what nationality they were and 6% and 2% were Spanish and African respectively. Unsurprisingly the main respondents were Irish. The large number of unspecified answers will make it difficult to use this data during cross tabulations. However we feel that age and gender will be of more concern to us in this research.

4.2 Research relating to CDs

Using the Likert scale we asked a series of questions about people's attitudes to CDs. The main question that we wanted to answer was: why are people still buying CDs? Results showed four main reasons why people buy CDs. Sound quality was a factor, with 66% of people feeling that CDs represented good sound quality. Shopping was another major reason. Fig.1 shows that large numbers of respondents felt that they liked going shopping for CDs, or at least that it did not deter them from buying CDs. Price, however, was a factor as people felt that CDs did not represent good value for money, with 80% agreeing that CDs are too expensive. Packaging was not a major factor, with only 38% of respondents agreeing or strongly agreeing that they liked opening a CD package and discovering what was inside. Some other general trends in relation to CDs yielded interesting results. People mostly buy CDs from music shops such as HMV or Tower rather than buying CDs from websites and getting them mailed to them. This would support the view that people enjoy the experience of shopping for CDs in a traditional setting such as a music shop. See Fig.2.
Fig.1 I rarely buy CDs as I hate shopping for them Fig.2 I mostly buy CDs from websites and get them mailed to me Fig.3 I buy CDs but then copy them to my computer, MP3 player or iPod and listen to the digital format When asked if a CD contained extras such as extra songs, DVD style CDs or free gifts would the purchaser be more likely to buy the CD; 54% felt that it would help to persuade them to make the purchase. This is significant for the design of digital music. If some of these extras could be incorporated into digital music files then perhaps the non-digital buyer may be persuaded to switch allegiance to digital formats over CDs and digital buyers may purchase in larger quantities. Another interesting result shown in Fig.3 revealed that 88% of CD purchasers bought CDs but then burned them to their computer and listened to the digital format. This is significant for many reasons. If people are mainly buying CDs but listening to digital formats why then buy the CD at all? If sound quality is a factor why choose to listen to a compressed format? Perhaps ownership of the music is a 101 deciding factor. With a CD you can listen to the music on as many CD players as you wish, give the CD to as many friends as you like and always have a back up of your music collection. Perhaps it is the experience of going shopping for CDs that people like and results have already supported this theory. Cross tabulations with further questions will attempt to answer this question. 4.3 Research in relation to digital music General questions were asked about whether respondents had or had not bought or listened to digital music. Fig.4 shows that almost double the numbers of people had not bought digital music compared to those that had with 67% responding that they had never bought digital. Fig.4 Have you ever bought digital music (e.g. from iTunes, napster, emusic, 3music) Fig.5 Which of the following would best describe your music listening habits? Fig.6 Which of the following would best describe your music buying habits? However, when asked about their music listening as opposed to their music buying habits large percentages of people listened to digital music but bought CDs. Fig. 5 and 6 show the differences in results form this research. From this series of questions we hoped to develop an understanding of why people would or would not buy digital. Two sets of questions were asked both using the Likert scale. The first set was asked of those who do not buy digital and the second set to those that had bought digital. Results have shown that there are some key reasons that people would buy digital music. Price was a factor with respondents agreeing that digital music was good value for money. 60% either strongly agreed or agreed. Portability was a big factor with 76% of respondents agreeing or strongly agreeing that this was important to them. The ability to purchase one track at a time was also a deciding factor as the control to buy only one song instead of a whole album was important. 61% strongly agreed or agreed. A dislike of shopping was not an issue. Surprisingly the majority of digital music purchasers also liked going shopping in the traditional manner. Only 36% of people felt that they bought digital music, as they disliked going shopping So why then do people not buy digital music? From our results the understanding of technology was not a contributing factor as 85% of people felt that they understood the technology that was associated with buying digital music. 
Broadband issues were also not a factor as 76% of respondents felt that having or not having broadband did not effect their decision to purchase digital music. Price was not an issue either as most people felt that digital music was good value for money. Not having access to a credit card was also not a factor as 71% of people felt that having or not having a credit card did not effect the decision to buy digital music. When asked if lack of packaging / physical object effected their decision 62% of people responded with disagree or strongly disagree. When asked if sound quality was an issue surprisingly this too did not deter people from purchasing digital music. With most people buying CDs but listening to digital this would suggest the people understood the quality issues associated with digital music. We also asked if not having an iPod / MP3 player deterred people from buying digital music. 74% of people felt that this was not an issue either. So having a portable digital music player is not a deciding factor. From these responses the typical reasons that purchasers would not buy digital can be discounted. This did not tell us, however, why some people did not buy 102 digital music. From analysis of previous questions asked some possible reasons may be due to ownership issues and the fact that respondents simply like to shop. 4.4 Research in relation to Peer-to-Peer (P2P) software The third section of the questionnaire dealt with the usage of peer-to-peer software. Questions were asked to determine the numbers that do and do not use this type of software. The results were almost even with 42% having used peer-to-peer (P2P) software, 50% having never used peer-to-peer and 8% not knowing if they had or hadn’t. The software mostly used was Limewire with 80% of the results. Bit torrent also featured with the next highest results at 15%. Respondents were not overly concerned with the legal implications of using this type of software, as 84% of people answered no in this area. From our results there are two main reasons that people use P2P software. The option of downloading free music was a big factor in the usage of P2P software. 86% of people claimed to use P2P software to have access to free music. Convenience was another main reason. Over 91% of respondents felt that P2P downloads were more convenient than shopping and 73% felt that P2P downloads were more convenient that ripping CDs from friends. Further questions researched why do people not use P2P software. We asked if access to a PC was an issue. For those that had not used P2P software access to computers did not deter them from using P2P software as 88% had some kind of computer access. Broadband or high speed internet access was not a factor in usage as 70% of people either disagreed or strongly disagreed that they hadn’t used P2P software, as they had no broadband access. Those who had not used P2P software felt that they did understand the technology associated with using P2P software. 77% felt that they did understand the technology but still chose to not use P2P software. As with earlier questions on this topic results showed that fear of prosecution by users was not a deciding factor as 74% of people were not concerned with legal implications. So why would people choose not to use P2P software? If the obvious reasons do not play a part perhaps there is a large portion of music purchasers that simply have no interest in or time to download from P2P software. 
Several users felt that the fear of downloading spyware and virus’ prevented them from using P2P software. 4.5 Research in relation to OTA (over the air) music downloads The final section of the questionnaire related to the use of mobile phone networks to download music directly to your mobile device. An establishing question was asked at the start of this section with 86% of mobile phone owners saying that they had never downloaded music over their mobile network. Of the people who had downloaded music from a mobile network the highest percentage were using the Vodafone network and all of the respondents had only downloaded music once or 2-5 times. From the sample of 50 people questioned we gathered very few responses to this section of the questionnaire. Some of the reasons for this are due to the slow download speeds and high costs but as 3G networks become more widely available this slow adoption of OTA downloads should reduce. 4.6 Cross tabulation research A series of cross tabulations on various results have shown interesting outcomes. In some cases the results have been as expected and in others unexpected. 4.6.1 Cross tabulation: I buy CDs as I like opening a CD package and discovering what is inside & I don’t buy digital as I don’t get a physical object when I buy a digital song see Fig. 7. The results from this cross tabulation were as expected with those that felt that a CDs packaging was important also felt that they did not buy digital as they did not get a physical object. 4.6.2 Cross tabulation: What is your age & I don’t buy digital music as I don’t have a credit card see Fig. 8. Expected results would be that a large percentage of the younger market (16 – 25) would agree that lack of credit cards would deter them from buying digital music. Results showed that those that agreed or strongly agreed were only from the 21 – 30 age group. However, most numbers were concentrated in the disagree or strongly disagree area with only small amounts or no respondents 103 choosing agree or strongly agree. This would suggest that for a small number of 21 – 30 year olds not having a credit card was an issue but for most it did not factor in their decision to buy digital. Fig.7 I buy CDs as I like opening a CD package and discovering what is inside & I don’t buy digital as I don’t get a physical object when I buy a digital song Fig.8 What is your age & I don’t buy digital music as I don’t have a credit card Fig.9 I buy CDs but then copy them to my computer, MP3 player or iPod and listen to the digital format 4.6.3 Cross tabulation: What is your age & I don’t buy digital music, as I don’t understand the technology see Fig.9. Results form this cross tabulation were not as expected. Of those that answered agree or strongly agree that they did not understand the technology all respondents were from either the 21-25 age group or the 26-30 age group with the largest amount coming from the 21-25 age group. Those in the 31-35 and 36-40 age groups felt that they did understand the technology with all responses either choosing disagree or strongly disagree. There were a higher number of responses overall from the 21-25 age group so it is more likely to have a larger variety of results from this age group. However I feel that the results are still significant and unexpected. 5 Summary of findings A brief summary of results has shown that one of the main reasons that people still choose to buy CDs over digital music is that people like to shop. 
The social interaction and shopping experience is something that has not been reproduced with digital or virtual shopping environments such as iTunes store. One solution to this would be to bring digital music purchases into the traditional shopping environment with, for example, interactive digital shopping booths in music shops or OTA music downloads within the music shop. Another solution would be to make the online or digital experience more like a traditional music shop. Interactivity would play a major role here. Another significant finding was that packaging for CDs was not a major reason that people buy CDs and also that the lack of physical object with a digital music file was also not a major factor in the choice to buy digital. However, of those that felt that CD packaging was important all respondents felt that no physical object with digital music did influence their decision to buy digital. This suggests that in the majority of cases packaging is not an issue but of the small numbers that felt that it was it is also a reason to not buy digital. This supports our reason to create graphically enhanced digital music files in an attempt to create a type of digital packaging. Even if the percentages of people that are swayed by a CDs packaging is very small (in our research only 38% of people felt that packaging was significant) 38% of all of the people that buy CDs per year would amount to a huge number. If even 1% of these people could be persuaded to buy digital over CDs this would amount to a huge jump in revenue for digital music sellers. The next most significant finding was that most people buy CDs but listen to digital music. CD purchasers are burning their music collections to computers or iPods and only listening to the digital format. This is quite significant as if CD purchasers can be persuaded to change their music buying habits to digital there would be a major shift in how people buy music. CD 104 purchasers have already embraced digital music as a format to listen to and so half of the process has been taken care of. However there are still reasons when it comes to actually buying music CDs over digital that CDs are the format of choice; ownership and Digital Rights Management (DRM) issues and the fact that people simply like to shop seem to be contributing factors. If these issues could be rectified with design then, even a small percentage of CD purchasers may be persuaded to switch to buying mainly digital music. This would have a huge impact on the music industry 6 Conclusions and future work So far we have pinpointed several devices that people, at present and in the future, listen to music on. We have devised a series of multimedia and Web 2.0 design solutions that combine this information and the information gathered from questionnaires that attempt to solve some of the issues raised. We will begin by developing a Web 2.0 design solution that will take the form of an online application that mimics a users CD collection. An in store digital music application will also be developed. This may take the form of a booth or listening point, which are already familiar to in store music purchasers. For mobile phone devices with wireless capabilities this application can be accessed and your entire music collection can be listened to from your mobile phone. This mobile phone application will also be able to scan digital information from CDs in store allowing users to purchase digital music content over an in store wireless network. 
At present the technology for this exists in Japan, where metro users can use their mobile phones to scan the metro turnstiles and enter the underground. They are then charged for the service on their phone bill. With emerging network technologies such as IEEE 802.11n claiming transmission rates of up to 600 Mbps, this idea is technically feasible in the near future.

References
[1] EMI Group, EMI Music launches DRM-free superior sound quality downloads across its entire digital repertoire, Press Release, April 2, 2007. [Online]. Available: http://www.emigroup.com/Press/2007/press18.htm. [Accessed: 03 August 2007].
[2] Visual Radio, Visual Radio :: Redefining the Radio Experience, 2005. [Online]. Available: http://www.visualradio.com/1,121,,,541.html & http://www.visualradio.com/1,121,,,412.html. [Accessed: 24 Feb. 2006].
[3] Campey, R., Roman, P., Lagerling, C. (2005). The search for Mobile Data Revenue II – a Sector Overview of Mobile Music. GP Billhound Sector report, London.
[4] Motorola, Motorola SLVR with iTunes, 2006. [Online]. Available: http://www.motorola.com/motoinfo/product/details/0,,139,00.html. [Accessed: 30 Feb. 2006].
[5] Mello, J. P. Jr. (2005). iPod slayers misdirecting efforts. [Online]. Available: http://www.technewsworld.com/story/46236.html. [Accessed: 25 Oct. 2005].
[6] Moss-Pultz, S. (2007). Openmoko Announce: Free Your Phone. [Online]. Available: http://lists.openmoko.org/pipermail/announce/2007-January/000000.html. [Accessed: 03 Aug. 2007].
[7] Vodafone, Media Centre – Vodafone and Sony Music Entertainment hit global high note, May 23, 2004. [Online]. Available: http://www.vodafone.com/start/media_relations/news/group_press_releases/2004/press_release23_05.html. [Accessed: 01 March 2006].
[8] Vodafone, Media Centre – Vodafone and Universal Music Group International sign strategic partnership, Nov. 14, 2005. [Online]. Available: http://www.vodafone.com/start/media_relations/news/group_press_releases/2005/press_release14_11.html. [Accessed: 01 March 2006].
[9] IFPI:07 Digital Music Report 2007, International Federation of the Phonographic Industry, London.

Session 4 Algorithms

Adaptive ItswTCM for High Speed Cable Networks
Mary Looney, Susan Rea, Oliver Gough, Dirk Pesch
Cork Institute of Technology, Cork, Ireland
{mary.looney, susan.rea, oliver.gough, dirk.pesch}@cit.ie

Abstract
The use of traffic conditioning in high speed networks is significant in today's cable industry due to the increased demand for real-time data services such as video streaming and IP telephony. Various traffic conditioning techniques exist, such as traffic shaping, policing and metering. The focus of this paper is a Rate Adaptive Shaper (RAS) known as the Improved Time Sliding Window Three Colour Marker (ItswTCM). This RAS was proposed to improve the fairness index in differentiated service networks and is based on the average arrival rate of packets over a constant window period of time. For high speed networks the window size required is large due to the large delay-bandwidth product incurred. For ItswTCM the window size is held constant, which does not greatly improve network efficiency. This paper concentrates on applying an adaptive sliding window, known as the Improved Time Sliding Window (ITSW), to the ItswTCM algorithm to produce an adaptive sliding window TCM mechanism. The behaviour of this Adaptive ItswTCM algorithm is examined under simulation conditions in a high speed DOCSIS environment.
Keywords: Traffic Conditioning, ItswTCM, DOCSIS, Adaptive Window Scaling. 1 Introduction With the increase in demand for symmetric real-time services, Data Over Cable Service Interface Specification (DOCSIS) has been successful in providing cable operators with the high speed data transfer required [1]. The original specification, DOCSIS 1.0, provided the cable industry with a standard based interoperability to allow for high speed web browsing and describes the communications and support operator interface within a fully deployed Hybrid Fiber Co-axial (HFC) network. With the increase in advanced IP services such as voice over IP (VoIP) and real time data services, DOCSIS 1.0 needed to be upgraded to support greater levels of Quality of Service (QoS) and to meet market demands for QoS. Hence the introduction of DOCSIS 1.1 which added key enhancements to the original standard, enabling it to support several levels of QoS while also improving bandwidth efficiency and supporting multiple service flows (SFs). Another significant aspect of DOCSIS 1.1 is that it is backward compatible. For QoS support, DOCSIS 1.1 specifies a number of enhancements to the DOCSIS 1.0 standard. Firstly, the DOCSIS 1.0 QoS model has been replaced with a SF model that allows greater flexibility in assigning QoS parameters to different types of traffic and in responding to changing bandwidth conditions. Support for multiple SFs per cable modem (CM) is permitted. Greater granularity in QoS per CM is applied, allowing it to provide separate downstream rates for any given CM to address traffic conditioning and rate shaping purposes. To support on demand traffic requests, the creation, modification and deletion of traffic SFs through dynamic MAC messages is also supported. The focus of this paper is on traffic conditioning and rate shaping for increased downstream throughput within a DOCSIS environment. Traffic conditioning improves network efficiency of high speed cable networks and maximises throughput by rate limiting the flow of packets over the downstream network. It reduces retransmissions in the network by smoothing traffic rates and dropping packets for more dependable operation. Effective traffic shaping and policing algorithms already exist, some based on window flow 108 control, others based on rate and prediction flow controls [2]. One particular algorithm used in Differentiated Services (DiffServ) networks is known as the Improved Time Sliding Window Three Colour Marker (ItswTCM) [3]. This algorithm was proposed to improve fairness in DiffServ networks due to the increased demand for greater QoS in the internet. As a result it improved throughput in DiffServ networks and was therefore applied to a DOCSIS network to provide greater performance in the network. The ItswTCM uses a time sliding window along with a colour marking scheme for the conditioning of its traffic. It is a rate estimation algorithm that shapes traffic according to the average rate of arrival of packets over a specific period of time (i.e. a window length). This period of time is preconfigured to be a constant value of either a short value in the order of a round trip time of a TCP connection, or a long value in the order of the target rate of the SF [6]. This constant value limits the potential of the algorithm. For instance, when working with high speed cable networks a large window size would be required due to the large delay-bandwidth product sustained [2]. The static nature of the window can lead to bandwidth wastage. 
Dynamically changing the characteristics of the traffic shaper could result in greater throughput in the network [7] [8]. If the window length was variable, adapting to its particular environment, performance of the ItswTCM should greatly improve. Various window adaptation algorithms exist to maximise network throughput [4] [5] [6]. One such algorithm is called the improved TSW (ITSW) [9]. This algorithm is based on the original TSW that was used in the creation of the ItswTCM algorithm. It differs from the TSW in that its window length is varied and not held constant allowing the ITSW to adapt to its environment. The main contribution of this paper is the merging of the adaptive ITSW with the ItswTCM algorithm to produce an Adaptive ItswTCM to be used in a high speed DOCSIS network. Simulation results will demonstrate the beneficial effects of this Adaptive ItswTCM algorithm within a DOCSIS environment. The layout of the paper is as follows: section 2 reviews the TSW algorithms with a focus on ItswTCM and ITSW algorithms. The merging of these algorithms to produce an Adaptive ItswTCM is discussed. The DOCSIS environment where the Adaptive ItswTCM algorithm will be applied is described in Section 3. Experimental setup and performance results are presented in Section 4 and finally the paper end with the conclusions that are drawn as a consequence of this work. 2 Traffic Conditioning in High Speed Networks Traffic Conditioners are typically deployed in high speed networks to regulate traffic flow in order to avoid overloading intermediate nodes in the network. Various traffic shaping and marking schemes exist such as leaky buckets [10] and token buckets such as the single rate three colour marker (srTCM) and the two rate three colour marker (trTCM) [11][12]. Rate Adaptive Shapers (RAS) are another type of traffic shaping mechanism used to produce traffic at the output that is less bursty than that of the input. Recently, RAS have been successfully combined with the marking schemes of the above mentioned token buckets to produce the single rate RAS (srRAS) and the two rate RAS (trRAS) algorithms [13] [14]. These RAS schemes are mainly used in the upstream direction [13]. Since the concern of this paper is in the downstream direction of DOCSIS networks another type of RAS was considered. This is known as the time sliding window (TSW) [15]. The TSW algorithm is based on the average rate of arrival of packets and traffic is conditioned according to this value. The marking schemes associated with the srTCM and trTCM was later adapted to the TSW algorithm so that traffic streams could be metered and packets marked accordingly [16]. This algorithm is known as the Time Sliding Window Three Colour Marker (TSWTCM). The unfairness of this algorithm in differentiated services has been discussed and a solution to solve this unfairness problem was proposed in [3] and this is referred to as the Improved TSWTCM (ItswTCM) algorithm. The work presented in this paper uses this algorithm and applies an adaptive window size for improved downstream throughput. 109 2.1 ItswTCM The underlying principle of the ItswTCM is that packets are permitted into the network in proportion to their Committed Information Rate (CIR) depending on their estimated average arrival rate (avg_rate) of the network over a specific preceding period of time (win_length_const). A constant value for win_length_const is normally adhered to [15] [17]. 
The avg_rate and the time the last packet arrived (prev_time) are variables used within the algorithm that are updated each time a packet arrives (as in equation 1). bytes _ in _ TSW = avg _ rate * win _ length _ const ; new _ bytes = bytes _ in _ TSW + pk _ size; avg _ rate = new _ bytes /(curr _ time − prev _ time + win _ lenth _ const ); Figure 1: Algorithm for the TSW in ItswTCM The coloured marker in this algorithm is focused on smoothing traffic in proportion to its CIR and injecting yellow packets into the network to achieve a fair share of bandwidth across the network. Hence, yellow packets play a significant role in this algorithm. if (CIR < avg _ rate <= PIR) packet = yellow; elseif (avg _ rate <= CIR ) packet = green; else packet = red ; Figure 2: Algorithm for the colour marker in ItswTCM In this algorithm the service rate is guaranteed if the avg_rate is less than the CIR i.e. these packets are marked green. If the avg_rate is greater than CIR but less than the Peak Information Rate (PIR) then the packets are marked as yellow, thus allowing larger flows to be able to contend with smaller ones. However, if the avg_rate exceeds the PIR then packets are marked as red. The CIR and PIR are determined from the networks maximum and minimum guaranteed bandwidth rates, Tmax and Tmin as illustrated in Equation 1. PIR = T max/ 8 CIR = T min/ 8 Equation 1. Considering the ItswTCM is based on a constant previous period of time an efficient use of the network is not always reflected. For high speed networks large window sizes would be required which in some cases might permit larger bursts of traffic into the network and less smoothing or shaping of traffic, which is not ideal. Larger buffering would also be required at each node. This may be overcome however with the use of an adaptive window scaling algorithm. If such an algorithm was merged with the ItswTCM, performance could be improved resulting in the smooth injection of traffic into a network. 2.2 ITSW The improved TSW (ITSW) is an adaptive window scaling algorithm, which uses a variable window length method and is a variant of the original TSW [9] [15]. The variable window length accommodates and reflects the dynamics of TCP traffic. As previously mentioned, in the original TSW the window length is preconfigured to be a constant value of either a short value in the order of a round trip time of a TCP connection, or a long value in the order of the target rate of the SF [6]. For ITSW a combination of both of these are used to determine the variable window length as shown in Equation 2 below. 110 § ¨ ¨ target_rate win _ len = ¨ n ¨ ¨ ¦ target_ratei ¨ i =1 n © · ¸ ¸ ¸ × win _ length _ const ¸ ¸ ¸ ¹ Equation 2 The constant window length used here is the same as was used in the original TSW algorithm. The incorporation of target rates into the window length allows the window to adjust according to its environment. This equation still permits high speed networks to adapt to a large window size if required due to the involvement of their target rates. 2.3 Adaptive ItswTCM The ITSW has improved performance over the original TSW. It allows for greater fairness in networks and hence throughput. To improve the ItswTCM algorithm it is merged with the ITSW so that a variable window length is now used instead of a constant value as described in Figure 3 below. win_len is the variable window length as described in Equation 1 above. 
3 DOCSIS Simulation Environment

For experimental investigation a computer-simulated DOCSIS environment is required. CableLabs [18], in conjunction with OPNET, have developed a model for the HFC DOCSIS 1.1 specification using the OPNET simulator [19]. The model includes features relevant to both DOCSIS 1.0 and 1.1, and allows the creation of complex networks so that analysis and evaluation of alternative configurations can be performed to determine capacity and Quality of Service (QoS) characteristics. Using this environment the Adaptive ItswTCM algorithm is implemented to provide enhanced QoS features. The OPNET DOCSIS implementation is based on the Radio Frequency (RF) Interface Specification 1.1 for equipment and network design and planning [20]. Traffic scheduling classes such as unsolicited grant service (UGS), real time polling service (rtPS), non real time polling service (nrtPS) and best effort (BE) are all modelled in the OPNET DOCSIS model, as are upstream QoS features such as fragmentation, concatenation, contention, piggybacking and payload header suppression (PHS) to enhance utilisation of bandwidth. Upstream and downstream RF parameters are all configurable and multiple channels are supported in both the upstream and downstream directions. However, the model is limited in some of its capabilities, as listed below [20].
• The dynamic creation, deletion and modification of services are not permitted in the model.
• Multiple SFs are not permitted.
• Enhancements to QoS features are not implemented. This includes Connection Admission Control (CAC) and traffic shaping and policing.
• Additional security features and oversubscription rates are not modelled.
To provide realistic results and to comply with the DOCSIS 1.1 standard, the following features were modelled along with the Adaptive ItswTCM algorithm [21].
• Multiple SFs.
• CAC, modelled as a first come first served resource reservation policy in which requests for bandwidth guarantees are rejected if the resulting total utilisation would exceed some specified threshold. This threshold is based on the total available bandwidth (in both the US and DS) and the maximum guaranteed bandwidth, Tmax, that any user is allowed to have. Tmax is operator dependent. (Tmin is the minimum guaranteed bandwidth.) CAC is implemented during the CM registration phase, with the bandwidth requirements being assessed to control the traffic entering the network so that each SF obtains its desired QoS.

4 Simulation Setup and Results

Simulation experiments were conducted to investigate the impact of the proposed Adaptive ItswTCM algorithm and the ItswTCM algorithm on downstream throughput performance. The associated algorithm throughputs and End to End DOCSIS network delays are examined to determine performance in moderately loaded and uncongested network environments, with the proportions of red, green and yellow packets generated by both schemes being analysed.

4.1 Simulation Environment

The OPNET simulator was used to demonstrate performance. Firstly, a set of ftp scenarios was set up for analysis in moderately loaded and uncongested networks [22].
For each set of experiments the original ItswTCM and the Adaptive ItswTCM schemes are used. Each scenario is set up with 50 cable modems (CMs) connected to one cable modem termination system (CMTS). Four downstream channels with data rates of 41.2 Mbps, 38 Mbps, 31.2 Mbps and 27 Mbps are configured along with a single US channel with a data rate of 10.24 Mbps. Each simulation is run for thirty minutes. CMs are configured to ftp files across the DOCSIS network as follows: for an uncongested network environment 50% of CMs will ftp 35 KByte files, 25% will transfer files of size 30 KByte, and the remaining 25% will transfer 40 KByte files. For a moderately loaded network environment all CMs will ftp files of size 1 MByte. Inter-request times of (exponential) 360, 300 and 400 seconds respectively were used. All traffic is forwarded as BE. The maximum guaranteed bandwidth value, Tmax, is set to 20 Mbps for each downstream and the minimum guaranteed bandwidth, Tmin, is set to 0 Mbps for each downstream. A traffic burst size of 1522 Bytes is used. The constant window length, win_length_const, is set to 90 msecs. The target_rate of each SF refers to the maximum guaranteed rate, which is equivalent to Tmax for each flow. For the second set of scenarios, a traffic mix of ftp, email and video is used. Again, the performance of the original ItswTCM and the Adaptive ItswTCM is analysed and the DS and US channel details are as described above, with 50 CMs. The CMTS is configured to transfer MPEG-2 compressed video streams with a frame rate of 50 frames per second (fps) across the DOCSIS network to 50% of CMs. 25% of CMs will ftp 35 KByte files with an inter-request time of (exponential) 36 seconds. The CMTS is also configured to send email data to the other 25% of CMs with an email size of 1000 Bytes.

4.2 Analysis of Coloured Packets

Table 1 shows the number of green, yellow and red packets generated for simulations with moderately loaded and uncongested networks. On comparing both algorithms it is evident that for the uncongested network traffic conditioning does not play a major role, as the numbers of green, yellow and red packets generated are the same. However, for the moderately loaded network the Adaptive ItswTCM algorithm outperforms the original algorithm by adapting its window length to the target rate of the network. Note that, as explained for the ItswTCM algorithm, its purpose is to smooth traffic into the network, hence the injection of yellow packets rather than green. This is reflected in the table below, with the majority of packets coloured yellow. Green packets are only permitted if the average rate is less than the CIR; as CIR = 0 bps, no green packets are allowed. In comparing both algorithms, the number of yellow packets is greater for the Adaptive ItswTCM algorithm (almost 9100 packets more) than for the original ItswTCM algorithm, and the number of red packets is zero in comparison to 9105 packets for the original algorithm. Adaptive window scaling results in better control of the packet rate, resulting in fewer red conditions. Also, for the moderately loaded case, the aggregate number of yellow and red packets is greater for the Adaptive ItswTCM algorithm, showing a greater throughput than that of the ItswTCM.
                      Uncongested Network            Moderately Loaded Network
                      Green     Yellow     Red       Green     Yellow     Red
ItswTCM               0         187335     0         0         376591     9105
Adaptive ItswTCM      0         187335     0         0         385664     0
Table 1: Traffic Marking using the ItswTCM and Adaptive ItswTCM algorithms

The numbers of green, red and yellow packets for the set of scenarios with a different traffic mix are presented in Table 2. They verify the conclusion drawn from Table 1 above that the Adaptive ItswTCM outperforms the ItswTCM in throughput. The Adaptive ItswTCM exhibits a lower number of red packets and therefore a greater number of yellow packets, coinciding with the results presented in Table 1 above.

                      Traffic Mix
                      Green     Yellow     Red
ItswTCM               0         379567     596306
Adaptive ItswTCM      0         392493     583547
Table 2: Traffic Marking using the ItswTCM and Adaptive ItswTCM algorithms

For the moderately loaded network in Table 1, there is only a difference of 2% in the number of yellow/red packets. However, we see in Table 2 for a greater traffic mix that the difference in yellow/red packets is consistent, i.e. a 3% difference. Hence, there is a coherent improvement throughout the experiments when using the Adaptive ItswTCM. This is also reflected in the following section.

4.3 Throughput Performance

The DS bus throughput values were recorded for the ftp scenarios described above. These values represent the total throughput in bits/sec on all downstream channels. The maximum, minimum and average values for each scenario were recorded over the simulation duration of 30 minutes. The resulting throughput figures in bits per second are shown in Table 3 below.

                      Uncongested Network                  Moderately Loaded Network
                      Minimum    Average    Maximum        Minimum    Average      Maximum
ItswTCM               48,152     817,512    48,152         93,838     1,316,922    18,706,053
Adaptive ItswTCM      48,152     817,512    48,152         93,838     1,437,009    21,129,809
Table 3: Minimum, Average and Maximum Throughput values for the ItswTCM and Adaptive ItswTCM algorithms

As can be seen from Table 3, for the uncongested networks the bus throughputs are exactly the same, as would be expected from the results shown in Table 1 above. For the moderately loaded network, however, the average throughput is greater for the Adaptive ItswTCM algorithm by approximately 120000 bits/sec. This again reflects that the Adaptive ItswTCM algorithm performs better than the ItswTCM, showing a noticeable improvement in throughput performance. For the traffic mix scenarios, the average DS bus throughput is plotted in Figure 4 below. Here we see that the throughput for the Adaptive ItswTCM is approximately 18% greater than that of the original ItswTCM. This shows the consistency of greater performance for the Adaptive ItswTCM.

Figure 4: Average DS Bus Throughput for the ItswTCM Algorithms

4.4 DOCSIS Delay

The End to End delay was also recorded for both sets of scenarios and the average values are presented in Figure 5 for a simulation duration of 30 minutes. The delay for the original ItswTCM is greater than that of the Adaptive algorithm in both cases. Even though this difference in delay is very small it is still consistent, thus confirming the greater DS throughputs reported above.

Figure 5: DOCSIS End to End Delays for the ItswTCM Algorithms for a) a moderately loaded network and b) scenarios set up using a traffic mix
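The paper does not show how the 2% and 3% differences quoted in Section 4.2 were computed; one plausible reading (an assumption on our part, using the counts from Tables 1 and 2) is the relative increase in yellow-marked packets:

(385664 − 376591) / 376591 ≈ 2.4%   (moderately loaded network, Table 1)
(392493 − 379567) / 379567 ≈ 3.4%   (traffic mix, Table 2)

which is broadly in line with the quoted figures.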
From a consistency of results presented in Section 4, it can be concluded that this adaptive algorithm provides improved throughput and performance within a DOCSIS environment over the original ItswTCM algorithm. End to End delay was decreased leading to an increase in DS throughput. The importance of traffic shaping for the enhancement of QoS features within a DOCSIS environment was also discussed. A network simulated DOCSIS model was illustrated and it was this simulated model that was used in simulations to confirm that the adaptive ItswTCM algorithm had greater throughput and performance in a DOCSIS environment than the algorithm that is based on a constant period of time. Future work will be based on an active queue 114 management policy that will queue red packets rather than simply dumping them. This obviously degrades performance as can be seen in the number of red packets in Table 2 above. Acknowledgements This project is supported by a European Commission Framework Programme (FP6) for Research and Technological Development titled “CODMUCA - Core Subsystem for Delivery of Multiband data in CATV networks”, IST-4-027448-STP. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] www.cablemodem.com/specifications/ Chao, H. J., Guo, X. (2001). Quality of Service Control in High-Speed Networks. John Wiley & Sons, Inc: New York, USA, pp.235-240. Su, H., Atiquzzaman, M. (2001). ItswTCM: A new aggregate marker to improve fairness in Diffserv. Proc. of the Global Telecommunications Conference, 3: 1841-1846. Mitra, D. (1992). Asymptotically optimal design of congestion control for high speed data networks. IEEE Transactions on Communications, 40:301-311. Mitra, D. (1990). Dynamic adaptive windows for high speed data networks: theory and simulations. ACM SIGCOMM Computer Communication Review, 20:30-40. Byun, H.-J., Lim, J-. T. (2005) Explicit window adaptation algorithm over TCP wireless networks. Proc. IEE Communications, 152: 691-696. Ahmed, T., Boutaba, R., Mehaoua, A. (2004). A measurement based approach for dynamic QoS adaptation in DiffServ networks. Journal of Computer Communications 28: 2020-2033. Elias, J., Martignon, F., Capone, A., Pujolle, G. (2007). A new approach to dynamic bandwidth allocation in Quality of Service networks: Performance and bounds. Journal of Computer Networks, 51: 2833-2853. Nam, D.-H., Choi, Y.-S., Kim, B.-C., Cho, Y.-Z. (2001). A traffic conditioning and buffer management scheme for fairness in differentiated services. Proc. Of ATM (ICATM 2001) and High Speed Intelligent Internet Symposium, 91-96. Niestegge, G. (1990). The “leaky bucket” policing method in the ATM (asynchronous transfer mode) network. International Journal on Digital Analog Communications Systems, 3:187-197 Heinanen, J., Guerin, R. (1999). A single rate three colour marker. Internet Draft: RFC 2697. Heinanen, J., Guerin, R. (1999). A two rate three colour marker. Internet Draft: RFC 2698 Zubairi, J.A-., Elshaikh, M.-A., Mahmoud, O. (2001). On Shaping and Handling VBR traffic in a Diffserv domain. Englewood Cliffs NJ: Prentice-Hall. Shuaib, K., Sallabi, F. (2003). Performance evaluation of rate adaptive shapers in transporting MPEG video over differentiated service networks. Proc. Of Communications, Internet and Information Technology, pp. 424-428. Clark, D.D., Fang, W. (1998). Explicit allocation of best effort packet delivery service. IEEE/ACM Transactions on Networking, 6: 362-373. Fang, W., Seddigh, N., Nandy, D. (2000). 
A time sliding window three colour marker. Internet Draft RFC 2859. Strauss, M. D., (2005). A Simulation Study of Traffic Conditioner Performance. Proc. of IT Research in Developing Countries, 150: 171-181. www.cablelabs.com www.opnet.com Specialised Models User Guide, DOCSIS Model User Guide, SP GURU/Release 11.5, SPM 21. Looney, M., Rea, S., Gough, O., Pesch, D., Ansley, C., Wheelock, I. (2007). Modelling Approaches to Multiband Service Delivery in DOCSIS 3.0 – An Architecture Perspective. Symposium on Broadband Multimedia Systems and Broadcasting. Martin, J., Shrivastav, N. (2003). Modelling the DOCSIS 1.1/2.0 MAC Protocol. IEEE Proc. Of Computer Communications and Networking. pp.205-210. 115 Distributed and Tree-based Prefetching Scheme for Random Seek Support in P2P Streaming Changqiao Xu 1,2,3, Enda Fallon 1, Paul Jacob 1, Austin Hanley1, Yuansong Qiao 1,2,3 1 Applied Software Research Centre, Athlone Institute of Technology, Ireland 2 Institute of Software, Chinese Academy of Sciences, China 3 Graduate University of Chinese Academy of Sceinces, China [email protected], {efallon, pjacob, ahanley, ysqiao}@ait.ie Abstract Most research on P2P streaming assumes that users access video content sequentially and passively, where requests are uninterrupted from the beginning to the end of the video streaming. An example includes P2P live streaming system in which the peers start playback from the current point of streaming when they join streaming session. This sequential access model is inappropriate to model on-demand streaming, which needs to implement VCR-like operations, such as forward, backward, and random seek because of the users pattern of viewing at will and the users ignorance of the content. This paper proposes a distributed and Balanced Binary Tree-based Prefetching scheme (BBTP) to support random seek. Analysis and simulation show BBTP is an efficient interactive streaming service architecture in P2P environment. Keywords: P2P, Balanced binary tree, Prefetching, Random seek 1 Introduction Using a peer-to-peer (P2P) approach to provide streaming services has been studied extensively in recent years [1, 2, 3, 4, 5]. P2P consumes the bandwidth of peers efficiently by capitalizing on the bandwidth of a client to provide services to other clients. An important advantage of P2P streaming is to provide system scalability with a large number clients sharing stream. Besides, P2P streaming can work at the application layer without requiring any specific infrastructure. A P2P live streaming system always assumes that a user who joins a streaming session would receive streaming from the point of joining time and keep on watching till it leaves or fails the session. However, for on-demand streaming, through analysis of large volumes of user behavior logs during playing multimedia streaming in paper [6], the user viewing pattern indicates that random seek is a pervasive phenomenon. The authors of [6] propose a hierarchical prefetching scheme which prefetchs popular and sub-popular segments to support random seek based on examination of the large amount of user viewing behavior logs. This scheme has these limitations: 1)not flexible, before bringing the prefetching scheme into effect, it should examine and collect the user viewing logs which need much user access information and suffer a long time testing; 2) impractical, it possibly collects access logs in traditional client-server model, however, it challenges the implementing in a distributed P2P system. 
In VMesh [7], videos are divided into smaller segments (identified by segment IDs) and these are stored in peers distributed over the network based on distributed hash tables (DHT). A peer may store one or more video segments in its local storage. It keeps a list of the peers who hold the previous and the next video segments. By following this list, it can find the peers who hold the next requested segments. If the client wants to jump to another video position which is not far away from the current one, it can simply follow its forward/backward pointers to contact the new nodes. On the other hand, if the new position is too far away, it triggers a DHT search for the segment corresponding to the new position. However, keeping all the pointers would be very costly. In this paper, we propose a novel scheme, a distributed and Balanced Binary Tree-based Prefetching scheme (BBTP), to distribute video segments over the network and support random seek. The rest of the paper is organized as follows. The prefetching scheme of BBTP is discussed in section 2. Section 3 discusses the random seek support procedure of BBTP. The performance of BBTP is evaluated in section 4. Finally, section 5 concludes the paper and offers some future research directions.

2 Prefetching Scheme of BBTP
The overlay network in BBTP is a balanced binary tree structure. A tree is balanced if and only if, at any node in the tree, the heights of its two subtrees differ by at most one. It has been shown that a balanced binary tree with N nodes has height no greater than 1.44logN [8].

Table 1: Notations used in the prefetching scheme of BBTP
S            The source media server
T            The balanced binary tree
P            The requested video for playback
Pid          P's identifier number
Len          Length of P
R            The root node of the balanced binary tree
X            A node of T
Prebuf(X)    Node X's prefetching buffer
d            Size of Prebuf(X)
Seg(X)       The serial number of the prefetching unit for node X
Parent(X)    Node X's parent node in the balanced binary tree
LChild       Left child node in the tree
RChild       Right child node in the tree
LHeight      Height of the left subtree in the tree
RHeight      Height of the right subtree in the tree

As Table 1 shows, we suppose the source media server is S, the balanced binary tree is T, and the requested video stream is P, coded at a CBR rate and of length Len. We divide the video P into equal segments (the length of a segment is 1 s of playback) and group d sequential segments into a prefetching unit, so P has ⌈Len/d⌉ prefetching units, which are numbered from 1 to L sequentially. Any node X has a prefetching buffer Prebuf(X) of size d. Supposing Seg(X) is the serial number of the prefetching unit held in Prebuf(X), node X must accomplish two important operations when it joins the system: 1) node X becomes a leaf node of the balanced binary tree; 2) node X prefetches the prefetching unit whose serial number equals Seg(X) into Prebuf(X), either from the source media server or from another node's prefetching buffer.

Fig. 1: Prefetching scheme of BBTP (the positions ⌈L/2⌉, ⌊L/4⌋, ⌈3L/4⌉, ⌊L/8⌋, ..., ⌈15L/16⌉, L along P mapped onto the nodes 1-15 of the balanced binary tree T)

As Fig. 1 shows, we suppose R is the root node of T. At level 1 of T, we map the prefetching unit at P's middle position to Prebuf(R), namely Seg(R) = ⌈L/2⌉.
At level 2 of T, we suppose R1 and R2 are node R's left and right child nodes respectively, and we divide P into two subsections L1 and L2 of the same length. Assuming Mid(L1) and Mid(L2) are the serial numbers of the prefetching units at L1's and L2's middle positions respectively, we map Mid(L1) and Mid(L2) to Prebuf(R1) and Prebuf(R2) respectively, namely Seg(R1) = Mid(L1) and Seg(R2) = Mid(L2). At level 3 of T, we divide L1 and L2 into two equal subsections L11, L12 and L21, L22 respectively, and we map the prefetching units at the middle positions of L11, L12 and L21, L22 to the prefetching buffers of R1's and R2's left and right child nodes respectively. The above operations are repeated for each tree level until P cannot be divided any further (i.e. the subsection length is less than the length of a prefetching unit); from that level down, a node's prefetching unit serial number is set to the same value as its parent node's. For node X, assuming its parent node is Parent(X) and k(X) is X's tree level in T, with k(R) = 1, we have the following equations:

1) If k(X) = 1, i.e. X is the root node of T:
   Seg(X) = ⌈L/2⌉                                            (1)

2) If 1 < k(X) ≤ ⌊log L⌋ + 1:
   ① if X is LChild:  Seg(X) = ⌊Seg(Parent(X)) − L/2^{k(X)}⌋   (2)
   ② if X is RChild:  Seg(X) = ⌈Seg(Parent(X)) + L/2^{k(X)}⌉   (3)

3) If k(X) > ⌊log L⌋ + 1:
   Seg(X) = Seg(Parent(X))                                    (4)

Supposing Sibling(X) is the sibling node of X: if Seg(X) = Seg(Parent(X)), then Seg(X) = Seg(Sibling(X)); and if Seg(X) = Seg(Parent(Parent(X))), then Seg(X) = Seg(Sibling(Parent(X))). The nodes that hold the same prefetching unit can become the prefetching suppliers of X when X joins the system, which avoids all nodes prefetching the video stream directly from S and lightens the load on S. Assuming Presuppliers(X) is the set of nodes that hold the same prefetching unit and Bw[i](X) is node i's usable bandwidth in Presuppliers(X), we always choose the node whose Bw[i](X) is maximal as the prefetching buffer supplier of X.
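To make the assignment concrete, the following is a minimal Python sketch of equations (1)-(4). It is an illustration rather than the authors' implementation: the function and argument names are invented for this sketch, and the logarithm in the level test is assumed to be base 2 (consistent with a binary tree), which the paper does not state explicitly.

```python
import math

def seg_number(level, parent_seg, is_left_child, L):
    """Serial number of the prefetching unit assigned to a node, Seg(X).

    level          -- the node's tree level k(X); the root R has level 1
    parent_seg     -- Seg(Parent(X)), ignored for the root
    is_left_child  -- True if the node is the LChild of its parent
    L              -- total number of prefetching units, L = ceil(Len/d)
    """
    if level == 1:                                  # equation (1): the root
        return math.ceil(L / 2)
    if level <= math.floor(math.log2(L)) + 1:       # equations (2) and (3)
        offset = L / (2 ** level)
        if is_left_child:
            return math.floor(parent_seg - offset)
        return math.ceil(parent_seg + offset)
    return parent_seg                               # equation (4): deeper levels
```

For example, with L = 16 this gives Seg = 8 for the root, 4 and 12 for its children, and 2, 6, 10, 14 at the next level, matching the layout sketched in Fig. 1.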
From what has been discussed above, we can state the prefetching algorithm and the procedure for constructing the balanced binary tree of BBTP:
1) X sends the message Join<X, Pid> to S;
2) If there is no request log for the video P in S, establish the balanced binary tree T, set X as the root node of T, set the identifier number of X to 1 and k(X) to 1, and let X prefetch the prefetching unit whose serial number equals ⌈L/2⌉ from the source media server S;
3) If there is a request log for the video P in S, R redirects X to its LChild in the tree if LHeight is less than or equal to RHeight of R, or otherwise to its RChild. This operation is repeated until the corresponding child is empty, and node X is then inserted at this position as a leaf node. Assuming the parent node of X is node Y, X gets the values of k(Y) and Seg(Y) from Y and sets k(X) = k(Y) + 1;
4) Node X sends a "HeightChange" message to its parent. Upon receiving the message, the parent resets its LHeight to LHeight+1 or its RHeight to RHeight+1, depending on which branch the message comes from, and then calculates its new height as max(LHeight, RHeight). If the height has changed, the node sends a "HeightChange" message to its own parent. This process continues until the root node R of the balanced binary tree is reached;
5) If 1 < k(X) ≤ ⌊log L⌋ + 1, X calculates the value of Seg(X) by equation (2) when X is LChild or by equation (3) when X is RChild. Node X then prefetches the prefetching unit whose serial number equals Seg(X) from S;
6) If k(X) > ⌊log L⌋ + 1, set Seg(X) = Seg(Y) and put Sibling(X) and Y into Presuppliers(X); if Seg(X) equals Seg(Parent(Y)), also put Sibling(Y) into Presuppliers(X). Calculate the usable bandwidth Bw[j](X) of each node j in Presuppliers(X) and copy all the prefetching buffer content from the node j that has the maximal value of Bw[j](X).
For step 6), we only search two levels above node X, namely Parent(X) and Parent(Parent(X)). To add more candidate nodes into Presuppliers(X), we can search further tree levels above node X until we encounter a node Z whose Seg(Z) does not equal Seg(X). In the process of constructing the balanced binary tree, every node needs to keep three peer pointers (i.e., peers' IP addresses and ports): parent, left child and right child.

3 Random Seek Support Procedure
A peer can find the peers that hold the next requested segments among its right subtree nodes, and the peers that hold the previous requested segments among its left subtree nodes. If the client wants to jump to another video position that is not far away from the current one, it can simply follow its left/right pointers to contact the new nodes. If the new position is too far away, the following operations are performed:
1) Calculate the video segment serial number for the new jump position from the player interface. Assuming the video segment serial number is g, the serial number of the prefetching unit that contains segment g is M = ⌈g/d⌉.
2) Traverse the balanced binary tree, starting from the root node R;
3) If M is less than Seg(R), the search pointer goes to R's left child, or otherwise to R's right child. This operation is repeated until a node J with Seg(J) = M is encountered; J is the target of the search.
4) List the nodes of node J's right subtree by inorder traversal. These nodes send their prefetching buffer content to the node that has jumped to the new video position for playback. If consecutive nodes have the same prefetching unit serial number, we always choose the node with the maximal usable bandwidth to supply the prefetching buffer content.

Fig. 2: Jump operations (the search path through the balanced binary tree of nodes 1-15 when node 5 performs a random seek)

In Fig. 2, assuming node 5 jumps during playback, the target video segment falls in node 3's prefetching buffer, which is found by searching BBTP's balanced binary tree. The nodes in the subtree of node 3 are then arranged by inorder traversal, that is 3, 14, 7, 15, which is exactly the continuous playback stream that can be seen from Fig. 1, and those nodes send their prefetching buffer content to node 5. Since the height of the balanced binary tree is O(logN), the cost of a joining operation and of a random seek are thus both bounded by O(logN).
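The search in steps 1)-4) can be sketched in a few lines of Python. This is an illustrative sketch under the paper's assumptions (every requested unit is reachable by this descent); the class and function names are invented here rather than taken from BBTP.

```python
import math

class Node:
    """One peer in the balanced binary tree; `seg` holds Seg(X)."""
    def __init__(self, seg, left=None, right=None):
        self.seg, self.left, self.right = seg, left, right

def find_seek_target(root, g, d):
    """Steps 1-3: locate the node J whose prefetching unit contains
    video segment g, descending left when M < Seg and right otherwise."""
    m = math.ceil(g / d)               # M, the unit that contains segment g
    node = root
    while node is not None and node.seg != m:
        node = node.left if m < node.seg else node.right
    return node                        # None only if the unit is absent

def supplier_order(j):
    """Step 4: node J followed by its right subtree in inorder order,
    i.e. the peers that hold the subsequent prefetching units."""
    def inorder(n):
        return inorder(n.left) + [n] + inorder(n.right) if n else []
    return [j] + inorder(j.right) if j else []
```

For the example in Fig. 2, with J being node 3, supplier_order returns the nodes 3, 14, 7, 15 quoted in the text.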
4 Performance Evaluation
4.1 Simulation setting
In this section, we evaluate the performance of BBTP in simulation. The source media server has ten videos for streaming, each with a 256 Kbps rate and a 2-h length. The length of a segment (or a time unit) is 1 s, and the prefetching buffer at a node can accommodate 720 segments, i.e., 10% of a video stream. The underlying network topology is generated using the GT-ITM package [9], which emulates the hierarchical structure of the internet by composing interconnected transit and stub domains. The network topology for the presented results consists of ten transit domains, each with twelve transit nodes, and each transit node is then connected to six stub domains, each with nine stub nodes. The total number of nodes is thus 6,600. We assume that each node represents a local area network with plenty of bandwidth, and routing between two nodes in the network follows the shortest path. The initial bandwidth assigned to the links is as follows: 1.5 Mbps between two stub nodes, 6 Mbps between a stub node and a transit node, and 10 Mbps between two transit nodes. We also inject cross traffic in the experiments to emulate dynamic network conditions. To mitigate randomness, each result presented in this section is the average over ten runs of an experiment.

4.2 Performance and Comparison
(1) Random Seek
We evaluate the random seek performance of BBTP compared with P2VoD [10] and VMesh. P2VoD organizes nodes into multi-level clusters according to their joining time, and the data stream is forwarded along the overlay tree built among the peers. Each host receives data from a parent in its upper cluster and forwards it to its children in its lower cluster. A new node tries to join the lowest cluster or forms a new lowest cluster. If it fails to find an available parent in the tree and the server has enough bandwidth, it connects directly to the server. In our experiment, we use Smallest Delay Selection for P2VoD's parent selection process, set the system parameter K = 6, and set the cache size equal to BBTP's prefetching buffer. We build VMesh on top of a public Chord implementation and set the length and bit rate of the video and the segment length to the same values as BBTP's.

Fig. 3: The cost for random seek (number of routing hops versus number of nodes for P2VoD, VMesh and BBTP)

We can simulate the random seek operation by searching for a video segment, with the routing hop count taken as the cost of the search. As Fig. 3 shows, the cost of a random seek in P2VoD increases almost linearly with the group size, while in VMesh and BBTP it increases only logarithmically. In VMesh and BBTP, a new node can quickly locate nodes holding the first several segments through DHT routing or by searching the balanced binary tree. The search time for VMesh and BBTP is O(logN), where N is the number of nodes in the system. The latency is hence significantly reduced.

(2) Streaming quality
Playback continuity is critical for streaming applications. We adopt the Segment Missing Rate (SMR) as the major criterion for evaluating streaming quality. A data segment is considered missing if it is not available at a node by its play-out time, and the SMR for the whole system is the average ratio of the missed segments at all the participating nodes during the simulation time. As such, it reflects two important aspects of the system performance, namely delay and capacity.
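As a small illustration of how the SMR just defined could be computed from per-node logs, here is a Python sketch; the data layout (one (missed, due) pair per participating node) is an assumption made purely for this example and is not taken from the paper.

```python
def segment_missing_rate(node_logs):
    """System-wide Segment Missing Rate: a segment counts as missing if it
    was not available at the node by its play-out time, and the SMR is the
    average of the per-node ratios of missed segments to due segments.

    node_logs -- iterable of (missed_segments, due_segments) pairs, one per node.
    """
    ratios = [missed / due for missed, due in node_logs if due > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0
```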
For comparison, we also simulate an existing on-demand overlay streaming system, oStream [11], with the same network and buffer settings. oStream employs a pure tree structure, in which each node caches played-out data and relays it to its children with asynchronous playback times. A centralized directory server is used to maintain the global information of the overlay and facilitates node join and failure recovery. Firstly, we investigate the performance of BBTP under dynamic network environments with bandwidth fluctuations. To emulate bandwidth fluctuations, we decrease the bandwidth from 100% to 64% of the base setting. As Fig. 4 shows, the SMR of BBTP, VMesh and oStream increases as the bandwidth decreases. However, the rate of increase for VMesh and BBTP is generally lower than that of oStream. The SMR of VMesh is lower than that of BBTP, but the difference is small. Secondly, we investigate the performance of BBTP under random seek with seek rate fluctuations. For oStream, random seek can be implemented by letting the node leave the system and then re-join with the new playback offset. We define the random seek rate as the ratio of the number of nodes at which a random seek occurs to the total number of nodes in the whole system. As Fig. 5 shows, when random seeks occur at 8% of the nodes in the system, the SMR of BBTP is less than 10%, the SMR of VMesh equals 10%, and the SMR of oStream has reached 30%. BBTP is therefore an efficient interactive streaming service architecture in a P2P environment.

Fig. 4: The impact of dynamic network conditions (Segment Missing Rate versus bandwidth reduction (%) for oStream, VMesh and BBTP)
Fig. 5: The impact of random seek rate (Segment Missing Rate versus percentage of random seek nodes for oStream, VMesh and BBTP)

5 Conclusion and Future Work
This paper proposed a distributed and Balanced Binary Tree-based Prefetching strategy (BBTP) to support random seek for P2P on-demand streaming. Simulation and comparison show that BBTP is an efficient interactive streaming service architecture in a P2P environment. It supports random seek quickly and plays back smoothly under dynamic network conditions and frequent seeking. Further research on BBTP includes a recovery algorithm for the departure or failure of nodes.

References
[1] Hefeeda M, Bhargava B (2003, May) On-demand media streaming over the Internet. In: Proc. IEEE FTDCS'03, San Juan, Puerto Rico.
[2] Guo Y, Suh K, Kurose J, Towsley D (2003, May) P2Cast: peer-to-peer patching scheme for VoD service. In: Proc. WWW'03, Budapest, Hungary.
[3] Do T, Hua KA, Tantaoui M (2004, June) P2VoD: providing fault tolerant video-on-demand streaming in peer-to-peer environment. In: Proc. IEEE ICC'04, Paris, France.
[4] Sheng-Feng Ho, Jia-Shung Wang, "Streaming Video Chaining on Unstructured Peer-to-Peer Networks", master thesis, 2003.
[5] Zhang X, Liu J, Li B, Yum T-SP (2005, March) CoolStreaming/DONet: a data-driven overlay network for live media streaming. In: Proc. IEEE INFOCOM'05, Miami, FL, USA.
[6] Changxi Zheng, Guobin Shen, Shipeng Li, "Distributed prefetching scheme for random seek support in peer-to-peer streaming applications", Proceedings of the ACM Workshop on Advances in Peer-to-Peer Multimedia Streaming (P2PMMS'05), November 2005.
[7] W.-P. Ken Yiu, Xing Jin, S.-H. Gary Chan, "Distributed Storage to Support User Interactivity in Peer-to-Peer Video Streaming", IEEE International Conference on Communications (ICC '06), June 2006, pp. 55-60.
[8] D. E. Knuth. The Art of Computer Programming, volume 3. Addison-Wesley Professional, 1998.
[9] Zegura E, Calvert K, Bhattacharjee S (1996, March) How to model an internetwork. In: Proc. IEEE INFOCOM, San Francisco, California, USA.
[10] T. T. Do, K. A. Hua, and M. A. Tantaoui, "P2VoD: Providing Fault Tolerant Video-on-Demand Streaming in Peer-to-Peer Environment," in Proceedings of the IEEE International Conference on Communications (ICC), Paris, France, June 2004.
[11] Cui Y, Li B, Nahrstedt K (2004, January) oStream: asynchronous streaming multicast. IEEE J Sel Areas Commun 22:91-106.
[12] Liu Wei, ChunTung Chou, Cheng Wenqing, "Caching for Interactive Streaming Media", Journal of Computer Research and Development, 43(4):594-600, 2006.
121 Parsing Student Text using Role and Reference Grammar Elizabeth Guest Innovation North, Leeds Metropolitan University, Headingly Campus, Leeds [email protected] Abstract Due to current trends in staff-student ratios, the assessment burden on staff will increase unless either students are assessed less, or alternative approaches are used. Much research and effort has been aimed at automated assessment but to date the most reliable method is to use variations of multiple choice questions. However, it is hard and time consuming to design sets of questions that foster deep learning. Although methods for assessing free text answers have been proposed, these are not very reliable because they either involve pattern matching or the analysis of frequencies in a “bag of words”. In this paper, we present work for the first step towards automatic marking of free text answers via meaning: parsing student work. Because not all students are good at writing grammatically correct English, it is vital that any parsing algorithm can handle ungrammatical text. We therefore present preliminary results of using a relatively new linguistic theory, Role and Reference Grammar, to parse student texts and show that ungrammatical sentences can be parsed in a way that will allow the meaning to be extracted and passed to the semantic framework. Keywords: Role and Reference Grammar, Parsing, Templates, Chart Parsing 1 Introduction In the current climate of increasing student numbers and decreased funding per student in many HEIs internationally, it is necessary to find economies of scale in teaching and supporting undergraduate students. Economies of scale are possible to a certain extent for lectures and tutorials, but this is less possible for assessment. The main solution to this dilemma is to mark student work automatically using variations on multiple choice questions. If designed correctly, these kinds of tests can provide students with immediate feedback on how well they are doing and can provide valuable formative pointers for further learning. This kind of feedback can impact positively on student learning and retention [1] [2] [3], but it can be difficult to design if we want to avoid encouraging inappropriate behaviour, such as random guessing of answers. Considerable work has been undertaken in recent years to investigate and implement methods for automatic marking of free text answers. These methods generally either involves pattern matching [4] [5] or latent semantic analysis [6] [7], or a combination of these [8]. These methods work to a certain extent, but because they are not based on the meaning of the text, they are quite easy to fool. For instance latent semantic analysis can be fooled by writing down the right kinds of words in any 122 order. The problem with current approaches to pattern matching on the other hand, is that if the student writes down a correct answer in a different way, it will be marked wrong. In this work we describe a method for using the Role and Reference paradigm (RRG) for parsing student texts, which do not have to be grammatically correct. RRG [9] [10] is a relatively new linguistic theory which is related to functional grammar. It separates the most vital parts of the sentence from the modifiers, which means that the core meaning can be extracted first and then the modifiers fitted in at a later stage. As long as the arguments and the verbs are in the correct order for English then the sentence can be understood. 
It doesn’t matter if (for example) Chinese students forget the articles, the sentence can still be parsed and the meaning extracted. 2 Parsing The main constituents of RRG parsing are the use of parsing templates and the notion of the CORE. A CORE consists of a predicate (generally a verb) and (normally) a number of arguments. It must have a predicate. Everything else is built around one or more COREs. Simple sentences contain a single CORE; complex sentences contain several COREs. The fact that RRG focuses on COREs, means that the semantics is relatively easy to extract from a parse tree. You just have to look for the PRED, and ARG branches of the CORE to obtain the predicate (PRED) and the arguments (ARG). Examples of RRG parse trees of real student sentences are given in figure 1. Notice that in these examples, the word “would” does not feature in the parse tree, but it is linked to the verbs “recommend” and “provide”. This is because it is an operator. Similarly the adjectives “representative” and “stratified” are attached to their nouns, “sample” and “sampling”. An important feature of RRG from a parsing point of view is that parsing happens in two projections: the constituent projection, shown in figure 1 and the operator projection, which consists of words which modify other words (such as auxiliaries and adjectives). This is important because modifiers are often optional and it simplifies the parsing process considerably if these can be handled separately. Note that adverbs, which can modify larger constituents (such as COREs and CLAUSEs) go in the constituent projection so that it is clear what they are modifying. PERIPHERY’s feature in both of these examples. In the second example, the PERIPHERY modifies the CLAUSE to tell the reader what is believed. This is another useful feature of RRG to enable meaning to be extracted easily. In the first example the PERIPHERY is attached to the CORE. In RRG theory, this should really be attached to the 2nd argument because that is what it is modifying. However, we need to analyse the meaning in order to find out what it should attach to. So in this implementation of RRG parsing, we have made a design decision to attach such structures to the CORE. RRG makes extensive use of templates. These templates consist of whole trees and are thus harder to use in a parsing algorithm than rules. The templates can easily be reduced to rules, but only at a loss of much important information. The first example in figure 1 consists of one large template that gives the overall structure and some simple templates (which are equivalent to rules) so that elements such as NP and PP can be expanded. An NP is a noun phrase and in this theory consists of a noun, pronoun, or question word. Templates are required to parse complex noun phrases, such as those with embedded clauses. A PP is a prepositional phrase and consists of a preposition followed by a NP. Clearly if we reduce the large template in the example in figure 1 to the rule CLAUSE → NP1 V2 NP ADV/PP 123 a lot of the information inherent in the structure of the template is lost. A further feature of RRG is that the branches of the templates do not have to have a fixed order and lines are allowed to cross. The latter is important for languages such as German and Dutch where the adverb that makes up the periphery normally occurs within the core. This feature will be important in our application for marking work by students for whom English is not their first language. 
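As an illustration of how a predicate and its arguments could be read off such a tree, here is a small Python sketch. It is not the authors' implementation: the PNode class and the helper names are invented for this example, and the label set simply follows the trees in Figure 1 (PRED sits under a NUC branch of the CORE, arguments under ARG branches).

```python
class PNode:
    """A parse-tree node: a label (CLAUSE, CORE, NUC, PRED, ARG, ...) with
    either children or a terminal word."""
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

    def words(self):
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.words()]

def _branches(node, prefix):
    """Descendants whose label starts with `prefix`, without crossing into
    a nested CORE, so embedded clauses keep their own predicate/arguments."""
    out = []
    for c in node.children:
        if c.label == "CORE":
            continue
        if c.label.startswith(prefix):
            out.append(c)
        else:
            out.extend(_branches(c, prefix))
    return out

def extract_cores(tree):
    """Return (predicate, [arguments]) for every CORE in the tree, read off
    its PRED and ARG branches as described above."""
    result = []
    if tree.label == "CORE":
        pred = " ".join(w for b in _branches(tree, "PRED") for w in b.words())
        args = [" ".join(b.words()) for b in _branches(tree, "ARG")]
        result.append((pred, args))
    for c in tree.children:
        result.extend(extract_cores(c))
    return result
```

For the second example in Figure 1, this would give the predicate "believe" with argument "I" for the outer CORE and the predicate "provide" with arguments "this" and "a more representative sample" for the embedded one.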
The above features pose challenges for parsing according to the RRG paradigm. We have overcome these challenges by making some additions to the standard chart parsing algorithm. The main innovations are • a modification to enable parsing with templates • a modification to allow variable word order. In addition, parsing also includes elements of dependency grammar to find operators and to determine which word they belong to. At present the most popular methods of parsing are HPSG [11-13] and dependency grammar [14-16]. HPSG is good for fixed word order languages and dependency grammar is good for free word order languages. The approach to parsing described below is novel in that is allows parsing with templates, and because of the range in flexibility of word order allowed. SENTENCE SENTENCE CLAUSE CORE ARG NP1 CORE-N PERIPHERY NUC2 ARG PRED2 CLAUSE PERIPHERY CORE CLAUSE ADV/PP NP PP V2 CORE-N P NP recommend NUC-N in CORE-N ARG NUC1 NP1 PRED1 ARG CORE-N NUC-N would PRO I N stratified sampling NUC-N NUC-N PRO N I clusters. V12 CORE NP1 believeCORE-N NUC-N would NUC2 ARG PRED2 NP V2 CORE-N provide NUC-N DEM this N a more representative sample. Figure 1: Example RRG parse trees. 2. 1 Outline of the parsing algorithm The parsing algorithm relies on correctly tagged text, for which we use Toolbox (available from www.sil.org/computing/toolbox). There are three parts to the parsing algorithm: 1. Strip the operators. This part removes all words that modify other words. It is based on a correct tagging of head and modifying words. This stage uses methods from dependency grammar and the end result is a simplified sentence. 2. Parse the simplified sentence using templates. This is done by collapsing the templates to rules, parsing using a chart parser and then rebuilding the trees at the end using a complex manipulation of pointers. The chart parser has been modified to handle varying degrees of word order flexibility. 3. Draw the resulting parse tree. Details of the extensions to the chart parser are given below. 2.2 Parsing Templates 124 Templates are parsed by collapsing all the templates to rules and then re-building the correct parse tree once parsing is complete. This is done by including the template tree in the rule, as well as the left and right hand sides. When rules are combined during parsing, we make sure that the right hand side elements of the instantiated rule, as represented in the partial parse tree, point to the leaves of the appropriate rule template tree. This is especially important when the order of the leaves of the template may have been changed. The reference number for the rule that has been applied is also recorded so that it can be found quickly. Modifying nodes, such as PERIPHERY, cause problems with rebuilding the tree. This is because such nodes can occur anywhere within the template, including at the root and leaf levels. Also, if we are dealing with a sub-rule whose root node in the parse tree has a modifying node, it is not possible to tell whether this is a hang-over from the previous template, or part of the new template. To solve this problem, modifying nodes have flags to say whether they have been considered or not. There is a potential additional problem with repeated nested rules because if processing is done in the wrong order, the pointers to the rule template tree get messed up. To overcome this problem, each leaf of a template is dealt with before considering sub-rules. 
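A minimal Python sketch of the collapsing step described above: the template tree is reduced to a flat rule for the chart parser while the tree itself, its reference number and its leaves are kept attached, so that the parse tree can be rebuilt and instantiated constituents can point back at the template leaves. The class and field names are illustrative, not those of the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TNode:
    """One node of an RRG template tree."""
    label: str
    children: list = field(default_factory=list)

@dataclass
class Rule:
    rule_id: int      # recorded so the applied template can be found quickly
    lhs: str          # root label of the template, e.g. CLAUSE
    rhs: list         # leaf labels in left-to-right order
    template: TNode   # the full template tree, kept for rebuilding the parse
    leaves: list      # the leaf nodes, so instantiated rules can point at them

def collapse(template, rule_id):
    """Collapse a template tree to a chart-parser rule, keeping the tree."""
    leaves = []
    def walk(node):
        if not node.children:
            leaves.append(node)
        for c in node.children:
            walk(c)
    walk(template)
    return Rule(rule_id, template.label, [l.label for l in leaves], template, leaves)
```

Collapsing the large template of figure 1 in this way would give the rule CLAUSE → NP1 V2 NP ADV/PP while still carrying the full tree needed to restore its internal structure.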
2.3 Parsing with fixed, free, and constrained word order There were two main problems to solve in order to modify the chart parser to handle varying degrees of word order flexibility: 1. Working out a notation for denoting how the word order can be modified. 2. Working out a method of parsing using this notation. (1) was achieved by the following notation on the ordering of the leaves of the template, treating the template as a rule: • Fixed word order: leave as it is. • Free word order: insert commas between each element {N,V,N} (Note that case information is included as an operator so that the undergoer and actor can be identified once parsing is complete.) • An element has to appear in a fixed position: use angular brackets: {N, <V>, ADV} this means that N and ADV can occur before or after v, but that V MUST occur in 2nd position. Note that this is 2nd position counting constituents, not words. • Other kinds of variation can be obtained via bracketing. So for example {(N, V) CONJ (N, V)} means that the N’s and V’s can change order, but that the CONJ must come between each group. If we had {(N,V),CONJ,(N,V)} Then the N’s and V’s must occur next to each other, but each group doesn’t not have to be separated by the CONJ, which can occur at the start, in the middle, or at the end, but which cannot break up an {N,V} group. 2.4 Modifications to the parsing algorithm. Parsing was achieved via a structure that encoded all the possible orderings of a rule. So for example the rule CORE→N, V, N would become 125 This means that N or V can occur in any position and N has to occur twice. The lines between the boxes enable the “rule” to be updated as elements are found. Using this schema, SENTENCE→(N,V) CONJ (N,V) would become In this case, the CONJ in the middle is by itself because it has to occur in this position as the grouping word order is fixed. The groupings of N’s and V’s show where the free word ordering can occur. To apply a rule, the first column of the left hand side of the rule is searched for the token. When the token is found, any tokens that do not match are deleted along with the path that leads from them. In the first example, after an N is found, we would be left with And in the second example, after an N is found we would be left with Note that in order for the rule to be satisfied, we must find a V and then a CONJ: there are no options for position 2 once the element for position 1 has been established. In this way, we can keep track of which elements of a rule have been found and which are still to be found. Changes in ordering with respect to the template are catered for by making sure that all instantiated rules point back to the appropriate leaves of the rule template, as described above. The different possibilities for each rule are obtained via a breadth first search method that treats tokens in brackets as blocks. Then the problem becomes one of working out the number of ways that blocks of different sizes will fit into the number of slots in the rule. 3 Results Preliminary results of applying these algorithms to student texts are very promising, but some issues have been highlighted. The method parses relatively simple sentences correctly and the main arguments and verbs are found. In addition, some very long and complicated sentences are parsed correctly and many kinds of grammatical errors do not cause any problems. 
An example of a correctly parsed sentence is “I would target main areas populated by students and would attend the same place at different times and during the day.” The parse tree for this example 126 is given in figure 2. Note that the complex object “main areas populated by students” has been parsed correctly and that the tree attaches the qualifying phrase to “area” so that it is clear what is being qualified. An important source of ambiguity in English sentences is caused by prepositional phrases and this is a main cause of multiple parses of a sentence. In this example, the phrases “at different times” and “during the day” are placed together in the periphery of the CORE, although arguably they should have a different structure. This is a design decision to limit the number of parses. This kind of information needs semantic information to sort out what attaches to what. This cannot be obtained purely from the syntax. An example of an ungrammatical sentence that is correctly parsed is “Results from the observations would be less bias if the sample again was not limit the students in the labs between 9:30 and 10:30 on a Thursday morning.” for which the parse tree is given in figure 3. This sentence parses correctly because the affix that should be on “limit” is an operator and the correctness of the operators is not checked during the parsing process. The word “bias” is labelled as a noun and gets attached as the second argument to “would be”, although it should be “biased”, which would get it labelled as an adjective. Despite these errors, the meaning of the sentence is clear and the parse will enable the meaning to be deduced. The sentence “Therefore, asking only the students present on a Thursday morning will exclude all the students that either have no lessons or are not present” produces two parses: once correct and one incorrect. The incorrect parse breaks up “Thursday morning” to give two clauses: (1) “Asking only students present on a Thursday” and (2) “Morning will exclude all the students that either have no lessons or are not present” In the first clause, the subject is “asking only students”, the main verb is “present” and the object is “on a Thursday morning”. This does not make sense, but it is syntactically correct as far as the main constituents are concerned. Similarly, the second clause is also syntactically correct, although it does not make sense. There are two ways of eliminating this parse. The first is to do a semantic analysis; the second is to not allow two clauses juxtaposed next to each other without punctuation such as a comma. However, students tend to not be very good at getting their punctuation correct. The current implementation of the parsing algorithm ignores all punctuation other than full stops for this reason. In fact, there is a tradeoff between allowing the system to parse ungrammatical sentences and the number of parse trees produced. More flexibility in grammatical errors increases the number of parse trees. An issue that makes parsing problematic is that of adverbs. These tend to be allowed to occur within several places within the core and some, such as yesterday, modify groups of words rather than a single word. The best solution, given their relative freedom of placing and the fact that sorting out where best to put them is more a meaning than a syntactic issue, would be to remove them and work out where they belong once the main verb and arguments have been identified. 
Most of the above issues have to be left to an analysis of meaning to sort out the correct parse. There is no clear division between syntax and semantics. However, there is another issue that has been highlighted to do with grammar and punctuation. How tolerant of errors should the system be? We have shown that errors in the operators do not cause problems for the parser, and errors in the placing of adverbs are relatively easy to deal with, but errors in the main constituents are not handled. For example, the phrase "the main people you need to ask will not be in the labs so early unless that have got work to hand in" occurs in one of the texts. The current algorithm will not handle these kinds of mistakes. But should the system be able to handle these kinds of mistakes, or should students be encouraged to improve their writing skills?

Figure 2: An example of a correctly parsed sentence.
Figure 3: An example of a correctly parsed ungrammatical sentence.

4 Conclusion
We argue that this approach, though still under development, potentially has huge benefits for students and staff in higher education and could, with further improvements, form one building block in constructing a new paradigm for CAA. Our intention is to use this as the first stage in a system that uses a new semantic framework, ULM (Universal Lexical Metalanguage) [17], to compare the meaning of student texts with a (single) model answer. ULM would enable us to convert text to a meaning representation. The aim is to build up a meaning representation from several sentences and then compare the meaning of the student text with the model answer – even when the words used are not the same.

References
[1] Rust, C. (2002). The Impact of Assessment on Student Learning. Active Learning in Higher Education 3(2): p. 145-158.
[2] Sambell, K. and A. Hubbard, (2004). The Role of Formative 'Low Stakes' Assessment in Supporting Non-Traditional Students' Retention and Progression in Higher Education: Student Perspectives. Widening Participation and Lifelong Learning. 6(2): p. 25-36.
[3] Yorke, M. (2001). Formative Assessment and its Relevance to Retention. Higher Education Research and Development. 20(2): p. 115-126.
[4] Sukkarieh, J.Z., S.G. Pulman, and N. Raikes. (2003). Auto-marking: using computational linguistics to score short, free text responses. in International Association of Educational Assessment. Manchester, UK.
[5] Sukkarieh, J.Z., S.G. Pulman, and N. Raikes. (2004). Auto-Marking 2: An Update on the UCLES-Oxford University research into using Computational Linguistics to Score Short, Free Text Responses. in International Association of Educational Assessment. Philadelphia.
[6] Wiemer-Hastings, P. (2001).
Rules for Syntax, Vectors for Semantics. Proceedings of 22nd Annual Conference of the Cognitive Science Society. Landauer, T.K., et al. (1997). How well can Passage Meaning be Derived without using Word Order? A Comparison of Latent Semantic Analysis and Humans. Proceedings of 19th Annual Conference of the Cognitive Science Society p. 412-417. Pérez, D. and E. Alfonsa. (2005). Adapting the Automatic Assessment of Free-Text Answers to the Students. in 9th Computer Assisted Assessment Conference. Loughborough, UK. Van Valin, R.D.J. and R. LaPolla, (1997). Syntax: Structure, Meaning and Function. Cambridge: Cambridge University Press. Van Valin, R.D.J. (2005). Exploring the Syntax-Semantics Interface. Cambridge University Press. Hou, L. and N. Cercone, (2001). Extracting Meaningful Semantic Information with EMATISE: an HPSG-Based Internet Search Engine Parser. IEEE International Conference on Systems, Man, and Cybernetics. 5: p. 2858-2866. Kešelj, V.(2001). Modular HPSG. IEEE International Conference on Systems, Man, and Cybernetics. 5: p. 2867-2872. Wahlster, W. (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Springer. Covington, M.A. (2003). A Free Word Order Dependency Parser in Prolog. Chung, H. and H.-C. Rim, (2004). Unlexicalized Dependency Parser for Variable Word Order Languages based on Local Contextual Pattern. Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing (5th International Conference CICLING). 2945: p. 112-123. Holan, T. (2002). Dependency Analyser Configurable by Measures. Text, Speech and Dialogue 5th International Conference TSD, 2002: p. 81-88. Guest, E. and R. Mairal Usón, (2005). Lexical Representation Based on a Universal Metalanguage. RAEL, Revista Española de Lingüística Aplicada. 4: p. 125-173. 129 Parallel Distributed Neural Network Message Passing System in Java Stephen Sheridan 1 1 Institute of Technology Blanchardstown, Blanchardstown Rd. North, Dublin 15 [email protected] Abstract Many attempts have been made to parallelize artificial neural networks (ANNs) using a wide variety of parallel hardware and software methods. In this work we employ a parallelized implementation of the Backpropagation (BP) learning algorithm to optimize neural network weight values. A cluster of heterogeneous worksations is used as a virtual parallel machine, which allows neural network nodes to be distributed across several processing elements (PEs). Experimental results indicate that only small speed-ups can be achieved when dealing with relatively small network topologies and that communication costs are a significant factor in the parallelization of the BP algorithm. Keywords: Backpropagation, Distributed, Parallel, Workstation Cluster 1 Introduction Many attempts have been made to take advantage of the inherent parallel characteristics of Artificial Neural Networks (ANNs) in order to speed up network training [1, 2, 3, 4]. Most attempts can be categorised into algorithmic or heuristic approaches. Algorithmic approaches to parallelization focus on splitting the training algorithm into blocks of code that can execute in parallel on an appropriate parallel architecture. Heuristic approaches tend to focus on how the ANN behaves and on its architecture. Heuristic attempts at parallelization tend to take a trial and error approach based on knowledge of the ANN and of the target platform. In contrast, algorithmic attempts tend to take a more theoretic approach to the parallelization process. 
The focus of this paper will be to describe a heuristic parallel mapping for the Backpropagation Neural Network (BP). The mapping described uses a cluster of workstations as the target platform and implements a message passing system using Java and the User Datagram Protocol (UDP). This means that network nodes on the same layer can compute in parallel. In effect, this mapping allows the BP network to be split into vertical slices. Each slice of the network can reside on its own workstation (processing element), thus allowing network nodes to compute in parallel. 2 Mapping BP onto a message passing architecture Research into the BP training algorithm has revealed three possible parallel mappings commonly referred to as training set parallelism, pipeline and node parallelism. Training set parallelism is where the networks training data is distributed among a number of processing elements as described in the work carried out by King and Saratchandaran[5]. In pipelining the training data can be staggered between each layer of the network as discussed by Mathia and Clark [6]. Node parallelism allows the network to be distributed across a number of processing elements in vertical slices. For example, a fully connected network with the topology 4, 7, 3 (4 input, 7 hidden, 3 output) might be split into three vertical slices as shown in figure 1. 130 Figure 1: Possible vertical distribution of a 4, 7, 3 network Of the three approaches outlined, node parallelism represents a fine-grained approach whereas pipelining and training set parallelism represent a more coarse-grained solution. The PDNN architecture described in section 3 was built to carry out node parallelism, although only small modifications would be necessary to implement pipelining. 3 Parallel Distributed Neural Network (PDNN) Architecture The main goal of the PDNN architecture is to allow the processing that occurs during the training of a backprop network to be distributed over a number of processing elements. As this architectures target platform is a cluster of workstations, the processing elements are a group of networked heterogeneous workstations. The architecture was developed in Java so that the only software requirement on each processing element is a Java virtual machine and the PDNN software. Figure 2 shows an overview of the PDNN architecture and its components. Figure 2: PDNN: architecture overview 131 3.1 Overview of Architecture The PDNN architecture is comprised of a number of processing elements, a HTTP web server and a network monitor application. Each processing element executes a thread that listens on a specified port for incoming messages. When a processing element receives a message it passes it on to its node controller. The node controllers main responsibility is to act as a container for the network nodes on each processing element and to carry out computations on each node as specified by the message received. The network topology and training data are globally available from a HTTP server on the physical network. The network topology and training data are stored as text files on the HTTP server. Each processing element reads the network topology and training data from the HTTP server when it starts up. Therefore, the neural network topology and the training problem can be easily changed from a central location. An example topology file is shown in table 1 . 
Entry   Description
4       Number of training patterns
0.45    Learning rate
0.7     Momentum term
0.1     Error tolerance
3       Number of network layers
2       Size of input layer
2       Size of hidden layer
1       Size of output layer
Table 1: Topology file structure

In order to use the PDNN architecture a network monitor program must be run on one of the processing elements. The network monitor allows the user to specify how the neural network is to be distributed over the set of available processing elements. The distribution of network nodes depends on the problem to be solved and the proposed neural network architecture, so at present this must be carried out manually. However, the network monitor has been developed in such a way as to make it easy to replace this manual process with an appropriate load-balancing scheme or a genetic algorithm so that optimal configurations can be achieved [7]. The network monitor is also responsible for making sure that all the processing elements are synchronised. Synchronisation is very important because all processing elements must be in step. In other words, each processing element must conduct a forward and backward pass with the same input pattern data. When a forward and backward pass has been completed with the current input pattern data, the network monitor informs all processing elements to move on to the next input pattern.

Message ID   Description
1            Signals PE to create nodes
2            Signals PE to forward pass
3            Signals PE to backward pass
4-5          Signals PE to start training
6            Signals PE to remove all nodes
7            Signals PE to print out its nodes
11           Signals PE to return all weights to monitoring app
19           Signals PE that training has finished
Table 2: Backprop message overview

3.2 BackProp message protocol
In contrast to traditional software implementations of the backpropagation training algorithm, which are encoded in a serial manner using loops, the PDNN architecture encodes the training algorithm in a set of messages that are broadcast across the physical network to a group of PE's running the PDNN software. A backpropagation message protocol was implemented so that each PE could interpret the messages it receives and process them accordingly. This protocol identifies a number of important features of the backpropagation algorithm, such as the forward and backward pass, as well as defining special messages that are used to synchronise training activity. Table 2 shows an overview of the backpropagation message protocol. During training, each message received by a PE contains the data components for either a forward or a backward pass of the backprop algorithm. For example, the net input for any given output node will require N messages to be broadcast, where N is the number of PE's used. The net input for the layer section PE(1,2) in figure 1 is given by

   net_PE(1,2) = Σ_{i=0}^{N−1} MSG_{i,j} · w_{i,j}        (1)

where N is the number of PE's and 0 ≤ j < ||PE(1,2)||. Each message contains data components equivalent to the individual net inputs of the nodes from which the message originated. Therefore, each MSG_{i,j} is equivalent to

   MSG_{i,j} = f( Σ_k n_{i,k} · w_{i,k} )                 (2)

where 0 ≤ i < NUMLAYERS, 0 ≤ k < size of layer i, and f(x) = 1.0 / (1.0 + e^{−x}).
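To illustrate equations (1) and (2), here is a small hedged Python sketch of how a PE might build the data components of a broadcast message and how a receiving PE might fold the components gathered from the N incoming messages into the net input of one of its nodes. The function names and data layout are invented for this illustration and are not part of the PDNN code.

```python
import math

def sigmoid(x):
    # f(x) = 1.0 / (1.0 + e^-x), the activation function in equation (2)
    return 1.0 / (1.0 + math.exp(-x))

def message_components(inputs, weights):
    """Equation (2): the components carried in one broadcast message are the
    activations of this PE's local nodes; weights[j] holds the weights that
    feed local node j from the `inputs` of the previous layer."""
    return [sigmoid(sum(n * w for n, w in zip(inputs, w_j))) for w_j in weights]

def net_input(components, weights_to_node):
    """Equation (1): the net input of one node in a layer section is the
    weighted sum of the message components MSG_{i,j} gathered from all N
    received messages; weights_to_node holds the matching weights w_{i,j}."""
    return sum(m * w for m, w in zip(components, weights_to_node))
```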
4 Testing the PDNN architecture
Since the PDNN's main goal is to adjust the network weights in parallel, it does not have a built-in mechanism for verifying that the weights produced are valid. In order to ensure that the weights produced will actually work, the PDNN architecture signals all PE's to return their weights to the NetMonitor application at the end of the training phase. The NetMonitor application then stores all weights in a file that can be used with a serial version of the BP algorithm for verification. Since the overhead in using neural networks is the training phase and not the recall phase, it makes sense to use the weights generated by the PDNN architecture in a serial version of the BP algorithm. Three well known neural network data sets, XOR [8, 9], 2D Spiral Recognition [10] and Iris [11], were used to test the weight adjustment scheme in the PDNN architecture. Weights generated by the PDNN architecture were verified by running a number of test training sessions for each of the data sets listed above and then using the weights generated with a serial version of the BP algorithm. In all three cases the weights returned by the PDNN architecture performed well when used in recall mode with a serial version of the BP algorithm.

5 Experimental test with the XOR problem
In this section we present some experimental data that was generated by running the PDNN architecture on the XOR problem with a 2,N,1 topology, where N varied between 10 and 100. Networks with varying middle layer sizes were used to determine speed-up times against a serial version of the BP algorithm with the same network topology running on a single processing element. Each network topology was distributed across 1, 2, 4, 6, 8 and 10 workstations in order to find the optimal distribution, if any. Each network was run a total of ten times on each workstation configuration in order to calculate the average training time for that setup. The physical environment for each experiment was set up using 10 Fujitsu Siemens 1GHz Intel-based Windows NT workstations with 512MB RAM and 100 MBit Ethernet Network Interface Cards. The underlying network used was a 100 MBit switched Ethernet network.

5.1 Results
As can be seen from the graph in figure 3, the training time for a network with 10 middle layer nodes increased almost linearly as it was distributed over more processors. This is not really surprising given the communication overheads associated with the BP algorithm. The graph shows a larger jump in training times when moving from 1 PE to 2 PE's. This is to be expected as there are no latency issues when running all the network nodes on a single processor. The situation for 20 middle layer nodes is not much better. Training times increase dramatically as more and more PE's are used. There seems to be an anomaly around 6 PE's, where the training time peaks and then drops back down when 8 PE's are used. This may be due to how the underlying physical network deals with the broadcast messages from each PE. One other effect of adding more PE's is that the size of the messages being broadcast actually decreases, as each PE has fewer and fewer nodes. Large numbers of small messages are bad news for parallelisation, as there is a network latency associated with each broadcast message. There is a slight speed-up in training times when 50 middle layer nodes are distributed across 2 PE's. However, the speed-up does not continue as more PE's are added. Once again, the anomalous situation between 6 and 8 PE's can be seen. The upper end of the graph for 20 middle layer nodes and this graph are very similar. This would suggest that there is a point where the communication costs associated with the BP algorithm peak. This may represent a mix of conditions that lead to the worst-case scenario for the underlying network.
Such a peak may represent a mix of conditions that lead to the worst case scenario for the underlying network. The final set of experimental data, using 100 middle layer nodes, is slightly more promising. Two speed-ups are achieved over the training times for 1 PE. Training times drop from around 156 seconds on 1 PE to 140 seconds on 2 PE's and then down to 137 seconds on 4 PE's. It is not surprising that two decreases in the training times are observed for a BP network with 100 middle layer nodes. With the increased number of nodes, each PE must carry out more work and hence there is a better balance between the time spent processing and the time spent communicating.

Figure 3: Experimental data for XOR problem

134

6 Analysis of Experimental Data

This paper shows the reality of implementing the BP algorithm solving the XOR problem on a software based message passing system such as the purpose-built PDNN architecture. The experimental data presented in section 5 confirms that the standard BP algorithm cannot take advantage of the parallel processing power of a cluster of workstations. This is due to the fact that network traffic negates any benefit that is to be gained by distributing the training phase over a number of workstations. Although a workstation cluster may reduce the completion time of a system, the benefits depend on how the message passing interfaces are designed. It is clear that without reducing the communications overhead of the BP algorithm it is difficult to achieve any significant speed-up in training times. While execution of neural networks on serial machines is linear, it would seem that when the BP algorithm is run in a message passing environment its completion time is non-linear. While the experimental data generated is interesting from the point of view that it is the first set of data generated by the PDNN architecture, it could not be used as a definitive argument against BP on a message passing architecture. This is because the XOR problem is not an ideal candidate for experimentation in a distributed parallel environment. A problem that requires a larger input and output layer would be better suited to experimentation in these conditions. It is most likely that any speed-up to be gained by the PDNN system will only be observed for large networks that can drain the processing resources of a conventional PC. It is obvious from the experimental data produced in section 5 that the communication cost versus the time spent processing for the BP algorithm is far too high. This communication cost must be reduced in order to observe any benefits.

7 Conclusion and Future Work

We have shown that it is possible to implement a purpose-built message passing architecture in Java that allows the BP algorithm to distribute its training workload over a number of networked workstations. We have also confirmed that the weights returned by the PDNN architecture are valid by using them in the recall phase of a serial version of the BP algorithm solving a number of well known problems. While the experimental data produced for this paper only serves as an initial test of the PDNN architecture, it raises some very important questions, such as how to reduce the communication overhead of BP and what types of neural network problems are suitable for experimentation.
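For readers who want to reason about this trade-off numerically, the sketch below gives a deliberately crude back-of-the-envelope model of the communication-to-computation ratio per forward pass. The cost model, constants and function names are assumptions introduced here for illustration only; they are not taken from the paper or its measurements.

```python
def comms_to_compute_ratio(n_pe, nodes_per_layer, t_broadcast=0.5e-3, t_mac=1e-7):
    # Assumed model: each forward pass requires one broadcast per PE (latency
    # t_broadcast), while local computation scales with the multiply-accumulate
    # operations for the fraction of nodes hosted on one PE.
    hosted = nodes_per_layer / n_pe
    t_comm = n_pe * t_broadcast                 # latency-dominated broadcasts
    t_comp = hosted * nodes_per_layer * t_mac   # local weighted sums
    return t_comm / t_comp

# The ratio grows quickly as PEs are added and per-PE work shrinks.
for n in (1, 2, 4, 8):
    print(n, round(comms_to_compute_ratio(n, nodes_per_layer=100), 1))
```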
These questions will form a major part of the next phase of this project, which is to refine the PDNN architecture with a view to running further experiments in order to achieve a significant speed-up over serial implementations of the BP algorithm. Future work will include developing a modified version of the BP algorithm to cut communication costs. Research of, and selection of other neural network training algorithms that may be better suited to distribution across a number of workstations, such as, Differential Evolution, Spiking Neural Nets and Liquid-state Machines. Further work will also need to take into consideration the exact measurement of communication versus processing costs and will have to include metrics for network latency such as the standard PingPong and Jacobi tests carried out by Wang and Blum [1]. References [1] X. Wang and E. K. Blum, “Parallel execution of iterative computations on workstation clusters,” Journal of Parallel and Distributed Computing, vol. 34, no. 0058, pp. 218–226, 1996. [2] D. Anguita, S. Rovetta, M. Scapolla, and R. Zunino, “Neural network simulation with pvm,” 1994. [3] A. Weitzenfeld, O. Peguero, and S. Gutiérrez, “NSL/ASL: Distributed simulation of modular neural networks,” in MICAI, pp. 326–337, 2000. 135 [4] J. Lut, D. Goldman, M. Yang, and N. Bourbakis, “High-performance neural network training on a computational cluster,” in Seventh International Conference on High Performance Computing and Grid Computing (HPC Asia’04), 2004. [5] F. King and P. Saratchandran, “Analysis of training set parallelism for backpropagation neural networks,” Int J Neural Syst, vol. 6, no. 1, pp. 61–78, 1995. [6] K. Mathia and J. Clark, “On neural hardware and programming paradigms,” in International Joint Conference on Neural Networks, pp. 12–17, 2002. [7] S. W. Stepniewski and A. J. Keane, “Topology design of feedforward neural networks by genetic algorithms,” in Parallel Problem Solving from Nature – PPSN IV (H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, eds.), (Berlin), pp. 771–780, Springer, 1996. [8] R. Bland, “Learning XOR: exploring the space of a classic problem,” Computing Science Technical Report CSM-148, University of Stirling, Dept of Computing Science and Mathematics, Department of Computing Science and Mathematics University of Stirling Stirling FK9 4LA Scotland, June 1998. [9] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 1986. [10] S. Singh, “2D spiral recognition with possibilistic measures,” Pattern Recognition Letters, vol. vol. 19, no. no. 2, pp. 141–147, 1998. [11] C. B. D.J. Newman, S. Hettich and C. Merz, “UCI repository of machine learning databases,” 1998. 136 Session 5a Wired & Wireless 137 138 The Effects of Contention among stations on Video Streaming Applications over Wireless Local Area Networks- an experimental approach Nicola Cranley, Tanmoy Debnath, Mark Davis Communications Network Research Institute, School of Electronic and Communications Engineering, Dublin Institute of Technology, Dublin 8, Ireland [email protected], [email protected], [email protected] Abstract Multimedia streaming applications have a large impact on the resource requirements of the WLAN. There are many variables involved in video streaming, such as the video content being streamed, how the video is encoded and how it is sent. This makes the role of radio resource management and the provision of QoS guarantees extremely difficult. 
For video streaming applications, packet loss and packets dropped due to excessive delay are the primary factors that affect the received video quality. In this paper, we experimentally analyse the effects of contention on the performance of video streaming applications with a given delay constraint over IEEE 802.11 WLANs. We show that as contention levels increase, the frame transmission delay increases significantly despite the total offered load in the network remaining constant. We provide an analysis that demonstrates the combined effects of contention and the playout delay constraint have on the video frame transmission delay. Keywords: Video Streaming, Multimedia, WLAN, Quality of Service 1. INTRODUCTION Streaming multimedia over wireless networks is becoming an increasingly important service [1] [2]. This trend includes the deployment of WLANs that enable users to access various services including those that distribute rich media content anywhere, anytime, and from any device. There are many performance-related issues associated with the delivery of time-sensitive multimedia content using current IEEE 802.11 WLAN standards. Among the most significant are low delivery rates, high error rates, contention between stations for access to the medium, back-off mechanisms, collisions, signal attenuation with distance, signal interference, etc. Multimedia applications, in particular, impose onerous resource requirements on bandwidth constrained WLAN networks. Moreover, it is difficult to provide QoS in WLAN networks as the capacity of the network also varies with the offered load. Packet loss and packets dropped due to excessive delay are the primary factors that have a negative effect on the received video quality. Real-time multimedia is particularly sensitive to delay, as multimedia packets require a strict bounded end-to-end delay. Every multimedia packet must arrive at the client before its playout time, with enough time to decode and display the contents of the packet. If the multimedia packet does not arrive on time, the playout process will pause and the packet is effectively lost. In a WLAN network, in addition to the propagation delay over the air interface, there are additional sources of delay such as queuing delays in the Access Point (AP), i.e. the time required by the AP to gain access to the medium and to successfully transmit the packet which may require a number of retransmission attempts. 139 Multimedia applications typically impose an upper limit on the tolerable packet loss. Specifically, the packet loss ratio is required to be kept below a threshold to achieve acceptable visual quality. For example, a large packet loss ratio can result from network congestion causing severe degradation of multimedia quality. Even though WLAN networks allow for packet retransmissions in the event of an unsuccessful transmission attempt, the retransmitted packet must arrive before its playout time or within a specified delay constraint. If the packet arrives too late for its playout time, the packet is effectively lost. Congestion at the AP often results in queue overflow, which results in packets being dropped from the queue. In this way, packet loss and delay can exhibit temporal dependency or burstiness [3]. Although, error resilient encoded video and systems that include error concealment techniques allow a certain degree of loss tolerance [4], the ability of these schemes to conceal bursty and high loss rates is limited. 
In IEEE 802.11b WLANs, the AP is usually the critical component that determines the performance of the network as it carries all of the downlink transmissions to wireless clients and is usually where congestion is most likely to occur. There are two primary sources of congestion in WLAN networks. The first is where the AP becomes saturated due to a heavy downlink load which results in packets being dropped from its transmission buffer and manifests itself as bursty losses and increased delays [5]. In contrast, the second case is where there are a large number of wireless stations contending for access to the medium and this results in an increased number of deferrals, retransmissions and collisions on the WLAN medium. The impact of this manifests itself as significantly increased packet delays and loss. For video streaming applications, this increased delay results in a greater number of packets arriving at the player too late for playout and being effectively lost. In this paper, we experimentally investigate this second case concerning the effects of station contention on the performance of video streaming applications. The remainder of this paper is structured as follows. Section 2 provides an analysis of the video clips used during the experiments. Section 2.1 and 2.2 describe the experimental test bed and experimental results respectively. We focus on a single video content type and show in detail how the delay and loss rates are affected by increased station contention. We show the effects of contention on the performance of the video streaming application for a number of different video content types. We provide an analysis that shows how the play out delay constraint and the number of contending stations affect the video frame transmission delay. Finally we present some conclusions and directions for future work in section 3. 2. VIDEO CONTENT PREPARATION AND ANALYSIS In the experiments reported here, the video content was encoded using the commercially available X4Live MPEG-4 encoder from Dicas. This video content is approximately 10 minutes in duration and was encoded as MPEG-4 SP with a frame rate of 25 fps, a refresh rate of one I-frame every 10 frames, CIF resolution and a target CBR bit-rate of 1Mbps using 2-pass encoding. Although a target bit rate is specified, it is not always possible for an encoder to achieve this rate. Five different video content clips were used during the experiments. DH is an extract from the film ‘Die Hard’, DS is an extract from the film ‘Don’t Say a Word’, EL is an extract from the animation film ‘The Road to Eldorado’, FM is an extract from the film ‘Family Man’, and finally JR is an extract from the film ‘Jurassic Park’. The video clips were prepared for streaming by creating an associated hint track using MP4Creator from MPEG4IP. The hint track tells the server how to optimally packetise a specific amount of media data. The hint track MTU setting means that the packet size will not exceed in the MTU size. It is necessary to repeat the experiments for a number of different video content types since the characteristics of the streamed video have a direct impact on its performance in the network. Each video clip has its own unique signature of scene changes and transitions which affect the time varying bitrate of the video stream. Animated videos are particularly challenging for encoders since they generally consist of line art and as such have greater spatial detail. 
140

TABLE 1: CHARACTERISTICS OF ENCODED VIDEO CLIPS

Clip  Mean Packet  Mean Bit Rate  Frame Size (B)    I-Frame Size (B)  P-Frame Size (B)  Peak-to-Mean
      Size (B)     (kbps)         Max.     Avg.     Max.     Avg.     Max.     Avg.     Ratio
DH    889          910            16762    4617     16762    7019     12783    812      3.63
DS    861          682            12734    3480     12734    6386     10600    713      3.66
EL    909          1199           27517    6058     27517    14082    14632    1587     4.54
FM    894          965            17449    4903     17449    10633    15078    1188     3.56
JR    903          1081           17299    5481     17299    8991     13279    1006     3.16

Table 1 summarizes the characteristics of the encoded video clips used during the experiments. The second column shows the mean packet size of the clip as it is streamed over the network and the third column shows the mean bit-rate of the video clip. The following columns show the maximum video frame size and the mean video frame size in bytes as measured over all frames, over I-frames only and over P-frames only. Finally, the last column shows the peak-to-mean ratio of the video frames. It can be seen that despite encoding the video clips with the same video encoding parameters, the video clips have very different characteristics. Although all the video clips were prepared with exactly the same encoding configuration, due to the content of the video clips the mean and maximum I and P frame sizes vary considerably.

2.1 EXPERIMENTAL TEST BED

Fig. 1: Experimental Setup

To demonstrate the effects of station contention on video streaming applications, the video server was set up on the wired network and streamed the video content to a wireless client via the AP (Figure 1). The video streaming system consists of the Darwin Streaming Server (DSS) [6] acting as the video server and VideoLAN Client (VLC) [7] as the video client. DSS is an open-source, standards-based streaming server that is compliant with MPEG-4 standard profiles, ISMA streaming standards and all IETF protocols. The DSS streaming server system is a client-server architecture where both client and server consist of the RTP/UDP/IP stack with RTCP/UDP/IP to relay feedback messages between the client and server. The video client VLC allowed the received video stream to be recorded to a file for subsequent video quality analysis. Both the video client and server were configured with the packet monitoring tool WinDump [8] and the clocks of both the client and server were synchronised before each test using NetTime [9]. However, in spite of the initial clock synchronisation, there was a noticeable clock skew observed in the delay measurements and this was subsequently removed using Paxson's algorithm as described in [10]. The delay is measured here as the difference between the time at which the packet was received at the link-layer of the client and the time it was transmitted at the link-layer of the sender. 141 There are a number of wireless background load stations contending for access to the WLAN medium, with their traffic load directed towards a sink station on the wired network. The background uplink traffic was generated using the Distributed Internet Traffic Generator (D-ITG) [11]. The background traffic load had an exponentially distributed inter-packet time and an exponentially distributed packet size with a mean packet size of 1024B. To maintain a constant total background load of 6 Mbps, the mean rate of each background station was appropriately decreased as the number of background stations was increased.

2.2 RESULTS

Video streaming is often described as "bursty" and this can be attributed to the frame-based nature of video. Video frames are transmitted with a particular frame rate.
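As a side note, the derived quantities in Table 1 can be recomputed from per-frame encoded sizes. The short sketch below, with hypothetical inputs, shows one way to obtain the mean bit rate and the peak-to-mean ratio reported in the last column.

```python
def clip_stats(frame_sizes_bytes, fps=25):
    # frame_sizes_bytes: per-frame encoded sizes for one clip (hypothetical input).
    mean_size = sum(frame_sizes_bytes) / len(frame_sizes_bytes)
    max_size = max(frame_sizes_bytes)
    return {
        # bytes per frame -> kbit/s at the clip's frame rate
        "mean_bit_rate_kbps": mean_size * 8 * fps / 1000.0,
        # largest frame over the mean frame size, as in the last column of Table 1
        "peak_to_mean": max_size / mean_size,
    }

print(clip_stats([4000, 4600, 5200, 16762, 4500, 4300, 4800, 4200, 4100, 4400]))
```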
For example, video with a frame rate of 25 fps will result in a frame being transmitted every 40ms. In general, video frames are large, often exceeding the MTU of the network, and this results in several packets being transmitted in a burst for each video frame. The frequency of these bursts corresponds to the frame rate of the video [12]. In a WLAN environment, the bursty behaviour of video traffic has been shown to result in a sawtooth-like delay characteristic [13]. Consider a burst of packets corresponding to a video frame arriving at the AP. The arrival rate of the burst of packets is high and typically these packets are queued consecutively in the AP's transmission buffer. For each packet in the queue, the AP must gain access to the medium by deferring to a busy medium and decrementing its MAC back-off counter between packet transmissions. This process occurs for each packet in the queue at the AP, causing the delay to vary with a sawtooth characteristic. It was found that the duration and height of the sawtooth delay characteristic depend on the number of packets in the burst and the packet size. This is to be expected since, when there are more packets in the burst, it takes the AP longer to transmit all packets relating to the video frame. To describe this sawtooth characteristic we have defined the Inter-Packet Delay (IPD) as the difference in the measured delay between consecutive packets within a burst for a video frame at the receiver. When there are no other stations contending for access to the medium, the IPD is in the range 0.9ms to 1.6ms for 1024B sized packets. This delay range includes the DIFS and SIFS intervals, the data transmission time including the MAC Acknowledgement, as well as the randomly chosen Backoff Counter values of the 802.11 MAC mechanism's contention window in the range 0-31. This can be seen in Figure 2, where there is an upper plateau with 32 spikes corresponding to each of the possible 32 Backoff Counter values, with a secondary lower plateau that corresponds to the proportion of packets that were required to be retransmitted through subsequent doubling of the contention window under the binary exponential backoff mechanism employed in the 802.11 MAC.

Fig. 2: PDF of the IPD with and without contention

142

Fig. 3: IPD and FTD Relationship

As contention levels increase, all stations must pause decrementing their Backoff Counter more often when another station is transmitting on the medium. As the level of contention increases, it takes longer to win a transmission opportunity and consequently the maximum achievable service rate is reduced, which increases the probability of buffer overflow. In these experiments, the nature of the arrivals into the buffer remains constant, i.e. only the video stream is filling the AP's transmission buffer with packets, but by varying the number of contending stations we can affect the service rate of the buffer and thereby its ability to manage the burstiness of the video stream. This can be seen in Figure 2, where there is a long tail in the distribution of IPD values for the 10 station case. In this case, 10 wireless background traffic stations are transmitting packets to the wired network via the AP's receiver. The aggregate load from these stations is held constant as the number of background stations is increased.
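To make the two delay metrics concrete, the following sketch shows one plausible way of computing the IPD within a packet burst and the Frame Transmission Delay (FTD, introduced formally in the next paragraph) from per-packet link-layer timestamps. The record format is hypothetical and this is not the measurement code used in the experiments.

```python
def ipd_and_ftd(frame_packets):
    # frame_packets: list of (send_time, recv_time) pairs, in seconds, for the
    # packets of ONE video frame, in transmission order (hypothetical records).
    delays = [recv - send for send, recv in frame_packets]
    # Inter-Packet Delay: difference in measured delay between consecutive
    # packets within the burst for this frame.
    ipds = [b - a for a, b in zip(delays, delays[1:])]
    # Frame Transmission Delay: from the first packet leaving the sender's link
    # layer until the last packet of the frame arrives at the client.
    ftd = frame_packets[-1][1] - frame_packets[0][0]
    return ipds, ftd

packets = [(0.000, 0.011), (0.001, 0.013), (0.002, 0.016)]
print(ipd_and_ftd(packets))
```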
For video streaming applications, not only is the end-to-end delay important, but also the delay incurred transmitting the entire video frame from the sender to the client. A video frame cannot be decoded or played out at the client until all of the constituent video packets for the frame are received correctly and on time. For this reason, in our analysis we also consider the video Frame Transmission Delay (FTD), i.e. the end-to-end delay incurred in transmitting the entire video frame, which is related to the number of packets required to transmit the entire video frame and the queuing delay in the AP buffer for the first video packet in the burst to reach the head of the queue. Figure 3 shows the relationship between the IPD and FTD for two consecutive video frames. In our analysis, we also consider the loss rate and the Playable Frame Rate (PFR). The PFR is inferred by using the statistical techniques described in [14]. The loss rate corresponds to packets that have failed to be successfully received as well as those packets that have been dropped as a result of exceeding the Delay Constraint (Dc). If packets arrive too late, exceeding Dc, these packets are effectively dropped by the player since they have arrived too late to be played out.

2.2.1 THE EFFECTS OF CONTENTION ON STREAMED VIDEO

In this section, we experimentally demonstrate the effects of contention on video streaming applications. We shall begin by focusing on a single video clip, DH, being streamed from the wired network via the AP to a wireless client. This particular clip was chosen since it is representative of a typical non-synthetic video stream. Table 2 presents the mean performance values for the video clip DH over the test period with increased contention. It can be seen that the mean delay, loss rate, FTD and IPD increase with increased contention. In this work we have set the Dc to 500ms, which is the delay constraint for low latency real-time interactive video.

TABLE 2: MEAN PERFORMANCE VALUES FOR DH CLIP WITH INCREASED CONTENTION (DC = 500MS)

Performance Metric            0STA   3STA   4STA   5STA   6STA   7STA    8STA    9STA    10STA
Mean Delay (ms)               10.43  29.62  30.97  37.91  63.63  105.75  174.91  311.71  395.27
FTD (ms)                      11.50  36.62  37.96  45.39  71.76  115.61  186.05  325.01  406.83
IPD (ms)                      1.24   3.73   3.75   3.97   4.34   4.82    5.27    5.66    5.95
Mean Loss Rate (Dc > 500ms)   0.00   0.01   0.01   0.03   0.08   0.15    0.23    0.34    0.41
PFR (fps) (Dc > 500ms)        25.00  25.00  23.00  21.83  19.04  16.91   14.02   10.51   9.92

Fig. 4: Mean values for a number of video clips for a fixed total offered uplink load with an increased number of contending stations: (a) Mean Delay, (b) Mean FTD, (c) Average loss rate with a Dc of 500ms, (d) Inferred PFR with a Dc of 500ms.

143

It can be seen that when there are no background contending stations, the mean packet delay is about 10ms. As the number of contending stations increases from 3 to 7 to 10, the mean delay increases to approximately 30ms, 100ms and 400ms respectively. This can be explained by the growing tail of the IPD distribution as shown in Figure 2. As the number of contending stations is increased from 3 to 7 to 10 stations with a Dc of 500ms, the mean loss rate, including packets dropped due to excessive delay, increases from 1% to 15% to 41% respectively.
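The effective loss rate quoted above combines genuinely lost packets with packets that miss the playout deadline. A minimal sketch of that bookkeeping, assuming a hypothetical per-packet delay record, is given below.

```python
def effective_loss_rate(packet_delays_ms, dc_ms=500.0):
    # packet_delays_ms: per-packet delay in ms, or None for a packet that was
    # never received (hypothetical measurement format).
    lost = sum(1 for d in packet_delays_ms if d is None or d > dc_ms)
    return lost / len(packet_delays_ms)

# Packets arriving after the 500 ms playout constraint count as lost,
# exactly like genuinely dropped packets.
print(effective_loss_rate([12.0, 480.0, 510.0, None, 95.0]))  # -> 0.4
```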
This increase in turn affects the ability of the codec to decode the video frames, since there is an increased likelihood that packets will not arrive within the given delay constraint. The experiment was repeated for the other video clips, all encoded with the same encoding configuration but having different content complexity characteristics. Figure 4 shows the mean performance metrics for the different content types with increased contention. For all content types, it can be seen that the mean packet delay and FTD increase with increased contention, as shown in Figures 4(a) and 4(b). Figure 4(c) shows the mean loss rate over the test period for each of the video clips. It can be seen that there is a dramatic increase in the mean loss rate when the number of contending stations exceeds 7 and a delay constraint of 500 ms is imposed on the system, which results in an even greater impact of the contention on performance. Figure 4(d) shows the PFR that is statistically inferred from the packet loss and delay. Apart from the impact of contention, Figures 4(a)-4(d) also highlight the impact of the video content: it can be seen that the animation clip EL is the most severely affected by increased contention whilst the clip DS is the least affected. The high complexity of the animation clip EL is due to frequent scene cuts and line art within the scene, which affect the burstiness of the encoded video sequence since much more information is required to encode the increased scene complexity.

2.2.2 ANALYSIS

In this section we shall generalize the results presented in the previous section to account for all content types and a given delay constraint. For video streaming applications, there is a tradeoff between acceptable delay and tolerable packet loss. A delay constraint imposes an upper limit on this tradeoff since the lower the delay constraint, the greater the probability of packets being dropped due to exceeding the delay constraint.

Fig. 5: Generalized distribution of the FTD with increased contention

Fig. 6: Fitted Weibull distribution to CDF of FTD with Dc

TABLE 3: CDF OF FTD BELOW THE PLAYOUT DELAY CONSTRAINT, DC

            Number of Contending Stations
Dc (ms)   3STA    4STA    5STA    6STA    7STA    8STA    9STA    10STA
500       1.000   1.000   0.994   0.984   0.957   0.877   0.740   0.653
1000      1.000   1.000   0.996   0.998   0.986   0.942   0.832   0.752
1500      1.000   1.000   1.000   1.000   0.994   0.971   0.903   0.836
2000      1.000   1.000   1.000   1.000   1.000   0.995   0.980   0.945
2500      1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000

144

In our analysis we focus on the FTD since all or most of the packets belonging to a video frame packet burst must be received in order for the video frame to be decoded on the client device. Figure 5 shows the Complementary Cumulative Distribution Function (CCDF) of the FTD averaged over all content types with an increasing number of contending stations. For example, considering a video streaming application with a Dc of 500ms, it can be seen that with 4 contending background stations the FTD is always less than 500ms. However, with 6, 8, and 10 background contending stations, statistically 2%, 12% and 35% of video frames will have an FTD that exceeds a Dc of 500ms.
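The exceedance percentages read off the CCDF can be reproduced from raw FTD samples as shown in the following sketch; the sample values are hypothetical.

```python
def exceedance_fraction(ftds_ms, dc_ms):
    # Fraction of video frames whose FTD exceeds the playout delay constraint,
    # i.e. 1 - CDF(Dc), the CCDF evaluated at Dc.
    return sum(1 for f in ftds_ms if f > dc_ms) / len(ftds_ms)

# Hypothetical FTD samples (ms) for one contention level.
samples = [120, 250, 430, 510, 620, 180, 90, 700, 310, 480]
for dc in (500, 1000, 1500, 2000):
    print(dc, exceedance_fraction(samples, dc))
```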
The statistical distribution of the FTD has been summarized in Table 3, which presents the CDF of the FTD for different values of Dc and with an increased number of contending stations. It can be seen that when there are 10 contending stations, only 65% of video frames will arrive within the upper delay bound for a Dc of 500ms, whereas 95% of video frames will arrive within a Dc of 2000ms. Figure 6 shows a plot of the Weibull distribution fitted to the probability of the FTD arriving within a given Dc with an increased number of contending stations. The Weibull distribution fit had a correlation coefficient of over 99.5% in all cases. The shape and scale parameters are related to the number of contending stations and the delay constraint of the video. This distribution can be used to provide statistical FTD guarantees by a resource management system performing admission control to assess the impact of station association on the video streaming applications. Furthermore, adaptive streaming systems can use the statistical characterization of the FTD to adaptively dimension the playout buffer on the client device, or to adapt the number of packets per video frame, i.e. the bitrate of the video stream, based on current contention load conditions, since by reducing the number of packets per video frame the FTD is reduced.

3. CONCLUSIONS

In this paper, we have experimentally investigated the effects of station contention on streaming video over IEEE 802.11b WLAN networks. Video is a frame-based medium where video frames are transmitted from the server to the client at regular intervals related to the frame rate of the video. In general, several packets are required to transmit a video frame. The video frame cannot be decoded at the client until all the packets for the video frame have been received. In this way, loss and delay have a serious impact on the performance of video streaming applications. Loss can occur due to packets reaching their retransmission limit following repeated unsuccessful attempts, and due to packets being dropped after incurring excessive delays that result in them arriving too late to be decoded. Through experimental work, we have demonstrated that as the number of contending stations increases, while maintaining a constant total offered load, the video streaming application experiences increased delays. These delays are due to the 802.11b MAC mechanism, under which stations must contend for access to the medium. As the number of stations contending for access to the medium increases, the AP must defer decrementing its Backoff Counter while another station is transmitting on the medium. Experimental results show that the performance degrades with increased contention despite the offered load in the network remaining the same. Furthermore, we have shown that the complexity of the video content affects the degree of performance degradation. In our analysis we focused on the Frame Transmission Delay (FTD), which is the delay incurred transmitting the entire video frame from the server to the client. The FTD is important for video streaming applications since a video frame cannot be correctly decoded at the client until all of the packets relating to the video frame have been received within a given delay constraint. The delay constraint imposes an upper bound delay threshold for the video frames. Packets that exceed this delay constraint are effectively lost since they have not been received at the client in time for play out.
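A Weibull fit of the kind described here could be reproduced as sketched below; the paper does not state its fitting procedure, so the use of scipy's curve_fit, the starting values and the resulting parameters are illustrative assumptions. The CDF points are taken from the 10-station column of Table 3.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(x, shape, scale):
    # F(x) = 1 - exp(-(x/scale)^shape)
    return 1.0 - np.exp(-(x / scale) ** shape)

# P(FTD < Dc) versus Dc (ms) for the 10-station case, from Table 3.
dc = np.array([500, 1000, 1500, 2000, 2500], dtype=float)
p = np.array([0.653, 0.752, 0.836, 0.945, 1.000])

(shape, scale), _ = curve_fit(weibull_cdf, dc, p, p0=[1.0, 1000.0])
print(f"shape={shape:.2f}, scale={scale:.0f} ms")
print("P(FTD < 750 ms) ~", round(float(weibull_cdf(750.0, shape, scale)), 3))
```

Such a fitted CDF could then back an admission-control or playout-buffer decision, since it gives the probability of a frame meeting any chosen Dc for a given contention level.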
We statistically analysed the results to determine the probability of the FTD being within a given delay constraint and have shown that this can be modeled as a Weibull distribution. This analysis can be used as part of a WLAN access control scheme or in a cross-layer contention-aware video playout buffering algorithm. The QoS capabilities of the IEEE 802.11e QoS MAC Enhancement standard [15] facilitate new management mechanisms by allowing for traffic differentiation and prioritization. Work with the 802.11e standard is ongoing [16][17]. Further work is required to investigate the benefits for video streaming afforded by this standard.

145

ACKNOWLEDGEMENT

The support of the Science Foundation Ireland, grant 03/IN3/1396, under the National Development Plan is gratefully acknowledged.

REFERENCES

[1] J. Wexler, "2006 Wireless LAN State-of-the-Market Report", Webtorials, July 31, 2006. [Online]. Available: http://www.webtorials.com/abstracts/WLAN2006.htm
[2] Insight Research Corp., "Streaming Media, IP TV, and Broadband Transport: Telecommunications Carriers and Entertainment Services 2006-2011", April 2006. [Online]. Available: http://www.insight-corp.com/reports/IPTV06.asp
[3] S. Moon, J. Kurose, P. Skelly, D. Towsley, "Correlation of packet delay and loss in the Internet", Technical report, University of Massachusetts, January 1998.
[4] Y. Wang, S. Wengers, J. Wen, A.K. Katsaggelos, "Error resilient video coding techniques", IEEE Signal Processing Mag., vol. 17, no. 4, pp. 61-82, July 2000.
[5] N. Cranley, M. Davis, "The Effects of Background Traffic on the End-to-End Delay for Video Streaming Applications over IEEE 802.11b WLAN Networks", 17th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Helsinki, Finland, September 2006.
[6] Darwin Streaming Server, http://developer.apple.com/darwin/projects/streaming/
[7] VideoLAN Client, http://www.videolan.org/
[8] WinDump, http://windump.polito.it/
[9] NetTime, http://nettime.sourceforge.net/
[10] S. B. Moon, P. Skelly, D. Towsley, "Estimation and Removal of Clock Skew from Network Delay Measurements", in Proc. of IEEE INFOCOM '99, March 1999.
[11] Distributed Internet Traffic Generator (D-ITG), http://www.grid.unina.it/software/ITG/download.php
[12] A. C. Begen, Y. Altunbasak, "Estimating packet arrival times in bursty video applications", in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Amsterdam, The Netherlands, July 2005.
[13] N. Cranley, M. Davis, "Delay Analysis of Unicast Video Streaming over WLAN", 2nd IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2006, Montreal, Canada, June 2006.
[14] N. Feamster, H. Balakrishnan, "Packet loss recovery for Streaming Video", Proc. of 12th International Packet Video Workshop, April 2002.
[15] IEEE Std 802.11e, September 2005 Edition, IEEE Standards for Local and Metropolitan Area Networks: Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications, Amendment 8: Medium Access Control (MAC) Quality of Service Enhancements.
[16] Nicola Cranley, Mark Davis, "Video Frame Differentiation for Streamed Multimedia over Heavily Loaded IEEE 802.11e WLAN using TXOP", IEEE PIMRC 2007, Athens, Greece, September 2007.
[17] Nicola Cranley, Tanmoy Debnath, Mark Davis, "An Experimental Investigation of Parallel Multimedia Streams over IEEE 802.11e WLAN Networks using TXOP", IEEE ICC 2007, Glasgow, Scotland, June 2007.

146
155

Performance Evaluation of Meraki Wireless Mesh Networks

Xiaoguang Li1,2, Robert Stewart2, Sean Murphy3, Enda Fallon1, Austin Hanley1 and Sumit Roy4
1 Applied Software Research Center, Athlone Institute of Technology
2 Athlone Institute of Technology
3 University College Dublin
4 University of Washington
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Multi-hop networks using 802.11s are currently being standardized to improve the range and throughput of existing Wi-Fi networks. However, the performance of multi-hop networks in the indoor environment has to be investigated further to improve reliability and effectiveness. In this work, several tests have been carried out to investigate the throughput of multi-hop network equipment available from Meraki Corp. These tests are all in the indoor environment, where co-channel interference can cause problems [1]. The application of this technology is instrumental in improving productivity for companies such as T5 Process Solutions for remote control and monitoring of the production equipment in large semiconductor fabs [2]. The results show the impact on throughput performance of co-channel interference from other access points in the vicinity. This is currently a major problem for users of 802.11b technology in areas with densely populated access points, forcing some users to migrate to the less congested 802.11a technology.

Keywords: Wireless LAN, Multi-hop, Throughput, Mesh, Meraki, Roofnet.

1. Introduction

An unprecedented popularity and growth in Wi-Fi networks in recent years has seen the technology being rolled out to enterprises, public hotspots, domestic use in the home, etc. The range of WLAN standards from the IEEE 802.11 working groups has also increased, with 802.11b, 802.11a and 802.11g being the most common. The Meraki repeaters used in this experiment operate on the 802.11g standard. One of the main applications of the Meraki repeaters is to offer cheap, simple solutions for connecting devices (mobile or fixed) to the Internet. The Meraki devices also provide increased range, as the devices provide a multi-hop function. The IEEE 802.11 standard implements the bottom two layers of the OSI model: the Data Link layer and the Physical Layer. In the data link layer the Medium Access Control (MAC) sublayer manages access to the network medium (e.g.
CSMA CA) at the physical layer the communication method is defined (most common in 802.11 is Direct Sequence Spread Spectrum (DSSS) and Orthogonal Frequency Division Multiplexing (OFDM)) The IEEE 802.11b equipment operates in the 2.4GHz frequency band at a speed of 11Mbps and is a commonly used older standard. The IEEE 802.11a standard operates in the 5 GHz band at a maximum speed of 54 Mbps. However, because of the high frequency it uses, the coverage of 802.11a is smaller, also the attenuation due to obstructions maybe increased. The 802.11g which works in the frequency of 2.4GHz and is compatible with 802.11b has the advantage of the 156 speed of 54 Mbps. The IEEE 802.11n has recently been standardized. The advantage of 802.11n is its high speed with a maximum throughput of 108 Mbps by using multiple 802.11g channels. The IEEE 802.11s standard which has received a lot of attention lately will use mesh networking techniques to extend the range of wireless LANs securely and reliably. The IEEE 802.11s standard will be expected to provide an interoperable and secure wireless distribution system between IEEE 802.11 mesh points. This will extend mobility to access points in IEEE wireless local area networks (WLANs), enabling new service classes to be offered to users as shown in Fig 1. Fig 1 Infrastructure of wireless mesh network [3] One current challenge is to wirelessly connect the AP’s to form an Extended Service Set (ESS) as shown in Fig 2. Connectivity is provided by the Basic Service Set (BSS) consisting of stationary Access points as shown in Fig 2. The mobile stations (STA) with a stationary AP, transmits data packets between the wired and the wireless network. Fig 2 the Architecture of 802.11 [4] 157 Recent developments in worldwide standardization bodies show the industry’s will to put products enabling mesh based Wireless Local Area Network (WLAN) on the market. The new technology allows for a transparent extension of the network coverage, without the need of costly and inflexible wires to connect the Access Points. The main part of our research is the performance study of 802.11g networks using wireless technology from MerakiTM and simulation. Meraki has commercially developed technology first researched in the Roofnet project [5] [6] and created management software called dashboard to allow the mesh network to be setup [7]. Each device periodically broadcasts and all the other devices in range will report their routes. Roofnet’s design assumes that a small fraction of users will voluntarily share their wired or wireless Internet Access. Each gateway acts as a Network Address Translation (NAT) for connection from Meraki to the Internet. The organization of the paper is as follows; in Section 2, we introduce some related work in mesh networking; in Section 3, presenting the experiment and the results of tests. This part has two experiments, which explain different aspects of Meraki mesh networking; Section 4 includes the conclusion and future work. 2 Related Work In [5], it evaluates the ability of wireless mesh architecture to provide high performance Internet access while demanding little deployment planning or operational management. One of its conclusions is throughput decreases with number of hops. In [8], it presented BFS-CA, a dynamic, interference aware channel assignment algorithm and corresponding protocol for multi-radio wireless mesh networks. 
BFS-CA improves the performance of wireless mesh networks by minimizing interference between routers in the mesh network and between the mesh network and co-located wireless networks. However, in contrast, this paper will present results from research on channel interference in Meraki mesh networks. The tests are performed in a small area in a normal office environment where interference is high, the throughput in a WLAN is also investigated. 3 Throughput Tests The throughput of the multi-hop network was investigated in the tests using tools AirPcap to monitor the traffic. As we mentioned above, the main application in wireless mesh network is to extend the coverage and the capacity. The indoor tests where performed to thoroughly investigate the co-channel interference [8] and the routes taken by packets. 3.1 Tools for monitoring Traffic in Wi-Fi To analyze the traffic for a specific wireless AP or station, the identity of the target device, the channel and frequency must be obtained. The wireless card is configured to use the same channel before initiating the packet capture. Wireless cards can only operate on a single frequency at any given time. To capture traffic from multiple channels simultaneously, an additional wireless card for every channel to be monitored is required. There are several network analyser tools for Wi-Fi: (1) Wireshark (http://www.wireshark.org/) Wireshark has sophisticated wireless protocol analysis support to troubleshoot wireless networks. With the appropriate driver support, Wireshark can capture traffic “from the air” and decode it into a 158 format to track down issues that are causing poor performance, intermittent connectivity, and other common problems. The software is provided free. (2) NetStumbler (http://www.stumbler.net/) NetStumbler (also known as Network Stumbler) is a tool for Windows that facilitates detection of Wireless LANs using the 802.11b, 802.11a and 802.11g WLAN standards. It is commonly used for verifying network configurations, finding locations with poor coverage in a WLAN, detecting causes of wireless interference and unauthorized ("rogue") access points. (3) Commview for Wi-Fi (http://www.tamos.com/products/commview/) CommView for Wi-Fi allows you to see the list of network connections and vital IP statistics and examine individual packets. Packets can be decrypted utilizing user-defined WEP or WPA-PSK keys and are decoded down to the lowest layer, with full analysis of the most widespread protocols. Full access to raw data is also provided. Captured packets can be saved to log files for future analysis. A flexible system of filters makes it possible to drop unnecessary packets or capture the essential packets. Configurable alarms can notify the user about important events such as suspicious packets, high bandwidth utilization, or unknown addresses. However a license is required. (4) AirMagnet (http://www.airmagnet.com/) AirMagnet's Laptop Analyzer is the industry's most popular mobile field tool for troubleshooting enterprise Wi-Fi networks. Laptop Analyzer helps IT staff make sense of end-user complaints to quickly resolve performance problems, while automatically detecting security threats and other network vulnerabilities. Although compact, Laptop Analyzer has many of the feature-rich qualities of a dedicated, policy-driven wireless LAN monitoring system. However the cost is prohibitive. Considering all of the above network analysers, Wireshark with Airpcap was chosen to monitor the traffic in Wi-Fi networks. 
Airpcap has been fully integrated with WinPcap and Wireshark:, and it enables the capture and analysis of 802.11b/g wireless traffic. 3.2 Test1: Co-channel Interference Co-channel interference or CCI is crosstalk from two different radio transmitters reusing the same frequency channel. There can be several causes of CCI. Overly crowded radio spectrum is one of the main reasons. Stations will be densely-packed in, sometimes to the point that one can hear two, three, or more stations on the same frequency. Co-channel interference, decreases the ratio of carrier to interference powers (C/I) at the periphery of cells, causing diminished system capacity, more frequent handoffs, and dropped calls [9]. Fig 3 shows the layout of channel assignment scheme for a typical campus to reduce the interference problems. Fig 3 Typical Channel Assignment for a Campus 159 Fig 4 shows the beacon frames captured from the Wireshark network analyzer of the network shown in Fig 5. Fig 4 Beacon Frame Message Captured from Wireshark Fig 5 shows the environment of the file transfer with the detailed information. Fig 5 Environment of File Transfer 160 Beacon Broadcast of Idle repeater Fig 6 Packet capture using Wireshark Fig 6 shows the detailed information obtained when using the network analyzer during the transfer of files with meraki repeaters. It is possible from the screenshot shown in Fig 6 to differentiate between repeaters transferring data (Refer to MerakiNe_01:14:aa and MerakiNe_01:14:3a) and repeaters in the vicinity (Refer to MerakiNe_01:1c:fc and MerakiNe_01:1c:fa) just transferring management frames. To avoid the channel interference from other Wi-Fi networks, it was arranged that the repeaters would work in channel 6 as shown in Fig 3. Other access points in the area operate on channel 1 and channel 11. Because an 802.11 WLAN is a shared medium, the impact of co-channel interference is increased by client collisions as the clients hear signals from the many APs and clients surrounding them [10]. Results are shown in Fig 8. In our test, we made all the repeaters to work in the same channel. To increase the hops, we add one Meraki repeater at a time at some known distance. Fig 7 shows the environment of the test. Fig 7 Throughput Test for Co-channel Interference 161 3.3 3.1 Throughput (Mbps) 2.9 3m 4m 5m 9m 11m 12m 2.7 2.5 2.3 2.1 1.9 1.7 0 1 2 3 4 Number of Repeaters 5 6 Fig 8 Test Result for Co-channel Interference We have used monitoring tools to trace the data packets to ensure the path taken by packets from the source to destination is correct. However, in the indoor environment, from our tests it was found that repeaters invariably attempt to access the root node unless otherwise directed. As the distance between root node and last node decreases, the throughput decreases also. Throughput is at a minimum 1.94 Mbps when the devices are places side-by-side. 3.3 Test2: Bandwidth Sharing Fig 9 Performance Analysis of Multi-hop Network Node 2 is at the fixed distance of 1m from the root node, and the distance of node 1 varied every time. Simply, we test the throughput of the root node. It can achieve the maximum of 7.143Mbps, and the average of 5.611Mbps. Then, we do the test with node1. Fig 10 shows the throughput as the distance between the root node and node1 increased, the throughput will increase also. As we can see with single node test in Fig 8, the throughput is less than the half of the root node. 
Considering this case, we presume that as two clients connected to the node simultaneously, the packet may pass the root node. To prove this point, we designed the following tests and monitored the packets passing through. 162 We added another node and two clients. Then we made every two clients connect to node1, and node2 respectively. And we did the throughput test simultaneously. Fig 9 shows the test environment. 2.5 Thoughput (Mbps) 2 Single node test 1.5 Throughput of node1 Throughput of node2 1 Sum of node1 and node2 0.5 0 0 2 4 6 Distance 8 10 12 (m) Fig 10 Test Result of Throughput From Fig 10, we can see that the sum of the sub-node throughput is almost the value of single node test. In other words, the both two nodes share the bandwidth. We have used the capture tool to monitor the packet passing through. We noticed that all the data packet have passed the root node. If the client1 want to transfer data packet to client2, the procedure will operate as below. Fig 11 Data Packet Procedure According to the procedure, we can see that in some point of view, the Meraki repeaters try to implement the extension of internet use. However, if transmitting the data in local network, the root node will become the bottleneck in this situation. 4 Conclusion and Future Work In this paper, we have done several throughput tests of WLAN mesh networks. Co-channel interference problems occur since the meraki repeaters work in the same channel which is 163 automatically arranged by the root node. All the repeaters have to share the bandwidth when the packets are transferred in the WLAN (However, currently this is a limitation of meraki product if you are accessing one host on the Meraki 10.x.x.x to another host on the Meraki 10.x.x.x. from August, 2007 [7]). These experiments show that the performance of Wireless Mesh Networks is influenced by the number of nodes in the vicinity, distance between nodes and the number of users. As explained throughout this article, there still remain many research problems to be investigated when looking at performance of wireless mesh networks. Acknowledgment This work is supported through an Innovation Partnership sponsored by Enterprise Ireland and T5 Process Solutions. References [1] J. Robinson and E. Knightly, "A Performance Study of Deployment Factors in Wireless Mesh Networks," in Proceedings of IEEE INFOCOM 2007, Anchorage, AK, May 2007. [2] www.t5ps.com [3] Ian.F. Akyildiz and Xudong Wang,"A Survey on Wireless Mesh Networks," IEE Communications Magazine, vol. 43, no. 9, s23-s30, Sept. 2005 [4] IEEE 802.11, 1999 Edition [5] John Bicket, Daniel Aguayo, Sanjit Bisvas, Robert Morris, "Architecture and Evaluation of an Unplaned 802.11b Mesh Network., " in proceedings of the 11th annual international conference on Mobile computing and networking. [6] MIT roofnet. http://www.pdos.lcs.mit.edu/roofnet/. [7] www.meraki.net/docs [8] K.Ramachandran, E.Belding, K.Almeroth, M.Buddhikot,"Interference-Aware Channel Assignment in Multi-Radio Wireless Mesh Networks," in Proceedings of IEEE INFOCOM, 2006 [9] Co-Channel Interference White Paper [10] “Revolutionizing Wireless LAN Deployment Economics with the Meru Networks Radio Switch,” www.nowire.se [11] “The Impact of IEEE 802.11 MAC Strategies on Multi-Hop Wireless Mesh Network”, In Proc. IEEE WiMesh 2006, Reston, Virginia, USA, Sept. 25, 2006 164 Session 5b Wired & Wireless 165 166 Embedded Networked Sensing – EmNetS Panneer Muthukumaran1, Rostislav Spinar1, Ken Murray1, Dirk Pesch1, Zheng Liu2, Weiping Song2, Duong N. B. 
Ta2, Cormac J. Sreenan2 1 Centre for Adaptive Wireless System, Cork Institute of Technology, Ireland {panneer.muthukumran, rostislav.spinar, ken.murray, dirk.pesch}@cit.ie 2 Mobile and Internet Systems Laboratory, University College Cork, Ireland {zl3, wps2, taduong, cjs}@cs.ucc.ie Abstract This paper presents the work under investigation within the Embedded Networked Sensing (EmNetS) project funded by Enterprise Ireland under the WISen industry/academia consortium. The project addresses four main research areas within the embedded networked sensing space, namely, protocol stack development, middleware development, sensor network management and live test bed implementation. This paper will provide an overview of the research questions being addressed within EmNetS, the motivation for the work and the proposed solutions under development. Keywords: Wireless Sensor Networking, Protocol Stacks, Testbed, Middleware, Network Management. 1 Introduction The WISen Industry/Academia Consortium has identified wireless sensor networks as a medium term target for Irish Industry and a range of application domains focusing on Utilities and Resource Management in the first instance. Market forecasts indicate that the global wireless sensor network market could be worth $8.2Bn by 2010. The market is currently US-led but there is growing demand in Europe in particular for applications in the utilities, health-care, and environmental monitoring domains. The EmNetS project aims to advance research in Ireland in the areas of wireless sensor networking software and live test bed implementation [1]. The project addresses the need to network a range of low power heterogeneous sensor devices to be used in the utilities and responsive building environments. Firstly the project aims at developing a network protocol stack for individual sensor nodes and for cluster controller/base station type nodes that need to inter-work with other networking technologies such as local area networks and mobile networks, e.g. GSM. The protocol stack will be based on the industrial adopted IEEE 802.15.4/Zigbee stack for low power sensing [2]. It is the aim of the protocol stack development to overcome some of the limitations of IEEE 802.15.4/Zigbee stack such as scalability, energy efficiency in large scale mesh networks, dynamic address assignment and energy efficient routing. It is envisaged within the environmental monitoring and responsive building environments the number of sensing devices can run to the order of hundreds to thousands. The current sensor networking standards are unable to support such large scale energy efficient deployments. A further part of the software infrastructure consists of a simple middleware layer that provides application programmers interfaces to allow rapid development of applications. The development of such a software platform will ease product development of sensor network applications. Sensor networks that provide a mission critical role require remote management facilities in order to monitor the correct operation of the network, query the status of individual nodes, and provide means to upload software updates. A remote management system is current being developed that will integrate with the protocol stack within a live system deployment. This test bed deployment 167 will provide for test and validation of networking protocols, as well as scalability and internetworking trials within the EmNetS team and the sensor network research community at a National level. 
This paper provides a technical overview of the current state of research and development activity within the EmNetS team. The challenges and proposed solutions under investigation are presented for each of the aforementioned areas of research. 2 Protocol Stack Development Wireless sensing within the responsive building environment has been highlighted as the target application domain for the EmNetS project. In such sensing environments the area of deployment can be relatively large, for example when controlling the HVAC systems in multi-storey buildings based on user location/density. To facilitate such large-scale deployments, the sensing devices must cooperate to efficiently route data from a source to a destination data sink, which can be many hundreds of metres apart with multiple intermediate nodes between them. Mesh networking topologies can provide high redundancy for failed data links, scalable network topologies and the dynamic selection of alternative routes for high-priority traffic. Within a mesh topology, data can be routed to fulfil requirements of energy efficiency, throughput and quality of service (QoS). The deployment of energy-efficient wireless mesh sensor networks is therefore desirable for provisioning services over large sensor fields. Enabling energy-efficient data transmission over sensor networks requires energy-efficient protocol stacks: energy-efficient algorithms should be present at each layer of the stack, in particular the MAC and NWK layers, with cross-layer interaction used to optimise performance. Techniques in the literature employ the transmission of beacon packets between transmitter and receiver to facilitate low duty cycle, energy-efficient channel access in which devices' transmissions are coordinated. With this strategy, devices can sleep between the coordinated transmissions, which results in energy efficiency and prolonged network lifetimes. Beacon scheduling is an important mechanism in multi-hop mesh networks for enabling multiple beacon-enabled devices to function whilst avoiding beacon and data transmission collisions. 2.1 Distributed Beacon Synchronisation The IEEE 802.15.4 MAC standard for low duty cycle, low data rate devices is the most significant commercially adopted MAC protocol to date [2]. The standard, however, does not specify techniques by which the synchronisation of beacon packets is to be achieved to enable low duty cycle functionality. Furthermore, the standard specifies that, to enable mesh topologies, the router devices within the network need to be line-powered and engage in idle listening. Recent proposals in the literature for low duty cycle MAC protocols are based on the channel-polling, low-power-listening technique [3, 4]. These schemes, however, suffer from long and variable preambles at the transmitter side and are best suited to bit-streaming transceiver chipsets. The latest trend is toward packetised radios, such as the TI CC2420, in which the preambles have a fixed length. A collision-free beacon scheduling algorithm for IEEE 802.15.4/Zigbee cluster-tree networks is presented in [5]. The approach, called Superframe Duration Scheduling (SDS), builds upon the requirement for beacon scheduling outlined in the Zigbee specification for cluster-tree multi-hop topologies. The SDS algorithm functions within the coordinator.
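To make the duty-cycle arithmetic behind such beacon-enabled operation concrete, the sketch below computes the beacon interval, active period and resulting duty cycle from the beacon order (BO) and superframe order (SO) of IEEE 802.15.4. The 16 microsecond symbol period and the aBaseSuperframeDuration of 960 symbols are standard constants of the 2.4 GHz PHY and are assumptions introduced here for illustration rather than figures taken from this paper.

# Illustrative only: how BO and SO set the duty cycle of a beacon-enabled
# IEEE 802.15.4 device (2.4 GHz PHY assumed: 16 us symbols,
# aBaseSuperframeDuration = 960 symbols).
SYMBOL_PERIOD_S = 16e-6
A_BASE_SUPERFRAME_DURATION = 960  # symbols

def superframe_timing(beacon_order, superframe_order):
    """Return (beacon_interval_s, active_period_s, duty_cycle) for 0 <= SO <= BO <= 14."""
    if not 0 <= superframe_order <= beacon_order <= 14:
        raise ValueError("require 0 <= SO <= BO <= 14")
    beacon_interval = A_BASE_SUPERFRAME_DURATION * (2 ** beacon_order) * SYMBOL_PERIOD_S
    active_period = A_BASE_SUPERFRAME_DURATION * (2 ** superframe_order) * SYMBOL_PERIOD_S
    return beacon_interval, active_period, active_period / beacon_interval

# Example: BO=10, SO=3 gives a ~15.7 s beacon interval with a ~0.12 s active
# period, i.e. a duty cycle of 2^(3-10), under 1%, with the device asleep otherwise.
bi, sd, dc = superframe_timing(10, 3)

Coordinating when each device's short active period occurs, so that neighbouring beacons and data transmissions do not collide, is exactly the scheduling problem discussed in the remainder of this section.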
Although centralised control reduces the computational overhead and the information flow between distributed devices, it can result in an excessive flow of control traffic toward the coordinator, so that devices close to the coordinator become overloaded relaying this data. A distributed approach may therefore be more attractive. The IEEE 802.15.5 Task Group 5 is discussing a proposal for beacon scheduling in mesh topologies [6]. The proposal involves making fundamental changes to the MAC superframe structure to provide a beacon-only timeslot in which the beacons of neighbouring devices are transmitted. Changing the MAC superframe structure, however, affects interoperability with the current MAC standard. In order to overcome the limitations outlined above, the EmNetS team proposes a distributed beacon scheduling strategy in which the coordinator device does not participate in the beacon scheduling process. The proposed algorithm is depicted in Figure 1. Local decisions are made at each mesh router device based on information received during the beacon scan. In this way information does not have to be sent to the coordinator each time a new device requests a beacon schedule time, which reduces the control traffic overhead toward the coordinator. When a node initially starts, it associates with a beacon-enabled device based on, for example, the strongest signal strength (a network layer function). If the node is required to transmit beacons it must build a list of its neighbours and its neighbours' neighbours. It does this by listening for its neighbours' beacons (obtaining the neighbour list) and recording the beacon transmit time of each in a Beacon Schedule Table (BST). The device then requests the neighbours' neighbour lists in the CAP of each neighbour [2]. Each list contains the beacon offset times of the two-hop neighbours relative to the one-hop neighbour sending the list, and this data is also added to the BST. Upon completion of this step the node has a complete list of its neighbours and neighbours' neighbours, in absolute values (one-hop neighbours) and offset values (two-hop neighbours), and can determine its own schedule period. The new device continues scanning for beacons to calculate the transmission offsets between its scheduled beacon time and those of the received beacons, and notifies each neighbour of this offset. This also facilitates the reception of any beacons unheard during the first beacon scan. At this stage the new device has bi-directional, non-interfering connectivity with all one-hop neighbours in the PAN. Figure 1. Distributed Beacon Scheduling for IEEE 802.15.4 (flowchart: association, scanning neighbour beacons and requesting neighbour lists, calculating transmit offsets, determining schedulability and the beacon transmit time, then scheduling the nth beacon transmission and sleeping between scheduled wake-ups).
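As an illustration of the local decision just described, the following sketch shows how a new router might use its Beacon Schedule Table to pick a beacon slot that collides with neither its one-hop nor its two-hop neighbours. The table layout, the fixed number of slots per schedule period and the helper names are assumptions made for the example, not details of the EmNetS implementation.

# A minimal, illustrative sketch of the local scheduling decision; names such
# as BeaconScheduleTable and the slot granularity are hypothetical.
from dataclasses import dataclass, field

@dataclass
class BeaconScheduleTable:
    """Beacon transmit times learned during the scan, expressed in schedule slots."""
    one_hop: dict = field(default_factory=dict)   # neighbour id -> absolute slot
    two_hop: dict = field(default_factory=dict)   # (neighbour id, two-hop id) -> offset from that neighbour

    def occupied_slots(self, schedule_slots):
        taken = set(self.one_hop.values())
        for (nbr, _), offset in self.two_hop.items():
            if nbr in self.one_hop:
                taken.add((self.one_hop[nbr] + offset) % schedule_slots)
        return taken

def choose_beacon_slot(bst, schedule_slots=16):
    """Pick the first free slot so our beacon collides with no 1- or 2-hop neighbour."""
    taken = bst.occupied_slots(schedule_slots)
    for slot in range(schedule_slots):
        if slot not in taken:
            return slot
    return None  # not schedulable: rescan or defer, rather than report to the coordinator

# Example: one-hop neighbours at slots 0 and 4, a two-hop neighbour 2 slots after
# neighbour "B"; the new device would schedule its own beacon in slot 1.
bst = BeaconScheduleTable(one_hop={"A": 0, "B": 4}, two_hop={("B", "C"): 2})
my_slot = choose_beacon_slot(bst)

The point of the sketch is that the whole decision uses only locally gathered information, which is what removes the coordinator from the scheduling loop.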
2.2 Two-level Zone-based Routing The communication scenarios in wireless sensor networks can be classified into few-to-many data dissemination, many-to-one tree-based routing, and any-to-any routing topologies [7]. Many routing algorithms for wireless sensor networks are based on network-wide dissemination and the collection of data from the nodes of interest. Tree-based routing is used for forwarding data to a common destination at the tree root. These scenarios make the sink or root a fixed node, and there is no support for communication between two arbitrary devices. To implement dynamic backbone networking inside a wireless sensor network, any-to-any routing would be useful; however, the limited memory of sensor nodes makes it impossible to maintain routes to every node in the network. The Zigbee standard uses a variant of the AODV routing algorithm in which each node maintains a routing table with destination and next-hop entries for each route. This method is not suitable for larger networks: when a node sends to multiple destination nodes, for example, it requires multiple route entries along the same path. To accommodate the limited memory and computational resources of sensing devices, we have defined a framework for network routing in wireless sensor networks similar to the Zone Routing Protocol (ZRP) [8]. In this strategy we divide the network into clusters or zones and combine proactive and reactive routing across the two levels. Each node in a cluster maintains routes toward destination clusters rather than complete routes to individual destination nodes. The framework works on the basis of "Think Global, Act Local". Each node maintains two types of routing table, one for each level. For inter-cluster routing, nodes in the current cluster do not consider the destination node; instead they route the packet to the next cluster en route to the destination cluster. If the destination node belongs to the current cluster, the packet is routed using the intra-cluster algorithm, which is based on AODV. The concept is illustrated in Figure 2. The nodes on the edge of a cluster that have neighbours in other clusters are called gateway nodes; they forward data packets to their neighbour clusters. Gateway nodes proactively broadcast their next-cluster information to all nodes inside the cluster, which enables the nodes within a cluster to build routes to the appropriate gateway node depending on the destination cluster. When a data packet is transmitted it must be supplied with both the destination cluster address and the destination node address. A node sends an inter-cluster routing request to all gateway nodes unless it already knows which gateway has a route to the destination cluster; as the network evolves, nodes learn which gateways have paths to recently used destination clusters. The gateways forward the route request packet to neighbouring clusters via their neighbour gateway nodes. When this packet reaches a gateway in the destination cluster, or a gateway that knows the remaining path, a route reply packet is sent back to the source node carrying the next cluster or, optionally, the entire cluster list. Each gateway node along the path caches the cluster-level route. This completes the inter-cluster routing procedure, and the data packet can now be sent to the destination cluster. Inside that cluster, the destination node is found using intra-cluster reactive routing; a simplified version of AODV may be used as the basis of the reactive routing algorithm.
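The forwarding decision described in this section can be summarised in a few lines of illustrative code. The table names and the cluster and gateway identifiers below are hypothetical; the sketch only captures the two-level rule of routing by destination cluster first and resolving the destination node once the packet is inside its cluster.

# A simplified sketch of the two-level forwarding decision; table names are hypothetical.
def next_hop(my_cluster, dest_cluster, dest_node,
             intra_cluster_routes, gateway_routes):
    """
    intra_cluster_routes: dest_node    -> next hop inside this cluster (AODV-style)
    gateway_routes:       dest_cluster -> local gateway node leading toward that cluster
    Returns (next_hop_node, action).
    """
    if dest_cluster == my_cluster:
        hop = intra_cluster_routes.get(dest_node)
        return (hop, "intra") if hop else (None, "intra_route_discovery")
    gateway = gateway_routes.get(dest_cluster)
    if gateway:
        return (gateway, "inter")            # hand the packet to that gateway node
    return (None, "inter_route_request")     # ask all gateways, then cache the reply

# Example: a node in cluster C12 forwarding toward node N7 in cluster C7
# (the cluster-level route C12 - C7 shown in Figure 2).
hop, action = next_hop("C12", "C7", "N7",
                       intra_cluster_routes={"N3": "N5"},
                       gateway_routes={"C7": "G2"})
# -> ("G2", "inter"): forward to gateway G2, which relays the packet cluster by cluster.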
3 Middleware Development EmNetS is developing a simple middleware platform that provides application developers with interfaces to facilitate the rapid development of applications in the utilities space; this platform will ease product development of wireless sensor network applications. The goals of the middleware are: (1) to develop services oriented towards rapid application development; (2) to develop a composition tool for middleware synthesis; and (3) to develop a resource-aware deployment tool for middleware mapping. Figure 2. Concept of cluster-based routing in wireless sensor networks (the PAN divided into clusters C0-C13, with connections between gateway nodes and a cluster-level route from C12 to C7). The middleware will employ a VM-based approach, which has two main advantages. Firstly, middleware services do not need to be rewritten for different platforms, as they run transparently over the various platforms. Secondly, a virtual machine provides a well-designed instruction set, enabling rapid prototyping of highly compact application binaries, which can result in low energy overheads when they are distributed in the network [9]. As shown in Figure 3, the middleware system is decomposed into two layers: a virtual machine layer and a services layer. The virtual machine layer resides on top of the operating system and the network stack; it uses a virtual machine to provide APIs that abstract the different hardware and system platforms. The services layer resides on top of the virtual machine layer and consists of the middleware services; all services provided by the middleware are implemented in this layer. The middleware services can be tailored for each sensor node, based on the facts that (1) services can be realised in different ways depending on hardware components, hardware resources, user requirements, optimisation criteria, etc., and (2) services are required differently by the applications running on top of them, so the services running on any specific sensor node do not need to reflect the full requirement specification. The middleware services may include, to name a few, service discovery, aggregation, localisation, synchronisation, adaptation, update and security. New services can be added to the system provided they use certain defined interfaces. There are a number of existing algorithms and models for these services in the literature; thus, instead of developing new algorithms and models, the proposed middleware services will be realised using existing ones. Similarly, the virtual machine used in the middleware system will be based on one of the existing virtual machines for wireless sensor networks (e.g., Maté, Agilla, SensorWare), with further modification applied if necessary. The middleware system will be built by defining a set of components, their dependencies, and the specific components which can be selected under given conditions. A key design goal is to develop a tool capable of selecting components, capturing relationships among components and composing specific components. For the purposes of selection and composition, each component will carry sufficient information in its attributes; such information may include functional properties, non-functional properties, and required and/or provided quality guarantees.
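A small, hypothetical example of how such component attributes might be recorded and used is sketched below. The attribute set (a RAM demand and a reliability guarantee) and the selection rule are placeholders chosen for illustration; the actual EmNetS attribute model and composition logic are still under development.

# Illustrative only: component descriptors with attributes, and one possible
# selection rule under a resource constraint and a quality criterion.
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    service: str          # e.g. "aggregation", "time_sync"
    ram_bytes: int        # non-functional attribute: resource demand
    reliability: float    # provided quality guarantee, 0..1

def select_component(candidates, service, ram_budget, min_reliability):
    """Return the cheapest (in RAM) candidate for a service that meets the quality bar."""
    feasible = [c for c in candidates
                if c.service == service
                and c.ram_bytes <= ram_budget
                and c.reliability >= min_reliability]
    return min(feasible, key=lambda c: c.ram_bytes) if feasible else None

catalogue = [
    Component("agg_tree",   "aggregation", ram_bytes=512, reliability=0.95),
    Component("agg_gossip", "aggregation", ram_bytes=384, reliability=0.90),
]
# On a node with 1 KB of spare RAM and a 0.9 reliability requirement,
# the smaller gossip-based variant would be chosen.
chosen = select_component(catalogue, "aggregation", ram_budget=1024, min_reliability=0.9)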
In order to compose a middleware system, the composition tool needs the following information: a platform description, the middleware services, constraints, and quality criteria. The platform description specifies the hardware components and their resources; the middleware services specify the services that run on top of a particular device; the constraints specify the resource constraints of each component; and the quality criteria specify the user's non-functional requirements, which may include reliability, usability and performance, to name a few. After reading this information, the composition tool builds a dependency diagram by satisfying the dependencies of a start component, which can be any one of the components selected by the user. During this process, based on the resource constraints and quality criteria, the composer selects the most suitable components from the set of possible components, and also produces any additional components required and the glue code necessary to hook all components together [9, 10]. Finally, the system will provide a mechanism to map the middleware images onto the entire network. Figure 4 depicts the processes of composition and deployment. Figure 3. Proposed Middleware Architecture (an application layer above a middleware layer containing services such as a time synchronizer, group manager and data manager on top of a virtual machine, which in turn sits on the network stack, operating system and hardware). 4 Sensor Network Management The EmNetS project is developing a general-purpose remote management system to monitor, manage and control the behaviour of wireless sensor networks. To date, there has been very little research on network management and performance debugging for wireless sensor networks. This is mainly because the original vision for such networks envisaged extremely dense, random deployments of very inexpensive nodes, operating fully autonomously to solve or avoid faults and performance problems. In reality this is not the case: many real-world applications, including those for utilities, require carefully planned deployment in specific locations, and nodes are in fact not very inexpensive. As a result, autonomous approaches will need to be complemented by traditional network management approaches, but using new algorithms that are cognisant of the severe resource constraints which characterise sensor nodes. Relevant work in this area includes [11], which proposes two simple application-independent protocols for collecting health data from, and disseminating management messages to, the sensor network. It is limited to being a passive monitoring tool, i.e., it requires a human manager to issue queries and analyse the collected data. In contrast, [12] proposes reusing the main sensing application's tree routing protocol to deliver monitoring traffic; as a result, it may be non-trivial to adapt this monitoring mechanism to different classes of applications. In [13], the authors survey existing work in sensor network management and find that there is currently no generalised solution. Part of the EmNetS project is targeted at filling this gap in sensor network research. Our goal is to develop a simple yet efficient, general-purpose, policy-based sensor network management system which should exhibit the following important characteristics: low management overhead, strong fault tolerance, adaptivity to network conditions, autonomy and scalability.
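As a rough illustration of the policy-based style being targeted, the sketch below encodes health-condition rules as predicates over collected node data and returns the commands an engine would issue. The rule format and command names are hypothetical; the actual ENMS components are described in Section 4.1 below.

# Hypothetical sketch of declarative management policies evaluated by an engine.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Policy:
    name: str
    condition: Callable[[Dict], bool]   # evaluated against a node's health report
    action: str                         # management command to disseminate

policies = [
    Policy("low_battery_sleep",
           condition=lambda h: h.get("battery_pct", 100) <= 10,
           action="SLEEP"),
    Policy("congested_reroute",
           condition=lambda h: h.get("queue_drops", 0) > 50,
           action="REBUILD_ROUTE"),
]

def evaluate(node_id, health):
    """Return the commands the management engine would send to this node."""
    return [(node_id, p.action) for p in policies if p.condition(health)]

# e.g. a health report {"battery_pct": 9} from node "n17" yields [("n17", "SLEEP")].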
Figure 4. Processes of Composition and Deployment (the platform description, middleware services, constraints and quality criteria are composed into a middleware image, which is then deployed onto the wireless sensor network). 4.1 EmNetS's Network Management System To strike a good balance between scalability and complexity, a hierarchical network management system is an appropriate solution. We propose to use several layers of management, in which the managers in the lowest layer directly manage the sensor nodes in their part of the network. Each manager passes collected health data to its higher-level manager and at the same time disseminates commands from the higher-level manager to the nodes it manages. Typically, a management layer of the proposed EmNetS Network Management System (ENMS) consists of the following components (see Figure 5). Figure 5. EmNetS Network Management System Components (sensor network models, data collection and dissemination protocols, management policies, management engine). (1) Sensor network models: the models depict the actual state of the sensor network. Various sensor network models can be captured in our management system, for example a link quality map, a network topology map and an energy map. An important requirement is that the network models be extensible so as to easily accommodate future classes of sensing applications. (2) Data collection and dissemination protocols: to collect health data from the sensor network, we are exploring a combination of energy-efficient application-dependent and application-independent data collection protocols. The former would use the main sensing application's tree routing protocol to deliver health data from the sensor network; one way to implement this approach is to piggy-back the health data onto the real application data packets. The latter has the advantage of being independent of the sensing applications and can therefore be easily adapted for use in different applications; moreover, when the application fails, it can continue functioning. We envisage a combined protocol in which the former approach reports network health data periodically, while the latter is used for active node probing and for sending management messages when required. (3) Management policies: ENMS's management policies specify tasks to be executed when certain system health conditions are met, e.g., the battery level of node A has dropped to 10%, so node A should go to sleep mode. (4) Management execution engine: the management engine uses the data collection/dissemination protocols to update the sensor network models and to send commands to sensor nodes. Based on the collected health data and the management policies, it automatically analyses the current situation and executes the appropriate management tasks, e.g., re-configuring a network route in case of congestion. Together with well-defined management policies, an intelligent management engine helps to achieve the desired level of autonomy for ENMS, thus minimising the need for human managers. 5 Test bed Implementation One of the key activities within the EmNetS project is the development of a live system test bed. The objective of the test bed is twofold. Firstly, it provides a hardware platform to validate the protocols and network architectures developed within EmNetS and to execute experiments demonstrating the suitability of the developed platform for utilities and responsive building applications.
Secondly, the test bed provides network management functionality via a backbone USB/WiFi network for topology control, network element status indication, performance/fault analysis, configuration and remote programming of individual devices. The test bed architecture, depicted in Figure 6, is based on the ReMote test bed architecture developed at the University of Copenhagen [14]. Figure 6. EmNetS Test bed (client PCs connecting to a server PC, which in turn controls host PCs attached to the sensor network). The test bed consists of the following layers. The Sensor-Net layer consists of the sensor devices; the Xbow MICAz and Moteiv TMote are currently supported, and the test bed supports software stacks developed in TinyOS versions 1 and 2. Future work in this layer includes additional support for the Contiki operating system and the provision of more advanced components for testing and debugging. The Host PC layer consists of Linux-based embedded PC platforms with USB2 connectivity to the sensor nodes in the layer below. The host PCs contain the mote control host daemon, which facilitates mote discovery and issues the mote commands start, stop and reset for network topology control and remote power management. This layer also contains the bootloaders for the sensor devices within the testbed. Connectivity to the IP-based server layer above is via an Ethernet/WiFi link. Future work in this layer will focus on the implementation of wireless USB and the provision of application data logging. The Server PC layer contains the mote control server daemon, which bridges communication between the clients in the upper layer and the motes in the lowest layer. It also contains a MySQL database for central storage of all system information and a Tomcat information server providing client system information, user authentication and mote information. Connectivity to the top layer is via a WiFi link. Further extensions in this layer will focus on the development of an administrative interface to manage network users, reservation of test bed resources and mote information. The Client PC layer forms the top layer of the sensor test bed, through which the user interacts with the system. Each client PC runs a Java graphical user interface which lists the available motes and provides services such as start, stop and reset of individual devices. Individual motes can also be reprogrammed, with a console window available for each device. Future directions in this layer include the development of an interactive map showing the node deployment, a reservation system through which users can reserve network resources for a particular test, and a data logging facility. It is envisaged that the Sensor-Net test bed architecture will evolve across multiple domains and institutes, providing a tool for remote access and the deployment of networking protocols to further advance wireless sensor network research. 6 Conclusion The EmNetS project is currently undertaking a research programme in the area of embedded wireless sensor networks and is advancing the current state of the art in energy-efficient, scalable networking protocols, middleware, network management and testbed development. The application domain under investigation includes the responsive building environment, which presents a number of key research challenges in terms of energy efficiency and scalability. This paper has provided an overview of the current research activities within the programme and highlighted the challenges and solutions under development.
References [1] http://www.cs.ucc.ie/emnets/ [2] IEEE Std 802.15.4, Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (LR-WPANs), 2006. [3] A. El-Hoiydi, J.-D. Decotignie and J. Hernandez, "Low power MAC protocols for infrastructure wireless sensor networks," in Proceedings of the Fifth European Wireless Conference, Feb. 2004. [4] J. Polastre, J. Hill and D. Culler, "Versatile Low Power Media Access for Wireless Sensor Networks," in Proceedings of the ACM Conference on Embedded Networked Sensor Systems (SenSys), 2004, pp. 95-107. [5] A. Koubaa and M. Attia, "Collision-Free Beacon Scheduling Mechanisms for IEEE 802.15.4/Zigbee Cluster-Tree Wireless Sensor Networks," Technical Report, Version 1.0, Nov. 2006, http://www.open-zb.net/ [6] H.-I. Jeon and Y. Kim, "BOP Location Considerations and Beaconing Scheduling for Backward Compatibility to Legacy IEEE 802.15.4 Devices," submitted to IEEE 802.15.5 Task Group 5. [7] A. Woo, A Holistic Approach to Multi-hop Routing in Sensor Networks, PhD Thesis, University of California, Berkeley. [8] Z. J. Haas and M. R. Pearlman, "The Zone Routing Protocol (ZRP) for Ad Hoc Networks," Internet Draft, Mobile Ad hoc NETworking (MANET) Working Group of the Internet Engineering Task Force (IETF), November 1997. [9] J. Koshy and R. Pandey, "VM*: Synthesizing Scalable Runtime Environments for Sensor Networks," in SenSys'05, San Diego, California, USA, 2-4 November 2005. [10] P. Graubmann and M. Roshchin, "Semantic Annotation of Software Components," in EUROMICRO-SEAA'06, Cavtat/Dubrovnik, Croatia, August 28 - September 1, 2006. [11] G. Tolle and D. Culler, "Design of an Application-Cooperative Management System for Wireless Sensor Networks," in European Workshop on Wireless Sensor Networks, Istanbul, Turkey, Jan. 2005. [12] S. Rost and H. Balakrishnan, "Memento: A Health Monitoring System for Wireless Sensor Networks," in IEEE SECON, Reston, VA, Sep. 2006. [13] W. L. Lee, A. Datta and R. Cardell-Oliver, "WinMS: Wireless Sensor Network-Management System, an Adaptive Policy-Based Management for Wireless Sensor Networks," Tech. Rep. UWA-CSSE-06-001, The University of Western Australia, June 2006. [14] http://www.distlab.dk/sensornet Dedicated Networking Solutions for Container Tracking System Daniel Rogoz1, Dennis Laffey2, Fergus O'Reilly1, Kieran Delaney1, Brendan O'Flynn2 1 TEC Centre, Cork Institute of Technology, Rossa Avenue, Cork (daniel.rogoz, fergus.oreilly, kieran.delaney)@cit.ie 2 Tyndall National Institute, Cork (dlaffey, boflynn)@tyndall.ie Abstract TEC Centre researchers in CIT, in collaboration with the Tyndall National Institute, are currently developing a container management and monitoring system using Wireless Sensor Networks (WSNs), with the support of the Port of Cork and the local company Nautical Enterprises. The system is designed to integrate seamlessly with the existing container management and monitoring techniques at the port, extending their capabilities efficiently and at low cost with remote querying, localization and security. To achieve these goals, the system exploits the capabilities of wireless sensor network nodes used as container tags, forming a wireless, ad-hoc network throughout the container yard.
The paper briefly describes the current project status, which includes hardware solutions developed by Tyndall – a dedicated WSN hardware platform – and software solutions developed by the TEC Centre, such as specialised graphical user interfaces on PDAs (based on the .NET Compact Framework) or laptops, and applications for WSN motes running the TinyOS operating system, providing full system functionality at the one-hop communication level. The paper then introduces current work being done to overcome the main project challenge: the physical, visual and, most constraining, radio shielding of the containers. By implementing multi-hopping and ad-hoc routing techniques, the system will exploit the stacked and rowed containers to forward information from one to the next, thus allowing intelligent and reliable communication from the depths of the port/yard to the management system user. The system's power constraints will be addressed by using a power-efficient MAC layer, placed underneath the routing protocol, extending the system lifetime considerably and enabling it to operate throughout the whole container management cycle on ordinary batteries. Keywords: wireless sensor networks, applications, asset tracking. 1 Introduction 1.1 Project background and rationale Ireland's island status and large external trade make efficient, low-cost and fast trans-shipment of goods a strategic economic requirement. The efficient and timely flow of container traffic through Ireland's ports is of vital importance to maintaining Ireland's competitiveness and export-driven economy. With Irish ports operating as economic gateways, container traffic is on/off-loaded, moved and stacked in tiers for further shipment or transfer to the rail or road network. Within ports, the order of on/off-loading, placement in the storage yard, stacking and equipment levels are all key to maintaining an efficient, low-cost operation, and organisational mistakes can have significant time and labour costs. Four ports in the Republic and two in Northern Ireland offer load-on/load-off (lo-lo) container services. In 2003 the total island traffic was 1,007,261 TEU (Twenty-foot Equivalent Units); throughput varies from 9,712 TEU at Warrenpoint, to 137,246 TEU at Cork, to 495,862 TEU at Dublin, which shows the potential for technology innovations. In Ireland, lo-lo traffic in 2003 grew at twice the global average, and projections estimate a total growth of 26% in the five years to 2008. These trade figures show the continuation of a strong import/export business and the opportunity for ports to invest in their infrastructure requirements [1,2]. 1.2 Container tracking scenario Within the container terminal, containers are stored in rows divided into slots. In the Port of Cork the main yard area consists of 80 rows with 20 slots in each row. Figure 1 shows the structure of the rows and slots, which provide a regular storage area. The tracks between the containers allow the wheels of the straddle carriers to pass over the containers. Figure 1. Port container storage area and container markings used for identification. Steel-walled containers currently used globally are tagged with a registration and classification identifier code, which consists of the supplier code, the container number and a description code. The Supplier/Unit Number code consists of a 4-letter code followed by 6 numeric digits and is unique to each container. In addition, a 4-digit alphanumeric type/classification code identifies the type of container and its use. Using the full set of codes, each individual container can be uniquely identified; a sample of one such code is shown in Figure 1.
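For illustration, the identifier format just described can be validated and split with a few lines of code. The sketch below reflects only the format as stated here (a 4-letter code, a 6-digit unit number and a 4-character type code); it is not a full implementation of the international container-marking rules, and the example code values are invented.

# Illustrative only: parsing the container markings described above.
import re

UNIT_CODE = re.compile(r"^(?P<owner>[A-Z]{4})\s?(?P<serial>\d{6})$")
TYPE_CODE = re.compile(r"^[A-Z0-9]{4}$")

def parse_container_code(unit_code, type_code):
    m = UNIT_CODE.match(unit_code.strip().upper())
    if not m or not TYPE_CODE.match(type_code.strip().upper()):
        raise ValueError("unrecognised container marking")
    serial = m.group("serial")
    return {
        "owner": m.group("owner"),
        "serial": serial,
        "type": type_code.strip().upper(),
        # the last four digits give a short handle for the unit number
        "short_id": serial[-4:],
    }

# parse_container_code("ABCU 123456", "45G1")
# -> {'owner': 'ABCU', 'serial': '123456', 'type': '45G1', 'short_id': '3456'}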
In the Port of Cork, all container loading/unloading equipment is fitted with a portable terminal. This allows the machinery operator to key in the last 4 digits of the container identification, identifying the container, and then the operation carried out on it. The main handling equipment used on the terminal consists of the gantry cranes used at the port/sea interface and the straddle carriers used to load/unload trucks at the port/land interface. This equipment is networked wirelessly back to a central management system which records the movements and operations carried out and then interfaces with the accounting/management systems for customs and payment purposes. Currently, the on-site management, tracking, location monitoring and auditing of containers at Irish container ports is a laborious and time-consuming process, necessitating manual location, identification and tracking of containers. This imposes significant costs on the relatively small and economically weaker ports and trans-shipment centres; mistakes or delays result in additional expenses, all of which impact on the economic costs of export and import. Additionally, the systems used in large ports abroad, e.g. Rotterdam, are too expensive and unjustifiable given the scale of traffic levels in Irish ports. In large ports such as Rotterdam, machine vision systems are used to read container numbers and automatically identify them from databases. These vision systems are costly to install and maintain and are only justified for large volumes of traffic. They suffer in poor weather conditions, which make vision difficult, and from damage to or dirt covering the numerals on the containers; these conditions are prevalent in Ireland. Machine vision systems also do not allow for the remote finding and identification of containers, only of those currently in view. Other container tracking solutions are mainly based on passive electronic tagging and RF-ID. Passive electronic tags/RF-ID, which respond when queried, will give the identification of specific containers, but fail to provide remote location/identification and do not allow for monitoring. Passive tags cannot network to communicate with tags out of range, and they lack the capability for container monitoring and/or protection. Such monitoring can extend to ensuring that containers are not entered or exited, e.g. for terrorism/immigration purposes, and that containers are handled correctly in a yard, by measuring vibrations, forces exerted, etc. 1.3 Proposed solution and the challenges This project proposes a wireless-sensor-based tracking system which will allow the tagging, identification and tracking of shipping containers from when they enter a port to when they depart for their final destination. It will be low cost and efficient for smaller ports to use, allowing them competitive equality, especially in the regional areas. We propose using sensor-network-derived Sensor Identification Devices (SIDs) to self-identify, track and help manage the individual containers in a yard. These will be self-contained identification devices, approximately the size of a cigarette box, with radio networking capability.
The SID devices will be attached, in a removable manner, to each container entering the yard and will store identification information regarding the container, its source and destination, and any important information regarding its contents. Each SID will run off an enclosed battery and be capable of communicating via radio over a short distance of approximately 30 m with either other SID devices or handheld readers/PDAs. When containers are stacked and placed in rows, the SIDs will use multi-hopping and ad-hoc networking techniques to forward information from one to the next and allow communication from the depths of a stack/row to an outside point. The prototype SID devices are based on existing technology developed at CIT and the Tyndall National Institute. The Tyndall National Institute has developed and tested a flexible sensor network platform [3,4], providing much of the base hardware for implementing the SID devices. The container tags' wireless communication will use the ZigBee standard in the unlicensed ISM RF band. The main challenge in realising the project is the environment: containers provide physical, visual and radio shielding. The radio communication range is limited by multipath propagation in the presence of steel containers, where phenomena such as reflection and diffraction are omnipresent. In order to make each container tag accessible, an approach other than direct communication needs to be taken. The manner of container deployment is highly unpredictable, imposes no fixed infrastructure on the network formed by the container tags, and is moderately dynamic, as containers are deployed and removed on a regular basis. Lastly, the battery operation of the tags limits their lifetime, making power efficiency a significant issue. 2 Current status 2.1 System overview Our system consists of wireless sensor nodes [3,4] acting as container tags. The tags communicate wirelessly using the 2.4 GHz unlicensed ISM frequency band, and access to the tags is provided through a gateway. The gateway can be connected either to a PDA or to a PC/laptop and acts as a bridge, forwarding messages from the serial connection to RF and back. For user interaction with the system we have developed a specialised Graphical User Interface, which can run on any Windows Mobile PDA or Windows PC. The system functionality substitutes for the current container management and tracking methods used in the port storage yard. First of all, it enables RF communication with the tags, establishing a connection with the nodes to find an active tag (by a known container number), find an empty tag (by a known mote ID, which is a unique tag number), or simply discover all tags in direct communication range (in which case no information is required). The system provides a Beacon function to physically locate the tags (locating a tag using the RSSI (received signal strength), the hop count and an LED indicator). The container location can also be determined by accessing the stored location data (row/slot). All the container information stored on the tag can be accessed by querying the tag; the full container information can be displayed, including container number, type, arrival and departure dates, location, owner, and any additional information. The user can change the stored data to update the container information. The system allows activation of new/empty tags and storage of the relevant data, as well as tag deactivation (resetting the data). Figure 2 summarizes the system architecture.
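A hypothetical sketch of how a tag might service the functions listed above (discover, find by container number or mote ID, beacon, query, update, activate and deactivate) is given below. The message fields and command names are illustrative only and do not describe the project's actual radio protocol.

# Illustrative per-tag request handling; message layout and commands are hypothetical.
from dataclasses import dataclass, field

@dataclass
class TagState:
    mote_id: int
    active: bool = False
    data: dict = field(default_factory=dict)  # container number, type, dates, row/slot, owner, ...

def handle_request(tag, msg):
    cmd = msg["cmd"]
    if cmd == "DISCOVER":
        return {"mote_id": tag.mote_id, "active": tag.active}
    if cmd == "FIND_ACTIVE" and tag.active and tag.data.get("container") == msg["container"]:
        return {"mote_id": tag.mote_id, "found": True}
    if cmd == "FIND_EMPTY" and not tag.active and tag.mote_id == msg["mote_id"]:
        return {"mote_id": tag.mote_id, "found": True}
    if cmd == "BEACON":
        return {"mote_id": tag.mote_id, "led": "blinking"}  # RSSI is measured at the gateway
    if cmd == "QUERY":
        return {"mote_id": tag.mote_id, "data": dict(tag.data)}
    if cmd == "UPDATE":
        tag.data.update(msg["fields"])
        return {"ok": True}
    if cmd == "ACTIVATE":
        tag.active, tag.data = True, dict(msg["fields"])
        return {"ok": True}
    if cmd == "DEACTIVATE":
        tag.active, tag.data = False, {}
        return {"ok": True}
    return {}  # unknown or non-matching request: stay silent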
Figure 2. System overview (a PDA or laptop GUI connected over a serial link to a gateway node, which communicates wirelessly with the container tags). 2.2 Graphical user interface In order to facilitate user interaction with the system, we have developed a specialised Graphical User Interface. Its purpose is to provide the user with the full system functionality through a PDA or PC screen, hiding the underlying complexity of the system. It acts as a bridge between a PDA/laptop and the WSN/container tag network: it communicates through a serial connection with a gateway node, which in turn interacts with the deployed container tags wirelessly using ZigBee standard RF communication. As the mobility of the user interface is a key aspect (it has to interact with the container tags deployed in the yard), we have based the GUI application on a widely established Windows Mobile PDA device. The gateway is attached to the PDA using a dedicated serial cable. Figure 3 shows the PDA interface. Figure 3. Graphical User Interface running on a PDA. As the Windows Mobile GUI is based on the .NET Compact Framework, which is to some extent a subset of the full .NET Framework, the same GUI application can be launched without any changes on an ordinary Windows XP PC with the .NET Framework installed. Figure 4 shows the same GUI running on a desktop Windows XP machine. Figure 4. Graphical User Interface running on a desktop PC (Windows XP). 2.3 Gateway and container tags 2.3.1 Hardware platforms The Tyndall National Institute has provided the hardware solution for the project, based on its DSYS25z wireless sensor node [3,4]. The node is built around an ATmega128 microcontroller and an Ember EM2420 radio transceiver (a Chipcon CC2420 counterpart), and an RF monopole antenna is used for communication. The gateway and container tag modules are essentially the same, with the exception of an external serial connector on the gateway module. Both are enclosed in waterproof, RF-transparent boxes with an external power switch and two LEDs. Figure 5 shows an example tag, sealed and open with the Tyndall mote visible. Figure 5. Wireless sensor nodes used as container tags. 2.3.2 Software solutions Wireless sensor network applications are tightly bound to particular hardware, manipulating hardware resources to execute high-level logic tasks. In WSNs, applications are specialised and hardware resources are very limited, so accurate control of how these resources are used is essential, which makes the software development process long and error prone. In addition, changing the platform requires repeating the whole development process. Programming the motes using conventional methods can be challenging, especially when utilising more complex algorithms such as ad-hoc networking. We have therefore chosen TinyOS, a dedicated operating system for wireless sensor networks, as the software platform for the container tags [5]. The gateway and container tag motes run applications written in nesC [6,7], the C-like programming language of TinyOS. TinyOS TinyOS is a multi-platform sensor network operating system designed by the U.C. Berkeley EECS Department to address the specific needs of embedded wireless sensor networks. TinyOS is a set of "blocks" representing certain functionality, from which the programmer can choose to build an application by "snapping" or "wiring" these components together for a target hardware platform.
These components can be high-level logic (such as routing algorithms) or software abstractions for accessing the hardware resources (such as radio communication, the ADC, timers, sensors or LEDs), and they interact through well-defined bi-directional interfaces (sets of functions), which are the only access points to a component. The bi-directionality of interfaces allows the introduction of split-phase operation, making commands in TinyOS non-blocking. The component structure goes from a top-level logic layer down to a platform-dependent hardware presentation layer; by replacing a component we can change algorithms, change hardware platforms or expand hardware platform functionality. TinyOS supports a high level of resource-constrained concurrency in the form of tasks and hardware event handlers as two separate threads of execution. Scheduling tasks allows the implementation of power-saving algorithms; the mote can go into a sleep mode, saving energy while waiting for an event to occur [5,6,7]. Gateway application The basic role of the gateway is to act as a bridge between the PDA, connected via a serial cable, and the container tag network, accessible through the wireless RF connection. It forwards the messages received over UART to RF, and vice versa. The messages follow a specific packet structure defined by TinyOS, Active Messages, containing a destination address, message type, group ID, length, CRC and message payload. Based on this information, the gateway can check that a message is not corrupted and filter out messages transmitted from outside the system; it can also assess the strength of the received radio signal. Container tag application The container tag stores detailed container information and makes the information accessible to the system user. It communicates with the gateway using only radio packets, receiving commands and responding accordingly, to provide the functionality described in 2.1. 2.4 Current system deployment Our current system testbed consists of a small-scale, 4-container-tag network, a gateway node and a PDA interface. The RF communication is based on a direct, single-hop connection between the gateway and the nodes; the network currently has a star topology. At this stage the full system functionality described in 2.1 is implemented, with the exception of hop count indication. This setup provides a testbed for connectivity tests and makes it possible to measure the RSSI and the packet drop rate. Figure 6. Current, single-hop, communication setup. 2.5 Feasibility test results To verify the feasibility of this container management system solution, we performed a number of tests with containers stacked in various combinations in one of the Port of Cork container yards. The empty container storage area was used as the test site, as active storage areas were inaccessible due to the normal operation of container loading/unloading equipment within them. Two key tests were performed. In the first, with the setup pictured in Figure 7, the containers were placed in a grid of 4 rows, each 3 containers long and stacked 3 containers high. One tag acted as a receiver while the other transmitted from various locations. For each location the average signal strength and the packet delivery rate were measured. The results showed packet delivery rates ranging from 73% to 100% and RSSI from -85 dBm to around -40 dBm. This test proved that communication between tags up to two containers apart is possible with sufficient reliability. Figure 7. Test setup used for verifying container-to-container communication (one tag acting as receiver, the other transmitting from various positions in the stack).
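For completeness, the per-location figures reported above (packet delivery rate and average RSSI) can be derived from raw test logs with a few lines of code; the log format assumed below is hypothetical.

# Illustrative post-processing of per-location measurements.
def summarise_location(sent, received_rssi_dbm):
    """received_rssi_dbm holds one RSSI sample per packet actually received."""
    delivery_rate = len(received_rssi_dbm) / sent if sent else 0.0
    mean_rssi = (sum(received_rssi_dbm) / len(received_rssi_dbm)
                 if received_rssi_dbm else float("nan"))
    return delivery_rate, mean_rssi

# e.g. 100 packets sent and 87 received with samples around -72 dBm:
rate, rssi = summarise_location(100, [-72.0] * 87)
# rate = 0.87, rssi = -72.0, within the 73-100% and -85 to -40 dBm ranges reported above.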
For the second test we used a single tag emitting only a raw 2.4 GHz carrier and a directional antenna connected to a spectrum analyzer. The tag was attached to the container door and the containers were placed 10 cm apart; Figure 8 shows the setup of this test. We measured the signal strength within a 5 m radius, with the receiving antenna pointing in the direction of the gap between the containers where the transmitter was placed. The results showed that the transmitter antenna radiates from the gap between the containers over a wide angle, not only along the narrow line of sight, which makes container-to-container communication feasible. The test results are included in Figure 8 below. Figure 8. Shape of the antenna transmission field radiating from the tag attached to the container door. 3 Future work Most of the functional targets of the project have already been fulfilled, but two key aspects remain to be addressed. One of them is the accessibility of every single container tag within a network from any point in the container yard. This means that a multi-hop networking protocol (such as in [8]) has to be implemented to exploit the manner of container placement (stacked rows), allowing the tags to forward radio messages from the gateway deep into the network; in this way no direct communication range is necessary to access a tag. The multi-hop protocol cannot rely on any fixed topology, as a user should be able to connect to the network from any place, provided that at least one tag is within the gateway's communication range. Containers are loaded and unloaded on a regular basis, so the container tags will be constantly entering and leaving the network, and the size of the network is not defined in advance. The networking protocol therefore has to be reconfigurable, scalable and moderately dynamic, since the user will also be mobile. Figure 9. Multi-hopping communication scheme in the container yard. The other aspect is power efficiency, in order to extend the system lifetime to a reasonable length of time (i.e. months). A power-conserving radio Media Access Control (MAC) protocol, similar to the ones described in [9,10], should be used to manage the radio state by switching it off when idle, as the radio used in the project consumes roughly the same amount of current whether in idle, receive or transmit mode. The current work focuses on introducing these multi-hopping, ad-hoc routing techniques into the networking part of the system and addressing the power constraints through a power-efficient MAC protocol. The project's aim is to prove the technology and concepts for a container management and tracking application, and at the end of the project we expect to deliver a medium-scale (10-tag) system demonstrator with suitable multi-hopping ad-hoc techniques employed (Figure 9), providing the functionality described in the system specification. 4 Conclusions The proposed solution for a container management and tracking system is competitive with existing solutions, has the potential for future enhancement and is an attractive opportunity for commercialisation. The two enabling technologies provided by the system are intelligence and networking capabilities. The combination of these technologies enables several innovations which are not present in existing tracking systems.
181 System intelligence allows containers to be tracked and monitored individually, permitting them to log their own conditions/movement and generate alarms where appropriate. This gives the capability to extend to security/immigration control purposes and automatically audit the container management in the port. Passive or unintelligent tagging, e.g. RF-ID, does not allow for this future value added capability. The ability to record/monitor each container’s stay in a port will allow for quality management procedures and provide an automatic electronic audit trail. This is important for food, valuable and dangerous goods trans-shipments. The sensor based hopping networking ability, allows potential access and communication to all containers within a yard from a central location, without needing to individually visit them. This provides for radio communication, in what is in fact a difficult radio environment due to the significant quantity of steel present. The system will allow full scalability in operation, scaling with port development and container traffic growth. Given that there are low infra-structural overheads the system would be affordable across a wide range of port sizes and can grow through the networking function without significant additional outlay. Acknowledgements: This work is carried out as part of the Enterprise Ireland funded Project Containers [11] PC/2005/126, and the support of all project partners is recognised. References [1] [2] [3] Irish Maritime Transport Economist, Sept. 2004, published by IMDO-Ireland; Irish Short Sea Shipping, Inter-European Trade Corridors, 2004, published by IMDO-Ireland; S.J. Bellis, K. Delaney, B. O'Flynn, J. Barton, K.M. Razeeb, and C. O'Mathuna, “Development of field programmable modular wireless sensor network nodes for ambient systems”, Computer Communications, Special Issue on Wireless Sensor Networks and Applications, Volume 28, Issue 13 , 2 August 2005, Pages 1531-1544 [4] B. O'Flynn, S. Bellis, K.Mahmood, M. Morris, G. Duffy, K. Delaney, C. O'Mathuna “A 3-D Miniaturised Programmable Transceiver”, Microelectronics International, Volume 22, Number 2, 2005, pp. 8-12; [5] Hill J, Szewczyk R, Woo A, Hollar S, Culler D, Pister K “System architecture directions for networked sensors” SIGOPS Oper. Syst. Rev., Vol. 34, No. 5. (December 2000), pp. 93-104; [6] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, D. Culler. “The nesC language: A holistic approach to networked embedded systems”; [7] D. Gay, P. Levis, D. Culler, E. Brewer. “nesC 1.1 Language Reference Manual”, May 2003; [8] C. Gomez, P. Salvatella, O. Alonso, J. Paradells. “Adapting AODV for IEEE 802.15.4 Mesh Sensor Networks: Theoretical Discussion and Performance Evaluation in a Real Environment” International Symposium on a World of Wireless, Mobile and Multimedia Networks, 2006 (WoWMoM'06); [9] W. Ye, F. Silva, J. Heidemann “Ultra-Low Duty Cycle MAC with Scheduled Channel Polling” in Proceedings of the 4th ACM Conference on Embedded Networked Sensor Systems (SenSys), Boulder, Colorado, USA, Nov., 2006; [10] J. Polastre, J. Hill, D. Culler. “Versatile low power media access for wireless sensor networks” In Proceedings of the Second ACM Conference on Embedded Networked Sensor Systems (SenSys), November 3-5, 2004; [11] D. Laffey, D. Rogoz, B. O’Flynn, F. O’Reilly, J. Buckley, J. Barton. “Containers – Innovative Low Cost Solutions for Cargo Tracking” Information Technology & Telecommunications Conference 2006, Institute of Technology, Carlow, October 25-26, 2006. 
Proc pp 187-188; 182 Handover Strategies in Multi-homed Body Sensor Networks Yuansong Qiao 1,2,3, Xinyu Yan 1, Adrian Matthews 1, Enda Fallon 1, Austin Hanley 1, Gareth Hay 4, Kenneth Kearney 4 1 Applied Software Research Centre, Athlone Institute of Technology, Ireland 2 Institute of Software, Chinese Academy of Sciences, China 3 Graduate University of Chinese Academy of Sciences, China 4 Sensor Technology + Devices Ltd [email protected], [email protected], {amatthews, efallon, ahanley}@ait.ie, {Gareth.Hay, Kenneth.Kearney}@stnd.com Abstract Wearable wireless medical body sensor networks provide a new way of continuous monitoring and analysis of physiological parameters. Reliable transmission of real-time vital signs is a basic requirement for the design of the system. This paper explores multi-homing to increase data reliability for body sensor networks. It proposes a multi-homed body sensor network framework and investigates handover strategies during sensor nodes movement. Keywords: Multi-homing, Body Sensor Network, Handover 1 Introduction Wireless sensor networks have been developing rapidly in recent years. Much effort has been put into the exploration of wireless sensor network applications. Body sensor network for medical care is an emerging branch amongst these applications. They use wearable sensors to continuously monitor patient vital signs such as respiration, oxygen in the blood, temperature and electrocardiogram (ECG) etc. The real-time vital sign information can be delivered to doctors, nurses or other caregivers through the communication module in the wireless sensor node. Through a body sensor network, patient status monitoring can be extended from hospital to home, working place or other public locations. Any changes in patient status can be reported immediately to corresponding responders. This can expand the reach of current healthcare solutions, provide more convenience for patients and potentially increase patient survival probability in the case of emergency situations such as heart attack [1]. Although a body sensor network is derived from a sensor network, there exist several significant differences between the two [2]. Unlike common sensor networks, the data rate in a body sensor network may range widely according to different medical monitoring tasks. Life-critical data should be delivered reliably. Furthermore, medical data usually can not be aggregated by the network because the data is generated from different patients. Consequently, the technologies in common sensor networks can not be used directly in body sensor networks directly. Nevertheless, these features make it possible for the body sensor network to take advantage of the traditional Internet technologies. Currently, many solutions for body sensor networks use a Personal Digital Assistant (PDA) on a patient to gather data from sensors and forward the data to a central server through cellular networks 183 [3][4][5]. This paper investigates utilizing multi-homing technologies (a node with multiple network interfaces) in a body sensor network to increase data delivery reliability and decrease data delay in the case of network failures. The sensor node transfers data directly to ambient network nodes without the need for a bulky coordinating unit carried by the patient. In particular, this paper studies the handover strategy of the multi-homed sensor node. 
As the patient is mobile, the transmission distance of the sensor node is short and the wireless signal suffers interference from the environment, which will cause network handovers to occur frequently. Despite the fact that Internet protocols usually can not be used in sensor networks directly, the algorithms in the Internet protocols are still valuable for the design of such networks. Multi-homing technologies, where a host can be addressed by multiple IP addresses, are increasingly being considered by the Internet society. Two multi-homing transport protocols have been proposed in the current stage. They are Stream Control Transmission Protocol (SCTP) [6] and Datagram Congestion Control Protocol (DCCP) [7]. DCCP is an unreliable transport protocol with congestion control, whereas SCTP is a reliable transport layer protocol and employs a similar congestion control mechanism to TCP. As this paper focuses on reliable data transmission and handover strategies in multi-homed body sensor networks, SCTP is employed in simulations. Currently, the performance of SCTP for bulk data transmission is studied in [8]. This paper focuses on the SCTP performance for delay sensitive situations. This paper is organized as follows. Section 2 discusses related work. Section 3 presents the system architecture. Section 4 analyzes handover strategies. Section 5 discusses conclusions and future work. 2 Related Work In [3], a remote heart monitoring system is proposed. It transmits ECG signals to a PDA which forwards the signals to the central server through the cellular network. In [4], a wearable MIThril system is proposed. It uses a PDA to capture ECG data, GPS position, skin temperature and galvanic skin response. In [5], a body sensor network hardware development platform is presented. It is also based on the sensor node plus PDA solution. SCTP [6][9][10] is a reliable TCP-friendly message-oriented transport layer protocol defined by the IETF. The features of multi-homing, multi-streaming, partial reliability [11] and unordered delivery of SCTP make it possible for transmission of real-time data in multi-homed contexts. SCTP supports link backup for a multi-homed endpoint through its built-in multi-homing feature. Data is transmitted on the primary path. Retransmission is performed on an alternate path. The handover mechanism of SCTP is based on link failures. After the primary path failure is detected, data will be sent on the backup path. 2.1 Path Failure Detection and Handover Algorithms in SCTP SCTP is designed to tolerate network failure and therefore provides a mechanism to detect path failure. For an idle destination address, the sender periodically sends a heartbeat chunk to that address to detect if it is reachable and updates the path Round Trip Time (RTT). The heartbeat chunk is sent per path RTO (Retransmission TimeOut) plus SCTP parameter HB.interval with jittering of +/- 50% of the path RTO. The default value of HB.interval is 30s. RTO is calculated from RTT which is measured from non-retransmitted data chunks or heartbeat chunks. For a path with data transmission, it can be determined if it is reachable by detecting data chunks and their SACKs. When the acknowledgement for a data chunk or for a heartbeat chunk is not received within a RTO, the path RTO is doubled and the error counter of that path is incremented. For a data chunk timeout, the sender retransmits data chunks through an alternate path. For a heartbeat chunk timeout, the sender sends a new heartbeat chunk immediately. 
When the path error counter exceeds the SCTP parameter PMR (Path.Max.Retrans), the destination address is marked as inactive and the sender sends a new heartbeat chunk immediately to probe the destination address. After this, the sender will continue to send heartbeat chunks to the address once per RTO, but the error counter will not be incremented. When an acknowledgement for an outstanding data chunk or a heartbeat chunk sent to the destination address is received, the path error counter is cleared and the path is marked as active. If the primary path is marked as inactive, the sender will select an alternate path to transmit data. When the primary path becomes active again, the sender will switch back to the primary path to transmit data. The path failure detection time is determined by the SCTP parameters PMR and RTO. The default PMR value in SCTP is 5, which means that SCTP needs 6 consecutive transmission timeouts to detect a path failure. RTO is doubled for each transmission timeout and ranges between the SCTP parameters RTO.Min and RTO.Max. The default values for RTO.Min and RTO.Max are 1s and 60s respectively. If RTO is 1s (RTO.Min) when a path failure occurs, the minimum time for detecting the failure is 1+2+4+8+16+32=63s. However, the initial RTO could be 60s (RTO.Max), so the maximum path failure detection time is 6*60=360s. 3 System Design The wearable sensor node is deployed on the patient. Each node has multiple Bluetooth [12] interfaces, which are connected to separate ambient Bluetooth access points. The sensor node selects one network interface to transmit data. If that network interface fails, it switches to another interface to transmit data. The failed interface keeps searching for available access points and attaches to one of the access points not already used by the other interfaces. Figure 1: Architecture of Body Sensor Network Node The architecture of the body sensor node is shown in Figure 1. The communication entity includes three modules: Network Status Measurement (NSM): NSM provides local and end-to-end network dimensioning information such as available access points, available bandwidth, delay, jitter and loss to the other modules in the system. Network Handover Management (NHM): NHM selects one of the available access points reported by NSM. Path Handover Management (PHM): PHM manages end-to-end switchover amongst connections between the source sensor node and the destination central server. A connection is identified by a source address and a destination address. PHM makes a path handover decision based on the network status information provided by NSM. The handover strategies are discussed in the next section. 185 4 Investigation of Handover Strategies This section studies the effect of the path failure threshold on transmission delay. SCTP is used in the simulations. 4.1 Simulation Setup The simulations in this section focus on a sensor node with two Bluetooth interfaces. All simulations in this paper are carried out by running a revision of Delaware University's SCTP module [13] for NS2 [14]. Figure 2: Sensor Node Mobility Scenario (the sensor passes through access points AP1, AP2 and AP3, each labelled R=10M, over a 0-104 s timeline). Figure 3: Simulation Network Topology. In the simulations, it is supposed that the sensor node is outfitted with two Bluetooth interfaces. The transmission radius of the sensor is 10 meters (Figure 2). The patient walks at a slow speed of about 0.5 meters per second.
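Since the simulations below vary PMR from 0 to 5, it may help to make the detection-time bounds quoted above concrete before looking at the results. The lines below are plain arithmetic rather than simulation code; the function name and default values are ours.

# Illustrative arithmetic only: bounds on SCTP path failure detection time for
# a given PMR, assuming RTO starts at RTO.Min (best case) or RTO.Max (worst case).
def detection_time_bounds(pmr, rto_min=1.0, rto_max=60.0):
    timeouts = pmr + 1                       # PMR+1 consecutive timeouts declare the path failed
    best = sum(min(rto_min * 2**i, rto_max) for i in range(timeouts))
    worst = rto_max * timeouts               # RTO already at RTO.Max, so it cannot grow further
    return best, worst

print(detection_time_bounds(5))   # (63.0, 360.0) -> the 63 s / 360 s bounds quoted above
print(detection_time_bounds(0))   # (1.0, 60.0)   -> PMR=0 hands over after a single timeout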
The overlap area between the access points is 20% of the transmission diameter, i.e. 4 meters. The patient walks directly from one access point to another. When an interface in the sensor node fails, it will attach to the next access point as soon as that access point becomes available. 186 In the current NS2-SCTP implementation, the SCTP module does not work well with the wireless module. This paper therefore uses a wired network to simulate the network switch-off. The simulation topology is shown in Figure 3. Node S (the sensor node) and Node R are the SCTP sender and receiver respectively. Both SCTP endpoints have two addresses. R1,1, R1,2, R2,1 and R2,2 are routers. The implementation is configured with no overlap between the two paths. The MTU of each path is 1500B. The queue length of the bottleneck links in both paths is 50 packets. The queue length of the other links is set to 10000 packets. SCTP parameters are all default except those mentioned. The initial slow start threshold is set large enough to ensure that the full primary path bandwidth is used. Only one SCTP stream is used and the data is delivered to the upper layer in order. Initially the receiver window is set to 100MB (effectively infinite). In order to simulate the network changes, the loss rate of a bottleneck link in Figure 3 is set to 0% when the patient enters the area of an access point and to 100% when the patient leaves the area of an access point. At the initial stage, the primary path loss rate is set to 0% and the secondary path loss rate is set to 100%. 4.2 Simulation Results & Analysis Figure 4: Mean of Delay (mean delay in seconds versus data rate, 0.167 to 50 packets per second, for PMR=0 to PMR=5). Figure 5: Standard Deviation of Delay (standard deviation of delay in seconds versus data rate, 0.167 to 50 packets per second, for PMR=0 to PMR=5). 187 As the transmission speed for different medical monitoring tasks varies widely, the simulated data rate is varied from 10 packets per minute to 50 packets per second. CBR (Constant Bit Rate) traffic is used for data transmission. The effective payload length is 4 bytes. For each transmission speed, the PMR value is varied from 0 to 5 and the data transmission time is 1000 seconds. The mean and standard deviation of transmission delay are calculated for each simulation, as shown in Figure 4 and Figure 5. The results show that PMR=0 gives the minimum mean and standard deviation of delay amongst all PMR settings for all data rates. The mean and standard deviation of delay increase as the PMR value grows. However, there is a performance gap between PMR=2 and PMR>=3. For PMR<=2, the performance difference between PMR values is not significant.
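For readers reconstructing Figures 4 and 5 from their own traces, the statistics involved are straightforward. The snippet below is illustrative post-processing only (the paper's numbers come from ns-2 trace files, and the sample timestamps here are invented); it shows how a single handover pause inflates both the mean and, especially, the spread of per-packet delay.

# Illustrative post-processing, not the paper's ns-2 scripts: given send and
# receive timestamps per packet, compute the mean and standard deviation of
# delay of the kind reported in Figures 4 and 5.
import statistics

def delay_stats(send_times, recv_times):
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return statistics.mean(delays), statistics.pstdev(delays)

# invented example: packets 4-6 are only delivered after a failover completes
send = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
recv = [0.1, 1.1, 2.1, 9.5, 9.6, 9.7]
print(delay_stats(send, recv))      # mean 2.85 s, population std dev about 2.8 s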
Wireless transmission distance, wireless access points deployment, patient walking speed and network loss will be considered in the work. Acknowledgements The authors wish to recognize the assistance of Enterprise Ireland through its Innovation Partnership fund in the financing of this Research programme. References Guangzhong Yang (2006). Body Sensor Networks. Springer, ISBN: 978-1-84628-272-0. Victor Shnayder, Bor-rong Chen, Konrad Lorincz, Thaddeus R. F. Fulford-Jones, and Matt Welsh (2005). Sensor Networks for Medical Care. Harvard University Technical Report TR-0805. [3] ROSS P.E. (2004). Managing Care through the Air. IEEE Spectrum, 14-19. [4] PENTLAND A. (2004). Healthwear: Medical Technology Becomes Wearable. IEEE Computer, 37(5): 42-49. [5] B Lo, S Thiemjarus, R King, G Yang (2005). BODY SENSOR NETWORK – A WIRELESS SENSOR PLATFORM FOR PERVASIVE HEALTHCARE MONITORING. The 3rd International Conference on Pervasive Computing. [6] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, V. Paxson (2000). Stream Control Transmission Protocol. IETF RFC 2960. [7] E. Kohler, M. Handley, S. Floyd (2006). Datagram Congestion Control Protocol (DCCP). IETF RFC 4340. [8] Caro, A., Amer, P., Stewart R (2006). Rethinking End-to-End Failover with Transport Layer Multihoming. Annals of Telecommunications’, 61, (1-2), pp 92-114. [9] Shaojian Fu and Mohammed Atiquzzaman (2004). SCTP: State of the art in Research, Products, and Technical Challenges. IEEE Communications Magazine, vol. 42, no. 4, pp. 64-76. [10] Randall R. Stewart, Qiaobing Xie (2006). Stream Control Transmission Protocol (SCTP) – A Reference Guide. Addison-wesley. [11] R. Stewart, M. Ramalho, Q. Xie, M. Tuexen, P. Conrad (2004). Stream Control Transmission Protocol (SCTP) Partial Reliability Extension. IETF RFC 3758. [1] [2] 188 [12] IEEE 802.15.1 (2005). Wireless medium access control (MAC) and physical layer (PHY) specifications for wireless personal area networks (WPANs). [13] Caro, A., and Iyengar, J.. ns-2 SCTP module, Version 3.5. http://www.armandocaro.net/software/ns2sctp/. [14] UC Berkeley, LBL, USC/ISI, and Xerox Parc (2005). ns-2 documentation and software, Version 2.29. http://www.isi.edu/nsnam/ns. 189 Session 6 Doctoral Symposium 191 Hierarchical Policy–Based Autonomic Replication Cormac J. Doherty 1 , Neil J. Hurley 1 1 School of Computer Science & Informatics University College Dublin {cormac.doherty, neil.hurley}@ucd.ie Abstract The complexity of managing and accessing large volumes of data is fast becoming the most pertinent problem for users of large scale information systems. However, current trends in data management requirements and data production exceed the capability of storage systems in existence. The current state of a solution to this problem is presented in the form of a system for policy–based autonomic replication of data. The system supports multiple distinct replication schemes for a single data item in order to exploit the range of consistency and quality of service requirements of clients. Based on traffic mix and client requirements, nodes in the system may make independent, integrated replica management decisions based on a partial view of the network. A policy based control mechanism is used to administer, manage, and control dynamic replication and access to resources. 
Keywords: Distributed systems, Data management, Replication, Autonomic 1 Introduction As exemplified by the notions of ubiquitous computing and personal area networks, technology is penetrating and permeating everyday life to an ever-increasing degree. With this acceptance of technology come applications demanding access to data and services from any geographical location (e-banking, news, video–on–demand, OS updates, games etc.). As demonstrated by collaborative work environments, researchers sharing datasets across institutional and national boundaries, and remote access to corporate VLANs, this demand for data persists in the workplace. Provisioning timely and reliable access to this data may be viewed in the abstract as a data or replica management problem. Replication may be used to increase scalability and robustness of client applications by creating copies throughout the system such that they can be efficiently accessed by clients. The data management issues faced by applications and technologies are mirrored in the systems and networks that support them. We now consider as a concrete example, a telecommunications network. 1.1 Motivation As 3G mobile networks are deployed, and pervasive, highly heterogenous 4G networks are developed, a scalability crisis looms in the current network operations and maintenance (OAM) infrastructure. Due to the trend towards ubiquitous computing environments, customers of future networks are expected to use several separate devices, move between locations, networks and network types, and access a variety of services and content from a multitude of service providers. In order to support this multiplication of devices, locations, content, services and obligatory inter–network cooperation, there will be an increase in the scale, complexity and heterogeneity of the underlying access and core networks. Furthermore, based in part on the 2G to 3G experience, an explosive growth in the number of network elements (NEs) to be managed is predicted. Each additional NE, type of NE, and inter-working function between 192 different access network technologies, adds to the volume of management data that must be collected, queried, sorted, stored and manipulated by OAM systems. Moreover, as a result of this “always online” lifestyle and the increased size and complexity of networks, there will be an increase in management and service related data by several orders of magnitude. As exemplified by the OSI reference model, the Simple Network Management Protocol (SNMP) management framework, and the Telecommunications Management Network (TMN) management framework, network management (NM) has thrived on either centralised or weakly distributed agent-manager solutions since the early 1990s [Martin-Flatin et al., 1999]. However, the increase in size, management complexity, and service requirements of future networks will present challenging non-functional requirements that must be addressed in order to deliver scalable OAM data management sub-systems. More distributed architectures for next generation OSS platforms are one approach to providing scalable, flexible and robust solutions to the demands presented by future networks [Burgess and Canright, 2003]. 2 Distributed Data Layer As an enabling technology for these distributed NM systems, a distributed data layer to manage replication and data access has been developed [Doherty and Hurley, 2006, Doherty and Hurley, 2007]. 
As many of the challenges posed by future networks are data management challenges, an element of distributed control and autonomy is added to manage the replication life-cycle of data items. The degree to which the advantages of replication are experienced is dependent upon access patterns, traffic mix, the current state of the network and the applied replication schemes. Previous work has indicated that replication schemes impact significantly on performance of distributed systems in terms of both throughput and response times [Hurley et al., 2005]. Indeed, a bad replication scheme can negatively impact performance and as such, may be worse than no replication at all. A fundamental observation motivating this work is the fact that the access pattern perceived by a data item is the product of an entire population of clients. This observation is not exploited in most replication systems. That is, a system applying replication treats the arrival stream to a data item as though it were generated by a single client. The system then attempts to generate a “one size fits all” replication scheme to suit this client. As such, the range of consistency and quality of service requirements of all clients contributing to an arrival stream is not taken into account when developing replication schemes. In order to account for and exploit the various classes of client that contribute to the arrival stream experienced by a data item, multiple distinct replication schemes are simultaneously applied to a single data item so as to best satisfy the requirements of all classes of client. To provide this additional feature of dynamic replication, policies are introduced to the system that must be enforced by all nodes. 2.1 Policy Based Replication In order to account for node heterogeneity and control resources available to the distributed data layer, the role a particular node plays in the network is controlled using a policy. Node policies are defined by an administrator and specify how a particular node can be used in terms of network, storage and processing resources. Node policies are used in determining which data items can be replicated on a specific node. Two data centric policies are used to control replication. A data item policy specifies upper and lower bounds on consistency related parameters and performance metrics that must be maintained by any replica of the data item to which the policy refers. Associated with an instance of a logical data item is a replica policy indirectly describing the level of consistency maintained by that replica and request related performance metrics its host is prepared to maintain; replica policies are bound by data item policies. A replica policy defines a particular point in the parameter space defined by a data item policy. 2.2 Distributed Control Replication affords the possibility of increased ‘performance’ and robustness of client applications as well as a degree of failure transparency. The appropriate measure of ‘performance’ is subjective with 193 respect to the system; it may relate to system–wide characteristics such as response time, throughput or utilisation of nodes in a distributed database management scenario; or in a grid environment, replication may be motivated by the high likelihood of node failure and quantified by availability. Our work specialises to the first scenario. 
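Before turning to how control is distributed, the policy hierarchy of Section 2.1 can be made concrete with a small sketch. The classes and parameter names below are illustrative assumptions, not the authors' implementation: a data item policy fixes upper and lower bounds, each replica policy must choose a point inside those bounds, and two quite different replica policies (one strongly consistent, one relaxed but fast) can therefore legitimately coexist for the same logical data item.

# Minimal sketch (not the authors' system) of the policy relationship described
# in Section 2.1; the parameter names and bounds are invented for the example.
from dataclasses import dataclass

@dataclass
class DataItemPolicy:
    bounds: dict        # e.g. {"staleness_s": (0, 30), "response_ms": (0, 200)}

@dataclass
class ReplicaPolicy:
    values: dict        # the point in the parameter space this replica commits to

    def satisfies(self, item_policy: DataItemPolicy) -> bool:
        return all(lo <= self.values[k] <= hi
                   for k, (lo, hi) in item_policy.bounds.items())

item = DataItemPolicy({"staleness_s": (0, 30), "response_ms": (0, 200)})
strong = ReplicaPolicy({"staleness_s": 0, "response_ms": 150})    # near-synchronous replica
relaxed = ReplicaPolicy({"staleness_s": 25, "response_ms": 60})   # fast, weakly consistent replica
print(strong.satisfies(item), relaxed.satisfies(item))            # True True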
We are interested in maintaining a desired level of performance, measured in terms of throughput or response time, as well as data consistency, under changing workload conditions. Traditional, centralised, approaches attempt to optimise some system wide measure of performance such as throughput or response time using a centralised controller with complete knowledge of system demands and resources. Such centralised control is impractical; firstly, due to lack of flexibility and issues pertaining to failure transparency, reliability and availability. Secondly, due to the difficulty in deciding upon a set of performance metrics to be optimised that will satisfy the QoS requirements of the various classes of user in an inherently heterogenous environment. Finally, the immense computational costs involved in provisioning centralised control represents an inescapable performance bottleneck. Distribution or decentralisation of control and responsibility allows for independent and autonomous components and yields partial solutions to the inadequacies of centralised control. In a decentralised system, resources can not only be located where they will be most effectively utilised, but can also be relocated, added and upgraded independently and incrementally in order to accommodate increasing demands, growth or changes in system infrastructure. This flexibility also facilitates a more scalable system. Furthermore, the relative independence and autonomy of components also affords a degree of fault tolerance. Whereas component failure in a centralised system can result in total system outage, a similar failure in a decentralised distributed system is typically limited to that component and results in limited service degradation for a limited group of users. Though based on several simplifying assumptions, a preliminary investigation using a discrete event simulator has demonstrated the potential applicability of feedback control to replica management. In response to a changing workload, nodes reconfigure replication schemes using feedback control so as to maintain a particular response time. The current focus of research centres on maintaining the performance of the simulated controller whilst removing all simplifying assumptions. The approach being taken is to use a set of algorithms to control different aspects of replication and feedback control to manage the frequency with which the algorithms are run and setting of algorithm parameters. 3 Related Work This work is primarily concerned with replication in a large scale, distributed, dynamic environment. Existing work may be categorised according to features of this environment. As a peer–to–peer system grows, so too do its resources, including bandwidth, storage space, and compute power. When combined with replication, this scalability and the inherently distributed environment yields a degree of failure transparency. Furthermore, when structured, a peer–to–peer network or Distributed Hash Table (DHT) not only offers a guarantee of an efficient route to every data item, but continues to do so in the face of changes in network topology. Though seemingly well suited to the milieu, systems built on top of DHTs (PAST [Druschel and Rowstron, 2001], CFS [Dabek et al., 2001]) are typically constrained by the decentralisation integral to peer–to–peer systems and do not maintain consistency. That is, data is read–only and is essentially cached as a means improving data availability and fault tolerance. 
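The feedback idea above can be illustrated with a deliberately simple controller. The sketch below is not the authors' discrete event simulator; the deadband, step size and limits are arbitrary assumptions. It only shows the shape of the loop: measure response time, compare it with the target taken from the replica policy, and grow or shrink the replication scheme accordingly.

# Toy illustration of feedback-controlled replication (assumed parameters, not
# the authors' simulator): compare measured response time with the target and
# add or drop a replica when the error leaves a small deadband.
def control_step(replicas, measured_rt, target_rt, deadband=0.1,
                 max_replicas=16, min_replicas=1):
    error = (measured_rt - target_rt) / target_rt
    if error > deadband:                 # too slow: spread load over one more replica
        return min(replicas + 1, max_replicas)
    if error < -deadband:                # comfortably fast: reclaim resources
        return max(replicas - 1, min_replicas)
    return replicas                      # inside the deadband: leave the scheme alone

replicas = 2
for rt in [0.35, 0.31, 0.24, 0.16, 0.19]:    # measured response times in seconds
    replicas = control_step(replicas, rt, target_rt=0.20)
    print(rt, "->", replicas)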
As such, many of the more difficult issues relating to replication are ignored. Systems such as Ivy [Muthitacharoen et al., 2002] accept updates but offer only relaxed consistency guarantees. In direct contrast to the scalability and restrictive consistency guarantees of peer–to–peer systems, there exists a range of more centralised alternatives providing a wider range of consistency guarantees (Bayou [Demers et al., 1994], fluid replication [Noble et al., 1999], TACT [Yu, 2000]). The global information necessary for these systems, restricts scalability and applicability to a dynamic environment due to the possibility of frequent changes. Though these systems offer a multitude of consistency semantics across date items, none offer a range of consistency guarantees for a single logical data item (see Section 2.1). 194 4 Conclusion Modelling work [Hurley et al., 2005] validated against performance metrics taken from live networks and test sites has fed into the design, development, and implementation of a flexible framework for replication. Within this framework a system supporting multiple distinct replication schemes for a single data item has been developed. This system allows the exploitation of the range of consistency and quality of service requirements of clients in a distributed environment and demonstrably improves upon performance when compared to metrics taken from live networks and test sites [Doherty and Hurley, 2006]. Further development and refinement of autonomic control mechanisms and integration into the system will facilitate validation of hierarchical policy–based autonomic replication. References [Burgess and Canright, 2003] Burgess, M. and Canright, G. (2003). Scalability of Peer Configuration Management in Partially Reliable and Ad Hoc Networks. In Proceedings of the 8th IFIP/IEEE International Symposium on Integrated Network Management, pages 293–305. Kluwer. [Dabek et al., 2001] Dabek, F., Kaashoek, F., Karger, D., Morris, R., and Stoica, I. (2001). Wide-area cooperative storage with CFS. In SOSP ’01’: Proceedings of the 18th ACM Symposium on Operating System Principles, pages 202–215, New York, NY, USA. ACM Press. [Demers et al., 1994] Demers, A., Petersen, K., Spreitzer, M., Terry, D., Theimer, M., and Welch, B. (1994). The Bayou Architecture: Support for Data Sharing among Mobile Users. In Proceedings of the IEEE Workshop on Mobile Computing Systems & Applications, pages 2–7, Santa Cruz, CA, USA. [Doherty and Hurley, 2006] Doherty, C. and Hurley, N. (2006). Policy–Based Autonomic Replication for Next Generation Network Management Systems. In Proceedings 1st Annual Workshop on Distributed Autonomous Network Management Systems, Dublin, Ireland. [Doherty and Hurley, 2007] Doherty, C. and Hurley, N. (2007). Hierarchical Policy–Based Replication. In Proceedings of the 26th IEEE International Performance, Computing and Communication Systems, pages 254–263, New Orleans, LA, USA. IEEE Computer Society. [Druschel and Rowstron, 2001] Druschel, P. and Rowstron, A. (2001). PAST: A large-scale, persistent peer-to-peer storage utility. In HOTOS ’01: Proceedings of the 8th Workshop on Hot Topics in Operating Systems, pages 75–80, Washington, DC, USA. IEEE Computer Society. [Hurley et al., 2005] Hurley, N., Doherty, C., and Brennan, R. (2005). Modelling Distributed Data Access for a Grid-Based Network Management System. 
In Proceedings of the 13th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages 315–318, Washington, DC, USA. IEEE Computer Society. [Martin-Flatin et al., 1999] Martin-Flatin, J.-P., Znaty, S., and Hubaux, J.-P. (1999). A Survey of Distributed Enterprise Network and Systems Management Paradigms. Journal of Network and Systems Management, 7(1):9–26. [Muthitacharoen et al., 2002] Muthitacharoen, A., Morris, R., Gil, T., and Chen, B. (2002). Ivy: A Read/Write Peer–to–Peer File System. volume 36, pages 31–44, New York, NY, USA. ACM Press. [Noble et al., 1999] Noble, B., Fleis, B., and Kim, M. (1999). A Case for Fluid Replication. In Netstore ’99: Network Storage Symposium, Internet2. [Yu, 2000] Yu, H. (2000). TACT: Tunable Availability and Consistency Tradeoffs for Replicated Internet Services. ACM SIGOPS Operating Systems Review, 34(2):40. 195 Sensemaking for Topic Comprehension Brendan Ryder, Dept. of Computing and Mathematics, Dundalk Institute of Technology, Dublin Road, Dundalk, Co. Louth, Ireland [email protected] Terry Anderson School of Computing and Mathematics, University of Ulster, Newtownabbey, Co. Antrim, BT37 0QB, Northern Ireland. [email protected] Abstract Users spend a considerable amount of time engaged in sensemaking, the process of searching for structure in an unstructured situation and then populating that structure with information resources relevant to a task. In other words, searching for representations and encoding information in those representations. It is the gradual evolution of enquiry through our repeated interaction with information. This work is examining how representation construction can be supported in the sensemaking process, and specifically, how structural elements in resources can be exploited and used to tag information resources at a fine-level of granularity. This paper surveys the literature related to sensemaking, outlines the requirements and enabling technology selection for a prototype sensemaking tool called coalesce and discusses proposed evaluation strategies. Keywords: Sensemaking, Personal Information Management, Tagging. 1 Introduction Finding relevant and up to date information is an essential task performed by all users in many problem domains on a daily basis. Users spend considerable amounts of time and effort manually identifying, evaluating, organising, producing and sharing digital resources. This can be referred to as “information triage” [39], the process of sorting through relevant materials and organising them to meet the needs of the task at hand, normally time-constrained requiring quick assessment based on insufficient knowledge. The task of finding and organizing has been compounded because there is too much digital information available on the Internet. Society is suffering from “information overload”, or “data smog” [18], a concept originally explored by Bush in his seminal paper “As We May Think” [10]. To compound the problem we also have to contend with information fragmentation [33], where information that is required to complete a particular task is “fragmented” by physical location and device. More people than ever before face the problems of identifying relevant and high quality information that meet their information needs. Once the relevant information is found they need better ways to organize and manage this information for their own use and for sharing with others in a collaborative context. 
2 Related Work The following section surveys related work in sensemaking, information organization and categorization. 196 2.1 Sensemaking The tight integration between the tasks of finding and organising information has lead to the establishment of a research area called sensemaking. Sensemaking is the cycle of pursuing, discovering, and assimilating information during which we change our conceptualization of a problem and our search strategies [51]. It is the gradual evolution of an inquiry through our repeated interaction with information. This interaction can serve as an organizing structure for personally meaningful information geographies [2]. Arriving at the output is ill-defined, iterative and complex. Information retrieval, organisation and task-definition all interact in subtle ways [49]. This multifaceted activity is also referred to as exploratory search [59]. The synergy and tight coupling of these behaviours result in the creation of sense, that is, the process of sensemaking. Russell [51] pioneered the work on the concept of sensemaking and developed a user model of sensemaking activity, derived from observations of how a group of Xerox employees made sense of laser printers. Russell found that people make sense of information about a specific topic using a pattern he referred to as the “learning loop complex”. The learning loop has four main processes: search for representations, instantiate representations, shift representations and consume encodons. Representations are essentially a collection of concepts related to the task at hand (i.e. the organisational structure of the information). Instantiating representations involve populating the created structure with relevant information. The shift representation refers to the iterative amendments that are made to the original structure during the ongoing search process. The original structure can be merged, split and new categories added. The final structure is a schema that provides goal-directed guidance and determines what to look for in the data, what questions to ask, and how the answers can be organised. Additional work carried out on sensemaking presented at CHI2005 suggests important revisions to Russell’s model and theory [49]. Russell’s model separates the encoding activity, populating the structure, from the representation search activity, finding a structure to aid sensemaking. Qu [49] suggests a change to the relationship between these two activities. Representational development is much more tightly integrated with the encoding process. Information suggesting representational structure comes from some of the same sources as the information content. Sensemakers are not just getting “bags of facts”, but organised ideas. These socially constructed knowledge resources can be exploited in the representational search activity of sensemaking. Existing work examining sensemaking include the Universal Labeler [29],[30], the Scholarly Ontologies Project (with ClaiMapper, ClaiMaker and ClaimFinder) [55], Compendium [3], NoteCards [28], gBIS [15], ART [43] and Sensemaker[4]. Complimentary work on the conceptual model of sensemaking, called the Data/Frame theory, has also been discussed by Klein, et al. [60]. 2.3 Information Organisation A central activity in information management is the grouping of related items, and as a result is at the heart of the sensemaking process. 
This can be examined from two perspectives: interface and interaction design and associative technologies, that is, underlying data models for association. Interface and interaction design is an important consideration in any application. Many studies have been conducted to examine how we interact with information resources and information management applications have provided various approaches to organising concepts and their associated content. Concepts can be managed in text form [52], using a hierarchy [29] or graphically [50], using concept maps [12], or mind-maps [43] representations. Content is associated in containers in research tools such as Nigara [57]. Commercial tools such as Google Notebook [25], Clipmarks [14] and Net Snippets [44] provide similar functionality. There is also a consensus in the literature that search and organisation need to be combined in a unified interface [45] and [16]. Users find information in the same way as our ancestors found prey, by “foraging” it, navigating from page to page along hyperlinks [47]. ScentTrails [45] is a novel approach that applies this theory. This concurs with a study conducted by Teevan [54] that found users prefer to find information by “orienteering”. They often begin with a 197 known object and then take repeated navigation steps to related information about that object, eventually arriving at the information that is required. In terms of aggregating heterogeneous information resources the associative technology can be used as an information or semantic layer, a form of metadata that resides logically on top of existing resources. This metadata layer can be created manually [31] and [32], or automatically [27] and [11]. This improves both the management and discovery of the information that is stored. It can also be used to automatically extract or recommend related resources subsequent to the initial sensemaking process [38], a process called topic tracking [22]. For our work we are only concerned with the manual creation of the metadata layer. The two most significant technologies that have been used in this regard are WC3’s RDF (Resource Description Framework) and ISO’s Topic Map standards. RDF is a fundamental technology at the heart of what is called the Semantic Web [6]. Metadata is encoded in RDF and this representation is then machine-processible. With RDF you can say anything about anything. Haystack [34] and E-person [1] both use RDF for creating associations at different levels. Annotea [32] utilizes RDF for the creation and management of annotations. A topic map [46] can represent information using topics (representing any concept, from people, individual files, events, and information resources), associations (which represent the relationships between them), and occurrences (which represent relationships between topics and information resources relevant to them). DeepaMehta [50] employs topic maps as its underlying associative data model 2.4 Categorization During this sensemaking process we label and categorise information resources. The labels that are employed can be viewed as metadata [24]. The associative data model can be viewed as a metadata layer and the labels form an integral part of that layer, a form a metadata within the metadata layer. 
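To make the constructs named above concrete, the fragment below hand-rolls the three topic map notions (topics, associations, occurrences) and keeps a tag label alongside them as part of the metadata layer. It is purely illustrative: it is not the ISO 13250 data model, an RDF API, or the engine used in any of the systems cited above, and every name in it is invented for the example.

# Hand-rolled illustration of topic map constructs plus a simple tag label;
# illustrative only, not a standard topic map or RDF library.
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    tags: list = field(default_factory=list)          # informal labels applied by users
    occurrences: list = field(default_factory=list)   # resources relevant to this topic

@dataclass
class Association:
    kind: str
    members: tuple                                     # the topics this relationship connects

sensemaking = Topic("Sensemaking", tags=["research"],
                    occurrences=["http://example.org/russell1993.pdf"])
tagging = Topic("Tagging")
topic_map = {
    "topics": [sensemaking, tagging],
    "associations": [Association("is-supported-by", (sensemaking, tagging))],
}
print(len(topic_map["topics"]), len(topic_map["associations"]))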
There are a number of mechanisms or approaches that can be used to categorise content, and they can be viewed along a continuum from formal subject-based classification (controlled vocabularies, taxonomies, thesauri, faceted classification and ontologies) to informal folksonomies or tagging systems to hybrid classification. A folksonomy is an Internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorise web resources. The labels are commonly known as tags and the labelling process is called tagging or social bookmarking. Tagging systems have proved to be very effective in a collaborative context, and they dramatically lower content categorisation costs because there is no complicated, hierarchically organised nomenclature to learn. Systems like Dogear [41] have shown that social bookmarking can also be useful in the enterprise. Tagging has also been applied in a more general role in Piggybank [27], Phlat [16] and SweetWiki [9]. 3 Contribution The broad aims of this research project are as follows: • To achieve a better theoretical understanding of sensemaking, resulting in the establishment of a model of how information seeking and sensemaking representation construction interact; this involves examining the synergies between searching and organisation cognitive models. • To prototype novel designs to support the user engaged in the sensemaking activity and evaluate them in the context of real-world workspaces. More specifically, this work builds on existing sensemaking work by [51], [49] and [29], information gathering work by Hunter-Gatherer [52] and DeepaMehta [50], and tagging concepts applied in systems like Dogear [41], Phlat [16] and Piggybank [27]. This work will examine how representation construction can be supported and will study how structural elements in resources can be exploited and used to tag information resources at a fine level of granularity, thus assisting the user with the sensemaking process. 198 4 Proof-of-Concept Prototype Table 1 provides an overview of the requirements and associated technologies that are being used to create the coalesce (from the word meaning to associate, combine or consolidate) proof-of-concept prototype to assist with sensemaking. Iteration #1 of the prototype is currently being designed and developed, with the effort focusing on interface and interaction design and associative technology implementation. Figure 1 provides an overview of the current interface, with each of the major elements numbered as appropriate. Area 1 shows suggested topics that are extracted from the web document that is currently being browsed. This is where the structural elements, organised bags of facts, are presented. Area 2 shows the consolidated sensemaking concepts and their relationship to one another. This will be determined as the user interacts with resources and iteratively gains a better understanding of the selected subject area. Area 3 is the information resource that is currently being viewed. The user has the option of selecting and tagging page elements at various levels of granularity. Finally, Area 4 illustrates a collection of snippets from the various resources that are browsed. The combination of structuring and tagging will allow the user to create a "sensemap", their cognitive understanding of the association between the concepts as they search.
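As a concrete but purely illustrative reading of Area 1 and of the fine-grained tagging described above, the Python fragment below harvests a page's structural elements (here simply its headings) as suggested topics and attaches user tags to an individual page fragment. This is an assumption about how such extraction could work, not the coalesce implementation, which is being built as a GWT/AJAX Rich Internet Application.

# Sketch of structural-element harvesting and fine-grained tagging; the page
# markup, topic names and tags are invented for the example.
from html.parser import HTMLParser

class HeadingHarvester(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.suggested_topics = []

    def handle_starttag(self, tag, attrs):
        self.in_heading = tag in ("h1", "h2", "h3")

    def handle_data(self, data):
        if self.in_heading and data.strip():
            self.suggested_topics.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = False

page = "<h1>Sensemaking</h1><p>...</p><h2>Information triage</h2><p>snippet text</p>"
harvester = HeadingHarvester()
harvester.feed(page)
tags = {"snippet text": ["information triage", "coalesce"]}   # fragment -> user labels
print(harvester.suggested_topics, tags)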
Search API’s – Google (web carnivore [35]). Associative component Topic map – facilitates sharing and reuse; Tagging Production component XML, XML Schema, XSL(T). Table 1. Prototype Technologies 5 Evaluation Kelly [36], Zelkowitz [56] and Marvin [40] provide extensive discussion on evaluation techniques for validating technology. Kelly, in relation to PIM (personal information management) applications, maintains that using one-size-fits-all evaluation methods and tools is likely to be a less than ideal strategy for studying something as seemingly idiosyncratic as PIM. According to Kelly, people should be observed in their natural environments, at home or at work as they engage in PIM behavior in realtime, recording both the process and the consequences of the behaviour. Laboratory studies of behaviours and tools should be leveraged to understand more about general PIM behavior. So, an iterative combination of observations of users in their natural environments and also in controlled laboratory sessions will enable the understanding of PIM behavior. These observations can then be used to inform the design and development of prototypes and tools to support the user engaged in PIM. She identifies the need for the development of new evaluation methods that will “produce valid, generalisable, sharing knowledge about how users go about PIM activities” [36]. To this end, Elsweiler [61] has developed a task-based evaluation methodology that can be used for PIM evaluations and it is this evaluation methodology that we propose using to evaluate the coalesce prototype. We propose conducting iterative evaluations with the prototype between September 2007 and April 2008 with a representative group of users. It is intended to use Google Notebook as a benchmark and perform a comparative analysis with it and the coalesce prototype, thus highlighting the benefits that prototype brings to the sensemaking process. The aim will be to determine if structural elements within resources can be used to aid the sensemaking process. 199 1 3 4 2 Figure 1. Iteration #1 Sensemaking Interface and Interaction References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] The ePerson Snippet Manager:a Semantic Web Application. <http://www.hpl.hp.com/techreports/2002/HPL-2002-328.pdf >. Bauer, D. (2002). Personal Information Geographies. Proc. of CHI2002, ACM Press, 538-539. Compendium Institute. <http://www.compendiuminstitute.org/> (July 2007). Baldonado, M.Q.W. and Winograd, T. (1997). SenseMaker: an information-exploration interface supporting the contextual evolution of a user's interests. Proc of SIGCHI ’97, ACM Press, 11-18. Building an Integrated Ontology within the SEWASIE Project. <http://www.dbgroup.unimo.it/prototipo/paper/demo-iscw2003.pdf> (July 2007). Berners-Lee, T., Hendler, J. and Lassila, O. (2001) The Semantic Web, Scientific America, 284, 5, 34-43. Describing and retrieving photos using RDF and HTTP. <http://www.w3.org/TR/photo-rdf/> (July 2007). Brause, R.W. and Ueberall, M. (2003). Internet-Based Intelligent Information Processing Systems (Adaptive Content Mapping for Internet Navigation), World Scientific, Singapore. Buffa, M. and Gandon, F. (2006). SweetWiki: Semantic Web Enabled Technoloies in Wiki. Proc. of WikiSym ’06, ACM Press (2006), 69-78. As We May Think, Atlantic Monthly. <http://www.idemployee.id.tue.nl/g.w.m.rauterberg/lecturenotes/bush-1945.pdf> (July 2007). Cai, Y., Dong, X.L., Halevy, A., Liu, J.M.and Madhavan, J. (2005). Personal information management with SEMEX. Proc. 
of SIGMOD 2005, ACM Press (2005), 921-923. Mining the Web to Suggest Concepts during Concept Map Construction. <http://cmc.ihmc.us/papers/cmc2004-284.pdf> (July 2007). Chirita, P.A., Gavriloaie, R., Ghita, S., Nejdl, W. and Paiu, R. (2005). Activity Based Metadata for Semantic Desktop Search. Proc of ESWC05, Springer Lecture Notes in Computer Science, 439-454. Clipmarks: Bite-size highlights of the web <http://www.clipmarks.com> (July 2007). 200 [15] Conklin, J., Selvin, A., Buckingham Shum, S. and Sierhuis, M. (2001)Facilitated Hypertext for Collective Sensemaking: 15 Years on from gIBIS. Proc. of Hypertext ’01, ACM Press , 123124. [16] Cutrell, E., Robbins, D., Dumais, S. and Sarin, R. (2006). Fast, Flexible Filtering with Phlat – Personal Search and Organisation Made Easy. Proc of CHI 2006, ACM Press, 261-270. [17] Davies, J. and Weeks, R. (2004). QuizRDF: Search Technology for the Semantic Web. Proc of HICSS ‘04, IEEE. [18] Denning, P.J. (2006). Infoglut. Communications of the ACM, 49, 7, 15-19. [19] Ding, L., Finin, T, Joshi, A., Pan, R., Scott Cost, R., Peng, Y, Reddivari, P.,Doshi, V. and Sachs, J. (2004). Swoogle: a search and metadata engine for the semantic web. Proc. of CIKM ‘04, ACM Press, 652-659. [20] Domingue, J. and Dzbor, M. (2004). Magpie: Supporting Browsing and Navigation on the Semantic Web, Proc of IUI ‘04, ACM Press (2004), 191-197. [21] Evans, M.P., Newman, R., Putnam, T. and Griffiths, D.J.M. (2005). Search Adaptations and the Challenges of the Web, IEEE Internet Computing, 9, 3, 19-26. [22] Fan, W., Wallace, L., Rich, S. and Zhang, Z. (2006) Tapping the Power of Text Mining. Communications of the ACM, 49, 9 (2006), 77-82. [23] Ferragina, P. and Gulli, A. (2005). A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering. Proc. of WWW2005, ACM Press, 801-810. [24] “Metadata? Thesauri? Taxonomies? Topic Maps!”. <http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html> (July 2007) [25] Google Notebook. <http://www.google.com/notebook> (July 2007). [26] Grigoris, A and Van Harmelen, F. (2004). A Semantic Web Primer, MIT Press, Cambridge, USA. [27] Huynh, D., Mazzocchi, S. and Karger, D. (2005). Piggy Bank: Experience The Semantic Web Within Your Web Browser, Proc. of ISWC 2005, Springer Lecture Notes in Computer Science, 413-430. [28] Halasz, F.G., Moran, T.P. and Trigg, R.H. (1986). Notecards in a nutshell. Proc. of SIGCHI/GI 1986, ACM Press, 45-52. [29] Jones, W., Munat, C., and Bruce, H. (2005). The Universal Labeler: Plan the Project and Let Your Information Follow, Proc. of ASIST 2005, 42. [30] Jones, W., Phuwanartnurak, A.J., Gill, R. and Bruce, H. (2005). Don’t Take My Folders Away! Organising Personal Information to Get Things Done. Proc. of CHI2005, ACM Press, 15051508. [31] SMORE – Semantic Markup, Ontology, and RDF Editor. <http://www.mindswap.org/papers/SMORE.pdf> (July 2007). [32] Kahan, J and Koivunen, M. (2001). Annotea: An Open RDF Infrastructure for Shared Web Annotations. Proc. of WWW10, ACM Press (2001), 623-632. [33] Karger, D.R. and Jones, W. Data Unification in Personal Information Management, Communications of the ACM, 49, 1 (2006), 77-82. [34] Karger, D.R. and and Quan, D. (2004). Haystack: A User Interface for Creating, Browsing, and Organizing Arbitrary Semistructured Information, Proc. of CHI2004, ACM Press, 777-778. [35] Kraft, R. and Stata, R. (2003). Finding Buying Guides with a Web Carnivore, Proc. of the First Conference on Latin American Web Congress, IEEE Computer Society, 84-92. [36] Kelly, D. (2006). 
Evaluating Personal Information Management Behaviors and Tools, Communications of the ACM, 49, 1, 84-86. [37] Googling from a Concept Map: Towards Automatic Concept-Map-Based Query Formulation. <http://cmc.ihmc.us/papers/cmc2004-225.pdf> (July 2007). [38] Martin, I and Jose, J.M. (2003). A Personalised Information Retrieval Tool. Proc. of SIGIR’03, ACM Press, 423-424. [39] Marshall, C. and Shipman, F. (1997). Spatial hypertext and the practice of information triage. Proc. of Hypertext ’97, ACM Press, 124-133. [40] Zelkowitz, M.V., Wallace, D.R. and Binkley, D.W. (2003) Experimental validation of new software technology, Software Engineering and Knowledge Engineering (Lecture notes on empirical software engineering), World Scientific Publishing Co, 229-263. 201 [41] Millen, D.R., Feinberg, J. and Kerr, B. (2006). Dogear: Social Bookmarking in the Enterprise. Proc. of CHI2006, ACM Press, 111-120. [42] Mind Maps. <http://en.wikipedia.org/wiki/Mind_map> (July 2007). [43] Nakakoji, K., Yamamoto, Y., Takada, S. and Reeves, B.N. (2000). Two-dimensional spatial positioning as a means for reflection in design. Proc. of Conference on Designing interactive systems: processes, practices, methods, and techniques (DIS ’00), ACM Press, 145-154. [44] Net Snippets. <http://www.netsnippets.com> (July 2007). [45] Nakakoji, K., Yamamoto, Y., Takada, S. and Reeves, B.N. (2003). ScentTrails: Integrating Browsing and Searching on the Web, ACM Transactions on Computer-Human Interaction (TOCHI), 10, 3, 177-197. [46] The TAO of Topic Maps; finding the way in the age of Infoglut. <http://www.idealliance.org/papers/dx_xmle03/papers/02-00-04/02-00-04.pdf> (July 2007). [47] Information Foraging. <http://www2.parc.com/istl/groups/uir/publications/items/UIR-1999-05Pirolli-Report-InfoForaging.pdf> (July 2007). [48] Preece, J.,Rogers, Y., and Sharp, H. (2007) Interaction Design: Beyond Human-Computer Interaction (2nd Edition), Wiley Publishing, USA. [49] Qu, Y. and Furnas, W. (2005). Sources of Structure in Sensemaking. Proc. of CHI2005, ACM Press, 1989-1992. [50] DeepaMehta-A Semantic Desktop. <http://www.deepamehta.de/ISWC-2005/deepamehta-paperiswc2005.pdf> (July 2007). [51] Russell, D, Stefik, M., Pirolli, P., and Card, S. (1993). The Cost Structure of Sensemaking. Proc of InterCHI ‘93, ACM Press, 269-276. [52] Schraefel, M.C., Zhu, Y., Modjeska, D., Wigdor, D. and Zhao, S. (2002). Hunter Gatherer: Interaction Support for the Creation and Management of Within-Web-Page Collections. Proc. of WWW2002, ACM Press, 172-181. [63] Selvin, A.M. and Buckingham Shum, S.J. (2005). Hypermedia as a Productivity Tool for Doctoral Research. New Review of Hypermedia and Multimedia, 11, 1, 91-101. [54] Teevan, J., Alvarado, C., Ackerman, M.S. and Karger, D.R. (2004) The perfect search engine is not enough: A study of orienteering behaviour in directed search, Proc. of CHI2004, ACM Press, 415-422. [55] Uren, V., Buckingham-Shum, S., Bachler, M. and Li, G. (2006). Sensemaking Tools for Understanding Research Literatures: Design, Implementation and User Evaluation. Int. Journal Human Computer Studies, 64, 5, 420-445. [56] Zelkowitz, M.V. and Wallace, D.R. (1998). Experimental Models for Validating Technology, IEEE Computer, 31, 5, 23-31. [57] Zellweger, P.T., Mackinlay, J.D., Good, L., Stefik, M. and Baudisch, P. (2003). City Lights: Contextual Views in Minimal Space. Proc. of CHI2003, ACM Press, 838-839. [58] OntoSearch: An Ontology Search Engine. <http://www.csd.abdn.ac.uk/~yzhang/AI-2004.pdf> (July 2007). [59] Klein, G. et al. (2006). 
Making Sense of Sensemaking 2: A Macrocognitive Model, IEEE Intelligent Systems, 21, 5, 1541-1672. [60] Marchionini, G. (2006). Exploratory search: from finding to understanding, Communications of the ACM, 49, 4, 41-46. [61] Elsweiler, D. and Ruthven, I. (2007). Towards task-based personal information management evaluations. Proc. of SIGIR '07, ACM Press, 23-30. 202 A Pedagogical-based Framework for the Delivery of Educational Material to Ubiquitous Devices Authors: C O. Nualláin*, Dr S Redfern Department of Information Technology, National University of Ireland, Galway, Ireland [email protected], [email protected] What we are addressing in this paper is the historical failure to deliver good, flexible e-learning based on pedagogically sound, intuitive, profile-based learning systems with assessment and reporting tools which, based on the profile, adapt to the context and style of the user. The main issues here are the ability of users to get what they asked for in terms of a learning environment, and supporting research and data to illustrate that the system works and would be of value, if deployed, to augment lectures and tutorials. Currently, from data collected in questionnaires, we can say that how students learn is changing: they need to be challenged rather than basing their learning on memorising. The country needs to deliver home-grown skilled graduates in the areas of Engineering and Information Technology, and we should not cut corners in getting there. With that in mind, it has been acknowledged by government bodies like Forfás that we must get students involved in technology and get them experimenting with material at an earlier age in the areas of electronics and computer programming. Ultimately we need to build up students' curiosity about uses of technology and fortify that with problem-solving skills. In addition to this, there are new advances in technology and software, such as Web 2.0, which are making access to technology easier. The additional use of modalities like web cams, podcasts, live video, chat, audio and SMS messaging is making it easier for students to collaborate and share thoughts and material. This process is helping students who use the environment to perform better. Gaming environments like LEGO, ROBOT WARS and ROBOCODE have been very successful in capturing student attention and providing an environment which allows users to learn several skills while having fun in a team-based event. Many have not taken up the challenges and opportunities which the new technologies offer in terms of educational potential and return. This has been a problem throughout the history of e-learning and the application of learning technology, where technology was misused and misunderstood in its potential to deliver good quality educational material which can be made available to a wide audience easily and preferably seamlessly. Through careful analysis of the application of pedagogical strategies and learning styles, we feel that part of the current problem, and of the problems of the past 20 years, may be overcome, and we feel we have turned part of the corner on that problem with our system, which takes on board tried and tested pedagogical practices established by Socrates, Pask, Bandura, Peppard and Piaget, to name but a few. These are the forefathers of constructivism and social constructivism. We also have established that it is not just the taking on board of these theories but how they are applied.
Most of our strategy is based around 203 collaboration, problem solving, testing, continuous challenge, assessment, feedback, fun, and team building. Problem Description The goal of the research can be outlined or described by the following paragraph Modern life is conducted at an increasingly fast pace. The effective use of time is therefore a must. Through the use of wireless mobile devices, time spent in transit (e.g. in a car, train, plane or bus) need not be wasted. Conversely, it provides a valuable opportunity to study interactive educational material. Such material must be pedagogically sound and results of trials carried out as part of the research here have indicated that it is an effective learning tool. We aim to make it effective in terms of High-Order Learning, Critical Thinking, Problem Solving and through a medium with which problem solving skills may be instilled. The framework takes great care in its research of what effective learning and teaching is and how it can possibly be best achieved with the result being a number of frameworks for optimising same. Much time was also spent researching how to affectively assess technology for its effectiveness and opportunity to be involved in new delivery of curriculum. The aim with this body of research is to strive for active engagement through using well constructed material with good instructional design and the use of new devices and file formats which are coming of age which show promise in delivering a new type of media on smaller devices to a wider audience i.e. life long learners. This takes advantage of the student profile on which it is based to deliver the most appropriate material to the user at that time on what ever device is being used at that time. In itself will allow students to join in dialogue with class members and hence allow students the opportunity to join in discussion which in turn allows them to feel part of the group and very much less isolated – isolation which has in the past resulted in students dropping out as in the case of Open University. This framework allows the student, whatever their learning style, to become immersed in the material and also, through the various support aspects, discuss and ultimately learn new skills based on the material. Currently our schools and colleges are not able to keep up with the changing needs of the audience and the specific learning needs of the audience. By this we mean different learning styles and modes of delivery which suit the users’ context and ultimately learning requirement which we consider to be aided by more direct contact with the curriculum through a micro world like environment. Here users can discuss material with fellow students and tutors while solving problems in an active way which are driven by a moderator. The moderator assesses all levels of student engagement so as to be able to identify whether the student is following or not and in either case what they can do to 204 enhance their learning potential. It also provides us with a way to learn more about the users and log them in a profile. This profile will greatly help identify failings in the material and delivery mechanisms or even assessment methods. To that end we employ several non traditional assessment methods which we use to augment the assessment process like body language, mannerisms, sentence openers and active verbs. 
We have been able to prove that these indirect assessment methods can be used very effectively, if captured, as indicators to not only if the user is on board but if they are active learners and even high order learners. The proving of this has led to a large number of questionnaires and the capturing of all audio and video, feedback, interview data, punctuality records and attendance records. This took place in several programming competitions and labs over a two and a half year process. The initial competition tried to use Blackboard and then a purpose built e-learning platform developed in-house which did not take off for several reasons, following which we progressed, having learned from the previous exercises, into a more active class room environment which provided the template on which we wanted to build an online environment. The classroom environment we organised, after much adapting and changing of team sizes and interaction rules, worked well and it is this we wanted to use as our template for an online version which makes up the software abstract that was developed as part of the research and has been used to test and prove the suitability of technology in education and the level of engagement and active learning possible with the use of such media and modalities. Through the evaluation of the data we have been able to prove to our satisfaction that collaboration and more importantly collaboration with active mentoring is far more effective than co-operation. We are also able to prove that good feedback is essential and the timeliness of such feedback critical. It is important to be able to learn as much about the users as possible which includes back ground information; ultimately anything that is or can impact on their learning so we can put steps in place to lessen the impact and help the students overcome the problem. A vast amount of data was collected and analysed using several methods, e.g. Chi squared, T Tests, Graphing hard data in pi charts and line charts. The analysis methods were chosen on the basis of the type of data we had as several other methods were examined to analyse our data but were deemed unsuitable. That said the methods selected are probably the most standard methods used in all statistical analysis of this kind. Through the data collected we have made findings which contribute to proving that we have achieved our goals in the areas of assessment, collaboration, teamwork and team sizes, e-moderator, moderators, modalities, motivation and engagement, effective learning, pedagogy, use of technology, gaming as a metaphor for learning , profiling, personalisation and feedback mechanisms. 205