Digital Convergence in a Knowledge Society
The 7th
Information Technology and
Telecommunication Conference
IT&T 2007
Institute of Technology Blanchardstown
Dublin, Ireland
25-26 October 2007
Gabriel-Miro Muntean
Markus Hofmann
Brian Nolan
(Eds)
IT&T 2007 General Chair’s Letter
As the General Chair of the 2007 Information Technology and Telecommunications
(IT&T) Conference, it gives me great pleasure to introduce you to this year’s
conference. This IT&T Conference – held on the 25th and 26th of October 2007 at the
Institute of Technology Blanchardstown, Dublin, Ireland – has as its major focus
'Digital Convergence in a Knowledge Society' and welcomed papers on various
themes such as wired and wireless networks, next generation web, games and
entertainment, health informatics, and security and forensics.
We have collected over 20 papers from academics across Ireland and the UK within
this peer-reviewed book of proceedings. A doctoral consortium session will also be
held as part of the conference involving researchers soon to complete their PhD
studies. These sessions will be preceded by plenary talks given by ICT experts from
Irish academia and industry.
I would like to take this opportunity to thank our sponsors for their generous support.
These include: IBM, Ericsson, the Council of Directors of the Institutes of
Technology, the Institution of Engineering and Technology, IEEE – CC Ireland
Chapter and IRCSET. Special thanks go to the IT&T 2007 Conference organisational
team (consisting of academics from across the University and Institute of Technology
sectors), the Technical Programme Committee, the Technical Chairs and also the
Financial Chair for the marvelous work, high standards and excellent results achieved.
I also extend a very warm welcome to all the attendees of this IT&T 2007 conference at
the Institute of Technology Blanchardstown, Dublin.
The conference website is: http://www.ittconference.ie
I wish you a wonderful conference!
Dr. Brian Nolan
General Chair of IT&T 2007
Head of the Department of Informatics
School of Informatics and Engineering
Institute of Technology Blanchardstown
Blanchardstown Road North
Dublin 15, Ireland
Technical Programme Committee Chairs’ Letter
Dear Colleagues,
As Technical Programme Chairs, we would like to welcome you to the Seventh
Information Technology and Telecommunications Conference (IT&T 2007) hosted
by the Institute of Technology Blanchardstown, Dublin, Ireland.
IT&T is an annual international conference which not only publishes research in the
areas of information technologies and telecommunications, but also brings together
researchers, developers and practitioners from the academic and industrial
environments, enabling research interaction and collaboration.
The focus of the seventh IT&T is “Digital Convergence in a Knowledge Society”.
We welcomed research papers with topics in e-learning technologies, Web 2.0 and
next generation web, ubiquitous and distributed computing, adaptive computing,
health informatics, wired and wireless networks, sensor networks, network
management, quality of experience and quality of service, digital signal processing,
speech and language processing, games and entertainment, computer vision, security
and forensics and open source developments.
All submitted papers were peer-reviewed by the Technical Programme Committee
members and we would like to express our sincere gratitude to all of them for their
help in the reviewing process.
After the review process, twenty-two papers were accepted and will be presented
during six technical sessions spanning the two days of the conference. A doctoral
consortium session will also be held with researchers who are nearing completion of
their PhDs. These sessions will be preceded by plenary talks given by ICT experts
from Irish academia and industry.
We hope you will have a very interesting and enjoyable conference.
Dr. Gabriel-Miro Muntean, Dublin City University, Ireland
Nick Timmons, Letterkenny Institute of Technology, Ireland
IT&T 2007 Chairs and Committees
Conference General Chair
Brian Nolan, Institute of Technology Blanchardstown
Technical Programme Committee Chairs
Gabriel-Miro Muntean, Dublin City University
Nick Timmons, Letterkenny Institute of Technology
Doctoral Symposium Committee Chair
Declan O'Sullivan, Trinity College Dublin
Doctoral Symposium Technical Committee
Brian Nolan, Institute of Technology Blanchardstown
Cristina Hava Muntean, National College of Ireland
Dave Lewis, Trinity College Dublin
John Keeney, Trinity College Dublin
Matt Smith, Institute of Technology Blanchardstown
Patronage & Sponsor Chair
Dave Denieffe, Institute of Technology Carlow
Proceedings Editors
Gabriel-Miro Muntean, Dublin City University
Markus Hofmann, Institute of Technology Blanchardstown
Brian Nolan, Institute of Technology Blanchardstown
Organising Committee
Brian Nolan, Institute of Technology Blanchardstown
Dave Denieffe, Institute of Technology Carlow
David Tracey, Salix
Declan O’Sullivan, Trinity College Dublin
Enda Fallon, Athlone Institute of Technology
Gabriel-Miro Muntean, Dublin City University
Jeanne Stynes, Cork Institute of Technology
John Murphy, University College Dublin
Mairead Murphy, Institute of Technology Blanchardstown
Markus Hofmann, Institute of Technology Blanchardstown
Matt Smith, Institute of Technology Blanchardstown
Nick Timmons, Letterkenny Institute of Technology
Technical Programme Committee
Anthony Keane, Institute of Technology Blanchardstown
Arnold Hensman, Institute of Technology Blanchardstown
Brian Crean, Cork Institute of Technology
Brian Nolan, Institute of Technology Blanchardstown
Cormac J. Sreenan, University College Cork
Cristina Hava Muntean, National College of Ireland
Dave Denieffe, Institute of Technology Carlow
Dave Lewis, Trinity College Dublin
David Tracey, Salix
Declan O'Sullivan, Trinity College Dublin
Dirk Pesch, Cork Institute of Technology
Enda Fallon, Athlone Institute of Technology
Gabriel-Miro Muntean, Dublin City University
Hugh McCabe, Institute of Technology Blanchardstown
Ian Pitt, University College Cork
Jeanne Stynes, Cork Institute of Technology
Jim Clarke, TSSG, Waterford Institute of Technology
Jim Morrison, Letterkenny Institute of Technology
John Murphy, University College Dublin
Kieran Delaney, Cork Institute of Technology
Larry McNutt, Institute of Technology Blanchardstown
Liam Kilmartin, National University of Ireland Galway
Mark Davis, Dublin Institute of Technology
Mark Riordan, Institute of Art Design and Technology Dun Laoghaire
Markus Hofmann, Institute of Technology Blanchardstown
Martin McGinnity, University of Ulster Belfast
Matt Smith, Institute of Technology Blanchardstown
Michael Loftus, Cork Institute of Technology
Nick Timmons, Letterkenny Institute of Technology
Nigel Whyte, Institute of Technology Carlow
Paddy Nixon, University College Dublin
Pat Coman, Institute of Technology Tallaght
Paul Walsh, Cork Institute of Technology
Richard Gallery, Institute of Technology Blanchardstown
Ronan Flynn, Athlone Institute of Technology
Sean McGrath, University of Limerick
Stephen Sheridan, Institute of Technology Blanchardstown
Sven van der Meer, TSSG, Waterford Institute of Technology
Table of Contents
Session 1: Trust & Security
Chaired by: John Murphy, University College Dublin

Trust Management In Online Social Networks ... 3
Bo Fu, Declan O'Sullivan

Irish Legislation regarding Computer Crime ... 13
Anthony Keane

Distributed Computing for Massively Multiplayer Online Games ... 21
Malachy O'Doherty, Jonathan Campbell

A Comparative Analysis of Steganographic Tools ... 29
Abbas Cheddad, Joan Condell, Kevin Curran, Paul McKevitt

Session 2: Computing Systems
Chaired by: Matt Smith, Institute of Technology Blanchardstown

A Review of Skin Detection Techniques for Objectionable Images ... 40
Wayne Kelly, Andrew Donnellan, Derek Molloy

Optical Reading and Playing of Sound Signals from Vinyl Records ... 50
Arnold Hensman

Optimisation and Control of IEEE 1500 Wrappers and User Defined TAMs ... 60
Michael Higgins, Ciaran MacNamee, Brendan Mullane

Session 3: Applications
Chaired by: Stephen Sheridan, Institute of Technology Blanchardstown

MemoryLane: An Intelligent Mobile Companion for Elderly Users ... 72
Sheila Mc Carthy, Paul Mc Kevitt, Mike McTear, Heather Sayers

Using Scaffolded Learning for Developing Higher Order Thinking Skills ... 83
Cristina Hava Muntean, John Lally

Electronic Monitoring of Nutritional Components ... 91
Zbigniew Fratczak, Gabriel-Miro Muntean, Kevin Collins

A Web2.0 & Multimedia solution for digital music ... 98
Helen Sheridan, Margaret Lonergan

Session 4: Algorithms
Chaired by: Brian Nolan, Institute of Technology Blanchardstown

Adaptive ItswTCM for High Speed Cable Networks ... 108
Mary Looney, Susan Rea, Oliver Gough, Dirk Pesch

Distributed and Tree-based Prefetching Scheme for Random Seek Support in P2P Streaming ... 116
Changqiao Xu, Enda Fallon, Paul Jacob, Yuansong Qiao, Austin Hanley

Parsing Student Text using Role and Reference Grammar ... 122
Elizabeth Guest

A Parallel Implementation of Differential Evolution for Weight Adaptation in Artificial Neural Networks ... 130
Stephen Sheridan

Session 5a: Wired & Wireless
Chaired by: David Tracey, Salix

The Effects of Contention between stations on Video Streaming Applications over Wireless Local Area Networks - an experimental approach ... 139
Nicola Cranley, Tanmoy Debnath, Mark Davis

An Investigation of the Effect of Timeout Parameters on SCTP Performance ... 147
Sheila Fallon, Paul Jacob, Yuansong Qiao, Enda Fallon

Performance Analysis Of Multi-hop Networks ... 156
Xiaoguang Li, Robert Stewart, Sean Murphy, Sumit Roy

Session 5b: Wired & Wireless
Chaired by: Nick Timmons, Letterkenny Institute of Technology

EmNets - Embedded Networked Sensing ... 167
Ken Murray, Dirk Pesch, Zheng Liu, Cormac Sreenan

Dedicated Networking Solutions for Container Tracking System ... 175
Daniel Rogoz, Fergus O'Reilly, Kieran Delaney

Handover Strategies in Multi-homed Body Sensor Networks ... 183
Yuansong Qiao, Xinyu Yan, Enda Fallon, Austin Hanley

Session 6: Doctoral Symposium
Chaired by: Declan O'Sullivan, Trinity College Dublin

Hierarchical Policy-Based Autonomic Replication ... 192
Cormac Doherty, Neil Hurley

Sensemaking for Topic Comprehension ... 196
Brendan Ryder, Terry Anderson

A Pedagogical-based Framework for the Delivery of Educational Material to Ubiquitous Devices ... 203
Caoimhin O'Nuallain, Sam Redfern
ITT07 Author Index

A
Anderson, Terry ... 196

C
Campbell, Jonathan ... 21
Casey, Kevin ... 50
Cheddad, Abbas ... 29
Collins, Kevin ... 91
Condell, Joan ... 29
Cranley, Nicola ... 139
Curran, Kevin ... 29

D
Davis, Mark ... 139
Debnath, Tanmoy ... 139
Delaney, Kieran ... 175
Doherty, Cormac ... 192
Donnellan, Andrew ... 40

F
Fallon, Enda ... 116, 147, 156, 183
Fallon, Sheila ... 147
Fratczak, Zbigniew ... 91
Fu, Bo ... 3

G
Gough, Oliver ... 108
Guest, Elizabeth ... 122

H
Hanley, Austin ... 116, 147, 156, 183
Hay, Gareth ... 183
Hensman, Arnold ... 50
Higgins, Michael ... 60
Hurley, Neil ... 192

J
Jacob, Paul ... 116, 147

K
Keane, Anthony ... 13
Kearney, Kenneth ... 183
Kelly, Wayne ... 40

L
Laffey, Dennis ... 175
Lally, John ... 83
Li, Xiaoguang ... 156
Liu, Zheng ... 167
Lonergan, Margaret ... 98
Looney, Mary ... 108

M
MacNamee, Ciaran ... 60
Matthews, Adrian ... 183
McCarthy, Sheila ... 72
McKevitt, Paul ... 29, 72
McTear, Mike ... 72
Molloy, Derek ... 40
Mullane, Brendan ... 60
Muntean, Cristina Hava ... 83
Muntean, Gabriel-Miro ... 91
Murphy, Sean ... 156
Murray, Ken ... 167
Muthukumaran, Panneer ... 167

O
O'Doherty, Malachy ... 21
O'Flynn, Brendan ... 175
O'Nuallain, Caoimhin ... 203
O'Reilly, Fergus ... 175
O'Sullivan, Declan ... 3

P
Pesch, Dirk ... 108, 167

Q
Qiao, Yuansong ... 116, 147, 183

R
Rea, Susan ... 108
Redfern, Sam ... 203
Rogoz, Daniel ... 175
Roy, Sumit ... 156
Ryder, Brendan ... 196

S
Sayers, Heather ... 72
Sheridan, Helen ... 98
Sheridan, Stephen ... 130
Song, Weiping ... 167
Spinar, Rostislav ... 167
Sreenan, Cormac ... 167
Stewart, Robert ... 156

T
Ta, Duong ... 167

X
Xu, Changqiao ... 116

Y
Yan, Xinyu ... 183
Session 1
Trust & Security
Trust Management in Online Social Networks
Bo Fu, Declan O’Sullivan
School of Computer Science & Statistics,
Trinity College, Dublin
[email protected], [email protected]
Abstract
The concept of trust has been studied significantly by researchers in philosophy, psychology and
sociology; research in these fields shows that trust is a subjective view that varies greatly among
people, situations and environments. This very subjective characteristic of trust, however, has been
largely overlooked within trust management used in the online social network (OSN) scenario. To
date, trust management mechanisms in OSNs have been limited to access control methods that
take a very simplified view of trust and ignore various fundamental characteristics of trust. Hence
they fail to provide a personalized manner to manage trust but rather provide a “one size fits all”
style of management. In this paper we present findings which indicate that trust management for
OSNs needs to be modified and enriched and outline the main issues that are being addressed in
our current implementation work.
Keywords: Trust, Online Social Networks, multi-faceted, personalisation, ratings.
1 Introduction
The concept of social networking dates back to the 1930s, when Vannevar Bush first introduced his idea
about “memex” [Vannevar, 1996], a “device in which an individual stores all his books, records, and
communications, and which is mechanized so that it may be consulted with exceeding speed and
flexibility”, and predicted that “wholly new forms of encyclopedias will appear, ready made with a
mesh of associative trails running through them, ready to be dropped into the memex and there
amplified.”
Since the launch of the first online social networking website USENET [USENET] in 1979, we have seen
a dramatic increase in the number of online social networks, such as Bebo [Bebo], Facebook [Facebook] and
MySpace [MySpace] to name just a few. These OSNs allow users to discover, extend, manage, and
leverage their personal as well as professional networks online.
OSNs serve various purposes, mostly centred around the following topics: business, education,
socializing and entertainment.
Business oriented OSNs help registered individuals make connections, build business contacts and
maintain professional networks for potential career opportunities; as well as allowing organizations to
advertise their products and services. Examples of such OSNs are LinkedIn [LinkedIn], Ecademy
[Ecademy], Doostang [Doostang], XING [XING] and Plaxo [Plaxo].
Educational OSNs usually focus on groups of people who wish to gain knowledge in the same field
through the forms of blogs and link sharing with a great variety of subject matters. Examples of such
networks can be found in many institutions, where intranets are set up for specific schools, faculties,
or classes.
Socializing OSNs aim to provide users with a virtual environment in which online communities can
exchange news, keep in touch with friends and family, and make new connections. Usually, various
features are implemented which allow users to keep journals, post comments and news, upload
pictures and videos as well as send each other messages. Such OSNs tend to centre around themes,
such as music, movies, personal life, etc., and are designed to be either user-centric or topic-centric,
where online communities can focus on developing profiles all about oneself or developing particular
hobbies. Several examples of this type of OSNs are 43 Things [43 Things], CarDomain [CarDomain],
Friendster [Friendster], Hi5 [Hi5], and MOG [MOG].
Closely associated with socializing OSNs are entertaining OSNs, where focus on personal aspects of
the online communities is less visible, compared to the entertainment attributes these communities
may offer to a network. For example, on YouTube [YouTube], the focus is shifted away from personal
profiles, and the video sharing feature is greatly valued. Since its launch in early 2005, YouTube has
quickly become the home of video clip entertainment; it now accounts for 29% of the U.S. multimedia
entertainment market [USA Today, 2006].
Based on their registration requirements, OSNs can be grouped into two main categories: sites that are
open to anyone and sites that are invitation only.
Anyone is welcome to set up an account and put up a representation of oneself in open invite OSNs,
such as Graduates.com [Graduates] and Friends Reunited [Friends Reunited]. However, in order to
join some sites, you need to be invited by a trusted member; aSmallWorld [aSmallWorld] is an
example of such OSNs, where high profile celebrities are among its registered members.
The predominant business model for most OSNs is advertising. It is free for anyone to join, and
revenue is made by selling online advertising on these websites. However, a number of OSNs charge
their members for the information or services they provide, such as LinkedIn where employers can
advertise their vacancies looking for suitable candidates.
The remainder of this paper is organized as follows. We first examine the state of the art in trust
management mechanisms deployed in OSNs in section 2, which has led to our belief that very little
attention is being paid to personalized trust management in OSNs. Next, in order to explore this, we
designed an online questionnaire to determine if our initial belief was well founded. The design and
execution of the survey is then presented in section 3, followed by the findings in section 4. These
findings have helped us identify issues, discussed in section 5, that users have with current trust
management in OSNs. And finally, these identified issues have provided a backdrop for the
prototyping of our solution that is currently underway and briefly described in section 6.
2 State of the Art
2.1 Trust – Definitions and Characteristics
Trust is an elusive notion that is hard to define; the term "trust" stands for a diversity of concepts
depending on the person you approach. To some, trust is predictability, where evidence of one’s
reputation suggests a most-likely outcome; to others, trust is dependability, where one truly believes in
and depends upon another; yet, to many, trust is simply letting others make decisions for you and
knowing that they would act in your best interest.
Several notable definitions of trust are presented below.
Grandison and Sloman [Grandison & Sloman, 2000] defined trust as “the firm belief in the
competence of an entity to act dependably, securely, and reliably within a specified context.”
Mui et al. [Mui et al., 2002] defined trust as “a subjective expectation an agent has about another’s
future behaviour based on the history of their encounters.”
Olmedilla et al. [Olmedilla et al., 2005] stated that “Trust of a party A to a party B for a service X is
the measurable belief of A in that B behaves dependably for a specified period within a specified
context (in relation to service X).”
In summary, trust cannot be defined by a single consensus; there is a wide and varied range of
synonyms for trust, and the answer to "what is trust" cannot be easily provided. Hence, significant
challenges are presented for modelling trust in the semantic Web. It is therefore important for us to
concentrate on the core characteristics of trust [Golbeck, 2005; Dey, 2001], which remain true
regardless of how trust is modelled.
Trust is asymmetric. Between two parties, trust levels are not necessarily identical. A may trust B 100%;
however, B may not feel the same way about A; B may only trust A 50% in return, for example.
Arguably, trust can be transitive. Let us say that A and B know each other very well and are best
friends, and B has a friend named C whom A has not met. Since A knows B well and trusts B's
choices in making friends, A may trust C to a certain extent even though they have never met. Now let
us say C has a friend named D whom neither A nor B knows well; A could find it hard to trust D.
Hence, it is reasonable to state that as the link between nodes grows longer, the trust level decreases.
However, others [Grandison, 2003; Abdul-Rahman, 2004] disagree and argue that trust is non-transitive;
[Zimmermann, 1994] asks: if I have a good friend whom I trust dearly, who also trusts
that the president would not lie, does that mean that I would therefore trust that the president would
not lie either?
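To make the "longer link, lower trust" intuition above concrete, the following minimal sketch (an illustration only, not the paper's model; the 0-1 trust values and the multiplicative combination rule are assumptions) propagates direct trust ratings along a friendship chain and shows the inferred level falling as the path grows.

```python
# Illustrative sketch only: multiplicative propagation of trust along a chain.
# The 0-1 scale and the combination rule are assumptions, not the paper's model.

def propagate_trust(chain_trust):
    """Combine direct trust values (0.0-1.0) along a path A -> B -> C -> ...

    Multiplying the hops means every extra link can only keep or lower the
    inferred trust, mirroring the argument that trust decays with path length.
    """
    inferred = 1.0
    for hop in chain_trust:
        inferred *= hop
    return inferred

# A trusts B a lot, B trusts C a lot, C trusts D moderately.
for chain in ([0.9], [0.9, 0.8], [0.9, 0.8, 0.6]):
    print(chain, "->", round(propagate_trust(chain), 3))
# [0.9] -> 0.9, [0.9, 0.8] -> 0.72, [0.9, 0.8, 0.6] -> 0.432
```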
Trust is personalised. Trust is a subjective point of view; two parties can have very different opinions
about the trustworthiness of the same person. For example, a nation may be divided into groups who
strongly support the political party in charge and groups who would strongly disagree.
Trust is context-dependent. Trust is closely associated with overall contexts; in other words, trust is
context-specific [Gray, 2006]. One may trust another enough to lend that person a pencil, but may find
the person hard to trust with a laptop for instance.
2.2 Current Trust Mechanisms in Online Social Networks
Current trust mechanisms used in OSNs have been limited to simple access control mechanisms,
where authorization is required to contact, to write on, and to read all or part of a user’s profile, given
that blogging and commenting features are enabled. Communities in OSNs are usually categorized
into groups, i.e., one’s family, friends, neighbours, etc., with all or limited access to one’s photos,
blogs and other resources presented.
In Bebo, for instance, a user can acquire a URL for his/her profile, which is then viewable by anyone with
a browser, or he/she can set the profile to "private", which means that only the friends connected to this
user are authorized to view the profile and everything presented in it.
In Yahoo! 360° [Yahoo!360], the access control mechanism is refined by letting users set their profiles
and blogs viewable to the general public, their friends, friends of their friends or just the users
themselves. The site allows users the freedom to create specific friend categories, such as friends in
work, friends met while travelling, etc. Users can then control whether to be contacted via email or
messenger by anyone in the Yahoo! 360° network, people whom one is connected to, or only those in
the defined categories.
In Facebook, the privacy settings of a profile are further refined by allowing the owner of a profile to grant
different levels of access to sections of a profile such as contact information, groups, wall, photos,
posted items, online status, and status updates. Also, users can decide whether they would like the
search engine to list their profiles in search results; as well as whether they would like to notify friends
with their latest activities. Finally, a user can select which parts of the profile are to be displayed to the
person who tries to contact him/her through a poke, message, or friend request.
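The mechanisms described above all reduce to checking which group a viewer belongs to. The following minimal sketch (an illustration, not any site's actual API; the group names and profile sections are assumptions) shows this group-based, "one size fits all" style of access control.

```python
# Minimal sketch of group-based access control, as used by the OSNs described
# above. Group names and profile sections are illustrative assumptions.

PROFILE_POLICY = {
    "photos": {"friends", "friends_of_friends"},
    "wall": {"friends"},
    "contact_info": {"friends"},
    "basic_info": {"anyone", "friends", "friends_of_friends"},
}

def can_view(section, viewer_group):
    """Grant access purely on the viewer's group membership.

    Every member of an allowed group gets identical access, regardless of
    how much the owner actually trusts that individual.
    """
    return viewer_group in PROFILE_POLICY.get(section, set())

print(can_view("photos", "friends"))                    # True
print(can_view("contact_info", "friends_of_friends"))   # False
```

The design choice being critiqued in the next paragraph is visible here: the check knows only the group, not the owner's degree of trust in the particular viewer.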
Among many notable OSNs, we have found that controlling access seems to be the only way to
express trust, where users group their connections into categories and grant all or limited access to
these specified categories. Studies [Ralph, Alessandro et al. 2005] of Facebook have shown that many
people who are connected to a person are not necessarily “friends” as such, but simply people whom
this person does not dislike. Hence, there is a great variety of the levels of trust among these connected
"friends" of a person. However, this variety of trust levels has not been captured in OSNs, and users
cannot annotate their varying trust in a person, nor can they personalise that trust depending on the
situation. In some cases, we want private information to be known only by a small group of people and
not by random strangers. Such information may be where you live, how much money you make, etc.;
in an OSN environment, you probably would dislike the idea of letting random strangers read
comments left by your friends detailing a trip you are about to take, for safety reasons. In other
instances, we are willing to reveal personal information to anonymous strangers, but not to those who
know us better. For example, if desired, one can state one's sexuality on a profile page and broadcast
that to the world; however, one may not be ready to reveal that very piece of information to the family
and friends whom one trusts most.
2.3 Related Work
Much research has been carried out in the field of computer science in relation to trust management;
various algorithms, systems and models have been produced, such as PGP [Zimmerman, 1995],
REFEREE [Chu et al, 1997], SULTAN [Grandison et al, 2001], FOAF [Dumbill et al, 2002],
TRELLIS [Gil et al, 2002], Jøsang’s trust model [Jøsang A., 1996], Marsh’s trust model [Marsh,
1994] and many more. In particular, a multi-faceted model of trust that is personalisable and
specialisable [Quinn, 2006] has been designed in the Knowledge and Data Engineering Group
(KDEG) [KDEG] from the Computer Science Department in Trinity College Dublin.
While reviewing trust management systems in computer science, Quinn found that current methods
“tend to use a single synonym, or definition in the use of trust… such approaches can only provide a
generic, non-personalised trust management solution”. To address this problem of the lack of potential
for personalizing trust management, a multi-faceted model of trust that is both personalisable and
specialisable was proposed, implemented and evaluated. In the proposed model, trust is divided into
a concrete concept and an abstract concept, each with attributes of its own: the former includes
credibility, honesty, reliability, reputation and competency attributes, and the latter belief, faith
and confidence attributes. Ratings are then given to each of the eight attributes, and trust is calculated
as the weighted average of these ratings.
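As a rough illustration of the calculation just described, the sketch below computes an overall trust score as a weighted average of the eight attribute ratings; the 0-10 scale, the example weights and the function name are assumptions for illustration, not Quinn's actual implementation.

```python
# Illustrative sketch of trust as a weighted average of the eight attributes.
# The 0-10 scale and the example weights are assumptions, not Quinn's values.

ATTRIBUTES = ["credibility", "honesty", "reliability", "reputation",
              "competency", "belief", "faith", "confidence"]

def overall_trust(ratings, weights):
    """Return the weighted average of the eight attribute ratings."""
    total_weight = sum(weights[a] for a in ATTRIBUTES)
    weighted_sum = sum(ratings[a] * weights[a] for a in ATTRIBUTES)
    return weighted_sum / total_weight

# One user's (hypothetical) ratings of a friend, and the importance that user
# attaches to each attribute - this is where personalisation enters the model.
ratings = {"credibility": 8, "honesty": 9, "reliability": 7, "reputation": 6,
           "competency": 5, "belief": 7, "faith": 6, "confidence": 8}
weights = {"credibility": 2, "honesty": 3, "reliability": 2, "reputation": 1,
           "competency": 1, "belief": 1, "faith": 1, "confidence": 1}

print(round(overall_trust(ratings, weights), 2))  # 7.42
```

Because each user supplies both the ratings and the weights, two users holding the same ratings of a person can still arrive at different overall trust values, which is the personalisation the model claims.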
The claim for this model is that it has “the ability to capture an individual’s subjective views of trust,
also, capture the variety of subjective views of trust that are exhibited by individuals over a large and
broad population”, which in turn, provides “a tailored and bespoke model of trust”. In addition to
demonstrating its personalization capabilities, Quinn demonstrated how the model could be specialised
to any application domain.
The two applications that were used to trial the model and approach were web services composition
and access control in a ubiquitous computing environment. However, Quinn did speculate in his
conclusions that the model would be suitable for use in the OSN domain.
3 Survey Design and Execution
Given the lack of trust management features within OSNs and our belief that such features would be
welcomed, we decided to explore with users whether Quinn's multi-faceted model of trust, which enables
personalization and provides the freedom to annotate trust subjectively, would be welcomed in OSNs, and
what the desired functionalities would be if such a trust model were integrated into OSNs. With
these questions in mind, A Survey of Online Social Networks was designed.
The questionnaire groups participants into three categories: people who are currently using
OSNs, people who have used OSNs in the past but are no longer active, and finally, people who have
never used OSNs. With the former two categories, the survey aimed to find out user behaviour in
relation to trust management in OSNs, and to gather user experience with existing trust
mechanisms. With the last category, we aimed to find out why some have not or will not use OSNs.
Most importantly, without excluding anyone, regardless of participants’ experience with OSNs and
current trust mechanisms, we asked for their desired trust features as well as their opinions on the
multi-faceted model of trust.
A trial questionnaire was first designed and road tested in a computer science postgraduate class,
where a group of thirteen people took part in the survey; this helped refine the official questionnaire.
Considering their flexibility, feasibility and ease of data gathering, an online questionnaire was
convenient as we were aiming at a large audience; therefore, SurveyMonkey [SurveyMonkey] was
chosen to host the online survey, launched on the 27th of May 2007 and run over a period of two weeks. Invitations
to take part in the survey were sent out via email, to targeted third level institutions in Ireland, and
interested parties were encouraged to distribute the questionnaire further.
4 Findings
In total, 393 people took part in answering the online questionnaire. Of these, 59% were male and
41% were female. 68% of respondents were undergraduate students, 21% were postgraduate students
and the remainder were college employees. Most survey participants come from a science-related
background, with a high 70% of people either studying for or having a degree in engineering,
computer science or information technology related fields.
4.1 Category One – Active OSN users
Among 243 respondents who are currently using OSNs, the majority of the profiles are set to be
viewable by the general public, while less than 20% of people allow only directly linked friends to view
their profiles, as Figure 1 shows.
Figure 1: Access settings of user profiles – Category One (Anyone: 71.60%; People directly linked with you: 19.75%; Only some of your directly linked friends and Other friends of your directly linked friends: 4.12% and 4.53%)
We asked the question of whether these users are happy with the available ways of controlling access
to their profiles. As Figure 2 shows, most people are pleased with current access control methods,
while around 20% of the respondents are not concerned with it and less than 10% of people are not
pleased with it. Among reasons given for their dissatisfaction, almost every comment of those 10% of
people was in relation to the lack of better access controls to user profiles. For example, many
mentioned that in Bebo, despite having a private profile, others can still send emails to the profile
owner.
Figure 2: User satisfaction towards current access control methods (Yes: 72.43%; No: 9.47%; Don't care: 18.11%)
Since the majority of this category has public profiles, we asked the question of whether they trust
random strangers to view their profiles, as well as the question of whether access control really is
necessary. As Figure 3 shows, despite having publicly viewable profiles, only 25% of these people
actually stated that they do indeed trust anyone and everyone to view their profiles. Most people,
however, claimed that they do not, while a large number of people are not bothered by it at
the same time. We found a similarly contradictory response regarding the necessity of access
control in OSNs: as Figure 4 shows, less than 20% of these people think it is not necessary, while
most people, nearly 55% of the respondents, believe that controlling access is necessary, and around
25% of people are not concerned.
Figure 3. Would you trust random strangers to view your profile? (Yes: 25.53%; No: 38.72%; Don't care: 35.74%)
Figure 4. Is it necessary that only certain people can view certain parts of your profile? (Yes: 54.89%; No: 19.57%; Don't care: 25.53%)
4.2 Category Two – No Longer Active OSN users
Of the 50 respondents in this category, 46% had set their profiles to be accessible by anyone during
their membership, while 26% allowed only directly linked people to view their profiles, as Figure 5 shows.
Figure 5. Access settings of user profiles – Category Two (Anyone: 46.00%; People directly linked with you: 26.00%; Only some of your directly linked friends and Other friends of your directly linked friends: 16.00% and 12.00%)
When asked why they had stopped using OSNs, this category of people gave several interesting
reasons. For instance, a lot of people lost interest in OSNs, sometimes due to an unpleasant personal
experience, or the completion of research or work related projects, or because they simply no longer have time for
them any more. In our survey, 5% of people in category two view OSNs as a rather sad way of
replacing real life associations, particularly since a lot of sites keep records of the number of visits a
profile gets, turning OSNs into forms of popularity contests. However, at the same time, many
acknowledge the fact that OSNs are cheap alternatives to keep updated with others, but believe that a
refinement in their structure is needed. In particular, privacy concerns were top of the list, with
individuals mentioning unpleasant experiences during their membership. For example, on some sites,
comments left by close friends are displayed to everyone connected to an individual or, sometimes, to
anyone with a browser; others mentioned being contacted unwillingly by random strangers or by friends of a
connected friend whom they barely knew. Unfortunately, the ways to stop these things from happening do not
always seem to work, and distress and frustration have been caused by the limited methods that are
available.
When asked whether they think access control of profiles is necessary in OSNs, this group of people
had a similar response to category one. Among the 47 participants who answered this question, 66%
believe that it is necessary and only 6% disagree, with the remainder not caring.
4.3 Category Three – Not Users of OSNs as yet
We were interested to find out why this group of people have never used OSNs. Among the 57
respondents, some had no interest, some had no time, others disliked the idea of having private
information on the Internet and a small number of people had not heard of OSNs, as Figure 6 shows.
Again, privacy concerns and the lack of freedom to control access to information were
mentioned by the 21.05% of people who selected "Other" when answering this question.
Figure 6. Why have you never used OSNs? (options: Not interested in using OSNs, Don't have time, Don't want to put personal things on the internet, Have never heard of OSNs, and Other (please specify) at 21.05%; the remaining reported values are 40.35%, 35.09%, 19.30% and 12.28%)
We asked 52 participants from this category whether they are likely to use OSNs in the
future and whether they believe controlling access to profiles is necessary. 44% of people stated that
they would start using OSNs in the future, and 69% of them think it is necessary to control access;
only 4% of people disagree and 27% say that they do not care.
4.4 Desired Trust Features and Opinions on a Proposed Solution
If a multi-faceted model of trust with the eight trust attributes – credibility, honesty, reliability,
reputation, competency, belief, faith and confidence – were to be integrated into OSNs, would that be
welcomed? Would ratings of these eight attributes of a person portray subjective views of trust in
OSNs? With the aim of finding out more about our proposed solution, we asked our participants' views
on desired trust features in OSNs as well as their feelings towards a rating feature.
We asked 315 participants which of those eight attributes of trust are most important in their opinion.
As Figure 7 shows, honesty appears to be the most important factor, closely followed by credibility and
reliability, as well as reputation.
Figure 7. Views on the eight attributes of trust (honesty was rated most important, followed by credibility, reliability and reputation; reported values across the eight attributes: 60.95%, 47.94%, 39.68%, 24.13%, 17.78%, 15.56%, 3.81% and 2.54%)
When asked if they would like to see the ratings given by others, 44% of participants said yes and 36%
said no, with the remainder not caring about it. However, when asked whether they would like to rate
others, 67% of people think it is unnecessary, only 9% of respondents believe that it would be helpful,
another 10% of people do not care and the remainder were unable to decide on the subject.
5 Analysis
Several issues have been discovered during the survey, as discussed below:
Current trust mechanisms need to be refined. Most of the unpleasant experiences mentioned relate to a
lack of, or unsatisfactory, privacy controls, while a large number of OSNs fail to allow users to express
their various degrees of trust in a person, or a group of people context-specifically. Hence, refinement
of current trust mechanisms is welcomed in OSNs.
Personalisation is not provided in current trust mechanisms. Users cannot personalise trust with their
subjective views in OSNs at the moment; important trust characteristics as mentioned in section 2 are
not captured in OSNs. Even though trust levels vary among members of defined groups, users cannot
adjust their levels of trust among their connected friends using current trust mechanisms deployed in
OSNs.
Users are unsure about a multi-faceted model of trust with rating features. Contradictory findings in
relation to a trust rating feature suggest that, on the one hand, users think such facilities would help in
gaining better control of online profiles, but on the other hand, they find it hard to rate someone they
know personally. Such opinions could be the result of a lack of understanding of the proposed
solution: for a large percentage of candidates the word "rating" is very open to interpretation, and it
would be hard for them to imagine what ratings could be like without the slightest idea
of how to go about doing it. Also, we need to recognise the limitations of the questionnaire; the phrasing of
the questions and the limited number of open-ended questions in the survey may have restricted the amount of
quality data.
6 Current Work
In order to find out whether the proposed multi-faceted model of trust would truly satisfy user
requirements regarding trust management in OSNs, the implementation of a small-scale OSN
named miniOSN is currently in progress, powered by Ruby on Rails [RoR] and a trust management approach
strongly influenced by Quinn's multi-faceted model of trust.
miniOSN has the functionality of a basic online social networking website: it allows users to create
accounts for themselves with a username, password and a valid email address. Users of miniOSN can
then set up representations of themselves, upload photos, post blog entries, and leave
comments on connected friends' profiles. The trust management approach implemented in miniOSN
aims to capture the fundamental characteristics of trust found in the literature review and has the
following main features:
• Each user holds ratings of his/her connected friends in the database, which are only viewable
to this particular owner and can be adjusted at any time
• Ratings can be given to the credibility, honesty, reliability, reputation, competency, belief, faith
and confidence attributes of a person
• The owner of a resource - be it a picture, a blog, or a comment - can set trust requirements
before distributing that resource
• All users and resources have the highest ratings by default unless specified otherwise
• Users decide whether to transfer the same set of trust values to all other friends of a connected
friend
• Users decide which connected friends should start with what ratings
Profile owners can then express trust in a personalised way, adjusting minimum trust rating requirements
when granting access to certain resources in their profiles. For example, a family member with a high
rating in honesty but a low rating in competency cannot read a certain blog entry; while a work
colleague with high ratings in reputation and competency but low rating in reliability cannot see a
particular group of photos. Evaluating such an OSN integrated with the multi-faceted model of trust is
part of our continuing research agenda.
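The example above can be pictured as a per-resource check of the owner's attribute ratings against minimum requirements. The sketch below is a minimal illustration of that idea in Python (miniOSN itself is built on Ruby on Rails); the resource names, thresholds and ratings are assumptions for illustration, not the actual miniOSN implementation.

```python
# Minimal sketch of trust-gated access to a resource. Names, thresholds and
# ratings are illustrative assumptions; miniOSN itself uses Ruby on Rails.
# For simplicity, missing attributes default to 0 here (miniOSN defaults
# ratings to the highest value unless specified otherwise).

def can_access(resource_requirements, owner_ratings_of_viewer):
    """Grant access only if every required attribute meets its minimum rating."""
    return all(
        owner_ratings_of_viewer.get(attribute, 0) >= minimum
        for attribute, minimum in resource_requirements.items()
    )

# A blog entry that requires honesty and competency ratings of at least 7,
# and a photo album that requires a reliability rating of at least 7.
blog_entry = {"honesty": 7, "competency": 7}
photo_album = {"reliability": 7}

# The owner's ratings of two connections, on an assumed 0-10 scale.
family_member = {"honesty": 9, "competency": 4, "reliability": 8}
work_colleague = {"reputation": 9, "competency": 8, "reliability": 3}

print(can_access(blog_entry, family_member))    # False - competency too low
print(can_access(photo_album, family_member))   # True  - reliability high enough
print(can_access(photo_album, work_colleague))  # False - reliability too low
```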
References
[43Things] 43Things website, http://www.43things.com
[Abdul-Rahman, 2004] Abdul-Rahman, A., (2004). “A Framework for Decentralised Trust
Reasoning”, Ph.D. thesis, University of London, UK.
[aSmallWorld] aSmallWorld website, http://www.asmallworld.net
[Bebo] Bebo website, http://www.bebo.com
[CarDomain] CarDomain website, http://www.cardomain.com
[Chu et al, 1997] Chu, Y., Feigenbaum, J., LaMacchia, B., Resnick, P., and Strauss, Ma., (1997).
‘REFEREE: Trust Management for Web Applications.’, The World Wide Web Journal, 1997, 2(3),
pp. 127-139.
[Dey, 2001] Dey, A., (2001). “Understanding and Using Context”, Personal and Ubiquitous
Computing 5(1): 4-7.
[Doostang] Doostang website, http://www.doostang.com
[Dumbill et al, 2002] Dumbill, E., (2002). ‘XML Watch: Finding friends with XML and RDF.’, IBM
Developer Works’, June 2002. Last retrieved from
http://www-106.ibm.com/developerworks/xml/library/xfoaf.html
[Ecademy] Ecademy website, http://www.ecademy.com
[Facebook] Facebook website, http://www.facebook.com
[Friends Reunited] Friends Reunited website, http://www.friendsreunited.com
[Friendster] Friendster website, http://www.friendster.com
[Gil et al, 2002] Gil, Y., Ratnakar, V., (2002). ‘Trusting Information Sources One Citizen at a Time’,
Proceedings of the First International Semantic Web Conference (ISWC), Sardinia, Italy, June 2002.
[Golbeck, 2005] Golbeck, J. A., (2005). “Computing and Applying Trust in Web-Based Social
Networks”, Ph.D. thesis, University of Maryland.
[Graduates] Graduates website, http://graduates.com
[Grandison, 2003] Grandison, T., (2003). “Trust Management for Internet Applications”, Ph.D. thesis,
University of London, UK.
[Grandison et al, 2001] Grandison, T., Sloman, M., (2001). ‘SULTAN - A Language for Trust
Specification and Analysis’, Proceedings of the 8th Annual Workshop HP Open View University
Association (HP-OVUA), Berlin, Germany, June 24-27, 2001.
[Grandison & Sloman, 2000] Grandison, T., and Sloman, M., (2000). “A survey of trust in internet
applications”, IEEE Communications Surveys and Tutorials, 4(4):2–16.
[Gray, 2006] Gray, E. L., (2006). “A Trust-Based Management System”, Ph.D. thesis, Department of
Computer Science and Statistics, Trinity College, Dublin.
[Hi5] Hi5 website, http://www.hi5.com
[Jøsang A., 1996] Jøsang A., (1996). “The right type of trust for distributed systems”, Proceedings of
the 1996 workshop on new security paradigms. Lake Arrowhead, California, United States, ACM
Press.
[KDEG] Knowledge and Data Engineering Group website, http://kdeg.cs.tcd.ie
[LinkedIn] LinkedIn website, http://www.linkedin.com
[Marsh, 1994] Marsh S., (1994). “Formalising Trust as a Computational Concept”, Ph.D. thesis,
Department of Mathematics and Computer Science, University of Stirling.
[MOG] MOG website, http://mog.com
[Mui et al., 2002] Mui, L., Mohtashemi, M., and Halberstadt, A., (2002). “A computational model of
trust and reputation”, In Proceedings of the 35th International Conference on System Science, pages
280–287.
[MySpace] MySpace website, http://www.myspace.com
[Olmedilla et al., 2005] Olmedilla, D., Rana, O., Matthews, B., and Nejdl, W., (2005). “Security and
trust issues in semantic grids”, In Proceedings of the Dagsthul Seminar, Semantic Grid: The
Convergence of Technologies, volume 05271.
[Plaxo] Plaxo website, http://www.plaxo.com
[Quinn, 2006] Quinn K., (2006). “A Multi-faceted Model of Trust that is Personalisable and
Specialisable”, Ph.D. thesis, Department of Computer Science and Statistics, Trinity College, Dublin.
[Ralph, Alessandro et al. 2005] Ralph, Alessandro et al., (2005). “Information revelation and privacy
in online social networks”, Proceedings of the 2005 ACM workshop on Privacy in the electronic
society. Alexandria, VA, USA, ACM Press.
[RoR] Ruby on Rails project homepage, http://www.rubyonrails.org
[SurveyMonkey] Survey Monkey website, http://www.surveymonkey.com
[USA Today, 2006] USA Today, (2006). “YouTube serves up 100 million videos a day online”, last
retrieved from http://www.usatoday.com/tech/news/2006-07-16-youtubeviews_x.htm
[USENET] USENET website, http://www.usenet.com
[Vannevar, 1996] Vannevar, B., (1996). "As we may think." interactions 3(2): 35-46.
[XING] XING website, http://www.xing.com
[Yahoo!360] Yahoo!360 website, http://360.yahoo.com
[YouTube] YouTube website, http://youtube.com
[Zimmermann, 1994] Zimmermann, P., (1994). “PGP(tm) User's Guide”, October 1994.
[Zimmerman, 1995] Zimmerman, P.R., (1995). “The Official PGP Users Guide”, MIT Press,
Cambridge, MA, USA, 1995.
Irish Legislation regarding Computer Crime
Anthony J. Keane
Department of Informatics
School of Informatics and Engineering
Institute of Technology Blanchardstown
Dublin 15
[email protected]
Abstract
Most people that use computers, whether for personal or work related activities, do so oblivious of the general
legalities of their actions, in terms of the enacted legislation of the State. Of course, we all have a good idea of
the obvious illegal activities like using computers to commit criminal fraud, theft or to view child pornography,
especially where high profile cases of these crimes appear in the news media from time to time. This paper is an
overview of the Irish legislation regarding computer crime and examines how the wording of these laws is
interpreted by the legal community in identifying what could be considered a computer crime.
Keywords: Computer Forensics, Computer Crime, Irish Law
1 Introduction
The Internet is a global communications system that allows easy access to resources and people. It has
been adopted by business as a means of increasing their customer base and improving their ability to
provide their service. Criminals have also adopted it as another means of committing crime. The
mechanism of the Internet is based on technology protocols, many of which are open standards and
easily available, so with a little effort in educating oneself on the inner workings of the Internet, the
criminal mind can conceive many imaginative ways of misrepresenting themselves and tricking the
remote user into divulging their personal details, financial details, user access codes and passwords.
Who hasn't received a spam email asking for bank details, offering to get large amounts of money for
a small deposit, or similar get-rich-quick schemes? Other tricks are not illegal but border on
being so and are definitely unethical, such as the selling of so-called special drugs claimed to
satisfy some social desire on the part of the customer.
The commercialisation of the Internet is a relatively recent phenomenon and it is only since the late
1990s that companies have wanted to do business on the Web and for users to communicate via email.
It is estimated today that there are over one billion users with a presence on the Internet and, as such,
it has attracted the attention of criminals, who actively target Internet users every day to
relieve them of their cash and identities. Their primary means of contacting individuals is by
spamming users with email messages (MessageLabs[1] have reported detecting over 83% of email
traffic as being spam). Other approaches involve hacking into networks, and as early as 1999 a number
of high profile hacks were reported: the Hotmail email service was broken into and user accounts
could be accessed without the use of their passwords, the New York stock exchange was attacked,
and Microsoft is constantly attacked, as well as NASA and the Pentagon. Closer to home, a recent survey
was conducted of Irish businesses and according to the results of “The ISSA/UCD Irish Cybercrime
Survey 2006: The Impact of Cybercrime on Irish Organisations” report[2], Irish organisations are
significantly affected by cybercrime where virtually all (98%) of respondents indicated that they had
experienced some form of cybercrime with losses of productivity and data being the main
consequences. High profile attacks include one on the Department of Finance, where the phone system was
hijacked and used to run up a bill of thousands of euros over one weekend.
It is from this explosion of growth in computer use that a new field of computer science, called
Computer Forensics, has emerged to deal with computer related crime. It was initially developed by
law enforcement agencies, like the FBI[3], where techniques, tools and best practices were needed for
information relating to a crime to be extracted from computer storage devices and used as evidence in the
prosecution of the case. Today the Computer Forensics field has many contributors from academic
research groups to professional companies specialising in Security and Forensics and the law
enforcement agencies. There is also a variety of proprietary application tool kits for analysing storage
media, together with a growing array of free open source tools.
There are three areas of demand for the services of a computer forensics professional: the criminal
area, the corporate area and the private/civil area. Here we look at the criminal area and
concentrate on the legislation in force in Ireland that is available for prosecution of computer related
crimes. We ask the following questions: How is the legislation framed and what computer related
activities are considered illegal?
2 Irish Legislation
In Irish law, there is no individual Act of legislation that is specifically targeted at computer crime; instead,
computer crime has been treated as an afterthought and incorporated in Acts whose primary focus is
elsewhere. As such, most of the computer crime related offences can be found in section 5 of the
Criminal Damage Act, 1991[4] and Section 9 of the Criminal Justice (Theft and Fraud) Offences Act
2001[5].
2.1 The Criminal Damage Act 1991, Section 5 states that:
(1) A person who without lawful excuse operates a computer—
(a) within the State with intent to access any data kept either within or outside the State,
or
(b) outside the State with intent to access any data kept within the State,
shall, whether or not he accesses any data, be guilty of an offence and shall be liable on
summary conviction to a fine not exceeding €634 or imprisonment for a term not exceeding 3
months or both.
(2) Subsection (1) applies whether or not the person intended to access any particular data or
any particular category of data or data kept by any particular person.
We need to look at other sections of the Criminal Damage Act, 1991 to get the definition of
criminal damage and to see how it applies to data. The offence of criminal damage is identified in
section 2, part 1 which states that “A person who without lawful excuse damages any property
belonging to another intending to damage any such property or being reckless as to whether any such
property would be damaged shall be guilty of an offence”. The offence applies to data as follows: (i)
to add to, alter, corrupt, erase or move to another storage medium or to a different location in the
storage medium in which they are kept (whether or not property other than data is damaged thereby),
or (ii) to do any act that contributes towards causing such addition, alteration, corruption, erasure or
movement. Note also that the term "data" is defined as information in a form in which it can be
accessed by means of a computer and includes a program.
At first glance, the Criminal Damage Act appears to create an offence for the unauthorised operation
of a computer and unauthorised access to data. However, the wording of the Act is sufficiently loose
to raise some comments from the legal community as regards its meaning and how it would apply in
court.
The following points have been made by McIntyre[6] regarding the Criminal Damage Act:
• it creates an offence for the modification of any information stored on a computer whether or
not it has an adverse effect,
• the Act doesn't differentiate between less serious offences of unauthorised access and more
serious offences of actual damage,
• it has undefined terms like "operate" and "computer",
• section 5 creates an offence of operating a computer without lawful excuse but section 6
discusses lawful excuse in terms of authority to access data and not to operate a computer.
McIntyre uses two examples to illustrate the problems of interpreting the offence:
Example 1:
Suppose that X sends an email to Y, which travels via Z’s computer. X will be seen to have
"operated" Y's and Z's computers since he has caused them to execute programs to deliver and
process his email. If Y indicated that the email was unwelcome, then X could be charged with
operating Y’s computer without lawful excuse and guilty of an offence under section 5 of the
Act.
Example 2:
X uses Y’s computer without Y’s permission, to access data he is entitled to access. This is
unauthorised operation but not unauthorised access. This is an offence under section 5 but
section 6 suggests that X has lawful excuse and so no offence has occurred.
Conversely, X uses Z’s computer with permission to access data he is not entitled to access.
This is authorised operation but unauthorised access. This is not an offence under section 5 but
with section 6 taken into account, an offence of unlawful access has occurred.
Unauthorised access to information is thus handled by the Criminal Damage Act, 1991, which is
supposed to cover the possibility where a "hacker" has not committed any damage, fraud or theft
but has tried or succeeded in gaining access to a computer system.
When a system is damaged, then Section 2 of the Criminal Damage Act, 1991 is used. This creates
the offences of intentional or reckless damage to property. While the wording of the Act does not
explicitly use computer terms like virus, an offence could be applied to damage caused to a computer
system by a virus or similar computer generated attack. Reckless damage under section 2 carries as
penalties a fine of up to €12,700, imprisonment for a term not exceeding 10 years,
or both.
One of the problems with the legislation is the poor definition of computer terms, for example "data" and
"computer"; the reason given for this approach is that it prevents the legislation from becoming
obsolete through the rapid advancement of technology. However, the range of meaning of "data" could lead to
a scenario outlined by Karen Murray[7]:
“The Criminal Damage Act 1991 has sought to avoid ambiguous definitions by avoiding a definition
at all. This may have bizarre results; the human memory is undoubtedly a “storage medium” for
‘data’; if a hypnotist causes a person to forget something, have they committed criminal damage?”
Murray argues that such vagueness may be subject to a Constitutional challenge in Ireland under the
doctrine where “the principle that no one may be tried or punished except for an offence known to the
law is a fundamental element of Irish and common-law system and essential security against arbitrary
prosecution". In other words, "if there is no way of determining what the law is, there is no crime".
2.2 The Criminal Justice (Theft and Fraud) Offences Act 2001, Section 9 states that:
(1) A person who dishonestly, whether within or outside the State, operates or causes to be
operated a computer within the State with the intention of making a gain for himself or herself
or another, or of causing loss to another, is guilty of an offence.
(2) A person guilty of an offence under this section is liable on conviction on indictment to a
fine or imprisonment for a term not exceeding 10 years or both.
The Criminal Justice Act 2001 followed the Electronic Commerce Act 2000 and provides for the offence
of dishonestly operating, or causing to be operated, a computer. This was seen as a safeguard for the new age
of electronic commerce. The term "dishonestly" is defined in section 2 of the Act as meaning
“without a claim of right made in good faith”.
This definition and wording of the Act could be interpreted in a broad sense to mean that if someone
honestly uses a computer where he or she does so with claim of right made in good faith, there is no
offence no matter what they did with the computer.
Kelleher-Murray[8] argues that the Act appears to cover almost any use of a computer which could be
considered to be dishonest.
McIntyre's[6] observations are as follows:
• The Act does not differentiate between the gain being dishonest or honest.
• If dishonesty is considered in a wider context, then the severity of the offence is not in line
with similar offences committed without a computer; for example, the maximum penalty for
the sale of copyright material out of a suitcase in the street is 5 years, while selling similar
material over the Internet has a 10 year maximum penalty.
• The Act does not apply where a computer is misused for an improper purpose, for example the
collection of information to commit a crime.
Many Irish businesses have suffered from Denial of Service (DoS) attacks, and T.J. McIntyre uses
denial of service attacks as an example to show the difficulty in applying the Irish legislation to an
actual offence. The difficulty lies with what law has actually been broken. In a denial of service
attack, no unauthorised access has been made and no data or information has been moved or modified; the
perpetrator has used his own computer, to which he has authorised access, and operated it honestly. McIntyre
argues that an indirect prosecution may be possible, where unauthorised access to and criminal damage of
other computers used to commit the denial of service attack would apply, and he cites the UK case of R
vs Aaron Caffrey (2003)[9] as an example where a prosecution for data modification under the UK
Computer Misuse Act 1990 was unsuccessfully attempted.
3 Developments in Computer Law
Irish Computer Law is currently under review and this is mainly due to Ireland signing up to the
European Convention on Cyber-Crime and also the adoption of the Council Framework Decision.
McIntyre expects new changes to Irish legislation over the next few years and urges the legislators not
to adopt a minimalist approach to reform by tinkering around the edges of existing laws but to give the
area of computer crime the special attention it requires in its own right and engage with a
comprehensive reform program. However, any new computer crime laws should be balanced so as not to exclude research and testing of live systems, lest such work be taken as an attack and prosecuted as an offence. An example of such tight legislation is the recent amendments to the UK Computer Misuse Act that could possibly criminalise legitimate security researchers; guidelines are being urgently sought to clarify the law [10].
3.1 The European Convention on Cyber-Crime [11]
Since 1995 the EU has been trying to reach a consensus on how to tackle cross-border Internet-related criminal activities. In 2001 agreement was finally reached on what has become known as the Convention on Cybercrime. Ireland became a signatory in 2002, but the Convention only came into force on 1st July 2004. The
cybercrime convention represents the first international attempt to legislate for cross-border criminal
activity involving computers. The Convention covers offences against the confidentiality, integrity
and availability of computer data and systems. It also covers computer related offences of forgery and
fraud, content related offences and offences related to infringement of copyright and related rights like
illegal file sharing. It also covers rules for interception, collection, preservation and disclosure of
computer data and information.
In the broad definition of computer crime, the term cybercrime is viewed as a subcategory and
generally associated with the Internet. The Convention on Cybercrime covers the following three
broad areas:
• All signatories criminalise certain online activities. This will require changes to Irish legislation, since some of the offences do not exist at the moment in Irish law.
• States should require operators of telecommunications networks and ISPs to institute more detailed surveillance of network traffic and to carry out real-time analysis.
• States cooperate with each other in the investigation of cybercrime by allowing data to be shared among them, "but with an opt-out clause if investigations of its essential interests are threatened".
As the legislation reflects the needs of law enforcement rather than those of public interest groups, opponents of the Convention have cited the lack of attention to privacy issues and the forced-cooperation clause as endangering the right to privacy for citizens in the EU.
3.2 Council Framework Decision 2005/222/JHA of 24 February 2005 on attacks against information systems [12]
The Council Framework Decision on attacks against information systems defines the following acts as punishable criminal offences:
• illegal access to information systems;
• illegal system interference (the intentional serious hindering or interruption of the functioning of an information system by inputting, transmitting, damaging, deleting, deteriorating, altering, suppressing or rendering inaccessible computer data);
• illegal data interference.
In all cases the criminal act must be intentional; instigating, aiding, abetting or attempting to commit any of the above offences will also be liable to punishment. Member States will have to make provision for such offences to be punished by effective, proportionate and dissuasive criminal penalties.
3.3 Other Irish Laws of Interest
• The Child Trafficking and Pornography Act 1998 makes it an offence to traffic in children for sexual exploitation or to allow a child to be used for child pornography. It also makes it an offence to knowingly produce, distribute, print or publish, import, export, sell, show or possess an item of child pornography. The Act contains penalties of up to 14 years in prison. The Act also makes it an offence to participate in or facilitate the distribution of child pornography, which is an issue for Internet Service Providers in particular, as it may give rise to potential criminal liability under the Act.
• Irish Data Protection laws are contained in the Data Protection Act 1988 [13] and the Data Protection (Amendment) Act 2003, together with the EC Regulations 2003 (Directive 2000/31/EC) and the EC Electronic Privacy Regulations 2003 (SI 535/2003). The objective is to protect the privacy of an individual by controlling how data relating to that person is processed. The Data Protection Act creates an offence of gaining unauthorised access to personal data, and the data protection rules go well beyond any Constitutional or European Convention on Human Rights (ECHR) right to privacy.
• Right to Privacy: Computer intruders with less than honest motivations might not have much expectation of privacy, but the honest computer user would assume their privacy was protected by the law. It is interesting to note that the Irish Constitution does not explicitly protect this right; it only gives an implied right. The Supreme Court has ruled that an individual may invoke the personal rights provision in Article 40.3.1 to establish an implied right to privacy. This article provides that "The State guarantees in its laws to respect, and, as far as practicable, by its laws to defend and vindicate the personal rights of the citizens". The Irish Supreme Court recognised the existence of the right in Kennedy and Arnold v. Ireland, where it ruled that the illegal wiretapping of two journalists was a violation of the Constitution, stating:
The right to privacy is one of the fundamental personal rights of the citizen which flow from
the Christian and democratic nature of the State…. The nature of the right to privacy is such
that it must ensure the dignity and freedom of the individual in a democratic society. This can
not be insured if his private communications, whether written or telephonic, are deliberately
and unjustifiably interfered with.
The European Convention on Human Rights gives a stronger protection for the individual’s
right to privacy. In Article 8 of the convention, “everyone has the right to respect for his
private and family life, his home and correspondence". This was used in a recent UK case which highlighted the effectiveness of the Convention: an employee's email and Internet access was monitored in a college, and while she lost the case in the UK she won it in Europe, where the employer (the UK government) was found to be in breach of the European Convention on Human Rights and had to pay damages and legal costs. The ruling implies that employers can only monitor business communications and not the private use of a telecommunications system, assuming the user has authorised access via an acceptable use policy.
In September 2006 the Irish civil rights group Digital Rights Ireland (DRI) started a High
Court action against the Irish Government challenging new European and Irish laws requiring
mass surveillance. DRI Chairman TJ McIntyre said “These laws require telephone companies
and internet service providers to spy on all customers, logging their movements, their
telephone calls, their emails, and their internet access, and to store that information for up to
three years. This information can then be accessed without any court order or other adequate
safeguard. We believe that this is a breach of fundamental rights. We have written to the
Government raising our concerns but, as they have failed to take any action, we are now
forced to start legal proceedings”. [14]
4 Concluding Comments
It is evident from this paper that cyber criminals and ordinary computer users can be prosecuted for computer crimes under various Acts, but the success of a case may depend on the interpretation of the law for that particular crime at that time: "Laws which are not specifically written to prohibit criminal acts using computers are rarely satisfactory" [15]. This review of computer crime articles and papers from members of the legal profession has shown that Irish computer legislation has been written in a manner that allows various interpretations to be taken and suffers from an effort to be sufficiently vague to encompass future technological crimes. The confusion could be resolved once the legislation is tried and tested in the courts, but the author was unable to find any examples of this in the Irish courts. Internationally there are many examples of computer crimes being prosecuted only to have the verdicts overturned at a higher level or through the European courts. Two examples, given in the appendices below, caught the author's attention and may be of some interest to other academics: the Tsunami case and the Copland case. While these are not applications of Irish crime law, they do demonstrate the type of restrictions that could be applied in the future if the reforms to Irish computer crime laws are not drafted in a skilled and knowledgeable fashion.
Appendix 1: Tsunami Case[16]
In 2005 a college lecturer, Daniel Cuthbert, donated money via a charity website to the relief effort for the Asian Tsunami disaster. He entered his personal details and credit card details, but when he did not receive a response after a few days he became concerned that he had given his details to a spoof phishing site. In an attempt to find out more about the site he carried out a couple of very basic penetration tests. Had they shown the site to be insecure he would have contacted the authorities, but after a few basic attempts failed to gain entry into the site he was satisfied that it was secure and assumed the site was legitimate. There were no warning messages showing that he had tried to access an unauthorised area, but he had triggered an internal intrusion detection system (IDS) at the company that ran the site and they notified the police. He was later arrested and prosecuted under the UK Computer Misuse Act 1990 [17].
The relevant part of the Act is Section 1, which states that a person is guilty of an offence if:
"he causes a computer to perform any function with intent to secure access to any program or data held in any computer; the access he intends to secure is unauthorised; and he knows at the time when he causes the computer to perform the function that that is the case."
Due to the wide scope of the Act, the Judge, with ‘some considerable regret’ had no option but to find
Daniel Cuthbert guilty under the Computer Misuse Act 1990 and he was fined. While this is English
law and we don’t have an equivalent Irish case, as yet, it does highlight the care needed when
performing a penetration test (ethical hacking) if you are to be confident that you are not acting
illegally.
Appendix 2: Copland v UK Case[18]
Ms. Lynette Copland worked in a Welsh college as a personal assistant and discovered that the college
deputy principal was secretly monitoring her telephone, email and internet use. The college had no policy in place for informing employees that their communications might be monitored. She claimed
that this amounted to a breach of her right to privacy under Article 8 of the European Convention on
Human Rights[19] . The UK government admitted that monitoring took place, but claimed that this did
not amount to an interference where there was no actual listening in on telephone calls or reading of
emails. Although there had been some monitoring of the applicant’s telephone calls, e-mails and
internet usage, this did not extend to the interception of telephone calls or the analysis of the content of
websites visited by her. The UK Government argued that the monitoring thus amounted to nothing
more than the analysis of automatically generated information which, of itself, did not constitute a
failure to respect private life or correspondence. However, the European Court disagreed, holding that
this monitoring and storage of details of telephone and internet use was itself an interference under
Article 8. The Court considered that the collection and storage of personal information relating to the
applicant’s telephone, as well as to her e-mail and internet usage, without her knowledge, amounted
to an interference with her right to respect for her private life and correspondence within the meaning
of Article 8.
References
[1] Messagelabs, http://www.messagelabs.com/intelligence.aspx
[2] "The ISSA/UCD Irish Cybercrime Survey 2006: The Impact of Cybercrime on Irish Organisations", http://www.issaireland.org/cybercrime
[3] FBI Computer Forensics, http://www.fbi.gov/hq/lab/fsc/backissu/oct2000/computer.htm
[4] Criminal Damage Act 1991, http://www.irishstatutebook.ie/1991/en/act/pub/0031/index.html
[5] Criminal Justice (Theft and Fraud) Act 2001, http://www.irishstatutebook.ie/2001/en/act/pub/0050/index.html
[6] McIntyre T.J., "Computer Crime in Ireland", Irish Criminal Law Journal, vol. 15, no. 1, 2005, http://www.tjmcintyre.com/resources/computer_crime.pdf
[7] Karen Murray, "Computer Misuse Law in Ireland", Irish Law Times 114, May 1995
[8] D. Kelleher, "Cracking down on the hack-pack", Irish Times, 23 October 2000, p. 8
[9] R v Aaron Caffrey 2003, http://www.computerevidence.co.uk/Cases/CMA.htm
[10] Ethical hacker protection and security-breach notification law, http://www.out-law.com/page-8374
[11] European Convention on Cybercrime, http://conventions.coe.int/Treaty/en/Treaties/Html/185.htm
[12] Council Framework Decision on Attacks against Information Systems, http://europa.eu.int/eur-lex/en/com/pdf/2002/com2002_0173en01.pdf
[13] Irish Data Protection Act 1988, http://www.dataprotection.ie/docs/Data_Protection_Act_1988/64.htm
[14] T.J. McIntyre, "Data Retention in Ireland", 2006, http://www.tjmcintyre.com/2007/02/data-retention-in-ireland-stealth-bad.html
[15] Dennis Kelleher and Karen Murray, "Information Technology Law in Ireland", Dublin: Sweet & Maxwell, 1997, p. 253
[16] Tsunami case, http://www.theregister.co.uk/2005/10/06/tsunami_hacker_convicted/
[17] Computer Misuse Act 1990, http://www.opsi.gov.uk/acts/acts1990/Ukpga_19900018_en_1.htm
[18] Case of Copland v. The United Kingdom, 2007, http://www.bailii.org/eu/cases/ECHR/2007/253.html
[19] European Convention on Human Rights, http://www.bailii.org/eu/cases/ECHR/2007/253.html
Distributed Computing for Massively Multiplayer Online Games
Malachy O’Doherty 1 , Dr. Jonathan Campbell 2
1 Letterkenny Institute of Technology,
[email protected]
2 Letterkenny Institute of Technology,
[email protected]
Abstract
This paper discusses a novel approach to distributing work amongst peers in Massively Multiplayer Online Games
(MMOGs). MMOGs cater for thousands of players each providing a node. Traditionally, the networking approach
taken is that of client-server.
This paper examines previous approaches taken to distribute server workload. Then by analysing the problem
domain puts forward an approach which aims to maximise the amount of distribution, in a secure manner, by
concentrating on distributing tasks as opposed to distributing data. Furthermore the approach uses techniques to
reduce the instability caused by the wide variance in node latency which exists in this type of scenario.
The result is that the client nodes act as processing nodes for specific job types. The central server becomes a
job controller, receiving connections and organising which nodes are used, which of the tasks they perform and
when. By doing this, server bandwidth, and hence cost, is reduced.
Keywords: Games, Distributed
1. Introduction
Multiplayer games are becoming an ever more popular aspect of computer gaming. They come in different forms, LAN based (MMG) and internet based, i.e. online (MMOG), and include different genres such as role playing (MMORPG) and first person shooter (MMOFPS). Even online casinos are a form of multiplayer online game (e.g. http://www.casino.com/).
The expansion of this sector of the market comes as no real surprise. From a commercial point of view it is of
great benefit. It provides both the once-off profit from the purchase of the disc product and a continuing revenue
stream from the monthly subscription fee.
Added to this is the fact that for many recent games, (real) money can be earned in various ways. Some
examples being:
• By auctioning off artifacts, which exist in the game world, on the likes of eBay™. This is external to the game.
• Newer games such as Second Life™ enable the creation of artifacts by the players in the game world and the sale of such items.
• Latterly there is yet another possible revenue stream: commission on the exchange of real money for game world currency.
The downside of providing the servers to host the game world is the extra cost. This includes the cost of the support staff, heating and electricity, but especially the cost of the bandwidth needed to enable gameplay.
One of the main reasons for the client-server architecture being used is the online gaming axiom "Never trust the client". This comes about because there always seems to be a proportion of clients who will try to gain an unfair advantage over other players, that is, cheat.
By shifting as much of the workload as possible onto the client machines a reduction in the bandwidth requirement for the server can be achieved. It will also lead to the situation where the server(s) becomes more of a
central control with little involvement in the gameplay. The client machines become a network of machines which
collectively host the gameplay for the game world with the server acting as a job controller.
This is ambitious given that:
• the client should not be trusted
• the makeup and size of the set of connected clients can vary quite dramatically
• there is always the unreliability of the network to contend with
2. Review of Problem Area
2.1. Deployment Structures
Currently most Massively Multiplayer Games (MMGs) follow the client-server model. This is where a central server is the sole arbiter of the game state [4][7]. The rationale for this is the online gaming axiom "never trust the client".
As the scale of the game increases to massively multiplayer, one server is not powerful enough to cope with the
required number of simultaneous clients. Thus many game worlds use a number of servers, each dedicated to a
region within the game world, which collectively govern the whole game world. These server ‘shards’ as they are
called [4], are networked together and keep themselves synchronized and consistent, acting as a grid [9]. But it is
important to note that this grid acts as one server and so is still fulfilling the client-server model.
In addition there is the effect that increasing participant numbers has on bandwidth. The more successful the
game is the greater the bandwidth requirement for the server becomes. This means an increasing cost base which
is a limiting factor for profits.
The goal of reducing server element numbers and bandwidth requirements has led to a number of approaches
being tried in the past. They all look to achieve their aim by introducing varying degrees and methods of reducing
the centralized nature of the client-server paradigm. The aim being to distribute one or more aspects of the
multiplayer game, not to shards, but to the computers of players.
A number of topologies have been explored. Server-client (see Fig.1), cluster (under various names) and lastly
fully distributed. The cluster approach is sometimes referred to as a tiered structure. It entails a number of client
nodes being designated to adopt both the server and client roles. We will refer to this as a cluster server. Each
cluster server is a client to either the main (central) server or to another cluster server, and acts as a server to a
group of other clients. Fig.2 shows two clusters attached to the main server. This represents two tiers of client
node. The nodes operating as client/server form one tier and those operating as client only a second tier.
This approach is used with a slight twist [2] in that each cluster is configured as a fully distributed subnetwork,
see Fig.3, with any one of the nodes acting as the link to the main server. This may be a direct link or via another
node. This hybrid structure should be able to reduce the server bandwidth requirement, however, the lower level
servers are in a trusted position being the active connection to the central server. Since this server is hosted on a
client machine this arrangement is susceptible to cheating as the game state can be tampered with easily.
Peer-to-peer overlays, typically used for file sharing (for example BitTorrent™), have been applied in [9][11] and also in [8][5]. Typical problems with using peer-to-peer overlays are that they are susceptible to cheating and to churn (the process of many players logging on and off at the same time).
Figure 1. Network Topography - Server-Client
Figure 2. Network Topography - Cluster
Figure 3. Network Topography - Full Distribution
One example of a game which is fully distributed is "Age of Empires"™ [13], which has the full game on each
player’s machine and uses a star configuration in the network to connect all the player machines to all the others
(see Fig.3).
This leads to a situation where every move/event which occurs in the gameplay is broadcast to every other
machine. The consequence is that a lot of redundant information is being passed around the network and essentially
the same work is being done on every machine.
Other interesting approaches to applying distributed technologies to the area of MMOGs have been explored. The idea of using software agents was considered by [4].
In this framework a player’s character would have to move to another player’s machine which is acting as a
region server. The question (apart from those relating to bandwidth and latency) is could a cheating player hijack
the agent or replace it with a suicidal version before it moves back? After all the region server is deemed to have
the ’original’ version of the agent. Ultimately what is not addressed is the need for a secure environment for a
mobile agent to work in. That is a tamperproof region of memory and tamperproof access rights, such that the
agent could confirm that the local application is unaltered and that data flow with it is secure.
2.2. Synchronization
All distributed systems are concerned with synchronization. Each networked machine has its own timers and will suffer from drift (i.e. the presented time will gradually vary from the true time due to inaccuracies in the time-keeping process). Thus even if a number of machines all start with the same time, each will drift at a differing rate over a period and lose synchrony. Furthermore, latency makes it difficult (for obvious reasons) to, for instance, compare timestamps if a high degree of accuracy is required.
Many approaches to synchronization have been adopted: Lamport’s clock (logical clock), vector clocks and
matrix clocks. For a discussion on these see [3].
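By way of illustration of the logical-clock idea mentioned above, the following sketch implements a Lamport clock in Python; the class and method names are our own illustrative choices and are not taken from [3] or from any of the systems discussed later.

```python
class LamportClock:
    """Logical clock: counts events rather than tracking wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Called for every local event (e.g. a player input being processed).
        self.time += 1
        return self.time

    def send(self):
        # Timestamp attached to an outgoing message.
        return self.tick()

    def receive(self, remote_time):
        # On receipt, jump ahead of the sender's timestamp if necessary,
        # so that causally later events always carry larger timestamps.
        self.time = max(self.time, remote_time) + 1
        return self.time


# Two nodes exchanging one message: the receiver ends up ahead of the sender,
# regardless of how much its physical clock has drifted.
a, b = LamportClock(), LamportClock()
t_sent = a.send()            # a.time == 1
print(b.receive(t_sent))     # 2
```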
However timekeeping for a multiplayer game is a rather special case. There is no need to keep track of an
absolute time. Indeed, since one of the things a game should aim to achieve is a disassociation from reality and an
attachment to the game world, the link with time is gone and replaced with a sequence of events.
This leads on to the next point, namely that each player does not have the same view of the game world. This
is true not just because of position and viewing angle, but also because of latency. Each player ’sees’ the result
of their last update, which is the sum of the events which had been registered up to that point. Another player
with a different latency on their connection will have their update occur at a slightly different point and thus with
a different set of events that have been registered.
It is because of this that it is said that clients 'see' only an approximation of the true game world.
The aim of reducing this lack of consistency (and of avoiding some forms of cheating) led to the concept of
”bucket synchronization” being put forward [6]. This is where game events received over a period of time are,
metaphorically, placed in a bucket. The game proceeds then as a sequence of these buckets, thus providing
synchronization as a game event can be considered to have occurred at a point in the sequence.
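A minimal sketch of this bucket idea is given below; the 100 ms bucket length and the event fields are illustrative assumptions rather than values specified in [6].

```python
import time
from collections import defaultdict

BUCKET_MS = 100  # assumed bucket (frame) length

def bucket_of(event_ms, start_ms):
    """Map an event's arrival time to a bucket index in the global sequence."""
    return (event_ms - start_ms) // BUCKET_MS

start = int(time.time() * 1000)
buckets = defaultdict(list)

def on_network_event(event, arrival_ms):
    # Events are not applied immediately; they are queued under their bucket.
    buckets[bucket_of(arrival_ms, start)].append(event)

def process_bucket(index, game_state):
    # The game advances one bucket at a time, so every node applies the
    # same set of events at the same point in the sequence.
    for event in sorted(buckets.pop(index, []), key=lambda e: e["id"]):
        game_state.append(event["action"])

# Example: two events falling into the same bucket are applied together.
state = []
on_network_event({"id": 1, "action": "move"}, start + 30)
on_network_event({"id": 2, "action": "fire"}, start + 90)
process_bucket(0, state)
print(state)  # ['move', 'fire']
```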
This however does not address the problem of cheating. In an attempt to address the issue of cheating lockstep
was introduced [1] and later modified to asynchronous synchronization [8].
Lockstep uses a two phase commit approach. Each node (player) first presents a hash of their move. Then once
all have presented, each reveals the move. This allows verification that the move is unaltered after the start of the
reveal phase.
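The two-phase commit/reveal idea can be sketched as follows; the use of SHA-256 and a random salt are illustrative assumptions and not the exact construction of [1].

```python
import hashlib
import os

def commit(move: str):
    """Phase 1: publish a hash of the move (plus a random salt) to all peers."""
    salt = os.urandom(8).hex()
    digest = hashlib.sha256(f"{salt}:{move}".encode()).hexdigest()
    return digest, salt  # the digest is broadcast; (move, salt) is kept secret

def verify(digest: str, move: str, salt: str) -> bool:
    """Phase 2: after every peer has committed, moves are revealed and checked."""
    return hashlib.sha256(f"{salt}:{move}".encode()).hexdigest() == digest

# A player cannot change a move after seeing the other players' commitments.
d, s = commit("dodge-left")
print(verify(d, "dodge-left", s))   # True
print(verify(d, "dodge-right", s))  # False: an altered move is rejected
```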
Asynchronous synchronization is an extension to lockstep. It allows the use of the proximity of game characters
(i.e. in the game world space) to decide which nodes must be involved in the lockstep process.
These measures improved the situation and countered some cheats but are susceptible to others, such as collusion.
2.3. Cheat Categorization
When considering the requirements for any type of MMG it is important to consider how to ensure (as much as
is feasibly possible) fair play. This can in a general way be considered as software security which is in turn linked
to cryptography. One excellent text on this subject is [12] which shows how to approach the whole idea of security
of data.
This type of approach has been taken by [7] when considering the types of cheats typically used in games. Re-categorizing these, we can identify areas of susceptibility: game logic; application; network. If we look more closely at these categories discussed in [7] and the types of cheats in each, we see the following.
1. Game logic. This covers cheats such as when a player discovers that by using an item and dropping it at
the same time they create a replica. While this is undoubtedly unfair to other players and may cost the game
company lost revenue, it is actually a legal action in the game. The problem is that it was an unintentional
ability that the game designer(s) never guarded against. Thus it is debatable if this is really a classification
of cheat or of application error (i.e. designers should not blame the player for their mistakes).
2. Application. This is the home of the dedicated and knowledgeable cheat. It covers activities such as decompiling the code and altering it to modify various attributes, for example making walls on the local node see-through, thus giving the cheat an advantage. It need not be so complicated a task: some games give the player character a plain text file which specifies the character abilities, and this can easily be altered to the player's advantage.
3. Network. Here we are concerned with types of cheat which do not depend on the particular game being
played but rather on the interworking of the various nodes in the game. A number of these exist.
(a) Infrastructure. Not really a cheat, this is where game play is disrupted by an attack on the basic
infrastructure e.g. a denial of service attack.
(b) Fixed-delay. Here the cheat delays their outgoing network packets. This results in the other players
receiving events later and the cheat receiving the other player’s events and having some extra time to
react.
(c) Time-stamp. Consider the situation where the cheat alters his local code so that when his character is, say, hit by a bullet while trying to dodge, the code changes the time-stamp on the movement event to be in advance of the movement of the bullet. When the other player nodes reconcile the event timings it will appear as if the cheat managed to dodge the bullet.
(d) Suppressed update. This one enables a form of hiding. The cheat stops their gameplay updates from
being sent. As far as the other player nodes are concerned they have no position and are not rendered.
Of course the cheat has to ensure that periodic updates are sent to ensure they are not dropped from
the game and that they can recommence standard updates when it suits them.
(e) Inconsistency. Rather than skipping update messages this cheat involves sending false data to one or
more players. This allows the cheat to, say, appear to be some distance to the left of their true position.
The cheat has to be careful that the false data and true data merge at a later time in a graceful way or
it will be obvious that something is amiss.
(f) Collusion. Here a number of cheaters help each other by detecting specific information of interest and
passing it on. This is useful to them when the receiver is not supposed to be privy to the information.
For example, cheat A tells fellow cheat B the location of player C but B should not know that given
the current state of the game.
3. Design
At the low level of implementation all computer games follow (for general play) the same basic approach. That
is they flow through a game loop. Generally, multiplayer games running over some sort of network have client
code which follows a modified game loop as in Fig. 4a. Comparing this with the game loop of Fig. 4b, we can see
that two extra phases have been introduced for the approach described here. In particular, the added activity “net
job(s)” in this figure is key to this approach.
It is the use of this game loop that allows the gameplay logic to be executed utilizing client processor cycles
instead of just the server.
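Since Figure 4 is not reproduced here, the following sketch is only one interpretation of the text: a client loop with two extra phases in which 'net jobs' handed out by the server are fetched, executed and returned. All class and method names are assumptions made for illustration.

```python
class StubClient:
    """Placeholder client used only to make the loop runnable; every method
    name here is an assumption, not part of the paper's prototype."""
    def __init__(self):
        self.frames_left = 2
    def read_local_input(self):        return ["move"]
    def send_updates(self, inputs):    pass
    def fetch_net_jobs(self):          return [lambda: "job-result"]
    def return_results(self, results): print("returning", results)
    def receive_frame(self):           return ["world-event"]
    def update_world(self, frame):     pass
    def render(self):                  self.frames_left -= 1
    def running(self):                 return self.frames_left > 0

def game_loop(client):
    # Modified client loop: the two lines marked 'extra' reflect the added
    # phases described in the text (net jobs assigned by the server).
    while client.running():
        inputs = client.read_local_input()
        client.send_updates(inputs)
        jobs = client.fetch_net_jobs()                   # extra phase
        client.return_results([job() for job in jobs])   # extra phase
        client.update_world(client.receive_frame())
        client.render()

game_loop(StubClient())
```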
Having reviewed the available literature describing previous approaches a number of considerations occur.
First, the core idea of bucket synchronization (as described in section 2.2) is appropriate for games as it synergizes with the notion of animation.
Next, it would seem that peer-to-peer mechanisms for persistency are insufficient for a gaming network where
every node must be considered untrustworthy. Therefore game state persistency must remain the remit of a centralized server.
A fully distributed architecture is only applicable to games where all of the players can accommodate the
bandwidth requirements which will increase proportionately with the number of players. This precludes massively
multiplayer games.
The degree of distribution possible inevitably depends on the number of players. Taking an extreme situation
of having just two players connected, one might think that the workload could be shared between them, ensuring
that each does the work pertinent to the other. But this does not guard against cheating, in particular collusion
cheating, so in principle the server should not off-load any of the work unless there is an excess of player nodes
available. This way it can rotate the work around.
Considering this the following fundamental design principles have emerged:
• A client node should not know which player the work relates to.
• A client node should not be given the same task repeatedly.
• A client node should not be given tasks pertaining to the same game world geographical region sequentially.
Figure 4. Game Loops: (a) typical game loop; (b) this game loop.
Figure 5. Frame participation sequences.
• A client node where calculations are done should not know the final destination of the results.
• Bucket synchronization should be used.
• Persistency of the game state is a responsibility of the main server.
• Distribution of the workload can only take place when there are more client nodes than roles to be assigned.
With the above in mind the most basic decision is to utilize bucket synchronization [6]. Henceforth this will be
referred to as frame synchronization.
Having decided on a frame structure for synchronizing network updates, it is appropriate to look at how the
requirement to allow divergent latencies can be accommodated.
Looking at Fig. 5 one can see how different clients can opt into different frequencies of update. It is important
that whatever rate of update is opted for remains consistent, otherwise cheating is made more possible (it may
be possible to allow reductions in the frequency of update if it is sanctioned by the server). Also there has to be a
minimum number of clients in each frame so that the server can allocate jobs to minimise cheating potential.
The potential disturbances caused by players leaving (either gracefully or through a dropped connection) need to be compensated for, and this requires that each task be replicated. As an initial estimated figure the replication
factor will be three. Three is chosen as it is the lowest number that provides both replication and can indicate a
majority in the case of a disputed result (see [10] for a proof of this).
A case exists for using four as the minimum. This is based on the idea that in the case of three, if one node
disconnects then only two remain and no majority decision can be made. In the case of using four as a minimum,
one is designated as a backup. It performs the same calculations but is not included in the decision making.
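A majority check over replicated task results might look like the sketch below; the replication factor of three follows the text, while the representation of a result as a string is an assumption made for illustration.

```python
from collections import Counter

def resolve(results, replication=3):
    """Given the results returned by the nodes that ran the same task,
    accept a value only if a strict majority of the replicas agree."""
    if not results:
        return None, "no results (all replicas disconnected)"
    counts = Counter(results)
    value, votes = counts.most_common(1)[0]
    if votes > replication // 2:
        return value, "accepted by majority"
    return None, "disputed: server must recompute or reassign the task"

print(resolve(["17", "17", "17"]))   # clean agreement
print(resolve(["17", "17", "99"]))   # one faulty or cheating node is outvoted
print(resolve(["17", "99"]))         # a replica dropped and the rest disagree
```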
Furthermore the latency for each client node must be taken into consideration. For example only nodes with a
relatively low latency should be allowed to undertake tasks. This effectively increases the number of players that
are required to be connected before the server can distribute workload.
The overall result is a hybrid model which starts out as server client but with increasing numbers of players will
transform to a more distributed form. In the new scenario the server acts as a (login) gateway to the game and as a
controller - allocating jobs/roles to varying client nodes.
So what types of jobs/roles are required? There has to be a role which is responsible for receiving player updates (e.g. character movement). Then there is a role which is responsible for calculating the effect of all the various updates. Lastly there is the task of informing all concerned players of the results of the cumulative updates.
With these roles on the client nodes the server, in conjunction with the processes running these roles, will act as
one computer.
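The kind of job-controller logic implied here can be sketched as follows; the role names mirror the three roles just described, and the exclusion rules follow the design principles listed earlier, but the data structures and latency threshold are illustrative assumptions.

```python
import random

ROLES = ["receive_updates", "compute_effects", "notify_players"]

def assign_roles(nodes, history, region, max_latency_ms=150):
    """Pick a node for each role, avoiding nodes that are high-latency,
    that did the same role last time, or that just served this region."""
    assignment = {}
    for role in ROLES:
        candidates = [
            n for n in nodes
            if n["latency_ms"] <= max_latency_ms
            and history.get(n["id"], {}).get("last_role") != role
            and history.get(n["id"], {}).get("last_region") != region
            and n["id"] not in assignment.values()
        ]
        if not candidates:
            return None  # not enough spare nodes: the server keeps the work itself
        chosen = random.choice(candidates)
        assignment[role] = chosen["id"]
        history[chosen["id"]] = {"last_role": role, "last_region": region}
    return assignment

nodes = [{"id": i, "latency_ms": random.randint(40, 200)} for i in range(8)]
print(assign_roles(nodes, history={}, region="forest-7"))
```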
Note that what is of interest is the distribution of workload, thus no consideration has been given to the changeover from client-server to distributed server.
4. Conclusions
The prototype which was developed demonstrates that it is possible to redeploy game processing from a centralized server structure to a more de-centralized structure.
In order to quantify the performance of the overall system, it would be necessary to conduct a large scale
simulation and/or mathematical analysis. This would need to measure the difference between the client-server
approach and the approach outlined here in these regards:
1. Required server processing power.
2. Required client processing power.
3. Server bandwidth requirement.
4. Client bandwidth requirement.
5. Range of client bandwidths that can be accommodated.
In each case the measurements would need to use the same game scenario and to be on the basis of a defined
number of client nodes being connected. Indeed they should be repeated for each of a range of such deployments.
As referred to in 2.1 the notion of using mobile agents [4] could be particularly useful for autonomous characters. These need to be controlled by the server even in the distributed model presented. With a secure environment
to relocate to this responsibility could be de-centralized.
References
[1] N. E. Baughman and B. N. Levine. Cheat-proof playout for centralized and distributed online games. INFOCOM 2001,
pages 104–113, 2001.
[2] K. chui Kim, I. Yeom, and J. Lee. A hybrid mmog server architecture. IEICE TRANSACTIONS on Information and
Systems, E87-D(12):2706–2713, December 2004.
[3] G. F. Coulouris, J. Dollimore, and T. Kindberg. Distributed systems : concepts and design. Addison-Wesley, Wokingham, England ; Reading, Mass., 3rd edition, 2001.
[4] A. ElRhalibi and M. Merabti. Agents-based modeling for a peer-to-peer mmog architecture. ACM Computers in
Entertainment, 3(2), April 2005. Article 3B.
[5] C. Gauthier-Dickey, D. Zappala, and V. Lo. A fully distributed architecture for massively multiplayer online games.
ACM SIGCOMM 2004 Workshop on Network and System Support for Games, pages 2706–2713, September 2003. Year
of Publication: 2004.
[6] C. Gauthier-Dickey, D. Zappala, and V. Lo. A fully distributed architecture for massively multiplayer online games.
ACM SIGCOMM 2004 Workshop on Network and System Support for Games, pages 2706–2713, September 2003. Year
of Publication: 2004.
[7] C. Gauthier-Dickey, D. Zappala, V. Lo, and J. Marr. Low latency and cheat-proof event ordering for peer-to-peer games.
Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video
table of contents, June 2004.
[8] A. S. John and B. N. Levine. Supporting p2p gaming when players have heterogeneous resources. Proceedings of
the international workshop on Network and operating systems support for digital audio and video, pages 1–6, 2005.
NOSSDAV’05.
[9] B. Knutsson, H. Lu, W. Xu, and B. Hopkins. Peer-to-peer support for massively multiplayer games. Proceedings of
IEEE INFOCOM’04, March 2004.
[10] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, pages 382–401, July 1982.
[11] H. Lu, B. Knutsson, M. Delap, J. Fiore, and B. Wu. The design of synchronization mechanisms for peer-to-peer
massively multiplayer games. Technical Report Penn CIS Tech Report MS-CIS-04-xy.pdf, Computer and Information
Science, University of Pennsylvania, 2004.
[12] B. Schneier. Applied Cryptography. John Wiley & Sons, Inc, second edition, 1996.
[13] M. Terrano and P. Bettner. 1500 archers on a 28.8: Network programming in age of empires and beyond. Proceedings
of the 15th Games Developers Conference, March 2001.
A Comparative Analysis of Steganographic Tools
Abbas Cheddad, Joan Condell, Kevin Curran and Paul McKevitt
School of Computing and Intelligent Systems, Faculty of Engineering
University of Ulster. Londonderry, Northern Ireland, United Kingdom
Emails: {cheddad-a, j.condell, kj.curran, p.McKevitt}@ulster.ac.uk
Abstract
Steganography is the art and science of hiding data in a transmission medium. It is a sub-discipline
of security systems. In this paper we present a study carried out to compare the performance of
some common Steganographic tools distributed online. We focus our analysis on systems that use
digital images as transmission carriers. A number of these systems do not support embedding images, allowing only text embedding; therefore, we constrained the tools to those which embed image files. Visual inspection and statistical comparison methods are the main
performance measurements we rely on. This study is an introductory part of a bigger research
project aimed at introducing a robust and high payload Steganographic algorithm.
Keywords: Steganography, Image Processing, Security Systems, Statistics.
1 Introduction
In the realm of this digital world Steganography has created an atmosphere of corporate vigilance that
has spawned various interesting applications. Contemporary information hiding is due to the author
Simmons [Simmons, 1984] for his article titled “The prisoners’ Problem and the Subliminal Channel”.
More recently Kurak and McHugh [Kurak and McHugh, 1992] published their work which resembles
embedding into the 4LSBs (Least Significant Bits). They discussed image downgrading and
contamination which is known now as Steganography. Steganography is employed in various useful
applications e.g., copyright control of materials, enhancing robustness of image search engines and
Smart IDs where individuals' details are embedded in their photographs. Other applications are video-audio synchronization, companies' safe circulation of secret data, TV broadcasting, Transmission Control Protocol and Internet Protocol (TCP/IP) packets [Johnson and Jajodia, 1998], embedding checksums [Bender et al., 2000], etc. In a very interesting way Petitcolas [Petitcolas, 2000]
demonstrated some contemporary applications. One of these was in Medical Imaging Systems where a
separation is considered necessary for confidentiality between patients’ image data or DNA sequences
and their captions e.g., Physician, Patient’s name, address and other particulars. A link however, must
be maintained between the two. Thus, embedding the patient’s information in the image could be a
useful safety measure and helps in solving such problems. For the sake of providing a fair evaluation
of the selected software tools, we restricted our experiments to embedding images rather than text.
The location of the message in the image can vary. The message may be spread evenly over the
entire image or may be introduced into areas where it may be difficult to detect a small change such as
a complex portion in the image. A complex area is also known as an area of high frequency in which
there are considerable changes in colour intensity. Embedding can be performed in the image spatial
domain or in the frequency domain. Embedding in the spatial domain can be achieved through altering
the least significant bits of the bytes of image pixel values. This process can be in a sequential fashion
or in a randomised form. Algorithms based on this method have a high payload, however the method
is fragile, prone to statistical attacks, and sometimes visual attacks can suffice. The second type of method, the frequency domain method, is based on embedding in the coefficients of the frequency domain (i.e., the Discrete Cosine Transformation (DCT) or Discrete Wavelet Transformation (DWT)). This
type of technique is more robust with regard to common image processing operations and lossy
compression. Another type of method is that of adaptive Steganography which adapts the message
embedding technique to the actual content and features of the image. These methods can for example
avoid areas of uniform colour and select pixels with large local standard deviation. Edge embedding
can also be used alongside adaptive Steganography.
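As an illustration of the spatial-domain approach just described, the sketch below hides a byte string in the least significant bits of a greyscale image held as a flat list of pixel values; it is a generic LSB example and not the algorithm of any particular tool discussed in the next section.

```python
def embed_lsb(pixels, payload: bytes):
    """Write each payload bit into the LSB of successive pixel values."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit   # clear the LSB, then set it to the bit
    return stego

def extract_lsb(pixels, n_bytes: int):
    """Read n_bytes back out of the LSBs (most significant bit first)."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

cover = [120, 121, 122, 123] * 20          # toy 'image' of 80 pixels
stego = embed_lsb(cover, b"hi")            # 2 bytes need only 16 pixels
print(extract_lsb(stego, 2))               # b'hi'
```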
2. Steganographic Tools
The tools that we used for this study are detailed and discussed below. Sources are shown as
necessary.
2.1 Hide and Seek (V. 4.1)
Hide and Seek is one of the older methods of Steganography [Wayner, 2002]. It uses a common
approach and is relatively easy to apply in image and audio files [Johnson and Katzenbeisser, 2000].
Steganography by this method is carried out by taking the low order bit of each pixel and using it to
encode one bit of a character. It creates some noise in the image unless a greyscale image is used.
When using Hide and Seek with colour GIFs noise is very obvious. Although it has been asserted that
greyscale GIFs do not display any of the artefacts or bad image effects associated with 8-bit colour
images which have undergone Steganography, our experiment shows an obvious random salt and
pepper like noise on the cover image.
Hide and Seek can be used on 8-bit colour or 8-bit black and white GIF files that are 320 by 480
pixels in size (the standard size of the oldest GIF format) [Wayner, 2002]. There are 19200
(320*480/8) bytes of space available in this GIF image which gets rounded down in practice to 19000
for safe dispersion. In version 4.1 if the cover image is larger than allowed the stego-image is cropped
or cut to fit the required size [Johnson et al, 2001]. When an image contains a message it should not be
resized because if it has to be reduced part of the message bits will be lost. If the image is too small it
is padded with black space. There is also a version 5.0. It works with a wider range of image sizes.
However, this version of Hide and Seek also uses a restricted range of image sizes. The images must
fit to one of these sizes exactly (320* 200, 320* 400, 320* 480, 640* 400 and 1024* 768) [Johnson et
al, 2001]. In version 5 if the image exceeds the maximum allowed which is 1024*768 an error
message is returned. If the image is smaller than the minimum size necessary the image containing the
message is padded out with black space. The padded areas are added before the message is embedded
and are therefore also used as areas in which to hide the message [Johnson et al, 2001]. But if the
padded area is removed the message cannot be recovered fully. These characteristics of Hide and Seek
stego-images lead searchers/crackers to the fact that a hidden message exists. Hide and Seek 1.0 for
Windows 95 has no size limit restrictions and uses an improved technique for information hiding
however it can still only be used on 8-bit images with 256 colours or greyscale [Johnson et al, 2001].
BMP images are used with this version instead of GIF images because of licensing issues with GIF
image compression [Johnson et al, 2001].
A user-chosen key can be inserted into a pseudo-random number generator which will determine random numbers that indicate bytes in the image where the least significant bit is to be changed [Wayner, 2002]. This makes the system more secure as it has two layers of security. The positions in
which the message bits are hidden are not in fact random but do follow some sort of pattern. An 8-byte
header on the message controls how the message data is dispersed. The first two bytes indicate the
length of the message. The second two are a random number key. The key is chosen at random when
the message is inserted into the image. The key is firstly inserted into the random number generator
[Wayner, 2002]. In Hide and Seek 4.1 there is a built in C code random number generator. A
cryptographically secure random number generator could also be used to increase security or IDEA
could be used to encrypt the random numbers using a special key. The third pair of bytes is the version
of Hide and Seek used. The fourth pair of bytes is used to complete the eight byte block which is
necessary for the IDEA cipher [Wayner, 2002]. The 8-byte block is encrypted using the IDEA cipher
which has an optional key and is then stored in the first 8 bytes of the image. If the key is not known
the header information cannot be understood and the dispersion of the data in the image cannot be
found.
Stego-images will have different properties depending on the version of Hide and Seek used. In
version 4.1 and version 5 all palette entries in 256 colour images are divisible by four for all bit values
[Johnson et al, 2001]. Greyscale stego-images have 256 triples. They range in sets of four triples from
0 to 252 with incremental steps of 4 (0, 4, 8,…, 248, 252). This can be detected by looking at the
whitish value which is 252 252 252. This signature is unique to Hide and Seek [Johnson et al, 2001],
[Johnson and Jajodia, 1998]. Later versions of Hide and Seek do not produce the same predictable
type of palette patterns as versions 4.1 and 5.0 [Johnson et al, 2001; Johnson and Jajodia, 1998].
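A check for the palette signature described above (all entries divisible by four, topping out at the whitish 252, 252, 252 triple) could look like the following sketch; representing the palette as a list of RGB triples is an assumption about how it would be read from the GIF file.

```python
def looks_like_hide_and_seek(palette):
    """Return True if every palette component is a multiple of 4 (capped at
    252) and the tell-tale whitish (252, 252, 252) entry is present."""
    if not palette:
        return False
    multiples_of_4 = all(c % 4 == 0 and c <= 252 for rgb in palette for c in rgb)
    has_whitish_252 = (252, 252, 252) in [tuple(rgb) for rgb in palette]
    return multiples_of_4 and has_whitish_252

# A greyscale stego palette of the kind described: sets of four triples
# running 0, 4, 8, ..., 252 (256 entries in total).
suspect = [(v, v, v) for v in range(0, 256, 4) for _ in range(4)]
normal = [(v, v, v) for v in range(256)]
print(looks_like_hide_and_seek(suspect))  # True
print(looks_like_hide_and_seek(normal))   # False
```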
The DOS commands for the Hide and Seek software are as follows:
• hide <infile.ext> <Cover.gif> [key]
• seek <Stego.gif> <outfile.ext> [key]
2.2 S-Tools (V. 4)
The S-Tools package was written by Andy Brown [Wayner, 2002]. Version 4 can process image or
sound files using a single program (S-TOOLS.EXE). S-Tools involves changing the least significant
bit of each of the three colours in a pixel in a 24-bit image [Wayner, 2002] for example a 24-bit BMP
file [Johnson et al, 2001]. The problem with 24-bit images is that they are not commonly used on the
web and tend to stand out (unlike GIF, JPEG, and PNG). This feature is not helpful to Steganography.
It involves a pre-processing step to reduce the number of colour entries by using a distance
measurement to identify neighbour colours in terms of intensity. After this stage each colour of the
dithered image would be associated with two palette entries one of which will carry the hidden data.
The software for S-Tools can reduce the number of colours in the image to 256 [Wayner, 2002]. The
software uses the algorithm developed by Heckbert [Heckbert, 1982] to reduce the number of colours
in an image in a way that will not visually disrupt the image [Wayner, 2002; Martin et al, 2005]. The
algorithm plots all the colours in three dimensions (RGB). It searches for a collection of n boxes,
which contains all of the colours in one of the boxes. The process starts with the complete
256*256*256 space as one box. The boxes are then recursively subdivided by splitting them in the
best possible way [Wayner, 2002]. Splitting continues until there are n boxes representing the space.
When it is finished the programme chooses one colour to represent all the colours in each box. The
colour may be chosen in different ways: the centre of the box, the average box colour or the average of
the pixels in the box. S-Tools as well as other tools based on LSBs in the spatial domain take for
granted that least significant bits of image data are uncorrelated noise [Westfield and Pfitzmann,
1999]. The system interface is easy to use. It supports a drag and drop method to load images. Once
the cover image is dragged in, the system will advise the user on how much data in bytes the image
can hold.
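The capacity figure reported by the tool can be reasoned about as in the sketch below, assuming one bit per colour channel per pixel of a 24-bit image; the helper functions are ours and are not S-Tools' actual code.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=1):
    """Rough payload capacity of plain LSB embedding in an RGB image."""
    return (width * height * channels * bits_per_channel) // 8

def embed_pixel(rgb, three_bits):
    """Replace the LSB of each of R, G and B with one payload bit."""
    return tuple((c & ~1) | b for c, b in zip(rgb, three_bits))

# A 640x480 24-bit BMP can carry roughly 115 KB at 1 bit per channel.
print(lsb_capacity_bytes(640, 480))            # 115200
print(embed_pixel((200, 100, 50), (1, 0, 1)))  # (201, 100, 51)
```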
2.3 Stella (V. 1.0)
The Stella program derives its name from the Steganography Exploration Lab, which is located at the
University of Rostock. Its embedding process exploits the visually low prioritised chrominance
channels; the YUV-colour system is used. Here, the embedding algorithm considers only one channel
and works as follows:
1. Consider the chrominance value of a given pixel.
2. Read a bit from the secret message.
3. To embed a “0”, decrease the chrominance value of the pixel by one.
4. To embed a “1”, increase the chrominance value of the pixel by one.
5. Go to the next pixel.
It can be assumed that these slight changes of the chrominance values are smaller than a given JND
(Just Noticeable Difference).
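A sketch of the five-step loop above is given below; the RGB-to-YUV conversion constants are the standard ITU-R BT.601 ones and are an assumption, since the paper does not state which conversion Stella uses.

```python
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def embed_bits_in_u(pixels, bits):
    """Step through the pixels, nudging the U chrominance value by -1 for a
    '0' bit and +1 for a '1' bit, following the five-step description above."""
    stego = []
    for (r, g, b), bit in zip(pixels, bits):
        y, u, v = rgb_to_yuv(r, g, b)
        u = u - 1 if bit == 0 else u + 1   # the +/-1 change assumed below the JND
        stego.append((y, u, v))            # in practice converted back to RGB
    return stego

pixels = [(120, 60, 200), (10, 240, 30), (90, 90, 90)]
print(embed_bits_in_u(pixels, [1, 0, 1]))
```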
2.4 Hide in Picture (HIP)
HIP (v 2.1) was created by Davi Tassinari de Figueiredo in 2002. Hide In Picture (HIP) uses bitmap
images. If the file to be hidden is large, it may be necessary to modify more than a single bit (LSB)
from each byte of the image, which can make this difference more visible. With 8-bit pictures, the
process is a little more complicated, because the bytes in the picture do not represent colour intensities,
but entries in the palette (a table of at most 256 different colours). HIP chooses the nearest colour in
the palette whose index contains the appropriate least-significant bits. The HIP header (containing
information for the hidden file, such as its size and filename) and the file to be hidden are encrypted
with an encryption algorithm, using the password given, before being written in the picture. Their bits
are not written in a linear fashion; HIP uses a pseudo-random number generator to choose the place to
write each bit. The values given by the pseudo-random number generator depend on your password, so
it is not possible for someone trying to read your secret data to get the hidden file (not even the
encrypted version) without knowing the password.
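The password-dependent placement can be illustrated as follows; deriving the seed with SHA-256 and using Python's random module are our assumptions and do not reflect HIP's actual generator or encryption algorithm.

```python
import hashlib
import random

def bit_positions(password: str, n_pixels: int, n_bits: int):
    """Derive a pseudo-random, password-dependent sequence of pixel indices.
    Without the password the same sequence cannot be reproduced, so the
    scattered bits cannot be collected back into the (encrypted) file."""
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest(), "big")
    rng = random.Random(seed)
    positions = list(range(n_pixels))
    rng.shuffle(positions)            # a permutation avoids reusing a pixel
    return positions[:n_bits]

print(bit_positions("correct horse", 1000, 8))
print(bit_positions("wrong guess", 1000, 8))   # a completely different set
```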
2.5 Revelation
Revelation was launched in 2005 by Sean Hamlin. It was entirely coded in Java, developed in the
Eclipse IDE. It operates in the same manner as the previous methods in terms of LSB embedding. The
basic logic behind the technique is matching LSB coding, which leaves a gray value unaltered if
its LSB matches the bit to be hidden. Otherwise a colour indexed as 2i will be changed to 2i+1 if the
embedded bit is 1, or 2i+1 is shifted back to 2i in case of embedding a 0. Although the software’s
author claimed the use of a smart embedding method and the Minimum Error Replacement (MER)
algorithm to obtain a more natural Stego Image, the latter is prone to first order statistical attack as
shown in Figure 3.
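The matching-LSB rule described above (2i and 2i+1 forming a pair) can be written in a few lines; this is our reading of the description rather than Revelation's source code.

```python
def embed_matching_lsb(gray, bit):
    """Leave the gray value alone if its LSB already matches the bit;
    otherwise move 2i -> 2i+1 (to embed a 1) or 2i+1 -> 2i (to embed a 0).
    This pairing of values (2i, 2i+1) is what produces the histogram
    'pair effect' discussed in Section 3."""
    if gray & 1 == bit:
        return gray
    return gray + 1 if bit == 1 else gray - 1

print(embed_matching_lsb(100, 1))  # 101: even value bumped up to embed a 1
print(embed_matching_lsb(101, 1))  # 101: LSB already matches, unchanged
print(embed_matching_lsb(101, 0))  # 100: odd value dropped back to embed a 0
```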
3 Steganalysis
There are two stages involved in breaking a Steganographic system: detecting that Steganography has
been used and reading the embedded message [Zollner et al, 1998]. Steganalysis methods should be
used by the Steganographer in order to determine whether a message is secure and consequently
whether a Steganographic process has been successful. Statistical attacks can be carried out using
automated methods. A stego-image should have the same statistical characteristics as the carrier so
that the use of a steganographic algorithm cannot be detected [Westfield and Pfitzmann, 1999].
Therefore a potential message can be read from both the stego-image and the carrier and the message
should not be statistically different from a potential message read from a carrier [Westfield and
Pfitzmann, 1999]. If it were statistically different the Steganographic system would be insecure.
Automation can be used to investigate pixel neighbourhoods and determine if an outstanding pixel is
common to the image, follows some sort of pattern or resembles noise. A knowledge base of
predictable patterns can be compiled and this can assist in automating the detection process [Johnson
and Jajodia, 1998]. Steganalysis tools can determine the existence of hidden messages and even the
tools used for embedding. Attacks on Steganography can involve detection and/or destruction of the
embedded message. Multiple tools have been introduced to perform Steganalysis; among them are the Chi-Square statistical test, the ANOVA test, StegSpy¹, StegDetect², higher-level statistical tests³, etc. The adopted methods of evaluation in this study are as follows:
¹ www.spy-hunter.com
² The algorithm described in http://www.citi.umich.edu/u/provos/papers/detecting.pdf and the software available at http://www.outguess.org/
³ http://www.cs.dartmouth.edu/~farid/publications/tr01.html
3.1 Visual Inspection of the Image
This evaluation method is based on visual inspection of the image. The question is how vulnerable the
different Steganographic techniques are to detection through visual inspection of the image for telltale
distortion.
3.2 Statistical Analysis
There are two main types of statistical analysis methods investigated here for comparative analysis.
These are the peak signal-to-noise ratio and image histograms. They are outlined below.
x Peak-Signal-to-Noise Ratio
As a performance measurement for image distortion, the well known Peak-Signal-to-Noise Ratio
(PSNR) which is classified under the difference distortion metrics can be applied on the stego images.
It is defined as:

$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{C_{\max}^{2}}{\mathrm{MSE}}\right) \qquad (1)$$

where MSE denotes the Mean Square Error, given as:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\left(S_{xy}-C_{xy}\right)^{2} \qquad (2)$$

and $C_{\max}$ holds the maximum value in the image (for example, $C_{\max}=1$ in double precision intensity images and $C_{\max}\le 255$ in 8-bit unsigned integer intensity images); $x$ and $y$ are the image coordinates, $M$ and $N$ are the dimensions of the image, $S_{xy}$ is the generated stego image and $C_{xy}$ is the cover image.
PSNR is often expressed on a logarithmic scale in decibels (dB). PSNR values falling below 30dB
indicate a fairly low quality (i.e., distortion caused by embedding can be obvious); however, a high
quality stego should strive for 40dB and above.
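Equations (1) and (2) translate directly into the short function below; pure-Python lists are used so that the sketch has no dependencies, and C_max = 255 is assumed for 8-bit images.

```python
import math

def psnr(cover, stego, c_max=255):
    """PSNR in dB between two equal-sized images given as 2-D lists (Eqs. 1-2)."""
    m, n = len(cover), len(cover[0])
    mse = sum((s - c) ** 2
              for row_s, row_c in zip(stego, cover)
              for s, c in zip(row_s, row_c)) / (m * n)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * math.log10(c_max ** 2 / mse)

cover = [[100, 100], [100, 100]]
stego = [[101, 100], [100, 99]]      # two pixels changed by +/-1
print(round(psnr(cover, stego), 2))  # about 51.14 dB, well above the 40 dB mark
```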
• Image Histogram
Histograms are graphics commonly used to display data distributions for quantitative variables. In the case of images, these variables or frequencies are the image intensity values. In this study we trace any abnormalities in the stego image's histogram.
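One simple way of looking for the pair effect discussed later (and visible in Figure 4) is to compare the frequencies of each pair of values (2i, 2i+1), which is also the intuition behind the Chi-Square attack mentioned above; the synthetic data below is an illustrative assumption, not one of the test images used in this study.

```python
import random

def histogram(pixels, levels=256):
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def pair_effect_score(counts):
    """Average relative difference between the frequencies of each value pair
    (0,1), (2,3), ...: LSB embedding tends to even these pairs out, so an
    unusually low score on a natural-looking image arouses suspicion."""
    diffs = []
    for i in range(0, len(counts), 2):
        total = counts[i] + counts[i + 1]
        if total:
            diffs.append(abs(counts[i] - counts[i + 1]) / total)
    return sum(diffs) / len(diffs) if diffs else 0.0

natural = [random.randint(0, 255) & ~1 for _ in range(10_000)]  # skewed to even values
stego = [p | random.randint(0, 1) for p in natural]             # LSBs overwritten
print(round(pair_effect_score(histogram(natural)), 3))  # close to 1.0
print(round(pair_effect_score(histogram(stego)), 3))    # close to 0.0: pairs evened out
```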
3.3 Comparative Analysis and Results
The following table (Table 1) tabulates the different PSNR values produced by the aforementioned software applied to the images shown in Figure 1. Figure 2 and Figure 3 show the output of each of the tools
and Figure 4 depicts the pair effect that appears on Stego images generated by the Revelation
software. The authors suspect that the pair effect was caused by adopting the sequential embedding
concept, which creates new close colours to the existing ones or reduces the frequency difference
between adjacent colours.
Figure 1: Images used to generate Table 1. (Left to right) Set A: cover image Boat (321x481) and the secret image Tank (155x151). Set B: cover image Lena (320x480) and secret image (77x92).
Table 1: Summary of performance of the different software reported in this study.
Software            PSNR (Set A)   PSNR (Set B)   Visual Inspection
Hide and Seek       18.608         22.7408        Very clear grainy noise in the stego image, which renders it the worst performer in this study.
Hide-in-Picture     23.866         28.316         Little noise; accepts only 24-bit bmp files. Creates additional colour palette entries: in this case the original Boat image has 32 colours and the generated stego augmented the number to 256 by creating new colours.
Stella              26.769         16.621         Little noise; works only with 24-bit images.
S-Tools             37.775         25.208         No visual evidence of tamper.
Revelation          23.892         24.381         No visual evidence of tamper, but a pair effect appears on the histogram of some outputs.
Figure 2: Set A. Stego images of each tool (Hide and Seek, Hide-in-Picture, Stella, S-Tools, Revelation) alongside the original.
Figure 3: Set B. Stego images of each tool (Hide and Seek, Hide-in-Picture, Stella, S-Tools, Revelation) alongside the original.
Figure 4: Revelation leaves a very obvious pair effect on the histogram (gray value on the horizontal axis, frequency on the vertical axis). (Top) Original image histogram and (Bottom) Stego image generated by the Revelation tool.
4. Conclusion
We have presented a comparative study of some steganographic tools distributed online. The Revelation tool does a good job of hiding any visual evidence of tampering with the cover image, but the histogram of its generated output reveals traces left by the tool which draw suspicion to the stego image. Hide-in-Picture earned a good PSNR value, but the cover image was slightly distorted after the embedding. Hide and Seek shows very clear grainy salt-and-pepper noise on the stego image; the noise appears random. S-Tools shows the better performance when the two factors (i.e., PSNR and visual inspection) are taken into consideration. Stella leaves prints at the end of the stego file which can be picked up easily by steganalysis software such as the open-source application we tried (ImageHide Hidden Data Finder v0.2, see "Internet Resources" in the References section). None of the above tools can resist image compression, which is why all their input image files were of a lossless type (i.e., BMP, GIF). The authors conclude that the S-Tools algorithm has the highest performance and that its software provides the better graphical interface. It is worth noting that steganographic tools have since undergone significant improvement by exploiting the JPEG compression method. For example, the F5 and OutGuess algorithms (see references) are strong enough to resist major attacks, and their resulting stego images are of high quality (i.e., high PSNR). This study is intended to evaluate tools that act in the spatial domain, so a discussion of F5 and OutGuess is outside the scope of this evaluation.
References
[Simmons, 1984] Simmons, G. J., (1984). The Prisoners’ Problem and the Subliminal Channel.
Proceedings of CRYPTO83 - Advances in Cryptology, August 22-24, 1984, pp. 51-67.
[Kurak, 1992] Kurak, C. and McHugh, J., (1992). A cautionary note on image downgrading.
Proceedings of the Eighth Annual Computer Security Applications Conference. 30 Nov-4 Dec 1992
pp. 153-159.
[Johnson, and Jajodia, 1998] Johnson, N. F. and Jajodia, S., (1998). Exploring Steganography: Seeing
the Unseen. IEEE Computer, 31 (2): 26-34, Feb 1998.
[Bender et al., 2000] Bender, W., Butera, W., Gruhl, D., Hwang, R., Paiz, F.J. and Pogreb, S., (2000).
Applications for Data Hiding. IBM Systems Journal, 39 (3&4): 547-568.
[Petitcolas, 2000] Petitcolas, F.A.P., (2000). “Introduction to Information Hiding”. In: Katzenbeisser,
S and Petitcolas, F.A.P (ed.) (2000) Information hiding Techniques for Steganography and Digital
Watermarking. Norwood: Artech House, INC.
[Wayner, 2002] Wayner, P. (2002). Disappearing Cryptography. 2nd ed. USA: Morgan Kaufmann
Publishers.
[Johnson and Katzenbeisser, 2000] Johnson and Katzenbeisser, (2000). “A survey of Steganographic
techniques”. In: Katzenbeisser, S and Petitcolas, F.A.P (ed.) (2000) Information hiding Techniques
for Steganography and Digital Watermarking. Norwood: Artech House, INC.
[Johnson et al., 2001] Johnson Neil F., Zoran Duric, Sushil Jajodia, Information Hiding,
Steganography and Watermarking - Attacks and Countermeasures, Kluwer Academic Publishers,
2001.
[Heckbert, 1982] Heckbert Paul, Colour Image Quantization for Frame Buffer Display. In Proceedings
of SIGGRAPH 82, 1982.
[Martin et al, 2005] Martin, A., Sapiro, G. and Seroussi, G., (2005). Is Image Steganography natural?.
IEEE Trans on Image Processing, 14(12): 2040-2050, December 2005.
[Westfield and Pfitzmann, 1999] Andreas Westfield and Andreas Pfitzmann, Attacks on
Steganographic Systems Breaking the Steganography Utilities EzStego, Jsteg, Steganos and S-Tools
and Some Lessons Learned. Dresden University of Technology, Department of Computer Science,
Information Hiding, Third International Workshop, IH'99 Dresden Germany, September / October
Proceedings, Computer Science 1768. pp. 61- 76, 1999.
[Zollner et al, 1998] Zollner J., H. Federrath, H. Klimant, A. Pfitzmann, R. Piotraschke, A. Westfeld,
G. Wicke, G. Wolf, Modelling the Security of Steganographic Systems, Information Hiding, Second
International Workshop, IH'98 Portland, Oregon, USA, Proceedings, Computer Science 1525, pp. 344-354, April 1998.
Internet Resources
[Hide and Seek]:
ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/cypherpunks/steganography/hdsk41b.zip
[S-Tools]:
ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/s-tools4.zip
[Stella]:
http://wwwicg.informatik.uni-rostock.de/~sanction/stella/
[Hide in Picture]: http://sourceforge.net/projects/hide-in-picture/
[Revelation]:
http://revelation.atspace.biz/
[ImageHide Hidden Data Finder]: http://www.guillermito2.net
[F5]: http://wwwrn.inf.tu-dresden.de/~westfeld/f5.html
[OutGuess]: http://www.outguess.org/
Session 2
Computing Systems
A Review of Skin Detection Techniques for Objectionable
Images
Wayne Kelly1, Andrew Donnellan1, Derek Molloy2
1 Department of Electronic Engineering, Institute of Technology Tallaght Dublin, Tallaght,
Dublin 24, Ireland.
[email protected]
2 School of Electronic Engineering, Dublin City University.
[email protected]
Abstract
With the advent of high speed Internet connections and 3G mobile phones, the relative ease of
access to unsuitable material has become a major concern. Real time detection of unsuitable
images communicated by phone and Internet is an interesting academic and commercial problem.
This paper is in two parts. Part I compares and contrasts the most significant skin detection
techniques, feature extraction techniques and classification methods. Part II gives an analysis of
the significant test results. This paper examines twenty-nine of the most recent techniques along
with their specific conditions, mathematical foundations and their pros and cons. Finally, this
paper concludes by identifying future challenges and briefly summarizes the proposed features of
an optimal system for future implementation.
Keywords: Skin Detection, Objectionable Image, Feature Extraction, Texture Analysis.
1 Introduction
In early 2004 the Irish government demanded that its mobile phone networks must take responsibility
for material transmitted across their systems and implement security precautions to prevent the
distribution of objectionable material to minors. This was after two cases concerning the transmission
of pornographic images to teenagers. The first incident was when sexually explicit images, showing a
14 year old girl, were found to be circulating amongst school students [43]. The second, when a
teenage girl received pornographic images from an unidentified phone number [42].
The development of objectionable image detection systems has been instigated throughout the world
by events such as the incidents which took place in Ireland in 2004. The identification processes
generally follow the format outlined in Figure 1, with each paper varying in its implementation
technique.
The structure of this report is as follows: Section 2 gives a comparison of the most significant skin
detection techniques, while Sections 3 and 4 give comparisons of the feature extraction techniques and
classification methods respectively. Section 5 analyses the most significant test results. In Section 6 a
discussion on future challenges and a summarized proposal for an implementation technique is given.
Figure 1: General Objectionable Image Detection System (Input Image → Skin Detection → Feature Extraction → Image Classification → Objectionable Image or Benign Image)
2 Skin Detection
The detection of skin is an indication of the presence of a human limb or torso within a digital image.
In recent times various methods of identifying skin within images have been developed. This section
gives an overview of the main skin detection methods implemented for the detection of objectionable
images.
2.1 Colour Spaces for Skin Detection
Colour space can be described as the various ways to mathematically represent, or store, colours.
Choosing a colour space for skin detection has become a contentious issue within the image
processing world. Shin [32] found that colour space transformation was unnecessary in skin detection
as RGB gave the best results and that the luminance component gave no improvement so could be
ignored. Jayaram [35] found that the best performance was obtained by converting the pixels to SCT,
HSI or CIE-LAB and using the luminance component did improve the results. Albiol [36] declares
that if an optimum skin detector is designed for every colour space, then their performance will be the
same. Gomez [34] states that for pixel based skin detection there is seldom an appropriate colour
model for indoor and outdoor images, but does show that a combination of colour spaces can improve
the performance (E of YES, red/green and H of HSV).
2.1.1 Basic Colour Space (RGB, Log-opponent RGB)
One of the most commonly used methods for representing pixel information of a digital image is the
RGB (Red, Green, Blue) colour space. In this colour space levels of red, green and blue light are
combined to produce various colours. Jones and Rehg [5] identified 88% of pixels correctly while
using RGB for simplicity and speed, as most web images use RGB colour space. It is also stated here
that the accuracy could be increased if another colour space was used. RGB colour space has been
used extensively in the detection of objectionable images [15][16][21].
Another form of RGB colour space is log-opponent RGB (IRgBy) [31], which is a logarithmic transform of the RGB colour space. IRgBy relies on the blood (red) and melanin (yellow, brown) properties of human skin for detection, which is based around the hue and saturation components of the colour space [1]. IRgBy does not contain a hue or saturation component, so these must be calculated separately. As the HSV and HSI colour spaces contain these components, the IRgBy colour space has been largely ignored as a skin detection colour space, but it has been used in objectionable image detection and has shown poor results (70% accuracy with a colour histogram) [7]. Comparisons of its skin detection capabilities have been made by Chan [3], showing it to be less accurate than HSV.
2.1.2 Perception Colour Space (HSV, HSI)
The HSV (Hue, Saturation, Value), also referred to as HSB (Hue, Saturation, Brightness), colour space
is a nonlinear transform of RGB and can be referred to as being a perceptual colour space due to its
similarity to the human perception of colour. Hue is a component that describes pure colour (e.g. pure
yellow, orange or red), whereas saturation gives a measure of the degree to which a pure colour is diluted by white light [33]. Value attempts to represent brightness along the grey axis (e.g. white to black), but as brightness is subjective it is difficult to measure [33]. Along with RGB, HSV is one of the most commonly used colour spaces for skin detection [10], and it is sometimes said to give better results [3][6][25]. Q. Zhu et al [20] also note that dropping the Value component and using only the Hue and Saturation components still allows for the detection of 96.83% of the skin pixels.
HSI (Hue, Saturation, Intensity), also referred to as HSL (Hue, Saturation, Luminance), is another
perceptual colour space that gives good skin detection results. Like Value in HSV, Intensity is another
representative of grey level, but decoupled from the colour components (Hue and Saturation). The
HSI colour space was used by Wang [24] as part of a content-based approach, stating that the skin and
background pixels can be better differentiated using HSI rather than RGB.
2.1.3 Orthogonal Colour Space (YCbCr, YIQ, YUV)
Often associated with digital videos the YCbCr colour space is one of the most popular colour spaces
for skin detection [17][18][29]. YCbCr is a colour space where the luminance (Y) component and the
two chrominance components (Cb Cr) are stored separately. Luminance is a representation of
brightness in an image and chrominance defines the two attributes of a colour: hue and saturation. Y is
another representation of brightness and is obtained with a weighted sum of RGB, whereas Cb and Cr
are obtained by subtracting the luminance (Y) from the Blue and Red components of RGB [29]. Due
to the fact that the luminance and chrominance components are stored separately, YCbCr is greatly
suited to skin detection and Shin [32] found that YCbCr gives the best skin detection results compared
to seven other colour space transformations.
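As an illustrative sketch only (the BT.601-style weights below are a common choice and an assumption here, not values taken from the reviewed papers), RGB pixels can be mapped to YCbCr as follows:

import numpy as np

def rgb_to_ycbcr(rgb):
    # rgb: array of 8-bit pixels with shape (..., 3)
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b      # luminance: weighted sum of R, G, B
    cb = 128.0 + 0.564 * (b - y)               # blue-difference chrominance
    cr = 128.0 + 0.713 * (r - y)               # red-difference chrominance
    return np.stack([y, cb, cr], axis=-1)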
YUV and YIQ are colour spaces normally associated with television broadcasts; however, they have
been used in digital image processing. Similar to YCbCr the three components are in the form of one
luminance (Y) and two chrominance (UV or IQ), where IQ and UV represent different coordinate
systems on the same plane. Although both colour spaces have been used independently for skin
detection [30], a combination of both YUV and YIQ together is used in objectionable image detection
[4][8], giving poor results compared to the original RGB colour space.
2.2 Skin Detection by Colour
Pixel colour classification can be complicated and there have been many suggested methods for
classifying pixels as skin or non-skin colour in an attempt to achieve the optimum performance. Fleck
et al [1] says that skin colours lie within a small region (red, yellow and brown) of the colour spectrum
regardless of the ethnicity of the person within an image. Although this is a small region within the
colour spectrum, it also incorporates other, easily identifiable, non-skin objects such as wood.
Furthermore, human skin under significant amount of light can appear as a different colour. Colour
detection methods can be classed as physical based, parametric or non-parametric. The choice of
colour space can greatly affect the performance of both the physical based and parametric approaches,
however the influence of the colour space choice is said to reduce greatly in the non-parametric
approaches [36][37].
2.2.1 Physical Based Approaches
Using explicit threshold values in a colour space is one of the simplest ways of detecting skin pixels. A physical based approach using thresholds is often referred to as a colour model: a set of parameters stipulating the values a pixel may take if it is to be considered skin. For example:
Jiao et al [4] found that 94.4% of adult images could be detected using only the YUV and YIQ colour
spaces, in which a pixel can be considered to be skin if
(20 ≤ I ≤ 90) ∩ (100 ≤ θ ≤ 150)    (Eq.1)
where
θ = tan⁻¹(|V| / |U|)    (Eq.2)
This method of skin detection can be used with a single colour space [1][17] for simplicity or multiple
colour spaces [7][24] to increase accuracy.
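A minimal sketch of the explicit-threshold (colour model) idea is given below; the helper simply checks fixed component ranges, with the default bounds copied from Eq. 1 and θ computed as in Eq. 2 (the function names, the use of NumPy and the degree-valued θ are assumptions):

import numpy as np

def theta_from_uv(u, v):
    # Eq. 2: theta = arctan(|V| / |U|), returned here in degrees
    return np.degrees(np.arctan(abs(v) / (abs(u) + 1e-12)))

def explicit_threshold_skin(i, theta, i_range=(20, 90), theta_range=(100, 150)):
    # Eq. 1 style rule: a pixel is labelled skin when every component falls inside
    # its fixed range; rules in other colour spaces share this same shape
    return i_range[0] <= i <= i_range[1] and theta_range[0] <= theta <= theta_range[1]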
Related to the explicit threshold is the skin probability ratio (also known as skin likelihood). This is
where a pixel is classified as skin using various probability theories to create a skin likelihood map. Ye
et al [9] uses Bayes' theorem to reduce the effect of variations in light while detecting skin.
2.2.2 Parametric Approaches
As previously discussed, skin colours lie within a small region of the colour spectrum; within this colour cluster skin is normally distributed, i.e. it follows a Gaussian distribution. The Gaussian joint probability
distribution function (pdf), a parametric approach, is a measure of skin likeness and is defined [30] as
p(c) = \frac{1}{(2\pi)^{1/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(c-\mu)^{T}\Sigma^{-1}(c-\mu)\right]    (Eq.3)

where c is the colour vector, μ is the mean vector and Σ is the diagonal covariance matrix. The
Gaussian mixture model is a combination of Gaussian functions. The number of Gaussian functions
used is critical and the choice of colour space is also of great importance [30]. It is widely regarded
[5][37][38] that the Gaussian mixture model gives inferior results to systems such as the colour histogram, yet it has been used extensively for skin colour segmentation in objectionable image detection systems [27][28], showing surprisingly high sensitivity (92.2%) and specificity (97.9%) [12].
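A minimal sketch of the single-Gaussian skin model of Eq. 3 (the mean vector and covariance matrix are assumed to have been estimated beforehand from labelled skin pixels, and the general d-dimensional normalising constant is used here rather than the exact form quoted above):

import numpy as np

def gaussian_skin_likelihood(c, mu, cov):
    # Eq. 3: Gaussian pdf of the colour vector c under the skin model (mu, cov)
    c = np.asarray(c, dtype=np.float64)
    mu = np.asarray(mu, dtype=np.float64)
    diff = c - mu
    norm = np.sqrt(((2.0 * np.pi) ** c.size) * np.linalg.det(cov))  # normalising constant
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)

# e.g. a hypothetical (Cb, Cr) skin model:
# gaussian_skin_likelihood([110, 150], mu=[115, 145], cov=[[80, 5], [5, 60]])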
2.2.3 Non-Parametric Approaches
Colour histograms are a statistical method for representing the distribution of colour in an image and
are constructed by counting the number of pixels of each colour. Jayaram et al [35] shows that the
number of bins used in the histogram is a large factor in the performance of the skin detection. Duan
[8] created a colour histogram of an image and used a support vector machine to classify it with 80.7% sensitivity and 90% specificity. Wang [2] uses weighted threshold values to create a skin colour
histogram of an image, and then sums the entire histogram to establish the total skin within the image.
Another use of the colour histogram is the likelihood histogram [6][22], created with the skin colour
likelihood algorithm which establishes the probability of a pixel being a skin pixel. Jones and Rehg [5]
used a set of training images to create two colour histograms of skin and non-skin pixels; maximum
entropy modelling was then used to train a Bayes’ classifier with 88% accuracy. This model has been
repeatedly used as part of other objectionable image detection systems [15][19].
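A rough sketch in the spirit of this histogram/Bayes approach of Jones and Rehg [5] (the 32-bins-per-channel quantisation, the prior and the training arrays are all assumptions; skin_pixels and nonskin_pixels would be N x 3 arrays of labelled RGB training pixels):

import numpy as np

def colour_histogram(pixels, bins=32):
    # normalised 3-D colour histogram built from labelled training pixels
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist / max(hist.sum(), 1.0)

def skin_probability(pixel, skin_hist, nonskin_hist, prior_skin=0.5, bins=32):
    # Bayes' rule: P(skin | colour) from the skin and non-skin histograms
    idx = tuple(int(v) * bins // 256 for v in pixel)
    p_skin, p_nonskin = skin_hist[idx], nonskin_hist[idx]
    denom = p_skin * prior_skin + p_nonskin * (1.0 - prior_skin)
    return 0.0 if denom == 0 else p_skin * prior_skin / denom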
A major issue with colour histograms is that they only measure colour density; this means that two images, although completely unrelated, can have very similar histograms. A solution to this issue is the colour coherence vector (CCV). CCV establishes the relevance (coherence) or irrelevance (incoherence) of a pixel to the region in which the pixel is situated, where a pixel's colour coherence is the degree to which pixels of that colour are members of large similarly-coloured regions [39]. Jiao et al [4] found that using CCV along with a colour histogram improved specificity (87.7% to 90.4%) but decreased sensitivity (91.3% to 89.3%).
2.3 Skin Detection by Texture
Although the texture of skin is quite distinct from a close range, skin texture appears smooth within
most images. One of the biggest problems with skin colour modelling is falsely detecting non-skin regions as skin (false positives) due to similar colour. Skin texture methods are principally used to boost the results of the skin colour modelling by reducing this false-positive rate.
2.3.1 Gabor Filter
Gabor filters are band-pass filters that select a certain wavelength range around a centre wavelength
using the Gaussian function. Gabor filters measure texture by performing image analysis in the space/wave-number domain. Jiao [4] used a Gabor filter along with a Sobel edge operator simply to boost the performance of the skin colour detection, finding that specificity was improved (63.3% to 87.7%) but sensitivity was decreased (94.4% to 91.3%), whereas Wang [24] and Xu [27] use a Gabor filter to
train a Gaussian mixture model to recognise skin and non-skin texture features.
2.3.2 Co-Occurrence Matrix
The two-dimensional co-occurrence matrix captures the repetitive changes in grey level (brightness) in order to measure texture. The matrix records the simultaneous occurrence of two values in a certain relative position. After the co-occurrence matrix has been constructed, the entropy, energy, contrast, correlation and homogeneity features of the image can be calculated. The co-occurrence matrix is used as a good trade-off between accuracy and computation time [7][13].
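A small sketch of a grey-level co-occurrence matrix for a single relative position (one pixel to the right; the 256 grey levels, this offset and the function names are assumptions), from which energy and contrast, for example, can be read off:

import numpy as np

def cooccurrence_matrix(gray, levels=256):
    # gray: 2-D integer array of grey levels; count pairs appearing side by side
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (gray[:, :-1].ravel(), gray[:, 1:].ravel()), 1)
    return glcm / max(glcm.sum(), 1.0)   # normalise to joint probabilities

def energy_and_contrast(glcm):
    i, j = np.indices(glcm.shape)
    return np.sum(glcm ** 2), np.sum(glcm * (i - j) ** 2)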
2.3.3 Neighbourhood Gray Tone Difference Matrix
The neighbourhood grey tone difference matrix (NGTDM) is another texture feature analysis method
very similar to the co-occurrence matrix as it measures the changes in intensity and dynamic range per
unit area. NGTDM extracts the visual texture features such as Coarseness, Contrast, Busyness,
Complexity and Strength. Cusano [10] used NGTDM with Daubechies' wavelets to extract the texture
features of skin regions to boost the classification of skin.
Other methods to help in skin classification include: region-growing algorithm [6], maximum entropy
modelling [28], morphological operations [19], Bethe Tree Approximation and Belief Propagation
[14], extreme density algorithm [29], entropy of intensity histogram [28] and median filters [1].
3 Feature Extraction
The classification of digital images is a memory-hungry and computationally complex process. The
solution for this is a process called feature extraction. Feature extraction is a form of dimension
reduction, where resources used to describe large sets of data are simplified with as little loss to
accuracy as possible. The colour and texture methods discussed previously are forms of feature
extraction, but they are used solely in the classification of skin. This section discusses the features
used in the classification of the objectionable image, predominantly geometric and dimensional.
3.1 Skin Features
After skin has been detected various features can be extracted. The skin area/image ratio is the
percentage ratio of the image which is covered by skin. As most objectionable images would be
predominantly skin, the skin area/image ratio is used by most, if not all, of the reviewed systems. This
ratio does not depend on the method of skin classification and can be used as an input to the classifier
[15][16] or as an early filtering system [2].
The amount [10], position [14], orientation [28], height and width [13], shape [17][20], eccentricity
[21], solidity [21], compactness [19], rectangularity [19] and location [27][29] of skin regions are
features used as input components to the machine learning classifiers. Liang [13] found that the height
feature was the most important feature for the detection of objectionable images. The choice and
implementation of classifier would stipulate the influence of the skin features; however, it has been shown that skin features can improve accuracy [29]. The ability to extract these skin features depends on the method used in skin detection: if colour histograms are used then only the skin area/image ratio can be used, whereas a skin likelihood map could allow the use of skin features such as position,
orientation, height and width of skin regions.
3.2 Moments
Moments are commonly used in shape and pattern recognition because a moment-based measure of
shape can be derived that is invariant to translation, rotation and scale [2]. As descriptors, moments can be either geometric (Hu moments, Zernike moments) or statistical (mean, variance). Geometric
moments are the product of a quantity and its perpendicular distance from a reference point or the
tendency to cause rotation about a point or an axis. Statistical moments are the expected value of a
positive integral power of a random variable. Liang [13] found that the Hu moments are of less
importance than the height skin feature and the skin area/image ratio, but of more importance than
most of the other skin features when used with the Multi-Layer Perceptron classifier (NN).
3.3 Face Detection
If it was assumed that all images with large areas of skin are objectionable, then a perfectly acceptable
portrait image would be classed as objectionable. Face detection algorithms are used to filter any
images whose skin pixels are mainly occupied by a face or faces. The face detection algorithms
proposed by Viola and Jones [40] and Lienhart [41] give good trade-offs between accuracy and computational speed; for this reason they have become popular methods of face detection in
objectionable image detection systems [22][28].
3.4 MPEG-7 Descriptors
The eXperimentation Model (XM) software is used to access the MPEG-7 descriptors, which describe
the basic characteristics of audio or visual features such as colour, texture and audio energy of media
files. The descriptors that are available to image processing which have proven useful to objectionable
image detection include the colour layout [11], colour structure [25], homogenous texture [11], edge
histogram [11], region and contour shape [25] and dominant colour descriptor [26]. Kim [26] achieved
high levels of accuracy with the colour structure descriptor used with the neural network classifier.
4 Classifiers
A classifier is a mathematical method of grouping the images based on the results from the feature
extraction and skin detection. Most of the systems class the images as benign or objectionable,
however some have various levels such as topless, nude or sex image [25].
4.1 Supervised Machine Learning
Machine Learning is a field in artificial intelligence that develops algorithms to allow a computer to
use past experience to improve performance. Supervised learning is when the algorithm learns from
training data that shows desired outputs for various possible inputs, and it is the most widely used form of classification in the objectionable image detection field, with 22 of the reviewed publications using at least one of four methods: Support Vector Machine (SVM), Neural Networks (NN), Decision
Tree (DT) and k-Nearest Neighbour (k-NN).
4.1.1 Support Vector Machine
The SVM is a kernel-based classifier that is relatively easy to train (compared to neural networks). The trade-off between accuracy and classifier complexity is controlled by the choice of an appropriate kernel function. Given a training set of benign and objectionable images, the SVM finds the hyperplane that best separates the two sets, placing as many benign images on one side and as many objectionable images on the other as possible while maximising the distance between the hyperplane and both sets. The SVM has been shown to be able to give high performance when used with the
Gaussian mixture model (92.2% sensitivity and 97.9% specificity) [12], skin probability map (97.6%
sensitivity and 91.5% specificity) [23] and colour histogram (89.3% sensitivity and 90.6% specificity)
[4]. Cusano [10] found that the SVM gave better results than multiple decision trees.
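As an illustrative sketch only (scikit-learn and the toy feature vectors below are assumptions, not tools or data from the reviewed papers), an SVM can be trained on per-image feature vectors such as the skin ratio and region measurements:

import numpy as np
from sklearn.svm import SVC

# hypothetical training data: each row is a feature vector (e.g. skin ratio,
# largest-region height, a moment value); labels are 1 = objectionable, 0 = benign
X_train = np.array([[0.72, 0.40, 0.11], [0.05, 0.10, 0.02],
                    [0.65, 0.35, 0.09], [0.08, 0.20, 0.03]])
y_train = np.array([1, 0, 1, 0])

clf = SVC(kernel="rbf", gamma="scale")   # kernel choice trades accuracy against complexity
clf.fit(X_train, y_train)
label = clf.predict(np.array([[0.70, 0.38, 0.10]]))   # predicted class for a new image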
4.1.2 Neural Networks
NNs are machine learning algorithms based on how a biological brain learns by example. Classification is performed by a large number of interconnected neurones working simultaneously to process the image features and decide whether the image is benign or objectionable. NNs can implicitly detect complex nonlinear relationships between independent and dependent variables, but can be computationally complex compared to SVMs and can be difficult to train. Bosson [6] found that neural networks (83.9% sensitivity and 89.1% specificity) gave slightly better results than k-NN and SVM. Kim [26] attained 94.7% sensitivity and 95.1% specificity using NN with MPEG-7 Descriptors.
4.1.3 Decision Tree (DT)
A DT is a classifier in the form of a tree structure, where each leaf node indicates the value of a target
class and each internal node specifies a test to be carried out on a single attribute, with one branch and
sub-tree for each possible outcome of the test. The classification of an instance is performed by
starting at the root of the tree and moving through it until a leaf node is reached, which provides the
classification of the instance. Zheng [18] shows that a DT can give 91.35% sensitivity and 92.3%
specificity in detecting objectionable images. Zheng [19] also found that the DT (C4.5 method) gave
higher accuracy than NN and SVM.
4.1.4 k-Nearest Neighbour
The k-NN is based on finding the closest examples from the training data to classify an image as
objectionable or benign. The training of the k-NN is very fast and Xu et al [27] found that the k-NN
(81% sensitivity and 94% specificity) outperformed the NN (79% sensitivity and 91% specificity).
4.2 Statistical Classifier
The Generalized Linear Model (GLM) extends the standard Gaussian (linear) regression techniques to
models with a non-Gaussian response. GLMs do not force data into unnatural scales, allowing for non-linearity and non-constant variance structures in the data. Bosson [6] shows that the GLM can be used to detect objectionable images; the results acquired (83.9% sensitivity and 87.5% specificity) indicate that the NN, k-NN and SVM perform considerably better.
4.3 Geometric Classifier
Fleck [1] used an Affine Imaging Model to identify limbs and torsos from detected skin regions, and
then established if the limb and torso arrangement matches a geometric skeletal structure. Affine
geometry is the geometry of vectors, which do not involve length or angle. Fleck achieved 52.2%
sensitivity and 96.6% specificity using the Affine Imaging Model, which is poor compared to the
machine learning classifiers.
4.4 Boosting Classification
Boosting is the use of an algorithm to increase the accuracy of the learning classifiers and has been
performed in two ways in the reviewed publications: Adaboost and Bootstrapping. Adaboost repeatedly calls weak classifiers, learning from each correct and incorrect classification, but this
process can be vulnerable to noise. Bootstrapping is where one is given a small set of labelled data and
a large set of unlabelled data, and the task is to induce a classifier. Lee [29] shows that the addition of
a boosting algorithm increases sensitivity from 81.74% to 86.29%.
5 Results
The test results given are in the form of sensitivity and specificity, where sensitivity is defined as the
ratio of the number of objectionable images identified to the total number of objectionable images
tested and specificity is defined as the ratio of the number of benign images passed to the total number
of benign images tested [2]. Due to space constraints, Table 1 only shows the results of the reviewed publications whose sensitivity and specificity are both above 90%. As can be seen from this table, the detection systems appear to give extremely high sensitivity and specificity.
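As a small worked example of these definitions (all counts below are hypothetical):

# hypothetical test run: 200 objectionable and 200 benign images were evaluated
objectionable_total, benign_total = 200, 200
objectionable_detected = 183     # objectionable images correctly flagged
benign_passed = 190              # benign images correctly passed
sensitivity = objectionable_detected / objectionable_total   # 0.915 -> 91.5%
specificity = benign_passed / benign_total                   # 0.95  -> 95.0%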
Publication | Sensitivity | Specificity | Dataset Source | Ethnicity | Illumination Conditions
Wang et al 1997 [2] | 91% | 96% | Internet, Corel Library | Not Provided | Not Provided
Yoo et al 2003 [11] | 93.47% | 91.61% | Internet | Not Provided | Not Provided
Jeong et al 2004 [12] | 92.2% | 97.9% | Internet | Not Provided | Not Provided
Zheng et al 2004 [18] | 91.35% | 92.3% | Internet, Corel Library | Not Provided | Not Provided
Zhu et al 2004 [20] | 92.75% | 92.81% | Internet | Not Provided | Not Provided
Belem et al 2005 [23] | 97.6% | 91.5% | Internet | Not Provided | Not Provided
Kim et al 2005 [26] | 94.7% | 95.1% | Not Provided | Not Provided | Not Provided

Table 1: Top 7 results from reviewed publications
Table 1 also shows that very little information is given about the training and testing datasets used. If little information on the testing methods or images used is given, it is hard to accept the results presented. This is a frequent problem throughout most of the publications reviewed here: because there is no standard objectionable image dataset, there is no sure way of adequately comparing all systems. Not all the publications omit the details of their datasets; Table 2 shows 5 publications which do give reasonable amounts of information on the datasets used to test their respective systems. This table illustrates that as the sensitivity increases from system to system the specificity decreases; this would suggest a more realistic set of results.
Publication | Sensitivity | Specificity | Dataset Source | Ethnicity | Illumination Conditions
Fleck et al 1996 [1] | 52.2% | 96.6% | Internet, CDs, Magazines | Caucasians | Various
Jiao et al 2003 [4] | 89.3% | 90.6% | Internet, Corel Library | Caucasians, Asian | Not Provided
Duan et al 2002 [8] | 80.7% | 90% | Internet, Corel Library | Caucasians, Asian, European | Various
Cusano et al 2004 [10] | 90.4% | 88.4% | Not Provided | Caucasians, African, Indian | Various
Lee et al 2004 [29] | 86.4% | 94.8% | Not Provided | Caucasians, African, Asian | Controlled

Table 2: Publications that give adequate dataset information
6 Conclusion
To reduce false-positives some papers have added various steps such as face detection and swimsuit
detection. Generally the techniques have implemented a skin detection method, as large amounts of
skin are generally a sign of the presence of naked people, followed by a feature extraction method, to
identify the features such as shape and location, and finally classification from the results of the two
previous steps.
The right choice of method to perform colour analysis in the skin identification process directly
stipulates the features that can be extracted from the image. The use of colour histograms to find the
colour density of an image may identify whether large skin areas are present, but they do not allow features such as shape and location to be found. However, using colour histograms to train a Bayes' probability algorithm has been proven to give good results [5]; note this is an old method and newer
adaptive methods of skin detection have since been developed [12].
Many of the datasets used are described as being gathered randomly from the Internet (some papers count logos as Internet images, thus boosting their results); nevertheless, they do not state from what domain (Asian, American or European) the images come or what they depict (indoor, outdoor, professional, amateur, etc.). Both of these issues can affect the accuracy, as the ethnicity of the persons within the images changes with the domain and variations in quality and lighting could reduce the skin identification performance. An academically available dataset is essential; however, due to the nature of the images needed, this may be impossible. There are legal and ethical issues surrounding the distribution of such images which prevent the creation of a dataset, as no academic
institute wishes to be perceived as a distributor of pornographic material.
After careful examination of the published papers it was decided that an optimal system would consist
of:
1. HSV/HSI or YCbCr should be the choice of colour space for accuracy; RGB gives good
results and can reduce computational complexity assuming most of the images are originally in RGB (e.g. TIFF, GIF, PNG).
2. An adaptive skin colour technique should be used to eliminate variations in image quality and
lighting. None of the reviewed publications give adequate solutions. This system must retain
the ability for all feature extraction, so the use of a type of skin likelihood map may be
preferable.
3. Gabor filters have the greatest effect on increasing the specificity as a texture analysis method.
4. The analysis of skin features such as location and orientation should be utilised along with a
face detector to reduce false positives.
5. NN and SVM consistently give high levels of accuracy (need large datasets to train which
may be an issue for some).
6. A Boosting algorithm such as Adaboost [29] should be used to boost the classification
process.
This paper has reviewed the best performing techniques used in skin detection for objectionable
images. It has evaluated the best of the current techniques used in skin classification and feature
extraction. Future challenges have been identified, and the proposed features of an optimal
implementation technique are provided.
References
[1] M. Fleck, D.A. Forsyth, C. Bregler. “Finding naked people”, Proc. 4th European Conf. on
Computer Vision, vol. 2, 1996, pages 593-602.
[2] J.Z. Wang, J. Li, G. Wiederhold, O. Firschein, “System for screening objectionable images”,
Computer Communications, Vol.21, No. 15, pages 1355-1360, Elsevier, 1998.
[3] Y. Chan, R. Harvey, D. Smith, “Building systems to block pornography”, In Challenge of Image
Retrieval, BCS Electronic Workshops in Computing series, 1999, pages 34-40.
[4] F. Jiao, W. Gao, L. Duan, G. Cui, “Detecting Adult Image using Multiple Features” ICII 2001.
[5] M. Jones, J. M. Rehg, “Statistical colour models with application to skin detection”, Int. J. of
Computer Vision, 46(1), Jan 2002, pages 81-96.
[6] Bosson, G.C. Cawley, Y. Chan, R. Harvey, “Non-Retrieval: blocking pornographic images”,
ACM CIVR, Lecture Notes in Computer Science, Vol.2383, 2002, pages 60-69.
[7] L.L. Cao, X.L. Li, N.H. Yu, Z.K. Liu, “Naked People Retrieval Based on Adaboost Learning”,
International Conference on Machine learning and Cybernetics Vol. 2, pages 1133 - 1138, 2002.
[8] L. Duan, G. Cui, W. Gao, H. Zhang, “Adult image detection method base-on skin colour model
and support vector machine”, Asian Conference on Computer Vision, pages 797-800, 2002.
[9] Q. Ye, W. Gao, W. Zeng, T. Zhang, W. Wang, Y. Liu, “Objectionable Image Recognition System
in Compression Domain”, IDEAL 2003, pages 1131-1135.
[10] C. Cusano, C. Brambilla, R. Schettini, G. Ciocca, “On the Detection of pornographic digital
images”, VCIP, 2003, pages 2105-2113.
[11] SJ.Yoo, MH.Jung, HB.Kang, CS.Won, SM.Choi, "Composition of MPEG-7 Visual Descriptors
for Detecting Adult Images on the Internet", LNCS 2713, Springer-Verlag, pg 682-687, 2003.
[12] C. Jeong, J. Kim, K. Hong, “Appearance-based nude image detection”, ICPR2004, pg 467–470.
[13] K.M. Liang, S.D. Scott, M. Waqas, “Detecting pornographic images”, ACCV2004, pg 497-502.
[14] H. Zheng, M. Daoudi, B. Jedynak, “Blocking Adult Images Based on Statistical Skin Detection”,
Electronic Letters on Computer Vision and Image Analysis, Volume 4, Number 2, pages 1-14,
2004.
[15] W. Zeng, W. Gao, T. Zhang, Y. Liu, “Image Guarder: An Intelligent Detector for Adult Images”,
ACCV2004, pg 198-203.
[16] Y. Liu, W. Zeng, H. Yao, “Online Learning Objectionable Image Filter Based on SVM”, PCM,
2004, pg 304-311.
[17] W.Arentz, B.Olstad, “Classifying offensive sites based on image content”, CVIU 2004, pg 295-310.
[18] QF.Zheng, MJ.Zhang, WQ.Wang, “A Hybrid Approach to Detect Adult Web Images”, PCM2004,
pg 609-616.
[19] QF.Zheng, MJ.Zhang, WQ.Wang “Shape-based Adult Image Detection”, ICIG2004, pg 150-153.
[20] Q. Zhu, C-T. Wu, K-T. Cheng, Y-L. Wu, “An adaptive skin model and its application to
objectionable image filtering”, ACM Multimedia, 2004, pages 56-63.
[21] J. Ruiz-del-Solar, V. Cataneda, R. Verschae, R. Baeza-Yates, F. Ortiz, “Characterizing
Objectionable Image Content (Pornography and Nude Images) of Specific Web Segments: Chile
as a Case Study”, LA-WEB, 2005, pages 269-278.
[22] Y. Wang, W. Wang, W. Gao “Research on the Discrimination of Pornographic and Bikini
Images” ISM, 2005, pg 558-564.
[23] R. Belem, J. Cavalcanti, E. Moura, M. Nascimento, “SNIF: A Simple Nude Image Finder”, LA-Web, 2005, pages 252-258.
[24] S-L. Wang, H. Hu, S-H. Li, H. Zhang, “Exploring Content-Based and Image-Based Features for
Nude Image Detection” FSKD (2), 2005, pages 324-328.
[25] W. Kim, S.J. Yoo, J-s. Kim, T.Y. Nam, K. Yoon, “Detecting Adult Images Using Seven MPEG-7
Visual Descriptors”, Human.Society@Internet, 2005, pages 336-339.
[26] W. Kim, H-K. Lee, S-J. Yoo, S.W. Baik, “Neural Network Based Adult Image Classification”
ICANN (1), 2005, pages 481-486
[27] Y. Xu, B. Li, X. Xue, H. Lu, “Region-based Pornographic Image Detection”, MMSP, 2005.
[28] H.Rowley, Y.Jing, S.Baluja, “Large scale image-based adult-content filtering”, VISAPP2006, pg
290-296.
[29] J.-S. Lee, Y.-M. Kuo, P.-C. Chung, E.-L. Chen, “Naked image detection based on adaptive and
extensible skin colour model”, Pattern Recognition (2006), doi: 10.1016/j.patcog.2006.11.016.
[30] P. Kakumanu, S. Makrogiannis, N. Bourbakis, “A survey of skin-colour modelling and detection
methods”, Pattern Recognition, Volume 40, Issue 3, March 2007, Pages 1106-1122.
[31] R. Gershon, A.D. Jepson, J.K. Tsotsos, “Ambient illumination and the determination of material
changes”, J. Opt. Soc. Am. A vol. 3, 1986, pages 1700–1707.
[32] M.C. Shin, K.I. Chang, L.V. Tsap, “Does Colour space Transformation Make Any Difference on
Skin Detection?” IEEE Workshop on Applications of Computer Vision, Dec 2002, page 275-279.
[33] R.Gonzalez, R.Woods, S.Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2004.
[34] G. Gomez, M. Sanchez, L.E. Sucar, “On Selecting an Appropriate Colour Space for Skin
Detection”, MICAI, 2002, pages 69-78.
[35] S. Jayaram, S. Schmugge, M.C. Shin, L.V. Tsap, “Effect of Colour space Transformation, the
Illuminance Component, and Colour Modelling on Skin Detection”, CVPR, 2004, pages 813-818.
[36] A.Albiol, L.Torres, E.Delp, “Optimum colour spaces for skin detection”, ICIP2001, pg 122-124.
[37] S.L. Phung, A. Bouzerdoum, D. Chai, “Skin segmentation using colour pixel classification:
analysis and comparison”, IEEE Trans. Pattern Anal. Mach. Intell, 2005, pages 148-154.
[38] V. Vezhnevets, V. Sazonov, A. Andreeva, "A Survey on Pixel-Based Skin Colour Detection
Techniques". Proc. Graphicon, 2003, pages 85-92.
[39] G. Pass, R. Zabih, J. Miller, “Comparing Images Using Colour Coherence Vectors”, ACM
Multimedia, 1996, pages 65-73.
[40] P.A. Viola, M.J. Jones, “Robust Real-Time Object Detection”, Tech report COMPAQ CRL, 2001.
[41] R. Lienhart, A. Kuranov, V. Pisarevsky. “Empirical Analysis of Detection Cascades of Boosted
Classifiers for Rapid Object Detection”, DAGM, Pattern Recognition Symposium 2003, pg 297-304.
[42] A. Healy, “Call for mobile phone security”, The Irish Times, 17th February, 2004.
[43] A. Healy, “Gardai seek distributor of explicit image of girl on phone”, The Irish Times, 23rd
January, 2004.
Optical Reading and Playing of Sound Signals from
Vinyl Records
Arnold Hensman
Department of Informatics, School of Informatics and Engineering
Institute of Technology Blanchardstown, Dublin 15, Ireland
Email: [email protected]
Kevin Casey
Faculty of Computing, Griffith College Dublin
South Circular Road Dublin 8, Ireland
Email: [email protected]
Abstract
While advanced digital music systems such as compact disk players and MP3 have become the standard
in sound reproduction technology, critics claim that conversion to digital often results in a loss of sound
quality and richness. For this reason, vinyl records remain the medium of choice for many audiophiles
involved in specialist areas. The waveform cut into a vinyl record is an exact replica of the analogue
version from the original source. However, while some perceive this medium as reproducing a more authentic quality than its digital counterpart, there is an absence of a safe playback system. Contact with the
stylus provided by a standard turntable causes significant wear on the record (or phonograph) over time,
eventually rendering it useless. Couple this with the historic value and an abundance of such vinyl media,
and the need for a non-contact playback system becomes evident. This paper describes a non-contact
method of playback for vinyl records which uses reconstruction of microscopic images of the grooves
rather than physical contact with the stylus.
Keywords: Waveform Reproduction, Image Stitching, Vinyl Record, Groove Tracking, 78rpm
1 Introduction
Since a vinyl record is an analogue recording, many claim that the application of a sample rate when
making digital recordings for CDs and DVDs results in too great a loss in sound quality. Natural sound
waves are analogue by definition. A digital recording takes snapshots of the analogue signal at a certain
sample rate and measures each snapshot with a certain accuracy. For CDs the sample rate is 44.1 kHz
(44,100 times per second at 16-bit). The sample rate for DVD audio is 96 kHz, or 192 kHz for High-Definition DVD (HD-DVD). A digital recording, however, cannot capture the complete sound wave;
at best it is a close approximation and many claim that, although high quality, it still cannot fully
reproduce the fidelity of sound that vinyl records can. Figure 1 illustrates the application of a sample
rate upon a simple sound wave. Sounds that have fast transitions, such as drum beats or a trumpet's
tone, will be distorted because they change too quickly for the sample rate. Historical recordings
archived for posterity are often fragile with owners not wanting to risk the use of conventional stylus
playback. The development of a non-contact player that carefully reconstructs a restored image of the
original analogue groove would not only remove this risk, but it would make safe playback possible for
records that are severely damaged. Normally the downside of playing an analogue signal is the fact that
all noise and other imperfections are also heard. So, if there is a period of silence on a record you hear
background noise. With the proposed system any background noise that was present could be removed
since the optical player would detect this noise in advance and simply ignore it.
Figure 1: Comparison of CD and DVD sample rates upon an analogue waveform
1.1 Conventional playback methods for vinyl records
The first method of recording sound was the phonograph created by Thomas Edison in 1877. He used a
mechanism consisting of a needle and collection horn to store an analogue wave mechanically by
cutting a waveform directly onto a cylindrical sheet of tin. The use of flat records with a spiral groove
was an improvement on Edison’s machine by Emil Berliner in 1887. The stereo evolution of this
method - High Fidelity (Hi Fi) - didn’t lose popularity until compact disks revolutionised the consumer
market in the early 1980’s. Only the most advanced digital systems can rival its fidelity. At
microscopic level this groove resembles an undulating track with varying width. A turntable stylus
follows these undulations along the groove towards the centre thus following the waveform along the
way. Records skip when the bass information is too loud and the stylus is thrown into a neighbouring
section of the groove. Vocal sibilance and sudden loud symbol crashes can also cause a rapid increase
in frequency so the stylus faces a pronounced ‘S’ effect in the groove that could potentially cause it to
skip. The traditional method of a diamond shaped stylus running along a V-shaped groove also applies
constant weight and pressure to the groove and results in a increase in temperature causing further
damage. A system that views the waveform up close without any contact would completely remove
this problem. Skipping of grooves would be eliminated along with unwanted background noise,
scratches and distortion from tiny particles.
1.2 Assumptions
The terms LP record (LP, 33, or 33-1/3 rpm record), EP, 16-2/3 rpm record (16), 45 rpm record (45),
and 78 rpm record (78) all refer to different phonographs for playback on a turntable system. The rpm
designator refers to the rotational speed in revolutions per minute. They are often made of polyvinyl
chloride (PVC), hence the term vinyl record. For the purposes of this study, monaural signals only are
processed, i.e. vinyl 78rpm records. The groove may be viewed clearly in two dimensions so image
acquisition may be performed more easily. Since most historic recordings are stored on 78s that are now considered antiques, it makes sense to restrict this study to that medium.
1.3 Evaluation of Existing Non-Contact Systems
The main technology currently using a method for non-contact playing of vinyl records is the Japanese
ELP corporation’s laser turntable ™ [ELP, 2003]. This impressive system can play grooved, analogue
33.3, 45 or 78 RPM discs by illuminating the walls of each groove with five laser beams [Smart, 2003].
In essence, to play the sound, a laser views the image by reflecting back the amplitudes of the
waveform. At a basic price over US$10,000 it will hardly ease into the mass production market. In fact
to play 78s it requires the advanced model with extra sensors to monitor speed. The laser of the basic
model cannot accurately track the groove at such speed without the risk of intermittent pauses
throughout playback. The cost of the advanced model is almost twice the basic price plus additional
costs for the patching of scratches and noise. Any noise, damage or dirt will be picked up as the laser
cannot overcome serious flaws on the disk. If the disk is slightly bent or warped in any way, the laser
turntable will reject it. It works best with black records rather than coloured or vinyl with added
graphics. The system was invented by Robert E. Stoddard, a graduate student at Stanford University in
1986 [USPO, 1986]. The dual beam model was patented in 1989 [USPO, 1989].
The same result could be achieved by a microscopic imaging system at a fraction of the cost. Since the
objective is to play the sound data without contact with a stylus, a microscopic imaging camera could
be used to replace the stylus. The added advantage of this method is that an image could easily be
enhanced to smooth out noise at source and overcome damage. Ofer Springer proposed the idea of the
virtual gramophone [Springer, 2002]. Springer’s system scans the record as an image and applies a
virtual needle following the groove spiral form. P. Olsson’s Swedish team developed this to use digital
signal processing methods such as FIRWiener filtering and spectral subtraction to reduce noise levels
[Olsson et al., 2003]. These two systems, however, only used a basic scanner, limiting the resolution to a maximum of 2400 dpi, or roughly 10 µm per pixel. At this resolution, quantisation noise is high because the maximum lateral travel of the groove is limited. Fadeyev and Haber’s 2D method reconstructs mechanically recorded sound by image processing [Fadeyev and Haber, 2003]. The resolution was much higher due to the use of micro-photography. Baozhang Tian proposed a 3D scene reconstruction
method using optical flow methods to generate a virtual 3D image of the entire groove valley [Tian,
Bannon, 2006]. But as we shall see, quality results can be achieved by processing 2D images of the 2D
sound signals from 78rpm records.
2 Specification of Proposed Methods
The objective of this system is a process for optical playing of vinyl records (for the purposes of this
system, 78rpm phonographs will be used) by reconstructing a restored image of the original analogue groove. The surface reconstruction will be two-dimensional, as that is all that is necessary in the case of 78s. Image analysis techniques will be used to stitch together a longitudinal image of the overall groove. There are four stages of implementation in order to achieve this. This section will briefly describe each of those stages in sequence, the problems encountered at each, and how they are overcome. The
four main sections in this playback process are:
(i) Image collection
(ii) Stitching of overlapped images
(iii) Groove tracking
(iv) Waveform creation from the tracked groove and sound file creation.
The following chart indicates the flow of each phase.
Figure 2: Stages in development of the optical playback process (Stage 1: image collection → bitmap files generated; Stage 2: stitching algorithm applied to the selected images; Stage 3: groove tracking on the single stitched image; Stage 4: waveform and sound file creation → sound file (.wav) generated)
2.1 Image Collection Phase – Stage 1
Perhaps the most important stage in the process is the image collection phase. If high-quality images are not collected initially, further problems will inevitably occur in the stitching and groove tracking phases. Potential hazards to the collection of data will now be
explored so that they may be foreseen and overcome in advance. Figure 3 illustrates how the images
are retrieved using a computer microscope and a stepper motor to turn the record. The stepper motor
takes the place of a turntable and is connected to the record in such a manner that any partial rotations
cause it to move in discrete steps.
For each step the motor takes (the exact distance of which is controlled by software), the microscope
will in turn take an image of a small section of the groove. On subsequent movements, the stepper
motor will position the next section of the groove for imaging and the microscope will take further
pictures. Each image is saved in bitmap format and converted to greyscale to optimise the stitching
algorithm at a later stage by keeping the images as simple as possible.
Figure 3: Image Collection Stage
One of the most important factors in maintaining consistency of image quality is to ensure that the
microscope remains in focus. Any tiny variation in the record from its level position will move the lens
slightly out of focus. That is, the focal length (distance between the record and microscope lens) will
increase or decrease. For this reason, a focus detection algorithm is incorporated so as to warn the user
when re-focusing is required. A second motor may be added to automatically readjust the focus, which
would slightly add to the system’s cost. Warped or slightly bent disks will also gain from this feature as
they are prone to going out of focus.
Figure 4 shows a set of sample images, each maintaining a consistent three groove sections per frame, all to be stitched together into a panoramic view. There is a deliberate overlap, most evident in
pictures (c) and (d). This overlap can easily be controlled by the stepper motor which acts as the
turntable. It is incorporated to aid the stitching process and to ensure that the correct panoramic image
has been obtained.
Figure 4: Sample set of microscopic groove images (a)-(d) (x200 magnification)
Approximately 60 to 100 slightly overlapping images will be required to capture one revolution of a
78rpm groove. A simple algorithm to detect de-focus is added to automatically pause the image
collection phase and re-adjust microscope focal length using a second stepper motor before resuming
image taking. The MilShaf ltd stepper control motor that was used has a minimum shift capability
of 0.25° [MilShaf SMC]. This means it takes exactly 60 images (6° x 60 = 360°) to capture one
complete revolution of a groove image i.e. it will take 24 steps of the stepper motor before the
microscope should take the next image.
1 step = 0.25°  =>  1 image shift = 24 steps = 0.25° x 24 = 6°
Frame width = 320 pixels  =>  pixel shift per step = 320 / 24 = 13 pixels
This high level of accuracy means that the potential overlap can be set to within 13 pixels. Although the
6° movement will be the same throughout the collection process, the above calculations only consider
the outer circumference of the record. A revolution of the groove will become smaller as it gets closer
to the centre of the record. The angular speed of vinyl is constant, so the speed of the needle at the
center is lower than at the outside. Therefore the information density is lower on the outside. So it is
beneficial if there is a lot of overlap towards the center. The overlap can always be changed as the
central grooves are captured to better contribute towards capturing the higher density data there more
accurately. In the case of a 6° set of steps at the outer circumference, the image will perform a 13-pixel shift. Take this outer circumference as being at radius r, the radius of the record; as the centre is approached, the radius becomes r’. To obtain the number of pixels for a 6° set of steps at radius r’, the following equations are formed.
(a) Circumference x %movement = 13 at radius r:   (2πr) x (a/360) = 13
(b) Circumference x %movement is unknown at radius r’:   (2πr’) x (a/360) = pixel movement at r’
where a is the movement in degrees.
Dividing (a) by (b) gives r/r’ = 13 / (pixel movement at r’)  =>  pixel movement at r’ = 13r’/r
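A small sketch of this scaling (the radii in the example call are hypothetical, in millimetres):

def pixel_movement(r_inner, r_outer, outer_pixels=13):
    # pixels traversed by one 6-degree rotation at radius r', scaled down from the
    # 13-pixel shift measured at the outer radius r
    return outer_pixels * r_inner / r_outer

# e.g. a groove 60 mm from the centre of a disc whose outermost groove sits at 120 mm:
print(pixel_movement(60.0, 120.0))   # 6.5 pixels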
Image collection was performed using oVWF.ocx, a Video for Windows ActiveX control that required
no external interaction or use of a DLL. It could easily be incorporated into our software providing the
required versatility in taking images. It was created by Ofer LaOr, director of Design by Vision [LaOr
1997]. This control has all the convenience of any ActiveX control and can easily be used in the
Microsoft suite of programming languages. It can prompt dialogs to allow you to change the settings of
the bitmap being imaged (i.e. 16 colour, 256 colour etc). Hence our need to obtain greyscale images
was met.
2.1.1 Overcoming Potential Hazards in Image Collection
(a) Image drift: In a sample containing three grooves as in Figure 5, one of the grooves would often
move to the top of the video window after several shots were taken and eventually go out of sight
as illustrated in Figure 6(b). This was due to the way in which the record was first centred upon the
turntable. Any tiny variation off-centre at this level of magnification would be noticed. However,
even if the geometric centre of the record is chosen with perfect accuracy, the spiral centre cut onto
the groove may be slightly different.
(b) De-Focus: Figure 5(b) is clearly not as sharp as Figure 5(a). This is only after 25 frames have been
taken in a particular set. The focal length (the distance between the record and microscope lens)
would increase or decrease due to a slightly unlevelled record causing the image to go out of focus.
A simple algorithm to give a deterministic value for focus is used based on the contrast of images.
Figure 5: Contrast within frames is used to gauge the focus level ((a) in focus, (b) out of focus)
Since the image is converted to greyscale, every pixel has a value between 0 and 255. The deterministic
value obtained is the average contrast per adjacent pixel of any image. Normalising these values to a
value between 0 and 1 allows the algorithm to be applied. For example, consider the
following 3 by 4 bitmap matrix A:
    ⎡ a11  a12  a13  a14 ⎤
A = ⎢ a21  a22  a23  a24 ⎥
    ⎣ a31  a32  a33  a34 ⎦
The absolute row-contrast differences may be determined by:
⎡ |a11 − a12|  |a12 − a13|  |a13 − a14| ⎤
⎢ |a21 − a22|  |a22 − a23|  |a23 − a24| ⎥
⎣ |a31 − a32|  |a32 − a33|  |a33 − a34| ⎦
It is possible to approximate the row contrast R, by calculating the sum of these absolute differences.
The absolute column-contrast differences may be determined by:
⎡ |a11 − a21|  |a12 − a22|  |a13 − a23|  |a14 − a24| ⎤
⎣ |a21 − a31|  |a22 − a32|  |a23 − a33|  |a24 − a34| ⎦
It is possible to approximate the column contrast C, by calculating the sum of these absolute
differences.
Total Contrast = R + C
If this is applied to the initial frame to get a focus standard, and further applied to all adjacent elements
of a bitmap matrix, the average contrast per pixel can be used as the deterministic coefficient for
determining whether or not an image is in focus.
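A minimal sketch of this focus measure is given below, using NumPy for the array arithmetic; the 50% acceptance threshold is an assumption made for illustration, not a value taken from the system:

import numpy as np

def average_contrast(image):
    # image: 2D greyscale array with values 0-255
    img = image.astype(float)
    R = np.abs(np.diff(img, axis=1)).sum()   # sum of absolute row-contrast differences
    C = np.abs(np.diff(img, axis=0)).sum()   # sum of absolute column-contrast differences
    return (R + C) / img.size                # average contrast per pixel

def in_focus(frame, reference_frame, threshold=0.5):
    # A frame is accepted if its average contrast is at least a fraction of the
    # contrast of the reference (first, well-focused) frame.
    return average_contrast(frame) >= threshold * average_contrast(reference_frame)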
2.2 Image Stitching – Stage 2
Stitching of the images collected from Stage 1 is based on determining the exact positions where
frames overlap. Image mosaics are collections of overlapping images that are transformed in order to
result in a complete image of a wide angle scene. Two frames at a time were stitched together to build
up the mosaic image. The simplest algorithm in stitching simple images together is the least squares
algorithm which determines a set of overlap errors for various overlapping positions. The position with
the lowest error will be taken as the point where the two images overlap. These positions will then be
saved in a text file in order to reproduce the stitched image at any time. For every subsequent pair of
images, the overlap positions will thus be recorded. Baozhang Tian suggests a similar method of image
acquisition involving complex use of surface orientation analysis to record the image [Tian, Bannon,
2007]. Stitching algorithms, however, provide a simpler approach, since a delay will be present
anyway in both methods. The images are converted to greyscale in the capture process, so we obtain a
bitmap matrix of values between 0 and 255. The images will be taken under similar lighting conditions
and will not vary greatly in content. When dealing with panoramic images, there is usually a source
position issue to contend with. Our images will be taken from a new point of reference above the vinyl
record at each instance, to suit the microscope as the record moves on the turntable, so angular
distortion is not an issue, as illustrated in Figure 6.
Figure 6: Angular positioning problem will not be present
The stitching algorithm may be defined as follows:
sum = 0
For all overlapping pixels
pixel1 = Numeric pixel value in the first image
pixel2 = Corresponding numeric pixel value in the second image
sum = sum + (pixel1 - pixel2)²
End For
error = sum / X ,
where X is the total number of overlapping pixels
The minimum overlap can be chosen by the user in the form that executes the stitching, but essentially
every test of the least squares algorithm operates in a similar way. A column by column test is
performed for every overlap beginning with the overlap of the upper midway position of height and
width of the first frame of a pair. Frame 2 is then tested at the same column position, but on the next
row down, and so on, until it overlaps midway with the lower half of frame1. At this point the column
shift should take place towards the left and the procedure begins again. It will stop when the column
position of frame2 reaches the minimum overlap point. The basic process used is outlined below. The
height and width will be the same for both frames.
For column = Width/2 To MinOverlap
    For row = (Height + Height/2) To (Height - Height/2)
        Perform Least Squares algorithm upon the overlap
    Next row
Next column
Two 2D arrays, pic1[ ][ ] and pic2[ ][ ], are passed to the function image( ) contained within a file.
They contain integer values between 0 and 255, since the images are converted to greyscale. Each
array is essentially the bitmap matrix of one of the two frames being stitched together.
Figure 7: Dimensions of two overlapping images
The values indicated in Figure 7 represent the dimensions processed within the algorithm. The values
for rm and cm will change continually as the least squares error is determined for each position. The
values for the number of rows - r, columns - c and the least_overlap field (representing the minimum
overlap input by the user) are also passed as parameters. The least squares algorithm is conducted
for every changing pair of offset values (rm, cm), to test whether the correct stitching position has been found.
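The Python sketch below illustrates the least squares overlap search described above. It is a simplified reading of the procedure; the array and parameter names (pic1, pic2, min_overlap) mirror the text, but the exact loop bounds of the original implementation may differ:

import numpy as np

def overlap_error(pic1, pic2, top, left):
    # pic2's top-left corner is placed at (top, left) in pic1's coordinates;
    # top may be negative (pic2 shifted upwards relative to pic1).
    h, w = pic1.shape
    r1a, r1b = max(0, top), min(h, top + h)      # overlapping rows in pic1
    c1a, c1b = max(0, left), min(w, left + w)    # overlapping columns in pic1
    region1 = pic1[r1a:r1b, c1a:c1b].astype(float)
    region2 = pic2[r1a - top:r1b - top, c1a - left:c1b - left].astype(float)
    diff = region1 - region2
    return (diff ** 2).sum() / diff.size         # least squares error per overlapping pixel

def best_overlap(pic1, pic2, min_overlap):
    # Search column offsets from a half-frame overlap down to the minimum overlap,
    # and a window of row offsets around the vertically aligned position.
    h, w = pic1.shape
    best_top, best_left, best_err = 0, 0, float("inf")
    for left in range(w // 2, w - min_overlap + 1):
        for top in range(-h // 2, h // 2 + 1):
            err = overlap_error(pic1, pic2, top, left)
            if err < best_err:
                best_top, best_left, best_err = top, left, err
    return best_top, best_left, best_err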
2.2.1 Testing the stitching process
The Minimum overlap was included for two reasons:
1. To speed up the stitching process as fewer positions are tested
2. In cases where the overlap is small, the summations made by the algorithm are also small. The
possibility therefore exists that a smaller error in one of these positions will be returned
instead of the true overlap position.
Along with the text file containing the top and left positions for overlapping images, a second file is
created. This is called errorfile.txt and it contains a list of all the errors calculated for every X and Y
value tested for the pair of overlapping images. It is used to confirm that the correct overlap position
has in fact been obtained. Graphically viewing this error data (when capturing three grooves per
frame) reveals that there are usually three possible overlap positions to consider (Figure 8).
Figure 8: Potential overlap positions (a) Case 1 (b) Case 2 (c) Case 3
On observation, case three appears to be the correct version, but a more structured testing approach is
adopted. A surface plot is generated from the error file containing all error values for the overlaps at
each X and Y position. The three potential overlaps can be seen in Figure 8. The x and y co-ordinates
indicated above correspond to those shown in the images of Figure 9. The lowest error value (z-axis)
can be seen in the centre surface plot. Note the protruding lower value of the central set of errors.
According to the graph, this lowest error, in the range of 0 to 200, is of the order of 4 times lower than
the closest match in either of the other two error sets, which begin at 800. By including a minimum
overlap in the algorithm the number of positions tested can be calculated as follows:
Positions tested = 240 * (320 - minimum overlap)
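For example, with a minimum overlap of 100 pixels (an illustrative value, not one used in the system) this gives 240 * (320 - 100) = 52,800 candidate positions per pair of frames, compared with 76,800 if no minimum overlap were imposed.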
Figure 9: The lowest error value as seen on surface plot.
2.3 Groove Tracking
The next stage of the system deals with the tracking of a groove across the stitched images in order to
create a waveform of sound data. As stated, the record groove is in fact the representation of the sound
waveform. Our system allows the user to select the first image from which to begin the tracking
process. The image focus algorithm will determine whether or not this is an appropriate frame to begin
with.
Figure 10: Groove tracking process
Once the initial frame is loaded the user simply clicks on a pixel slightly above the groove they wish to
track. The test track button performs the tracking algorithm and displays a sample of exactly which
range of pixels will be considered. Figure 10 illustrates how this is achieved by displaying the groove
outline in red. If the user is satisfied with this groove, and the default settings appear to trace it
accurately, the accept groove button is chosen and a set of text files is generated containing the
groove information: the upper and lower tracks of the groove. Two ranges of pixel values are
considered when tracking the groove:
1. Those considered to be in the range of the groove colour (White)
2. Those considered to be in the range of the non-groove colour (Black)
The default settings for groove/non-groove ranges are based on the first frame that is loaded. Every
pixel is then compared to this range so the algorithm can make an accurate estimation of which pixels
belong to the groove.
The ranges for groove/non–groove values may also be chosen manually for better accuracy in the case
where the defaults are unsatisfactory. This is done by sampling a pixel in a typical position on the
groove (white) and another which is not part of the groove (black). The focus threshold value may also
be set by the user by accepting the average contrast per pixel of the selected image as the optimum
focus value. All other frames will be tested relative to this focus range.
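A minimal sketch of this range-based classification is shown below; the numeric ranges are placeholders for illustration (in the system they are sampled from the first frame or chosen by the user):

import numpy as np

# Placeholder ranges: in practice these come from sampling the loaded frame.
GROOVE_RANGE = (200, 255)      # pixels treated as groove colour ("white")
NON_GROOVE_RANGE = (0, 60)     # pixels treated as non-groove colour ("black")

def classify_pixel(value, groove=GROOVE_RANGE, non_groove=NON_GROOVE_RANGE):
    if groove[0] <= value <= groove[1]:
        return "groove"
    if non_groove[0] <= value <= non_groove[1]:
        return "non-groove"
    return "uncertain"

def track_upper_edge(image, start_row, start_col):
    # Walk column by column from the user's selected pixel, recording the first
    # groove-coloured pixel at or below start_row in each column.
    edge = []
    for col in range(start_col, image.shape[1]):
        column = image[start_row:, col]
        hits = np.where((column >= GROOVE_RANGE[0]) & (column <= GROOVE_RANGE[1]))[0]
        edge.append(start_row + int(hits[0]) if hits.size else None)
    return edge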
3 Sound file creation – Stage 4
The data saved in the waveform.txt file is prepared for transfer to a .wav file. This file contains a series
of undulating vertical values (y-axis) along a horizontal plane (x-axis). The graphic of this file may be
displayed in our system’s software. The minimum vertical value is calculated so that the relative offsets
from this minimum can be manipulated such that the lowest value becomes 0 and the upper limit of the
waveform becomes 255. This is merely a manipulation to enable the file to be placed into a wav file.
There are several methods of .wav file creation once the waveform has been stored in a text file. There
are also a number of off-the-shelf packages that will create the image of any sound signal recorded
through a microphone, and allow that image to be modified for playback. Such programs could also be
incorporated into this system to play back the image of the record groove. The simplest way to ensure the
correct sound signal has been recorded is to test it by creating a .wav file from the data. A sampling
rate is applied, but merely for testing purposes. Many off-the-shelf products will play such a waveform
once created.
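As an illustration of this final step, the sketch below writes a sequence of 0-255 waveform values to an 8-bit mono .wav file using Python's standard wave module; the 8 kHz sampling rate and file name are arbitrary testing values, not ones specified by the system:

import wave

def write_wav(samples, path="groove_test.wav", sample_rate=8000):
    # samples: iterable of integers already scaled into the range 0-255
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono, as on a 78rpm record
        wav_file.setsampwidth(1)        # 8-bit unsigned samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(samples))

# Example: values read from waveform.txt could be passed in as a list of ints
write_wav([128, 140, 155, 140, 128, 115, 100, 115])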
4 Conclusions
This paper described a method for non-contact playing of vinyl records by stitching together smaller
microscopic images of the waveform into one larger panoramic view. This enables playback of the
waveform from the image rather than through contact with a stylus. Previous attempts did not consider
the added simplicity involved in restricting the image to a two-dimensional greyscale format. The
grooves of 78rpm phonographs can be seen clearly in a two-dimensional format with relatively
inexpensive equipment.
Stitching algorithms, although simpler in approach than computer vision techniques such as optical flow
and surface orientation analysis, perform competitively and with higher processing speeds. Serious scratches
and even broken records may be played and enhanced by image manipulation. The laser turntable
cannot do this and has problems with even slightly warped disks. The disadvantage of this method is in
the timescale required to fully execute the image stitching process. This renders instant playback
impossible since the image must first be collected. However, once the image is in fact collected, and a
mosaic stitched together, virtual real-time playback is possible with added features of noise reduction,
no skipping, and broken segments causing no obstructions. Since all other proposed methods, apart
from the expensive laser turntable, have similar delays in image acquisition, it can be argued that it is
the image collection process that is most critical to the system's success.
The image stitching method proposed in this paper used 2D greyscale microscopic pictures rather than
a full 3D groove reconstruction, which would capture a large amount of redundant data. Since older 78s
contain mono signals only, the grooves can be viewed in two dimensions. If a delay is inevitable, then
the time required for image acquisition in such systems becomes secondary to the quality of images
taken.
Further enhancements would include ways to more accurately obtain this overall record image while
minimizing the hazards outlined in section 2.1.1 above, namely image drift and de-focus. Image drift
could be instantly eliminated by using a method of keeping the record stationary and moving the
microscope instead over it in grid like movements through two by two sections rather than following
the groove specifically. The fact that no specific groove is followed would mean the image drift
problem disappears. The same stitching methods could be used to stitch grid sections. This sectioning
of the image would also give more control and flexibility than one long groove image. The fact that the
record is stationary would mean that de-focus is less of a problem as zero movement of the disc will
not create an unlevelled surface (unless of course the disc is warped) and thus focal length should
remain more or less consistent. To make real time playback possible while the imaging is taking place
a system of buffered images might be incorporated where there would only be one initial short delay.
With the current system, image analysis methods to dismiss the majority of unsuccessful overlaps
would have to be developed to speed up the process. Essentially however, it does indeed appear
possible for this system to be used as an inexpensive way to safely transfer the information from rare,
antique or damaged records so they can be played in analogue form.
References
[ELP, 2003] ELP Laser Turntable ™, No needle No wear. Web reference www.elpj.com
[Fadeyev and Haber, 2003] Fadeyev, V. and Haber, C., 2003. Reconstruction of Mechanically Recorded
Sound by Image Processing. Journal of the Audio Engineering Society, pp. 1172–1185.
[LaOr 1999] Ofer LaOr, 1999. A Video for Windows ActiveX control. Dr Dobbs Programmers
Journal. June 1999.
[MilShaf SMC] StepperControl.com, a division of MilShaf technologies inc.
Web reference: www.steppercontrol.com/motors.html
[Olsson et al., 2003] Olsson, P, Ohlin. R. Olofsson, D., Vaerlien, R., and Ayrault, C, 2003. The digital
needle project - group light blue. Technical report, KTH Royal Institute of Technology, Stockholm,
Sweden. Web reference: www.s3.kth.se/signal/edu/projekt/students/03/lightblue/
[Smart, 2003] The amazing laser turntable. Smart Devices Journal, Aug 2003.
Web reference www.smartdev.com/LT/laserturntable.html
[USPO 86] Stoddard, Robert E, Finial Technology Inc, 1986. United States Patent Office. Number
4,870,631. Optical turntable system with reflected spot position detection,
[USPO 89] Stoddard, Robert E et al, Finial Technology Inc. 1989. United States Patent Office, Number
4,972,344. Dual beam optical turntable,
[Springer, 2002] Springer, O, 2002. Digital needle - a virtual gramophone.
Web reference: www.cs.huji.ac.il/~springer/
[Tian, Bannon, 2006] Baozhang Tian and John L. Barron, 2006. Reproduction of Sound Signals from
Gramophone Records using 3D Scene Reconstruction. Irish Machine Vision and Image Processing
Conference.
[Tian, Bannon, 2007] Baozhang Tian and John L. Barron, 2007. Sound from Gramophone Record
Groove Surface Orientation. 14th IEEE Intl. Conference on Image Processing.
Optimisation and Control of IEEE 1500 Wrappers and
User Defined TAMs
Michael Higgins, Ciaran MacNamee, Brendan Mullane.
Circuits and Systems Research Centre (CSRC),
Department of Electronic and Computer Engineering,
University of Limerick,
Limerick,
Ireland.
[email protected]
Abstract:
With the adoption of the IEEE 1500 [1] Standard, the opportunity exists for System on Chip (SoC)
designers to specify test systems in a generic way. As the IEEE 1500 Standard does not address the
specification and design of the on-chip Test Access Mechanism (TAM), considerable effort may
still be required if test engineers are to optimise testing SoCs with IEEE 1500 Wrapped Cores. This
paper describes novel research activity based on the design of TAMs that are compatible with IEEE
1500 wrapped cores; once a Test Resource Partitioning (TRP) scheme has been adopted, it is
shown that multiple TAM sections and Core Wrappers on a SoC can be controlled through the use
of an intelligent test controller. Taking into account previous work on TRP, functional testing using
the system bus and TAM architectures, a novel approach is introduced that allows some elements
of the system bus to be used as part of the TAM while retaining compatibility with the IEEE 1500
wrapped cores. A small micro-controller SoC design based on the AMBA APB bus is used to
investigate this approach. A crucial element of this approach involves interfacing the combined
TAM to the mandatory Wrapper Serial Port (WSP) and the optional Wrapper Parallel Port (WPP)
of the IEEE 1500 wrapped cores in the chip. Test Application Time (TAT) results are presented
that establish the viability of the ideas described, as well as comparative analysis of TAT results
derived from a number of test structures based on these techniques.
Keywords: IEEE 1500, TAT, TAM, SoC, Intelligent Test Controller.
1. Introduction
The ever-increasing SoC test problem has been well documented in recent years [2]. The
many factors that contribute to the overall problem can be broken down as follows: low Test
Access Port (TAP) bandwidth, limited embedded core accessibility, large volumes of test
data, deep sub-micron effects not covered by standard fault models and undefined TAM
structures. The contributing factors are by no means limited to the above list but can be more
accurately defined by the test objectives or constraints such as cost, time or test coverage.
The purpose of this paper is to investigate how an IEEE 1500 wrapper can be configured and
combined with TAM optimisation techniques to reduce overall test time. A novel on-chip test
controller is also presented that has the ability to manage multiple TAM sections and wrapper
configurations. The concept of a bus-based TAM used for both functional and structural
testing is also introduced to allow for further reduction of resources i.e. silicon.
Section 2 gives an overview and the history of TAM types and structures, namely bus and
non-bus based TAMs. The concept of TRP is covered in section 3. An overview of the IEEE
1500 standard is covered in section 4. The bottom up optimisation model [3] is analysed in
section 5. The TRP solution for the benchmark circuit is discussed in section 6. TRP can be
used to find the trade-off between TAM width and Test Application Times (TATs) and results
are presented later in section 6 showing TATs based on TAM structures with and without the
application of TRP. The effect of 3 different wrapper configurations: WSP Only, WPP Only
and WSP and WPP combined, on overall TATs is also considered in section 6. Section 7
introduces and describes the novel test controller. Section 8 details future work and
conclusions.
2. TAM Types
A TAM is an on-chip mechanism that is used to transport test vectors and test responses from
cores to an on-chip test controller or an off-chip test manager (Automated Test Equipment
(ATE)). The TAM is user definable and is generally based on one of the architectures
described below.
The earliest TAMs were categorised into 3 distinct types [4]: daisy-chain architecture,
distributed architecture and multiplexed architecture.
Figure 1: (a) Daisychain, (b) Distributed, (c) Multiplexed [4]
• A daisy-chain architecture (Figure 1 (a)) is where the input TAM for 1 core is the output
TAM from the previous core, i.e. if there are 10 cores in the SoC, before the 10th core can
be tested the first 9 cores must have been tested, therefore this is also a sequential testing
scheme. A parallel core-testing scheme is one where each of the cores can be tested in
parallel.
• A distributed architecture (Figure 1(b)) is one where the TAM lines are divided between
all of the cores, so that core testing can occur in parallel.
• A multiplexed architecture (Figure 1 (c)) is one where each core within the SoC has access to
the whole TAM, but only 1 core can ever use the TAM at any given time, so testing is
sequential.
Previous work [5] on TAM assignment techniques has been based partially on these 3
architectures mentioned above.
TAM structures can use a non-bus based or a bus based strategy.
• A non-bus based strategy is where extra interconnections are added to a design to facilitate
a TAM. These extra interconnections will only ever transport test stimuli and test response
to and from the cores. There are 2 main disadvantages to a non-bus based TAM: the extra
area required for the TAM and the extra complexity added to the layout stage of the SoC
design flow. These 2 disadvantages can have an impact on the SoC, possibly resulting in
increased time to silicon and increased overall cost in terms of silicon area.
• A bus-based strategy is one where the system bus is reused to transport test stimuli and test
responses across the chip.
For a full scan design, using the stuck at fault (SAF) model, the test vectors can be separated
into 2 main categories: the scan vectors and the functional vectors. Using the SAF model one
of the signal lines within the digital core is stuck at a fixed logic value, regardless of the inputs
applied to the core. Previous approaches [6-8] using bus based TAM strategies have
mainly focused on carrying only functional vectors; therefore only functional tests are
implemented. Functional testing on its own can only produce a limited test coverage, where in
most cases it would be less than the recommended test coverage of 95% - 99.9% [9]. A bus
based TAM strategy delivering both structural and functional test vectors [10] has been
investigated, but core test data had to be buffered before it could be applied. Bus-Based TAM
strategies have had 2 distinct disadvantages: the width of the TAM is limited by the existing
system bus structure and the types of test methods have been restrictive. We address both of
these points in our proposal.
There is no universal TAM scheme that can be applied to all SoC digital designs as each
design has different requirements depending on its target market, fabrication process used,
design flow used and budget and overhead allowances. The recently accepted IEEE 1500
Standard for Embedded Core Test (SECT) has an optional WPP which can be connected to a
user defined TAM for faster test vector application, but the TAM cannot be defined in the
standard as it is design dependent. The TAM design can be one of the most important aspects
of a SoC test structure as it can have severe impacts on the overall TAT and the silicon area if
not designed with careful planning and consideration.
3. Test Resource Partitioning
Many resources are required to execute a SoC test. The amount of each resource used is
dictated by the test resource-partitioning scheme in operation. Examples of test resources are
as follows: test cost, test time, test power, test interface bandwidth, and TAM width. The
above list is not exhaustive and may be different depending on the SoC design and test
environment.
• Test cost may contain many different elements such as test engineer costs, additional
silicon area, and ATE cost which may all have an impact on the end cost of the SoC to the
customer. Increasing overall test costs by a small fraction may give a competitor added
advantages.
• Test time is the amount of time that it takes to achieve reasonable test coverage of the SoC.
• When a SoC is placed in test mode, more power may be dissipated than in normal
operation.
• The test interface bandwidth limits the amount of test data that can be transported on and
off chip at any given time. An example of the importance of test interface bandwidth is the
IEEE 1149.1 TAP where all test data had to be serialised and de-serialised due to
restrictive test interface bandwidth limits. The IEEE 1500 has built in a WPP to
accommodate a higher bandwidth test interface if required.
• A wider TAM may in certain circumstances introduce additional interconnections and
routing complexity.
These resources are closely associated with one another, and placing a constraint on one
resource will have direct consequences for the others. Table 1 shows possible outcomes
of increasing and decreasing certain test resources.
                              Test Cost   Test Time   Test Power   Test I/F Bandwidth   TAM Width
Test Cost Increase                ↑           ↓           ↑               ↑                 ↑
Test Cost Decrease                ↓           ↑           ↓               ↓                 ↓
Test Time Increase                ↓           ↑           ↓               ↓                 ↓
Test Time Decrease                ↑           ↓           ↑               ↑                 ↑
Test Power Increase               ↓           ↓           ↑               ↑                 ↑
Test Power Decrease               ↑           ↑           ↓               ↓                 ↓
Test I/F Bandwidth Increase       ↑           ↓           ↑               ↑                 ↑
Test I/F Bandwidth Decrease       ↓           ↑           ↓               ↓                 ↓
TAM Width Increase                ↑           ↓           ↑               ↑                 ↑
TAM Width Decrease                ↓           ↑           ↓               ↓                 ↓
Table 1: Test Resource Comparisons
The test resource partitioning scheme can only be decided when the SoC design is known and
the test requirements and constraints are decided.
62
4. IEEE 1500 Overview
The IEEE 1500 [1] provides a scalable test architecture for embedded digital cores within a
SoC. Access is provided to these embedded digital cores using the IEEE 1500 wrapper for
controllability and observability of those cores. An IEEE 1500 wrapper can be used as a
bridge between core users and core providers. A standard IEEE 1500 wrapped core is shown
in Figure 2.
Figure 2: Standard IEEE 1500 Wrapped Core[1]
The main building blocks of the 1500 wrapper are shown in Figure 2. The WIR (Wrapper
Instruction Register) enables all of the IEEE 1500 operations. The IEEE 1500 wrapper has
several modes of operation. There are modes for functional (nontest) operation, inward facing
(IF) test operation, and outward facing (OF) test operation. Different test modes also
determine whether the serial test data mechanism (WSI–WSO) or the parallel test data
mechanism (WPI–WPO), if present, is being utilised. The WBY (Wrapper Bypass Register)
provides a bypass path for the WSI – WSO terminals of the WSP (Wrapper Serial Port). The
WBR (Wrapper Boundary Register) is the data register through which test data stimuli are
applied and pattern responses are captured. The WPP is used for increased data bandwidth to
the wrapped core.
5. TRP Utilisation
There has been much previous work [5] in the area of TRP for SoC designs. The resource that
this paper has concentrated on is efficient TAM allocation to reduce the total test time. An
approach used by [3], TR_ARCHITECT, has been the basis for the TRP for TAM allocation
in this work. There are two steps from TR_ARCHITECT used: CreateStartSolution and
OptimiseBottomUp. The main constraint that has to be decided before the TAM allocation can
be computed is the total TAM width.
• T = TAM WIDTH
The total number of cores in the SoC also needs to be known.
• C = Total Number of Cores
For each core ‘i’ (1 ≤ i ≤ C) in the SoC the number of primary inputs that will have a WBR
(wrapper boundary register) cell when the IEEE 1500 wrapper is in place must be known.
• ni = Core ‘i’ Primary Inputs with WBR cell
The number of scan flip-flops (fi) contained in each core is required to calculate a test time for
each core along with the number of test patterns (tpi) for that core.
• fi = Core ‘i’ number of scan flip flops
• tpi = Core ‘i’ number of test patterns
To calculate the test time for each core, several assumptions are made such as: each core is
wrapped with an IEEE 1500 compliant wrapper; the amount of time that it takes to set-up the
wrapper for test is 7 cycles (4 cycles for the instruction op-code and 3 cycles setup) plus 1
cycle per pattern to apply the test patterns to the core in normal functional mode; the scan
chains contained in each of the cores are balanced so that test time can be reduced further; and
each of the scan chains are to have a dedicated input via the WPP or WSP of the IEEE 1500
wrapper. Wpp denotes the width of WPP in bits. When only the WPP is used for test vector
loading and unloading each individual core test time (ti) can be calculated as follows:
• WPP = WPP width (bits)
• ti = Core ‘i’ test time
• ti = (((fi + ni) / WPP) * tpi) + 7 + tpi
The CreateStartSolution [3] step from the TR_ARCHITECT algorithm first allocates the TAM
bits one at a time, giving each core access to one TAM line initially, where
T > C. If a core has access to only one TAM line, then there is only one scan chain in that
core, whereas if a core has access to three TAM lines then the core would have 3 balanced
scan chains. The test time (ti) for each core changes according to the amount of access it has
to the TAM (i.e. more access to the TAM leads to more scan chains, therefore lowering test
time). After the first allocation of TAM lines, the remaining TAM lines are allocated to the
cores with the largest test times. Each time a core is allocated another TAM line, its test time
is reduced.
When all TAM lines have been allocated the OptimiseBottomUp [3] step from the
TR_ARCHITECT algorithm is applied to distribute the TAM lines evenly between cores.
After the initial allocation of the TAM lines some cores have a larger test time than others.
The goal of the OptimiseBottomUp algorithm is to make the test time on each TAM line as
even as possible. To achieve this, the 2 TAM lines with the smallest test times are found; the
test(s) performed on the TAM line with the smallest test time are added to the TAM line with
the 2nd smallest test time, thereby freeing up the TAM line that previously had the smallest test time.
The freed up TAM line is allocated to the core that currently has the largest test time, thus
reducing the test time. One TAM line may service several different cores for test if required.
This process repeats until the test times on each of the TAM lines are equal (or as close as
possible).
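A minimal Python sketch of this allocation heuristic is given below. It follows the description above rather than the published TR_ARCHITECT code, and the function and field names (core_test_time, allocate_tam, 'f', 'n', 'tp') are assumptions made for illustration; the bottom-up merging of TAM lines is omitted for brevity, and the use of ceil for uneven chain lengths is an assumption.

import math

def core_test_time(f, n, tp, tam_lines):
    # f: scan flip-flops, n: primary inputs with a WBR cell, tp: test patterns.
    # One balanced scan chain per TAM line allocated to the core is assumed.
    return math.ceil((f + n) / tam_lines) * tp + 7 + tp

def allocate_tam(cores, total_width):
    # CreateStartSolution-style step: give each core one TAM line (assumes
    # total_width >= number of cores), then repeatedly give a further line
    # to whichever core currently has the largest test time.
    alloc = [1] * len(cores)
    for _ in range(total_width - len(cores)):
        times = [core_test_time(c['f'], c['n'], c['tp'], a) for c, a in zip(cores, alloc)]
        alloc[times.index(max(times))] += 1
    return alloc

# Example with made-up core characteristics
cores = [{'f': 500, 'n': 50, 'tp': 100}, {'f': 200, 'n': 30, 'tp': 60}, {'f': 80, 'n': 10, 'tp': 40}]
print(allocate_tam(cores, total_width=8))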
6. Optimisation of the IEEE 1500 Wrapper and user defined TAM
In this novel IEEE 1500 and TAM optimisation a bus based TAM is used. The advantages of
the bus based TAM have been described in section 2. The benchmark test circuits that were
used for this paper can be found at [11]. The system bus that is incorporated in this
benchmark circuit is the AMBA APB bus [12]. The Test Interface Controller (TIC) and re-use
of the AMBA bus previously implemented by ARM [6-8] is similar to the novel TAM
architecture presented in this paper, but this architecture not only delivers functional test
vectors but also the scan vectors. The benchmark circuit has been calculated to have a
complexity factor of 131 according to the naming format stipulated by [13]; therefore this
circuit is one of low complexity and ideal to demonstrate the advantages of a bus based TAM
where additional overheads required for test must be kept to a minimum. An overview of the
proposed novel test architecture is shown in Figure 3.
Figure 3: Proposed Debug/Test Architecture
The test structure consists of each core being wrapped using an IEEE 1500 compliant
wrapper, an ‘input TAM’ to deliver the test vectors and an ‘output TAM’ to collect the test
responses. Each scan chain has an input and output, therefore the number of bits required for
the ‘output TAM’ must equal the number of bits for the ‘input TAM’. In a worst case
scenario, where an 8 bit system is in operation using the AMBA APB bus, calculations have
been made to determine which signals could be reused from the system for a bus based TAM.
It has been calculated that in the worst-case scenario, 15 signals could be re-used from the
AMBA APB bus for the ‘input TAM’ and 8 signals could be re-used for the ‘output TAM’.
Therefore an additional 7 bits would have to be added to the ‘output TAM’ so that the number
of bits of the ‘input TAM’ would equal the ‘output TAM’. These additional bits are not part
of the bus based TAM as they are not system bus signals that are being re-used.
The WPP of each IEEE 1500 wrapper is used to deliver test vectors and collect test responses, as a higher
bandwidth is provided using the WPP than the WSP. The wrappers have also implemented
the WSP to comply with the IEEE 1500 standard and to control the WIR, WBY and the
WBR.
The 1500 wrapper can be configured in 3 different ways for test vector loading and unloading:
1. WSP only mode,
2. WPP only mode and
3. WSP & WPP hybrid mode.
To illustrate the different wrapper configurations, a core (SPI Core) from an in-house
benchmark SoC is considered. The test characteristics of the SoC can be found at [11].
In WSP only mode (Figure 4), all test vectors and wrapper instructions are delivered via the
WSP. In this mode the core can only have one scan chain and the test data is delivered in a
serialised format.
Figure 4: Wrapper in WSP Only Configuration
In WPP only mode (Figure 5), the test vectors are delivered via the WPP, but instructions are
still delivered via the WSP. The number of balanced scan chains that the core contains
determines the width of the WPP: if the core has 3 balanced scan chains, the WPP is 3 bits
wide.
Figure 5: Wrapper in WPP Only Configuration
The final mode of operation combines the WSP and WPP (Figure 6); the wrapper instructions
are delivered via the WSP, the test vectors are then delivered via the WSP and WPP to the
balanced scan chains making full use of the available TAM resources.
Figure 6: Wrapper in WSP & WPP Configuration
6.1 Wrapper/TAM Optimisation Comparison
The theoretical experimental results for the in-house SoC design shown in Table 2 are for the cases
of using the WPP to load and unload test vectors, the WSP to load and unload test vectors, or a
combination of WSP and WPP, using both distributed and multiplexed TAM structures.
Table 2 gives a summary of the 5 different IEEE 1500 wrapper/TAM optimisations investigated.
     Test Architecture                TAT      % TAT decrease
A:   WSP Only (Distributed TAM)       697261   -
B:   WPP Only (Distributed TAM)       87697    87.423
C:   WPP Only (Multiplexed TAM)       85758    87.701
D:   WSP & WPP (Multiplexed TAM)      80792    88.413
E:   WSP & WPP (Distributed TAM)      68548    90.169
Table 2: Wrapper and TAM mode comparisons
The TAT for the WSP only (Distributed TAM) architecture is used as the baseline TAT, and column 3
of Table 2 gives the percentage decrease that each other test architecture achieves on the overall TAT.
The % decrease ranges from 87.423% to 90.169%.
The TAT can be calculated for each core in the SoC for all test architectures shown in Table 2
using equations 6.1.1 – 6.1.5:
ti = ((fi + ni) * tpi) + 7 + tpi                   (Test Architecture A TAT calculation)   (6.1.1)
ti = (((fi + ni) / WPP) * tpi) + 7 + tpi           (Test Architecture B TAT calculation)   (6.1.2)
ti = (((fi + ni) / WPP) * tpi) + 7 + tpi           (Test Architecture C TAT calculation)   (6.1.3)
ti = (((fi + ni) / (WPP + 1)) * tpi) + 7 + tpi     (Test Architecture D TAT calculation)   (6.1.4)
ti = (((fi + ni) / (WPP + 1)) * tpi) + 7 + tpi     (Test Architecture E TAT calculation)   (6.1.5)
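As an illustration of how these equations compare, the short sketch below evaluates them for a single hypothetical core; the values fi = 500, ni = 50, tpi = 100 and WPP = 4 are invented for the example and are not taken from the benchmark SoC:

# Hypothetical core characteristics, for illustration only
fi, ni, tpi, WPP = 500, 50, 100, 4

tat_a = ((fi + ni) * tpi) + 7 + tpi                 # WSP only (6.1.1)
tat_b = (((fi + ni) / WPP) * tpi) + 7 + tpi         # WPP only (6.1.2 / 6.1.3)
tat_e = (((fi + ni) / (WPP + 1)) * tpi) + 7 + tpi   # WSP & WPP combined (6.1.4 / 6.1.5)

print(tat_a, tat_b, tat_e)   # 55107, 13857.0, 11107.0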
7. Control of the IEEE 1500 Wrappers and user defined TAMs
Figure 7: Test Controller
Figure 3 shows the proposed novel SoC debug/test architecture. In this architecture an
intelligent test controller is required to control the TAM and also the WSC (wrapper serial
control) ports of each core’s 1500 wrapper. A block diagram of the intelligent test controller
is shown in Figure 7.
The test controller is based on the well-known IEEE 1149.1 TAP state machine [14]. A
conventional 1149.1 TAP state machine has an instruction register and a data register, whereas
this test controller has an instruction register and control and status registers instead of the
data register, but still utilises the 16-state state machine. Additional PTDI (parallel test
data in) and PTDO (parallel test data out) ports are provided to allow a higher bandwidth path for
test vector application and test vector response for all cores in the SoC.
The total TAM within the SoC is divided up into TAM sections. Each TAM section has its
own TAM section state machine to control the TAM and WSC ports associated with it. Each
TAM section is operated independently of each other. Each TAM section also has its own
WSC port for the cores on that TAM section. If there is more than one core on a TAM
section then further control signals are needed for that TAM section, i.e. CoreSelect, to enable
the individual selection of a core's wrapper. The test vectors are transmitted to the cores on a
TAM section using the appropriate PTAM_ip (parallel TAM input) lines and the test
responses are transported via the appropriate PTAM_op (parallel TAM output) lines. The
number of PTAM_ip and PTAM_op lines associated with each TAM section is derived from
the TAM optimisation scheme used.
Each TAM section state machine also has a small piece of memory that has the necessary core
test information stored to carry out tests on the cores associated with that TAM
section. This information includes: the number of cores, length of the longest scan chain in
each core, number of WBR cells for each core and also the number of test patterns for each
core. There is a section of memory for the core test data when the cores only have the mandatory
serial test interface, and there is a section of memory that contains information about each core
when it has a hybrid interface, i.e. a serial and parallel interface combined.
Each TAM section state machine supports all of the IEEE 1500 mandatory test modes and
also some additional hybrid test modes with higher bandwidth interfaces to reduce TATs.
The intelligent test controller is required as there is no mechanism specified by the IEEE 1500
standard for the control of multiple wrappers in a digital SoC. Not including an intelligent test
controller would require bringing all WSC signals for each wrapper to the SoC primary inputs
and primary outputs for external control, resulting in the addition of physical pins, a resource
that is already under tight constraints.
8. Future Work & Conclusion
The results in Table 2 show that combining WSP and WPP for test vector application and test
response collection provides the best time for overall TAT. This is based on a distributed
TAM approach with the TRP TR_ARCHITECT algorithm applied. The bus based approach
introduced in this paper is based on adding an additional seven lines to the ‘output TAM’ that
are not part of the system bus. If a multiplexed TAM approach is used, each core in the system
requires access to these additional seven TAM lines, which introduces interconnection complexity
(i.e. additional silicon) and routing overheads.
Using the TAM TRP approach the overall TAT decreases by 90.169% compared to a WSP
only approach with a distributed TAM. In addition to this considerable reduction in TAT, the
silicon cost may also be reduced due to the lower interconnection complexity of the bus-based
TAM and by carefully planning the layout, placing the additional seven ‘output TAM’ lines
closest to the test data source/sink (test controller or TAP).
The addition of the intelligent test controller IP to a digital SoC design will allow each of the
digital cores within the system to be managed according to the TAM allocation scheme. The
test controller could be added to any digital design, where all the cores are wrapped with an
IEEE 1500 compliant wrapper, alleviating the need for an expensive piece of ATE to control
each of the core’s wrappers within the digital SoC.
This TRP approach only focuses on the TAM resource; further analysis of this TAM
allocation technique would have to take power into consideration; for example activating too
many cores in parallel may increase power consumption considerably, perhaps to an
unacceptable level.
Future work involves applying this novel approach to other benchmark circuits using a system
bus architecture and eventually bringing the L131 (in-house SoC design) circuit to fabrication,
so that the viability of the approach can be verified on silicon and with external tester
technology. Before fabrication the approach has to be verified on a FPGA. The authors have
already shown that it is possible to validate a full scan design on FPGA [15] To further
validate this novel architecture it would also be necessary to replace the bus based TAM with
a traditional non-bus based TAM to generate figures for silicon overheads and interconnection
complexity.
Acknowledgement
The authors acknowledge the support of the CSRC and the Department of Electronic and
Computer Engineering at the University of Limerick. This project has been funded under an
Enterprise Ireland Commercialisation Fund - Technology Development Grant: CFTD/05/315
(GENISIT).
References
[1] IEEE, "IEEE Standard Testability Method for Embedded Core-based Integrated Circuits," in IEEE Std
1500-2005, 2005, pp. 0_1-117.
[2] E. Larsson, K. Arvidsson, H. Fujiwara, and Z. Peng, "Efficient test solutions for core-based designs,"
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 23, pp. 758-775, 2004.
[3] S. K. Goel and E. J. Marinissen, "Effective and efficient test architecture design for SOCs," presented at
Test Conference, 2002. Proceedings. International, 2002.
[4] J. Aerts and E. J. Marinissen, "Scan chain design for test time reduction in core-based ICs," presented at
Test Conference, 1998. Proceedings. International, 1998.
[5] K. Chakrabarty, V. Iyengar, and A. Chandra, Test Resource Partitioning for System-on-a-Chip: Kluwer
Academic Publishers, 2002.
[6] D. Flynn, "AMBA: enabling reusable on-chip designs," Micro, IEEE, vol. 17, pp. 20-27, 1997.
[7] P. Harrod, "Testing reusable IP - a case study," presented at Test Conference, 1999. Proceedings.
International, 1999.
[8] A. Burdass, G. Campbell, R. Grisenthwaite, D. Gwilt, P. Harrod, and R. York, "Microprocessor cores,"
presented at European Test Workshop, 2000. Proceedings. IEEE, 2000.
[9] A. L. Crouch, Design-for-Test for Digital IC's and Embedded Core Systems. New Jersey: Prentice Hall,
1999.
[10] A. Larsson, E. Larsson, P. Eles, and Z. Peng, "Optimization of a bus-based test data transportation
mechanism in system-on-chip," presented at Digital System Design, 2005. Proceedings. 8th Euromicro
Conference on, 2005.
[11] CSRC, "http://www.csrc.ie/Documents/tabid/141/Default.aspx," 2006.
[12] ARM, "AMBA Specification. http://www.arm.com/products/solutions/AMBAHomePage.html," ND.
[13] ITC, "http://www.hitech-projects.com/itc02socbenchm/," 2002.
[14] IEEE, "IEEE standard test access port and boundary-scan architecture," in IEEE Std 1149.1-2001, 2001,
pp. i-200.
[15] B. Mullane, C. H. Chiang, M. Higgins, C. MacNamee, T. J. Chakraborty, and T. B. Cook, "FPGA
Prototyping of a Scan Based System-On-Chip Design," presented at Reconfigurable Communication-Centric
SoCs 2007, Montpellier, France, 2007.
Session 3
Applications
MemoryLane:
An Intelligent Mobile Companion for Elderly Users
Sheila Mc Carthy 1, Paul Mc Kevitt 1, Mike McTear 2 and Heather Sayers 1
1 Intelligent Systems Research Centre
School of Computing and Intelligent Systems
Faculty of Computing & Engineering
University of Ulster, Magee
Derry/Londonderry BT48 7JL
Northern Ireland
{McCarthy-S2, p.mckevitt, hm.sayers}@ulster.ac.uk
2 School of Computing and Mathematics
Faculty of Computing & Engineering
University of Ulster, Jordanstown
Newtownabbey BT48 7JL
Northern Ireland
[email protected]
Abstract
Mobile technologies have the potential to enhance the lives of elderly users, especially those who
experience a decline in cognitive abilities. However, diminutive devices often perplex the aged
and many HCI problems exist. This paper discusses the development of a mobile intelligent
multimodal storytelling companion for elderly users. The application, entitled MemoryLane,
composes excerpts selected from a lifetime’s memories and conveys these past memories in a
storytelling format. MemoryLane aims to possess the capability to produce bespoke stories that
are both appropriate and pleasing to the user; this paper documents the proposed methodology and
system design to accomplish this. As MemoryLane is expected to be deployed on a Personal
Digital assistant (PDA), the preliminary field work to date investigating the usability of PDAs by
elderly users is also discussed.
Keywords: Digital Storytelling, Multimodal, Elderly, Usability, MemoryLane.
1 Introduction
The elderly population is dramatically increasing, especially in the more economically developed
countries of the world, and Ireland is no exception: according to the 2002 census [Department of
Health and Children, 2007] there are 436,001 people aged 65 and over living in Ireland, an increase of
22,119 since the previous census of 1996. It is well accepted that with age there is often an associated
cognitive decline, which varies among individuals, affecting abilities such as memory and planning.
For example, severe cognitive decline in the form of dementia currently affects 1 in 20 over the age of
65, 1 in 5 over the age of 80, and over 750,000 people in the UK [Alzheimers Society, 2006].
Cognitive decline is an inherent part of the natural ageing process, meaning that the number of
sufferers increases steadily as the elderly population grows. Catering for such a diverse sector requires
detailed analysis.
Reminiscence plays an important role in the lives of elderly people; many perfect the art of storytelling
and enjoy its social benefits. The telling of stories of past events and experiences defines family
identities and is an integral part of most cultures. Losing the ability to recollect past memories is not
only disadvantageous, but can prove quite detrimental, especially to many older people.
Ethnographical studies rely on participants’ powers of recall to successfully conduct their research,
and often bear witness to the intangibility of precious memories. Considerable research is being
conducted into how technology can best serve and assist the elderly. Pervasive environments (smart
homes with smart appliances) are being developed to assist elderly users to remain living
independently in their own homes while maintaining a high quality of life. This, in turn, minimises the
emotional and financial strain often caused by nursing home accommodation. Memory prompts have
been developed to remind users to perform imminent activities and the prospect of personal artificial
companions has often been proposed [Wilks, 2005]. Mobile technology is commonplace and offers the
potential to be harnessed as a tool to assist many of these elderly people. However, diminutive
devices often perplex the aged and many usability problems exist. Consequently this potential is very
often not maximised.
The aim of this research is to develop a usable, mobile intelligent multimodal companion for elderly
users. Due to the known benefits of reminiscence among the elderly, the objective of the companion
will be to assist the elderly in recalling their own past life events and memories as they experience the
natural cognitive declines associated with the ageing process. The application is entitled MemoryLane
and will employ digital storytelling techniques to relay the memories to the user. MemoryLane will be
deployed on a Personal Digital Assistant (PDA) which will equip users with the ability to re-live
bygone days, and the portability to relay them to others. The application will also address the usability
problems encountered by the elderly when using mobile devices. In addition to this, it is envisaged
that MemoryLane could posthumously be inherited by family members and drawn on to revive the
memory of a loved one. This paper will discuss the background areas and related work to the research,
the system design, the work accomplished to date, and the remaining challenges.
2 Background and Related Research
The focus of this research is underpinned by several distinct research areas including gerontechnology,
HCI, usability studies, memory, reminiscence, life-caching, pervasive computing, mobile companions,
ethnography, digital storytelling, artificial intelligence and multimodality. A background to these areas
is now provided.
2.1 Intelligent Storytelling
Traditionally, intelligence is perceived as problem solving techniques, where composing and listening
to ‘stories’ may be construed as a peripheral aspect of intelligence. However the term ‘intelligent’
implies having the ability to relay appropriate information, of particular relevance to the user, in a
suitable context and format [Schank, 1995]; such an ability is also a critical feature of intelligent
storytelling. Humans possess an intrinsic desire to both tell and hear stories. It is widely accepted that
children are especially fond of stories yet adults too love to read or watch stories in various formats.
Schank [1995] observes that it is essential for people to discuss what has happened to them and to hear
about what has happened to others, especially when such experiences directly affect the hearer, or the
teller is known personally. Schank [1995] considers the connotations of how recalling past stories
shape the way in which new ones are heard and interpreted; he also endeavours to develop storytelling
systems which not only have appealing stories to relay, but encompass the awareness to know when to
tell the stories. Indeed Schank’s work [Schank, 1995] forms the basis of various other storytelling
systems.
Intelligent storytelling systems very often incorporate multimodality and interactivity for a rich user
experience. Larsen & Petersen [1999] developed a multimodal storytelling environment in which the
user traverses a virtual location in a subjective camera view and is both active story-hearer and
storyteller. Similarly, the Oz project [Loyall, 1997] also allows the user to interact with a virtual
environment called ‘The Edge of Intention’, a peculiar world populated by 4 ellipsoidal creatures
called Woggles. The user embodies one of the Woggles, the remaining 3 being controlled by the
computer. KidsRoom by Bobick et al. [1996] is also typical of interactive multimodal storytelling
systems. KidsRoom is a fully-automated, interactive narrative play-space for children. Images, lights,
sound, and computer vision action recognition technology are combined to transform a child's
bedroom into a curious world for interactive play. Such storytelling systems enable the user to
dynamically interact during storytelling, allowing them to play pivotal roles in the proceedings.
However, in contrast to this genre of storytelling systems, which focus largely on story scripts, Okada
[1996] developed AESOPWORLD. This storytelling system is not interactive; rather, it aims to
model the mind, developing human-like intelligence, and modelling the activities of the central
character accordingly. STORYBOOK by Callaway & Lester [2002] uses a narrative plan to convert
logical representations of the characters, props and actions of a story into prose. MemoryLane will
draw on the intelligent storytelling techniques discussed in this section to relay memories to the user.
2.2 Gerontechnology
Due to the increasing numbers of the elderly population they have become the focus of much research
designed to improve, prolong and enhance their lives. Gerontology is the study of elderly people and
of the social, psychological and biological aspects of the ageing process itself, as distinct from the
term Geriatrics, the study of the diseases which afflict the elderly. Gerontechnology, the merger
between gerontology and technology is a newer genus, concerning itself with the utilisation of
technological advancements to improve the health, mobility, communication, leisure and environment
of elderly people, effectively allowing them to remain living independently in their own homes for
longer. Stanley & Cheek [2003] discuss what is understood by the ‘well-being’ of the elderly in their
comprehensive literature review. Therefore gerontechnology is heavily concerned with the ways in
which elderly people interact with computers and technology, and substantial research is being
conducted in this area.
Willis [1996] discusses cognitive competence in elderly persons, while Melenhorst et al. [2004]
investigated the use of communication technologies by elderly people and explored their perceived
and expected benefits. Fisk & Rogers [2002] discuss how psychological science might assist with the
issues of age-related usability, and Van Gerven et al. [2006] formulates recommendations for
designing computer-based training materials aimed at elderly learners. In a recent paper, Zajicek
[2006] reflects upon established HCI research processes and identifies certain areas in which this type
of research differs significantly from other research disciplines. Pervasive environments designed to
assist older people to live independently and maintain a high quality of life have been developed.
Search engines have been specifically designed for elderly users [Aula & Kaki, 2006], and many
pervasive gadgets are evident, including a meal preparation system [Helal et al., 2003], a self
monitoring teapot [AARP, 2005] and a hand held personal home assistant capable of controlling a
range of electronic devices in the home [Burmester et al., 1997]. By implementing MemoryLane, we
hope to add to the large body of gerontechnology research.
2.3 Digital Memories
Digital memory aids have been designed to assist users in various ways, acting as digital companions,
especially in later life. The value of such devices was initially debated by Bush [1945], and has since
been deliberated and discussed by Wilks [2005]. In addition to digital memory aids, memories
themselves are being digitalised. Nokia provide a digital photo album, often utilised by the blog
community to organise photos and videos to a timeline. Kelliher [2004] discuss an online weblog
populated by the daily submissions of events experienced by a group of camera phone using
participants. An experiment which digitalises and stores the lifetime memories of one man is being
conducted by Gemmell et al. [2006], and another of the UKCRC’s Grand Challenges is focused in this
area (GC3 project). The GC3 project aims to gain an insight into the workings of human memory and
develop enhancing technologies. Incidentally, this project also envisages featuring personal
companions in the next 10 to 20 years, using information extracted from memories to aid elderly
persons as senior companions for reminders. SenseCam [Hodges et al., 2006] is a revolutionary
pervasive device, which aims to be a powerful, retrospective memory aid. SenseCam is a sensor
augmented, wearable, stills camera, worn around the neck, which is designed to record a digital
account of the wearer’s day. SenseCam will take (wide-angle) photographs automatically every 30
seconds, without user intervention, and also when triggered by a change in the in-built sensors, such as
a change in light or body heat. The rationale behind SenseCam is that having captured a digital record
of an event, it can subsequently be reviewed by the wearer to stimulate memories. Dublin City
University’s Centre for Digital Video Processing (CDVP) is currently using two SenseCams in their
Microsoft-funded ‘personal life recording’ research project. MemoryLane will use similar ‘life-cached’ data to compose personal digital memories for output.
2.4 Usability Studies
Myriad HCI usability studies are being conducted in the area of computers and the elderly, but
substantially fewer are being conducted into the specifics of how the elderly interact with pervasive
devices, despite the fact that active researchers within this area have discussed the benefits of mobile
devices to the elderly, and have highlighted the need to learn more to design for this genre [Goodman
et al., 2004]. An initial PDA usability study conducted by Siek et al. [2005] compared differences in
the interaction patterns of older and younger users. This work attempted to ascertain whether older
people, who may be subject to reduced cognitive abilities, could effectively use PDAs. However, this
initial research was conducted with a small sample of 20 users, made up from a control group of 10
younger users aged 25-35, and 10 elderly users aged 75-85 years. The study was restricted to the
monitored analysis of the participants’ abilities to perform 5 controlled interactive tests using a ‘Palm
Tungsten T3 PDA’. The findings of this basic study failed to identify any major differences in the
performance of the two groups, which could be because the elderly group was given extra practice
time. The work of Siek et al. [Siek et al., 2005] offers an early insight into the nature
of the proposed field work for this research. A study conducted into determining the effects of age and
font size on the readability of text on handheld computers is also of particular interest [Darroch et al.,
2005]. Additional research has been conducted into mobile phone usage by the elderly; usability issues
identified include displays that are too small and difficult to see, buttons and text that are too small
causing inaccurate dialling, non user-friendly menus, complex functions and unclear instructions
resulting in limited usage, usually reserved for emergencies [Kurniawan et al., 2006]. Research shows
that mobile devices that are not designed around the needs of the elderly have the potential to exclude them from use; it is therefore imperative that MemoryLane be developed using a user-centred approach.
2.5 Ethnographical Studies
Cultural probes and props such as photographs and memorabilia are often used in ethnographical
studies to prompt participants. The benefits of photo elicitation are highlighted by Quigley & Risborg [2003], who document tremendous success with the elderly users of their digital scrapbook. Wyche et al. [2006] also employ cultural probes, in a ‘historically-grounded’ approach to designing pervasive systems and assistive home applications, and present findings from an ethnographic study which examined ageing and housework. The study
employed a physical ‘memory scrapbook’ as seen in Fig. 1, and used photo elicitation to provoke
responses from elderly participants. The memory scrapbook was constructed from an 8.5 x 11 inch,
fabric bound volume and was filled with dated images and memorabilia applicable to the focus of the
study. Approximately 100 photos, greeting cards, magazine snippets, advertisements and other
mementos were displayed. Wyche et al. [2006] found that the images contained in the memory
scrapbook stimulated the memories of participants and evoked deep elements of human consciousness
which yielded rich user experiences. It is envisaged that cultural probes be used in a similar way
during subsequent ethnographical studies for MemoryLane to elicit oral histories from participants.
Fig. 1. The Memory Scrapbook [Wyche et al., 2006]
3 MemoryLane Design and Architecture
MemoryLane will accept various media objects as input, personal items applicable to the history of the
user such as photographs, video clips, favourite songs or even a favoured poem. These objects
together with personal details and preferences of the user will be intelligently utilised in the
composition of a story told for the pleasure of the user. MemoryLane needs to mimic the notion of
understanding to compose appropriate and interesting stories and respond effectively to the user.
People have a memory full of experiences that they may wish to recount and relay to others.
MemoryLane needs to create an account of the right ones to tell in anticipation of their eventual use.
The platform for deployment is a PDA, which would enable the users to carry their memories in a
mobile companion. A visual concept of MemoryLane is depicted in Fig. 2.
Fig. 2. Concept of MemoryLane
The need for multimodal intelligent user interfaces has been identified and embodied in various
applications such as the landmark SmartKom project [Wahlster, 2006]. In accordance with this
requirement, it is envisaged that MemoryLane be designed to support multimodal input via a touch
screen and possible use of simple voice control commands. The benefits of multimodal interaction are
widely discussed by López Cózar Delgado & Araki [2005], and the design of MemoryLane will ensure a multimodal interface which accommodates elderly users with different capabilities, expertise or
expectations. MemoryLane will also provide multimodal output in the form of images, video, audio
and text-to-speech synthesis. There are several security and privacy aspects of MemoryLane which will require definition during the development phase, such as ownership of the media and the rights of individuals who appear in other people’s memories. The Unified Modelling Language (UML)
will be used as a method for designing the application incorporating use cases and the standardised
graphical notation to create an abstract high level model of MemoryLane as a whole.
3.1 Artificial Intelligence Techniques for Storytelling
MemoryLane will incorporate Artificial Intelligence (AI) techniques to compose life-caching data into
appropriate and pleasing ‘stories’ for the user. It is vital that stories are constructed in an intelligent
way, so that they (a) make sense, and (b) don’t include erroneous data objects that do not belong to the
history of the current user. Case-Based Reasoning (CBR) and Rule-Based Reasoning (RBR) will be
employed for the decision making in MemoryLane. Decision making will be necessary to
appropriately compose the various input data objects into personalised stories. MemoryLane needs to
be aware of sensitive data, how to handle it, and be able to accommodate the preferences of the users.
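As a rough illustration of the kind of rule-based checks described above (the actual CBR/RBR engine is not specified here), the following Java sketch filters candidate story items so that only the current user's non-sensitive data objects are considered; every class, rule and tag name is an assumption made for the example.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * A minimal, hypothetical sketch of rule-based filtering of candidate story
 * items. It is not the MemoryLane reasoning engine; it only illustrates the
 * kind of checks described in the text: items must belong to the current
 * user, and items marked as sensitive are excluded unless the user allows it.
 */
public class StoryItemFilter {

    public static class Item {
        final String ownerId;
        final Set<String> tags;   // e.g. "holiday", "sensitive"
        Item(String ownerId, Set<String> tags) {
            this.ownerId = ownerId;
            this.tags = tags;
        }
    }

    /** Keep only items owned by the user and not marked sensitive. */
    public static List<Item> selectForStory(List<Item> candidates,
                                            String userId,
                                            boolean includeSensitive) {
        List<Item> selected = new ArrayList<>();
        for (Item item : candidates) {
            if (!item.ownerId.equals(userId)) {
                continue;                          // rule 1: wrong user's data
            }
            if (!includeSensitive && item.tags.contains("sensitive")) {
                continue;                          // rule 2: respect preferences
            }
            selected.add(item);
        }
        return selected;
    }
}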
Speech processing can be divided into several categories, two of which are related to this research:
speech recognition, which analyses the linguistic content of a speech signal, and speech synthesis, the
artificial production of human speech. Speech recognition will be investigated as a possible user input mode. However, speech recognition is notoriously difficult: recognition systems cannot guarantee as accurate an interpretation of their input as systems whose input is via mouse and keyboard [McTear, 2004], and the varying speech abilities of the elderly may cause problems in this area. MemoryLane will employ Text to Speech (TTS) to convert normal language text into speech, both for verbal directions that guide user interaction and as part of the memories output to the user. Speech synthesis systems allow people with visual impairments or reading disabilities to listen to written works, which will prove beneficial in systems designed for the elderly; such systems are often judged on their intelligibility and their similarity to the human voice [McTear, 2004].
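Text-to-speech output of this kind can be prototyped with an off-the-shelf engine. The fragment below uses FreeTTS, a freely available Java synthesiser, purely as an illustration; the paper does not commit to a particular TTS package, and the choice of engine, voice name and spoken caption here are assumptions.

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

/**
 * Hypothetical example of speaking a composed memory caption aloud using the
 * FreeTTS engine (not named in the paper; chosen only for illustration).
 * Assumes the FreeTTS jars and the bundled "kevin16" voice are on the classpath.
 */
public class SpeakMemory {
    public static void main(String[] args) {
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            // fall back to plain text output if no synthesiser voice is available
            System.out.println("Summer holiday, Donegal, 1962.");
            return;
        }
        voice.allocate();                                   // load synthesiser resources
        voice.speak("Summer holiday, Donegal, nineteen sixty two.");
        voice.deallocate();                                 // release resources
    }
}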
3.2 MemoryLane Architecture
The architecture as depicted in Fig. 3 visually represents the data flow of MemoryLane. To begin, the
elderly user interacts with the AI multimodal interface and inputs a request to view a memory. This
request is transmitted to the AI decision making module, which uses RBR and CBR to interpret the
user’s request. The decision making module will first establish if the request is for a previously viewed
memory (saved as a favourite) or for a new, (previously un-composed) memory. The decision making
module will then either retrieve a complete previously seen ‘favourite’ memory, or the data objects
required to compose a new one from storage. The decision making module will also commit favourite
memories to file for future viewing. The user’s previously input personal data objects (images, audio,
video and text) are stored on the storage module and are made available to the decision making
module. The decision making module uses its rule bases to compose a memory for output in
association with the personal user information stored by MemoryLane. This memory transcript is
transmitted to the memory composition module which will design the memory output in a
‘storytelling’ format, using speech processing if required. The formatted memory is then relayed to the
multimodal interface which will output the memory to the user. The multimodal interface also
transmits and records user information during user interaction, for example, MemoryLane may record
the preferences of the user for subsequent usage.
Fig. 3. MemoryLane Architecture
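Read as pseudocode, the data flow of Fig. 3 might be sketched as follows; the module interfaces and method names are invented for illustration and are not taken from the MemoryLane implementation.

/**
 * A pseudocode-style sketch (in Java) of the data flow in Fig. 3. The module
 * and method names are hypothetical and serve only to illustrate the flow
 * described in Section 3.2.
 */
public class MemoryLaneFlow {

    interface DecisionModule {
        boolean isFavourite(String request);
        Memory loadFavourite(String request);      // retrieve a saved memory
        Memory composeNew(String request);         // CBR/RBR composition from storage
        void saveAsFavourite(Memory memory);       // commit for future viewing
    }

    interface CompositionModule {
        FormattedMemory toStory(Memory memory);    // storytelling format, speech if required
    }

    interface MultimodalInterface {
        void present(FormattedMemory story);       // images, video, audio, synthesised speech
        void recordPreferences(String request);    // user information for subsequent usage
    }

    static class Memory { /* composed transcript of data objects */ }
    static class FormattedMemory { /* story-formatted output */ }

    private final DecisionModule decision;
    private final CompositionModule composition;
    private final MultimodalInterface ui;

    MemoryLaneFlow(DecisionModule d, CompositionModule c, MultimodalInterface u) {
        this.decision = d;
        this.composition = c;
        this.ui = u;
    }

    /** Handle one user request for a memory. */
    void handleRequest(String request, boolean keepAsFavourite) {
        Memory memory = decision.isFavourite(request)
                ? decision.loadFavourite(request)  // previously seen 'favourite' memory
                : decision.composeNew(request);    // compose from stored data objects
        if (keepAsFavourite) {
            decision.saveAsFavourite(memory);
        }
        ui.present(composition.toStory(memory));
        ui.recordPreferences(request);
    }
}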
3.3 Software Analysis
It is envisaged that MemoryLane will be coded using the Visual Studio developer suite. The utilisation of X+V, the latest addition to the XML family of technologies for user interface development, will be investigated for its usefulness to the project, as will the various development platforms discussed by McTear [2004]. It is also envisaged that SPSS (Statistical Package for the Social Sciences) will be used in the statistical modelling of the data. A variety of handheld devices, such as smart phones and tablet PCs, may be investigated for their usefulness to the project; however, the preferred hardware device is a Dell Axim™ X51v 624 MHz handheld PDA, which runs the Windows Mobile 5.0 operating system. The Axim has a colour touch screen, stylus, and navigational input
buttons.
3.4 Usability Evaluation
The completed MemoryLane application will be deployed on a PDA device for testing and evaluation.
The preferred deployment platform, a DELL Axim X51v PDA device is pictured beside an impression
of the proposed MemoryLane prototype in Fig. 4. In the final phase of the project it is hoped to conduct a usability evaluation of the PDA-based MemoryLane prototype with a subset of the original participants from the field study which evaluated the usability of a PDA.
Fig. 4. Dell Axim X51v PDA
4 Usability of PDAs
The initial stage of this research began with a preliminary HCI pilot study conducted with a sample of
elderly users and aimed at investigating the usability of a PDA. Prior to conducting interviews, many preliminary visits were required to gain trust and build a rapport with the elderly participants.
The pilot study sample comprised 15 participants of apparent good health. The sample was aged
between 55 and 82 years and included 6 males and 9 females. Participants were selected from four
different sources; 6 attended an Age Concern centre, 3 were members of The University of the 3rd
Age, 2 were day patients of a local Nursing Unit and the remaining 4 were selected at random from
responses received from volunteers. Each participant was interviewed separately in a one-to-one
structured interview format in familiar surroundings. The interviews involved completion of a detailed
questionnaire, a demonstration of how to interact with a PDA by the researcher, followed by
observation of participants’ capability in attempting to complete pre-set interactive PDA tasks. Initial
research for the questionnaire design discovered that questions requiring prose-type answers took participants too long to complete; they often became frustrated and seemed to prefer yes/no or tick-box answers. Prose answers also proved ambiguous and often difficult to quantify; the questionnaire therefore followed a 5-point Likert-type scale, giving participants 5 answer options. The ensuing questionnaire was divided into sections A and B. Section A of the questionnaire
was designed to acquire background information regarding participants’ physical characteristics,
socio-economic factors, perceived technical abilities, prior exposure to technology and personal
opinions of modern day technology. Section B of the questionnaire was designed to be completed in
conjunction with undertaking the interactive PDA tasks; this section determined the participant’s
ability to complete the set tasks and ascertained their HCI preferences. This section centred on
questions regarding preferred interaction modalities and aspects and elements of the PDA hardware
and software. As part of section B, participants were asked to attempt 6 basic tasks on the PDA as
illustrated in Fig. 5. This section of the interview was videotaped where possible, with the participant’s approval.
Fig. 5. Participant Interacting with PDA
It was clear from the outset that the participants found the PDA extremely complicated to use and had
difficulty even knowing where to start; no one found the interface instinctive or intuitive. This was
evidenced by the level of assistance requested and given. Despite the functionality of a PDA being
demonstrated beforehand, not one of the participants could carry out even the most basic of tasks
unaided. There was also a noticeable level of general disinterest in applications hosted on the PDA;
none were of particular personal appeal to the participants. For example most thought that its
functions as a calendar or diary were of little interest as they preferred a pen and diary. When asked,
many agreed that they would certainly be more interested, and inclined to engage with the PDA if it
provided an application of personal interest, such as MemoryLane. However, despite participants
initially expressing concern about being unable to partake in the study due to their lack of computer
knowledge, and the difficulties incurred during the tasks, many participants said they actually enjoyed
the experience of PDA interaction. Most felt that their skills would improve if they had more time with
the PDA and some expressed a desire to learn more about a PDA given the desired surroundings and
instructor. The portability of a PDA appealed to the majority of participants who remarked on it being
‘small enough’ to fit into a handbag or breast pocket. This would imply that many elderly users
possess a genuine interest in engaging with mobile technologies and that a PDA has a certain appeal to many elderly people; however, due to complex interfaces and interactions, many choose not to experiment with such devices. These findings suggest that the interface for MemoryLane must strive to be simple, usable and intuitive if it is to be successfully deployed on a PDA.
5 Relation to Other Work
Mobile devices that are not designed to include the needs of elderly users have the potential to
exclude them from using such devices. Technologies are often developed for elderly users without
specific usability studies having been conducted with target users, and are typically based on generic
HCI guidelines. Few usability studies focus on elderly users’ interaction with mobile devices [Goodman et al., 2004], and those that do are small in scale [Siek et al., 2005]. This research aims to
incorporate a large sample and perform a detailed analysis in a bespoke usability study using the
intended hardware conducted with the target audience prior to developing the application.
MemoryLane will then be designed and implemented in a storytelling format based on the specific
findings of the study. This research also aims to deploy MemoryLane to a PDA, a platform rarely used in gerontechnology; as yet no PDA-based multimodal storytelling companion exists which takes existing memory data and builds it into a coherent story for users. Most existing memory-assistive devices are prompts for current or future events [Morrison et al., 2004]; MemoryLane will be a multimodal reminder of memories and past events. The contributions of this research are therefore a set of design guidelines for PDA-based applications for elderly users and multimodal storytelling of memories and past events.
6 Conclusion & Future Work
This paper provides a summary of issues relating to the development of MemoryLane. The objectives
of MemoryLane, in providing a usable, intelligent mobile companion for elderly users have been
defined, and the importance of reminiscence to the elderly clearly stated. The work completed to date has largely centred on requirements gathering. The first stage took the form of an investigative study into the usability of PDAs by the elderly; the second phase, a field study which will investigate reminiscence patterns among the elderly, is currently underway. This next phase of requirements gathering is concerned with eliciting the user requirements for MemoryLane. In order to develop a system which presents users with digital accounts of their memories, it is first important to see how people reminisce and recall their episodic memories.
This study will establish what the users require from such an application and will form the basis of the
design and implementation of MemoryLane. The study will also initiate storytelling and reminiscence
to elicit oral histories of the past lives and experiences of the elderly participants. Videotaped informal focus groups will be conducted, at which there will be guided open discussion.
Questionnaires will not be used at this point to avoid incorporating bias and inhibiting the flow of
conversation. Participants will be observed to ascertain how well they remember, and the manner in
which they recount their memories. The participants will also be observed to elicit the emotions and
feelings that reminiscence evokes, to note if the experiences are pleasant or uncomfortable;
MemoryLane can then incorporate procedures to handle sensitive data. The focus sessions will also
aim to establish any omissions, similarities, patterns or trends in the discourse of participants. A
bespoke ‘memory scrapbook’ will be constructed and used in the next phase in this research.
Photographs and mementos of by-gone eras, applicable to the socio-economic climate of the area will
be included in the scrapbook. Cultural probes, everyday artefacts from bygone days, will also be used
in the study to provoke responses from participants. Participants will be asked about their ability to
recall memories prior to using the scrapbook and then, in contrast, whilst using the scrapbook as a
visual aid and prompt. The hypothesis is that the latter discussions, with the scrapbook, will elicit far richer oral histories than discussion based on recollection alone.
The remaining challenges of the research will be to implement the design for MemoryLane while
adopting a user-centred methodology. The development process will be iterative in nature, requiring
repeated evaluations with the elderly sample, and will incorporate the findings of the two field studies.
Acknowledgments: The authors would like to express gratitude to Dr. Norman Alm for his input and
to Dr. Kevin Curran and Professors Bryan Scotney and Sally McClean for their valuable advice and
guidance. The authors would also like to extend appreciation to the pilot study participants who took
the time to contribute to the research.
7 References
[AARP, 2005] AARP (2005). Japan: i-pot—A Virtual Tea for Two [Homepage of AARP], [Online].
Available at: www.aarp.org/international/agingadvances/innovations/Articles/06_05_japan_ipot.html
[Alzheimers Society, 2006] Alzheimers Society, (2006). Facts about Dementia [Online] Available at:
http://www.alzheimers.org.uk/
[Aula & Kaki, 2006] Aula, A. & Kaki, M. (2006). Less is more in Web search interfaces for older adults, First Monday, [Online], vol. 10, no. 7. Available at: http://www.firstmonday.org/issues/issue10_7/aula/
[Bobick et al., 1996] Bobick, A., S. Intille, J. Davis, F. Baird, C. Pinhanez, L. Campbell, Y. Ivanov, A.
Schtte & A.Wilson (1996). The KidsRoom: A Perceptually-Based Interactive and Immersive Story
Environment. In PRESENCE: Teleoperators and Virtual Environments, 8(4): 367-391
[Burmester et al., 1997] Burmester, M., Machate, J. & Klein, J. (1997). Access for all: HEPHAISTOS
- A Personal Home Assistant, Conference on Human Factors in Computing Systems, CHI '97 extended
abstracts on Human factors in computing systems: looking to the future, Atlanta, Georgia, USA, ACM
Press, New York, USA, 36 - 37.
[Bush, 1945] Bush, V. (1945). As We May Think, The Atlantic Monthly, The Atlantic Monthly Group, Boston, USA.
[Callaway & Lester, 2002] Callaway, C. & Lester, J.C (2002). Narrative Prose Generation.
Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence. Seattle, USA.
[Darroch et al., 2005] Darroch, I., Goodman, J., Brewster, S. & Gray, P. (2005). The Effect of Age and
Font Size on Reading Text on Handheld Computers, Proceedings of Interact 2005, Rome, September
2005. Springer Berlin, Heidelberg, 253-266.
[Department of Health and Children, 2007] Department of Health and Children. (2007). Population of
Ireland: summary statistics for census years 1961-2002 [Online] Available at:
http://www.dohc.ie/statistics/health_statistics/table_a1.html
[Fisk & Rogers, 2002] Fisk, A.D., & Rogers, W.A. (2002). Psychology and aging: Enhancing the lives
of an aging population. Current Directions in Psychological Science, 11, 107–110
[Gemmell et al., 2006] Gemmell, J., Bell, G. & Lueder, R. (2006). MyLifeBits - A Personal Database
for Everything, Communications of the ACM, vol. 49, Issue 1, Microsoft Research Technical Report
MSR-TR-2006-23, San Francisco, USA, 88-95
[Goodman et al., 2004] Goodman, J., Brewster, S. & Gray, P. (2004). Older People, Mobile Devices and Navigation, HCI and the Older Population, Workshop at British HCI 2004, Leeds, UK.
[Helal et al., 2003] Helal, S., Winkler, B., Lee, C., Kaddourah, Y., Ran, L., Giraldo, C. & Mann, W. (2003). Enabling Location-Aware Pervasive Computing Applications for the Elderly, 1st IEEE Conference on Pervasive Computing and Communications (PerCom), Fort Worth.
[Hodges et al., 2006] Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth,
G., Kapur, N., and Wood, K. (2006). SenseCam: A retrospective memory aid. Proc. Ubicomp 2006.
[Kelliher, 2004] Kelliher, A. (2004). Everyday Cinema, SRMC 2004, October 2004, New York, USA, ACM Press.
[Kurniawan et al., 2006] Kurniawan, S., Mahmud, M. & Nugroho, Y. (2006). A Study of the Use of
Mobile Phones by Older Persons, CHI 2006, 989 - 994.
[Larsen & Petersen, 1999] Larsen, P.B. & Petersen, B.C. (1999). Interactive StoryTelling in a Multimodal Environment, Institute of Electronic Systems, Aalborg University, Denmark
[López Cózar Delgado & Araki, 2005] López Cózar Delgado, R. & Araki, M. (2005). Spoken,
Multilingual and Multimodal Dialogue Systems: Development and Assessment. Wiley & Sons,
Hoboken, N.J., U.S.A.
[Loyall, 1997] Loyall, A. B.(1997). Believable agents: building interactive personalities. Ph.D. thesis,
CMUCS-97-123, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA.
[McTear, 2004] McTear, M. F. (2004). Spoken Dialogue Technology: Toward the Conversational
User Interface, Berlin, Germany: Springer-Verlag
[Morrison et al., 2004] Morrison, K., Szymkowiak, A. & Gregor, P. (2004). Memojog – An Interactive
Memory Aid Incorporating Mobile Based Technologies, in Lecture Notes in Computer Science,
Volume 31, Springer Berlin, Heidelberg, 481-485.
[Melenhorst et al., 2004] Melenhorst, A.S., Fisk, A.D., Mynatt, E.D. & Rogers, W.A. (2004).
Potential Intrusiveness of Aware Home Technology: Perceptions of Older Adults. Proceedings of the
Human Factors and Ergonomics Society 48th Annual Meeting 2004. Santa Monica, CA: Human
Factors and Ergonomics Society
[Okada, 1996] Okada, N. (1996). Integrating Vision, Motion and Language through Mind. In Artificial
Intelligence Review, Vol. 10, Issues 3-4, 209-234.
[Quigley & Risborg, 2003] Quigley, A. & Risborg, P. (2003). Nightingale: Reminiscence and
Technology – From a user perspective, OZeWAI 2003, Australian Web Accessibility Initiative,
Latrobe University, Victoria, Australia
[Schank, 1995] Schank, R.C. (1995). Tell me a story: narrative and intelligence. Evanston, Ill.: Northwestern University Press.
[Siek et al., 2005] Siek, K.A., Rogers, Y. & Connelly, K.H. (2005). Fat Finger Worries: How Older
and Younger Users Physically Interact with PDAs, INTERACT 2005, eds. M.F. Costabile & F.
Paterno, Springer Berlin, Heidelberg 267 – 280
[Stanley & Cheek, 2003] Stanley, M. & Cheek, J. (2003). Well-being and older people: a review of the literature. Canadian Journal of Occupational Therapy 70(1):51-9.
[Van Gerven et al., 2006] Van Gerven, P.W.M., Paas, F. & Tabbers, H.K. (2006). Cognitive Aging
and Computer-Based Instructional Design: Where Do We Go From Here? Educational Psychology
Review, Springer Netherlands, Volume 18, Number 2
[Wahlster, 2006] Wahlster, W. (2006). Smartkom: Foundations of Multimodal Dialogue Systems,
Springer Berlin, Heidelberg, New York
[Wilks, 2005] Wilks, Y. (2005). Artificial Companions, in Lecture Notes in Computer Science: Machine Learning for Multimodal Interaction, Volume 3361/2005, Springer Berlin, Heidelberg, 36-45.
[Willis, 1996] Willis, S. L. (1996). Everyday Cognitive Competence in Elderly Persons: Conceptual
Issues and Empirical Findings. The Gerontologist. 36, 59
[Wyche et al., 2006] Wyche, S., Sengers, P. & Grinter, R.E. (2006). Historical Analysis: Using the
Past to Design the Future. Ubicomp 2006, LNCS 4206 , pp. 35 – 51, Springer-Verlag Berlin
Heidelberg 2006
[Zajicek, 2006] Zajicek, M. (2006). Aspects of HCI research for elderly people, Universal Access in
the Information Society, Volume 5, Number 3, 279 – 286
Using Scaffolded Learning for Developing Higher Order
Thinking Skills
Cristina Hava Muntean and John Lally
School of Informatics, National College of Ireland, Mayor Street, Dublin 1, Ireland
[email protected], [email protected]
Abstract
This paper presents a research study that investigates whether a scaffolded learning structure such
as a WebQuest can be used to effectively develop higher order thinking. The results from this
study proved that through the use of scaffolded support and collaboration, teachers can effectively direct students’ learning and help them to gain higher order thinking skills, moving beyond simple
rote learning and towards the higher levels of Bloom’s Taxonomy.
Keywords: scaffolded learning, WebQuest, higher order thinking, educational content delivery
1 Introduction
Military forces develop skills based on the principles of drill and practice. From a civilian perspective,
this type of rote learning can also be seen in education where students are taught the skills to
successfully pass examinations and not necessarily the skills required to develop a deeper
understanding of a subject. Therefore, the majority of students learn to become capable of effectively solving problems which relate to individual areas in a logical, sequential manner. However, problems are rarely so simple and straightforward. There is a strong need to develop educational methods which
encourage students to move beyond rote learning and develop higher order thinking skills such that
principles learnt in the traditional manner can be concurrently applied to multiple areas and to solving
non-linear problems. This paper will review scaffolded learning and will investigate the effectiveness
of WebQuest as a learning support tool to develop students’ level of knowledge and their thinking
skills beyond basic rote learning towards the higher levels of Bloom’s taxonomy such as Analysis,
Synthesis and Evaluation.
1.1 Bloom’s Taxonomy and Higher Order Thinking
Rote learning is a learning technique that focuses on memorizing the material, or learning “off-by-heart”, often without an understanding of the reasoning or relationships involved in the material that is learned, followed by simply remembering or recalling the facts. Higher-order thinking involves
engaging students at the highest levels of thinking and allowing them to become creators of new ideas,
analysers of information and generators of knowledge. If we wish to achieve higher order thinking we
need to do something with the information available. We need to categorise the information and
connect it to pre-existing information already stored in the memory as a model, enhancing it. Using
this internal model, we can now attempt to develop new solutions to existing real-world problems.
Higher-order thinking is represented in Bloom's Revised Taxonomy (led by Lorin Anderson and published in 2001) by the top three levels: Analysing, Evaluating and Creating. Bloom’s original taxonomy (proposed in 1956)
classifies educational goals and objectives and provides a way to organise thinking skills in six levels
from the most basic to the higher order levels of thinking (Figure 1). Each subsequent level is built on
the skills developed during the previous stage.
Figure 1. Bloom’s Revised Taxonomy thinking levels: Remembering (recognising, describing, naming); Understanding (explaining, interpreting, summarising, classifying); Applying (implementing, using, executing); Analysing (comparing, organising, interrogating, finding); Evaluating (checking, critiquing, experimenting, judging); Creating (designing, planning, constructing, inventing).
Nowadays we live in a digital world where access to a large quantity of information is only a click
away. The key issue is how to sift through this volume of information and find the answers required. Due to the potential for information overload, students are required to possess higher order
thinking skills such as analysis, evaluation and creation. By designing education courses which require
the development and use of higher order thinking skills, we can provide learners with opportunities to
critically assess and transform their experiences into authentic learning experiences [1].
1.2 Scaffolded Learning
In education, scaffolding is a structure which supports learning and problem solving. Scaffolding can include helpful instructor comments, self-assessment quizzes, practice problems, collections of related
resources, a help desk, etc. The original term “scaffolding” was developed by Wood et al. in their
1976 study [2] and is described as a metaphor for an instructional technique where the teacher
provides assistance for the student to reach a goal or complete a task which they could not complete
independently. The key element of the scaffolded support is that the student is only assisted to
complete the tasks which are currently beyond their capabilities.
One problem with scaffolding is finding the right balance of support required. Lipscomb et al. [3] suggest that requiring students to complete tasks too far out of their reach can lead to frustration, while tasks which are too easy can lead to the same frustration. It is therefore important that teachers
understand the current level of knowledge of the students so that their interests can be “hooked” or
connected to the new information being presented and made relevant to the students so that the
motivation to learn is increased [3].
A key element in the development of scaffolded learning is structure; without a clear structure and precisely stated expectations for the exercise, many students are vulnerable to distraction and disorientation and effectively become lost in the volume of available information. Based on his study, McKenzie [4] suggested the following eight guidelines for educational scaffolding:
• Provide clear directions – The goal is to develop a set of user-friendly instructions which will
minimise confusion and help move students towards the learning outcome.
• Clarify the purpose of the scaffolded lesson –Students are told early in the lesson why the
studied issues are important and given the bigger picture so that they may see the connections
in their own lives. This enables them to view the lesson as a worthwhile study and one where
they should apply their talents.
• Keep students on task –Students are provided with “the guard rail of a mountain highway” [4].
This enables the students and the teacher to ensure that although the students may be
researching for information under their own direction they do not stray too far off the predefined path and do not waste valuable lesson time.
• Offer assessments to clarify expectations – From the beginning students are made aware of the requirements and standard expected by the teacher at the end of the assignment. This guide helps students to aim at a target of quality and to understand the important areas of the study.
• Point students to worthy sources – The Internet has proven itself as a valuable source of information for both formal and informal research. Information overload can be greatly reduced or eliminated by providing relevant data sources for the students.
• Reduce uncertainty, surprise and disappointment – The ultimate goal of the teacher is to maximise the learning and efficiency of the lesson. Therefore the various elements of the lesson should be tested for problems and alternative solutions should be considered. A review of the success of the lesson should also help to refine the lessons for future students.
• Deliver efficiency – If done successfully a scaffolded lesson should “distil” the work effort required for both student and teacher, showing obvious signs of efficiency.
• Create momentum – The momentum is used by the students to find out more about the subject and therefore increase their understanding of the topic being researched.
2 What exactly is WebQuest?
Traditional teaching methods have relied on the principle of the transmission of knowledge through
word of mouth. With the explosion of information available on the Internet many see it as an online
library that requires teachers to think more creatively on how they may employ these information
sources while also providing engaging material for their learners through the use of guided activities,
self discovery and reflection as both an individual and in collaboration with other students [5, 6].
However, the study by Reynolds et al. [7] found that simple exposure to Internet resources is not enough
to significantly improve student learning. Surfing the web can lead to the loss of precious time and can
also, if not monitored, lead to access to inappropriate material. A WebQuest offers a structured format
which enables students to gather information and construct new knowledge and learning.
WebQuests were first developed by Bernie Dodge and Tom March at the San Diego State University
in 1995 and are defined by Dodge [8] as “an inquiry-oriented activity in which most or all of the
information used by learners is drawn from the web. WebQuests are designed to use learners’ time
well, to focus on using information rather than looking for it, and to support learners’ thinking at the
levels of analysis, synthesis and evaluation”. This structured approach to using the Internet as a
learning resource helps to focus those involved on suitable areas of the web; “otherwise, the World
Wide Web becomes similar to having 500 TV channels” [9]. Since the original development and
definition of a WebQuest, Dodge and March have developed and refined the original framework. The
following definition can be seen as a more concrete definition of a WebQuest: “scaffolded learning
structure that uses links to essential resources on the World Wide Web and an authentic task to
motivate students’ investigation of a central, open-ended question, development of individual
expertise and participation in a final group process that attempts to transform newly acquired
information into a more sophisticated understanding.”
Two types of WebQuests were proposed by Dodge according to their duration. A short-term
WebQuest has the instructional goal of knowledge acquisition and integration where a learner can be
made aware of a significant amount of information and make sense of it, similar to the lower levels of Bloom’s Taxonomy. This type of WebQuest would typically last from one to three class sessions. A
long-term WebQuest has the instructional goal of extending and refining knowledge by requiring the
learner to demonstrate the higher levels of Bloom’s Taxonomy by analysing the information and using
this deep understanding to create something which others can respond to. This type of WebQuest
would typically last from one week to one month in a classroom setting.
In conclusion, the main purpose of the WebQuest model is to harness the advantages of the resources
available on the Internet while also focusing students on completing the task. In order to achieve this
efficiency and clarity of purpose the following six sections are critical attributes of a WebQuest and
are required for both short term and long term WebQuests:
• Introduction – This section provides an overview of the learning objectives and attempts to
motivate the students to begin the WebQuest.
• Task – The task is a clear formal description of what the students are required to accomplish
by the end of the exercise.
• Process – Explicit details of the various steps required to be accomplished in order to achieve
the stated task are given.
• Resources – Sources of information which the teacher has deemed appropriate and relevant
are given to the students.
• Evaluation – The evaluation tool used is a rubric that presents a defined set of criteria against which submissions can be clearly and consistently measured.
• Conclusion – At this stage students are given an opportunity to reflect on the exercise.
In addition to the critical attributes of a WebQuest there are three additional non-critical attributes
which may be also included if required.
• Group Activities – Students can share their knowledge and experience helping each other,
while also reinforcing their own understanding.
• Role Playing – In order to increase the motivation of the students the learners are encouraged
to adopt a role to play during the exercise.
• Single Discipline or Interdisciplinary – Students can try real-world problems and solutions
while gaining an understanding of how their choices and decisions can affect other areas.
3 Preliminary Research Findings
3.1 Study background
In this study the development of a scaffolded learning strategy using WebQuest is investigated to
determine its level of success when trying to encourage students to develop higher order thinking
skills. During current military career courses the students are required to conduct individual study and
presentations on a particular topic of interest (referred to as a “test talk”) which is closely aligned to
the course objectives. A typical test talk in this area would require the students to review specific areas
of a major battle, Operation Market-Garden, the battle for Arnhem for example, and discuss how the
logistics and resource management of this battle were conducted and more importantly what lessons
can be learnt and applied to today’s military operations.
A solution to this type of study method is the development of a group WebQuest, where students are required to collaborate in small groups on the development of a final product and where the outputs of the various groups are designed to combine into a larger body of research. The design
of the WebQuest could follow along the lines of the chapters of a book, where each group of students
are required to develop a specific section. This initial research would fulfill the role of the initial
“background for all” stage of a WebQuest for each group. When this has been finished the students
would be required to answer an open ended question relating to this section and to develop a solution
which requires them to transform the research developed into a more sophisticated understanding of the
topic being learnt. When each group has completed their work it can be compiled to form a larger
piece of research which would be stored or published and used by future students as research material.
Therefore students are required to apply critical thinking skills to develop their final solution.
3.2 Research Procedure
The sample used for this study consisted of a group of approximately 9 students who were undergoing
an Officer promotion course. At this level of the military, students are expected to use their initiative
and be constructive in their problem solving abilities. A WebQuest would offer a good foundation in
the development of these necessary skills. These students came from a number of different units and
military trades, with various levels of prior training and educational backgrounds. Although this group is a convenience sample, the various levels of training and prior education helped to ensure a varied selection of test subjects and skill levels and gave a reasonable representation of the overall population.
The measurement and analysis for this study were carried out through a number of means, both quantitative (using an assessment rubric) and qualitative (through the comparison of a pre-study and post-study survey and informal interviews). The data was analysed at each stage to identify any trends or issues; for example,
the individual WebQuest results were compared against the group WebQuest results to determine if
any noticeable improvement was apparent.
3.2.1 Stage 1: Pre-Study Survey
This first survey was designed to capture general information from the students such as age, gender,
computer experience, etc. The students were also questioned on their current preference to working on
assignments and for their preference with regard to the use of technology. We tried to assess if
students had a preference for the traditional classroom delivery, an online delivery preference or a
blended preference of e-learning delivery supplemented by face-to-face training. The results from this
survey were compared to those of the final survey (see Stage 4) to determine whether a change in student opinions had developed through their participation in this study.
3.2.2 Stage 2: Individual WebQuest
After students had completed their pre-study survey the individual WebQuest could begin. This
WebQuest was designed to capture the attention and interest of the students by providing them with an
authentic, open-ended task; the role-play technique was used as a means of motivating the students while completing the WebQuest assignments.
It was decided to focus on the current Iraq conflict as the main topic for this WebQuest. The students
were required to take on the role of an Officer serving in Iraq. In order to provide the necessary
background for the students they were exposed to a number of documentaries and discussions on this
topic that later put them in a better position to adopt the role of a serving soldier. The students were
given the task to research the resources presented to them as part of the WebQuest, to gather the basic
information of the battle and to highlight the valuable lessons learnt from this conflict. As part of the scaffolding structure, and in order to help the students to complete this task, they were required to prepare a presentation on the findings of their research.
The submitted assignments were marked against a rubric designed to facilitate both the individual and
group WebQuests. Rubrics are used to assess the submitted assignments because they help to make the
expectations of the teacher clearer and also offer the students targets to achieve [10]. The rubric was
generated following the suggested three stage format of Dodge [11].
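Purely as an illustration of how such a rubric might be aggregated into a single mark (the actual criteria and weightings used in the study are not reproduced here), the following sketch computes a weighted overall score; the criteria, weights and marks below are invented for the example.

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of aggregating rubric marks for a WebQuest
 * submission. The criteria and weights are invented and are not the rubric
 * actually used in the study.
 */
public class RubricScore {
    public static void main(String[] args) {
        // criterion -> { weight (fraction of total), mark awarded out of 4 }
        Map<String, double[]> marks = new LinkedHashMap<>();
        marks.put("Research quality",             new double[]{0.4, 3});
        marks.put("Application of lessons learnt", new double[]{0.4, 2});
        marks.put("Presentation",                 new double[]{0.2, 4});

        double total = 0;
        for (Map.Entry<String, double[]> e : marks.entrySet()) {
            double weight = e.getValue()[0];
            double mark = e.getValue()[1];
            total += weight * (mark / 4.0);        // normalise each criterion to [0,1]
            System.out.printf("%-28s weight %.1f  mark %.0f/4%n",
                    e.getKey(), weight, mark);
        }
        System.out.printf("Overall score: %.0f%%%n", total * 100);  // e.g. 70%
    }
}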
3.2.3 Stage 3: Group WebQuest
At the completion of the individual WebQuest the students were randomly assigned to three groups.
Students were asked to communicate with the other members of their group using the Moodle Learning Content Management System (LCMS). Since the students used for this study came from a
number of different locations throughout the country and in order to facilitate group work, Moodle
was considered an appropriate tool because of its available features such as discussion forums, chat
rooms, private messaging and a wiki. Building on the individual WebQuest and the
background developed on the training day the group WebQuest was again focused on the Iraq war.
The students were required to adopt the role of a person who has been assigned to a working board
tasked with developing a report investigating the problems faced by reserve members and formulating preventative measures. As in the individual WebQuest, the submitted assignments were marked against a rubric. The students used the wiki as the delivery method for the assignment.
For the analysis of the group collaboration, it was envisioned that the initial responses on the discussion boards would be at a low level of critical thinking. However, as the study developed, responses were expected to show increasing signs of critical thinking, moving from “I agree” type posts to posts applying learned content and then to posts which show obvious synthesis of learned content. The analysis of the study attempted to highlight this progression of critical thinking on the discussion boards.
3.2.4 Stage 4: Post-Study Survey and Interview
A post-study survey was presented to the students after the successful completion of the group
WebQuest. The survey was based on the pre-study test and included the general areas of the use of technology for learning, for military training and for the development of higher order thinking skills in the Defense Forces. The results from both surveys were brought together and analysed in an attempt to determine how the students’ initial understanding of the subject and their impressions of the value and use of technology had changed during the course of the study.
3.3 Pre and Post Study Survey Results
This paper presents in detail only the results of the two surveys. Other papers will address the individual and group study outcomes when using the WebQuest. As mentioned before, one survey evaluated the students’ current thinking in relation to study, the use of IT and critical thinking, before
the experimental study. The second survey was given to the students after the study and it was
designed to assess the changes which had occurred to the students during the WebQuest-based course
of the study. Each survey consisted of twenty-two core questions plus questions that were developed
to gather further information useful for the design and development of additional courses.
There was originally a strong trend for individual learning (55%), but this changed significantly by the
end of the study. Only 22% still maintained this original point of view and 77% of students were now
comfortable in group learning, up from the original 44%. Students also commented that they found the
group study easier because they were able to discuss ideas and they felt less pressure knowing that if
they were unable to finish a task someone else in the group would be able to compensate.
Figure 2. Students’ opinions on the usage of technology during a study. (Survey question: “Which of the following best describes your preference with regard to the use of technology?”; response options ranged from “I prefer taking classes that use no information technology”, through limited, moderate and extensive use of technology, to “I prefer taking classes that are delivered entirely online”; pre-study and post-study responses shown as percentages of students.)
The question (Figure 2) that assessed the students’ preference regarding the use of technology for educational content delivery showed a strong and positive reaction of the students towards receiving electronic lectures instead of traditional delivery methods, i.e. an instructor in a classroom.
The survey also aimed at determining how students’ study habits were affected by the use of technology. The majority of e-learning courses developed for the Irish Defense Forces are CD-based materials, which are a huge improvement over the issue of paper-based manuals. The problem with CD-based training materials is the slow development time of the material and the lack of resources/funding. Using the Internet in a similar manner to this study reduces some of the burden on the development team, in that the Internet can be used to disseminate the material to those who require it. In addition, the material is always available wherever there is an Internet connection and can be easily updated and made available very quickly.
Figure 3 a and b. The influence of the technology-based study on the students’ study habits. (Panel titles: “I spend more time engaged in course activities in those courses that require me to use technology” and “The instructor’s use of technology in my classes can increase my interest in the subject matter”; responses on a five-point agreement scale from “Strongly disagree” to “Strongly agree”, plus “Not applicable”, shown as percentages of students for the pre-study and post-study surveys.)
Figure 4. Teaching techniques preferred by the students. (Survey question: “How would you most like to receive training for future courses?”; options: printed notes, online, initially online but followed with a traditional class to reinforce the material, in a traditional class, other; pre-study and post-study responses shown as percentages of students.)
It was found from the pre-study survey that for the vast majority of courses run in the Irish Defense
Forces there is very little (if any) use of technology other than PowerPoint. However, there is a
tendency to develop e-learning material that is more like e-reading than e-learning. This lack of
interactivity is unfortunately forced upon the developers of these e-learning packages because of
delivery constraints placed upon them. It was satisfying to see from the results in the post-study
(Figure 3) that when students were given the opportunity to interact with the material presented, with their colleagues or with the instructor, they availed of this opportunity. This interactivity was
possible due to the use of Internet-based technology that enables forums, chat rooms or private
messaging. It can be seen from the responses given to the questions presented in Figure 3 (a and b)
that in general students were in favour of technology being used in the classroom and 77% had formed
the opinion that their interests could be increased through the proper use of technology.
As already mentioned, all students were required to use the Moodle LCMS, which served as the main collaboration tool for the study. After the study had finished and all students had experienced the advantages and disadvantages of using an LCMS, the results from both pre and post surveys were compared. It could be clearly seen that there was a vast improvement in the opinions of the students towards the use of a course management system (44% in the pre-survey, 88% in the post-survey).
The surveys also included a question (Figure 4) that assessed the students’ own preferred method for the receipt of training material. There can sometimes be a tendency to push technology towards the students without actually consulting them on their preference. The results shown here offer no real surprises. In the pre-study survey the students were mostly inclined towards traditional classroom delivery, with 44% preferring this method, 22% willing to take a blend of online and traditional learning, 22% favouring a completely online delivery and finally 11% preferring a printed version of future courses. These kinds of results would normally be expected from students who have not used an LCMS or have had a bad experience of e-learning in the past.
After the study was completed and this question was asked again, the results were in favour of an e-learning solution, but now 55% were in favour of a blend of the new and the old, using the technology available on the Internet to introduce the students to the material. As in the pre-study survey, there was an 11% preference for the print-based option of delivery.
4 Discussions and Conclusion
This research has developed a WebQuest-based scaffolded learning strategy that encourages students to move beyond rote learning and develop higher order thinking skills such that principles learnt in the traditional manner can be concurrently applied to multiple areas and to solving non-linear problems. The results analysis indicated that although students had initially preferred to study and work on projects individually, by the end of the study the vast majority of the students felt that the ability to discuss problems within a group was beneficial when problem solving. Students were also of the opinion that the use of the Internet as the main delivery tool offered them a much greater level of control over their own learning. However, they questioned the ability to communicate effectively via the Internet through the use of discussion forums and chat. For the initial stage of the project, the discussion forums provided on the site were considered useful. However, as the discussion progressed the students needed real-time discussions to develop a deeper understanding of the subject. Since the Moodle LCMS does not currently permit the use of voice communications, a suitable solution using a third-party package called “iVocalize Web Conference” was used.
References
[1] O’Murchu, D. and Muirhead, B. (2005). Insights into promoting critical thinking in online classes. International Journal of Instructional Technology and Distance Learning, 2(6):3-14.
[2] Wood, D., Bruner, J., and Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17:89-100.
[3] Lipscomb, L., Swanson, J., and West, A. (2004). Scaffolding. In M. Orey (Ed.), Emerging perspectives on learning, teaching, and technology.
[4] McKenzie, J. (1999). Scaffolding for success. The Educational Technology Journal, 9(4).
[5] Oliver, R. and Omari, A. (2001). Exploring student responses to collaborating and learning in a Web-based environment. Journal of Computer Assisted Learning, 17(1):34-47.
[6] Leahy, M. and Twomey, D. (2005). Using web design with pre-service teachers as a means of creating a collaborative learning environment. Educational Media International, 42(2):143-151.
[7] Reynolds, D., Treharne, D., and Tripp, H. (2003). ICT - the hopes and the reality. British Journal of Educational Technology, 34(2):151-167.
[8] Dodge, B. (1995). Some thoughts about WebQuests. The Distance Educator Journal, 1(3):12-15.
[9] Erthal, M.J. (2002). Developing a WebQuest. Book of Readings, Delta Pi Epsilon National Conference, Cleveland, OH, USA.
[10] Whittaker, C., Salend, S. and Duhaney, D. (2001). Creating instructional rubrics for inclusive classrooms. Teaching Exceptional Children Journal, 34(2):8-13.
[11] Dodge, B. (2001). Creating a rubric for a given task. http://webquest.sdsu.edu/rubrics/rubrics.html
Electronic Monitoring of Nutritional Components for a
Healthy Diet
Zbigniew Frątczak1,2, Gabriel-Miro Muntean 2, Kevin Collins 2
1 International Faculty of Engineering, Technical University of Lodz, Skorupki 10/12, Łódź, 90-924, Poland, [email protected]
2 Performance Engineering Laboratory, School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland, {munteang, collinsk}@eeng.dcu.ie
Abstract
Obesity and other diseases related to unhealthy diet are problems of near epidemic proportion and are becoming a growing issue every year. This paper presents a solution to this issue by proposing the use of a computer application that is able to suggest the appropriate products for one’s diet and to keep track of nutritional intake. The paper also describes the principle of the solution, the system architecture and implementation, and presents testing results. If the application’s instructions are followed by users, it is expected that an optimal diet will be achieved, resulting in users’ good health.
Keywords: Healthy diet, e-health, utility function, nutrition control
1 Introduction
Some of the most serious social issues of our time are obesity and dietary problems. Approximately
39% of Irish adults are overweight and 18% are obese [1]. Approximately two thousand premature
deaths are attributed to obesity annually, at an estimated cost of €4bn to the Irish State, expressed in
economic terms [1]. People are not conscious of the gravity of these issues and consequently the
situation is worsening. In order to combat this growing problem it is necessary to bring it to the
attention of society. One way to achieve this is an application that enables people to monitor and
control nutritional value in a fast and simple way while shopping.
The aim of this research is to propose a computer-based solution which will assist users in controlling the nutritional values of the food products they buy. The application will include several diet plans suitable for potential users, from simple ones which focus on the energy values of the products (expressed in calories) to more complex ones which also consider other nutritional components such
as proteins, carbohydrates, sugars and fats. By using a utility function, the proposed solution will
select a set of products from a range of products considered by the user for purchasing based on their
nutritional values and the user’s selected diet plan.
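The form of the utility function is not given in this paper, so the sketch below shows only one plausible formulation: a candidate basket of products is scored by how far its summed nutritional values deviate from the targets of the selected diet plan, with per-nutrient weights. All class names, fields and weights are assumptions made for illustration.

import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a product-selection utility function. A candidate
 * set of products is scored by the weighted relative deviation of its summed
 * nutritional values from the diet plan's targets (larger scores mean a
 * better fit). The form of the function is an assumption for this example.
 */
public class DietUtility {

    public static class Product {
        final String name;
        final Map<String, Double> nutrients;   // e.g. "calories", "fat", "sugar"
        public Product(String name, Map<String, Double> nutrients) {
            this.name = name;
            this.nutrients = nutrients;
        }
    }

    /**
     * @param basket  products the user is considering for purchase
     * @param targets per-nutrient targets from the selected diet plan
     * @param weights per-nutrient importance weights from the diet plan
     * @return a utility score; zero when every target is met exactly
     */
    public static double utility(List<Product> basket,
                                 Map<String, Double> targets,
                                 Map<String, Double> weights) {
        double penalty = 0.0;
        for (Map.Entry<String, Double> t : targets.entrySet()) {
            String nutrient = t.getKey();
            double total = basket.stream()
                    .mapToDouble(p -> p.nutrients.getOrDefault(nutrient, 0.0))
                    .sum();
            double relativeDeviation = Math.abs(total - t.getValue()) / t.getValue();
            penalty += weights.getOrDefault(nutrient, 1.0) * relativeDeviation;
        }
        return -penalty;
    }
}

The solution would then compare this score across the candidate product sets and recommend the one with the highest utility for the user's selected diet plan.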
An important goal was also to build as usable and portable an application as possible. In order to achieve this, the application was developed to be used not only on a laptop or desktop PC but also on smart phones, PDAs and gaming consoles. Consequently, a web-browser-accessible application was designed, implemented and tested. It uses a server-located database to minimize memory consumption on the client devices and give higher flexibility. With this approach users may work with
information held in an in-shop database, which is customized for each individual shop to reflect the
products available there.
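Because the nutritional data reside in the server-located database, a client request translates into a simple query against it. The following plain JDBC sketch looks up a product by barcode in an assumed "products" table; the database name, table layout, credentials and barcode value are assumptions made for this example only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Hypothetical JDBC sketch of looking up a product's nutritional values in
 * the in-shop MySQL database by barcode. The database name, table layout and
 * credentials are assumptions for illustration only.
 */
public class ProductLookup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/inshop";   // assumed in-shop database
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT name, calories, fat, sugar FROM products WHERE barcode = ?")) {
            ps.setString(1, "5391234567890");                // barcode from the scanner
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.printf("%s: %.0f kcal, %.1f g fat, %.1f g sugar%n",
                            rs.getString("name"), rs.getDouble("calories"),
                            rs.getDouble("fat"), rs.getDouble("sugar"));
                }
            }
        }
    }
}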
This paper is structured as follows: Section 2 summarizes related work; Section 3 describes the design of the proposed solution as well as the algorithm; Section 4 presents the testing process and related results. The paper finishes with Section 5, which focuses on conclusions and future work.
2 Related Work
The diet monitoring problem is not a new one, as software for computing calories or keeping nutrition
diaries has been developed since the 1980s. There have been many such applications, such as ”The diet balancer” [2]
or “MacDine II” [3], but they differ in approach and target audience.
In 1999 a diet calculation software called FUEL Nutrition Software was released [4]. This application
was capable of calculating the nutrition values for professional athletes. FUEL allows the access to
applied sport nutrition information on topics such as nutrition during regular training, food appropriate
for pre- and post-exercise meals, eating for recovery, hydration, eating strategies during trips or in
foreign countries and vitamin and mineral supplements. The program is suitable only for fit and
healthy individuals. Anyone with special health conditions such as diabetes, osteoporosis, etc. will
require individualized professional advice [4]. The program itself offered many interesting solutions
but was targeted at professionals and was developed for stationary computers.
Another electronic system is eHDNAS – the electronic Healthy Diet and Nutrition Assessment System
[6]. This recently developed software was created to fight malnutrition and other nutrition-related
diseases over a sustained period of time. Its aim was to inform people about the nutrients of certain
foods in restaurants and it is mainly based on the food pyramid described in [5]. The system
specifically targeted elderly people. This is a major limitation, as such applications should take into
account people of all ages. Another drawback of the system is that it operates at full-meal level rather
than at product level, which makes it very inflexible with regard to individuals' eating habits.
The report on “The Food We Eat” [7] found that it is more user friendly to work with barcode
scanners than with voice recording when using an electronic self-monitoring application. This observation
influenced the decision to use barcode scanning for this application. Those results were gathered from
tests carried out on a group of participants with an average age of 52 using the DietMatePro [8]
and BalanceLog [9] applications.
DietMatePro is a commercial web application designed specifically for PDAs that uses the
expandable USDA-based nutrient database [10] and supplemental databases for restaurant and brand-name
foods. It addresses the needs of researchers and dietitians. While it is a very powerful dietary tool, a
major drawback of this application is that it was developed for scientific purposes, and as such lacks
the simplicity needed for more general use.
3 Design and Solution
3.1 Architectural Design
The main aim when designing this application was to create user-friendly, portable software for
calculating nutrition values. It should also be flexible and customizable. To fulfill these
requirements it must offer different diet plans and must enable the creation of user-specific diets. Many
previous attempts to solve the diet monitoring problem resulted in diet diaries or calorie counters.
While it was desirable to include calorie counter functionality, the intention was also to go a step further
and create a diet validator: i.e. given a set diet plan to which an individual is to adhere, the
application can verify whether a user's food shop falls within the nutritional parameters of this plan. Another
design prerequisite was to enable the user to run the application not only on a PC, but also on mobile
devices such as smart phones, PDAs and portable game consoles. A MySQL database, Java Server Pages
and the Tomcat web server were used in order to achieve these goals.
Information about the products and diet plans is stored in the database. There is a server side
administration interface enabling the modification of the database to reflect the products available.
In order to provide a degree of flexibility to users, the solution was deployed into a web application
which can be accessed using any web browser. This makes the application accessible for any owner of
a networked mobile device.
The application was placed in a Tomcat web container which enables multithreading, allowing
multiple users to access the application simultaneously.
Figure 1 illustrates the proposed system architecture. The user connects to an in-store Wi-Fi network
and then, by means of a web browser on their mobile device, communicates with the Tomcat web
server that hosts the web application, which in turn communicates with the database in order to
retrieve the data. It is believed that the best solution is to have a separate database in every shop,
so a user entering the shop would use that shop's database, which contains only the products
available there. Alternatively, a shop ID can be used to select the products within a particular shop
from a larger centrally-located database.
Figure 1 Architectural Design
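As a rough illustration of the alternative shop-ID approach (the paper does not publish its database schema, so the table and column names below are assumptions made for this sketch), the lookup against a centrally-located MySQL database could be performed with standard JDBC as follows:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup of one shop's products in a centrally-located MySQL database.
// Table and column names (product, name, shop_id) are assumptions for illustration only.
class ShopCatalog {
    static List<String> productsForShop(Connection conn, int shopId) throws SQLException {
        List<String> names = new ArrayList<>();
        String sql = "SELECT name FROM product WHERE shop_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, shopId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}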
3.2 Algorithm Description
A novel algorithm is used for verifying the compliance of products with users’ diet plans. The
algorithm is based on a modified Knapsack problem, which takes into consideration all nutritional
values: energy, as well as carbohydrates (including sugars), proteins and fats. The algorithm’s goal is
to optimize the selection of products in order to maximize their utility to users, according to their
diet plan.
A novel utility function was introduced to describe the usefulness of the product to users. This utility
considers grades computed for each nutrition component, weighted according to the importance of that
particular component for a user's diet plan.
Equation (1) below presents the function for calculating the utility of a product i.
Utility_i = [ w1 * Gi_proteins + w2 * (Gi_carbohydrates − Gi_sugars) + w3 * Gi_fats ] / (w1 + w2 + w3)    (1)
In equation (1) w1, w2 and w3 are weights which depend on the diet type and express the importance of
the nutrients in any specific diet plan. Gi_proteins, Gi_carbohydrates, Gi_sugars and Gi_fats represent the grades computed
based on the quantity of a particular nutrient and are expressed in the [0:1] interval.
Equation (2) presents the formula for calculating individual grades.
Gi_nutrient = Qi_nutrient / (Qi_proteins + Qi_carbohydrates + Qi_fats)    (2)
In equation (2), Qi_proteins, Qi_carbohydrates, Qi_sugars and Qi_fats represent the quantities in which each individual
nutrient component is present in the product i. The nutrient component grade Gi_nutrient describes the
ratio of a certain nutrient to all nutrients within the given food item.
The equation for calculating the utility parameter of the product was based on the healthy diet pyramid
as presented in Figure 2. It states that the healthiest products are those which contain the smallest
possible amount of fats and sugars. This equation gives the highest values to products containing the
most protein and carbohydrates (excluding sugars) and the lowest to those with high levels of fats and
sugars.
Figure 2 Healthy diet pyramid
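For clarity, the following is a minimal Java sketch of how the grade and utility calculations of equations (1) and (2) could be implemented; the class and method names are illustrative assumptions, not the application's actual code.

// Minimal sketch of the grade (equation 2) and utility (equation 1) calculations.
// Class and method names are illustrative assumptions only.
public class UtilityCalculator {

    /** Grade of one nutrient: its quantity relative to all nutrients in the product (equation 2). */
    static double grade(double nutrientQty, double proteins, double carbohydrates, double fats) {
        double total = proteins + carbohydrates + fats;
        return total > 0 ? nutrientQty / total : 0.0;   // grades lie in the [0:1] interval
    }

    /** Utility of a product for a diet plan defined by the weights w1, w2, w3 (equation 1). */
    static double utility(double proteins, double carbohydrates, double sugars, double fats,
                          double w1, double w2, double w3) {
        double gProteins = grade(proteins, proteins, carbohydrates, fats);
        double gCarbs    = grade(carbohydrates, proteins, carbohydrates, fats);
        double gSugars   = grade(sugars, proteins, carbohydrates, fats);
        double gFats     = grade(fats, proteins, carbohydrates, fats);
        return (w1 * gProteins + w2 * (gCarbs - gSugars) + w3 * gFats) / (w1 + w2 + w3);
    }
}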
Having calculated the utility of every product, the benefit of the product in terms of value to a
particular user is computed as the ratio between the utility and the calorie amount suggested to the user
by their diet plan. Next, all the products are sorted in descending order based on their value to the user.
The products whose energy values exceed the user's calorie limit, which is computed based on
their physical parameters (weight, height, age, gender), are discarded and are not shown.
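As an illustration only, the ranking step just described might look as follows in Java; the Product record and its fields are assumed names, since the paper does not publish its source code.

import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the ranking step: products whose energy exceeds the user's
// calorie limit are discarded and the remainder are sorted by decreasing benefit
// (utility divided by the calorie amount suggested by the user's diet plan).
record Product(String name, double utility, double energyKcal) {}

class ProductRanker {
    static List<Product> rank(List<Product> cart, double calorieLimit) {
        return cart.stream()
                .filter(p -> p.energyKcal() <= calorieLimit)   // discard over-limit products
                .sorted(Comparator.comparingDouble((Product p) -> p.utility() / calorieLimit)
                        .reversed())                            // descending order of benefit
                .toList();
    }
}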
The Knapsack problem uses as its limit a daily energy requirement (DER) expressed as a calorie
amount, but this number is different for every user, as people are characterized by different physical
parameters. To calculate the amount of calories to be “spent” by each user, the Mifflin formula was
used [11]. This equation expresses Resting Daily Energy Expenditure (RDEE) and uses parameters
such as weight, height, age and gender.
These formulae were used as they give a very high accuracy (over 80%) [12]. They are presented in
equation (3):
RDEE_male = 5 + (10 * weight) + (6.25 * height) − (5 * age)
RDEE_female = (10 * weight) + (6.25 * height) − (5 * age) − 161    (3)
In order to calculate users’ DER, a formula that factors in the so-called activity factor was used. This
is essentially a number based on the level of physical activity the users have interactively selected. The
users can choose between: sedentary, lightly active, moderately active, and extremely active. This
activity factor is multiplied by the RDEE value and the result expresses the DER, which is used as the limit by
the Knapsack problem.
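A compact Java sketch of this calorie budget calculation is given below; the numeric activity-factor values are placeholder assumptions chosen for illustration, as the paper does not list the exact multipliers it uses.

// Sketch of the RDEE (equation 3) and DER calculations. Weight is in kg, height in cm,
// age in years; the activity-factor multipliers below are illustrative placeholders.
enum Activity {
    SEDENTARY(1.2), LIGHTLY_ACTIVE(1.375), MODERATELY_ACTIVE(1.55), EXTREMELY_ACTIVE(1.9);
    final double factor;
    Activity(double factor) { this.factor = factor; }
}

class EnergyRequirement {
    /** Resting Daily Energy Expenditure according to the Mifflin formula (equation 3). */
    static double rdee(double weightKg, double heightCm, double age, boolean male) {
        double base = 10 * weightKg + 6.25 * heightCm - 5 * age;
        return male ? base + 5 : base - 161;
    }

    /** Daily Energy Requirement: RDEE scaled by the user's selected activity factor. */
    static double der(double weightKg, double heightCm, double age, boolean male, Activity activity) {
        return rdee(weightKg, heightCm, age, male) * activity.factor;
    }
}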
4 Testing
The proposed algorithm was deployed in a system that conforms to the description made in section 3.
The application was tested in a number of different settings which included variable diets and different
user parameters (weight, height, gender, etc.).
The application provides users with a choice between several diet plans and different diet consideration
modes. After running the application in the web browser the user may choose one of the following
options: Weekly Shopping or Diet Check. Weekly Shopping is an option enabling the user to do the
shopping for a specified number of days; it uses the calculated DER as a daily calorie limit. The
application adds up the energy values of each product in the cart and, if the limit is reached, prints a
notification. The other option is Diet Check, where the application uses the algorithm described in
section 3.2; the system creates a list of the products in the user's cart that are most suitable for the diet.
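To illustrate the Weekly Shopping check (class names and message text here are assumptions, not the application's actual output), the daily-limit comparison reduces to something like the following:

import java.util.List;

// Illustrative sketch of the Weekly Shopping check: the energy of the products in the
// cart is summed and compared against the DER multiplied by the number of shopping days.
class WeeklyShoppingCheck {
    static String check(List<Double> productEnergiesKcal, double der, int days) {
        double total = productEnergiesKcal.stream().mapToDouble(Double::doubleValue).sum();
        double limit = der * days;
        return total > limit
                ? String.format("Calorie limit reached: %.0f of %.0f kcal", total, limit)
                : String.format("Within limit: %.0f of %.0f kcal", total, limit);
    }
}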
Table 1: Test 1 - Input products
Table 2: Test 1 - Sorted products
Currently the application offers two specific diet plans: a normal diet plan and a protein diet plan.
The normal plan is suitable for most healthy people and assigns a higher level of importance to products
that are significant for a healthy person. The second diet plan is based on a protein diet, in which
proteins are considered more valuable. This diet plan could be aimed at athletes wishing to build muscle mass.
The first test used the diet check mode and the normal diet type; the utility value of the products was
calculated and is presented in Table 1.
As presented, the above algorithm works successfully on the chosen group of products. Table 1 lists
the products in the order in which they were added to the cart. Table 2 includes the same products
sorted in order of their significance to the user's diet. At the bottom of the table there are products
with high values of carbohydrates and proteins, while at the top are products with sugars and fats. The
results are correct, as the layout of the table corresponds to the healthy diet pyramid presented in
Figure 2.
The second test involved the diet check mode and the protein diet type, where the most valuable
products are those with significant amounts of proteins and the smallest amount of fat. The test
produced the following results, as shown in Table 3 and Table 4.
Table 3: Test 2 - Input products
Table 4: Test 2 - Sorted products
In this test the same products were used, but the results presented in Table 4 correspond to a different
diet type, which places the products rich in proteins at the top of the table.
It can be clearly seen that there is a distinct difference between the arrangement of the products in the
results for the normal and protein diets. While the normal diet selects mainly products full of
carbohydrates, the protein diet gives precedence to products with high protein values. At the same
time it is possible to observe that most of the high-energy products are at the top of the table.
5 Conclusion and Future Work
This paper proposes an intelligent system which assists users while shopping by suggesting
appropriate products related to their diets and by keeping track of their nutritional intake. The system is
capable of verifying the chosen products and includes a calorie counter option. Simple navigation and
the use of a web browser minimize maintenance difficulties. An in-store database with a clear
administration interface enables user-friendly management of in-stock products.
Future extensions may allow the addition of new diet plans which require parameters other than those
used at the moment. The application needs further testing with different diet types and different user
parameters. Verification by medical staff of the correctness of the approach and the exactness of the
results is also envisaged. Medical approval is crucial because it may have a high influence on the future
of the proposed solution. The application may also be extended to make use of a barcode scanner.
Acknowledgement
This paper presents work performed within the ODCSSS (Online Dublin Computer Science Summer
School) 2007. The support provided by the Science Foundation Ireland is gratefully acknowledged.
References:
[1] Obesity the Policy Challenges – the Report of the National Taskforce on Obesity (2005), Department of Health and Children, Ireland. [Online] Accessed: August 2007. Available at: http://www.dohc.ie/publications/pdf/report_taskforce_on_obesity.pdf
[2] Marecic, M., Bagby, R. (1989). The diet balancer, Nutrition Today, 1989; 24-45.
[3] Crisman, M., Crisman, D. (1991). MacDine II – Evaluation, Nutrition Today, 1991.
[4] Durepos, A. L. (1999). FUEL Nutrition Software and User Manual, Canadian Journal of Dietetic Practice and Research, Markham, Summer 1999; 60: 111-113.
[5] Russell, R. M., Rasmussen, H., Lichtenstein, A. H. (1999). Modified Food Guide Pyramid for People over Seventy Years of Age, USDA Human Nutrition Research Center on Aging, Tufts University, Boston, USA, Journal of Nutrition, 1999; 129: 751-753.
[6] Hung, L. H., Zhang, H. W., Lin, Y. J., Chang, I. W., Chen, H. S. (2007). A Study of the Electronic Healthy Diet and Nutrition Assessment System Applied in a Nursing House, 9th International Conference on e-Health Networking, Application and Services; 64-67.
[7] Siek, K. A., Connelly, K. H., Rogers, Y., Rohwer, P., Lambert, D., Welch, J. L. (2006). The Food We Eat: An Evaluation of Food Items Input into an Electronic Food Monitoring Application, Proc. of the First International Conference on Pervasive Computing Technologies for Healthcare (Pervasive Health), Innsbruck, Austria, November 2006.
[8] DietMatePro, PICS (Personal Improvement Computer Systems). Accessed: August 2007. Available at: http://www.dietmatepro.com
[9] BalanceLog, HealtheTech, http://www.healthetech.com. Accessed: August 2007.
[10] USDA Palm OS Search, I. H. Tech, USDA. [Online] Accessed: August 2007. Available at: http://www.nal.usda.gov/fnic/foodcomp/srch/search.htm
[11] Mifflin, M. D. (1990). A new predictive equation for resting energy expenditure in healthy individuals. American Journal of Clinical Nutrition, 1990; 51: 241-247.
[12] What is Normal? Predictive Equations for Resting Energy Expenditure (REE/RMR). [Online] Accessed: August 2007. Available at: http://www.korr.com/products/predictive_eqns.htm#ref_miffin
A Web2.0 & Multimedia solution for digital music
Helen Sheridan & Margaret Lonergan
National College of Art & Design, 100 Thomas Street, Dublin 8, Ireland
[email protected], [email protected]
Abstract
Presented are a number of solutions utilizing multimedia and Web 2.0 for the sale, playing and
promotion of digital music files. Sales of CDs still greatly outperform those of digital music files.
We find out why and present a number of solutions that will enhance users' digital music
experience. Web 2.0 has dramatically changed the way we use, collaborate and interact using the
World Wide Web and this interactivity will play a vital role in the future of digital music.
Keywords: Web 2.0, Digital Music, Multimedia
1 Introduction
The introduction of digital music to the online market place has revolutionised how we buy, sell,
distribute and listen to music. Since EMI released the first ever album to be offered as a digital
download, David Bowie’s ‘Hours’ in 1999, the digital music marketplace has evolved and grown at a
rapid rate [1]. In May 2007 Apple Inc. and EMI began to sell Digital Rights Management free (DRM
free) music files to iTunes customers. Now iTunes digital music purchasers can buy higher quality,
DRM-free music that can be played on multiple devices and shared freely with friends. It is
inevitable that other music companies and sellers of digital music will follow Apple’s lead, allowing
more online sharing and swapping of digital music files. With these legal and ethical issues removed,
it becomes possible to develop a Web 2.0 application that allows users not only to store and buy
digital music but also to share music with friends.
The principal aim of this paper will be to discuss how people are now buying and listening to music,
how technology has changed to meet the demands of the digital user, how the role of multimedia
designers has also changed and ultimately how changes in technology, including Web 2.0, will be
used to promote music in this new environment. We will begin by presenting a number of key
developments in the technology sector that have not only influenced the growth in sales of digital
music, but also the way we buy, listen to and share music files. We then discuss our methodology and
explain in detail the outcomes of our research questionnaire. We present a summary of findings from
this research and conclude with describing our future work based on our findings.
2 New technologies that are affecting how we listen to music
2.1 Visual Radio from Nokia
In recent years new technologies have been launched onto the market that combine visual
graphics/multimedia content with digital music and offer a more enhanced experience for the user.
Visual radio from Nokia streams synchronised and live graphics to users' mobile phones via a GPRS
connection using an “interactive visual channel” that streams visual and interactive content alongside
audio content. Nokia has described visual radio as not just what the listener hears but also what they
see and read. As a result radio has become a more valuable promotional tool as listeners know and see
what they are hearing [2]. At present these graphics are static but the next logical step would be to
introduce enhanced multimedia content.
2.2 Music players for mobile phones
A version of iTunes has been developed for Motorola phones where users can synchronise their desktop
computer's iTunes library with their mobile phone. With this addition to a mobile phone's usability,
online purchasing of music via over the air (OTA) downloads is predicted to rise greatly over the
next 5 years. The International Data Corporation (IDC) is anticipating U.S sales of full-track
downloads to surge to $1.2 billion by 2009. This figure stood at zero in 2004 [3]. If developers
combine the multimedia possibilities of visual radio with the functionality of iTunes the results could
significantly increase the sales of digital music files. Mobile phone users will be able to view moving,
static or interactive graphics on their mobile phone that relate to the music they have just purchased
and downloaded from iTunes or other digital music stores. There are also great possibilities for
marketing messages directly to potential customers using this method of communication [4].
Apple Inc. have also very recently launched the much-hyped iPhone. With over a quarter of a million
units sold in its launch weekend in June ’07, Apple have managed to capture consumers’ consciousness
through clever use of advertising and press releases even before the product had launched. For Apple
the ‘mobile-phone-meets-music-player’ market was an obvious step to take. The iPod, now considered
Apple’s iconic product, has reached the height of its functionality with the addition of larger colour
screens. The additions to the iPod suite of players, including the iPod nano and iPod shuffle, are
basically reductions in size and functionality of the original iPod. The iPod suite encompasses the iPod
family, and major design changes from now on would most likely be in the storage capacities or battery
life of the products, or in combining it with mobile phone technology. The iPhone combines a lot of what
consumers love about Apple’s products with mobile phone technology. It is essentially a PDA, mobile
phone and iPod in one product. Nokia, Motorola and Sony Ericsson have all launched a number of
multipurpose devices since 2005 but none combine the functionality of iPhone. Another factor which
influences the digital music user is the idea of a digital music package which includes portable and
non-portable players. Combinations like iTunes - iPod – iPhone are hard to compete with and with a
large number of both Mac and PC users using iTunes on their desktop computers the progression to
using iPhone is an easy step to take. Ted Schadler of Forrester Research maintains that iPod and
iPhone competitors are failing to utilise the main selling point of Apple’s music playing products. For
youthful digital music purchasers the personal computer still plays a critical role as Forrester’s
research discovered that 27% of online youth said that they can’t live without their PC while only 4%
said that they can’t live without their MP3 player [5].
Another significant advancement in the mobile phone market is OpenMoko, “the World’s first
integrated open source mobile communications platform” [6]. Currently in its alpha stage and not
available for use by the general public, OpenMoko is more a project than a product, with contributions
and participation from the development community. This open source mobile phone will free the end
user from the traditional constraints associated with mobile phone software. Sean Moss-Pultz of First
International Computer (FIC) and the OpenMoko team claims that this open source mobile phone can
and will become the portable computer of the future with the potential to be a platform that can do
anything that a computer with broadband access can [6]. With this level of control over a mobile
phone’s functionality it will be interesting to see what programmers develop for this platform in
relation to digital music.
2.3 Portable games consoles with music playing capabilities
The PlayStation Portable (PSP) introduced its Media Manager software in November 2006, creating one of
the new media players on the market. PSP, in collaboration with Sony, has also developed Locationfree,
a means of accessing your home entertainment system wirelessly from any location. The addition of
media software to games consoles brings the digital music market to a different audience than the iPod
or PC markets, and with multiple enhancements the PSP is fast becoming an all-round portable
multimedia entertainment system. This is significant for the digital music market, as the opportunities
to design and develop sophisticated graphics and multimedia for games consoles are yet to be fully
exploited.
2.4 Agreements between mobile phone companies & music companies
In 2004 Vodafone and Sony Music announced “the world's largest single mobile operator/music
company content distribution agreement’ [7]. This agreement establishes Vodafone and its Vodafone
Live! 3G services as the global leader in bringing enhanced multimedia content to its users worldwide.
This content will initially consist of “real music ringtones, polyphonic ringtones, artist images, video
streaming and short video downloads” [7]. More recently Vodafone has signed a similar deal with
Universal Music Group, bringing their music catalogue to over 600,000 tracks. CEO/Chairman of
Universal Music Group International Lucian Grainge has commented that the scale of this agreement
shows that both industries, the music industry and the mobile phone industry, are committed to
providing a vast range of multimedia content to their customers. Inevitably the music industry has had to
embrace the new digital methods of distribution or face huge losses in revenue [8].
2.5 Media centre systems controlled from one computer via your TV
Dell and Apple have also developed media center systems (Dell media centre PC using Microsoft
Windows XP Media Centre and Apple Front Row) where users can control all of their home
entertainment including music, video, DVD, TV, internet and photo albums from one computer. More
often the computer monitor is being replaced by an LCD or Plasma TV where people can view TV,
DVDs, listen to music and look at their photo albums all from the comfort of their sofa on a 50”
Plasma screen. The computer is becoming the heart of the home entertainment system and people can
now purchase music directly from their TV, via the internet, within minutes and play it using a music
player such as iTunes over their surround sound audio system. Motion graphics and multimedia
content to accompany this music would be an obvious enhancement that has not been fully exploited
yet.
These are just a few of the many new advances in technology that are affecting the way that we listen
to and purchase music digitally. However, to really identify how the digital music industry will change
over the next 5 years we carried out primary research and analysis of people's attitudes towards
listening to and buying digital and non-digital music.
3 Methodology
3.1 Aim of the questionnaire
We began our research by designing a short questionnaire that asked questions about people's buying
and listening habits in relation to music. The results showed a huge bias towards purchasing CDs over
digital music, with 56% preferring to buy CDs. The most popular place to purchase CDs was from
music shops with HMV gaining 24% of the 54% of people that bought CDs from music shops. The
favourite place to listen to music was at home on a CD/Record player. This short questionnaire was
used to develop a more comprehensive second questionnaire that looked at a number of key research
sections. These sections covered both digital and non-digital music. The first section gathered data
relating to people's personal information such as age, gender and nationality. The second area
concentrated on people's attitudes to CDs and covered topics like the buying, listening to and burning of
CDs. The third section concentrated on digital music and also covered topics such as buying habits,
listening habits and technology associated with digital music. Section four covered peer-to-peer (P2P)
downloads, and the questions asked researched technology, fear of prosecution and the convenience of
P2P software. The final section concentrated on over the air (OTA) music purchases on users' mobile
phones and topics covered included frequency of use, network used and model of phone used. The
questionnaire had 62 questions and was distributed face-to-face as a printed hard copy to a sample
of 50 people. Questions were presented using closed dichotomous and multiple-choice questions. The
Likert scale was also used to rate a person's level of agreement or disagreement with a given statement.
We used the following scale: Strongly agree, Agree, Disagree, Strongly disagree and Undecided. We
positioned Undecided as the last option as opposed to positioning it third. This was to avoid the
common mistake of users choosing Undecided, as it is positioned in the middle of options, for large
percentages of answers. Final questionnaires were analysed using SPSS (Statistical Package for the
Social Sciences). This programme allows researchers to input and analyse data and output graphs and
charts that represent this data. Cross tabulations of data can also be carried out with this programme.
Through this research we hope to identify what types of music format people buy and listen to, what
people's attitudes are to CDs and digital music and why CD sales still outnumber digital sales. In the
IFPI Digital Music Report 2007 research showed that digital music sales accounted for 10% of all
music sales in 2006. This means that 90% of sales were from non-digital formats including CDs [9].
4 Results
4.1 Personal Information
The age range was mainly concentrated in the 21 to 25 age bracket with 44% in this range. The next
highest concentrations were in the 31-35 bracket with 24% and 14% in the 26-30 bracket. The
remaining 18% were spread over the remaining age brackets. There were almost equal amounts of
male and female respondents, with 56% being male and 44% being female. It was important to try to
get equal amounts, as we did not want the results to be biased towards any one gender. The main
nationality represented was Irish with 66% in this area. 26% of respondents did not specify what
nationality they were and 6% and 2% were Spanish and African respectively. Unsurprisingly the main
respondents were Irish. The large number of unspecified answers will make it difficult to use this data
during cross tabulations. However we feel that age and gender will be of more concern to us in this
research.
4.2 Research relating to CDs
Using the Likert scale we asked a series of questions about people's attitudes to CDs. The main
question that we wanted to answer was: why are people still buying CDs? Results showed four main
reasons why people buy CDs. Sound quality was a factor with 66% of people feeling that CDs
represented good sound quality. Shopping was another major reason. Fig.1 shows that large numbers
of respondents felt that they liked going shopping for CDs or at least that it did not deter them from
buying CDs. Price, however, was a factor as people felt that CDs did not represent good value for
money with 80% agreeing that CDs are too expensive. Packaging was not a major factor with only
38% of respondents agreeing or strongly agreeing that they liked opening a CD package and
discovering what was inside.
Some other general trends in relation to CDs yielded interesting results. People mostly buy CDs from
music shops such as HMV or Tower rather than buying CDs from websites and getting them mailed to
them. This would support the view that people enjoy the experience of shopping for CDs in a
traditional setting such as a music shop. See Fig.2.
Fig.1 I rarely buy CDs as I hate shopping for them
Fig.2 I mostly buy CDs from websites and get them mailed to me
Fig.3 I buy CDs but then copy them to my computer, MP3 player or iPod and listen to the digital format
When asked whether extras on a CD such as extra songs, DVD-style content or free gifts would make them
more likely to buy the CD, 54% felt that it would help to persuade them to make the
purchase. This is significant for the design of digital music. If some of these extras could be
incorporated into digital music files then perhaps the non-digital buyer may be persuaded to switch
allegiance to digital formats over CDs and digital buyers may purchase in larger quantities.
Another interesting result shown in Fig.3 revealed that 88% of CD purchasers bought CDs but then
burned them to their computer and listened to the digital format. This is significant for many reasons.
If people are mainly buying CDs but listening to digital formats why then buy the CD at all? If sound
quality is a factor why choose to listen to a compressed format? Perhaps ownership of the music is a
deciding factor. With a CD you can listen to the music on as many CD players as you wish, give the
CD to as many friends as you like and always have a back up of your music collection. Perhaps it is
the experience of going shopping for CDs that people like and results have already supported this
theory. Cross tabulations with further questions will attempt to answer this question.
4.3 Research in relation to digital music
General questions were asked about whether respondents had or had not bought or listened to digital
music. Fig.4 shows that almost double the number of people had not bought digital music compared
to those that had, with 67% responding that they had never bought digital music.
Fig.4 Have you ever bought digital music (e.g. from iTunes, napster, emusic, 3music)
Fig.5 Which of the following would best describe your music listening habits?
Fig.6 Which of the following would best describe your music buying habits?
However, when asked about their music listening as opposed to their music buying habits large
percentages of people listened to digital music but bought CDs. Fig. 5 and 6 show the differences in
results from this research.
From this series of questions we hoped to develop an understanding of why people would or would
not buy digital. Two sets of questions were asked both using the Likert scale. The first set was asked
of those who do not buy digital and the second set to those that had bought digital. Results have shown
that there are some key reasons that people would buy digital music. Price was a factor with
respondents agreeing that digital music was good value for money. 60% either strongly agreed or
agreed. Portability was a big factor with 76% of respondents agreeing or strongly agreeing that this
was important to them. The ability to purchase one track at a time was also a deciding factor as the
control to buy only one song instead of a whole album was important. 61% strongly agreed or agreed.
A dislike of shopping was not an issue. Surprisingly the majority of digital music purchasers also liked
going shopping in the traditional manner. Only 36% of people felt that they bought digital music, as
they disliked going shopping.
So why then do people not buy digital music? From our results the understanding of technology was
not a contributing factor as 85% of people felt that they understood the technology that was associated
with buying digital music. Broadband issues were also not a factor as 76% of respondents felt that
having or not having broadband did not affect their decision to purchase digital music. Price was not
an issue either as most people felt that digital music was good value for money. Not having access to a
credit card was also not a factor as 71% of people felt that having or not having a credit card did not
affect the decision to buy digital music. When asked if the lack of packaging / physical object affected
their decision, 62% of people responded with disagree or strongly disagree. When asked if sound
quality was an issue surprisingly this too did not deter people from purchasing digital music. With
most people buying CDs but listening to digital, this would suggest that people understood the quality
issues associated with digital music. We also asked if not having an iPod / MP3 player deterred people
from buying digital music. 74% of people felt that this was not an issue either. So having a portable
digital music player is not a deciding factor. From these responses the typical reasons that purchasers
would not buy digital can be discounted. This did not tell us, however, why some people did not buy
digital music. From analysis of previous questions asked some possible reasons may be due to
ownership issues and the fact that respondents simply like to shop.
4.4 Research in relation to Peer-to-Peer (P2P) software
The third section of the questionnaire dealt with the usage of peer-to-peer software. Questions were
asked to determine the numbers that do and do not use this type of software. The results were almost
even with 42% having used peer-to-peer (P2P) software, 50% having never used peer-to-peer and 8%
not knowing if they had or hadn’t. The software mostly used was Limewire with 80% of the results.
Bit torrent also featured with the next highest results at 15%. Respondents were not overly concerned
with the legal implications of using this type of software, as 84% of people answered no in this area.
From our results there are two main reasons that people use P2P software. The option of downloading
free music was a big factor in the usage of P2P software. 86% of people claimed to use P2P software
to have access to free music. Convenience was another main reason. Over 91% of respondents felt that
P2P downloads were more convenient than shopping and 73% felt that P2P downloads were more
convenient than ripping CDs from friends. Further questions researched why people do not use P2P
software. We asked if access to a PC was an issue. For those that had not used P2P software access to
computers did not deter them from using P2P software as 88% had some kind of computer access.
Broadband or high speed internet access was not a factor in usage either, as 70% of people disagreed or
strongly disagreed with the statement that they had not used P2P software because they had no broadband access.
Those who had not used P2P software felt that they did understand the technology associated with
using P2P software. 77% felt that they did understand the technology but still chose to not use P2P
software. As with earlier questions on this topic results showed that fear of prosecution by users was
not a deciding factor as 74% of people were not concerned with legal implications. So why would
people choose not to use P2P software? If the obvious reasons do not play a part perhaps there is a
large portion of music purchasers that simply have no interest in or time to download from P2P
software. Several users felt that the fear of downloading spyware and viruses prevented them from
using P2P software.
4.5 Research in relation to OTA (over the air) music downloads
The final section of the questionnaire related to the use of mobile phone networks to download music
directly to your mobile device. An establishing question was asked at the start of this section with 86%
of mobile phone owners saying that they had never downloaded music over their mobile network. Of
the people who had downloaded music from a mobile network the highest percentage were using the
Vodafone network and all of the respondents had only downloaded music once or 2-5 times. From the
sample of 50 people questioned we gathered very few responses to this section of the questionnaire.
Some of the reasons for this are the slow download speeds and high costs, but as 3G networks
become more widely available the adoption of OTA downloads should improve.
4.6 Cross tabulation research
A series of cross tabulations on various results have shown interesting outcomes. In some cases the
results have been as expected and in others unexpected.
4.6.1 Cross tabulation: I buy CDs as I like opening a CD package and discovering what is inside & I
don’t buy digital as I don’t get a physical object when I buy a digital song (see Fig. 7). The results from
this cross tabulation were as expected: those who felt that a CD's packaging was important also felt
that they did not buy digital music as they did not get a physical object.
4.6.2 Cross tabulation: What is your age & I don’t buy digital music as I don’t have a credit card see
Fig. 8. Expected results would be that a large percentage of the younger market (16 – 25) would agree
that lack of credit cards would deter them from buying digital music. Results showed that those that
agreed or strongly agreed were only from the 21 – 30 age group. However, most numbers were
concentrated in the disagree or strongly disagree area with only small amounts or no respondents
choosing agree or strongly agree. This would suggest that for a small number of 21 – 30 year olds not
having a credit card was an issue but for most it did not factor in their decision to buy digital.
Fig.7 I buy CDs as I like opening a CD package and discovering what is inside & I don’t buy digital as I don’t get a physical object when I buy a digital song
Fig.8 What is your age & I don’t buy digital music as I don’t have a credit card
Fig.9 I buy CDs but then copy them to my computer, MP3 player or iPod and listen to the digital format
4.6.3 Cross tabulation: What is your age & I don’t buy digital music, as I don’t understand the
technology (see Fig.9). Results from this cross tabulation were not as expected. Of those that answered
agree or strongly agree that they did not understand the technology all respondents were from either
the 21-25 age group or the 26-30 age group with the largest amount coming from the 21-25 age group.
Those in the 31-35 and 36-40 age groups felt that they did understand the technology with all
responses either choosing disagree or strongly disagree. There were a higher number of responses
overall from the 21-25 age group so it is more likely to have a larger variety of results from this age
group. However, we feel that the results are still significant and unexpected.
5 Summary of findings
A brief summary of results has shown that one of the main reasons that people still choose to buy CDs
over digital music is that people like to shop. The social interaction and shopping experience is
something that has not been reproduced with digital or virtual shopping environments such as iTunes
store. One solution to this would be to bring digital music purchases into the traditional shopping
environment with, for example, interactive digital shopping booths in music shops or OTA music
downloads within the music shop. Another solution would be to make the online or digital experience
more like a traditional music shop. Interactivity would play a major role here.
Another significant finding was that packaging for CDs was not a major reason that people buy CDs
and also that the lack of physical object with a digital music file was also not a major factor in the
choice to buy digital. However, of those who felt that CD packaging was important, all felt that the
absence of a physical object with digital music did influence their decision not to buy digital. This suggests that
in the majority of cases packaging is not an issue, but for the small numbers who felt that it was, it is also
a reason not to buy digital. This supports our decision to create graphically enhanced digital music files
in an attempt to create a type of digital packaging. Even if the percentage of people who are swayed
by a CD's packaging is very small (in our research only 38% of people felt that packaging was
significant), 38% of all the people that buy CDs per year would amount to a huge number. If even
1% of these people could be persuaded to buy digital over CDs this would amount to a huge jump in
revenue for digital music sellers. The next most significant finding was that most people buy CDs but
listen to digital music. CD purchasers are burning their music collections to computers or iPods and
only listening to the digital format. This is quite significant as if CD purchasers can be persuaded to
change their music buying habits to digital there would be a major shift in how people buy music. CD
purchasers have already embraced digital music as a format to listen to and so half of the process has
been taken care of. However there are still reasons when it comes to actually buying music CDs over
digital that CDs are the format of choice; ownership and Digital Rights Management (DRM) issues
and the fact that people simply like to shop seem to be contributing factors. If these issues could be
rectified with design then, even a small percentage of CD purchasers may be persuaded to switch to
buying mainly digital music. This would have a huge impact on the music industry.
6 Conclusions and future work
So far we have pinpointed several devices that people, at present and in the future, listen to music on.
We have devised a series of multimedia and Web 2.0 design solutions that combine this information
and the information gathered from questionnaires that attempt to solve some of the issues raised.
We will begin by developing a Web 2.0 design solution that will take the form of an online application
that mimics a user's CD collection. An in-store digital music application will also be developed. This
may take the form of a booth or listening point, which are already familiar to in store music
purchasers. For mobile phone devices with wireless capabilities this application can be accessed and
your entire music collection can be listened to from your mobile phone. This mobile phone application
will also be able to scan digital information from CDs in store allowing users to purchase digital music
content over an in store wireless network. At present the technology for this exists in Japan where
metro users can use their mobile phones to scan the metro turnstiles and enter the underground. They
are then charged to their phone bill for the service. With emerging network technologies such as IEEE
802.11n claiming transmission rates of up to 600 Mbps this idea is technically feasible in the near
future.
References
[1] EMI Group, EMI Music launches DRM-free superior sound quality downloads across its entire digital repertoire, Press Release, April 2, 2007. [Online]. Available: http://www.emigroup.com/Press/2007/press18.htm. [Accessed: 03 August 2007].
[2] Visual Radio, Visual Radio :: Redefining the Radio Experience, 2005. [Online]. Available: http://www.visualradio.com/1,121,,,541.html & http://www.visualradio.com/1,121,,,412.html. [Accessed: 24 Feb. 2006].
[3] Campey, R., Roman, P., Lagerling, C. (2005). The search for Mobile Data Revenue II – a Sector Overview of Mobile Music. GP Billhound Sector report, London.
[4] Motorola, Motorola SLVR with iTunes, 2006. [Online]. Available: http://www.motorola.com/motoinfo/product/details/0,,139,00.html. [Accessed: 30 Feb. 2006].
[5] Mello, J. P. Jr. (2005). iPod slayers misdirecting efforts, 2005. [Online]. Available: http://www.technewsworld.com/story/46236.html. [Accessed: 25 Oct. 2005].
[6] Moss-Pultz, S. (2007). Openmoko Announce Free Your Phone, 2007. [Online]. Available: http://lists.openmoko.org/pipermail/announce/2007-January/000000.html. [Accessed: 03 Aug. 2007].
[7] Vodafone, Media Centre – Vodafone and Sony Music Entertainment hit global high note, May 23, 2004. [Online]. Available: http://www.vodafone.com/start/media_relations/news/group_press_releases/2004/press_release23_05.html. [Accessed: 01 March 2006].
[8] Vodafone, Media Centre – Vodafone and Universal Music Group International sign strategic partnership, Nov. 14, 2005. [Online]. Available: http://www.vodafone.com/start/media_relations/news/group_press_releases/2005/press_release14_11.html. [Accessed: 01 March 2006].
[9] IFPI, Digital Music Report 2007, International Federation of Phonographic Industry, London.
Session 4
Algorithms
Adaptive ItswTCM for High Speed Cable Networks
Mary Looney 1, Susan Rea 1, Oliver Gough 1, Dirk Pesch 1
1 Cork Institute of Technology, Cork, Ireland
{mary.looney, susan.rea, oliver.gough, dirk.pesch}@cit.ie
Abstract
The use of traffic conditioning in high speed networks is significant in today’s cable industry due
to the increased demand for real-time data services such as video streaming and IP telephony.
Various traffic conditioning techniques exist such as traffic shaping, policing and metering. The
focus of this paper is a Rate Adaptive Shaper (RAS), known as the Improved Time Sliding
Window Three Colour Marker (ItswTCM). This RAS was proposed to improve the fairness index
in differentiated service networks and is based on the average arrival rate of packets over a
constant window period of time. For high speed networks the window size required is large due to
the large delay-bandwidth product incurred. For ItswTCM the window size is held constant which
does not greatly improve network efficiency. This paper concentrates on applying an adaptive
sliding window, known as the Improved Time Sliding Window (ITSW), to the ItswTCM
algorithm to produce an adaptive sliding window TCM mechanism. The behaviour of this
Adaptive ItswTCM algorithm is examined under simulation conditions in a high speed DOCSIS
environment.
Keywords: Traffic Conditioning, ItswTCM, DOCSIS, Adaptive Window Scaling.
1 Introduction
With the increase in demand for symmetric real-time services, Data Over Cable Service Interface
Specification (DOCSIS) has been successful in providing cable operators with the high speed data
transfer required [1]. The original specification, DOCSIS 1.0, provided the cable industry with a
standards-based interoperability to allow for high speed web browsing and describes the
communications and support operator interface within a fully deployed Hybrid Fiber Co-axial (HFC)
network. With the increase in advanced IP services such as voice over IP (VoIP) and real time data
services, DOCSIS 1.0 needed to be upgraded to support greater levels of Quality of Service (QoS) and
to meet market demands for QoS. Hence the introduction of DOCSIS 1.1 which added key
enhancements to the original standard, enabling it to support several levels of QoS while also
improving bandwidth efficiency and supporting multiple service flows (SFs). Another significant
aspect of DOCSIS 1.1 is that it is backward compatible.
For QoS support, DOCSIS 1.1 specifies a number of enhancements to the DOCSIS 1.0 standard.
Firstly, the DOCSIS 1.0 QoS model has been replaced with a SF model that allows greater flexibility
in assigning QoS parameters to different types of traffic and in responding to changing bandwidth
conditions. Support for multiple SFs per cable modem (CM) is permitted. Greater granularity in QoS
per CM is applied, allowing it to provide separate downstream rates for any given CM to address
traffic conditioning and rate shaping purposes. To support on demand traffic requests, the creation,
modification and deletion of traffic SFs through dynamic MAC messages is also supported. The focus
of this paper is on traffic conditioning and rate shaping for increased downstream throughput within a
DOCSIS environment.
Traffic conditioning improves network efficiency of high speed cable networks and maximises
throughput by rate limiting the flow of packets over the downstream network. It reduces
retransmissions in the network by smoothing traffic rates and dropping packets for more dependable
operation. Effective traffic shaping and policing algorithms already exist, some based on window flow
control, others based on rate and prediction flow controls [2]. One particular algorithm used in
Differentiated Services (DiffServ) networks is known as the Improved Time Sliding Window Three
Colour Marker (ItswTCM) [3]. This algorithm was proposed to improve fairness in DiffServ networks
due to the increased demand for greater QoS in the internet. As a result it improved throughput in
DiffServ networks and was therefore applied to a DOCSIS network to provide greater performance in
the network.
The ItswTCM uses a time sliding window along with a colour marking scheme for the conditioning of
its traffic. It is a rate estimation algorithm that shapes traffic according to the average rate of arrival of
packets over a specific period of time (i.e. a window length). This period of time is preconfigured to
be a constant value of either a short value in the order of a round trip time of a TCP connection, or a
long value in the order of the target rate of the SF [6]. This constant value limits the potential of the
algorithm. For instance, when working with high speed cable networks a large window size would be
required due to the large delay-bandwidth product sustained [2]. The static nature of the window can
lead to bandwidth wastage. Dynamically changing the characteristics of the traffic shaper could result
in greater throughput in the network [7] [8]. If the window length was variable, adapting to its
particular environment, performance of the ItswTCM should greatly improve. Various window
adaptation algorithms exist to maximise network throughput [4] [5] [6]. One such algorithm is called
the improved TSW (ITSW) [9]. This algorithm is based on the original TSW that was used in the
creation of the ItswTCM algorithm. It differs from the TSW in that its window length is varied and not
held constant allowing the ITSW to adapt to its environment.
The main contribution of this paper is the merging of the adaptive ITSW with the ItswTCM algorithm
to produce an Adaptive ItswTCM to be used in a high speed DOCSIS network. Simulation results will
demonstrate the beneficial effects of this Adaptive ItswTCM algorithm within a DOCSIS
environment. The layout of the paper is as follows: section 2 reviews the TSW algorithms with a focus
on ItswTCM and ITSW algorithms. The merging of these algorithms to produce an Adaptive
ItswTCM is discussed. The DOCSIS environment where the Adaptive ItswTCM algorithm will be
applied is described in Section 3. Experimental setup and performance results are presented in Section
4 and finally the paper ends with the conclusions that are drawn as a consequence of this work.
2 Traffic Conditioning in High Speed Networks
Traffic Conditioners are typically deployed in high speed networks to regulate traffic flow in order to
avoid overloading intermediate nodes in the network. Various traffic shaping and marking schemes
exist such as leaky buckets [10] and token buckets such as the single rate three colour marker (srTCM)
and the two rate three colour marker (trTCM) [11][12]. Rate Adaptive Shapers (RAS) are another type
of traffic shaping mechanism used to produce traffic at the output that is less bursty than that of the
input. Recently, RAS have been successfully combined with the marking schemes of the above
mentioned token buckets to produce the single rate RAS (srRAS) and the two rate RAS (trRAS)
algorithms [13] [14]. These RAS schemes are mainly used in the upstream direction [13].
Since the concern of this paper is in the downstream direction of DOCSIS networks another type of
RAS was considered. This is known as the time sliding window (TSW) [15]. The TSW algorithm is
based on the average rate of arrival of packets and traffic is conditioned according to this value. The
marking schemes associated with the srTCM and trTCM was later adapted to the TSW algorithm so
that traffic streams could be metered and packets marked accordingly [16]. This algorithm is known as
the Time Sliding Window Three Colour Marker (TSWTCM). The unfairness of this algorithm in
differentiated services has been discussed and a solution to solve this unfairness problem was
proposed in [3] and this is referred to as the Improved TSWTCM (ItswTCM) algorithm. The work
presented in this paper uses this algorithm and applies an adaptive window size for improved
downstream throughput.
2.1 ItswTCM
The underlying principle of the ItswTCM is that packets are permitted into the network in proportion
to their Committed Information Rate (CIR) depending on their estimated average arrival rate
(avg_rate) of the network over a specific preceding period of time (win_length_const). A constant
value for win_length_const is normally adhered to [15] [17]. The avg_rate and the time the last packet
arrived (prev_time) are variables used within the algorithm that are updated each time a packet arrives
(as shown in Figure 1).
bytes_in_TSW = avg_rate * win_length_const;
new_bytes = bytes_in_TSW + pk_size;
avg_rate = new_bytes / (curr_time − prev_time + win_length_const);
Figure 1: Algorithm for the TSW in ItswTCM
The coloured marker in this algorithm is focused on smoothing traffic in proportion to its CIR and
injecting yellow packets into the network to achieve a fair share of bandwidth across the network.
Hence, yellow packets play a significant role in this algorithm.
if (CIR < avg_rate <= PIR)
    packet = yellow;
else if (avg_rate <= CIR)
    packet = green;
else
    packet = red;
Figure 2: Algorithm for the colour marker in ItswTCM
In this algorithm the service rate is guaranteed if the avg_rate is less than the CIR i.e. these packets are
marked green. If the avg_rate is greater than CIR but less than the Peak Information Rate (PIR) then
the packets are marked as yellow, thus allowing larger flows to be able to contend with smaller ones.
However, if the avg_rate exceeds the PIR then packets are marked as red. The CIR and PIR are
determined from the network's maximum and minimum guaranteed bandwidth rates, Tmax and Tmin, as
illustrated in Equation 1.
PIR = Tmax / 8
CIR = Tmin / 8
Equation 1.
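As a compact, non-authoritative Java sketch of the behaviour summarised in Figures 1 and 2 (variable names mirror the pseudocode above; the class structure is an assumption for illustration):

// Sketch of ItswTCM: a time-sliding-window rate estimate followed by three-colour marking.
// Times are in seconds, rates and packet sizes in bytes; prev_time is updated per packet,
// as described in the text above.
class ItswTcm {
    enum Colour { GREEN, YELLOW, RED }

    private final double winLengthConst;   // constant window length (seconds)
    private final double cir;              // committed information rate = Tmin / 8
    private final double pir;              // peak information rate = Tmax / 8
    private double avgRate = 0.0;
    private double prevTime = 0.0;

    ItswTcm(double winLengthConst, double tMin, double tMax) {
        this.winLengthConst = winLengthConst;
        this.cir = tMin / 8;
        this.pir = tMax / 8;
    }

    Colour onPacket(int pkSize, double currTime) {
        double bytesInTsw = avgRate * winLengthConst;                 // Figure 1
        double newBytes = bytesInTsw + pkSize;
        avgRate = newBytes / (currTime - prevTime + winLengthConst);
        prevTime = currTime;
        if (avgRate <= cir) return Colour.GREEN;                      // Figure 2
        if (avgRate <= pir) return Colour.YELLOW;
        return Colour.RED;
    }
}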
Because the ItswTCM is based on a constant preceding period of time, it does not always reflect an
efficient use of the network. For high speed networks large window sizes would be required which
in some cases might permit larger bursts of traffic into the network and less smoothing or shaping of
traffic, which is not ideal. Larger buffering would also be required at each node. This may be
overcome however with the use of an adaptive window scaling algorithm. If such an algorithm was
merged with the ItswTCM, performance could be improved resulting in the smooth injection of traffic
into a network.
2.2 ITSW
The improved TSW (ITSW) is an adaptive window scaling algorithm which uses a variable window
length and is a variant of the original TSW [9] [15]. The variable window length
accommodates and reflects the dynamics of TCP traffic. As previously mentioned, in the original
TSW the window length is preconfigured to a constant value: either a short value in the order of the
round trip time of a TCP connection, or a long value related to the target rate of the SF [6]. For
ITSW a combination of both is used to determine the variable window length, as shown in
Equation 2 below.
win_len = ( target_rate / ( (1/n) * Σ_{i=1}^{n} target_rate_i ) ) × win_length_const
Equation 2
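Read concretely, Equation 2 scales the constant window by the ratio of a flow's own target rate to the mean target rate of the n flows. A minimal sketch in Java is given below; the class and parameter names are illustrative and not taken from the paper.

/** Equation 2: win_len = (target_rate / mean of the n target rates) * win_length_const. */
public final class AdaptiveWindow {
    static double windowLength(double targetRate, double[] allTargetRates, double winLengthConst) {
        double sum = 0.0;
        for (double r : allTargetRates) {
            sum += r;
        }
        double meanTargetRate = sum / allTargetRates.length;
        return (targetRate / meanTargetRate) * winLengthConst;
    }
}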
The constant window length used here is the same as was used in the original TSW algorithm. The
incorporation of target rates into the window length allows the window to adjust according to its
environment. This equation still permits high speed networks to adapt to a large window size if
required due to the involvement of their target rates.
2.3 Adaptive ItswTCM
The ITSW has improved performance over the original TSW: it allows for greater fairness in networks
and hence greater throughput. To improve the ItswTCM algorithm, it is merged with the ITSW so that a
variable window length is used instead of a constant value, as described in Figure 3 below.
win_len is the variable window length as defined in Equation 2 above.
bytes_in_TSW = avg_rate * win_len;
new_bytes = bytes_in_TSW + pk_size;
avg_rate = new_bytes / (curr_time − prev_time + win_length_const);
Figure 3: Adaptive ItswTCM algorithm
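Relative to the sketch given after Figure 2, the only change implied by Figure 3 is that bytes_in_TSW is computed with the per-flow win_len from Equation 2, while the denominator keeps win_length_const as written above. A hedged one-method sketch (names are assumptions):

/** TSW update for the Adaptive ItswTCM (Figure 3). */
public final class AdaptiveTswUpdate {
    static double nextAvgRate(double avgRate, double winLen, double winLengthConst,
                              int pkSizeBytes, double currTime, double prevTime) {
        double bytesInTsw = avgRate * winLen;    // the adaptive window replaces the constant here
        double newBytes = bytesInTsw + pkSizeBytes;
        return newBytes / (currTime - prevTime + winLengthConst);
    }
}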
The Adaptive ItswTCM can now adapt to changing network conditions and is expected to outperform
the static window used in the original ItswTCM.
3 DOCSIS Simulation Environment
For experimental investigation a computer-simulated DOCSIS environment is required. CableLabs
[18], in conjunction with OPNET, has developed a model for the HFC DOCSIS 1.1 specification
using the OPNET simulator [19]. The model includes features relevant to both DOCSIS 1.0 and 1.1,
and allows the creation of complex networks so that analysis and evaluation of alternative
configurations can be performed to determine capacity and Quality of Service (QoS) characteristics.
Using this environment the Adaptive ItswTCM algorithm is implemented to provide enhanced QoS
features.
The OPNET DOCSIS implementation is based on the Radio Frequency (RF) Interface Specification
1.1 for equipment and network design and planning [20]. Traffic scheduling classes such as
unsolicited grant service (UGS), real time polling service (rtPS), non real time polling service (nrtPS)
and best effort (BE) are all modelled in the OPNET DOCSIS model as well as upstream QoS features
such as fragmentation, concatenation, contention, piggybacking and payload header suppression
(PHS) to enhance utilisation of bandwidth. Upstream and downstream RF parameters are all
configurable and multiple channels are supported in both the upstream and downstream direction.
However, the model is limited in some of its capabilities as listed below [20].
• The dynamic creation, deletion and modification of services are not permitted in the model.
• Multiple SFs are not permitted.
• Enhancements to QoS features are not implemented. This includes Connection Admission
Control (CAC) and traffic shaping and policing.
• Additional security features and oversubscription rates are not modelled.
To provide realistic results and to comply with the DOCSIS 1.1 standard the following features were
modelled along with the Adaptive ItswTCM algorithm [21].
• Multiple SFs
• CAC is modelled as a first-come-first-served resource reservation policy in which requests for
bandwidth guarantees are rejected if the resulting total utilisation would exceed some
specified threshold (a minimal sketch of such a check is given after this list). This threshold is
based on the total available bandwidth (in both the US and DS) and the maximum guaranteed
bandwidth, Tmax, that any user is allowed to have. Tmax is operator dependent. (Tmin is the
minimum guaranteed bandwidth.) CAC is implemented during the CM registration phase, with
the bandwidth requirements being assessed to control the traffic entering the network so that
each SF obtains its desired QoS.
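The following is an illustrative Java sketch of such an admission check; the class name, threshold handling and bookkeeping are assumptions and not part of the OPNET DOCSIS model.

/** First-come-first-served CAC: admit a new SF only while the reserved bandwidth,
 *  including the requested Tmax, stays within the configured utilisation threshold. */
public class SimpleCac {
    private final double channelCapacityBps;    // total available bandwidth (US or DS)
    private final double utilisationThreshold;  // e.g. 0.9 for 90%
    private double reservedBps;                 // sum of Tmax values already admitted

    public SimpleCac(double channelCapacityBps, double utilisationThreshold) {
        this.channelCapacityBps = channelCapacityBps;
        this.utilisationThreshold = utilisationThreshold;
    }

    /** Called during CM registration for each requested SF. */
    public boolean admit(double requestedTmaxBps) {
        double utilisation = (reservedBps + requestedTmaxBps) / channelCapacityBps;
        if (utilisation > utilisationThreshold) {
            return false;        // reject: the threshold would be exceeded
        }
        reservedBps += requestedTmaxBps;
        return true;             // admit and reserve the bandwidth
    }
}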
4 Simulation Setup and Results
Simulation experiments were conducted to investigate the impact of the proposed Adaptive ItswTCM
algorithm and the ItswTCM algorithm on downstream throughput performance. The associated
algorithm throughputs and End to End DOCSIS network delays are examined to determine
performance in moderately loaded and uncongested network environments with the proportions of red,
green and yellow packets generated by both schemes being analysed.
4.1 Simulation Environment
The OPNET simulator was used to demonstrate performance. Firstly, a set of ftp scenarios were set up
for analysis in moderately loaded and uncongested networks [22]. For each set of experiments the
original ItswTCM and the Adaptive ItswTCM schemes are used. Each scenario is set up with 50 cable
modems (CMs) connected to one cable modem termination system (CMTS). Four downstream
channels with data rates of 41.2Mbps, 38Mbps, 31.2Mbps and 27Mbps are configured along with a
single US channel with a data rate of 10.24Mbps. Each simulation is run for thirty minutes.
CMs are configured to ftp files across the DOCSIS network as follows: For an uncongested network
environment 50% of CMs will ftp 35KByte files, 25% will transfer files of size 30KByte, and the
remaining 25% will transfer 40KByte files. For a moderately loaded network environment all CMs
will ftp files of size 1MByte. Inter-request times of (exponential) 360, 300 and 400 seconds
respectively were used. All traffic is forwarded as BE. The maximum guaranteed bandwidth value,
Tmax is set to 20Mbps for each downstream and the minimum guaranteed bandwidth, Tmin is set to
0Mbps for each downstream. A traffic burst size of 1522 Bytes is used. The constant window length,
win_length_const is set to 90msecs. The target_rate of each SF refers to the maximum guaranteed rate
that is equivalent to Tmax for each flow.
For the second set of scenarios, a traffic mix of ftp, email and video is used. Again, the performance of
the original ItswTCM and Adaptive ItswTCM is analysed and the DS and US channel details are as
described above with 50 CMs. The CMTS is configured to transfer MPEG-2 compressed video streams
with a frame rate of 50 frames per second (fps) across the DOCSIS network to 50% of CMs. 25% of
CMs will ftp 35KByte files with an inter-request time of (exponential) 36 seconds. The CMTS is also
configured to send email data to the other 25% of CMs with an email size of 1000Bytes.
4.2 Analysis of Coloured Packets
Table 1 shows the number of green, yellow and red packets generated for simulations with moderately
loaded and uncongested networks. On comparing both algorithms it is evident that for the uncongested
network traffic conditioning does not play a major role as the numbers of green, yellow and red
packets generated are the same. However, for the moderately loaded network the adaptive ItswTCM
algorithm outperforms the original algorithm by adapting its window length to the target rate of the
network. Note that in the explanation of the ItswTCM algorithm its purpose was to smooth traffic into the
network, hence the injection of yellow packets rather than green. This is reflected in the table below,
where the majority of packets are coloured yellow. Green packets are only permitted if the average rate is
less than the CIR; as CIR = 0 bps, no green packets are allowed. Comparing both algorithms, the
number of yellow packets is greater for the Adaptive ItswTCM algorithm (almost 9,100 packets more)
than for the original ItswTCM algorithm, and the number of red packets is zero compared with 9,105
packets for the original algorithm. Adaptive window scaling gives better control of the packet rate,
resulting in fewer red conditions. Also for the moderately loaded case, the aggregate number of yellow
and red packets is greater for the Adaptive ItswTCM algorithm showing a greater throughput than that
of the ItswTCM.
                     Uncongested Network            Moderately Loaded Network
                     Green     Yellow    Red        Green     Yellow    Red
ItswTCM              0         187335    0          0         376591    9105
Adaptive ItswTCM     0         187335    0          0         385664    0
Table 1: Traffic Marking using ItswTCM and the Adaptive ItswTCM algorithms
The numbers of green, red and yellow packets for the set of scenarios with a different traffic mix are
presented in Table 2. They confirm the conclusion drawn from Table 1 above, that the Adaptive
ItswTCM outperforms the ItswTCM in throughput. The Adaptive ItswTCM exhibits a lower number
of red packets and therefore a greater number of yellow packets, coinciding with the results presented in
Table 1 above.
                     Traffic Mix
                     Green     Yellow    Red
ItswTCM              0         379567    596306
Adaptive ItswTCM     0         392493    583547
Table 2: Traffic Marking using ItswTCM and the Adaptive ItswTCM algorithms
For the moderately loaded network in Table 1 there is only a 2% difference in the number of
yellow/red packets. However, Table 2 shows that for a richer traffic mix the difference in
yellow/red packets remains consistent, at around 3%. Hence, there is a consistent improvement
throughout the experiments when using the Adaptive ItswTCM. This is also reflected in the following
section.
4.3 Throughput Performance
The DS bus throughput values were recorded for the ftp scenarios described above. These values
represent the total throughput in bits/sec on all downstream channels. The maximum, minimum and
average values for each scenario were recorded over the simulation duration of 30 minutes. The results
of these throughput figures in bits per second are shown in Table 3 below.
                     Uncongested Network              Moderately Loaded Network
                     Minimum   Average   Maximum      Minimum   Average     Maximum
ItswTCM              48,152    817,512   48,152       93,838    1,316,922   18,706,053
Adaptive ItswTCM     48,152    817,512   48,152       93,838    1,437,009   21,129,809
Table 3: Minimum, Average and Maximum Throughput values for the ItswTCM and Adaptive
ItswTCM algorithms
As can be seen from Table 3, for the uncongested networks the bus throughputs are exactly the same,
as would be expected from the results shown in Table 1 above. For the moderately loaded network, however,
the average throughput is greater for the Adaptive ItswTCM algorithm by approximately 120,000
bits/sec. This again reflects that the Adaptive ItswTCM algorithm performs better than the ItswTCM,
showing a noticeable improvement in throughput performance.
For the traffic mix scenarios, the average DS bus throughput is plotted in Figure 4 below. Here we see
that the throughput for the Adaptive ItswTCM is approximately 18% greater than that of the original
ItswTCM. This shows the consistency of greater performance for the Adaptive ItswTCM.
Figure 4: Average DS Bus Throughput for the ItswTCM Algorithms
4.4 DOCSIS Delay
The End to End delay was also recorded for both sets of scenarios and their average values are
represented in Figure 5 for a simulation duration of 30 minutes. The delay for the original ItswTCM is
greater than that of the Adaptive algorithm for both cases. Even though this difference in delay is very
small it is still consistent, thus confirming the greater DS throughputs experienced above.
Figure 5: DOCSIS End to End Delays for the ItswTCM Algorithms for a) moderately loaded
network and b) scenarios set up using a traffic mix
5 Conclusion
This paper presented an Adaptive ItswTCM algorithm, formed by combining the existing ITSW and
ItswTCM algorithms. From the consistency of the results presented in Section 4, it can be concluded that
this adaptive algorithm provides improved throughput and performance within a DOCSIS
environment over the original ItswTCM algorithm. End to End delay was decreased, leading to an
increase in DS throughput. The importance of traffic shaping for the enhancement of QoS features
within a DOCSIS environment was also discussed. A simulated DOCSIS network model was described
and used to confirm that the adaptive ItswTCM algorithm achieves greater throughput and performance
in a DOCSIS environment than the algorithm based on a constant period of time. Future work will be
based on an active queue management policy that will queue red packets rather than simply dropping
them, since dropping degrades performance, as can be seen from the number of red packets in Table 2
above.
Acknowledgements
This project is supported by a European Commission Framework Programme (FP6) for Research and
Technological Development titled “CODMUCA - Core Subsystem for Delivery of Multiband data in
CATV networks”, IST-4-027448-STP.
References
[1] www.cablemodem.com/specifications/
[2] Chao, H. J., Guo, X. (2001). Quality of Service Control in High-Speed Networks. John Wiley & Sons, Inc: New York, USA, pp. 235-240.
[3] Su, H., Atiquzzaman, M. (2001). ItswTCM: A new aggregate marker to improve fairness in Diffserv. Proc. of the Global Telecommunications Conference, 3: 1841-1846.
[4] Mitra, D. (1992). Asymptotically optimal design of congestion control for high speed data networks. IEEE Transactions on Communications, 40: 301-311.
[5] Mitra, D. (1990). Dynamic adaptive windows for high speed data networks: theory and simulations. ACM SIGCOMM Computer Communication Review, 20: 30-40.
[6] Byun, H.-J., Lim, J.-T. (2005). Explicit window adaptation algorithm over TCP wireless networks. Proc. IEE Communications, 152: 691-696.
[7] Ahmed, T., Boutaba, R., Mehaoua, A. (2004). A measurement based approach for dynamic QoS adaptation in DiffServ networks. Journal of Computer Communications, 28: 2020-2033.
[8] Elias, J., Martignon, F., Capone, A., Pujolle, G. (2007). A new approach to dynamic bandwidth allocation in Quality of Service networks: Performance and bounds. Journal of Computer Networks, 51: 2833-2853.
[9] Nam, D.-H., Choi, Y.-S., Kim, B.-C., Cho, Y.-Z. (2001). A traffic conditioning and buffer management scheme for fairness in differentiated services. Proc. of ATM (ICATM 2001) and High Speed Intelligent Internet Symposium, 91-96.
[10] Niestegge, G. (1990). The "leaky bucket" policing method in the ATM (asynchronous transfer mode) network. International Journal on Digital Analog Communications Systems, 3: 187-197.
[11] Heinanen, J., Guerin, R. (1999). A single rate three colour marker. IETF RFC 2697.
[12] Heinanen, J., Guerin, R. (1999). A two rate three colour marker. IETF RFC 2698.
[13] Zubairi, J. A., Elshaikh, M. A., Mahmoud, O. (2001). On Shaping and Handling VBR traffic in a Diffserv domain. Englewood Cliffs NJ: Prentice-Hall.
[14] Shuaib, K., Sallabi, F. (2003). Performance evaluation of rate adaptive shapers in transporting MPEG video over differentiated service networks. Proc. of Communications, Internet and Information Technology, pp. 424-428.
[15] Clark, D. D., Fang, W. (1998). Explicit allocation of best effort packet delivery service. IEEE/ACM Transactions on Networking, 6: 362-373.
[16] Fang, W., Seddigh, N., Nandy, B. (2000). A time sliding window three colour marker. IETF RFC 2859.
[17] Strauss, M. D. (2005). A Simulation Study of Traffic Conditioner Performance. Proc. of IT Research in Developing Countries, 150: 171-181.
[18] www.cablelabs.com
[19] www.opnet.com
[20] Specialised Models User Guide, DOCSIS Model User Guide, SP GURU/Release 11.5, SPM 21.
[21] Looney, M., Rea, S., Gough, O., Pesch, D., Ansley, C., Wheelock, I. (2007). Modelling Approaches to Multiband Service Delivery in DOCSIS 3.0 – An Architecture Perspective. Symposium on Broadband Multimedia Systems and Broadcasting.
[22] Martin, J., Shrivastav, N. (2003). Modelling the DOCSIS 1.1/2.0 MAC Protocol. IEEE Proc. of Computer Communications and Networking, pp. 205-210.
Distributed and Tree-based Prefetching Scheme for
Random Seek Support in P2P Streaming
Changqiao Xu 1,2,3, Enda Fallon 1, Paul Jacob 1, Austin Hanley1, Yuansong Qiao 1,2,3
1
Applied Software Research Centre, Athlone Institute of Technology, Ireland
2
Institute of Software, Chinese Academy of Sciences, China
3
Graduate University of Chinese Academy of Sciences, China
[email protected], {efallon, pjacob, ahanley, ysqiao}@ait.ie
Abstract
Most research on P2P streaming assumes that users access video content sequentially and
passively, with requests running uninterrupted from the beginning to the end of the video stream.
An example is a P2P live streaming system in which peers start playback from the current
point of the stream when they join the streaming session. This sequential access model is inappropriate
for on-demand streaming, which needs to implement VCR-like operations, such as forward,
backward and random seek, because users view at will and are unfamiliar with the content.
This paper proposes a distributed and Balanced Binary Tree-based Prefetching
scheme (BBTP) to support random seek. Analysis and simulation show that BBTP is an efficient
interactive streaming service architecture in a P2P environment.
Keywords: P2P, Balanced binary tree, Prefetching, Random seek
1 Introduction
Using a peer-to-peer (P2P) approach to provide streaming services has been studied extensively in
recent years [1, 2, 3, 4, 5]. P2P uses the bandwidth of peers efficiently by capitalizing on the
bandwidth of each client to provide services to other clients. An important advantage of P2P streaming is
system scalability, with a large number of clients sharing a stream. In addition, P2P streaming can
work at the application layer without requiring any specific infrastructure. A P2P live streaming
system always assumes that a user who joins a streaming session receives the stream from the
point at which it joined and keeps watching until it leaves or fails. However, for on-demand
streaming, analysis of large volumes of user behavior logs collected during multimedia playback [6]
indicates that random seek is a pervasive phenomenon.
The authors of [6] propose a hierarchical prefetching scheme which prefetches popular and sub-popular
segments to support random seek, based on examination of this large amount of user viewing behavior
logs. This scheme has two limitations: 1) it is not flexible, since before the prefetching scheme can be
brought into effect the user viewing logs must be collected and examined, which requires much user
access information and a long testing period; 2) it is impractical, since access logs can possibly be
collected in a traditional client-server model but are challenging to collect in a distributed P2P system.
In VMesh [7], videos are
divided into smaller segments (identified by segment IDs) and they are stored in peers distributed over
the network based on distributed hash tables (DHT). A peer may store one or more video segments in
its local storage. It keeps a list of the peers who have the previous and the next video segments. By
following the list, it can find the peers who have the next requested segments. If the client wants to
jump to another video position which is not far away from the current one, it can simply follow its
forward/backward pointers to contact the new nodes. On the other hand, if the new position is too far
away, it triggers DHT search for the segment corresponding to the new position. However, keeping all
the pointers would be very costly. In this paper, we propose a novel scheme, a distributed and Balanced
Binary Tree-based Prefetching scheme (BBTP), to distribute video segments over the network and
support random seek.
The rest of the paper is organized as follows. A prefetching scheme of BBTP is discussed in
section 2. Section 3 discusses the random seek support procedure of BBTP. The performance of BBTP
is evaluated in section 4. Finally, section 5 concludes the paper and offers some future research
directions.
2 Prefetching Scheme of BBTP
The overlay network in BBTP is a balanced binary tree structure. A tree is balanced if and only if
at any node in the tree, the height of its two subtrees differs by at most one. It has been shown that a
balanced binary tree with N nodes has height no greater than 1.44logN [8].
Table 1 Notations used in the prefetching scheme of BBTP

S            The source media server
T            The balanced binary tree
P            The requested video for playback
Pid          P's identifier number
Len          Length of P
R            The root node of the balanced binary tree
X            Node of T
Prebuf(X)    Node X's prefetching buffer
d            Size of Prebuf(X)
Seg(X)       The serial number of the prefetching unit for node X
Parent(X)    Node X's parent node in the balanced binary tree
LChild       Left child node in the tree
RChild       Right child node in the tree
LHeight      Height of the left subtree in the tree
RHeight      Height of the right subtree in the tree
As Table 1 shows, we suppose the source media server is S, the balanced binary tree is T, and the
requested video stream is P, coded at a CBR rate with length Len. We divide the video P into equal
segments (the length of a segment is 1 s of playback), and set d sequential segments as a
prefetching unit, so P has ⌈Len/d⌉ prefetching units, numbered from 1 to L sequentially.
Any node X has a prefetching buffer named Prebuf(X) with size d. Supposing Seg(X)
is the serial number of the prefetching unit held in Prebuf(X), node X must accomplish two important
operations when it joins the system: 1) node X should become a leaf node of the balanced binary tree;
2) node X should prefetch the prefetching unit whose serial number equals Seg(X) into Prebuf(X) from
the source media server or from other nodes' prefetching buffers.
[Figure: the video P, divided into prefetching units 1 … L, mapped onto the balanced binary tree T (nodes 1–15); the root holds unit ⌈L/2⌉, its children hold ⌊L/4⌋ and ⌈3L/4⌉, the next level holds ⌊L/8⌋, ⌊3L/8⌋, ⌈5L/8⌉ and ⌈7L/8⌉, and so on down the tree.]
Fig. 1 Prefetching scheme of BBTP
As Fig. 1 shows, we suppose R is the root node of T. At level 1 of T, we map the prefetching unit at
P's middle position to Prebuf(R), namely Seg(R) = ⌈L/2⌉. At level 2 of T, we suppose R1 and R2 are
node R's left and right child nodes respectively and divide P into two subsections L1 and L2 of the
same length. Assuming Mid(L1) and Mid(L2) are the serial numbers of the prefetching units at the
middle positions of L1 and L2 respectively, we map Mid(L1) and Mid(L2) to Prebuf(R1) and
Prebuf(R2) respectively, namely Seg(R1) = Mid(L1) and Seg(R2) = Mid(L2). At level 3 of T, we
divide L1 and L2 into two equal subsections each, L11, L12 and L21, L22, and map the prefetching
units at the middle positions of L11, L12, L21 and L22 to the prefetching buffers of the left and right
children of R1 and R2 respectively. The above operations are repeated for each tree level until P
cannot be divided any further (the subsection length is less than a prefetching unit length); from then
on we set a node's prefetching unit serial number to be the same as its parent node's. For node X,
assuming its parent node is Parent(X), k(X) is X's tree level in T, and k(R) = 1, we have the following
equations:
1) If k(X) = 1, i.e. X is the root node of T, then
       Seg(X) = ⌈L/2⌉                                      (1)
2) If 1 < k(X) ≤ ⌊log L⌋ + 1:
   ① if X is LChild
       Seg(X) = ⌊Seg(Parent(X)) − L / 2^k(X)⌋               (2)
   ② if X is RChild
       Seg(X) = ⌈Seg(Parent(X)) + L / 2^k(X)⌉               (3)
3) If k(X) > ⌊log L⌋ + 1:
       Seg(X) = Seg(Parent(X))                              (4)
Supposing Sibling(X) is the sibling node of X: if Seg(X) = Seg(Parent(X)) then Seg(X) =
Seg(Sibling(X)), and if Seg(X) = Seg(Parent(Parent(X))) then Seg(X) = Seg(Sibling(Parent(X))).
The nodes which hold the same prefetching unit can become the prefetching suppliers of X when X
joins the system, which avoids all nodes prefetching the video stream directly from S and lightens the
load on S. Assuming Presuppliers(X) is the set of nodes which hold the same prefetching unit, and
Bw[i](X) is node i's usable bandwidth in Presuppliers(X), we always choose the node whose Bw[i](X)
is maximal as the prefetching buffer supplier of X.
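Equations (1)–(4) translate directly into code. The following is an illustrative Java sketch, not the authors' implementation; it assumes the logarithm in the level bound is base 2 and that L, the number of prefetching units, fits in a long.

/** Seg(X) from equations (1)-(4): level is k(X), with the root at level 1. */
public final class SegSketch {
    static long seg(long parentSeg, int level, long L, boolean isLeftChild) {
        if (level == 1) {
            return (long) Math.ceil(L / 2.0);                 // equation (1): the root
        }
        int maxSplitLevel = (int) Math.floor(Math.log(L) / Math.log(2)) + 1;
        if (level > maxSplitLevel) {
            return parentSeg;                                 // equation (4)
        }
        double offset = L / Math.pow(2, level);
        return isLeftChild
                ? (long) Math.floor(parentSeg - offset)       // equation (2)
                : (long) Math.ceil(parentSeg + offset);       // equation (3)
    }
}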
From the above discussion, we derive the prefetching algorithm and the procedure for constructing the
balanced binary tree of BBTP:
1) X sends the message Join<X, Pid> to S;
2) If there is no request log for the video P in S, establish the balanced binary tree named T and set
X as the root node of T, set the identifier number of X to 1 and k(X) to 1, and X prefetches the prefetching
unit whose serial number equals ⌈L/2⌉ from the source media server S;
3) If there is a request log for the video P in S, R redirects X to its LChild in the tree if LHeight is
less than or equal to the RHeight of R, or otherwise to its RChild. This operation is repeated until
the corresponding child is empty, and node X is then inserted at this position as a leaf node. Assuming
the parent node of X is node Y, X gets the values of k(Y) and Seg(Y) from Y and sets k(X) = k(Y) + 1;
4) Node X sends a "HeightChange" message to its parent. Upon receiving the message, the parent
resets its LHeight to LHeight + 1 or its RHeight to RHeight + 1, depending on which branch the message
comes from, and then calculates its new height as max(LHeight, RHeight). If the height has changed, the
node sends a "HeightChange" message to its own parent. This process continues until the root node R of the
balanced binary tree is reached;
5) If 1 < k(X) ≤ ⌊log L⌋ + 1, X calculates the value of Seg(X) by equation (2) when X is LChild or
by equation (3) when X is RChild. Node X prefetches the prefetching unit whose serial number equals
Seg(X) from S;
6) If k(X) > ⌊log L⌋ + 1, set Seg(X) = Seg(Y) and put Sibling(X) and Y into Presuppliers(X); if Seg(X)
equals Seg(Parent(Y)), also put Sibling(Y) into Presuppliers(X). Calculate the usable bandwidth Bw[j](X) of
each node j in Presuppliers(X) and copy all the prefetching buffer content from the node j that has the
maximal value of Bw[j](X).
For step 6), we only search two levels above node X, namely Parent(X) and
Parent(Parent(X)). To add more candidate nodes to Presuppliers(X), we can search more tree
levels above node X until we encounter a node Z for which Seg(Z) does not equal Seg(X). In the process of
constructing the balanced binary tree, every node needs to keep three peer pointers (i.e., peers' IP
addresses and ports): parent, left child and right child.
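The tree-construction part of the join procedure (steps 3–5) can be sketched as follows. This is an illustration of the authors' description, not their implementation: step 2 (creating the root with Seg(R) = ⌈L/2⌉) is assumed to have been done by the caller, and peer messaging, the actual prefetch copy and the bandwidth-based supplier selection of step 6 are omitted. The names are invented, and Seg(X) is computed with the seg() helper sketched earlier, repeated here so the class stands alone.

/** Balanced-binary-tree node used in the BBTP join sketch; every node keeps
 *  parent/left/right pointers plus its level k(X), Seg(X) and subtree heights. */
class BbtpJoinSketch {
    BbtpJoinSketch parent, left, right;
    int level = 1;        // k(X); the root is created at level 1
    long seg;             // Seg(X)
    int lHeight, rHeight;

    /** Steps 3 and 5: walk down choosing the shorter subtree and attach X as a leaf. */
    BbtpJoinSketch insert(long L) {
        BbtpJoinSketch current = this;
        while (true) {
            boolean goLeft = current.lHeight <= current.rHeight;   // step 3
            BbtpJoinSketch child = goLeft ? current.left : current.right;
            if (child != null) { current = child; continue; }
            BbtpJoinSketch x = new BbtpJoinSketch();
            x.parent = current;
            x.level = current.level + 1;
            x.seg = segOf(current.seg, x.level, L, goLeft);        // equations (2)-(4)
            if (goLeft) current.left = x; else current.right = x;
            x.propagateHeightChange();                             // step 4
            return x;
        }
    }

    /** Step 4: the "HeightChange" messages, modelled here as walking up to the root. */
    void propagateHeightChange() {
        BbtpJoinSketch child = this;
        for (BbtpJoinSketch p = parent; p != null; child = p, p = p.parent) {
            if (child == p.left) p.lHeight = child.height() + 1;
            else                 p.rHeight = child.height() + 1;
        }
    }

    int height() { return Math.max(lHeight, rHeight); }

    /** Seg(X) per equations (2)-(4), as in the earlier SegSketch; log assumed base 2. */
    static long segOf(long parentSeg, int level, long L, boolean isLeftChild) {
        int maxSplitLevel = (int) (Math.log(L) / Math.log(2)) + 1;
        if (level > maxSplitLevel) return parentSeg;               // equation (4)
        double offset = L / Math.pow(2, level);
        return isLeftChild ? (long) Math.floor(parentSeg - offset) // equation (2)
                           : (long) Math.ceil(parentSeg + offset); // equation (3)
    }
}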
3 Random Seek Support Procedure
A peer can find the peers who have the next requested segments, which are its right subtree nodes,
and can also find the peers who have the previous requested segments, which are its left subtree nodes. If
the client wants to jump to another video position which is not far away from the current one, it can
simply follow its left/right pointers to contact the new nodes. If the new position is too far away, the
following operations are performed:
1) Calculate the video segment serial number for the new jump position from the player interface.
Assuming the video segment serial number is g, the serial number of the prefetching unit which
includes segment g is ⌈g/d⌉, named M.
2) Traverse the balanced binary tree, starting from the root node R;
3) If M is less than Seg(R), the search pointer goes to R's left child, otherwise it goes to R's right
child. This operation is repeated until a node J is encountered for which Seg(J) equals M; J is the
target node of the search.
4) List the nodes of node J's right subtree by inorder traversal. These nodes send their prefetching
buffer content to the node that jumped to the new video position during playback. If consecutive nodes
have the same prefetching unit serial number, we always choose the node that has the maximal
usable bandwidth to supply the prefetching buffer content.
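Steps 1–4 amount to an ordinary binary-tree search keyed on Seg followed by an inorder walk. A hedged Java sketch is given below; the node fields are reduced to what the search needs and the names are illustrative.

import java.util.ArrayList;
import java.util.List;

/** Random seek: locate the node J whose prefetching unit contains video segment g,
 *  then list J and its right subtree in inorder to obtain the stream suppliers. */
public final class SeekSketch {
    static final class Node {
        long seg;        // Seg(X)
        Node left, right;
    }

    /** Step 1: the serial number M of the prefetching unit containing segment g, ceil(g/d). */
    static long unitOf(long g, long d) {
        return (g + d - 1) / d;
    }

    /** Steps 2-3: descend from the root, comparing M with Seg, until Seg(J) equals M. */
    static Node findTarget(Node root, long m) {
        Node j = root;
        while (j != null && j.seg != m) {
            j = (m < j.seg) ? j.left : j.right;
        }
        return j;        // the target node J, or null if no node holds that unit
    }

    /** Step 4: J followed by its right subtree in inorder (matching the 3, 14, 7, 15
     *  example discussed with Fig. 2 below). */
    static List<Node> suppliers(Node j) {
        List<Node> out = new ArrayList<>();
        if (j != null) {
            out.add(j);
            inorder(j.right, out);
        }
        return out;
    }

    private static void inorder(Node n, List<Node> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n);
        inorder(n.right, out);
    }
}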
Fig.2 Jump operations
In Fig. 2, assuming node 5 performs a jump during playback, the target video segment is found to fall in
node 3's prefetching buffer by searching BBTP's balanced binary tree. The nodes in the subtree of node 3
are then arranged by inorder traversal, that is 3, 14, 7, 15, which is exactly the continuous playback
stream that can be seen from Fig. 1, and those nodes send their prefetching buffer content to node 5.
Since the height of the balanced binary tree is O(log N), the costs of a join operation and of a random
seek are thus bounded by O(log N).
4 Performance Evaluation
4.1 Simulation setting
In this section, we evaluate the performance of BBTP in simulation. The source media server has
ten videos for streaming, each with 256 Kbps rate and 2-h length. The length of a segment (or a time
unit) is 1 s, and the prefetching buffer at a node can accommodate 720 segments, i.e., 10% of a video
stream.
The underlying network topology is generated using the GT-ITM package [9], which emulates the
hierarchical structure of the internet by composing interconnected transit and stub domains. The
network topology for the presented results consists of ten transit domains, each with twelve transit
nodes, and a transit node is then connected to six stub domains, each with nine stub nodes. The total
number of nodes is thus 6,600. We assume that each node represents a local area network with plenty
of bandwidth, and routing between two nodes in the network follows the shortest path. The initial
bandwidth assigned to the links is as follows: 1.5 Mbps between two stub nodes, 6 Mbps between a
stub node and a transit node, and 10 Mbps between two transit nodes. We will also inject cross traffic
in the experiments to emulate dynamic network conditions.
To mitigate randomness, each result presented in this section is the average over ten runs of an
experiment.
4.2 Performance and Comparison
(1) Random Seek
We evaluate the random seek performance of BBTP compared with P2VoD [10] and VMesh.
P2VoD organizes nodes into multi-level clusters according to their joining time, and the data stream is
forwarded along the overlay tree built among the peers. Each host receives data from a parent in its
upper cluster and forwards it to its children in its lower cluster. A new node tries to join the lowest
cluster or forms a new lowest cluster. If it fails to find an available parent in the tree and the server
has enough bandwidth, it connects directly to the server. In our experiment, we use Smallest Delay
Selection for P2VoD's parent selection process, and we set the system parameter K = 6 and the cache
size the same as BBTP's prefetching buffer. We build VMesh on top of a public Chord implementation
and set the length and bit rate of the video, and each segment length, the same as BBTP's.
[Figure: number of routing hops (0–60) versus number of nodes (1000–6000) for P2VoD, VMesh and BBTP.]
Fig 3 The cost for random seek
We simulate the random seek operation by searching for a video segment; the routing hop count is
taken as the cost of the search. As Figure 3 shows, the cost of random seek in P2VoD increases
almost linearly with the group size, while in VMesh and BBTP it only increases on a logarithmic scale.
In VMesh and BBTP, a new node can quickly locate nodes holding the first several segments through
DHT routing or by searching the balanced binary tree. The search time for VMesh and BBTP is O(log N),
where N is the number of nodes in the system. The latency is hence significantly reduced.
(2) Streaming quality
Playback continuity is critical for streaming applications. We adopt the Segment Missing Rate
(SMR) as the major criterion for evaluating streaming quality. A data segment is considered missing if
it is not available at a node by its play-out time, and the SMR for the whole system is the average
ratio of missed segments over all participating nodes during the simulation time. As such, it
reflects two important aspects of system performance, namely delay and capacity. For comparison,
we also simulate an existing on-demand overlay streaming system, oStream [11], with the same
network and buffer settings. oStream employs a pure tree structure, in which each node caches played-out
data and relays it to children with asynchronous playback times. A centralized directory server
is used to maintain the global information of the overlay and to facilitate node join and failure recovery.
Firstly, we investigate the performance of BBTP under dynamic network environments with
bandwidth fluctuations. To emulate bandwidth fluctuations, we decrease the bandwidth from 100% to
64% of the base setting. As Figure 4 shows, the SMR of BBTP, VMesh and oStream increases as the
bandwidth decreases. However, the rate of increase for VMesh and BBTP is generally lower than that
of oStream. VMesh is slightly lower than BBTP, but the difference is not significant.
Secondly, we investigate the performance of BBTP under random seek with varying seek rates.
For oStream, random seek can be implemented by letting the node leave the system and then re-join
with the new playback offset. We define the random seek rate as the ratio of the nodes at which a
random seek occurs to the total number of nodes in the system. As Fig. 5 shows, when random seeks
occur at 8% of the nodes in the system, the SMR of BBTP is less than 10%, the SMR of VMesh equals
10% and the SMR of oStream has reached 30%. So BBTP is an efficient interactive streaming service
architecture in a P2P environment.
[Figures: Segment Missing Rate (left: 0–0.8 versus bandwidth reduction of 0–36%; right: 0–0.4 versus percentage of random seek nodes, 0–8%) for oStream, VMesh and BBTP.]
Fig. 4 The impact of dynamic network
Fig. 5 The impact of random seek rate
5 Conclusion and Future Work
This paper proposed a distributed and Balanced Binary Tree-based Prefetching strategy (BBTP) to
support random seek for P2P on-demand streaming. Simulation and comparison show that BBTP is an
efficient interactive streaming service architecture in a P2P environment: it supports random seek
quickly and plays back smoothly under dynamic network conditions and frequent random seeks. Further
research on BBTP includes a recovery algorithm for the departure or failure of nodes.
References
[1]Hefeeda M, Bhargava B (2003, May) On-demand media streaming over the Internet. In: Proc. IEEE
FTDCS’03, San Juan, Puerto Rico
[2]Guo Y, Suh K, Kurose J, Towsley D (2003, May) P2Cast: peer-to-peer patching scheme for VoD
service. In: Proc. WWW’03, Budapest, Hungary.
[3]Do T, Hua KA, Tantaoui M (2004, June) P2VoD: providing fault tolerant video-on-demand
streaming in peer-to-peer environment. In: Proc. IEEE ICC’04, Paris, France.
[4]Sheng-Feng Ho, Jia-Shung Wang, “Streaming Video Chaining on Unstructured Peer-to-Peer
Networks”, master thesis, 2003.
[5]Zhang X, Liu J, Li B, Yum T-SP (2005, March) CoolStreaming/DONet: a data-driven overlay
network for live media streaming, to appear In: Proc. IEEE INFOCOM’05, Miami, FL,USA.
[6]Changxi Zheng, Guobin Shen, Shipeng Li, "Distributed prefetching scheme for random seek
support in peer-to-peer streaming applications", Proceedings of the ACM workshop on Advances in
peer-to-peer multimedia streaming (P2PMMS'05), November 2005.
[7]W.-P. Ken Yiu, Xing Jin, S.-H. Gary Chan, Distributed Storage to Support User Interactivity in
Peer-to-Peer Video Streaming, Communications, ICC '06, IEEE International Conference on, June
2006, page: 55-60.
[8]D. E. Knuth. The Art of Computer Programming,volume 3. Addison-Wesley Professional, 1998.
[9]Zegura E, Calvert K, Bhattacharjee S (1996, March) How to model an internetwork. In: Proc. IEEE
INFOCOMM, San Francisco, California, USA.
[10]T. T. Do, K. A. Hua, and M. A. Tantaoui, “P2VoD: Providing Fault Tolerant Video-on-Demand
Streaming in Peer-to-Peer Environment,” in Proceedings of IEEE International Conference on
Communications (ICC), Paris, France, jun 2004
[11]Cui Y, Li B, Nahrstedt K (2004, January) oStream: asynchronous streaming multicast. IEEE J Sel
Areas Commun 22:91–106.
[12] Liu Wei , ChunTung Chou, Cheng Wenqing, “Caching for Interactive Streaming Media”, Journal
of Computer Research and Development, 43(4):594~600,2006.
Parsing Student Text using Role and Reference
Grammar
Elizabeth Guest
Innovation North, Leeds Metropolitan University, Headingly Campus, Leeds
[email protected]
Abstract
Due to current trends in staff-student ratios, the assessment burden on staff will increase unless
either students are assessed less, or alternative approaches are used. Much research and effort has
been aimed at automated assessment but to date the most reliable method is to use variations of
multiple choice questions. However, it is hard and time consuming to design sets of questions that
foster deep learning. Although methods for assessing free text answers have been proposed, these
are not very reliable because they either involve pattern matching or the analysis of frequencies in
a “bag of words”.
In this paper, we present work for the first step towards automatic marking of free text answers
via meaning: parsing student work. Because not all students are good at writing grammatically
correct English, it is vital that any parsing algorithm can handle ungrammatical text. We therefore
present preliminary results of using a relatively new linguistic theory, Role and Reference
Grammar, to parse student texts and show that ungrammatical sentences can be parsed in a way
that will allow the meaning to be extracted and passed to the semantic framework.
Keywords: Role and Reference Grammar, Parsing, Templates, Chart Parsing
1 Introduction
In the current climate of increasing student numbers and decreased funding per student in many
HEIs internationally, it is necessary to find economies of scale in teaching and supporting
undergraduate students. Economies of scale are possible to a certain extent for lectures and
tutorials, but this is less possible for assessment. The main solution to this dilemma is to mark
student work automatically using variations on multiple choice questions. If designed correctly,
these kinds of tests can provide students with immediate feedback on how well they are doing and
can provide valuable formative pointers for further learning. This kind of feedback can impact
positively on student learning and retention [1] [2] [3], but it can be difficult to design if we want to
avoid encouraging inappropriate behaviour, such as random guessing of answers.
Considerable work has been undertaken in recent years to investigate and implement methods for
automatic marking of free text answers. These methods generally involve either pattern matching
[4] [5] or latent semantic analysis [6] [7], or a combination of these [8]. These methods work to a
certain extent, but because they are not based on the meaning of the text, they are quite easy to fool.
For instance, latent semantic analysis can be fooled by writing down the right kinds of words in any
order. The problem with current approaches to pattern matching on the other hand, is that if the
student writes down a correct answer in a different way, it will be marked wrong.
In this work we describe a method for using Role and Reference Grammar (RRG) to parse
student texts, which do not have to be grammatically correct. RRG [9] [10] is a relatively new
linguistic theory which is related to functional grammar. It separates the most vital parts of the
sentence from the modifiers, which means that the core meaning can be extracted first and then the
modifiers fitted in at a later stage. As long as the arguments and the verbs are in the correct order
for English, the sentence can be understood. If, for example, Chinese students omit the articles, the
sentence can still be parsed and the meaning extracted.
2 Parsing
The main constituents of RRG parsing are the use of parsing templates and the notion of the
CORE. A CORE consists of a predicate (generally a verb) and (normally) a number of arguments;
it must have a predicate. Everything else is built around one or more COREs. Simple sentences
contain a single CORE; complex sentences contain several COREs. Because RRG focuses on
COREs, the semantics is relatively easy to extract from a parse tree: you just have to
look for the PRED and ARG branches of the CORE to obtain the predicate (PRED) and the
arguments (ARG).
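To make that extraction step concrete, the hedged sketch below walks a generic labelled tree and gathers the predicate and arguments of each clause-level CORE. The node representation, the label strings and the decision to look inside the NUC branch for the PRED are assumptions for illustration, not the system's actual data structures.

import java.util.ArrayList;
import java.util.List;

/** Toy parse-tree node: a label (SENTENCE, CLAUSE, CORE, NUC, PRED, ARG, ...) plus
 *  children, with leaves carrying the surface words. */
class ParseNode {
    final String label;
    final String word;                        // non-null only for leaf nodes
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String label, String word) { this.label = label; this.word = word; }

    /** Collect, for every clause-level CORE, the text under its ARG branches and under
     *  the PRED inside its NUC branch. */
    static void extractCores(ParseNode n, List<String> predicates, List<String> arguments) {
        if (n.label.equals("CORE")) {                 // skip CORE-N etc. inside noun phrases
            for (ParseNode child : n.children) {
                if (child.label.startsWith("ARG")) arguments.add(child.text());
                if (child.label.startsWith("NUC")) {
                    for (ParseNode g : child.children) {
                        if (g.label.startsWith("PRED")) predicates.add(g.text());
                    }
                }
            }
        }
        for (ParseNode child : n.children) extractCores(child, predicates, arguments);
    }

    /** Concatenate the words of all leaves below this node. */
    String text() {
        if (children.isEmpty()) return word == null ? "" : word;
        StringBuilder sb = new StringBuilder();
        for (ParseNode child : children) {
            String t = child.text();
            if (!t.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
        }
        return sb.toString();
    }
}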
Examples of RRG parse trees of real student sentences are given in figure 1. Notice that in these
examples, the word “would” does not feature in the parse tree, but it is linked to the verbs
“recommend” and “provide”. This is because it is an operator. Similarly the adjectives
“representative” and “stratified” are attached to their nouns, “sample” and “sampling”. An
important feature of RRG from a parsing point of view is that parsing happens in two projections:
the constituent projection, shown in figure 1 and the operator projection, which consists of words
which modify other words (such as auxiliaries and adjectives). This is important because modifiers
are often optional and it simplifies the parsing process considerably if these can be handled
separately. Note that adverbs, which can modify larger constituents (such as COREs and
CLAUSEs) go in the constituent projection so that it is clear what they are modifying.
PERIPHERYs feature in both of these examples. In the second example, the PERIPHERY
modifies the CLAUSE to tell the reader what is believed. This is another useful feature of RRG to
enable meaning to be extracted easily. In the first example the PERIPHERY is attached to the
CORE. In RRG theory, this should really be attached to the 2nd argument because that is what it is
modifying. However, we need to analyse the meaning in order to find out what it should attach to.
So in this implementation of RRG parsing, we have made a design decision to attach such
structures to the CORE.
RRG makes extensive use of templates. These templates consist of whole trees and are thus harder
to use in a parsing algorithm than rules. The templates can easily be reduced to rules, but only at a
loss of much important information. The first example in figure 1 consists of one large template
that gives the overall structure and some simple templates (which are equivalent to rules) so that
elements such as NP and PP can be expanded. An NP is a noun phrase and in this theory consists of
a noun, pronoun, or question word. Templates are required to parse complex noun phrases, such as
those with embedded clauses. A PP is a prepositional phrase and consists of a preposition followed
by a NP. Clearly if we reduce the large template in the example in figure 1 to the rule
CLAUSE → NP1 V2 NP ADV/PP
a lot of the information inherent in the structure of the template is lost. A further feature of RRG is
that the branches of the templates do not have to have a fixed order and lines are allowed to cross.
The latter is important for languages such as German and Dutch where the adverb that makes up
the periphery normally occurs within the core. This feature will be important in our application for
marking work by students for whom English is not their first language.
The above features pose challenges for parsing according to the RRG paradigm. We have
overcome these challenges by making some additions to the standard chart parsing algorithm. The
main innovations are
• a modification to enable parsing with templates
• a modification to allow variable word order.
In addition, parsing also includes elements of dependency grammar to find operators and to
determine which word they belong to. At present the most popular methods of parsing are HPSG
[11-13] and dependency grammar [14-16]. HPSG is good for fixed word order languages and
dependency grammar is good for free word order languages. The approach to parsing described
below is novel in that it allows parsing with templates, and because of the range of word-order
flexibility allowed.
[Figure: two RRG constituent-projection parse trees for the student sentences "I would recommend stratified sampling in clusters." and "I believe this would provide a more representative sample."]
Figure 1: Example RRG parse trees.
2.1 Outline of the parsing algorithm
The parsing algorithm relies on correctly tagged text, for which we use Toolbox (available from
www.sil.org/computing/toolbox). There are three parts to the parsing algorithm:
1. Strip the operators. This part removes all words that modify other words. It is based on a
correct tagging of head and modifying words. This stage uses methods from dependency
grammar and the end result is a simplified sentence.
2. Parse the simplified sentence using templates. This is done by collapsing the templates
to rules, parsing using a chart parser and then rebuilding the trees at the end using a
complex manipulation of pointers. The chart parser has been modified to handle varying
degrees of word order flexibility.
3. Draw the resulting parse tree.
Details of the extensions to the chart parser are given below.
2.2 Parsing Templates
Templates are parsed by collapsing all the templates to rules and then re-building the correct parse
tree once parsing is complete. This is done by including the template tree in the rule, as well as the
left and right hand sides. When rules are combined during parsing, we make sure that the right
hand side elements of the instantiated rule, as represented in the partial parse tree, point to the
leaves of the appropriate rule template tree. This is especially important when the order of the
leaves of the template may have been changed. The reference number for the rule that has been
applied is also recorded so that it can be found quickly.
Modifying nodes, such as PERIPHERY, cause problems with rebuilding the tree. This is because
such nodes can occur anywhere within the template, including at the root and leaf levels. Also, if
we are dealing with a sub-rule whose root node in the parse tree has a modifying node, it is not
possible to tell whether this is a hang-over from the previous template, or part of the new template.
To solve this problem, modifying nodes have flags to say whether they have been considered or
not. There is a potential additional problem with repeated nested rules because if processing is done
in the wrong order, the pointers to the rule template tree get messed up. To overcome this problem,
each leaf of a template is dealt with before considering sub-rules.
2.3 Parsing with fixed, free, and constrained word order
There were two main problems to solve in order to modify the chart parser to handle varying
degrees of word order flexibility:
1. Working out a notation for denoting how the word order can be modified.
2. Working out a method of parsing using this notation.
(1) was achieved by the following notation on the ordering of the leaves of the template, treating
the template as a rule:
• Fixed word order: leave as it is.
• Free word order: insert commas between each element {N,V,N} (Note that case information is
included as an operator so that the undergoer and actor can be identified once parsing is
complete.)
• An element has to appear in a fixed position: use angular brackets: {N, <V>, ADV}. This means
that N and ADV can occur before or after V, but that V MUST occur in 2nd position. Note that
this is 2nd position counting constituents, not words.
• Other kinds of variation can be obtained via bracketing. For example {(N, V) CONJ (N, V)}
means that the N's and V's can change order, but that the CONJ must come between each
group. If we had {(N,V),CONJ,(N,V)} then the N's and V's must occur next to each other, but
each group does not have to be separated by the CONJ, which can occur at the start, in the
middle, or at the end, but which cannot break up an {N,V} group.
2.4 Modifications to the parsing algorithm.
Parsing was achieved via a structure that encoded all the possible orderings of a rule. So for
example the rule CORE→N, V, N would become
This means that N or V can occur in any position and N has to occur twice. The lines between the
boxes enable the “rule” to be updated as elements are found. Using this schema,
SENTENCE→(N,V) CONJ (N,V) would become
In this case, the CONJ in the middle is by itself because it has to occur in this position as the
grouping word order is fixed. The groupings of N’s and V’s show where the free word ordering can
occur.
To apply a rule, the first column of the left hand side of the rule is searched for the token. When the
token is found, any tokens that do not match are deleted along with the path that leads from them.
In the first example, after an N is found, we would be left with
And in the second example, after an N is found we would be left with
Note that in order for the rule to be satisfied, we must find a V and then a CONJ: there are no
options for position 2 once the element for position 1 has been established. In this way, we can
keep track of which elements of a rule have been found and which are still to be found. Changes in
ordering with respect to the template are catered for by making sure that all instantiated rules point
back to the appropriate leaves of the rule template, as described above.
The different possibilities for each rule are obtained via a breadth first search method that treats
tokens in brackets as blocks. Then the problem becomes one of working out the number of ways
that blocks of different sizes will fit into the number of slots in the rule.
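One way to realise this in code is to expand a rule into the set of orderings its notation allows and then discard orderings as constituents are recognised. The sketch below is illustrative Java, not the implementation described above, and it handles only the fully free-order case such as CORE → {N, V, N}; fixed positions and bracketed blocks would filter the initial set further.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Track which orderings of a free-word-order rule remain possible, pruning them as
 *  constituents are recognised left to right. */
public class OrderingSketch {
    private final Set<List<String>> possibleOrderings = new HashSet<>();
    private final int size;
    private int position = 0;   // number of right-hand-side slots matched so far

    public OrderingSketch(List<String> rhs) {
        this.size = rhs.size();
        permute(rhs, new ArrayList<>(), new boolean[rhs.size()]);
    }

    /** Enumerate the distinct orderings of the right-hand side. */
    private void permute(List<String> rhs, List<String> current, boolean[] used) {
        if (current.size() == rhs.size()) {
            possibleOrderings.add(new ArrayList<>(current));
            return;
        }
        for (int i = 0; i < rhs.size(); i++) {
            if (used[i]) continue;
            used[i] = true;
            current.add(rhs.get(i));
            permute(rhs, current, used);
            current.remove(current.size() - 1);
            used[i] = false;
        }
    }

    /** A constituent of this category was found in the next slot: keep only the orderings
     *  that expect it there. Returns false if the rule can no longer apply. */
    public boolean advance(String category) {
        possibleOrderings.removeIf(o -> !o.get(position).equals(category));
        if (possibleOrderings.isEmpty()) return false;
        position++;
        return true;
    }

    /** The rule is satisfied once every slot has been matched. */
    public boolean complete() {
        return !possibleOrderings.isEmpty() && position == size;
    }

    public static void main(String[] args) {
        // CORE -> {N, V, N}: after an N is found, either N or V may fill the next slot,
        // mirroring the worked example in the text.
        OrderingSketch core = new OrderingSketch(Arrays.asList("N", "V", "N"));
        System.out.println(core.advance("N"));   // true
        System.out.println(core.advance("V"));   // true
        System.out.println(core.advance("N"));   // true
        System.out.println(core.complete());     // true
    }
}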
3 Results
Preliminary results of applying these algorithms to student texts are very promising, but some
issues have been highlighted. The method parses relatively simple sentences correctly and the main
arguments and verbs are found. In addition, some very long and complicated sentences are parsed
correctly and many kinds of grammatical errors do not cause any problems.
An example of a correctly parsed sentence is “I would target main areas populated by students and
would attend the same place at different times and during the day.” The parse tree for this example
is given in figure 2. Note that the complex object “main areas populated by students” has been
parsed correctly and that the tree attaches the qualifying phrase to “area” so that it is clear what is
being qualified. An important source of ambiguity in English sentences is caused by prepositional
phrases and this is a main cause of multiple parses of a sentence. In this example, the phrases “at
different times” and “during the day” are placed together in the periphery of the CORE, although
arguably they should have a different structure. This is a design decision to limit the number of
parses. This kind of information needs semantic information to sort out what attaches to what. This
cannot be obtained purely from the syntax.
An example of an ungrammatical sentence that is correctly parsed is “Results from the observations
would be less bias if the sample again was not limit the students in the labs between 9:30 and 10:30
on a Thursday morning.” for which the parse tree is given in figure 3. This sentence parses
correctly because the affix that should be on “limit” is an operator and the correctness of the
operators is not checked during the parsing process. The word “bias” is labelled as a noun and gets
attached as the second argument to “would be”, although it should be “biased”, which would get it
labelled as an adjective. Despite these errors, the meaning of the sentence is clear and the parse will
enable the meaning to be deduced.
The sentence “Therefore, asking only the students present on a Thursday morning will exclude all
the students that either have no lessons or are not present” produces two parses: one correct and
one incorrect. The incorrect parse breaks up “Thursday morning” to give two clauses: (1) “Asking
only students present on a Thursday” and (2) “Morning will exclude all the students that either
have no lessons or are not present”
In the first clause, the subject is “asking only students”, the main verb is “present” and the object is
“on a Thursday morning”. This does not make sense, but it is syntactically correct as far as the
main constituents are concerned. Similarly, the second clause is also syntactically correct, although
it does not make sense. There are two ways of eliminating this parse. The first is to do a semantic
analysis; the second is to not allow two clauses juxtaposed next to each other without punctuation
such as a comma. However, students tend to not be very good at getting their punctuation correct.
The current implementation of the parsing algorithm ignores all punctuation other than full stops
for this reason. In fact, there is a tradeoff between allowing the system to parse ungrammatical
sentences and the number of parse trees produced. More flexibility in grammatical errors increases
the number of parse trees.
An issue that makes parsing problematic is that of adverbs. These tend to be allowed to occur
within several places within the core and some, such as yesterday, modify groups of words rather
than a single word. The best solution, given their relative freedom of placing and the fact that
sorting out where best to put them is more a meaning than a syntactic issue, would be to remove
them and work out where they belong once the main verb and arguments have been identified.
Most of the above issues have to be left to an analysis of meaning to sort out the correct parse.
There is no clear division between syntax and semantics. However there is another issue that has
been highlighted to do with grammar and punctuation. How tolerant of errors should the system
be? We have shown that errors in the operators do not cause problems for the parser, and errors in
the placing of adverbs are relatively easy to deal with, but errors in the main constituents are not
handled. For example the phrase “the main people you need to ask will not be in the labs so early
unless that have got work to hand in” occurs in one of the texts. The current algorithm will not
handle these kinds of mistakes. But should the system be able to handle these kinds of mistakes, or
should students be encouraged to improve their writing skills?
Figure 2: An example of a correctly parsed sentence. [RRG parse tree for "I would target main areas populated by students and would attend the same place at different times and during the day."]
Figure 3: An example of a correctly parsed ungrammatical sentence. [RRG parse tree for "Results from the observations would be less bias if the sample again was not limit the students in the labs between 9:30 and 10:30 on a Thursday morning."]
4 Conclusion
We argue that this approach, though still under development, potentially has huge benefits for students
and staff in higher education and could, with further improvements, form one building block in
constructing a new paradigm for CAA. Our intention is to use this as the first stage in a system that
uses a new semantic framework, ULM (Universal Lexical Metalanguage) [17], to compare the
meaning of student texts with a (single) model answer. ULM would enable us to convert text to a
meaning representation. The aim is to build up a meaning representation from several sentences and
then compare the meaning of the student text with the model answer – even when the words used are
not the same.
References
[1] Rust, C. (2002). The Impact of Assessment on Student Learning. Active Learning in Higher Education, 3(2): p. 145-158.
[2] Sambell, K. and A. Hubbard (2004). The Role of Formative 'Low Stakes' Assessment in Supporting Non-Traditional Students' Retention and Progression in Higher Education: Student Perspectives. Widening Participation and Lifelong Learning, 6(2): p. 25-36.
[3] Yorke, M. (2001). Formative Assessment and its Relevance to Retention. Higher Education Research and Development, 20(2): p. 115-126.
[4] Sukkarieh, J.Z., S.G. Pulman, and N. Raikes (2003). Auto-marking: using computational linguistics to score short, free text responses. In International Association of Educational Assessment, Manchester, UK.
[5] Sukkarieh, J.Z., S.G. Pulman, and N. Raikes (2004). Auto-Marking 2: An Update on the UCLES-Oxford University research into using Computational Linguistics to Score Short, Free Text Responses. In International Association of Educational Assessment, Philadelphia.
[6] Wiemer-Hastings, P. (2001). Rules for Syntax, Vectors for Semantics. Proceedings of the 22nd Annual Conference of the Cognitive Science Society.
[7] Landauer, T.K., et al. (1997). How well can Passage Meaning be Derived without using Word Order? A Comparison of Latent Semantic Analysis and Humans. Proceedings of the 19th Annual Conference of the Cognitive Science Society, p. 412-417.
[8] Pérez, D. and E. Alfonseca (2005). Adapting the Automatic Assessment of Free-Text Answers to the Students. In 9th Computer Assisted Assessment Conference, Loughborough, UK.
[9] Van Valin, R.D.J. and R. LaPolla (1997). Syntax: Structure, Meaning and Function. Cambridge: Cambridge University Press.
[10] Van Valin, R.D.J. (2005). Exploring the Syntax-Semantics Interface. Cambridge University Press.
[11] Hou, L. and N. Cercone (2001). Extracting Meaningful Semantic Information with EMATISE: an HPSG-Based Internet Search Engine Parser. IEEE International Conference on Systems, Man, and Cybernetics, 5: p. 2858-2866.
[12] Kešelj, V. (2001). Modular HPSG. IEEE International Conference on Systems, Man, and Cybernetics, 5: p. 2867-2872.
[13] Wahlster, W. (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Springer.
[14] Covington, M.A. (2003). A Free Word Order Dependency Parser in Prolog.
[15] Chung, H. and H.-C. Rim (2004). Unlexicalized Dependency Parser for Variable Word Order Languages based on Local Contextual Pattern. Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing (5th International Conference CICLING), 2945: p. 112-123.
[16] Holan, T. (2002). Dependency Analyser Configurable by Measures. Text, Speech and Dialogue, 5th International Conference TSD, 2002: p. 81-88.
[17] Guest, E. and R. Mairal Usón (2005). Lexical Representation Based on a Universal Metalanguage. RAEL, Revista Española de Lingüística Aplicada, 4: p. 125-173.
Parallel Distributed Neural Network Message Passing System
in Java
Stephen Sheridan 1
1 Institute of Technology Blanchardstown, Blanchardstown Rd. North, Dublin 15
[email protected]
Abstract
Many attempts have been made to parallelize artificial neural networks (ANNs) using a wide variety
of parallel hardware and software methods. In this work we employ a parallelized implementation
of the Backpropagation (BP) learning algorithm to optimize neural network weight values. A cluster
of heterogeneous workstations is used as a virtual parallel machine, which allows neural network
nodes to be distributed across several processing elements (PEs). Experimental results indicate that
only small speed-ups can be achieved when dealing with relatively small network topologies and that
communication costs are a significant factor in the parallelization of the BP algorithm.
Keywords: Backpropagation, Distributed, Parallel, Workstation Cluster
1 Introduction
Many attempts have been made to take advantage of the inherent parallel characteristics of Artificial
Neural Networks (ANNs) in order to speed up network training [1, 2, 3, 4]. Most attempts can be
categorised into algorithmic or heuristic approaches. Algorithmic approaches to parallelization focus
on splitting the training algorithm into blocks of code that can execute in parallel on an appropriate
parallel architecture. Heuristic approaches tend to focus on how the ANN behaves and on its architecture.
Heuristic attempts at parallelization tend to take a trial and error approach based on knowledge of the
ANN and of the target platform. In contrast, algorithmic attempts tend to take a more theoretic approach
to the parallelization process.
The focus of this paper will be to describe a heuristic parallel mapping for the Backpropagation
Neural Network (BP). The mapping described uses a cluster of workstations as the target platform and
implements a message passing system using Java and the User Datagram Protocol (UDP). This means
that network nodes on the same layer can compute in parallel. In effect, this mapping allows the BP
network to be split into vertical slices. Each slice of the network can reside on its own workstation
(processing element), thus allowing network nodes to compute in parallel.
2 Mapping BP onto a message passing architecture
Research into the BP training algorithm has revealed three possible parallel mappings, commonly referred to as training set parallelism, pipelining and node parallelism. Training set parallelism is where the network's training data is distributed among a number of processing elements, as described in the work carried out by King and Saratchandran [5]. In pipelining, the training data can be staggered between
each layer of the network as discussed by Mathia and Clark [6]. Node parallelism allows the network to
be distributed across a number of processing elements in vertical slices. For example, a fully connected
network with the topology 4, 7, 3 (4 input, 7 hidden, 3 output) might be split into three vertical slices as
shown in figure 1.
Figure 1: Possible vertical distribution of a 4, 7, 3 network
Of the three approaches outlined, node parallelism represents a fine-grained approach whereas pipelining and training set parallelism represent a more coarse-grained solution. The PDNN architecture described in section 3 was built to carry out node parallelism, although only small modifications would be
necessary to implement pipelining.
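As an illustration of node parallelism, and not the PDNN implementation itself, the sketch below assigns the nodes of each layer to PEs in round-robin fashion so that every PE ends up holding a vertical slice of the network; the actual grouping shown in Figure 1 may differ.

// Illustrative sketch (not the PDNN code): assign the nodes of each layer of a
// 4,7,3 network to processing elements in round-robin fashion, so that every
// PE holds a vertical slice of the network.
public class VerticalSlicer {

    /** nodesPerPe[pe][layer] = number of nodes of that layer hosted on that PE. */
    public static int[][] slice(int[] topology, int numPes) {
        int[][] nodesPerPe = new int[numPes][topology.length];
        for (int layer = 0; layer < topology.length; layer++) {
            for (int node = 0; node < topology[layer]; node++) {
                nodesPerPe[node % numPes][layer]++;   // round-robin assignment
            }
        }
        return nodesPerPe;
    }

    public static void main(String[] args) {
        int[][] slices = slice(new int[] {4, 7, 3}, 3);
        for (int pe = 0; pe < slices.length; pe++) {
            System.out.printf("PE%d hosts %d-%d-%d nodes%n",
                    pe, slices[pe][0], slices[pe][1], slices[pe][2]);
        }
        // Prints: PE0 hosts 2-3-1 nodes, PE1 hosts 1-2-1 nodes, PE2 hosts 1-2-1 nodes
    }
}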
3 Parallel Distributed Neural Network (PDNN) Architecture
The main goal of the PDNN architecture is to allow the processing that occurs during the training of a
backprop network to be distributed over a number of processing elements. As this architecture's target
platform is a cluster of workstations, the processing elements are a group of networked heterogeneous
workstations. The architecture was developed in Java so that the only software requirement on each
processing element is a Java virtual machine and the PDNN software. Figure 2 shows an overview of the
PDNN architecture and its components.
Figure 2: PDNN: architecture overview
3.1 Overview of Architecture
The PDNN architecture comprises a number of processing elements, an HTTP web server and a network monitor application. Each processing element executes a thread that listens on a specified port for incoming messages. When a processing element receives a message it passes it on to its node controller. The node controller's main responsibility is to act as a container for the network nodes on each processing element and to carry out computations on each node as specified by the message received. The network topology and training data are globally available from an HTTP server on the physical network. The network topology and training data are stored as text files on the HTTP server. Each processing element reads the network topology and training data from the HTTP server when it starts up. Therefore, the neural network topology and the training problem can be easily changed from a central location. An example topology file is shown in Table 1.
Entry   Description
4       Number training patterns
0.45    Learning rate
0.7     Momentum term
0.1     Error tolerance
3       Number of network layers
2       Size of input layer
2       Size of hidden layer
1       Size of output layer

Table 1: Topology file structure
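As an illustration of this startup step, the following sketch (not the actual PDNN code) fetches and parses a topology file laid out as in Table 1; the URL and class names are assumptions made for the example.

// Sketch only: fetch the topology file (laid out as in Table 1) from the
// central HTTP server and parse it. The URL and field names are assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class TopologyConfig {
    public int numPatterns;
    public double learningRate;
    public double momentum;
    public double errorTolerance;
    public int[] layerSizes;   // e.g. {2, 2, 1} for the example in Table 1

    public static TopologyConfig load(String url) throws Exception {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            TopologyConfig cfg = new TopologyConfig();
            cfg.numPatterns    = Integer.parseInt(in.readLine().trim());
            cfg.learningRate   = Double.parseDouble(in.readLine().trim());
            cfg.momentum       = Double.parseDouble(in.readLine().trim());
            cfg.errorTolerance = Double.parseDouble(in.readLine().trim());
            int numLayers      = Integer.parseInt(in.readLine().trim());
            cfg.layerSizes = new int[numLayers];
            for (int i = 0; i < numLayers; i++) {
                cfg.layerSizes[i] = Integer.parseInt(in.readLine().trim());
            }
            return cfg;
        }
    }
}

A PE could then size its local node containers from layerSizes before any training messages arrive, which is what allows the network definition to be changed in one central place.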
In order to use the PDNN architecture a network monitor program must be run on one of the processing elements. The network monitor allows the user to specify how the neural network is to be distributed
over the set of available processing elements. The distribution of network nodes depends on the problem
to be solved and the proposed neural network architecture, so at present this must be carried out manually. However, the network monitor has been developed in such a way as to make it easy to replace
this manual process with an appropriate load-balancing scheme or a genetic algorithm so that optimal
configurations can be achieved [7]. The network monitor is also responsible for making sure that all
the processing elements are synchronised. Synchronisation is very important because all processing elements must be in step. In other words, each processing element must conduct a forward and backward
pass with the same input pattern data. When a forward and backward pass has been completed with the
current input pattern data the network monitor informs all processing elements to move on to the next
input pattern.
Message ID   Description
1            Signals PE to create nodes
2            Signals PE to forward pass
3            Signals PE to backward pass
4-5          Signals PE to start training
6            Signals PE to remove all nodes
7            Signals PE to print out its nodes
11           Signals PE to return all weights to monitoring app
19           Signals PE that training has finished

Table 2: Backprop message overview
3.2 BackProp message protocol
In contrast to traditional software implementations of the backpropagation training algorithm that are encoded in a serial manner using loops, the PDNN architecture encodes the training algorithm in a set of messages that are broadcast across the physical network to a group of PEs running the PDNN software.
A backpropagation message protocol was implemented so that each PE could interpret the messages it
receives and process them accordingly. This protocol identifies a number of important features of the
backpropagation algorithm such as the forward and backward pass as well as defining special messages
that are used to synchronise training activity. Table 2 shows an overview of the backpropagation message
protocol.
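A minimal sketch of the kind of listener each PE might run is shown below. It assumes, purely for illustration, that the message ID is carried in the first byte of each UDP datagram; this is an assumption, not necessarily the actual PDNN wire format.

// Minimal sketch (not the actual PDNN wire format): each PE listens on a UDP
// port and dispatches on a message ID like those in Table 2. Here the ID is
// assumed to be the first byte of the datagram payload.
import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class PeListener implements Runnable {
    private final int port;

    public PeListener(int port) { this.port = port; }

    @Override
    public void run() {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            byte[] buf = new byte[8192];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);                 // blocks for the next message
                int msgId = packet.getData()[0] & 0xFF; // assumed: ID in first byte
                switch (msgId) {
                    case 1:  /* create nodes from the topology */      break;
                    case 2:  /* forward pass on local nodes */         break;
                    case 3:  /* backward pass on local nodes */        break;
                    case 6:  /* remove all nodes */                    break;
                    case 11: /* return weights to the NetMonitor */    break;
                    case 19: /* training finished */                   return;
                    default: /* ignore unknown IDs */                  break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}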
During training, each message received by a PE contains the data components for either a forward
or backward pass of the backprop algorithm. For example, the net input for any given output node will
require N messages to be broadcast, where N is the number of PE’s used. The net input for the layer
section PE(1,2) in figure 1 is given by
net_{PE(1,2)} = \sum_{i=0}^{i<N} MSG_{i,j} \cdot w_{i,j}    (1)

where N is the number of PEs and 0 \le j < \|PE(1,2)\|.

Each message contains data components equivalent to the individual net inputs of the nodes from which the message originated. Therefore, each MSG_{i,j} is equivalent to:

MSG_{i,j} = f\left( \sum_{k=0}^{k<N} n_{i,k} \cdot w_{i,k} \right)    (2)

where 0 \le i < NUMLAYERS, 0 \le k < size of layer i, and f(x) = \frac{1.0}{1.0 + e^{-x}}.
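For illustration, the sketch below (not the PDNN source) shows how a PE might evaluate its contribution in equation (2) over the nodes it hosts, and how the receiving PE could accumulate the broadcast contributions as in equation (1); the array-based layout and index handling are simplifications.

// Illustrative sketch of equations (1) and (2): each PE computes a partial,
// sigmoid-activated contribution from its local node outputs, and the
// receiving PE sums the contributions broadcast by all N PEs.
public class PartialNetInput {

    /** Logistic activation f(x) = 1 / (1 + e^-x), as used in equation (2). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Equation (2): one PE's contribution from its local node outputs n and weights w. */
    static double messageContribution(double[] n, double[] w) {
        double sum = 0.0;
        for (int k = 0; k < n.length; k++) {
            sum += n[k] * w[k];
        }
        return sigmoid(sum);
    }

    /** Equation (1): net input of a node, accumulated from the N broadcast messages. */
    static double netInput(double[] msg, double[] w) {
        double net = 0.0;
        for (int i = 0; i < msg.length; i++) {
            net += msg[i] * w[i];
        }
        return net;
    }
}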
4 Testing the PDNN architecture
Since the PDNN's main goal is to adjust the network weights in parallel, it does not have a built-in mechanism for verifying that the weights produced are valid. In order to ensure that the weights produced will actually work, the PDNN architecture signals all PEs to return their weights to the NetMonitor
application at the end of the training phase. The NetMonitor application then stores all weights in a file
that can be used with a serial version of the BP algorithm for verification. Since the overhead in using
neural networks is the training phase and not the recall phase, it makes sense to use the weights generated
by the PDNN architecture in a serial version of the BP algorithm. Three well known neural network data
sets, XOR[8, 9], 2D Spiral Recognition[10] and Iris[11] were used to test the weight adjustment scheme
in the PDNN architecture. Weights generated by the PDNN architecture were verified by running a
number of test training sessions for each of the data sets listed above and then using the weights generated
with a serial version of the BP algorithm. In all three cases the weights returned by the PDNN architecture
performed well when used in recall mode with a serial version of the BP algorithm.
5 Experimental test with the XOR problem
In this section we present some experimental data that was generated by running the PDNN architecture
on the XOR problem with a 2,N,1 topology, where N varied between 10 and 100. Networks with varying
middle layer sizes were used to determine speed-up times against a serial version of the BP algorithm
with the same network topology running on a single processing element. Each network topology was
distributed across 1, 2, 4, 6, 8 and 10 workstations in order to find the optimal distribution if any. Each
network was run a total of ten times on each workstation configuration in order to calculate the average
training time for that setup.
The physical environment for each experiment was set up using 10 Fujitsu Siemens 1GHz Intel-based Windows NT workstations with 512MB RAM and 100 MBit Ethernet Network Interface Cards. The
underlying network used was a 100 MBit switched Ethernet Network.
5.1 Results
As can be seen from the graph in figure 3, the training time for a network with 10 middle layer nodes
increased almost linearly as it was distributed over more processors. This is not really surprising given
the communication overheads associated with the BP algorithm. The graph shows a particularly large jump in
training times when moving from 1 PE to 2 PE’s. This is to be expected as there are no latency issues
when running all the network nodes on a single processor.
The situation for 20 middle layer nodes is not much better. Training times dramatically increase as
more and more PE’s are used. There seems to be an anomaly around 6 PE’s where the training time
peaks and then drops back down when 8 PE’s are used. This may be due to how the underlying physical
network deals with the broadcast messages from each PE. One other feature of adding more PE’s is
that the size of the messages being broadcast actually decreases as each PE has fewer and fewer nodes. Large numbers of small messages are bad news for parallelisation as there is a network latency associated with
each broadcast message.
There is a slight speed-up in training times when 50 middle layer nodes are distributed across 2 PE’s.
However, the speed-up does not continue as more PE’s are added. Once again, the anomalous situation
between 6 and 8 PE’s can be seen. The upper end of the graph for 20 middle layer nodes and this graph
are very similar. This would suggest that there is a point where the communication costs associated with the BP algorithm peak. This may represent a mix of conditions that lead to the worst-case scenario for
the underlying network.
The final set of experimental data using 100 middle layer nodes is slightly more promising. Two
speed-ups are achieved over the training times for 1 PE. Training times drop from around 156 seconds
on 1 PE to 140 seconds on 2 PE’s and then down to 137 on 4 PE’s. It is not surprising that two decreases
in the training times are observed for a BP network with 100 middle layer nodes. With the increased
number of nodes, each PE must carry out more work and hence there is a better balance between the time
spent processing and the time communicating.
Figure 3: Experimental data for XOR problem
6 Analysis of Experimental Data
This paper shows the reality of implementing the BP algorithm solving the XOR problem on a software
based message passing system such as the purpose built PDNN architecture. The experimental data
presented in section 5 confirms that the standard BP algorithm cannot take advantage of the parallel
processing power of a cluster of workstations. This is due to the fact that network traffic negates any
benefit that is to be gained by distributing the training phase over a number of workstations. Although a
workstation cluster may reduce the completion time of a system, the benefits depend on how the message
passing interfaces are designed. It is clear to see that without reducing the communications overhead of
the BP algorithm it is difficult to achieve any significant speed-up in training times. While execution
of neural networks on serial machines is linear, it would seem that when the BP algorithm is run in a message passing environment its completion time is non-linear.
While the experimental data generated is interesting from the point of view that it is the first set
of data generated by the PDNN architecture, it could not be used as a definitive argument against BP
on a message passing architecture. This is because the XOR problem is not an ideal candidate for
experimentation in a distributed parallel environment. A problem that requires a larger input and output
layer would be better suited to experimentation in these conditions. It is most likely that any speed-up
to be gained by the PDNN system will only be observed for large networks that can drain the processing
resources of a conventional PC.
It is obvious from the experimental data produced in section 5 that the communication cost versus
the time spent processing for the BP algorithm is far too high. This communication cost must be reduced
in order to observe any benefits.
7 Conclusion and Future Work
We have shown that it is possible to implement a purpose built message passing architecture in Java that
will allow the BP algorithm to distribute its training workload over a number of networked workstations.
We have also confirmed that the weights returned by the PDNN architecture are valid by using them in
the recall phase of a serial version of the BP algorithm solving a number of well known problems.
While the experimental data produced for this paper only serves as an initial test of the PDNN
architecture, it raises some very important questions, such as how to reduce the communication overhead
of BP and what types of neural network problems are suitable for experimentation. These questions will
form a major part of the next phase of this project, which is to refine the PDNN architecture with a view
to running further experiments in order to achieve a significant speed-up over serial implementations of
the BP algorithm.
Future work will include developing a modified version of the BP algorithm to cut communication
costs. It will also include researching and selecting other neural network training algorithms that may be better suited to distribution across a number of workstations, such as Differential Evolution, Spiking Neural Networks and Liquid State Machines. Further work will also need to take into consideration the exact measurement of
communication versus processing costs and will have to include metrics for network latency such as the
standard PingPong and Jacobi tests carried out by Wang and Blum [1].
References
[1] X. Wang and E. K. Blum, “Parallel execution of iterative computations on workstation clusters,”
Journal of Parallel and Distributed Computing, vol. 34, no. 0058, pp. 218–226, 1996.
[2] D. Anguita, S. Rovetta, M. Scapolla, and R. Zunino, “Neural network simulation with pvm,” 1994.
[3] A. Weitzenfeld, O. Peguero, and S. Gutiérrez, “NSL/ASL: Distributed simulation of modular neural
networks,” in MICAI, pp. 326–337, 2000.
[4] J. Lut, D. Goldman, M. Yang, and N. Bourbakis, “High-performance neural network training on a
computational cluster,” in Seventh International Conference on High Performance Computing and
Grid Computing (HPC Asia’04), 2004.
[5] F. King and P. Saratchandran, “Analysis of training set parallelism for backpropagation neural networks,” Int J Neural Syst, vol. 6, no. 1, pp. 61–78, 1995.
[6] K. Mathia and J. Clark, “On neural hardware and programming paradigms,” in International Joint
Conference on Neural Networks, pp. 12–17, 2002.
[7] S. W. Stepniewski and A. J. Keane, “Topology design of feedforward neural networks by genetic algorithms,” in Parallel Problem Solving from Nature – PPSN IV (H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, eds.), (Berlin), pp. 771–780, Springer, 1996.
[8] R. Bland, “Learning XOR: exploring the space of a classic problem,” Computing Science Technical
Report CSM-148, University of Stirling, Dept of Computing Science and Mathematics, Department
of Computing Science and Mathematics University of Stirling Stirling FK9 4LA Scotland, June
1998.
[9] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 1986.
[10] S. Singh, “2D spiral recognition with possibilistic measures,” Pattern Recognition Letters, vol. 19, no. 2, pp. 141–147, 1998.
[11] D. J. Newman, S. Hettich, C. Blake and C. Merz, “UCI repository of machine learning databases,” 1998.
Session 5a
Wired & Wireless
The Effects of Contention among Stations on Video Streaming Applications over Wireless Local Area Networks - An Experimental Approach
Nicola Cranley, Tanmoy Debnath, Mark Davis
Communications Network Research Institute,
School of Electronic and Communications Engineering,
Dublin Institute of Technology,
Dublin 8, Ireland
[email protected], [email protected], [email protected]
Abstract
Multimedia streaming applications have a large impact on the resource requirements of the WLAN.
There are many variables involved in video streaming, such as the video content being streamed, how
the video is encoded and how it is sent. This makes the role of radio resource management and the
provision of QoS guarantees extremely difficult. For video streaming applications, packet loss and
packets dropped due to excessive delay are the primary factors that affect the received video quality.
In this paper, we experimentally analyse the effects of contention on the performance of video
streaming applications with a given delay constraint over IEEE 802.11 WLANs. We show that as
contention levels increase, the frame transmission delay increases significantly despite the total
offered load in the network remaining constant. We provide an analysis that demonstrates the combined effect that contention and the playout delay constraint have on the video frame transmission delay.
Keywords: Video Streaming, Multimedia, WLAN, Quality of Service
1. INTRODUCTION
Streaming multimedia over wireless networks is becoming an increasingly important service [1] [2].
This trend includes the deployment of WLANs that enable users to access various services including
those that distribute rich media content anywhere, anytime, and from any device. There are many
performance-related issues associated with the delivery of time-sensitive multimedia content using
current IEEE 802.11 WLAN standards. Among the most significant are low delivery rates, high error
rates, contention between stations for access to the medium, back-off mechanisms, collisions, signal
attenuation with distance, signal interference, etc. Multimedia applications, in particular, impose
onerous resource requirements on bandwidth constrained WLAN networks. Moreover, it is difficult to
provide QoS in WLAN networks as the capacity of the network also varies with the offered load.
Packet loss and packets dropped due to excessive delay are the primary factors that have a negative
effect on the received video quality. Real-time multimedia is particularly sensitive to delay, as
multimedia packets require a strict bounded end-to-end delay. Every multimedia packet must arrive at
the client before its playout time, with enough time to decode and display the contents of the packet. If
the multimedia packet does not arrive on time, the playout process will pause and the packet is
effectively lost. In a WLAN network, in addition to the propagation delay over the air interface, there
are additional sources of delay such as queuing delays in the Access Point (AP), i.e. the time required
by the AP to gain access to the medium and to successfully transmit the packet which may require a
number of retransmission attempts.
Multimedia applications typically impose an upper limit on the tolerable packet loss. Specifically, the
packet loss ratio is required to be kept below a threshold to achieve acceptable visual quality. For
example, a large packet loss ratio can result from network congestion causing severe degradation of
multimedia quality. Even though WLAN networks allow for packet retransmissions in the event of an
unsuccessful transmission attempt, the retransmitted packet must arrive before its playout time or
within a specified delay constraint. If the packet arrives too late for its playout time, the packet is
effectively lost. Congestion at the AP often results in queue overflow, which results in packets being
dropped from the queue. In this way, packet loss and delay can exhibit temporal dependency or
burstiness [3]. Although error-resilient encoded video and systems that include error concealment
techniques allow a certain degree of loss tolerance [4], the ability of these schemes to conceal bursty
and high loss rates is limited.
In IEEE 802.11b WLANs, the AP is usually the critical component that determines the performance of
the network as it carries all of the downlink transmissions to wireless clients and is usually where
congestion is most likely to occur. There are two primary sources of congestion in WLAN networks.
The first is where the AP becomes saturated due to a heavy downlink load which results in packets
being dropped from its transmission buffer and manifests itself as bursty losses and increased delays
[5]. In contrast, the second case is where there are a large number of wireless stations contending for
access to the medium and this results in an increased number of deferrals, retransmissions and
collisions on the WLAN medium. The impact of this manifests itself as significantly increased packet
delays and loss. For video streaming applications, this increased delay results in a greater number of
packets arriving at the player too late for playout and being effectively lost. In this paper, we
experimentally investigate this second case concerning the effects of station contention on the
performance of video streaming applications.
The remainder of this paper is structured as follows. Section 2 provides an analysis of the video clips
used during the experiments. Sections 2.1 and 2.2 describe the experimental test bed and experimental results respectively. We focus on a single video content type and show in detail how the delay and loss rates are affected by increased station contention. We show the effects of contention on the performance of the video streaming application for a number of different video content types. We provide an analysis that shows how the playout delay constraint and the number of contending stations affect the video frame transmission delay. Finally, we present some conclusions and directions
for future work in section 3.
2. VIDEO CONTENT PREPARATION AND ANALYSIS
In the experiments reported here, the video content was encoded using the commercially available
X4Live MPEG-4 encoder from Dicas. This video content is approximately 10 minutes in duration and
was encoded as MPEG-4 SP with a frame rate of 25 fps, a refresh rate of one I-frame every 10 frames,
CIF resolution and a target CBR bit-rate of 1Mbps using 2-pass encoding. Although a target bit rate is
specified, it is not always possible for an encoder to achieve this rate. Five different video content
clips were used during the experiments. DH is an extract from the film ‘Die Hard’, DS is an extract
from the film ‘Don’t Say a Word’, EL is an extract from the animation film ‘The Road to Eldorado’,
FM is an extract from the film ‘Family Man’, and finally JR is an extract from the film ‘Jurassic Park’.
The video clips were prepared for streaming by creating an associated hint track using MP4Creator
from MPEG4IP. The hint track tells the server how to optimally packetise a specific amount of media
data. The hint track MTU setting means that the packet size will not exceed the MTU size.
It is necessary to repeat the experiments for a number of different video content types since the
characteristics of the streamed video have a direct impact on its performance in the network. Each
video clip has its own unique signature of scene changes and transitions which affect the time varying
bitrate of the video stream. Animated videos are particularly challenging for encoders since they
generally consist of line art and as such have greater spatial detail.
TABLE 1 CHARACTERISTICS OF ENCODED VIDEO CLIPS

Clip   Mean Packet   Mean Bit      Frame Size (B)      I-Frame Size (B)    P-Frame Size (B)    Peak-to-Mean
       Size (B)      Rate (kbps)   Max.      Avg.      Max.      Avg.      Max.      Avg.      Ratio
DH     889           910           16762     4617      16762     7019      12783     812       3.63
DS     861           682           12734     3480      12734     6386      10600     713       3.66
EL     909           1199          27517     6058      27517     14082     14632     1587      4.54
FM     894           965           17449     4903      17449     10633     15078     1188      3.56
JR     903           1081          17299     5481      17299     8991      13279     1006      3.16
Table 1 summarizes the characteristics of the encoded video clips used during the experiments. The
second column shows the mean packet size of the clip as it is streamed over the network and the third
column shows the mean bit-rate of the video clip. The following columns show the maximum video
frame size and the mean video frame size in bytes as measured over all frames, over I-frames only and
P-frames only. Finally, the last column shows the peak-to-mean ratio of the video frames. It can be
seen that despite encoding the video clips with the same video encoding parameters, the video clips have very different characteristics. Although all the clips were prepared with exactly the same encoding configuration, the mean and maximum I- and P-frame sizes vary considerably because of the differing content of the clips.
2.1 EXPERIMENTAL TEST BED
Fig. 1: Experimental Setup
To demonstrate the effects of station contention on video streaming applications, the video server was
set up on the wired network and streamed the video content to a wireless client via the AP (Figure 1).
The video streaming system consists of the Darwin Streaming Server (DSS) [6] acting as the video
server and VideoLAN Client (VLC) [7] as the video client. DSS is an open-source, standards-based
streaming server that is compliant to MPEG-4 standard profiles, ISMA streaming standards and all
IETF protocols. The DSS streaming server system is a client-server architecture where both client and
server consist of the RTP/UDP/IP stack with RTCP/UDP/IP to relay feedback messages between the
client and server. The video client VLC allowed the received video stream to be recorded to a file for
subsequent video quality analysis. Both the video client and server were configured with the packet
monitoring tool WinDump [8] and the clocks of both the client and server are synchronised before
each test using NetTime [9]. However, in spite of the initial clock synchronisation, there was a
noticeable clock skew observed in the delay measurements and this was subsequently removed using
Paxson’s algorithm as described in [10]. The delay is measured here as the difference between the
time at which the packet was received at the link-layer of the client and the time it was transmitted at
the link-layer of the sender.
There are a number of wireless background load stations contending for access to the WLAN medium
with their traffic load directed towards a sink station on the wired network. The background uplink
traffic was generated using Distributed Internet Traffic Generator (D-ITG) [11]. The background
traffic load had an exponentially distributed inter-packet time and an exponentially distributed packet
size with a mean packet size of 1024B. To maintain a constant total background load of 6 Mbps, the
mean rate of each background station was appropriately decreased as the number of background
stations was increased.
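As a small worked example of this load configuration (the numbers follow the description above; the sampling step at the end is illustrative only), the per-station rate and mean inter-packet time follow directly from the aggregate load and the mean packet size:

// Worked example of the background-load configuration: a fixed 6 Mbps
// aggregate is divided evenly among N background stations, each sending
// exponentially distributed packet sizes (mean 1024 B) with exponentially
// distributed inter-packet times. The sampling shown here is illustrative.
import java.util.Random;

public class BackgroundLoad {
    public static void main(String[] args) {
        double totalLoadBps = 6_000_000.0;       // constant aggregate background load
        int numStations = 10;                    // varied between experiments
        double meanPacketBits = 1024 * 8;        // mean packet size of 1024 B

        double perStationBps = totalLoadBps / numStations;          // 600 kbps each
        double meanInterPacketSec = meanPacketBits / perStationBps; // about 13.65 ms

        // One exponentially distributed inter-packet time sample (illustrative).
        Random rng = new Random();
        double sampleSec = -meanInterPacketSec * Math.log(1.0 - rng.nextDouble());

        System.out.printf("Per-station rate: %.0f bps, mean inter-packet time: %.2f ms%n",
                perStationBps, meanInterPacketSec * 1000);
        System.out.printf("Sampled inter-packet time: %.2f ms%n", sampleSec * 1000);
    }
}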
2.2 RESULTS
Video streaming is often described as “bursty” and this can be attributed to the frame-based nature of
video. Video frames are transmitted with a particular frame rate. For example, video with a frame rate
of 25 fps will result in a frame being transmitted every 40ms. In general, video frames are large, often
exceeding the MTU of the network, which results in several packets being transmitted in a burst for
each video frame. The frequency of these bursts corresponds to the frame rate of the video [12].
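A small worked example of this burst structure follows, assuming for illustration a usable payload of roughly 1400 bytes per packet (this payload figure is an assumption, not a value reported in the paper):

// Worked example of the burst structure described above. The 1400-byte
// payload-per-packet figure is an assumption for illustration only.
public class BurstExample {
    public static void main(String[] args) {
        double frameRate = 25.0;                   // frames per second
        double interFrameMs = 1000.0 / frameRate;  // one burst every 40 ms
        int frameSizeBytes = 16762;                // max I-frame of clip DH (Table 1)
        int payloadPerPacket = 1400;               // assumed usable payload per packet
        int packetsPerBurst =
                (int) Math.ceil((double) frameSizeBytes / payloadPerPacket); // 12 packets
        System.out.printf("One burst of %d packets every %.0f ms%n",
                packetsPerBurst, interFrameMs);
    }
}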
In a WLAN environment, the bursty behaviour of video traffic has been shown to result in a sawtooth-like delay characteristic [13]. Consider a burst of packets corresponding to a video frame
arriving at the AP. The arrival rate of the burst of packets is high and typically these packets are
queued consecutively in the AP’s transmission buffer. For each packet in the queue, the AP must gain
access to the medium by deferring to a busy medium and decrementing its MAC back-off counter
between packet transmissions. This process occurs for each packet in the queue at the AP causing the
delay to vary with a sawtooth characteristic. It was found that the duration and height of the sawtooth
delay characteristic depends on the number of packets in the burst and the packet size. This is to be
expected since when there are more packets in the burst, it takes the AP longer to transmit all packets
relating to this video frame.
To describe this sawtooth characteristic we have defined the Inter-Packet Delay (IPD) as the
difference in the measured delay between consecutive packets within a burst for a video frame at the
receiver. When there are no other stations contending for access to the medium, the IPD is in the range 0.9ms to 1.6ms for 1024B sized packets. This delay range includes the DIFS and SIFS intervals, the data transmission time including the MAC Acknowledgement, as well as the randomly chosen Backoff Counter values of the 802.11 MAC mechanism's contention window in the range 0-31. This can be seen in Figure 2 where there is an upper plateau with 32 spikes corresponding to each of the possible 32 Backoff Counter values, with a secondary lower plateau that corresponds to the proportion of packets that were required to be retransmitted through subsequent doubling of the contention window under the binary exponential backoff mechanism employed in the 802.11 MAC.
Fig. 2: PDF of the IPD with and without contention
Fig. 3: IPD and FTD Relationship
As contention levels increase, all stations must pause decrementing their Backoff Counter more often
when another station is transmitting on the medium. As the level of contention increases, it takes
longer to win a transmission opportunity and consequently the maximum achievable service rate is
reduced which increases the probability of buffer overflow. In these experiments, the nature of the
arrivals into the buffer remains constant, i.e. only the video stream is filling the AP’s transmission
buffer with packets, but by varying the number of contending stations we can affect the service rate of
the buffer and thereby its ability to manage the burstiness of the video stream. This can be seen in
Figure 2 where there is a long tail in the distribution of IPD values for the 10 station case. In this case,
10 wireless background traffic stations are transmitting packets to the wired network via the AP’s
receiver. The aggregate load from these stations is held constant as the number of background stations
is increased.
For video streaming applications, not only is the end-to-end delay important, but also the delay
incurred transmitting the entire video frame from the sender to the client. A video frame cannot be
decoded or played out at the client until all of the constituent video packets for the frame are received
correctly and on time. For this reason, in our analysis we also consider the video Frame Transmission
Delay (FTD), i.e. the end-to-end delay incurred in transmitting the entire video frame. The FTD is related to the number of packets required to transmit the entire video frame and to the queuing delay in the AP buffer for the first video packet in the burst to reach the head of the queue. Figure 3 shows the
relationship between the IPD and FTD for two consecutive video frames. In our analysis, we also
consider the loss rate and the Playable Frame Rate (PFR). The PFR is inferred by using the statistical
techniques described in [14]. The loss rate corresponds to packets that have failed to be successfully
received as well as those packets that have been dropped as a result of exceeding the Delay Constraint
(Dc). If packets arrive too late exceeding Dc, these packets are effectively dropped by the player since
they have arrived too late to be played out.
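The sketch below illustrates one plausible way of computing these per-frame metrics from per-packet link-layer timestamps grouped by video frame; the data layout and the exact reading of the definitions are assumptions made for illustration, not the authors' measurement code.

// Sketch of the delay metrics defined above, computed from per-packet
// send/receive timestamps (in ms) for one video frame. The array layout is an
// assumption made for illustration.
public class FrameDelayMetrics {

    /** Inter-Packet Delay: difference in measured delay between consecutive packets. */
    static double[] interPacketDelays(double[] sendMs, double[] recvMs) {
        double[] ipd = new double[sendMs.length - 1];
        for (int i = 1; i < sendMs.length; i++) {
            double prevDelay = recvMs[i - 1] - sendMs[i - 1];
            double currDelay = recvMs[i] - sendMs[i];
            ipd[i - 1] = currDelay - prevDelay;
        }
        return ipd;
    }

    /** Frame Transmission Delay: time from sending the first packet of the
     *  frame until the last packet of the frame is received. */
    static double frameTransmissionDelay(double[] sendMs, double[] recvMs) {
        return recvMs[recvMs.length - 1] - sendMs[0];
    }

    /** A frame is effectively lost if its FTD exceeds the delay constraint Dc. */
    static boolean lateForPlayout(double[] sendMs, double[] recvMs, double dcMs) {
        return frameTransmissionDelay(sendMs, recvMs) > dcMs;
    }
}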
2.2.1 THE EFFECTS OF CONTENTION ON STREAMED VIDEO
In this section, we experimentally demonstrate the effects of contention on video streaming
applications. We shall begin by focusing on a single video clip DH being streamed from the wired
network via the AP to a wireless client. This particular clip was chosen since it is representative of a
typical non-synthetic video stream. Table 2 presents the mean performance values for the video clip
DH over the test period with increased contention. It can be seen that the mean delay, loss rate, FTD
and IPD increase with increased contention. In this work we have set the Dc to 500ms which is the
delay constraint for low latency real-time interactive video.
TABLE 2 MEAN PERFORMANCE VALUES FOR DH CLIP WITH INCREASED CONTENTION (Dc = 500ms)

Performance Metric            0STA    3STA    4STA    5STA    6STA    7STA     8STA     9STA     10STA
Mean Delay (ms)               10.43   29.62   30.97   37.91   63.63   105.75   174.91   311.71   395.27
FTD (ms)                      11.50   36.62   37.96   45.39   71.76   115.61   186.05   325.01   406.83
IPD (ms)                      1.24    3.73    3.75    3.97    4.34    4.82     5.27     5.66     5.95
Mean Loss Rate (Dc > 500ms)   0.00    0.01    0.01    0.03    0.08    0.15     0.23     0.34     0.41
PFR (fps) (Dc > 500ms)        25.00   25.00   23.00   21.83   19.04   16.91    14.02    10.51    9.92

Fig. 4: Mean values for a number of video clips for a fixed total offered uplink load with increased number of contributing stations: (a) Mean Delay, (b) Mean FTD, (c) Average loss rate with a Dc of 500ms, (d) Inferred PFR with a Dc of 500ms.
It can be seen that when there are no background contending stations, the mean packet delay is about
10ms. As the number of contending stations increases from 3 to 7 to 10, the mean delay increases to
approximately 30ms, 100ms and 400ms respectively. This can be explained by the growing tail of the IPD distribution as shown in Figure 2. As the number of contending stations is increased from 3 to 7 to 10 stations with a Dc of 500ms, the mean loss rate including packets dropped due to excessive delay is
increased from 1% to 15% to 41% respectively. This in turn affects the ability of the codec to decode
the video frames since there is increased likelihood that packets will not arrive within the given delay
constraint.
The experiment was repeated for the other video clips all encoded with the same encoding
configuration but having different content complexity characteristics. Figure 4 shows the mean
performance metrics for different content types with increased contention. For all content types, it can
be seen that the mean packet delay and FTD increases with increased contention as shown in Figures
4(a) and 4(b). Figure 4(c) shows the mean loss rate over the test period for each of the video clips. It can be seen that, when a delay constraint of 500 ms is imposed on the system, there is a dramatic increase in the mean loss rate once the number of contending stations exceeds 7, so the contention has an even greater impact on performance. Figure 4(d) shows the PFR
that is statistically inferred from the packet loss and delay. Apart from the impact of contention,
Figures 4(a)-4(d) also highlight the impact of the video content where it can be seen that the animation
clip EL is the most severely affected by increased contention whilst the clip DS is the least affected.
The high complexity of the animation clip EL is due to frequent scene cuts and line art within the
scene that affects the burstiness of the encoded video sequence since much more information is
required to encode the increased scene complexity.
2.2.2 ANALYSIS
In this section we shall generalize the results presented in the previous section to account for all
content types and a given delay constraint. For video streaming applications, there is a tradeoff
between acceptable delay and tolerable packet loss. A delay constraint imposes an upper limit on this
tradeoff since the lower the delay constraint, the greater the probability of packets being dropped due
to exceeding the delay constraint.
Fig. 5: Generalized distribution of the FTD with increased contention
Fig. 6: Fitted Weibull distribution to CDF of FTD with Dc
TABLE 3: CDF OF FTD BELOW THE PLAYOUT DELAY CONSTRAINT, Dc

           Number of Contending Stations
Dc (ms)    3STA     4STA     5STA     6STA     7STA     8STA     9STA     10STA
500        1.000    1.000    0.994    0.984    0.957    0.877    0.740    0.653
1000       1.000    1.000    0.996    0.998    0.986    0.942    0.832    0.752
1500       1.000    1.000    1.000    1.000    0.994    0.971    0.903    0.836
2000       1.000    1.000    1.000    1.000    1.000    0.995    0.980    0.945
2500       1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
In our analysis we focus on the FTD since all or most of the packets belonging to a video frame packet
burst must be received in order for the video frame to be decoded on the client device.
Figure 5 shows the Complementary Cumulative Distribution Function (CCDF) of the FTD averaged
over all content types with an increasing number of contending stations. For example, consider a video streaming application with a Dc of 500ms: it can be seen that with 4 contending background stations,
the FTD is always less than 500ms. However with 6, 8, and 10 background contending stations,
statistically 2%, 12% and 35% of video frames will have an FTD that exceeds a Dc of 500ms. The
statistical distribution of the FTD has been summarized in Table 3 which presents the CDF of the FTD
for different values of Dc and with an increased number of contending stations. It can be seen that
when there are 10 contending stations, with a Dc of 500ms 65% of video frames will arrive within this
upper delay bound whereas 95% of video frames will arrive within a Dc of 2000ms. Figure 6 shows a
plot of the fitted Weibull distribution to the probability of the FTD arriving within a given Dc with an
increased number of contending stations. The Weibull distribution fit had a correlation coefficient of
over 99.5% in all cases. The shape and scale parameters are related to the number of contending
stations and the delay constraint of the video. This distribution can be used to provide statistical FTD
guarantees by a resource management system to perform admission control to assess the impact of
station association on the video streaming applications. Furthermore, adaptive streaming systems can
use the statistical characterization of FTD to adaptively dimension the playout buffer on the client
device, or to adapt the number of packets per video frame (i.e. the bitrate of the video stream) based on current contention load conditions, since reducing the number of packets per video frame reduces the FTD.
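As an illustration of how such a model could be used (the shape and scale values below are placeholders, not the parameters fitted in this paper, which would come from the per-station-count fits of Figure 6), an admission check reduces to evaluating the Weibull CDF at the delay constraint:

// Sketch of an admission check based on the fitted Weibull model. The
// shape/scale values below are placeholders, not the fitted parameters
// reported in this paper.
public class FtdAdmissionControl {

    /** Weibull CDF: P(FTD < dc) = 1 - exp(-(dc/scale)^shape). */
    static double probFtdWithin(double dcMs, double shape, double scale) {
        return 1.0 - Math.exp(-Math.pow(dcMs / scale, shape));
    }

    /** Admit the stream only if the modelled probability of meeting Dc with the
     *  candidate number of contending stations reaches the target. */
    static boolean admit(double dcMs, double shape, double scale, double target) {
        return probFtdWithin(dcMs, shape, scale) >= target;
    }

    public static void main(String[] args) {
        double shape = 1.2, scale = 800.0;   // placeholder fitted parameters
        System.out.println(admit(500.0, shape, scale, 0.95));
    }
}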
3. CONCLUSIONS
In this paper, we have experimentally investigated the effects of station contention on streaming video
over IEEE 802.11b WLAN networks. Video is a frame-based medium where video frames are transmitted from the server to the client at regular intervals related to the frame rate of the
video. In general, several packets are required to transmit the video frame. The video frame cannot be
decoded at the client until all the packets for the video frame have been received. In this way, loss and
delay have a serious impact on the performance of video streaming applications. Loss can occur due to
packets reaching their retransmission limit following repeated unsuccessful attempts and packets that
are dropped due to incurring excessive delays resulting in them arriving too late to be decoded.
Through experimental work, we have demonstrated that as the number of contending stations
increases, while maintaining a constant total offered load, the video streaming application
experiences increased delays. These delays are due to the 802.11b MAC mechanism where stations
must contend for access to the medium. As the number of stations contending for access to the
medium increases, the AP must defer decrementing the Backoff Counter while another station is
transmitting on the medium. Experimental results show that the performance degrades with increased
contention despite the offered load in the network remaining the same. Furthermore we have shown
that the complexity of the video content affects the degree of performance degradation. In our analysis
we focused on the Frame Transmission Delay (FTD) which is the delay incurred transmitting the
entire video frame from the server to the client. The FTD is important for video streaming applications
since a video frame cannot be correctly decoded at the client until all of the packets relating to the
video frame have been received within a given delay constraint. The delay constraint imposes an upper
bound delay threshold for the video frames. Packets that exceed this delay constraint are effectively
lost since they have not been received at the client in time for play out. We statistically analysed the
results to determine the probability of the FTD being within a given delay constraint and have shown
that this can be modeled as a Weibull distribution. This analysis can be used as part of a WLAN access
control scheme or used in a cross-layer contention-aware video playout buffering algorithm. The QoS
capabilities of the IEEE 802.11e QoS MAC Enhancement standard [15] facilitate new management mechanisms by allowing for traffic differentiation and prioritization. Work with the 802.11e standard is ongoing [16] [17]. Further work is required to investigate the benefits for video streaming afforded by
this standard.
ACKNOWLEDGEMENT
The support of the Science Foundation Ireland, grant 03/IN3/1396, under the National Development
Plan is gratefully acknowledged.
REFERENCES
[1] J. Wexler, “2006 Wireless LAN State-of-the-Market Report”, Webtorials, July 31, 2006, [Online].
Available: http://www.webtorials.com/abstracts/WLAN2006.htm
[2] Insight Research Corp., “Streaming Media, IP TV, and Broadband Transport:
Telecommunications Carriers and Entertainment Services 2006-2011”, Insight Research
Corp., April 2006, [Online]. Available: http://www.insight-corp.com/reports/IPTV06.asp
[3] S. Moon, J. Kurose, P. Skelly, D. Towsley. “Correlation of packet delay and loss in the
Internet”. Technical report, University of Massachusetts, January 1998.
[4] Y. Wang, S. Wengers, J. Wen, A.K. Katsaggelos, “Error resilient video coding
techniques”, IEEE Signal Processing Mag., vol. 17, no. 4, pp. 61-82, July 2000
[5] N. Cranley, M. Davis, “The Effects of Background Traffic on the End-to-End Delay for
Video Streaming Applications over IEEE 802.11b WLAN Networks”, 17th Annual IEEE
Personal, Indoor and Mobile Communications, PIMRC Helsinki, Finland, September
2006
[6] Darwin Streaming Server, http://developer.apple.com/darwin/projects/streaming/
[7] VideoLAN Client, http://www.videolan.org/
[8] WinDump, http://windump.polito.it/
[9] NetTime, http://nettime.sourceforge.net/
[10] S. B. Moon, P. Skelly, D. Towsley, “Estimation and Removal of Clock Skew from
Network Delay Measurements”, in Proc. of IEEE InfoComm’99, March 1999
[11] Distributed Internet Traffic Generator (D-ITG),
http://www.grid.unina.it/software/ITG/download.php
[12] A. C. Begen, Y. Altunbasak, "Estimating packet arrival times in bursty video
applications," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Amsterdam, The
Netherlands, July 2005
[13] N. Cranley, M. Davis, “Delay Analysis of Unicast Video Streaming over WLAN”, 2nd
IEEE International Conference on Wireless and Mobile Computing, Networking and
Communications, WiMob 2006, Montreal, Canada, June 2006
[14] N. Feamster, H. Balakrishnan, “Packet loss recovery for Streaming Video”, Proc. of
12th International Packet Video Workshop, April 2002
[15] IEEE STD 802.11e, September, 2005 Edition, IEEE Standards for Local and Metropolitan
Area Networks: Specific requirements Part 11: Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) specifications Amendment 8: Medium Access
Control (MAC) Quality of Service Enhancements
[16] Nicola Cranley, Mark Davis, “Video Frame Differentiation for Streamed Multimedia
over Heavily Loaded IEEE 802.11e WLAN using TXOP”, IEEE PIMRC 2007, Athens,
Greece, September 2007
[17] Nicola Cranley, Tanmoy Debnath, Mark Davis, “An Experimental Investigation of
Parallel Multimedia Streams over IEEE 802.11e WLAN Networks using TXOP”, IEEE
ICC 2007, Glasgow, Scotland, June 2007
Performance Evaluation of Meraki Wireless
Mesh Networks
Xiaoguang Li1,2, Robert Stewart2, Sean Murphy3, Enda Fallon1, Austin Hanley1
and Sumit Roy4
1 Applied Software Research Center, Athlone Institute of Technology
2 Athlone Institute of Technology
3 University College Dublin
4 University of Washington
[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
Abstract
Multi-hop networks using 802.11s are currently being standardized to improve the range and
throughput of existing Wi-Fi networks. However, the performance of multi-hop networks in the
indoor environment has to be investigated further to improve reliability and effectiveness. In this
work, several tests have been carried out to investigate the throughput of multi-hop network
equipment available from Meraki Corp. These tests are all in the indoor environment where co-channel interference can cause problems [1]. The application of this technology is instrumental in
improving productivity for companies such as T5 Process Solutions for remote control and
monitoring of the production equipment in large semiconductor fabs [2]. The results show the
impact on throughput performance of co-channel interference from other access points in the vicinity.
This is currently a major problem for users of 802.11b technology in areas with densely populated
access points, forcing some users to migrate to the less congested 802.11a technology.
Keywords: Wireless Lan, Multi-hop, Throughput, Mesh, Meraki, Roofnet.
1. Introduction
An unprecedented popularity and growth in Wi-Fi networks in recent years has seen the technology
being rolled out to enterprises, public hotspots and domestic users in the home. The range of
WLAN standards from the IEEE 802.11 working groups has also increased with 802.11b, 802.11a and
802.11g being the most common. The Meraki repeaters used in this experiment operate on the 802.11g
standard. One of the main applications of the Meraki repeaters is to offer cheap simple solutions for
connection between devices (mobile or fixed) and the Internet. The Meraki devices also provide
increased range as the devices provide a multihop function.
The IEEE 802.11 standard implements the bottom two layers of the OSI model: the Data Link layer
and the Physical Layer. In the data link layer, the Medium Access Control (MAC) sublayer manages access to the network medium (e.g. CSMA/CA); at the physical layer, the communication method is defined (most common in 802.11 are Direct Sequence Spread Spectrum (DSSS) and Orthogonal Frequency Division Multiplexing (OFDM)). The IEEE 802.11b equipment operates in the 2.4GHz frequency band at a
speed of 11Mbps and is a commonly used older standard. The IEEE 802.11a standard operates in the 5
GHz band at a maximum speed of 54 Mbps. However, because of the high frequency it uses, the
coverage of 802.11a is smaller, also the attenuation due to obstructions maybe increased. The 802.11g
which works in the frequency of 2.4GHz and is compatible with 802.11b has the advantage of the
speed of 54 Mbps. The IEEE 802.11n has recently been standardized. The advantage of 802.11n is its
high speed with a maximum throughput of 108 Mbps by using multiple 802.11g channels.
The IEEE 802.11s standard which has received a lot of attention lately will use mesh networking
techniques to extend the range of wireless LANs securely and reliably. The IEEE 802.11s standard
will be expected to provide an interoperable and secure wireless distribution system between IEEE
802.11 mesh points. This will extend mobility to access points in IEEE wireless local area networks
(WLANs), enabling new service classes to be offered to users as shown in Fig 1.
Fig 1 Infrastructure of wireless mesh network [3]
One current challenge is to wirelessly connect the AP’s to form an Extended Service Set (ESS) as
shown in Fig 2. Connectivity is provided by the Basic Service Set (BSS) consisting of stationary
Access points as shown in Fig 2. The mobile stations (STAs) associate with a stationary AP, which transmits data packets between the wired and the wireless network.
Fig 2 the Architecture of 802.11 [4]
Recent developments in worldwide standardization bodies show the industry's will to put products enabling mesh-based Wireless Local Area Networks (WLANs) on the market. The new technology allows for a transparent extension of network coverage, without the need for costly and inflexible wires to connect the access points.
The main part of our research is a performance study of 802.11g networks using wireless technology from MerakiTM and simulation. Meraki has commercially developed technology first researched in the Roofnet project [5][6] and created management software, called Dashboard, to allow the mesh network to be set up [7]. Each device periodically broadcasts, and all the other devices in range report their routes. Roofnet's design assumes that a small fraction of users will voluntarily share their wired or wireless Internet access. Each gateway acts as a Network Address Translator (NAT) for connections from the Meraki network to the Internet.
The organization of the paper is as follows: in Section 2, we introduce related work in mesh networking; in Section 3, we present the experiments and test results, covering two experiments that examine different aspects of Meraki mesh networking; Section 4 presents the conclusion and future work.
2 Related Work
The work in [5] evaluates the ability of a wireless mesh architecture to provide high-performance Internet access while demanding little deployment planning or operational management. One of its conclusions is that throughput decreases with the number of hops.
The work in [8] presented BFS-CA, a dynamic, interference-aware channel assignment algorithm and corresponding protocol for multi-radio wireless mesh networks. BFS-CA improves the performance of wireless mesh networks by minimizing interference between routers in the mesh network and between the mesh network and co-located wireless networks.
In contrast, this paper presents results from research on channel interference in Meraki mesh networks. The tests were performed in a small area in a normal office environment where interference is high, and the throughput in the WLAN is also investigated.
3 Throughput Tests
The throughput of the multi-hop network was investigated in tests using the AirPcap tool to monitor the traffic. As mentioned above, the main application of a wireless mesh network is to extend coverage and capacity. The indoor tests were performed to thoroughly investigate the co-channel interference [8] and the routes taken by packets.
3.1 Tools for Monitoring Traffic in Wi-Fi
To analyze the traffic for a specific wireless AP or station, the identity of the target device, the channel
and frequency must be obtained. The wireless card is configured to use the same channel before
initiating the packet capture. Wireless cards can only operate on a single frequency at any given time.
To capture traffic from multiple channels simultaneously, an additional wireless card for every
channel to be monitored is required.
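As an illustration of this kind of single-channel capture, the following sketch (not part of the original test setup) passively lists 802.11 beacon frames using the Scapy library. The interface name "wlan0mon" and the assumption that the card is already in monitor mode and tuned to the target channel are hypothetical.

```python
# Minimal sketch, assuming a Linux wireless card already in monitor mode on the
# channel under test; "wlan0mon" is an illustrative interface name.
from scapy.all import sniff
from scapy.layers.dot11 import Dot11, Dot11Beacon, Dot11Elt

seen = {}

def on_packet(pkt):
    # Only beacon frames are of interest here; they identify nearby APs/repeaters.
    if pkt.haslayer(Dot11Beacon):
        bssid = pkt[Dot11].addr2
        elt = pkt[Dot11Elt]                      # first tagged element (ID 0) is the SSID
        ssid = elt.info.decode(errors="replace") if elt.ID == 0 else "<unknown>"
        seen[bssid] = ssid
        print(f"beacon from {bssid}  SSID={ssid}")

# Capture for 30 seconds on the monitor interface tuned to the channel under test.
sniff(iface="wlan0mon", prn=on_packet, timeout=30)
```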
There are several network analyser tools for Wi-Fi:
(1) Wireshark (http://www.wireshark.org/)
Wireshark has sophisticated wireless protocol analysis support to troubleshoot wireless networks.
With the appropriate driver support, Wireshark can capture traffic “from the air” and decode it into a
format to track down issues that are causing poor performance, intermittent connectivity, and other
common problems. The software is provided free.
(2) NetStumbler (http://www.stumbler.net/)
NetStumbler (also known as Network Stumbler) is a tool for Windows that facilitates detection of
Wireless LANs using the 802.11b, 802.11a and 802.11g WLAN standards. It is commonly used for
verifying network configurations, finding locations with poor coverage in a WLAN, detecting causes
of wireless interference and unauthorized ("rogue") access points.
(3) Commview for Wi-Fi (http://www.tamos.com/products/commview/)
CommView for Wi-Fi allows you to see the list of network connections and vital IP statistics and
examine individual packets. Packets can be decrypted utilizing user-defined WEP or WPA-PSK keys
and are decoded down to the lowest layer, with full analysis of the most widespread protocols. Full
access to raw data is also provided. Captured packets can be saved to log files for future analysis. A
flexible system of filters makes it possible to drop unnecessary packets or capture the essential packets.
Configurable alarms can notify the user about important events such as suspicious packets, high
bandwidth utilization, or unknown addresses. However a license is required.
(4) AirMagnet (http://www.airmagnet.com/)
AirMagnet's Laptop Analyzer is the industry's most popular mobile field tool for troubleshooting
enterprise Wi-Fi networks. Laptop Analyzer helps IT staff make sense of end-user complaints to
quickly resolve performance problems, while automatically detecting security threats and other
network vulnerabilities. Although compact, Laptop Analyzer has many of the feature-rich qualities of
a dedicated, policy-driven wireless LAN monitoring system. However the cost is prohibitive.
Considering all of the above network analysers, Wireshark with AirPcap was chosen to monitor the traffic in the Wi-Fi networks. AirPcap is fully integrated with WinPcap and Wireshark, and it enables the capture and analysis of 802.11b/g wireless traffic.
3.2 Test 1: Co-channel Interference
Co-channel interference (CCI) is crosstalk from two different radio transmitters reusing the same frequency channel. There can be several causes of CCI; an overly crowded radio spectrum is one of the main ones. Stations can be densely packed, sometimes to the point that one can hear two, three, or more stations on the same frequency. Co-channel interference decreases the ratio of carrier to interference power (C/I) at the periphery of cells, causing diminished system capacity, more frequent handoffs, and dropped calls [9]. Fig 3 shows the layout of a channel assignment scheme for a typical campus to reduce the interference problems.
Fig 3 Typical Channel Assignment for a Campus
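To make the 1/6/11 assignment of Fig 3 concrete, the short sketch below (our illustration, not part of the paper) checks whether two 2.4 GHz 802.11b/g channels overlap, using the usual 5 MHz channel spacing and a nominal 22 MHz occupied bandwidth.

```python
# Illustrative only; valid for channels 1-13 in the 2.4 GHz band.
def centre_freq_mhz(channel: int) -> int:
    return 2407 + 5 * channel            # channel 1 -> 2412 MHz, channel 6 -> 2437 MHz

def channels_overlap(ch_a: int, ch_b: int, bandwidth_mhz: int = 22) -> bool:
    return abs(centre_freq_mhz(ch_a) - centre_freq_mhz(ch_b)) < bandwidth_mhz

print(channels_overlap(1, 6))    # False: 25 MHz apart, so no co-channel interference
print(channels_overlap(6, 8))    # True: only 10 MHz apart, so the cells would interfere
```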
Fig 4 shows the beacon frames captured from the Wireshark network analyzer of the network shown
in Fig 5.
Fig 4 Beacon Frame Message Captured from Wireshark
Fig 5 shows the environment of the file transfer with the detailed information.
Fig 5 Environment of File Transfer
Fig 6 Packet capture using Wireshark (showing the beacon broadcast of an idle repeater)
Fig 6 shows the detailed information obtained when using the network analyzer during the transfer of files with Meraki repeaters. From the screenshot shown in Fig 6 it is possible to differentiate between repeaters transferring data (refer to MerakiNe_01:14:aa and MerakiNe_01:14:3a) and repeaters in the vicinity (refer to MerakiNe_01:1c:fc and MerakiNe_01:1c:fa) transferring only management frames. To avoid channel interference from other Wi-Fi networks, it was arranged that the repeaters would work on channel 6, as shown in Fig 3. Other access points in the area operate on channel 1 and channel 11. Because an 802.11 WLAN is a shared medium, the impact of co-channel interference is increased by client collisions, as the clients hear signals from the many APs and clients surrounding them [10]. Results are shown in Fig 8.
In our test, we made all the repeaters work on the same channel. To increase the number of hops, we added one Meraki repeater at a time at a known distance. Fig 7 shows the environment of the test.
Fig 7 Throughput Test for Co-channel Interference
Fig 8 Test Result for Co-channel Interference (throughput in Mbps, ranging from approximately 1.7 to 3.3 Mbps, versus number of repeaters from 0 to 6, for node separations of 3 m, 4 m, 5 m, 9 m, 11 m and 12 m)
We used monitoring tools to trace the data packets to ensure that the path taken by packets from source to destination was correct. However, in the indoor environment, our tests found that repeaters invariably attempt to access the root node unless otherwise directed. As the distance between the root node and the last node decreases, the throughput decreases also. Throughput is at a minimum of 1.94 Mbps when the devices are placed side by side.
3.3 Test 2: Bandwidth Sharing
Fig 9 Performance Analysis of Multi-hop Network
Node 2 is at a fixed distance of 1 m from the root node, and the distance of node 1 was varied each time. First, we tested the throughput of the root node: it can achieve a maximum of 7.143 Mbps and an average of 5.611 Mbps. Then we performed the test with node 1. Fig 10 shows that as the distance between the root node and node 1 increased, the throughput also increased. As can be seen from the single node test in Fig 8, the throughput is less than half that of the root node. Considering this, we presumed that when two clients connect to the nodes simultaneously, the packets may pass through the root node. To prove this point, we designed the following tests and monitored the packets passing through.
We then added another node and two clients, connecting one client to node 1 and the other to node 2, and ran the throughput tests simultaneously. Fig 9 shows the test environment.
Fig 10 Test Result of Throughput (throughput in Mbps versus distance in metres from 0 to 12 m, plotted for the single node test, the throughput of node 1, the throughput of node 2, and the sum of node 1 and node 2)
From Fig 10 we can see that the sum of the sub-node throughputs is almost equal to the value of the single node test; in other words, the two nodes share the bandwidth. We used the capture tool to monitor the packets passing through and noticed that all the data packets passed through the root node. If client 1 wants to transfer data packets to client 2, the procedure operates as shown below.
Fig 11 Data Packet Procedure
According to this procedure, we can see that, from one point of view, the Meraki repeaters aim to implement an extension of Internet access. However, when data is transmitted within the local network, the root node becomes the bottleneck.
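The bottleneck effect can be illustrated with a toy model (our assumption, not the authors' analysis): if all traffic between clients attached to different repeaters must relay through the root node, the root's capacity is divided roughly evenly among the active sub-nodes.

```python
# Toy sharing model; the capacity value is the average root-node throughput
# measured in Test 2, the even split is an assumption for illustration.
def per_node_throughput(root_capacity_mbps: float, active_nodes: int) -> float:
    if active_nodes == 0:
        return 0.0
    return root_capacity_mbps / active_nodes

root_capacity = 5.611   # Mbps, average measured at the root node
for n in (1, 2, 3):
    print(f"{n} active sub-node(s): ~{per_node_throughput(root_capacity, n):.2f} Mbps each")
```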
4 Conclusion and Future Work
In this paper, we have performed several throughput tests of WLAN mesh networks. Co-channel interference problems occur because the Meraki repeaters work on the same channel, which is automatically arranged by the root node. All the repeaters have to share the bandwidth when packets are transferred within the WLAN (as of August 2007, this is a limitation of the Meraki product when traffic is sent from one host on the Meraki 10.x.x.x network to another host on the Meraki 10.x.x.x network [7]). These experiments show that the performance of wireless mesh networks is influenced by the number of nodes in the vicinity, the distance between nodes and the number of users. As explained throughout this article, many research problems remain to be investigated when looking at the performance of wireless mesh networks.
Acknowledgment
This work is supported through an Innovation Partnership sponsored by Enterprise Ireland and T5
Process Solutions.
References
[1] J. Robinson and E. Knightly, "A Performance Study of Deployment Factors in Wireless Mesh
Networks," in Proceedings of IEEE INFOCOM 2007, Anchorage, AK, May 2007.
[2] www.t5ps.com
[3] Ian F. Akyildiz and Xudong Wang, "A Survey on Wireless Mesh Networks," IEEE Communications Magazine, vol. 43, no. 9, pp. S23-S30, Sept. 2005
[4] IEEE 802.11, 1999 Edition
[5] John Bicket, Daniel Aguayo, Sanjit Biswas, Robert Morris, "Architecture and Evaluation of an Unplanned 802.11b Mesh Network," in Proceedings of the 11th Annual International Conference on Mobile Computing and Networking.
[6] MIT roofnet. http://www.pdos.lcs.mit.edu/roofnet/.
[7] www.meraki.net/docs
[8] K.Ramachandran, E.Belding, K.Almeroth, M.Buddhikot,"Interference-Aware Channel Assignment
in Multi-Radio Wireless Mesh Networks," in Proceedings of IEEE INFOCOM, 2006
[9] Co-Channel Interference White Paper
[10] “Revolutionizing Wireless LAN Deployment Economics with the Meru Networks Radio Switch,”
www.nowire.se
[11] “The Impact of IEEE 802.11 MAC Strategies on Multi-Hop Wireless Mesh Network”, In Proc.
IEEE WiMesh 2006, Reston, Virginia, USA, Sept. 25, 2006
Session 5b
Wired & Wireless
Embedded Networked Sensing – EmNetS
Panneer Muthukumaran1, Rostislav Spinar1, Ken Murray1, Dirk Pesch1, Zheng Liu2,
Weiping Song2, Duong N. B. Ta2, Cormac J. Sreenan2
1 Centre for Adaptive Wireless System, Cork Institute of Technology, Ireland
{panneer.muthukumran, rostislav.spinar, ken.murray, dirk.pesch}@cit.ie
2 Mobile and Internet Systems Laboratory, University College Cork, Ireland
{zl3, wps2, taduong, cjs}@cs.ucc.ie
Abstract
This paper presents the work under investigation within the Embedded Networked Sensing
(EmNetS) project funded by Enterprise Ireland under the WISen industry/academia consortium.
The project addresses four main research areas within the embedded networked sensing space,
namely, protocol stack development, middleware development, sensor network management and
live test bed implementation. This paper will provide an overview of the research questions being
addressed within EmNetS, the motivation for the work and the proposed solutions under
development.
Keywords: Wireless Sensor Networking, Protocol Stacks, Testbed, Middleware, Network
Management.
1 Introduction
The WISen Industry/Academia Consortium has identified wireless sensor networks as a medium term
target for Irish Industry and a range of application domains focusing on Utilities and Resource
Management in the first instance. Market forecasts indicate that the global wireless sensor network
market could be worth $8.2Bn by 2010. The market is currently US-led but there is growing demand
in Europe in particular for applications in the utilities, health-care, and environmental monitoring
domains. The EmNetS project aims to advance research in Ireland in the areas of wireless sensor
networking software and live test bed implementation [1]. The project addresses the need to network a
range of low power heterogeneous sensor devices to be used in the utilities and responsive building
environments. Firstly, the project aims at developing a network protocol stack for individual sensor nodes and for cluster controller/base station type nodes that need to inter-work with other networking technologies such as local area networks and mobile networks, e.g. GSM. The protocol stack will be based on the industrially adopted IEEE 802.15.4/Zigbee stack for low power sensing [2]. It is the aim of
the protocol stack development to overcome some of the limitations of IEEE 802.15.4/Zigbee stack
such as scalability, energy efficiency in large scale mesh networks, dynamic address assignment and
energy efficient routing. It is envisaged that within the environmental monitoring and responsive building environments the number of sensing devices can run to the order of hundreds to thousands. Current sensor networking standards are unable to support such large-scale, energy-efficient
deployments. A further part of the software infrastructure consists of a simple middleware layer that
provides application programmers interfaces to allow rapid development of applications. The
development of such a software platform will ease product development of sensor network
applications. Sensor networks that fulfil a mission-critical role require remote management facilities in order to monitor the correct operation of the network, query the status of individual nodes, and provide means to upload software updates. A remote management system is currently being developed that will integrate with the protocol stack within a live system deployment. This test bed deployment will provide for the test and validation of networking protocols, as well as scalability and internetworking trials within the EmNetS team and the sensor network research community at a national level. This
paper provides a technical overview of the current state of research and development activity within
the EmNetS team. The challenges and proposed solutions under investigation will be presented in each
of the aforementioned areas of research.
2 Protocol Stack Development
Wireless sensing within the responsive building environment has been highlighted as the target
application domain for the EmNetS project. In such sensing environments the area of deployment can
be relatively large, for example the control of HVAC systems in multi-story buildings based on user
location/density. To facilitate such large scale deployments, the sensing devices must cooperate to
efficiently route data from source to a destination data sink which can be many hundreds of meters
apart and contain multiple intermediate nodes. Mesh networking topologies can provide high
redundancy for failed data links, provide scalable network topologies and provide the dynamic
selection of alternative routes for high priority traffic. Within a mesh topology, data can be routed to
fulfil requirements of energy efficiency, throughput, and quality of service (QoS). The deployment of
energy efficient mesh wireless sensor networks is therefore desirable in the provisioning of services
over large sensor fields. Enabling energy-efficient data transmission over sensor networks requires the use of energy-efficient protocol stacks. Energy-efficient algorithms should be present at each layer in the stack, in particular the MAC and NWK layers, with cross-layer interaction used to optimise performance. Techniques in the literature employ the transmission of beacon packets between transmitter and receiver to facilitate low duty cycle, energy-efficient channel access in which device transmissions are coordinated. With this strategy, devices can sleep between the coordinated transmissions, which results in energy efficiency and prolonged network lifetimes. Beacon scheduling is an important mechanism in multi-hop mesh networks to enable multiple beacon-enabled devices to function whilst avoiding beacon and data transmission collisions.
2.1 Distributed Beacon Synchronisation
The IEEE 802.15.4 MAC standard for low duty cycle, low data rate devices is the most significant
commercially adopted MAC protocol to date [2]. The standard however does not specify techniques
by which the synchronisation of beacon packets is to be achieved to enable low duty cycle
functionality. Furthermore, the standard specifies that to enable mesh topologies, the router devices
within the network need to be line-powered and engage in idle listening. Recent proposals in the literature for low duty cycle MAC protocols are based on the channel polling, low power listening technique [3, 4]. These schemes however suffer from long and variable preambles at the transmitter side and are best suited to bit-streaming transceiver chipsets. The latest trend is toward packetized
radios in which the preambles are a fixed length such as the TI CC2420. A collision-free beacon
scheduling algorithm for IEEE 802.15.4/Zigbee Cluster-Tree Networks is presented in [5]. The
approach called Superframe Duration Scheduling (SDS) builds upon the requirement for beacon
scheduling outlined in the Zigbee specification for Cluster-Tree multi-hop topologies. The SDS
algorithm functions within the coordinator. Although centralised control reduces the computational overhead and information flow between distributed devices, it can result in an excessive flow of control traffic toward the coordinator, which results in devices close to the coordinator being overloaded in relaying this data. A distributed approach may be more attractive. The IEEE 802.15.5 Task Group 5 is discussing a proposal for beacon scheduling for mesh topologies [6]. The proposal involves making fundamental changes to the superframe structure in the MAC to provide a beacon-only timeslot in which the beacons of neighbouring devices will be transmitted. This proposal however involves changing the MAC superframe structure, which affects interoperability with the current MAC standard. In order to overcome the limitations outlined above, the EmNetS team propose a distributed beacon scheduling strategy, in which the coordinator device does not
participate in the beacon scheduling process. The proposed algorithm is depicted in Figure 1. Local
decisions are made at each mesh router device based on information received during the beacon scan.
In this way information is not required to be sent to the coordinator each time a new device requests a
beacon schedule time, hence reducing the control traffic overhead toward the coordinator. When a
node initially starts, it associates with a beacon enabled device, based on for example, the strongest
signal strength (network layer function). If the node is required to transmit beacons it must build a list
of its neighbours and neighbours’ neighbours. It does this by listening for its neighbour’s beacons
(obtain neighbour list) and records the beacon transmit time of each in a Beacon Schedule Table
(BST). The device will then request the neighbours’ neighbour list in the CAP of each neighbour [2].
The list will contain the beacon offset time of the two-hop neighbours relative to the one-hop
neighbour sending the list. This data is also added to the BST. Upon completion of this step the node
will have a complete list of its neighbours and neighbours’ neighbours in absolute values (one-hop
neighbours) and offset values (two-hop neighbours). The device can now determine its own schedule
period. The new device shall remain scanning for beacons to calculate the transmission offset values
between its scheduled beacon time and that of the received beacons and notify each neighbour of this
offset. This may also facilitate the reception of any unheard beacons in the first beacon scan. The new
device will at this stage have bi-directional non-interfering connectivity between it and all one-hop
neighbours in the PAN.
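A minimal sketch of the neighbour-table bookkeeping described above is given below. The slot granularity, table layout and helper names are assumptions for illustration only, not the EmNetS implementation.

```python
# Assumed: beacon times are expressed as slot indices within a superframe.
SUPERFRAME_SLOTS = 16   # assumed number of candidate beacon slots per superframe

def build_bst(one_hop_beacons, two_hop_offsets):
    """one_hop_beacons: {neighbour_id: absolute_slot}
       two_hop_offsets: {neighbour_id: [offsets (in slots) relative to that neighbour]}"""
    bst = set(one_hop_beacons.values())
    for nbr, offsets in two_hop_offsets.items():
        base = one_hop_beacons[nbr]
        for off in offsets:
            bst.add((base + off) % SUPERFRAME_SLOTS)
    return bst

def choose_beacon_slot(bst):
    for slot in range(SUPERFRAME_SLOTS):
        if slot not in bst:
            return slot          # first slot free of all one- and two-hop beacons
    return None                  # node is not schedulable in this neighbourhood

one_hop = {"A": 0, "B": 4}
two_hop = {"A": [2], "B": [3]}   # e.g. A's own neighbour beacons 2 slots after A
print(choose_beacon_slot(build_bst(one_hop, two_hop)))   # -> 1
```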
Figure 1 Distributed Beacon Scheduling for IEEE 802.15.4 (flowchart with steps including: association procedure; if required to transmit beacons, scan neighbour beacons and request neighbour lists; wait for all beacons and neighbour lists via indirect transmission; scan for all beacons and calculate the transmit offsets of neighbours; determine schedulability and the beacon transmit time; send offset values to neighbours and place them in the node's own neighbour table; schedule the nth beacon transmission, sleep, wake at the nth beacon schedule time, and wake at the destination node's scheduled beacon time if sending data)
2.2 Two-level Zone-based Routing
The communication scenarios in wireless sensor networks can be classified into few-to-many data
dissemination, many-to-one tree based routing, and any-to-any routing topologies [7]. Many of the
routing algorithms in wireless sensor networks are based on network-wide dissemination and the
collection of data from the interested nodes. Tree based routing is used for forwarding data to a
common destination at the tree root. These scenarios make the sink node or root a fixed node and there
is no support for communication between any two independent devices. To implement dynamic
backbone networking inside a wireless sensor network, any-to-any routing would be useful. However
limited memory in wireless sensor networks makes it impossible to maintain routes to every node in
the network. The Zigbee standard uses a variant of the AODV routing algorithm to route packets in
which each node has to maintain a routing table with the entries of destination and next hop in that
route. This method is not suitable for larger networks. For example when a node tries to send to
multiple destination nodes, it requires multiple route entries along the same path. To combat the
limited resources of sensing devices in terms of memory and computational complexity, we have defined a framework for network routing in wireless sensor networks similar to the Zone Routing Protocol (ZRP) [8]. In this strategy we divide the network into clusters or zones. A hybrid of proactive and reactive routing is used for the intra-cluster level and a reactive approach is used at the inter-cluster level. Each node in a cluster always maintains routes to the destination cluster, rather than maintaining the entire route to a destination node. This framework works on the basis of "Think Global and Act Local". Each node maintains two types of routing tables to perform routing at the two levels. To perform inter-cluster routing, nodes in the current cluster do not care about the destination node; instead they try to route the packet to the next cluster which is en route to the destination cluster. If the destination node belongs to the current cluster, the packet is routed using the intra-cluster algorithm based on
AODV. The concept is illustrated in Figure 2. The nodes on the edge of the cluster that have neighbours in other clusters are called gateway nodes. These gateway nodes forward the data packet to their neighbour clusters. Gateway nodes broadcast (proactively) their next-cluster information to all nodes inside the cluster. This broadcast enables the nodes within a cluster to build routes to the appropriate gateway nodes depending on the destination cluster. When a data packet is transmitted, it needs to be supplied with the destination cluster address and node address. A node sends the inter-cluster routing request to all gateway nodes, unless it knows which gateway has a route to the destination cluster. As the network evolves it is possible that nodes will learn which gateways have paths to the most recently used destination clusters. They forward the route request packet to neighbouring clusters via neighbour gateway nodes. When this packet reaches a gateway in the destination cluster, or a gateway that has knowledge of the remaining path, a route reply packet is sent back to the source node. The route reply packet reaches the source node with the next cluster or (optionally) the entire cluster list. Each gateway node along the path caches the cluster-level route. This completes the inter-cluster routing procedure. The data packet can now be sent to the destination cluster. Inside the cluster, the destination node is found using intra-cluster reactive routing. A simplified version of AODV may be used as the basis of the reactive routing algorithm.
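The two-level forwarding decision can be sketched as follows. This is our illustration under assumed table structures and field names, not the project's code: route on the cluster ID until the packet reaches the destination cluster, then fall back to intra-cluster (AODV-style) routing on the node ID.

```python
def forward(packet, my_cluster, inter_cluster_table, intra_cluster_table):
    """inter_cluster_table: {dest_cluster: next_hop_towards_gateway}
       intra_cluster_table: {dest_node: next_hop}   (built reactively, AODV-like)"""
    if packet["dest_cluster"] != my_cluster:
        # Inter-cluster level ("think global"): hand the packet towards a gateway
        # that advertises a route to the destination cluster.
        next_hop = inter_cluster_table.get(packet["dest_cluster"])
        if next_hop is None:
            return ("route_request_to_gateways", packet["dest_cluster"])
        return ("forward", next_hop)
    # Intra-cluster level ("act local"): destination cluster reached, route on node ID.
    next_hop = intra_cluster_table.get(packet["dest_node"])
    if next_hop is None:
        return ("intra_cluster_route_discovery", packet["dest_node"])
    return ("forward", next_hop)

pkt = {"dest_cluster": "C7", "dest_node": 42}
print(forward(pkt, "C12", {"C7": "GW3"}, {}))   # -> ('forward', 'GW3')
```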
3 Middleware Development
EmNetS is developing a simple middleware platform to provide application developers with interfaces that facilitate rapid development of applications in the utilities space; this platform will ease product development for wireless sensor networks. The goals of the middleware are: (1) to develop services oriented towards rapid application development; (2) to develop a composition tool for middleware synthesis; (3) to develop a resource-aware deployment tool for middleware mapping.
Figure 2 Concept of cluster based routing in wireless sensor networks (clusters C0 to C13 within the PAN, showing connections between gateway nodes and a route between clusters C12 and C7)
The middleware will employ a VM-based approach. There are two main advantages of employing a
VM-based approach in the middleware system. Firstly, middleware services do not need to be
rewritten for different platforms as they run transparently over the varied platforms. Secondly, a
virtual machine provides a well-designed instruction set. This enables rapid prototyping of highly
compact application binaries which may result in low energy overheads when they are distributed in
the network [9]. As shown in Figure 3, the middleware system will be decomposed into two layers: a virtual machine layer and a services layer. The virtual machine layer resides on top of the operating system and network stack; it utilises a virtual machine to provide APIs that abstract different hardware and/or system platforms. The services layer resides on top of the virtual machine layer and consists of middleware services. All services provided by the middleware will be implemented in this layer. The middleware services can be tailored for each sensor node, based upon the facts that (1) services can be realised in different ways depending on hardware components, hardware resources, user requirements, optimisation criteria, etc., and (2) services are required differently by the applications running on top of them, so the services running on any specific sensor node do not need to reflect the full requirement specification. The middleware services may include, to name a few, service discovery, aggregation, localisation, synchronisation, adaptation, update and security. New services can be added into the system if they use
certain interfaces. There are a number of algorithms and/or models for different services in the literature; thus, instead of developing new algorithms and models for the services, the proposed middleware services will be realised using existing ones. Similarly, the virtual machine used in the middleware system will be based upon one of the existing virtual machines for wireless sensor networks (e.g., Mate, Agilla, SensorWare), and further modifications will be applied if necessary.
The middleware system will be built by defining a set of components, dependencies, and the specific
components which can be selected under a certain condition. A key design goal is to develop a tool
capable of selecting components, capturing relationships among components and composing specific
components. For the purposes of selection and composition, each component will have enough
information as its attributes; such information may include functional properties, non-functional
properties, required and/or provided guarantees upon qualities, etc. In order to compose a middleware
system, the following information will be needed by the composition tool: platform description,
middleware services, constraints, and quality criteria. The platform description specifies hardware
components and their resources; middleware services specify the services which run on top of a
particular device; constraints specify the resource constraints of each component; and quality criteria specify the user's non-functional requirements, which may include reliability, usability and performance, to name a few. After reading this information, the composition tool builds a dependency diagram by
satisfying the dependencies of a start component which can be any one of the components selected by
the user. During this process, based upon resource constraints and quality criteria, the composer
selects the most suitable components from a set of different possible components, and also produces
additional components required and necessary glue code to hook all components together [9, 10].
Finally, the system will also provide a mechanism to map the middleware images into the entire
network. Figure 4 depicts the processes of composition and deployment.
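The following sketch illustrates the kind of dependency-driven, constraint-aware composition described above. The component names, attributes and RAM budget are invented for the example and are not the project's actual composition tool.

```python
COMPONENTS = {
    "TimeSync":       {"requires": ["VirtualMachine"], "ram_kb": 2},
    "DataManager":    {"requires": ["VirtualMachine"], "ram_kb": 3},
    "VirtualMachine": {"requires": [],                 "ram_kb": 6},
}

def compose(selected, ram_budget_kb):
    """Greedy composition: add each selected component after its dependencies,
       refusing to exceed the (assumed) RAM constraint of the target node."""
    image, used = [], 0
    def add(name):
        nonlocal used
        if name in image:
            return
        for dep in COMPONENTS[name]["requires"]:   # satisfy dependencies first
            add(dep)
        cost = COMPONENTS[name]["ram_kb"]
        if used + cost > ram_budget_kb:
            raise RuntimeError(f"constraint violated adding {name}")
        image.append(name)
        used += cost
    for name in selected:
        add(name)
    return image, used

print(compose(["TimeSync", "DataManager"], ram_budget_kb=12))
# -> (['VirtualMachine', 'TimeSync', 'DataManager'], 11)
```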
Figure 3 Proposed Middleware Architecture (an Application Layer sits on top of a Middleware Layer containing services such as a Time Synchronizer, Group Manager, Data Manager and further services, which in turn runs over the Virtual Machine, the Network Stack, and the Operating System and Hardware)
4 Sensor Network Management
The EmNetS project is developing a general purpose remote management system to monitor, manage
and control the behaviors of wireless sensor networks. To date, there has been very little research on
network management and performance debugging for wireless sensor networks. This is mainly
because the original vision for such networks envisaged extremely dense, random deployments of very
inexpensive nodes, operating fully autonomously to solve or avoid faults and performance problems.
However, in reality this is not the case. Many real-world applications, including those for utilities, require carefully planned deployment in specific locations, and nodes are in fact not very inexpensive. As a result, autonomous approaches will need to be complemented by traditional network
management approaches, but using new algorithms that are cognizant of the severe resource
constraints which characterize sensor nodes.
Some relevant work in this area includes [11], which proposes two simple application-independent
protocols for collecting health data from and disseminating management messages to the sensor
networks. It is limited to being a passive monitoring tool, i.e., it requires a human manager to issue queries and perform analysis on the collected data. In contrast, [12] proposes to reuse the main sensing application's tree routing protocol to deliver monitoring traffic; as a result, it might be non-trivial to adapt the current monitoring mechanism to different classes of applications. In [13], the authors surveyed some existing work in sensor network management and found that there is currently no generalized solution for sensor network management.
Part of the EmNetS project is targeted at filling this gap in sensor network research. Our goal is to develop a simple yet efficient, general-purpose, policy-based sensor network management system which should exhibit the following important characteristics: low management overhead, strong fault tolerance, adaptivity to network conditions, autonomy and scalability.
Figure 4 Processes of Composition and Deployment (the platform description, middleware services, constraints and quality criteria are inputs to the composition step, which produces a middleware image that is then deployed onto the wireless sensor network)
4.1 EmNetS's Network Management System
To strike a good balance between scalability and complexity, a hierarchical network management
system would be an appropriate solution. We propose to use several layers of management, in which
the managers in the lowest layer directly manage the sensor nodes in their part of the network. Each
manager passes collected health data to its higher-level manager and at the same time disseminates
commands from the higher-level manager to the nodes it manages. Typically, a management layer of
the proposed EmNetS’s Network Management System (ENMS) consists of the following components
(see Figure 5):
Figure 5 EmNetS's Network Management System Components (sensor network models, data collection and dissemination protocols, management policies, and the management engine)
(1) Sensor network models: The models are to depict the actual states of the sensor networks. There
are various possible sensor network models to be captured in our management system, for example
link quality map, network topology map, energy map, etc. An important requirement is that the
network models must be extensible to easily accommodate future classes of sensing applications.
(2) Data collection and dissemination protocols: To collect health data from sensor networks, we are
exploring a combination of energy-efficient application-dependent and application-independent data
collection protocols. The former protocol would use the main sensing application’s tree routing
protocol to deliver health data from the sensor networks. One way to implement this approach is to
piggy-back the health data into the real application data packets. The latter has the advantage of being
independent from the sensing applications, thus can be easily adapted to be used in different
applications. Moreover, when the application fails, the latter protocol can still continue functioning.
We envisage a combined protocol in which the former approach would be used to report network
health data periodically, while the latter is for active node probing and sending management messages
when required.
(3) Management policies: ENMS’s management policies will specify tasks to be executed if certain
system health conditions are met, e.g., battery level of node A is now 10%, so node A should go to
sleep mode.
(4) Management execution engine: The management engine uses the data collection/dissemination protocols to update the sensor network models and to send commands to sensor nodes. Based on the collected health data and the management policies, it will then automatically analyze the current situation and execute the appropriate management tasks, e.g., re-configuring a network route in case of congestion. Together with well-defined management policies, an intelligent management engine would help to achieve the desired level of autonomy for our ENMS, thus minimizing the need for human managers.
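A toy illustration of such a policy-driven engine is given below; the policy format, health-report fields and action names are assumptions for the example, not the ENMS design.

```python
POLICIES = [
    # (condition over a node's health report, management action to execute)
    (lambda h: h["battery_pct"] <= 10,    "enter_sleep_mode"),
    (lambda h: h["tx_queue_drops"] > 100, "reconfigure_route"),
]

def evaluate(node_id, health_report):
    """Return the management commands to issue for this node's health report."""
    return [(node_id, action) for cond, action in POLICIES if cond(health_report)]

print(evaluate("A", {"battery_pct": 9, "tx_queue_drops": 3}))
# -> [('A', 'enter_sleep_mode')]
```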
5 Test bed Implementation
One of the key activities within the EmNetS project is the development of a live system test bed. The
objective of the test bed is twofold. Firstly it provides a hardware platform to validate protocols and
network architectures developed within EmNetS and to execute experiments demonstrating the
suitability of the developed platform for utilities and responsive building applications. Secondly, the
test bed provides network management functionality via a backbone USB/WiFi network for topology
control, network element status indication, performance/fault analysis, configuration and remote
programming of individual devices. The test bed architecture, depicted in Figure 6, is based on the ReMote test bed architecture developed at the University of Copenhagen [14].
Figure 6 EmNetS Test bed (layers from top to bottom: Client PCs, Server PC, Host PCs, Sensor-Net)
The test bed consists of the following layers –
The Sensor Net layer consists of the sensor devices. The Xbow MICAz and Moteiv TMote are
currently supported. The test bed includes support for software stacks developed in TinyOS versions 1
and 2. Future work in this layer includes the additional support for the Contiki operating system and
the provisioning of more advanced components for testing and debugging. The Host PCs consist of Linux-based embedded PC platforms with USB2 connectivity to the sensor nodes in the layer below. The host PCs contain the mote control host daemon to facilitate mote discovery and the issuing of the mote commands start, stop and reset for specific network topology control and remote power management. This layer also contains the bootloaders for the sensor devices within the testbed. Connectivity to the upper IP-based server layer is via an Ethernet/WiFi link. Future work in this layer will focus on the implementation of wireless USB and the provision of application data logging. The Server PC layer contains the mote control server daemon to bridge the communication between the clients in the upper layer and the motes in layer 1. It also contains a MySQL-based database for central storage of all system information and a Tomcat information server to provide client system information, user authentication and mote information. Connectivity to the top layer is via a WiFi link. Further extensions
in this layer are focused on development of an administrative interface to manage the network users,
reservation of test bed resources and mote information. The Client PC layer forms the top layer in the
sensor test bed, in which the user can interact with the system. Each client PC contains a Java graphical
user interface which lists the available motes and provides services such as start, stop and reset of the
individual devices. Individual motes can also be reprogrammed with a console window available for
each device. Future directions in this layer include the development of an interactive map showing the
node deployment, a reservation system by which users can reserve network resources for a particular
test and a data logging facility. It is envisaged that the Sensor-Net test bed architecture will evolve
across multiple domains and institutes providing a tool for remote access and the deployment of
networking protocols to further advance wireless sensor network research.
6 Conclusion
The EmNetS project is currently undertaking a research programme in the area of embedded wireless sensor networks and is advancing the current state of the art in energy-efficient, scalable networking protocols, middleware, network management and testbed development. The application domain under investigation includes the responsive building environment, which provides a number of key research challenges in terms of energy efficiency and scalability. This paper has provided an overview of current research activities within the programme and highlighted the challenges and solutions under development.
References
[1] http://www.cs.ucc.ie/emnets/
[2] IEEE 802.15.4 Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
specifications for Low-Rate Wireless Personal Area Networks (LR-WPANs), 2006
[3] A. El-Hoiyi, J.-D. Decotignie, and J. Hernandez. Low power MAC protocols for infrastructure
wireless sensor networks. In Proceedings of the Fifth European Wireless Conference, Feb. 2004.
[4] Joseph Polastre, Jason Hill & David Culler, “Versatile Low Power Media Access for Wireless
Sensor Networks”, Proc. Embedded Networked Sensor Systems, 2004, pp. 95 – 107
[5] Anis Koubaa, Mellek Attia, “Collision-Free Beacon Scheduling Mechanisms for IEEE
802.15.4/Zigbee Cluster-Tree Wireless Sensor Networks”, Technical Report, Version 1.0, Nov.
2006, http://www.open-zb.net/
[6] Ho-In Jeon, Yeonsoo Kim, “BOP Location Considerations and Beaconing Scheduling for
Backward Compatibility to Legacy IEEE 802.15.4 Devices”, submitted to IEEE 802.15.5 Task
Group 5.
[7] A Holistic approach to multi-hop routing in sensor networks, Alec Lik Chuen Woo, Thesis,
University of California, Berkeley.
[8] Haas, Z. J., and Pearlman, M. R., "The zone routing protocol (ZRP) for ad hoc networks", Internet Draft -- Mobile Ad hoc NETworking (MANET) Working Group of the Internet Engineering Task Force (IETF), November 1997.
[9] Joel Koshy, Raju Pandey, “VM*: Synthesizing Scalable Runtime Environment for Sensor
Networks”, SenSys’05, San Diego, California, USA, 2-4 November 2005.
[10] Peter Graubmann, Mikhail Roshchin, “Semantic Annotation of Software Components”,
EUROMICRO-SEAA’06, Cavtat/Dubrovnik (Croatia), August 28 – September 1, 2006.
[11] G. Tolle, and D. Culler, “Design of an Application-Cooperative Management System for
Wireless Sensor Networks”, European Workshop on Wireless Sensor Networks, Istanbul, Turkey,
Jan 2005.
[12] S. Rost, and H. Balakrishnan, “Memento: A Health Monitoring System for Wireless Sensor
Networks”, IEEE SECON, Reston, VA, Sep 2006
[13] W. L. Lee, A. Datta, and R. Cardell-Oliver, “WinMS: wireless sensor network-management
system, an adaptive policy-based management for wireless sensor networks”, Tech. Rep. UWACSSE-06-001, The University of Western Australia, June 2006.
[14] http://www.distlab.dk/sensornet
Dedicated Networking Solutions for Container Tracking
System
Daniel Rogoz 1, Dennis Laffey2, Fergus O’Reilly 1, Kieran Delaney 1, Brendan O’Flynn2
1 TEC Centre, Cork Institute of Technology, Rossa Avenue, Cork
(daniel.rogoz, fergus.oreilly, kieran.delaney)@cit.ie
2 Tyndall National Institute, Cork
(dlaffey, boflynn)@tyndall.ie
Abstract
TEC Centre researchers at CIT, in collaboration with the Tyndall National Institute, are currently developing a container management and monitoring system using Wireless Sensor Networks (WSNs), with the support of Cork Port and the local company Nautical Enterprises. The system is designed to integrate seamlessly with existing container management and monitoring techniques at the port, efficiently and at low cost extending their capabilities with remote querying, localization and security. To achieve its goals, the system exploits the capabilities of wireless sensor network nodes used as container tags, forming a wireless, ad-hoc network throughout the container yard. The paper will briefly describe the current project status, which includes hardware solutions developed by Tyndall (a dedicated WSN hardware platform) and software solutions developed by the TEC Centre, such as specialized graphical user interfaces on PDAs (based on the .NET Compact Framework) or laptops, and applications for WSN motes running the TinyOS operating system to provide full system functionality at the one-hop communication level. The paper will further introduce current work being done to overcome the main project challenge: the physical, visual and, most constraining, radio shielding of the containers. By implementing multi-hopping and ad-hoc routing techniques, the system will exploit the stacked and rowed containers to forward information from one to the next, thus allowing intelligent and reliable communication from the depths of the port/yard to the management system user. The system's power constraints will be addressed by using a power-efficient MAC layer placed underneath the routing protocol, extending the system lifetime considerably and enabling it to operate throughout the whole container management cycle on ordinary batteries.
Keywords: wireless sensor networks, applications, asset tracking.
1 Introduction
1.1 Project background and rationale
Ireland's island status and large external trade make efficient, low-cost and fast trans-shipment of goods a strategic economic requirement. The efficient and timely flow of container traffic through Ireland's ports is of vital importance to maintaining Ireland's competitiveness and export-driven economy. With Irish ports operating as economic gateways, container traffic is on/off-loaded, moved and stacked in tiers for further shipment or transfer to the rail or road network. Within ports, the order of on/off-loading, placement in the storage yard, stacking and equipment levels are all key to maintaining an efficient, low-cost operation. Organizational mistakes can have significant time and labour costs. Four ports in the Republic and two in Northern Ireland have load-on/load-off (lo-lo) container services. In 2003, the total island traffic was 1,007,261 TEU (Twenty-foot container Equivalent Units). The ports vary in throughput from Warrenpoint at 9,712 TEU to Cork at 137,246 TEU to Dublin at 495,862 TEU. This shows the potential for technology innovations. In Ireland, lo-lo traffic in 2003 grew at twice the global average, and projections forward to 2008 estimate a total growth of 26% in the five years to 2008. These trade figures show the continuation of a strong import/export business and the opportunity for ports to invest in their infrastructure requirements [1,2].
1.2 Container tracking scenario
Within the container terminal, containers are stored in rows, divided into slots. In the Port of Cork, the main yard area consists of 80 rows with 20 slots in each row. Figure 1 shows the structure of the rows and slots, which provide a regular storage area. The tracks between the containers allow the wheels of the straddle carriers to pass over the containers.
Figure 1. Port container storage area and container markings used for identification
Steel-walled containers currently used globally are tagged with a registration and classification identifier code, which consists of the supplier and container number together with a description code. The Supplier/Unit Number code consists of a 4-letter code followed by 6 numeric digits; this code is unique to each container. In addition, a 4-digit alphanumeric type/classification code identifies the type of container and its use. Using the full set of codes, each individual container can be uniquely identified. A sample of one such code is shown in Figure 1.
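As a small illustration of the identifier format described above, the following sketch validates and splits such a marking; the example owner code "TRLU 482557" and type code "45G1" are hypothetical values, not taken from the paper.

```python
import re

# 4-letter supplier/unit prefix + 6 digits, plus a 4-character type/classification code.
ID_PATTERN = re.compile(r"^(?P<owner>[A-Z]{4})\s?(?P<serial>\d{6})$")
TYPE_PATTERN = re.compile(r"^[A-Z0-9]{4}$")

def parse_container_markings(unit_code: str, type_code: str):
    m = ID_PATTERN.match(unit_code.strip().upper())
    if not m or not TYPE_PATTERN.match(type_code.strip().upper()):
        return None                      # not a valid marking per the described format
    return {"owner": m.group("owner"), "serial": m.group("serial"),
            "type": type_code.strip().upper()}

print(parse_container_markings("TRLU 482557", "45G1"))
# -> {'owner': 'TRLU', 'serial': '482557', 'type': '45G1'}
```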
In the Port of Cork, all equipment for loading/unloading containers is fitted with a portable terminal. This allows the machinery operator to key in the last 4 digits of the container identification, identifying the container, and then the operation carried out on the container. The main equipment used in handling on the terminal consists of the gantry cranes used at the port/sea interface and the straddle carriers used to load/unload trucks at the port/land interface. This equipment is networked wirelessly back to a central management system which records the movements and operations carried out and then interfaces with the accounting/management systems for customs and payment purposes.
Currently the on-site management, tracking, location monitoring and auditing of containers at Irish container ports is a laborious and time-consuming process, necessitating manual location, identification and tracking of containers. This imposes significant costs on the relatively small and economically weaker ports and trans-shipment centres. Mistakes or delays result in additional expenses, all of which impact on the economic costs of export and import. Additionally, systems as used in large ports abroad, e.g. Rotterdam, are too expensive and unjustifiable given the scale of traffic levels in the Irish ports.
In large ports such as Rotterdam, machine vision systems are used to read container numbers and automatically identify them from databases. These vision systems are costly to install and maintain and are only justified for large volumes of traffic. They suffer from poor weather conditions, which make vision difficult, from damage to the numerals and from dirt covering the numerals on the containers; these conditions are prevalent in Ireland. Machine vision systems also do not allow for remote finding and identification of containers, only of those currently in vision.
Other container tracking solutions are mainly based on passive electronic tagging and RF-ID. Passive electronic tags/RF-ID, which respond when queried, will give the identification of specific containers, but fail to provide remote location/identification and will not allow for monitoring. Passive tags do not have the ability to network to allow communication with tags out of range. Passive tags also do not have the capability for container monitoring and/or protection. Such monitoring can extend to ensuring that containers are not entered/exited, e.g. for terrorism/immigration purposes, and to ensuring that containers are handled correctly in a yard by measuring vibrations, forces exerted, etc.
1.3 Proposed solution and the challenges
This project proposes a wireless sensor based tracking system which will allow the tagging, identification and tracking of shipping containers from when they enter a port to when they depart for their final destination. This will be low cost and efficient for smaller ports to use, allowing them competitive equality, especially in the regional areas.
We propose using sensor-network-derived Sensor Identification Devices (SIDs) to self-identify, track and help manage the individual containers in a yard. These will be self-contained identification devices, approximately the size of a cigarette box, with radio networking capability. The SID devices will be attached, in a removable manner, to each container entering the yard and will store identification information regarding the container, its source, its destination and any important information regarding its contents. Each SID will run off an enclosed battery and be capable of communicating via radio over a short distance of approximately 30 m with either other SID devices or handheld readers/PDAs. When containers are stacked and placed in rows, SIDs will use multi-hopping and ad-hoc networking techniques to forward information from one to the next and allow communication from the depths of a stack/row to an outside point.
The prototype SID devices are based on existing technology developed at CIT and the Tyndall National Institute. Tyndall National Institute has developed and tested a flexible sensor network platform [3,4], thus providing much of the base hardware for implementing the SID devices. The container tags' wireless communication will use the ZigBee standard in the unlicensed ISM RF band.
The main challenge in the realization of the project is the environment: the fact that containers provide physical, visual and radio shielding. The radio frequency communication range is limited by multipath propagation in the presence of steel containers; phenomena such as reflection and diffraction are omnipresent. In order to facilitate accessibility to each container tag, an approach different from direct communication needs to be taken. The manner of container deployment is highly unpredictable, imposes no fixed infrastructure on the network formed by the container tags, and is moderately dynamic as containers are deployed and removed on a regular basis. Lastly, the battery operation of the tags limits their lifetime, making power efficiency a significant issue.
2 Current status
2.1 System overview
Our system consists of wireless sensor nodes [3,4] acting as container tags. The tags communicate wirelessly using the 2.4 GHz unlicensed ISM frequency band, and access to the tags is enabled through a gateway. The gateway can be connected either to a PDA or to a PC/laptop, acting as a bridge that forwards messages from the serial connection to RF and back. For user interaction with the system we have developed a specialised Graphical User Interface, which can run on any Windows Mobile PDA or Windows PC.
The system functionality is a substitute for the current container management and tracking methods used in the port storage yard. First of all, it enables RF communication with the tags, establishing connections with the nodes to either find an active tag (by known container number), find an empty tag (by known mote ID, which is a unique tag number), or simply discover all tags in direct communication range (in which case no information is required). The system provides a Beacon function to physically locate the tags, locating a tag using the RSSI (received signal strength)/hop count and an LED indicator. The container location can also be determined by accessing the stored location data (row/slot).
All the container information stored on the tag can be accessed by querying the tags. The full
container information can be displayed, including container number, type, arrival and departure dates,
location, owner, and any additional information. The user can change the data stored to update the
container information. The system allows activation of new/empty tags and storage of relevant data as
well as tag deactivation (resetting the data). Figure 2 summarizes the system architecture.
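The tag-finding interactions can be illustrated with the small sketch below. The message fields, example values and the RSSI-sorted "walk towards the strongest signal" heuristic are assumptions for illustration, not the system's actual query protocol.

```python
def find_tags(responses, container_number=None, mote_id=None):
    """responses: list of dicts {"mote_id", "container_number", "rssi_dbm"}
       returned by tags that answered the gateway's broadcast query."""
    hits = [r for r in responses
            if (container_number is None or r["container_number"] == container_number)
            and (mote_id is None or r["mote_id"] == mote_id)]
    # Strongest signal first: the most likely candidate to walk towards with the
    # beacon LED enabled.
    return sorted(hits, key=lambda r: r["rssi_dbm"], reverse=True)

replies = [{"mote_id": 7, "container_number": "TRLU482557", "rssi_dbm": -55},
           {"mote_id": 9, "container_number": None, "rssi_dbm": -71}]
print(find_tags(replies, container_number="TRLU482557"))
```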
Figure 2. System overview (the PDA or laptop GUI connects over a serial link to the gateway node, which communicates wirelessly with the container tags)
2.2 Graphical user interface
In order to facilitate user interaction with the system, we have developed a specialised Graphical User Interface. Its purpose is to provide the user with the full system functionality through a PDA or PC screen, thus hiding the underlying complexity of the system. It acts as a bridge between a PDA/laptop and the WSN/container tag network: it communicates through a serial connection with a gateway node, which in turn interacts with the deployed container tags wirelessly using ZigBee standard RF communication. As the mobility of the user interface is a key aspect (it has to interact with the container tags deployed in the yard), we have based the GUI application on a widely established Windows Mobile based PDA device. The gateway is attached to the PDA using a dedicated serial cable. Figure 3 shows the PDA interface.
Figure 3. Graphical User Interface running on a PDA
As the Windows Mobile GUI is based on the .NET Compact Framework, which is to some extent a subset of the full .NET Framework, the same GUI application can be launched on an ordinary Windows XP PC with the .NET Framework installed, without any additional changes. Figure 4 shows the same GUI running on a desktop Windows XP PC.
Figure 4. Graphical User Interface running on a desktop PC (Windows XP)
2.3 Gateway and container tags
2.3.1 Hardware platforms
Tyndall National Institute has provided the hardware solution for the project, based on its DSYS25z wireless sensor node [3,4]. The node is built around an ATmega128 microcontroller and an Ember EM2420 radio transceiver (a Chipcon CC2420 counterpart); an RF monopole antenna is used for communication. The gateway and container tag modules are essentially the same, with the exception of an external serial connector on the gateway module. Both are enclosed in waterproof, RF-transparent boxes with an external power switch and two LEDs. Figure 5 shows an example tag, sealed and open with the Tyndall mote visible.
Figure 5. Wireless sensor nodes used as container tags.
2.3.2 Software solutions
Wireless sensor network applications are tightly bound to particular hardware, manipulating hardware resources to execute high-level logic tasks. In WSNs, applications are specialized and hardware resources are very limited; therefore accurate control of how these resources are used is essential, making the software development process long and error-prone. In addition, changing the platform requires repeating the whole development process. Programming the motes using conventional methods can be challenging, especially when utilizing more complex algorithms such as ad-hoc networking. Therefore, as the software platform for the container tags we have chosen TinyOS, a dedicated operating system for wireless sensor networks [5]. The gateway and container tag motes run applications written in nesC [6,7], a C-like programming language of TinyOS.
TinyOS
TinyOS is a multi-platform sensor network operating system designed by the U.C. Berkeley EECS Department to address the specific needs of embedded wireless sensor networks. TinyOS is a set of “blocks”, each representing a certain piece of functionality, from which the programmer builds an application by “snapping” or “wiring” these components together for a target hardware platform. These components can be high-level logic (such as routing algorithms) or software abstractions for accessing hardware resources (such as radio communication, ADC, timers, sensors or LEDs), and they interact through well-defined bi-directional interfaces (sets of functions), which are the only access points to a component. The bi-directionality of interfaces allows split-phase operation, making commands in TinyOS non-blocking. The component structure goes from the top-level logic layer down to the platform-dependent hardware presentation layer. By replacing a component we can change algorithms, change hardware platforms or extend a platform's functionality.
TinyOS supports a high level of resource-constrained concurrency in the form of tasks and hardware event handlers as two separate threads of execution. Task scheduling allows the implementation of power-saving algorithms; the mote can go into a sleep mode, saving energy while waiting for an event to occur. [5,6,7]
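Split-phase execution can be illustrated with a small model outside of nesC. In the C sketch below (purely illustrative; the real container tag code is nesC wired through TinyOS interfaces), a send command only queues the request and returns immediately, and completion is reported later by a callback run as a posted task from a simple FIFO scheduler, mirroring the task and event model described above. All function names are invented for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative model of TinyOS-style split-phase execution: a command
 * returns immediately and completion is signalled later by a task run
 * from a simple FIFO scheduler. */

typedef void (*task_t)(void);

#define TASK_QUEUE_LEN 8
static task_t task_queue[TASK_QUEUE_LEN];
static size_t head, tail, count;

static bool task_post(task_t t)            /* analogous to a TinyOS "post" */
{
    if (count == TASK_QUEUE_LEN) return false;
    task_queue[tail] = t;
    tail = (tail + 1) % TASK_QUEUE_LEN;
    count++;
    return true;
}

static void send_done(void)                /* completion event (second phase) */
{
    printf("sendDone: radio transfer finished, buffer free again\n");
}

static bool radio_send(const void *payload, size_t len)  /* first phase */
{
    (void)payload;
    /* Hand the buffer to the (hypothetical) radio driver and return
     * immediately -- the command never blocks waiting for the radio. */
    printf("send: %zu-byte frame queued for transmission\n", len);
    return task_post(send_done);           /* completion reported later */
}

int main(void)
{
    unsigned char frame[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    radio_send(frame, sizeof frame);

    /* Scheduler loop: run posted tasks to completion, one at a time.
     * When the queue is empty a real mote would enter a sleep mode. */
    while (count) {
        task_t t = task_queue[head];
        head = (head + 1) % TASK_QUEUE_LEN;
        count--;
        t();
    }
    return 0;
}
```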
Gateway application
The basic role of the gateway is to act as a bridge between the PDA, connected via serial cable, and the container tag network, accessible through the wireless RF connection. It forwards messages received over UART to RF, and vice versa. The messages follow a specific packet structure defined by TinyOS, the Active Message, containing the destination address, message type, group ID, length, CRC and message payload. Based on this information the gateway can check that a message is not corrupted and filter out messages transmitted from outside the system; it can also assess the strength of the received radio signal.
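To illustrate the checks the gateway can perform on this information, the sketch below defines a packet layout mirroring the fields listed above and a predicate that drops corrupted packets or packets originating outside the system's group. The field order, sizes, the group ID and the crc16 routine are assumptions made for the sketch; they loosely follow the TinyOS 1.x Active Message layout rather than reproducing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_MAX 29          /* assumed maximum payload size            */
#define MY_GROUP_ID 0x7D        /* hypothetical group ID for this system   */

/* Packet layout mirroring the Active Message fields named in the text. */
typedef struct {
    uint16_t dest_addr;         /* destination address                     */
    uint8_t  type;              /* message type                            */
    uint8_t  group;             /* group ID                                */
    uint8_t  length;            /* payload length                          */
    uint8_t  payload[PAYLOAD_MAX];
    uint16_t crc;               /* checksum over the bytes preceding it    */
    int8_t   rssi_dbm;          /* signal strength filled in by the radio  */
} am_packet_t;

/* Placeholder CRC-16 -- a real gateway would use the radio's own CRC. */
static uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    while (len--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Decide whether a packet received over RF should be forwarded to the PDA. */
static bool gateway_accepts(const am_packet_t *p)
{
    if (p->group != MY_GROUP_ID)                      /* foreign traffic    */
        return false;
    if (p->length > PAYLOAD_MAX)                      /* malformed length   */
        return false;
    uint16_t expected = crc16((const uint8_t *)p, offsetof(am_packet_t, crc));
    return expected == p->crc;                        /* corrupted packets  */
}
```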
Container tag application
The container tag stores detailed container information and makes the information accessible to
the system user. It communicates with the gateway using only radio packets, receiving commands and
responding accordingly, to provide the functionality described in 2.1.
2.4
Current system deployment
Our current system testbed consists of a small-scale, 4-container-tag network, a gateway node and a PDA interface. The RF communication is based on a direct, single-hop connection between the gateway and the nodes; the network currently has a star topology. At this stage the full system functionality described in 2.1 is implemented, with the exception of the hop count indication. This setup provides the testbed for connectivity tests and makes it possible to measure the RSSI and the packet drop rate.
Figure 6. Current, single-hop, communication setup
2.5
Feasibility test results
To verify the feasibility of this container management system solution, we performed a number of tests with containers stacked in various combinations in one of the Cork Port container yards. The empty container storage area was used as a test site, as the active storage areas were inaccessible due to the normal operation of container loading/unloading equipment within those areas. Two key tests were performed. In the first, with the setup pictured in Figure 7, the containers were placed in a grid of 4 rows, 3 containers long and stacked 3 containers high. One of the tags acted as a receiver and the other transmitted from various locations. For each location the average signal strength was measured as well as the packet delivery rate. The results showed packet delivery rates ranging from 73% to 100% and RSSI from -85 dBm to around -40 dBm. This test proved that communication between tags that are up to 2 containers apart is possible with sufficient reliability.
Figure 7. Test setup used for verifying container-to-container communication
For the second test we used a single tag emitting only a raw 2.4 GHz carrier and a directional antenna with a spectrum analyzer. The tag was attached to the container door and the containers were placed 10 cm apart. Figure 8 shows the setup of this test. We measured the signal strength within a 5 m radius, with the receiving antenna pointing in the direction of the gap between the containers, where the transmitter had been placed. The results showed that the transmitter antenna radiates from the gap between the containers over a wide angle, not only along the narrow line of sight, which makes container-to-container communication feasible. The test results are included in Figure 8 below.
Figure 8. Shape of the antenna transmission field radiating from the tag attached to the container door
3
Future work
Most of the functional targets of the project have already been fulfilled, but two key aspects still remain to be addressed. One of them is the accessibility of every single container tag within a network from any point in the container yard. This means that a multi-hop networking protocol (such as in [8]) has to be implemented, exploiting the manner of container placement (stacked rows) to allow the tags to forward radio messages from the gateway deep into the network. In this way direct communication range is not necessary to access a tag. The multi-hop protocol cannot rely on any fixed topology, as a user should be able to connect to the network from any place, provided that at least one tag is within the gateway's communication range. Containers are loaded and unloaded on a regular basis, so the container tags will be constantly entering and leaving the network, and the size of the network is not known in advance. The networking protocol therefore has to be reconfigurable, scalable and moderately dynamic, since the user will be mobile.
Figure 9. Multi-hopping communication scheme in container yard
The other aspect is power efficiency, required to extend the system lifetime to a reasonable amount of time (i.e. months). A power-conserving radio Media Access Control (MAC) protocol, similar to the ones described in [9,10], should be used to manage the radio state by switching it off when idle, as the radio used in the project consumes roughly the same amount of current whether in idle, receive or transmit mode.
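As a rough illustration of the duty-cycling such a MAC performs, the sketch below keeps the radio off for most of each period and powers it only for a short listen window, waking fully only when channel activity is detected; this is the general idea behind the low-power-listening and scheduled channel polling protocols cited in [9,10]. All function names and timing constants are placeholders, not part of any specific protocol or of this project's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder radio/MCU hooks -- provided by the platform on a real tag. */
extern void radio_on(void);
extern void radio_off(void);
extern bool radio_channel_active(void);    /* energy/preamble seen while listening */
extern void radio_receive_frame(void);     /* stay on and receive the incoming frame */
extern void wait_ms(uint32_t ms);          /* short wait with the radio powered      */
extern void deep_sleep_ms(uint32_t ms);    /* MCU and radio sleep between polls      */

#define POLL_PERIOD_MS   500               /* assumed check interval                 */
#define LISTEN_WINDOW_MS 5                 /* assumed brief listen window            */

/* Duty-cycled MAC loop: the radio is off almost all of the time and is
 * only powered for a short poll once per period. */
void mac_duty_cycle_loop(void)
{
    for (;;) {
        radio_on();
        wait_ms(LISTEN_WINDOW_MS);         /* sample the channel briefly             */
        if (radio_channel_active())
            radio_receive_frame();         /* wake fully only when addressed         */
        radio_off();
        deep_sleep_ms(POLL_PERIOD_MS - LISTEN_WINDOW_MS);
    }
}
```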
The current work focuses on introducing these multi-hopping ad-hoc routing techniques into the networking part of the system and on addressing the power constraints through a power-efficient MAC protocol. The aim of this project is to prove the technology and concepts for a container management and tracking application, and at the end of the project we expect to deliver a medium-scale (10-tag) system demonstrator employing suitable multi-hopping ad-hoc techniques (Figure 9) and providing the functionality described in the system specification.
4
Conclusions
The proposed solution for a container management and tracking system is competitive with existing solutions, has the potential for future enhancement and is an attractive opportunity for commercialization. The two enabling technologies provided by the system are intelligence and networking capabilities. The combination of these technologies enables several innovations which are not present in existing tracking systems.
System intelligence allows containers to be tracked and monitored individually, permitting them to log their own conditions/movement and generate alarms where appropriate. This gives the capability to extend to security/immigration control purposes and to automatically audit container management in the port. Passive or unintelligent tagging, e.g. RFID, does not allow for this future value-added capability. The ability to record/monitor each container's stay in a port will allow for quality management procedures and provide an automatic electronic audit trail. This is important for trans-shipments of food, valuable and dangerous goods.
The sensor-based multi-hop networking capability allows potential access and communication to all containers within a yard from a central location, without needing to visit them individually. This provides radio communication in what is in fact a difficult radio environment, due to the significant quantity of steel present. The system will allow full scalability in operation, scaling with port development and growth in container traffic.
Given the low infrastructural overheads, the system would be affordable across a wide range of port sizes and can grow through the networking function without significant additional outlay.
Acknowledgements: This work is carried out as part of the Enterprise Ireland funded Project
Containers [11] PC/2005/126, and the support of all project partners is recognised.
References
[1] Irish Maritime Transport Economist, Sept. 2004, published by IMDO-Ireland;
[2] Irish Short Sea Shipping, Inter-European Trade Corridors, 2004, published by IMDO-Ireland;
[3] S.J. Bellis, K. Delaney, B. O'Flynn, J. Barton, K.M. Razeeb, and C. O'Mathuna, “Development of field programmable modular wireless sensor network nodes for ambient systems”, Computer Communications, Special Issue on Wireless Sensor Networks and Applications, Volume 28, Issue 13, 2 August 2005, Pages 1531-1544;
[4] B. O'Flynn, S. Bellis, K.Mahmood, M. Morris, G. Duffy, K. Delaney, C. O'Mathuna “A 3-D
Miniaturised Programmable Transceiver”, Microelectronics International, Volume 22, Number
2, 2005, pp. 8-12;
[5] Hill J, Szewczyk R, Woo A, Hollar S, Culler D, Pister K “System architecture directions for
networked sensors” SIGOPS Oper. Syst. Rev., Vol. 34, No. 5. (December 2000), pp. 93-104;
[6] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, D. Culler. “The nesC language: A
holistic approach to networked embedded systems”;
[7] D. Gay, P. Levis, D. Culler, E. Brewer. “nesC 1.1 Language Reference Manual”, May 2003;
[8] C. Gomez, P. Salvatella, O. Alonso, J. Paradells. “Adapting AODV for IEEE 802.15.4 Mesh
Sensor Networks: Theoretical Discussion and Performance Evaluation in a Real Environment”
International Symposium on a World of Wireless, Mobile and Multimedia Networks, 2006
(WoWMoM'06);
[9] W. Ye, F. Silva, J. Heidemann “Ultra-Low Duty Cycle MAC with Scheduled Channel Polling”
in Proceedings of the 4th ACM Conference on Embedded Networked Sensor Systems (SenSys),
Boulder, Colorado, USA, Nov., 2006;
[10] J. Polastre, J. Hill, D. Culler. “Versatile low power media access for wireless sensor networks”
In Proceedings of the Second ACM Conference on Embedded Networked Sensor Systems
(SenSys), November 3-5, 2004;
[11] D. Laffey, D. Rogoz, B. O’Flynn, F. O’Reilly, J. Buckley, J. Barton. “Containers – Innovative
Low Cost Solutions for Cargo Tracking” Information Technology & Telecommunications
Conference 2006, Institute of Technology, Carlow, October 25-26, 2006. Proc pp 187-188;
Handover Strategies in Multi-homed Body Sensor
Networks
Yuansong Qiao 1,2,3, Xinyu Yan 1, Adrian Matthews 1, Enda Fallon 1, Austin Hanley 1,
Gareth Hay 4, Kenneth Kearney 4
1 Applied Software Research Centre, Athlone Institute of Technology, Ireland
2 Institute of Software, Chinese Academy of Sciences, China
3 Graduate University of Chinese Academy of Sciences, China
4 Sensor Technology + Devices Ltd
[email protected], [email protected], {amatthews, efallon, ahanley}@ait.ie,
{Gareth.Hay, Kenneth.Kearney}@stnd.com
Abstract
Wearable wireless medical body sensor networks provide a new way of continuous monitoring and
analysis of physiological parameters. Reliable transmission of real-time vital signs is a basic
requirement for the design of the system. This paper explores multi-homing to increase data
reliability for body sensor networks. It proposes a multi-homed body sensor network framework
and investigates handover strategies during sensor node movement.
Keywords: Multi-homing, Body Sensor Network, Handover
1
Introduction
Wireless sensor networks have been developing rapidly in recent years. Much effort has been put into the exploration of wireless sensor network applications. Body sensor networks for medical care are an emerging branch amongst these applications. They use wearable sensors to continuously monitor patient vital signs such as respiration, blood oxygen, temperature and the electrocardiogram (ECG). The real-time vital sign information can be delivered to doctors, nurses or other caregivers through the communication module in the wireless sensor node. Through a body sensor network, patient status monitoring can be extended from the hospital to the home, the workplace or other public locations. Any change in patient status can be reported immediately to the corresponding responders. This can expand the reach of current healthcare solutions, provide more convenience for patients and potentially increase patient survival probability in emergency situations such as heart attack [1].
Although a body sensor network is derived from a sensor network, there are several significant differences between the two [2]. Unlike common sensor networks, the data rate in a body sensor network may range widely according to the medical monitoring task. Life-critical data should be delivered reliably. Furthermore, medical data usually cannot be aggregated by the network because the data is generated from different patients. Consequently, the technologies used in common sensor networks cannot be applied directly to body sensor networks. Nevertheless, these features make it possible for a body sensor network to take advantage of traditional Internet technologies.
Currently, many solutions for body sensor networks use a Personal Digital Assistant (PDA) carried by the patient to gather data from the sensors and forward it to a central server through cellular networks [3][4][5]. This paper investigates utilizing multi-homing technologies (a node with multiple network interfaces) in a body sensor network to increase data delivery reliability and decrease data delay in the case of network failures. The sensor node transfers data directly to ambient network nodes without the need for a bulky coordinating unit carried by the patient. In particular, this paper studies the handover strategy of the multi-homed sensor node. As the patient is mobile, the transmission distance of the sensor node is short and the wireless signal suffers interference from the environment, network handovers will occur frequently.
Despite the fact that Internet protocols usually cannot be used directly in sensor networks, the algorithms in those protocols are still valuable for the design of such networks. Multi-homing technologies, where a host can be addressed by multiple IP addresses, are increasingly being considered by the Internet community. Two multi-homing transport protocols have been proposed to date: the Stream Control Transmission Protocol (SCTP) [6] and the Datagram Congestion Control Protocol (DCCP) [7]. DCCP is an unreliable transport protocol with congestion control, whereas SCTP is a reliable transport layer protocol that employs a congestion control mechanism similar to TCP's. As this paper focuses on reliable data transmission and handover strategies in multi-homed body sensor networks, SCTP is employed in the simulations. The performance of SCTP for bulk data transmission has been studied in [8]; this paper focuses on SCTP performance in delay-sensitive situations.
This paper is organized as follows. Section 2 discusses related work. Section 3 presents the system
architecture. Section 4 analyzes handover strategies. Section 5 discusses conclusions and future work.
2
Related Work
In [3], a remote heart monitoring system is proposed. It transmits ECG signals to a PDA which
forwards the signals to the central server through the cellular network. In [4], a wearable MIThril
system is proposed. It uses a PDA to capture ECG data, GPS position, skin temperature and galvanic
skin response. In [5], a body sensor network hardware development platform is presented. It is also
based on the sensor node plus PDA solution.
SCTP [6][9][10] is a reliable, TCP-friendly, message-oriented transport layer protocol defined by the IETF. The features of multi-homing, multi-streaming, partial reliability [11] and unordered delivery make SCTP suitable for transmission of real-time data in multi-homed contexts. SCTP supports link backup for a multi-homed endpoint through its built-in multi-homing feature. Data is transmitted on the primary path. Retransmission is performed on an alternate path. The handover mechanism of SCTP is based on link failures: after a primary path failure is detected, data will be sent on the backup path.
2.1
Path Failure Detection and Handover Algorithms in SCTP
SCTP is designed to tolerate network failure and therefore provides a mechanism to detect path failure. For an idle destination address, the sender periodically sends a heartbeat chunk to that address to detect whether it is reachable and to update the path Round Trip Time (RTT). A heartbeat chunk is sent every path RTO (Retransmission TimeOut) plus the SCTP parameter HB.interval, with jittering of +/- 50% of the path RTO. The default value of HB.interval is 30 s. RTO is calculated from the RTT, which is measured from non-retransmitted data chunks or heartbeat chunks. For a path carrying data, reachability can be determined by observing data chunks and their SACKs. When the acknowledgement for a data chunk or a heartbeat chunk is not received within an RTO, the path RTO is doubled and the error counter of that path is incremented. On a data chunk timeout, the sender retransmits the data chunks through an alternate path. On a heartbeat chunk timeout, the sender sends a new heartbeat chunk immediately. When the path error counter exceeds the SCTP parameter PMR (Path.Max.Retrans), the destination address is marked as inactive and the sender immediately sends a new heartbeat chunk to probe the destination address. After this, the sender will continue to send heartbeat chunks to the address once per RTO, but the error counter will not be incremented. When an acknowledgement for an outstanding data chunk or a heartbeat chunk sent to the destination address is received, the path error counter is cleared and the path is marked as active. If the primary path is marked as inactive, the sender will select an alternate path to transmit data. When the primary path becomes active again, the sender will switch back to it.
The path failure detection time is determined by the SCTP parameters PMR and RTO. The default PMR value in SCTP is 5, which means that SCTP needs 6 consecutive transmission timeouts to detect a path failure. The RTO is doubled after each transmission timeout and is bounded by the SCTP parameters RTO.Min and RTO.Max, whose default values are 1 s and 60 s respectively. If the RTO is 1 s (RTO.Min) when a path failure occurs, the minimum time for detecting the failure is 1+2+4+8+16+32 = 63 s. However, the initial RTO could be as large as 60 s (RTO.Max), so the maximum path failure detection time is 6*60 = 360 s.
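These figures follow directly from the exponential RTO back-off, as the small calculation below shows. The function is a sketch of the standard SCTP timer rules rather than code from any particular stack; with the default parameters it reproduces the 63 s and 360 s values quoted above and can equally be evaluated for the smaller PMR settings examined in Section 4.

```c
#include <stdio.h>

/* Time (in seconds) SCTP needs to declare a path failed: PMR + 1
 * consecutive timeouts, with the RTO doubling after each timeout but
 * clamped to the [rto_min, rto_max] range. */
static double failure_detection_time(int pmr, double rto_initial,
                                     double rto_min, double rto_max)
{
    double rto = rto_initial;
    if (rto < rto_min) rto = rto_min;
    double total = 0.0;
    for (int i = 0; i <= pmr; i++) {          /* PMR + 1 timeouts in a row */
        if (rto > rto_max) rto = rto_max;
        total += rto;
        rto *= 2.0;                           /* exponential back-off      */
    }
    return total;
}

int main(void)
{
    /* Defaults: PMR = 5, RTO.Min = 1 s, RTO.Max = 60 s. */
    printf("best case  (RTO starts at RTO.Min): %.0f s\n",
           failure_detection_time(5, 1.0, 1.0, 60.0));   /* 63 s  */
    printf("worst case (RTO starts at RTO.Max): %.0f s\n",
           failure_detection_time(5, 60.0, 1.0, 60.0));  /* 360 s */
    printf("with PMR=0, RTO at RTO.Min:         %.0f s\n",
           failure_detection_time(0, 1.0, 1.0, 60.0));   /* 1 s   */
    return 0;
}
```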
3
System Design
The wearable sensor node is deployed on the patient. Each node has multiple Bluetooth [12] interfaces, which are connected to separate ambient Bluetooth access points. The sensor node selects one network interface to transmit data. If that network interface fails, it switches to another interface to transmit data. The failed interface keeps searching for available access points and attaches to one of them, excluding those already used by the other interfaces.
Figure 1: Architecture of Body Sensor Network Node
The architecture of the body sensor node is shown in Figure 1. The communication entity includes
three modules:
Network Status Measurement (NSM): NSM provides local and end-to-end network dimensioning
information such as available access points, available bandwidth, delay, jitter, and loss to other
modules in the system.
Network Handover Management (NHM): NHM selects one of the available access points provided by NSM.
Path Handover Management (PHM): PHM manages end-to-end switchover amongst connections
between the source sensor node and the destination central server. A connection is identified by source
address and destination address. PHM makes a path handover decision based on the network status
information provided by NSM. The handover strategies will be discussed in the next section.
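A highly simplified sketch of how the two selection levels could fit together is shown below. The structures, field names and the greedy selection rules are assumptions made purely for illustration; they are not part of the framework's specification.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {                 /* local view reported by NSM              */
    int  ap_id;
    int  rssi_dbm;
    bool in_use_by_other_iface;  /* already taken by another interface      */
} ap_status_t;

typedef struct {                 /* end-to-end view reported by NSM         */
    bool   reachable;            /* SACKs / heartbeats still arriving        */
    double rtt_s;                /* could be used to break ties              */
} path_status_t;

/* NHM: pick the strongest access point not already used by another
 * interface; returns -1 when no candidate is available. */
static int nhm_select_ap(const ap_status_t *aps, size_t n)
{
    int best_id = -1, best_rssi = -127;
    for (size_t i = 0; i < n; i++) {
        if (aps[i].in_use_by_other_iface) continue;
        if (aps[i].rssi_dbm > best_rssi) {
            best_rssi = aps[i].rssi_dbm;
            best_id = aps[i].ap_id;
        }
    }
    return best_id;
}

/* PHM: keep the primary path while it is reachable; otherwise switch to
 * the first reachable alternate, falling back to the primary so that the
 * sender keeps probing it until it recovers. */
static size_t phm_select_path(const path_status_t *paths, size_t n, size_t primary)
{
    if (paths[primary].reachable)
        return primary;
    for (size_t i = 0; i < n; i++)
        if (i != primary && paths[i].reachable)
            return i;
    return primary;
}
```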
4
Investigation of Handover Strategies
This section studies the effect of the path failure threshold on transmission delay. SCTP is used in the simulations.
4.1
Simulation Setup
The simulations in this section focus on a sensor node with two Bluetooth interfaces. All simulations
in this paper are carried out by running a revision of Delaware University's SCTP module [13] for NS2 [14].
Figure 2: Sensor Node Mobility Scenario (sensor node with a 10 m transmission radius moving past access points AP1, AP2 and AP3 over a 0–104 s timeline)
Figure 3: Simulation Network Topology
In the simulations, it is assumed that the sensor node is outfitted with two Bluetooth interfaces. The transmission radius of the sensor is 10 meters (Figure 2). The patient walks at a slow speed of about 0.5 meters per second. The overlap area between access points is 20% of the transmission diameter, i.e. 4 meters. The patient walks directly from one access point to another. When an interface in the sensor node fails, it attaches to the next access point as soon as that access point becomes available.
In the current NS2-SCTP implementation, the SCTP module does not work well with the wireless module, so this paper uses a wired network to simulate network switch-off. The simulation topology is shown in Figure 3. Node S (the sensor node) and Node R are the SCTP sender and receiver respectively. Both SCTP endpoints have two addresses. R1,1, R1,2, R2,1 and R2,2 are routers. The configuration has no overlap between the two paths. The MTU of each path is 1500 B. The queue length of the bottleneck links on both paths is 50 packets; the queue length of the other links is set to 10000 packets. SCTP parameters are all defaults except those mentioned. The initial slow start threshold is set large enough to ensure that the full primary path bandwidth is used. Only one SCTP stream is used and the data is delivered to the upper layer in order. Initially the receiver window is set to 100 MB (effectively infinite).
In order to simulate the network changes, the loss rate of each bottleneck link in Figure 3 is set to 0% when the patient enters the area of the corresponding access point and to 100% when the patient leaves that area. At the initial stage, the primary path loss rate is set to 0% and the secondary path loss rate to 100%.
4.2
Simulation Results & Analysis
Figure 4: Mean of Delay (mean transmission delay in seconds versus data rate in packets/second, for PMR = 0 to 5)
Figure 5: Standard Deviation of Delay (standard deviation of transmission delay in seconds versus data rate in packets/second, for PMR = 0 to 5)
As the transmission speed for different medical monitoring tasks varies widely, the simulated data rate is varied from 10 packets per minute to 50 packets per second. CBR (Constant Bit Rate) traffic is used for data transmission. The effective payload length is 4 bytes. For each transmission speed, the PMR value is varied from 0 to 5 and the data transmission time is 1000 seconds. The mean and standard deviation of the transmission delay are calculated for each simulation, as shown in Figure 4 and Figure 5.
The results show that PMR=0 gives the minimum mean and standard deviation of delay amongst all PMR settings for all data rates. The mean and standard deviation of delay increase as the PMR value grows. However, there is a performance gap between PMR=2 and PMR>=3, whereas for PMR<=2 the performance difference between PMR values is not significant.
5
Conclusion & Future Work
This paper proposes a multi-homed medical body sensor network framework to increase data reliability and studies handover strategies in multi-homed environments. It puts forward a two-level network selection strategy: the Network Handover Management module controls local access point selection, while the Path Handover Management module controls end-to-end path selection. SCTP simulations show that smaller path failure detection thresholds achieve lower transmission delay for various data rates in the case of path failures.
Future work will study path handover strategies for multi-homed medical body sensor networks in more complex environments. Wireless transmission distance, wireless access point deployment, patient walking speed and network loss will be considered in this work.
Acknowledgements
The authors wish to recognize the assistance of Enterprise Ireland through its Innovation Partnership
fund in the financing of this Research programme.
References
[1] Guangzhong Yang (2006). Body Sensor Networks. Springer, ISBN: 978-1-84628-272-0.
[2] Victor Shnayder, Bor-rong Chen, Konrad Lorincz, Thaddeus R. F. Fulford-Jones, and Matt Welsh (2005). Sensor Networks for Medical Care. Harvard University Technical Report TR-08-05.
[3] ROSS P.E. (2004). Managing Care through the Air. IEEE Spectrum, 14-19.
[4] PENTLAND A. (2004). Healthwear: Medical Technology Becomes Wearable. IEEE Computer,
37(5): 42-49.
[5] B Lo, S Thiemjarus, R King, G Yang (2005). BODY SENSOR NETWORK – A WIRELESS
SENSOR PLATFORM FOR PERVASIVE HEALTHCARE MONITORING. The 3rd
International Conference on Pervasive Computing.
[6] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L.
Zhang, V. Paxson (2000). Stream Control Transmission Protocol. IETF RFC 2960.
[7] E. Kohler, M. Handley, S. Floyd (2006). Datagram Congestion Control Protocol (DCCP). IETF
RFC 4340.
[8] Caro, A., Amer, P., Stewart R (2006). Rethinking End-to-End Failover with Transport Layer
Multihoming. Annals of Telecommunications’, 61, (1-2), pp 92-114.
[9] Shaojian Fu and Mohammed Atiquzzaman (2004). SCTP: State of the art in Research, Products,
and Technical Challenges. IEEE Communications Magazine, vol. 42, no. 4, pp. 64-76.
[10] Randall R. Stewart, Qiaobing Xie (2006). Stream Control Transmission Protocol (SCTP) – A Reference Guide. Addison-Wesley.
[11] R. Stewart, M. Ramalho, Q. Xie, M. Tuexen, P. Conrad (2004). Stream Control Transmission
Protocol (SCTP) Partial Reliability Extension. IETF RFC 3758.
[12] IEEE 802.15.1 (2005). Wireless medium access control (MAC) and physical layer (PHY)
specifications for wireless personal area networks (WPANs).
[13] Caro, A. and Iyengar, J. ns-2 SCTP module, Version 3.5.
http://www.armandocaro.net/software/ns2sctp/.
[14] UC Berkeley, LBL, USC/ISI, and Xerox Parc (2005). ns-2 documentation and software, Version
2.29. http://www.isi.edu/nsnam/ns.
Session 6
Doctoral Symposium
Hierarchical Policy–Based Autonomic Replication
Cormac J. Doherty 1 , Neil J. Hurley 1
1 School of Computer Science & Informatics
University College Dublin
{cormac.doherty, neil.hurley}@ucd.ie
Abstract
The complexity of managing and accessing large volumes of data is fast becoming the most pertinent
problem for users of large scale information systems. However, current trends in data management
requirements and data production exceed the capability of storage systems in existence. The current
state of a solution to this problem is presented in the form of a system for policy–based autonomic
replication of data. The system supports multiple distinct replication schemes for a single data item
in order to exploit the range of consistency and quality of service requirements of clients. Driven by the traffic mix and client requirements, nodes in the system may make independent, integrated replica management decisions based on a partial view of the network. A policy-based control mechanism is used to administer, manage, and control dynamic replication and access to resources.
Keywords: Distributed systems, Data management, Replication, Autonomic
1
Introduction
As exemplified by the notions of ubiquitous computing and personal area networks, technology is penetrating and permeating everyday life to an ever-increasing degree. With this acceptance of technology
come applications demanding access to data and services from any geographical location (e-banking,
news, video–on–demand, OS updates, games etc.). As demonstrated by collaborative work environments, researchers sharing datasets across institutional and national boundaries, and remote access to
corporate VLANs, this demand for data persists in the workplace. Provisioning timely and reliable access to this data may be viewed in the abstract as a data or replica management problem. Replication
may be used to increase scalability and robustness of client applications by creating copies throughout
the system such that they can be efficiently accessed by clients. The data management issues faced by
applications and technologies are mirrored in the systems and networks that support them. We now
consider as a concrete example, a telecommunications network.
1.1
Motivation
As 3G mobile networks are deployed, and pervasive, highly heterogeneous 4G networks are developed,
a scalability crisis looms in the current network operations and maintenance (OAM) infrastructure. Due
to the trend towards ubiquitous computing environments, customers of future networks are expected to
use several separate devices, move between locations, networks and network types, and access a variety
of services and content from a multitude of service providers. In order to support this multiplication of
devices, locations, content, services and obligatory inter–network cooperation, there will be an increase
in the scale, complexity and heterogeneity of the underlying access and core networks. Furthermore,
based in part on the 2G to 3G experience, an explosive growth in the number of network elements
(NEs) to be managed is predicted. Each additional NE, type of NE, and inter-working function between
different access network technologies, adds to the volume of management data that must be collected,
queried, sorted, stored and manipulated by OAM systems. Moreover, as a result of this “always online” lifestyle and the increased size and complexity of networks, management and service-related data will grow by several orders of magnitude.
As exemplified by the OSI reference model, the Simple Network Management Protocol (SNMP)
management framework, and the Telecommunications Management Network (TMN) management framework, network management (NM) has thrived on either centralised or weakly distributed agent-manager
solutions since the early 1990s [Martin-Flatin et al., 1999]. However, the increase in size, management
complexity, and service requirements of future networks will present challenging non-functional requirements that must be addressed in order to deliver scalable OAM data management sub-systems. More distributed architectures for next generation OSS platforms are one approach to providing scalable, flexible
and robust solutions to the demands presented by future networks [Burgess and Canright, 2003].
2
Distributed Data Layer
As an enabling technology for these distributed NM systems, a distributed data layer to manage replication and data access has been developed [Doherty and Hurley, 2006, Doherty and Hurley, 2007]. As
many of the challenges posed by future networks are data management challenges, an element of distributed control and autonomy is added to manage the replication life-cycle of data items.
The degree to which the advantages of replication are experienced is dependent upon access patterns,
traffic mix, the current state of the network and the applied replication schemes. Previous work has indicated that replication schemes impact significantly on performance of distributed systems in terms of
both throughput and response times [Hurley et al., 2005]. Indeed, a bad replication scheme can negatively impact performance and as such, may be worse than no replication at all.
A fundamental observation motivating this work is the fact that the access pattern perceived by a data
item is the product of an entire population of clients. This observation is not exploited in most replication
systems. That is, a system applying replication treats the arrival stream to a data item as though it were
generated by a single client. The system then attempts to generate a “one size fits all” replication scheme
to suit this client. As such, the range of consistency and quality of service requirements of all clients
contributing to an arrival stream is not taken into account when developing replication schemes.
In order to account for and exploit the various classes of client that contribute to the arrival stream
experienced by a data item, multiple distinct replication schemes are simultaneously applied to a single
data item so as to best satisfy the requirements of all classes of client. To provide this additional feature
of dynamic replication, policies are introduced to the system that must be enforced by all nodes.
2.1
Policy Based Replication
In order to account for node heterogeneity and control resources available to the distributed data layer, the
role a particular node plays in the network is controlled using a policy. Node policies are defined by an
administrator and specify how a particular node can be used in terms of network, storage and processing
resources. Node policies are used in determining which data items can be replicated on a specific node.
Two data-centric policies are used to control replication. A data item policy specifies upper and lower bounds on consistency-related parameters and performance metrics that must be maintained by any replica of the data item to which the policy refers. Associated with each instance of a logical data item is a replica policy, indirectly describing the level of consistency maintained by that replica and the request-related performance metrics its host is prepared to maintain; replica policies are bound by data item policies. A replica policy defines a particular point in the parameter space defined by a data item policy.
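This relationship can be pictured as a simple bounds check: a replica policy is admissible only if every parameter it fixes lies within the interval allowed by the governing data item policy. The concrete parameters in the sketch below (staleness and response time) are illustrative choices only; the actual parameter set is not enumerated here.

```c
#include <stdbool.h>

/* Example parameters only -- the actual parameter space is not
 * enumerated in the text. */
typedef struct {
    double max_staleness_s;      /* how far a replica may lag behind         */
    double max_response_ms;      /* response time its host will maintain     */
} replica_policy_t;

/* Data item policy: upper and lower bounds that every replica of the
 * logical data item must respect. */
typedef struct {
    double staleness_min_s, staleness_max_s;
    double response_min_ms, response_max_ms;
} data_item_policy_t;

/* A replica policy is a point in the parameter space defined by the
 * data item policy; admit it only if it lies within the bounds. */
static bool replica_policy_admissible(const replica_policy_t *r,
                                      const data_item_policy_t *d)
{
    return r->max_staleness_s >= d->staleness_min_s &&
           r->max_staleness_s <= d->staleness_max_s &&
           r->max_response_ms >= d->response_min_ms &&
           r->max_response_ms <= d->response_max_ms;
}
```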
2.2
Distributed Control
Replication affords the possibility of increased ‘performance’ and robustness of client applications as
well as a degree of failure transparency. The appropriate measure of ‘performance’ is subjective with respect to the system: it may relate to system-wide characteristics such as response time, throughput or utilisation of nodes in a distributed database management scenario, or, in a grid environment, replication may be motivated by the high likelihood of node failure and quantified by availability. Our work specialises to the first scenario. We are interested in maintaining a desired level of performance, measured in terms of throughput or response time, as well as data consistency, under changing workload conditions.
Traditional, centralised approaches attempt to optimise some system-wide measure of performance such as throughput or response time using a centralised controller with complete knowledge of system demands and resources. Such centralised control is impractical: firstly, due to a lack of flexibility and issues pertaining to failure transparency, reliability and availability; secondly, due to the difficulty of deciding upon a set of performance metrics to be optimised that will satisfy the QoS requirements of the various classes of user in an inherently heterogeneous environment; and finally, because the immense computational cost involved in provisioning centralised control represents an inescapable performance bottleneck.
Distribution or decentralisation of control and responsibility allows for independent and autonomous
components and yields partial solutions to the inadequacies of centralised control. In a decentralised
system, resources can not only be located where they will be most effectively utilised, but can also be
relocated, added and upgraded independently and incrementally in order to accommodate increasing
demands, growth or changes in system infrastructure. This flexibility also facilitates a more scalable
system. Furthermore, the relative independence and autonomy of components also affords a degree of
fault tolerance. Whereas component failure in a centralised system can result in total system outage, a
similar failure in a decentralised distributed system is typically limited to that component and results in
limited service degradation for a limited group of users.
Though based on several simplifying assumptions, a preliminary investigation using a discrete event simulator has demonstrated the potential applicability of feedback control to replica management. In response to a changing workload, nodes reconfigure replication schemes using feedback control so as to maintain a particular response time. The current focus of research centres on maintaining the performance of the simulated controller whilst removing all simplifying assumptions. The approach being taken is to use a set of algorithms to control different aspects of replication, and feedback control to manage the frequency with which the algorithms are run and the setting of algorithm parameters.
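As an illustration of the feedback idea, the fragment below adjusts a node's local replica count in proportion to the error between the measured and target response time for a data item. It is only a sketch of proportional control with invented structure and parameter names, not the controller used in the simulator.

```c
/* One control step, run periodically on a node: compare the measured
 * response time for a data item against the target promised by its
 * replica policy and adjust the number of local replicas proportionally,
 * within the limits imposed by the node policy. */
typedef struct {
    int    replicas;          /* replicas this node currently hosts         */
    int    replicas_max;      /* ceiling imposed by the node policy         */
    double target_rt_ms;      /* response time the replica policy promises  */
    double gain;              /* proportional gain, tuned empirically       */
} replica_controller_t;

static void control_step(replica_controller_t *c, double measured_rt_ms)
{
    double error = measured_rt_ms - c->target_rt_ms;   /* > 0: too slow     */
    int adjustment = (int)(c->gain * error);

    /* Add replicas when the response time is too high, retire them when
     * there is slack, and clamp to the allowed range. */
    c->replicas += adjustment;
    if (c->replicas < 1) c->replicas = 1;
    if (c->replicas > c->replicas_max) c->replicas = c->replicas_max;
}
```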
3
Related Work
This work is primarily concerned with replication in a large scale, distributed, dynamic environment.
Existing work may be categorised according to features of this environment.
As a peer-to-peer system grows, so too do its resources, including bandwidth, storage space, and compute power. When combined with replication, this scalability and the inherently distributed environment yield a degree of failure transparency. Furthermore, when structured, a peer-to-peer network or Distributed Hash Table (DHT) not only offers a guarantee of an efficient route to every data item, but continues to do so in the face of changes in network topology. Though seemingly well suited to the milieu, systems built on top of DHTs (PAST [Druschel and Rowstron, 2001], CFS [Dabek et al., 2001]) are typically constrained by the decentralisation integral to peer-to-peer systems and do not maintain consistency. That is, data is read-only and is essentially cached as a means of improving data availability and fault tolerance. As such, many of the more difficult issues relating to replication are ignored. Systems such as Ivy [Muthitacharoen et al., 2002] accept updates but offer only relaxed consistency guarantees.
In direct contrast to the scalability and restrictive consistency guarantees of peer-to-peer systems, there exists a range of more centralised alternatives providing a wider range of consistency guarantees (Bayou [Demers et al., 1994], fluid replication [Noble et al., 1999], TACT [Yu, 2000]). The global information necessary for these systems restricts their scalability and applicability to a dynamic environment due to the possibility of frequent changes. Though these systems offer a multitude of consistency semantics across data items, none offers a range of consistency guarantees for a single logical data item (see Section 2.1).
4
Conclusion
Modelling work [Hurley et al., 2005] validated against performance metrics taken from live networks
and test sites has fed into the design, development, and implementation of a flexible framework for replication. Within this framework a system supporting multiple distinct replication schemes for a single data
item has been developed. This system allows the exploitation of the range of consistency and quality
of service requirements of clients in a distributed environment and demonstrably improves upon performance when compared to metrics taken from live networks and test sites [Doherty and Hurley, 2006].
Further development and refinement of autonomic control mechanisms and integration into the system
will facilitate validation of hierarchical policy–based autonomic replication.
References
[Burgess and Canright, 2003] Burgess, M. and Canright, G. (2003). Scalability of Peer Configuration
Management in Partially Reliable and Ad Hoc Networks. In Proceedings of the 8th IFIP/IEEE International Symposium on Integrated Network Management, pages 293–305. Kluwer.
[Dabek et al., 2001] Dabek, F., Kaashoek, F., Karger, D., Morris, R., and Stoica, I. (2001). Wide-area
cooperative storage with CFS. In SOSP ’01’: Proceedings of the 18th ACM Symposium on Operating
System Principles, pages 202–215, New York, NY, USA. ACM Press.
[Demers et al., 1994] Demers, A., Petersen, K., Spreitzer, M., Terry, D., Theimer, M., and Welch, B.
(1994). The Bayou Architecture: Support for Data Sharing among Mobile Users. In Proceedings of
the IEEE Workshop on Mobile Computing Systems & Applications, pages 2–7, Santa Cruz, CA, USA.
[Doherty and Hurley, 2006] Doherty, C. and Hurley, N. (2006). Policy–Based Autonomic Replication
for Next Generation Network Management Systems. In Proceedings 1st Annual Workshop on Distributed Autonomous Network Management Systems, Dublin, Ireland.
[Doherty and Hurley, 2007] Doherty, C. and Hurley, N. (2007). Hierarchical Policy–Based Replication.
In Proceedings of the 26th IEEE International Performance, Computing and Communication Systems,
pages 254–263, New Orleans, LA, USA. IEEE Computer Society.
[Druschel and Rowstron, 2001] Druschel, P. and Rowstron, A. (2001). PAST: A large-scale, persistent
peer-to-peer storage utility. In HOTOS ’01: Proceedings of the 8th Workshop on Hot Topics in Operating Systems, pages 75–80, Washington, DC, USA. IEEE Computer Society.
[Hurley et al., 2005] Hurley, N., Doherty, C., and Brennan, R. (2005). Modelling Distributed Data Access for a Grid-Based Network Management System. In Proceedings of the 13th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages
315–318, Washington, DC, USA. IEEE Computer Society.
[Martin-Flatin et al., 1999] Martin-Flatin, J.-P., Znaty, S., and Hubaux, J.-P. (1999). A Survey of Distributed Enterprise Network and Systems Management Paradigms. Journal of Network and Systems
Management, 7(1):9–26.
[Muthitacharoen et al., 2002] Muthitacharoen, A., Morris, R., Gil, T., and Chen, B. (2002). Ivy: A
Read/Write Peer–to–Peer File System. volume 36, pages 31–44, New York, NY, USA. ACM Press.
[Noble et al., 1999] Noble, B., Fleis, B., and Kim, M. (1999). A Case for Fluid Replication. In Netstore
’99: Network Storage Symposium, Internet2.
[Yu, 2000] Yu, H. (2000). TACT: Tunable Availability and Consistency Tradeoffs for Replicated Internet Services. ACM SIGOPS Operating Systems Review, 34(2):40.
Sensemaking for Topic Comprehension
Brendan Ryder,
Dept. of Computing and Mathematics,
Dundalk Institute of Technology,
Dublin Road, Dundalk, Co. Louth, Ireland
[email protected]
Terry Anderson
School of Computing and Mathematics,
University of Ulster,
Newtownabbey,
Co. Antrim,
BT37 0QB, Northern Ireland.
[email protected]
Abstract
Users spend a considerable amount of time engaged in sensemaking, the process of searching for
structure in an unstructured situation and then populating that structure with information resources
relevant to a task. In other words, searching for representations and encoding information in those
representations. It is the gradual evolution of enquiry through our repeated interaction with
information. This work is examining how representation construction can be supported in the
sensemaking process, and specifically, how structural elements in resources can be exploited and
used to tag information resources at a fine level of granularity. This paper surveys the literature
related to sensemaking, outlines the requirements and enabling technology selection for a
prototype sensemaking tool called coalesce and discusses proposed evaluation strategies.
Keywords: Sensemaking, Personal Information Management, Tagging.
1
Introduction
Finding relevant and up to date information is an essential task performed by all users in many
problem domains on a daily basis. Users spend considerable amounts of time and effort manually
identifying, evaluating, organising, producing and sharing digital resources. This can be referred to as
“information triage” [39], the process of sorting through relevant materials and organising them to
meet the needs of the task at hand, normally time-constrained and requiring quick assessment based on
insufficient knowledge. The task of finding and organizing has been compounded because there is too
much digital information available on the Internet. Society is suffering from “information overload”,
or “data smog” [18], a concept originally explored by Bush in his seminal paper “As We May Think”
[10]. To compound the problem we also have to contend with information fragmentation [33], where
information that is required to complete a particular task is “fragmented” by physical location and
device. More people than ever before face the problem of identifying relevant and high-quality information that meets their information needs. Once the relevant information is found, they need better
ways to organize and manage this information for their own use and for sharing with others in a
collaborative context.
2
Related Work
The following section surveys related work in sensemaking, information organization and
categorization.
2.1
Sensemaking
The tight integration between the tasks of finding and organising information has led to the establishment of a research area called sensemaking. Sensemaking is the cycle of pursuing,
discovering, and assimilating information during which we change our conceptualization of a problem
and our search strategies [51]. It is the gradual evolution of an inquiry through our repeated interaction
with information. This interaction can serve as an organizing structure for personally meaningful
information geographies [2]. Arriving at the output is ill-defined, iterative and complex. Information
retrieval, organisation and task-definition all interact in subtle ways [49]. This multifaceted activity is
also referred to as exploratory search [59]. The synergy and tight coupling of these behaviours result
in the creation of sense, that is, the process of sensemaking.
Russell [51] pioneered the work on the concept of sensemaking and developed a user model of
sensemaking activity, derived from observations of how a group of Xerox employees made sense of
laser printers. Russell found that people make sense of information about a specific topic using a
pattern he referred to as the “learning loop complex”. The learning loop has four main processes:
search for representations, instantiate representations, shift representations and consume encodons.
Representations are essentially a collection of concepts related to the task at hand (i.e. the
organisational structure of the information). Instantiating representations involves populating the created structure with relevant information. Shifting representations refers to the iterative amendments that are made to the original structure during the ongoing search process. The original structure can be
merged, split and new categories added. The final structure is a schema that provides goal-directed
guidance and determines what to look for in the data, what questions to ask, and how the answers can
be organised.
Additional work carried out on sensemaking presented at CHI2005 suggests important revisions to
Russell’s model and theory [49]. Russell’s model separates the encoding activity, populating the
structure, from the representation search activity, finding a structure to aid sensemaking. Qu [49]
suggests a change to the relationship between these two activities. Representational development is
much more tightly integrated with the encoding process. Information suggesting representational
structure comes from some of the same sources as the information content. Sensemakers are not just
getting “bags of facts”, but organised ideas. These socially constructed knowledge resources can be
exploited in the representational search activity of sensemaking. Existing work examining sensemaking includes the Universal Labeler [29],[30], the Scholarly Ontologies Project (with ClaiMapper, ClaiMaker and ClaimFinder) [55], Compendium [3], NoteCards [28], gIBIS [15], ART [43] and SenseMaker [4]. Complementary work on the conceptual model of sensemaking, called the Data/Frame theory, has also been discussed by Klein et al. [60].
2.3
Information Organisation
A central activity in information management is the grouping of related items, and as a result is at the
heart of the sensemaking process. This can be examined from two perspectives: interface and
interaction design and associative technologies, that is, underlying data models for association.
Interface and interaction design is an important consideration in any application. Many studies have
been conducted to examine how we interact with information resources and information management
applications have provided various approaches to organising concepts and their associated content.
Concepts can be managed in text form [52], in a hierarchy [29] or graphically [50], using concept map [12] or mind-map [43] representations. Content is associated in containers in research tools such as Nigara [57]. Commercial tools such as Google Notebook [25], Clipmarks [14] and Net Snippets [44] provide similar functionality. There is also a consensus in the literature that search and organisation need to be combined in a unified interface [45],[16]. Users find information in the same way as our ancestors found prey, by "foraging" for it, navigating from page to page along hyperlinks [47]. ScentTrails [45] is a novel approach that applies this theory. This concurs with a study conducted
by Teevan [54] that found users prefer to find information by “orienteering”. They often begin with a
known object and then take repeated navigation steps to related information about that object,
eventually arriving at the information that is required.
In terms of aggregating heterogeneous information resources the associative technology can be used as
an information or semantic layer, a form of metadata that resides logically on top of existing resources.
This metadata layer can be created manually [31] and [32], or automatically [27] and [11]. This
improves both the management and discovery of the information that is stored. It can also be used to
automatically extract or recommend related resources subsequent to the initial sensemaking process
[38], a process called topic tracking [22]. For our work we are only concerned with the manual
creation of the metadata layer. The two most significant technologies that have been used in this
regard are W3C's RDF (Resource Description Framework) and ISO's Topic Map standards. RDF is a
fundamental technology at the heart of what is called the Semantic Web [6]. Metadata is encoded in
RDF and this representation is then machine-processible. With RDF you can say anything about
anything. Haystack [34] and E-person [1] both use RDF for creating associations at different levels.
Annotea [32] utilizes RDF for the creation and management of annotations. A topic map [46] can
represent information using topics (representing any concept, from people, individual files, events, and
information resources), associations (which represent the relationships between them), and
occurrences (which represent relationships between topics and information resources relevant to
them). DeepaMehta [50] employs topic maps as its underlying associative data model.
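Read as data structures, the three topic map constructs described above could be modelled roughly as follows. This is a deliberately minimal C rendering with invented field names; real topic maps follow the ISO 13250 / XTM schema rather than this layout.

```c
/* Minimal rendering of the topic map building blocks described above;
 * all field names are illustrative. */
typedef struct {
    const char *id;              /* e.g. "sensemaking"                        */
    const char *name;            /* human-readable label for the concept      */
} topic_t;

typedef struct {                 /* occurrence: ties a topic to a resource    */
    const topic_t *topic;
    const char    *resource_uri; /* the information resource relevant to it   */
    const char    *role;         /* e.g. "definition", "example"              */
} occurrence_t;

typedef struct {                 /* association: a typed relationship         */
    const char    *type;         /* e.g. "is-part-of"                         */
    const topic_t *member_a;
    const topic_t *member_b;
} association_t;
```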
2.4
Categorization
During this sensemaking process we label and categorise information resources. The labels that are
employed can be viewed as metadata [24]. The associative data model can be viewed as a metadata layer and the labels form an integral part of that layer, a form of metadata within the metadata layer.
There are a number of mechanisms or approaches that can be used to categorise content and they can
be viewed along a continuum from formal subject-based classification (controlled vocabularies,
taxonomies, thesauri, faceted classification and ontologies) to informal folksonomies or tagging
systems to hybrid classification. A folksonomy is an Internet-based information retrieval methodology
consisting of collaboratively generated, open-ended labels that categorise web resources. The labels
are commonly known as tags and the labelling process is called tagging or social bookmarking.
Tagging systems have proved to be very effective in a collaborative context and they dramatically lower content categorization costs because there is no complicated, hierarchically organised nomenclature to learn. Systems like Dogear [41] have found that social bookmarking can also be
useful in the enterprise. Tagging has also been applied in a general tagging role in Piggybank [27],
[16] and [9].
3
Contribution
The broad aims of this research project are as follows:
• To achieve a better theoretical understanding of sensemaking, resulting in the establishment of a model of how information seeking and sensemaking representation construction interact. This involves examining the synergies between searching and organisation cognitive models.
• To prototype novel designs to support the user engaged in the sensemaking activity and evaluate them in the context of real-world workspaces.
More specifically, this work builds on existing sensemaking work by [51], [49] and [29], information
gathering work by Hunter-Gatherer [52] and DeepaMehta [50] and tagging concepts applied in
systems like Dogear [41], Phlat [16] and Piggybank [27]. This work will examine how representation
construction can be supported and will study how structural elements in resources can be exploited
and used to tag information resources at a fine-level of granularity, thus assisting the user with the
sensemaking process.
4
Proof-of-Concept Prototype
Table 1 provides an overview of the requirements and associated technologies that are being used to
create the coalesce (from the word meaning to associate, combine, consolidate) proof-of-concept prototype to assist with sensemaking. Iteration #1 of the prototype is currently being designed and developed, with the effort focusing on interface and interaction design and associative technology implementation. Figure 1 provides an overview of the current interface, with each of the major elements numbered as
appropriate. Area 1 shows suggested topics that are extracted from the web document that is currently
being browsed. This is where the structural elements, organised bags of facts, are presented. Area 2
shows the consolidated sensemaking concepts and their relationship to one another. This will be
determined as the user interacts with resources and iteratively gains a better understanding of the
selected subject area. Area 3 is the information resource that is currently being viewed. The user has
the option of selecting and tagging page elements at various levels of granularity. Finally Area 4
illustrates a collection of snippets from the various resources that are browsed. The combination of
structuring and tagging will allow the user to create a “sensemap”, their cognitive understanding of the
association between the concepts as they search.
Requirement               Proposed Solution
Sensemaking Interface     Rich Internet application using AJAX (Google Web Toolkit).
Search component          Search API's – Google (web carnivore [35]).
Associative component     Topic map – facilitates sharing and reuse; Tagging.
Production component      XML, XML Schema, XSL(T).
Table 1. Prototype Technologies
5
Evaluation
Kelly [36], Zelkowitz [56] and Marvin [40] provide extensive discussion on evaluation techniques for
validating technology. Kelly, in relation to PIM (personal information management) applications,
maintains that using one-size-fits-all evaluation methods and tools is likely to be a less than ideal
strategy for studying something as seemingly idiosyncratic as PIM. According to Kelly, people should
be observed in their natural environments, at home or at work, as they engage in PIM behavior in real time, recording both the process and the consequences of the behaviour. Laboratory studies of
behaviours and tools should be leveraged to understand more about general PIM behavior. So, an
iterative combination of observations of users in their natural environments and also in controlled
laboratory sessions will enable the understanding of PIM behavior. These observations can then be
used to inform the design and development of prototypes and tools to support the user engaged in
PIM. She identifies the need for the development of new evaluation methods that will “produce valid,
generalisable, sharing knowledge about how users go about PIM activities” [36].
To this end, Elsweiler [61] has developed a task-based evaluation methodology that can be used for
PIM evaluations and it is this evaluation methodology that we propose using to evaluate the coalesce
prototype. We propose conducting iterative evaluations with the prototype between September 2007
and April 2008 with a representative group of users. It is intended to use Google Notebook as a
benchmark and perform a comparative analysis between it and the coalesce prototype, thus highlighting the benefits that the prototype brings to the sensemaking process. The aim will be to determine whether
structural elements within resources can be used to aid the sensemaking process.
Figure 1. Iteration #1 Sensemaking Interface and Interaction
References
[1] The ePerson Snippet Manager: a Semantic Web Application. <http://www.hpl.hp.com/techreports/2002/HPL-2002-328.pdf>.
[2] Bauer, D. (2002). Personal Information Geographies. Proc. of CHI2002, ACM Press, 538-539.
[3] Compendium Institute. <http://www.compendiuminstitute.org/> (July 2007).
[4] Baldonado, M.Q.W. and Winograd, T. (1997). SenseMaker: an information-exploration interface supporting the contextual evolution of a user's interests. Proc. of SIGCHI ’97, ACM Press, 11-18.
[5] Building an Integrated Ontology within the SEWASIE Project. <http://www.dbgroup.unimo.it/prototipo/paper/demo-iscw2003.pdf> (July 2007).
[6] Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web, Scientific American, 284, 5, 34-43.
[7] Describing and retrieving photos using RDF and HTTP. <http://www.w3.org/TR/photo-rdf/> (July 2007).
[8] Brause, R.W. and Ueberall, M. (2003). Internet-Based Intelligent Information Processing Systems (Adaptive Content Mapping for Internet Navigation), World Scientific, Singapore.
[9] Buffa, M. and Gandon, F. (2006). SweetWiki: Semantic Web Enabled Technologies in Wiki. Proc. of WikiSym ’06, ACM Press, 69-78.
[10] As We May Think, Atlantic Monthly. <http://www.idemployee.id.tue.nl/g.w.m.rauterberg/lecturenotes/bush-1945.pdf> (July 2007).
[11] Cai, Y., Dong, X.L., Halevy, A., Liu, J.M. and Madhavan, J. (2005). Personal information management with SEMEX. Proc. of SIGMOD 2005, ACM Press, 921-923.
[12] Mining the Web to Suggest Concepts during Concept Map Construction. <http://cmc.ihmc.us/papers/cmc2004-284.pdf> (July 2007).
[13] Chirita, P.A., Gavriloaie, R., Ghita, S., Nejdl, W. and Paiu, R. (2005). Activity Based Metadata for Semantic Desktop Search. Proc. of ESWC05, Springer Lecture Notes in Computer Science, 439-454.
[14] Clipmarks: Bite-size highlights of the web. <http://www.clipmarks.com> (July 2007).
[15] Conklin, J., Selvin, A., Buckingham Shum, S. and Sierhuis, M. (2001). Facilitated Hypertext for Collective Sensemaking: 15 Years on from gIBIS. Proc. of Hypertext ’01, ACM Press, 123-124.
[16] Cutrell, E., Robbins, D., Dumais, S. and Sarin, R. (2006). Fast, Flexible Filtering with Phlat –
Personal Search and Organisation Made Easy. Proc of CHI 2006, ACM Press, 261-270.
[17] Davies, J. and Weeks, R. (2004). QuizRDF: Search Technology for the Semantic Web. Proc of
HICSS ‘04, IEEE.
[18] Denning, P.J. (2006). Infoglut. Communications of the ACM, 49, 7, 15-19.
[19] Ding, L., Finin, T., Joshi, A., Pan, R., Scott Cost, R., Peng, Y., Reddivari, P., Doshi, V. and Sachs, J. (2004). Swoogle: a search and metadata engine for the semantic web. Proc. of CIKM ‘04, ACM Press, 652-659.
[20] Domingue, J. and Dzbor, M. (2004). Magpie: Supporting Browsing and Navigation on the
Semantic Web, Proc of IUI ‘04, ACM Press (2004), 191-197.
[21] Evans, M.P., Newman, R., Putnam, T. and Griffiths, D.J.M. (2005). Search Adaptations and the
Challenges of the Web, IEEE Internet Computing, 9, 3, 19-26.
[22] Fan, W., Wallace, L., Rich, S. and Zhang, Z. (2006) Tapping the Power of Text Mining.
Communications of the ACM, 49, 9 (2006), 77-82.
[23] Ferragina, P. and Gulli, A. (2005). A Personalized Search Engine Based on Web-Snippet
Hierarchical Clustering. Proc. of WWW2005, ACM Press, 801-810.
[24] “Metadata? Thesauri? Taxonomies? Topic Maps!”.
<http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html> (July 2007)
[25] Google Notebook. <http://www.google.com/notebook> (July 2007).
[26] Antoniou, G. and van Harmelen, F. (2004). A Semantic Web Primer, MIT Press, Cambridge, USA.
[27] Huynh, D., Mazzocchi, S. and Karger, D. (2005). Piggy Bank: Experience The Semantic Web
Within Your Web Browser, Proc. of ISWC 2005, Springer Lecture Notes in Computer Science,
413-430.
[28] Halasz, F.G., Moran, T.P. and Trigg, R.H. (1986). Notecards in a nutshell. Proc. of SIGCHI/GI
1986, ACM Press, 45-52.
[29] Jones, W., Munat, C., and Bruce, H. (2005). The Universal Labeler: Plan the Project and Let
Your Information Follow, Proc. of ASIST 2005, 42.
[30] Jones, W., Phuwanartnurak, A.J., Gill, R. and Bruce, H. (2005). Don’t Take My Folders Away! Organising Personal Information to Get Things Done. Proc. of CHI2005, ACM Press, 1505-1508.
[31] SMORE – Semantic Markup, Ontology, and RDF Editor.
<http://www.mindswap.org/papers/SMORE.pdf> (July 2007).
[32] Kahan, J. and Koivunen, M. (2001). Annotea: An Open RDF Infrastructure for Shared Web Annotations. Proc. of WWW10, ACM Press, 623-632.
[33] Karger, D.R. and Jones, W. (2006). Data Unification in Personal Information Management, Communications of the ACM, 49, 1, 77-82.
[34] Karger, D.R. and Quan, D. (2004). Haystack: A User Interface for Creating, Browsing, and Organizing Arbitrary Semistructured Information, Proc. of CHI2004, ACM Press, 777-778.
[35] Kraft, R. and Stata, R. (2003). Finding Buying Guides with a Web Carnivore, Proc. of the First
Conference on Latin American Web Congress, IEEE Computer Society, 84-92.
[36] Kelly, D. (2006). Evaluating Personal Information Management Behaviors and Tools,
Communications of the ACM, 49, 1, 84-86.
[37] Googling from a Concept Map: Towards Automatic Concept-Map-Based Query Formulation.
<http://cmc.ihmc.us/papers/cmc2004-225.pdf> (July 2007).
[38] Martin, I. and Jose, J.M. (2003). A Personalised Information Retrieval Tool. Proc. of SIGIR ’03, ACM Press, 423-424.
[39] Marshall, C. and Shipman, F. (1997). Spatial hypertext and the practice of information triage.
Proc. of Hypertext ’97, ACM Press, 124-133.
[40] Zelkowitz, M.V., Wallace, D.R. and Binkley, D.W. (2003) Experimental validation of new
software technology, Software Engineering and Knowledge Engineering (Lecture notes on
empirical software engineering), World Scientific Publishing Co, 229-263.
[41] Millen, D.R., Feinberg, J. and Kerr, B. (2006). Dogear: Social Bookmarking in the Enterprise.
Proc. of CHI2006, ACM Press, 111-120.
[42] Mind Maps. <http://en.wikipedia.org/wiki/Mind_map> (July 2007).
[43] Nakakoji, K., Yamamoto, Y., Takada, S. and Reeves, B.N. (2000). Two-dimensional spatial
positioning as a means for reflection in design. Proc. of Conference on Designing interactive
systems: processes, practices, methods, and techniques (DIS ’00), ACM Press, 145-154.
[44] Net Snippets. <http://www.netsnippets.com> (July 2007).
[45] Olston, C. and Chi, E.H. (2003). ScentTrails: Integrating Browsing and Searching on the Web, ACM Transactions on Computer-Human Interaction (TOCHI), 10, 3, 177-197.
[46] The TAO of Topic Maps; finding the way in the age of Infoglut.
<http://www.idealliance.org/papers/dx_xmle03/papers/02-00-04/02-00-04.pdf> (July 2007).
[47] Information Foraging. <http://www2.parc.com/istl/groups/uir/publications/items/UIR-1999-05Pirolli-Report-InfoForaging.pdf> (July 2007).
[48] Preece, J., Rogers, Y. and Sharp, H. (2007). Interaction Design: Beyond Human-Computer Interaction (2nd Edition), Wiley Publishing, USA.
[49] Qu, Y. and Furnas, W. (2005). Sources of Structure in Sensemaking. Proc. of CHI2005, ACM
Press, 1989-1992.
[50] DeepaMehta-A Semantic Desktop. <http://www.deepamehta.de/ISWC-2005/deepamehta-paperiswc2005.pdf> (July 2007).
[51] Russell, D., Stefik, M., Pirolli, P. and Card, S. (1993). The Cost Structure of Sensemaking. Proc. of InterCHI ‘93, ACM Press, 269-276.
[52] Schraefel, M.C., Zhu, Y., Modjeska, D., Wigdor, D. and Zhao, S. (2002). Hunter Gatherer:
Interaction Support for the Creation and Management of Within-Web-Page Collections. Proc. of
WWW2002, ACM Press, 172-181.
[53] Selvin, A.M. and Buckingham Shum, S.J. (2005). Hypermedia as a Productivity Tool for Doctoral Research. New Review of Hypermedia and Multimedia, 11, 1, 91-101.
[54] Teevan, J., Alvarado, C., Ackerman, M.S. and Karger, D.R. (2004) The perfect search engine is
not enough: A study of orienteering behaviour in directed search, Proc. of CHI2004, ACM
Press, 415-422.
[55] Uren, V., Buckingham-Shum, S., Bachler, M. and Li, G. (2006). Sensemaking Tools for
Understanding Research Literatures: Design, Implementation and User Evaluation. Int. Journal
Human Computer Studies, 64, 5, 420-445.
[56] Zelkowitz, M.V. and Wallace, D.R. (1998). Experimental Models for Validating Technology,
IEEE Computer, 31, 5, 23-31.
[57] Zellweger, P.T., Mackinlay, J.D., Good, L., Stefik, M. and Baudisch, P. (2003). City Lights:
Contextual Views in Minimal Space. Proc. of CHI2003, ACM Press, 838-839.
[58] OntoSearch: An Ontology Search Engine. <http://www.csd.abdn.ac.uk/~yzhang/AI-2004.pdf>
(July 2007).
[59] Klein, G., Moon, B. and Hoffman, R.R. (2006). Making Sense of Sensemaking 2: A Macrocognitive Model, IEEE Intelligent Systems, 21, 5, 88-92.
[60] Marchionini, G. (2006). Exploratory search: from finding to understanding, Communications of
the ACM, 49, 4, 41-46.
[61] Elsweiler, D. and Ruthven, I. (2007). Towards task-based personal information management
evaluations. Proc. of SIGIR ‘07, ACM Press, 23-30.
A Pedagogical-based Framework for the Delivery of Educational Material to
Ubiquitous Devices
Authors: C O. Nualláin*, Dr S Redfern
Department of Information Technology, National University of Ireland, Galway,
Ireland
Caoimhin.onuallain@nuigalway.ie
Sam.Redfern@Nuigalway.ie
What we are addressing in this paper is the historical failure to deliver good, flexible e-learning based on pedagogically sound, intuitive, profile-based learning systems with assessment and reporting tools which, based on the profile, adapt to the context and style of the user. The main issues here are the ability of users to get what they asked for in terms of a learning environment, and the supporting research and data to illustrate that the system works and, if deployed, would be of value in augmenting lectures and tutorials.
Data we have collected in questionnaires indicates that how students learn is changing: they need to be challenged rather than basing their learning on memorising. The country needs to deliver home-grown skilled graduates in the areas of Engineering and Information Technology, and we should not cut corners in getting there. With that in mind, government bodies such as Forfas have acknowledged that we must get students involved in technology and experimenting with material in electronics and computer programming at an earlier age. Ultimately we need to build up students’ curiosity about uses of technology and fortify that with problem-solving skills.
In addition, new advances in technology and software, such as Web 2.0, are making access to technology easier. The additional use of modalities like web cams, podcasts, live video, chat, audio and SMS messaging is making it easier for students to collaborate and share thoughts and material, and this is helping students who use the environment perform better. Gaming environments such as LEGO, ROBOT WARS and ROBOCODE have been very successful in capturing student attention and in providing an environment which allows users to learn several skills while having fun in a team-based event.
Many have not taken up the challenges and opportunities which the new technologies offer in terms of educational potential and return. This has been a problem throughout the history of e-learning and of the application of learning technology, where technology has been misused and its potential to deliver good-quality educational material to a wide audience, easily and preferably seamlessly, has been misunderstood.
Through careful analysis of the application of pedagogical strategies and learning styles, we feel that part of the current problem, and of the problems of the past 20 years, may be overcome, and we feel we have turned part of the corner on that problem with our system, which takes on board tried and tested pedagogical practices established by Socrates, Pask, Bandura, Peppard and Piaget, to name but a few. These are the forefathers of constructivism and social constructivism. We have also established that it is not just the adoption of these theories that matters but how they are applied. Most of our strategy is based around
collaboration, problem solving, testing, continuous challenge, assessment, feedback, fun,
and team building.
Problem Description
The goal of the research can be described as follows:
Modern life is conducted at an increasingly fast pace, so the effective use of time is a must. Through the use of wireless mobile devices, time spent in transit (e.g. in a car, train, plane or bus) need not be wasted; rather, it provides a valuable opportunity to study interactive educational material. Such material must be pedagogically sound, and the results of trials carried out as part of this research indicate that it is an effective learning tool. We aim to make it effective in terms of high-order learning, critical thinking and problem solving, and a medium through which problem-solving skills may be instilled.
The framework takes great care in its research into what effective learning and teaching are and how they can best be achieved, resulting in a number of frameworks for optimising same. Much time was also spent researching how to properly assess a technology's effectiveness and its suitability for involvement in new forms of curriculum delivery.
The aim of this body of research is to strive for active engagement through well-constructed material with good instructional design, and through the use of new devices and file formats which are coming of age and show promise in delivering a new type of media on smaller devices to a wider audience, i.e. lifelong learners. The framework takes advantage of the student profile on which it is based to deliver the most appropriate material to the user, at that time, on whatever device is being used at that time. This in itself allows students to join in dialogue with class members and hence gives them the opportunity to join in discussion, which in turn allows them to feel part of the group and much less isolated; such isolation has in the past resulted in students dropping out, as in the case of the Open University. This framework allows students, whatever their learning style, to become immersed in the material and also, through the various support aspects, to discuss and ultimately learn new skills based on the material.
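A minimal sketch of the kind of profile-driven selection described above is given below. It is hypothetical rather than a description of the authors' implementation: the profile fields (device, learning_style) and the available material formats are assumptions chosen purely for illustration.

# Hypothetical sketch of profile-driven delivery, not the authors' system:
# the profile fields and material formats are assumptions for illustration.
def select_material(profile, lesson):
    """Pick the variant of a lesson best suited to the device and learner."""
    device = profile.get("device", "desktop")        # e.g. "phone", "pda"
    style = profile.get("learning_style", "visual")  # e.g. "visual", "verbal"

    if device in ("phone", "pda"):
        # Small screens: prefer audio (podcast) or a short text summary.
        return lesson.get("podcast") or lesson["summary_text"]
    if style == "visual":
        return lesson.get("video") or lesson["slides"]
    return lesson["full_text"]


lesson = {
    "podcast": "intro_to_loops.mp3",
    "summary_text": "intro_to_loops.txt",
    "video": "intro_to_loops.mp4",
    "slides": "intro_to_loops.pdf",
    "full_text": "intro_to_loops.html",
}
print(select_material({"device": "phone"}, lesson))  # -> intro_to_loops.mp3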
Currently our schools and colleges are not able to keep up with the changing needs of their audience, and in particular with its specific learning needs. By this we mean the different learning styles and modes of delivery which suit the users’ context and, ultimately, their learning requirements, which we consider to be aided by more direct contact with the curriculum through a micro-world-like environment. Here users can discuss material with fellow students and tutors while actively solving problems in sessions driven by a moderator. The moderator assesses all levels of student engagement so as to identify whether or not the student is following and, in either case, what can be done to
enhance their learning potential. This also provides us with a way to learn more about the users and to log what we learn in a profile. The profile will greatly help identify failings in the material, the delivery mechanisms or even the assessment methods. To that end we employ several non-traditional assessment methods to augment the assessment process, such as body language, mannerisms, sentence openers and active verbs. We have been able to prove that these indirect assessment methods, if captured, can be used very effectively as indicators not only of whether users are engaged but also of whether they are active learners and even high-order learners.
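As an indication of how such indirect indicators might be captured automatically, the sketch below counts sentence openers and active verbs in a set of chat messages. The indicator lists are invented for the example; the authors' actual coding scheme is not detailed in this abstract.

# Illustrative sketch only: the indicator phrases below are invented for the
# example and do not reproduce the authors' coding scheme.
OPENERS = ("i think", "what if", "why does", "could we", "have you tried")
ACTIVE_VERBS = ("build", "test", "compare", "refactor", "debug", "design")


def engagement_indicators(messages):
    """Count rough engagement cues in a list of chat messages."""
    counts = {"openers": 0, "active_verbs": 0}
    for msg in messages:
        text = msg.lower()
        counts["openers"] += sum(text.startswith(o) for o in OPENERS)
        counts["active_verbs"] += sum(v in text for v in ACTIVE_VERBS)
    return counts


print(engagement_indicators(["I think we should test the loop first",
                             "why does it crash when we debug it?"]))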
Establishing this involved a large number of questionnaires and the capture of all audio and video, feedback, interview data, punctuality records and attendance records, gathered across several programming competitions and labs over a two-and-a-half-year period. The initial competition tried to use Blackboard and then a purpose-built e-learning platform developed in-house, which did not take off for several reasons. Having learned from these exercises, we progressed to a more active classroom environment, which provided the template on which we wanted to build an online environment. The classroom environment we organised, after much adapting and changing of team sizes and interaction rules, worked well, and it is this that we used as the template for an online version. That online version makes up the software artefact developed as part of the research, and it has been used to test and prove the suitability of the technology in education and the level of engagement and active learning possible with such media and modalities.
Through the evaluation of the data we have been able to prove to our satisfaction that collaboration, and more importantly collaboration with active mentoring, is far more effective than co-operation. We are also able to prove that good feedback is essential and that the timeliness of such feedback is critical. It is important to learn as much about the users as possible, including background information and ultimately anything that does or can impact on their learning, so that we can put steps in place to lessen the impact and help the students overcome the problem.
A vast amount of data was collected and analysed using several methods, e.g. chi-squared tests, t-tests, and the graphing of hard data in pie charts and line charts. The analysis methods were chosen on the basis of the type of data we had; several other methods were examined but were deemed unsuitable. That said, the methods selected are probably the most standard methods used in statistical analysis of this kind.
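For readers unfamiliar with these tests, the following minimal sketch shows how a chi-squared test and a t-test of this kind can be run with SciPy. The figures are invented for illustration and are not the study's data.

# The figures below are invented for illustration; the paper's own data set
# is not reproduced here.
from scipy import stats

# Chi-squared test: e.g. pass/fail counts for collaborating vs. co-operating
# groups, laid out as a 2x2 contingency table.
contingency = [[18, 4],   # collaboration: pass, fail
               [11, 9]]   # co-operation:  pass, fail
chi2, p_chi, dof, _ = stats.chi2_contingency(contingency)

# t-test: e.g. assessment scores of two independent groups of students.
group_a = [72, 65, 80, 77, 69, 74]
group_b = [61, 70, 58, 66, 63, 60]
t_stat, p_t = stats.ttest_ind(group_a, group_b)

print(f"chi2={chi2:.2f} (p={p_chi:.3f}), t={t_stat:.2f} (p={p_t:.3f})")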
Through the data collected we have made findings which contribute to proving that we have achieved our goals in the areas of assessment, collaboration, teamwork and team sizes, e-moderators and moderators, modalities, motivation and engagement, effective learning, pedagogy, use of technology, gaming as a metaphor for learning, profiling, personalisation and feedback mechanisms.