Mobile und verteilte Systeme - Ubicomp - Teil VII (WS1011)

Transcription

Karlsruhe Institute of Technology (KIT)
Fakultät für Informatik
TecO/ PCS
Mobile und Verteilte Systeme
Ubiquitous Computing
Teil VII
Seminar WiSe 2010/11
Editors:
Behnam Banitalebi
Predrag Jakimovski
Takashi Miyaki
Hedda R. Schmidtke
Markus Scholz
Karlsruhe Institute of Technology
TecO/ PCS
Internal Report 06/2011
ISSN 1432-7864
Table of Contents
Preface
Anton Truong
Organic User Interfaces on Flexible Surfaces
Andreas Gutmann
XCS: Ein autonomes maschinelles Lernverfahren geeignet für den Einsatz im Ubiquitous Computing
Florian Becker
Object Recognition in Surface Computing
Kevin Härtel
Usability und Evaluation von Pervasive Games
Michael Quednau
Überblick über die in der Industrie eingesetzten Verfahren zur Positionsbestimmung auf Basis von 802.15.4 und 802.15.4A
Rayan Merched El Masri
A Survey on Brain-Computer-Interface
Preface
The seminar series Ubiquitous Information Systems (Ubiquitäre Informationssysteme) has a long
tradition in the TecO research group. The aim of the series is to review and discuss current
research questions in the field of ubiquitous computing. Since the winter semester 2003/2004
the seminar papers have been published as KIT reports. Since TecO became part of the
Pervasive Computing Systems research group in the winter semester 2010/2011, the seminar
has been held every semester.
This volume summarizes the results of the work from the winter semesters 2010/2011 and
2009/2010. The topics of the papers collected here range from the possibilities of novel user
interfaces using flexible organic displays, interactive surfaces and brain-computer interfaces,
through fundamentals such as the applicability of autonomous learning methods to context
recognition and methods of position determination, to evaluation methods for applications in
the field of pervasive gaming.
We thank the students for their exceptional commitment both during the seminar and in
completing this volume. Special thanks go to Verena Noack for technical support in compiling
this volume.
May 2011
Karlsruhe
Behnam Banitalebi
Predrag Jakimovski
Takashi Miyaki
Hedda R. Schmidtke
Markus Scholz
Mobile und Verteilte Systeme - Ubiquitous Computing - Teil VII
Organic User Interfaces on Flexible Surfaces
Anton Truong
Karlsruhe Institute of Technology
Institute of Telematics - Pervasive Computing Systems
Advisor: Dr. Takashi Miyaki
[email protected]
Abstract. With the development of organic thin-film circuit substrates, displays are becoming
flexible and ever thinner, so thin that they resemble paper. Everyday objects can be wrapped
in these displays. The number of computer systems in our surroundings increases, so computer
systems that the user can interact with become ubiquitous. At present there are different
definitions of Organic User Interfaces (OUI). The author introduces one definition of OUI,
which consists of three ideas. 1. Input equals output: the display is the input device, and
deformation of the display can also be a way of interaction. 2. Function equals form: the
display can take on any shape, so that, thanks to its flexibility, everyday objects can be
wrapped in it. 3. Form follows flow: displays can change their shape, giving the displayed
information a physical form. Finally, some interaction methods of a traditional Graphical
User Interface are listed, and for each of them the author presents several possible OUI
technologies.
1 Introduction
Currently we are seeing a change in Human-Computer Interaction (HCI). First there was the
command line, where the user enters commands to get the information he or she is searching
for. This method is tedious for ordinary users who do not use a computer very often. To
overcome this obstacle the Graphical User Interface (GUI) was born, and to make it more
intuitive the mouse was developed, which allows the user to point at objects on the screen.
To interact with the computer or manipulate objects more directly, multi-touch screens are
used.
The goal of ‘Organic User Interfaces (OUI) on Flexible Surfaces’ is not only to manipulate
or interact with the computer through planar touch screens, but also to use flexible displays.
Interaction today is still performed on flat displays for the output of information, but there
are more possibilities than flat, rigid displays. Developments in display technologies such as
Organic Light-Emitting Diode (OLED) displays, which are already available, offer new ways of
presenting information.
With flexible displays, everyday objects can be enhanced so that the object itself can display
more information than a real object with printed information.
Interaction with it is also possible, for example browsing. All everyday objects can become
more informative and interactive. One goal of OUI is to fit the display to the form of an
object so closely that it is indistinguishable from the real object, because the everyday
object should keep its original purpose: a can still remains a can that you can drink from.
Flexible displays not only adapt to the form of everyday objects; their deformation can also
be a new method of interaction, such as bending the display to invoke a function like zooming.
Until now external devices such as keyboards, mice or touch displays were used. Every kind of
information that is presented digitally on these flexible displays still remains digital, so
seeing the information is all the user can do. Displays with touch sensors also let us
manipulate the information by touching it, but the information is still presented on a flat
display, so feeling the object is not possible. With new display technology that can present
not only the information but also its physical form, interaction with this kind of display
opens a new dimension in HCI.
In this paper three fundamental ideas of ‘OUI on Flexible Surfaces’ are explained and
complemented with some examples. The first shows the new possibilities of interaction with
flexible displays. The second is that every object becomes interactive when displays are
shaped to the form of the object. The last is that displays allow the user to feel the
information by creating physical forms [8]. Finally, some interaction methods of a traditional
GUI are listed, and for each method several possible OUI technologies are presented.
2 The Ideas of Organic User Interface
There is an approach to a definition [8] of OUI which consists of three ideas:
1. ‘Input Equals Output: Where the display is the input device’. In the context
of flexible surfaces one can take advantage of the flexibility of the display
to interact with the system. For example, deforming the display can itself be
a form of interaction with the system, instead of using external input
devices.
2. ‘Function Equals Form: Where the display can take on any shape’. The
concept behind this idea is that everyday objects are wrapped with ‘high
resolution, full-color, interactive graphics’. From the smallest everyday objects
like a credit card up to a building, regardless of size, complexity, dynamics or
flexibility, every everyday object can gain the ability to display graphics or
information. After an object has been wrapped with the display, it stays in its
form and keeps its original function: the user still thinks of a can as a can.
It is rigid. Changing the form of the display also means changing
the form of the object that is wrapped by the flexible display.
3. ‘Form Follows Flow: Where displays can change their shape’ goes in a more
dynamic direction. Instead of adapting the display to the form of an object,
the display itself can change its form, either according to a given pattern,
or due to external influences such as touch, or due to environmental
circumstances.
In this survey paper these three ideas are discussed in general and complemented with some
examples.
3 Input Equals Output: Where the display is the input device
3.1 General
To illustrate the first idea, a project called ‘PaperWindows’ [3] is presented. Its main
communication medium is ordinary paper, onto which information such as a window is
projected, so that the paper becomes the display. Since ordinary paper is used, its
flexibility can serve as another way of interaction.
3.2 PaperWindows
One approach to creating digital paper is ‘PaperWindows’ [3]. At that time thin and flexible
displays were still in development, so the project worked around this obstacle by using
ordinary paper to simulate thin and flexible displays. Ordinary paper cannot display digital
information by itself, so a special environment must be set up: a projector and cameras
were used for projecting information and recognizing gestures for interaction.
Because ordinary paper displays the information and cameras recognize the movements,
deformation of the paper can also be used as a form of interaction [3]. For example, in a
business meeting every participant is given a prepared sheet of paper. When the meeting
begins, the information can be displayed directly on the paper, and thanks to the flexibility
of ordinary paper each participant can interact with the projected document. Data security
is also guaranteed, because the information is only projected and is stored in the computer
system.
3.2.1 Environmental Setup For display and recognition a camera and
an overhead projector are required. To recognize the position of the paper,
the paper is fitted with infrared (IR) reflective markers (see Fig. 1). The
project can therefore be seen as a ‘marker-based system’. With this setup
the camera can detect the position and the deformation of the paper.
The projector then uses this information to project the content onto the
corresponding paper, taking its deformation into account.
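The mapping from document content to the tracked paper can be illustrated with a small sketch. It assumes, purely for illustration, that four markers sit at the corners of the sheet and that bilinear interpolation between them is good enough; the source does not describe the actual PaperWindows algorithm, and this simplification ignores perspective and real bending. The function name map_to_paper and all coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def map_to_paper(u: float, v: float,
                 top_left: Point, top_right: Point,
                 bottom_right: Point, bottom_left: Point) -> Point:
    """Map normalized document coordinates (u, v) in [0, 1] x [0, 1]
    to projector coordinates by bilinear interpolation between the
    four tracked corner markers (an illustrative assumption, not the
    method used by PaperWindows)."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    # Interpolate along the top edge and along the bottom edge.
    top_x = (1 - u) * top_left[0] + u * top_right[0]
    top_y = (1 - u) * top_left[1] + u * top_right[1]
    bottom_x = (1 - u) * bottom_left[0] + u * bottom_right[0]
    bottom_y = (1 - u) * bottom_left[1] + u * bottom_right[1]
    # Then interpolate between the two edge points.
    x = (1 - v) * top_x + v * bottom_x
    y = (1 - v) * top_y + v * bottom_y
    return (x, y)
```

Calling map_to_paper for every pixel of a window would place the window on the sheet wherever the cameras currently see the markers; a sheet that bends would need more markers or a different model.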
3.2.2 What is now possible Once the information is projected, users can
interact with it without any external input devices like a mouse
or a keyboard. For example, the user can fold the paper, rub on it or flip it to
invoke several functions. These interaction methods are not limited to
deformation of the paper. Another interesting possibility is moving
information from one paper to another. Copying is achieved by placing the
paper that holds the information on top of a blank paper and rubbing on it;
the content is then copied onto the blank paper. Annotating the projected
information is also possible, but before a pen can be used for annotation it
must itself be fitted with an IR reflective marker.
Fig. 1. PaperWindows: The markers are used for orientation to display the information on the paper [3].
3.2.3 Only in a fixed environment To realize such a system, cameras, a
projector and paper modified with markers are necessary. In this project
twelve cameras were used, arranged so that they surround the workspace.
The project uses only one projector, which is mounted directly above the
workspace. For this reason the range is limited to the size of the workspace.
Because the setup consists of many cameras and a projector, the system
cannot simply be used anywhere else.
Table 1. Advantages and disadvantages of PaperWindows
Advantages
– One paper needed for displaying several pieces of information
– More interactive than a printed paper
– Information security: information stored digitally
Disadvantages
– Several cameras are required
– Paper and writing tools must be modified
– Fixed working environment due to the position of the projector
4 Function Equals Form: Where the display can take on any shape
The displays of desktop PCs and notebooks have several characteristics in common, for
example the rectangular form and the rigidity that results from the rigid backlight. The only
things we can do with these displays are looking at them, moving them or touching them;
folding them is not a good idea. These are the limitations of rigid displays. Real-world
objects such as ordinary paper are more flexible: while we can fold paper into the form of a
bird, rigid displays reach their limits here. Organic User Interfaces on Flexible Surfaces
also aims for displays which can take on any shape, and today there are already display
technologies which are able to do so. With the development of OLED displays, an ordinary
can can be wrapped with such a display so that it keeps its meaning as a can but can also
show a video or, if a browser is integrated, a web page. In the end you can still drink from
it and handle it like a normal can. It is also interesting that magazines can be equipped
with display technologies like E-Ink instead of ordinary paper. The resulting products are
more interactive: they can display a video, or the reader can simply read an article as in a
normal magazine. In this chapter one example is presented [2].
4.1 Sphere
Displaying information in its corresponding form can in some situations be more intuitive.
A counterexample is presenting a can as a ball, which might make it lose its meaning as a
can. But a display shaped like a ball can be used for information that suits it perfectly.
Think of a map of the earth: it can be more intuitive if the map is shown on such a display,
because we all know that the earth has the form of a ball. One remarkable characteristic of
this display form is that it lacks the edges of today's displays, so it can be seen as an
endless display. In this chapter a technology called Sphere is presented which has already
realized this idea. Its developers also integrated hardware that makes it possible to
interact with the display [1].
4.1.1 New User Experience Because one hemisphere is always hidden from view,
two users can look at the display at the same time without knowing what the
other user is looking at or doing, and the form of the display prevents a user
from looking around it. With a normal flat display this kind of simultaneous
use is not possible without revealing what each user is doing at the moment.
Typical flat displays limit how you can use them and how you can look at
them: to see the displayed information the user has to stand in front of the
display. For a spherical design like ‘Sphere’ there is no limitation on where
the user can stand to see the information; the display can be seen from every
angle. As a result, handling this kind of display is very different from
handling typical flat displays.
4.1.2 Setup The system can be divided into two parts. The first is the projection
part, which consists of a diffuse ball and a high-resolution DLP projector; in this
project two diffuse balls of different sizes were used to build two versions of the
system. The second is the interaction part, which consists of an infrared (IR)
sensitive camera, an IR-pass filter for the camera, an IR-cut filter for
the projector, an IR illumination ring and a cold mirror. A wide-angle lens is
also used to spread the image coming from the projector.
The system is constructed as follows: the diffuse ball sits on top; at the
base of the ball is the illumination ring, with the wide-angle lens placed in the
middle of the ring. Below this is the cold mirror, which is used by both
the DLP projector and the IR camera with the IR-pass filter in front of it (see
Fig. 2).
Fig. 2. Construction of the system. Figure from Hrvoje Benko et al. [1]
4.1.3 Functionality To display images on the surface of the ball, the projector
sends the images from the side to the cold mirror (see Fig. 2), which is mounted
under the base at an angle. The mirror reflects the images directly into the
wide-angle lens, which spreads them over the whole inner surface of the diffuse
ball. For the interaction part IR elements are used, because IR light is not
visible and therefore does not interfere with the visible light sent by the
projector. Conversely, so that IR light from the projector does not interfere
with sensing, an IR-cut filter is placed in front of the projector; with this
setup the projector emits no IR light. The IR light used for sensing is emitted
by 72 wide-angle IR light-emitting diodes (illumination ring in Fig. 2). The
IR-pass filter is matched to the wavelength of the IR light emitted by the ring.
The key part of this construction is the cold mirror, which reflects visible
light but lets IR light pass through [1] (see Fig. 3).
Fig. 3. Left: Visible light is used for display. Right: IR light is used to sense fingers on
the display. Figure from Hrvoje Benko et al. [1]
4.1.4 Interaction Several applications were integrated into this system, for
example a photo and video browsing application. It enables the user to drag
photos around the ‘Sphere’ and to rotate and scale them. All images can be
moved independently, also by more than one user at the same time. An image is
dragged around the ‘Sphere’ by pointing at it with one finger and dragging it
(see Fig. 4). An auto-rotation function is integrated so that dragged images
return to a horizontal orientation, as shown in the lower part of Fig. 4. As on
a smartphone, pictures are scaled by putting two fingers on the picture and
moving them closer together or further apart. As mentioned before, multiple
users can interact with the ‘Sphere’ at the same time. In the case of two people
facing each other, each of them can see only their own half of the ‘Sphere’ and
cannot see what the other user is doing at the moment, which is not possible
with a normal flat display. Information can be exchanged by dragging an item
from one side to the other, but this may become laborious after some time,
especially if the diffuse ball is several times larger. To ease the exchange of
information the developers integrated a pseudo-networking function that needs
no network devices: without walking around the ‘Sphere’, a picture can be sent
to the other side just by touching it with the whole hand, as shown in Fig. 5.
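Two of the gestures described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Sphere implementation: the pinch factor is taken as the ratio of finger distances measured in a flat local coordinate system, and "the other side" is assumed to be the point shifted by 180 degrees of longitude at the same latitude. The function names pinch_scale and opposite_side are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def pinch_scale(start_a: Point, start_b: Point,
                now_a: Point, now_b: Point) -> float:
    """Scale factor of a two-finger pinch: current finger distance
    divided by the distance when the gesture started."""
    start_distance = math.dist(start_a, start_b)
    if start_distance == 0:
        raise ValueError("fingers must start at different positions")
    return math.dist(now_a, now_b) / start_distance


def opposite_side(longitude_deg: float, latitude_deg: float) -> Tuple[float, float]:
    """Assumed target position for a picture sent to the other user:
    same latitude, longitude shifted by half a turn."""
    return ((longitude_deg + 180.0) % 360.0, latitude_deg)
```

A factor above 1 enlarges the picture and a factor below 1 shrinks it. Keeping the latitude unchanged is a design guess that would make the picture appear at the same height for the second user.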
Fig. 4. Dragging pictures with auto-rotation. Figure from Hrvoje Benko et al. [1]
Fig. 5. Network: Sending a picture to the other side by laying the whole hand
on it. Figure from Hrvoje Benko et al. [1]
Table 2. Comparison of Sphere and a flat display
Sphere
– Endless display surface
– Two users can have their own privacy
– Displaying information in a curved way
Flat display
– Four edges
– User can see what the other user is doing
– Information is displayed on a flat surface
5 Form Follows Flow: Where displays can change their shape
At first there was television with black-and-white images. Later, televisions were able
to display video in color, which made the displayed images more natural. For more realism
the resolution began to increase. The development of such displays aims continuously
towards more realism, as seen in the development of Blu-ray discs, which can hold very
large amounts of data such as high-resolution images and videos. Displays with higher
resolution are therefore required to show this information in its full resolution. When
the same images are viewed on an older display that is not full HD and on a full HD
display, the full HD display gives the user a more realistic view of the images or videos.
The next step towards more realism was the development of 3D displays, on which images
and videos can be viewed in the third dimension. Users see not only the pictures but also
their depth. Although the user cannot touch the objects that seem to come out of the 3D
display, this step increases the realism with which the shown information is perceived.
With 3D display technology the digital information ‘tries’ to break through the digital
wall into reality. But despite all the effort, digital information remains digital
information.
The last idea of OUI, ‘Form Follows Flow: Where displays can change their shape’, is
therefore about giving the displayed information a physical shape. Kinetic design aims to
give digital information a physical form and motion. In the context of Organic User
Interfaces the goal is not only to design displays that give digital information a
physical shape, but also to design kinetic interaction with the physical objects. To
present the physical shape of digital information, some actuation must be performed.
Suppose, for example, that a display presents a single pixel. To give this pixel a
physical shape, in this case a height, some kind of actuation is needed, such as pushing
the pixel up with a motor located behind it. A normal flat display shows only the form,
color and movement of an image, so the user can only see it. With kinetic interfaces one
can not only perceive the information visually, but also feel and hear how the shapes are
created by mechanics such as a motor. The design of kinetic interfaces has to consider not
only the quality of the displayed information but also the speed, direction and range of
motion of the interface elements [4]. One example, the display Lumen, is presented in this
chapter.
5.1 SmartSkin
SmartSkin [7] is a sensing architecture based on capacitive sensing; it is also used
in the display Lumen [5], which is presented in Section 5.3. Capacitive sensing
means that no pressure on the display is required to detect the position of an object
on it, such as a finger or a whole hand. In this project Rekimoto used a mesh-shaped
design for the experiment, but SmartSkin is not bound to this form, so any layout is
imaginable. SmartSkin consists of transmitter and receiver electrodes which are made of
copper wires and integrated into the surface. Because the electrodes are only copper
wires, interactive paper can be created which can be large, thin or even flexible.
This makes the technology well suited to interaction with flexible displays. The
sensor is a grid-shaped surface consisting of a mesh of transmitter and receiver
electrodes made of flexible copper wires. The horizontal wires are the receiver
electrodes and the vertical wires are the transmitter electrodes (compare Fig. 6).
Each crossing point acts as a (very weak) capacitor. When a conductive and grounded
object like a finger approaches a crossing point, it drains the wave signal; by
measuring this drop the position of the finger can be detected. SmartSkin is well
suited to flexible displays because of its thinness, scalability and flexibility [7].
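How a finger position could be derived from the grid of crossing points can be sketched as follows. This is a minimal illustration that assumes a single touch, a common baseline signal level for all crossing points and a fixed threshold; the actual signal processing of SmartSkin is described in [7] and may differ. The function name locate_touch and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def locate_touch(signal: List[List[float]],
                 baseline: float,
                 threshold: float) -> Optional[Tuple[float, float]]:
    """Estimate a single touch position (row, column) on the electrode grid.

    signal[r][c] is the received signal strength at the crossing of
    receiver wire r and transmitter wire c. A nearby finger drains the
    signal, so the drop below the baseline is used as a weight for a
    centroid over all crossing points whose drop reaches the threshold.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(signal):
        for c, value in enumerate(row):
            drop = baseline - value
            if drop >= threshold:
                total += drop
                row_sum += r * drop
                col_sum += c * drop
    if total == 0.0:
        return None  # no crossing point is drained enough: no touch
    return (row_sum / total, col_sum / total)
```

Because the result is a weighted centroid, the estimated position can fall between wires, which is one way a coarse grid could still give a finer position estimate. Two separate touches would be merged into one point by this sketch.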
Fig. 6. Architecture of SmartSkin [7]
5.2 Shape Memory Alloy
Shape Memory Alloys (SMA) [9] are used as actuators in the display Lumen, which is
presented in the next section. With SMA, materials can have a memory: a memory of an
original form which the material takes on when electricity flows through it and heats
it up. For example, the original form may be a curved rod, while in the cooled-down
state the material has the shape of a straight rod. Heating it up makes the rod return
to its original curved form. This is called the shape memory effect. It can be used in
several areas of technology, but especially for shape displays, which change their own
shape; the technology can serve as the actuator for shape displays such as the display
Lumen¹. The effect comes in two variants:
– One-way shape memory effect – The material is deformed from its original
form by external forces but switches back to its original form when it is
heated by electricity. After heating, the material stays in its original form [9].
– Two-way shape memory effect – This effect is the same as the one-way shape
memory effect except for what happens after heating. Heating causes the
material to return to its original form, but cooling it down makes it return
to the form it had before heating [9].
¹ http://ivanpoupyrev.com/projects/lumen.php
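The difference between the two effects listed above can be made concrete with a toy state model. It is only a sketch of the behaviour as described in this section: shapes are plain labels, temperature is reduced to a hot/cold flag, and the class ShapeMemoryElement with its methods is a hypothetical name, not part of any real library.

```python
class ShapeMemoryElement:
    """Toy model of the one-way and two-way shape memory effect."""

    def __init__(self, memorized_shape: str, two_way: bool) -> None:
        self.memorized_shape = memorized_shape
        self.two_way = two_way
        self.shape = memorized_shape
        self.hot = False
        self._shape_before_heating = memorized_shape

    def deform(self, new_shape: str) -> None:
        """External force changes the shape (assumed to happen while cold)."""
        if self.hot:
            raise RuntimeError("this sketch only models deformation while cold")
        self.shape = new_shape

    def heat(self) -> None:
        """Heating (e.g. by electric current) restores the memorized shape."""
        if not self.hot:
            self._shape_before_heating = self.shape
            self.shape = self.memorized_shape
            self.hot = True

    def cool(self) -> None:
        """One-way: shape stays as it is. Two-way: shape before heating returns."""
        if self.hot:
            if self.two_way:
                self.shape = self._shape_before_heating
            self.hot = False
```

For an actuator that should return to its rest position by itself after the current is switched off, the two-way behaviour (or a one-way element plus an external restoring force) is the relevant case.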
5.3 Lumen
The display Lumen [5] (see Fig. 7) belongs to the recently emerging interactive
and actuated displays called shape displays [6].
Fig. 7. Display Lumen. Touching the physical shape
The display has a resolution of 13x13 pixels. It is thus a low-resolution display and
not suitable for displaying high-quality images. One might think that the low
resolution limits the realism of the displayed pictures, but Lumen compensates for
this by adding a new dimension that increases realism in a new way. The display can
show normal images but also physical form, by modifying the height of each pixel.
The pixels of this display are called light guides. Each light guide can independently
display different colors¹. Adding the possibility to change the height of each light
guide can be seen as an extension of the RGB concept to RGB-Height [6] (RGBH, see Fig. 8).
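The RGBH idea can be pictured as a frame buffer in which every pixel carries a height in addition to its color. The following sketch is illustrative only: the 13x13 size comes from the text, while the 0 to 255 color range, the normalized height between 0.0 and 1.0, and the names RGBHPixel and RGBHFrame are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RGBHPixel:
    """One light guide: additive color plus a physical height."""
    red: int = 0
    green: int = 0
    blue: int = 0
    height: float = 0.0  # assumed normalized: 0.0 = rest, 1.0 = fully raised


class RGBHFrame:
    """A grid of RGBH pixels, 13x13 by default as stated for Lumen."""

    def __init__(self, rows: int = 13, cols: int = 13) -> None:
        self.pixels: List[List[RGBHPixel]] = [
            [RGBHPixel() for _ in range(cols)] for _ in range(rows)
        ]

    def set_pixel(self, row: int, col: int,
                  red: int, green: int, blue: int, height: float) -> None:
        if not all(0 <= channel <= 255 for channel in (red, green, blue)):
            raise ValueError("color channels must be in 0..255")
        if not 0.0 <= height <= 1.0:
            raise ValueError("height must be in 0.0..1.0")
        self.pixels[row][col] = RGBHPixel(red, green, blue, height)

    def height_map(self) -> List[List[float]]:
        """The heights alone, e.g. as input for the actuators."""
        return [[pixel.height for pixel in row] for row in self.pixels]
```

Separating the height map from the color data mirrors the description that an image can be shown with or without a physical shape.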
5.3.1 Displaying and Interaction With this kind of technology, displaying
images or motion can be divided into two cases. In the first, two images are
displayed asynchronously: one part is a traditional image without any
modification of height, and the other is a physically moving image created by
manipulating the height of each pixel. An example is water flowing from a tap,
where the height of the water pixels is manipulated so that the water can be
seen and felt, while the tap is just an image. In the second case the whole
image gets a physical shape.
Fig. 8. RGBH: To the additive colors (red, green, blue) the element height is
added.
One group of people who can benefit from such display technology are the visually
impaired. For them it is easier to touch things, in this case the images, than to have
someone explain what is currently displayed on the screen. Integrating Braille could be
a further enrichment for this group [6].
A custom two-dimensional capacitive sensor (SmartSkin) is integrated into the Lumen
surface so that interaction with the system is also possible. Control elements, like
buttons for playback, can therefore be created by the display. Installing Lumen
throughout a house can make interaction with the system more comfortable: buttons or
control elements can be created anywhere at any time, so that the user can interact with
the system without going to a central control station. Handicapped people in particular
can then interact from anywhere. Instead of a shape display a normal flat display such
as an OLED display could be used, but visually impaired people cannot benefit from this
approach because they cannot see the control elements. In addition, the display Lumen
can replace several control devices that consist only of buttons (see Fig. 9, left).
Instead of switching between devices, each of which previously had its own control unit,
the user can use the display Lumen as an all-in-one remote control for all of them.
Mobile phones, chat, VoIP and video chat are all means of communication over short or
long distances. In one case the communication is purely aural; in the other, images of
the participants are sent over the connection so that they can see each other. With a
network interface added and a display Lumen at each end of the connection, remote
communication is also imaginable. If two such displays are connected over a network
interface, physical shapes can be sent as well. Imagine two people are communicating
and one person lays his or her hand on the display (see Fig. 9, right). The outline and
color of the hand can be displayed on the other side, and through actuation a physical
shape of the hand can be created as well. The other participant then not only sees the
outline and color but can also touch the physical shape. This opens a new chapter in
HCI: not only seeing and hearing but also feeling information may become a common means
of communication in the future¹.
Fig. 9. Remote communication (left: buttons are created; right: on one side the
button can be pushed and on the other side one can see and feel the pushed
button) [6]
5.3.2 Technical Realization The display Lumen is constructed as a two-dimensional
array like a normal display but with a lower resolution (13x13 pixels), which provides
the basis for displaying images. The array consists of light guides (see Fig. 10,
right). LEDs are mounted at the bottom of each light guide, and for even illumination
the surface of each light guide is covered with matte cups. Each light guide is moved
by Shape Memory Alloy (SMA) strings attached to it (see Fig. 10, left). When
electricity flows through one of these strings, it contracts and pushes the guide up.
After cooling down, which does not take long, the light guide returns to its initial
height¹.
Fig. 10. Left: The SMA attached to each light guide. Right: Array of light guides [6]
Table 3. Comparison: Traditional GUI and OUI
Interaction in a GUI
Possible OUI-Technologies
– Pointing on the display
– Possible technology: Sphere
– Flexible displays with SmartSkin integrated
– Bending the display
– The flexibility of the paper used in PaperWindows makes such deformation possible
– Also flexible displays like OLED displays
– Creating shapes which represent the alert
– Display Lumen can shape an exclamation mark
– PaperWindows: Picking up a paper displays the window on it
6 Discussion
The development of flexible displays like OLED displays can be used to present
old information in a new way. Every curved object can be wrapped with this
kind of technology to make it more informative than printed information and
interactive like showing a commercial video on a can. PaperWindows is limited
to its working environment, because of the technical setup it can not be used
everywhere. Further technologies like E-Ink are already on the market, also color
E-Ink will be presented in the next time, but because of its slow update rate and
processor, it is difficult to show videos on it. OLED is an option but with its
energy consumption in comparison with E-Ink it may not suit as a newspaper
or magazine.
Giving information a physical shape, so that it can be felt as the display Lumen
allows, can also be a new form of interaction and communication. With the current
state of the art, however, only little information can be displayed because of the
low resolution. For a more realistic presentation of information, such as
high-resolution pictures, the size of the light guides must decrease and the
resolution must increase. The problem lies in the actuators: they are mechanical
elements, and there are limits to how small they can be made. New kinds of actuators
must be developed to overcome this obstacle, so that the display Lumen can show more
realistic images.
From the same developer comes a newer kind of display called 'TeslaTouch'², which
works without any mechanical actuators. Instead, it uses the principle of
electrovibration. It can, for example, simulate the texture of sand: the user touches
the display and feels the texture of the sand. Seeing and feeling a physical shape is
not possible with this display, however, because it does not actually create physical
shapes.
7 Conclusion
Three ideas for Organic User Interfaces on flexible surfaces were presented.
The first idea uses the flexibility of the display: the display can be deformed, and
at the same time the deformation serves as an input element for interacting with the
computer system. Bending the display downwards and upwards, for example, can invoke
the zoom function. External devices or buttons shown on the display are no longer
needed, and the interaction can be more intuitive. The display itself is therefore the
input device. The second idea is that the display can take on any shape. Real objects
are merged with the display, so that it cannot be distinguished whether an object is
just a normal object or an enhanced one that offers interaction such as video
browsing. In the Sphere example, an ordinary diffuse ball was used to display a map of
the earth, which brings the displayed content into its corresponding form. With
built-in sensors, interaction can also be performed. The last idea is that the display
can change its form. The display Lumen is one example: it can create not only physical
forms but also movements. Feeling thus becomes a new part of HCI.
² http://teslatouch.com
Finally, possible OUI technologies were presented for each interaction method of a
traditional GUI, for example the zoom function, which can be performed by bending the
display.
References
1. Hrvoje Benko, Andrew D. Wilson, and Ravin Balakrishnan. Sphere: multi-touch interactions on a spherical display. In Proceedings of the 21st annual ACM symposium
on User interface software and technology, UIST ’08, pages 77–86. ACM, 2008.
2. David Holman and Roel Vertegaal. Organic user interfaces: designing computers in
any way, shape, or form. Communications of the ACM, 51:48–55, June 2008.
3. David Holman, Roel Vertegaal, Mark Altosaar, Nikolaus Troje, and Derek Johns.
Paper windows: interaction techniques for digital paper. In Proceedings of the
SIGCHI conference on Human factors in computing systems, CHI ’05, pages 591–
599. ACM, 2005.
4. Amanda Parkes, Ivan Poupyrev, and Hiroshi Ishii. Designing kinetic interactions
for organic user interfaces. Communications of the ACM, 51:58–65, June 2008.
5. Ivan Poupyrev, Tatsushi Nashida, Shigeaki Maruyama, Jun Rekimoto, and Yasufumi Yamaji. Lumen: interactive visual and shape display for calm computing. In
ACM SIGGRAPH 2004 Emerging technologies, SIGGRAPH ’04, pages 17–. ACM,
2004.
6. Ivan Poupyrev, Tatsushi Nashida, and Makoto Okabe. Actuation and tangible user
interfaces: the vaucanson duck, robots, and shape displays. In Proceedings of the
1st international conference on Tangible and embedded interaction, TEI ’07, pages
205–212. ACM, 2007.
7. Jun Rekimoto. Smartskin: an infrastructure for freehand manipulation on interactive surfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, CHI ’02, pages 113–120.
ACM, 2002.
8. Roel Vertegaal and Ivan Poupyrev. Organic user interfaces. Communications of the
ACM, 51(6):26–30, 2008.
9. Yee Harn Teh and Roy Featherstone. An architecture for fast and accurate control
of shape memory alloy actuators. Int. J. Rob. Res., 27:595–611, May 2008.
XCS: An Autonomous Machine Learning Method
Suited for Use in Ubiquitous Computing
Andreas Gutmann
Karlsruher Institut für Technologie, 76131 Karlsruhe, Germany
Abstract. This seminar paper is devoted to the autonomous machine learning
method XCS and its application in ubiquitous computing. The first part presents
two methods that serve the goal of classification, namely Principal Component
Analysis and Kohonen networks. Owing to their low resource consumption, these
methods are very well suited for use in ubiquitous computing. Kohonen networks
are good classifiers, and Principal Component Analysis serves to preprocess the
measured values by denoising them and to determine the number of classes.
Denoised and classified data can thus be used as the basis for further processing.
The second part covers the Learning Classifier System and its extension XCS.
Both are reinforcement learners with a genetic component. They are suited to
learning through interaction with the environment and learn from the feedback
they receive. Combining both parts makes it possible to embed an autonomously
learning system in a ubiquitous environment.
1 Introduction
Ubiquitous computing has for quite some time been adopting procedures and methods
from artificial intelligence (AI). Without them, a shipped system would always remain
in the same state. Changes in behaviour would have to be configured by hand, which
only an expert can do, or be installed through regular updates. Even then, such
updates amount to only a few nice but very generic features.
Ubiquitous computing bundles a great deal of knowledge about context recognition
from diverse fields. A multitude of different sensors delivers unprocessed data in
limited quantities. In most systems, however, the sensor values are imprecise and
error-prone, a trade-off made in favour of usability. Taken by themselves, these data
are therefore of no use. They can be searched for known patterns, but such patterns
are of no use in an individual environment.
Machine learning methods make it possible to search the data for unknown patterns
and rules. They are unknown because the user and the environment differ for every
device. With these unique insights, the system can be adapted more closely to the
needs of the individual user and their environment. Statements about the past,
present and future thus become possible. A 'smart device', for example, can not only
remind its user of unfinished tasks, it can also recommend what to do next and, if
necessary, make preparations for it. In the best case it can even carry out the task
in place of its user. This means that such a system can not only recognize the
environment but also interpret it.
Many methods require a time-consuming training phase. In addition to the time
required, supervision by an expert is necessary, who selects training examples and
sets the learning parameters. This effort cannot be reconciled with the idea of a
user-friendly system in ubiquitous computing. To ensure usability, an elaborate
training process must therefore be avoided. Ubiquitous systems should thus not only
be able to adapt to their unique deployment, that is, to learn; they should also do
so autonomously.
The concatenation of algorithms proposed here is intended to meet all of these
requirements.
2 Requirements for the System
According to Jakob Nielsen of the 'Technical University of Denmark' in Copenhagen,
usability is defined by the following five points [1]:
– Learnability
A system should be easy to learn
– Efficiency
A system should be efficient to use
– Memorability
A system should be easy to remember
– Errors
A system should have a low error rate
– Satisfaction
A system should satisfy the user (not frustrate them)
With regard to autonomous machine learning methods, we take these to mean:
– Learnability
The system requires minimal human interaction, since more interaction means more
expert knowledge
– Efficiency
The system should produce usable results
– Memorability
The necessary human interaction should require minimal or no expert knowledge
– Errors
The system should learn successfully, that is, with minimal absolute error
– Satisfaction
The system should make work easier for people, not create additional effort
The ubiquitous system constructed here, in parts only theoretically, is intended to
meet these requirements as well as possible.
2.1 Requirements for the Sensors
To learn something, we need data to learn from. A multitude of different sensors is
available for this purpose, for example RFID chips, motion sensors, temperature
sensors and cameras, which are selected according to the area of application. Based on
the type of sensor, the data can already be distinguished by the kind of information
they carry, for example position, mood, activity, time of day or needs.
There is a wealth of experience with sensors in ubiquitous computing. A suitably
adapted and functioning network of all necessary sensors can therefore be assumed to
exist here.
2.2 Requirements for the Methods
Ubiquitous systems are usually subject to a number of constraints that affect which
methods can be used for autonomous machine learning. Most of these constraints stem
from the physical size of the system. Only small amounts of main memory and cache are
available, so larger volumes of data must be offloaded to external systems. In
addition, computational performance is limited by the processors used, which sacrifice
clock speed in favour of size and energy consumption.
2.3 Objectives for the Methods
Given a limited number of observations, or a task description and the associated
goal, an autonomous machine learning system should be able to classify and interpret
independently and to reach a goal through a sequence of actions. In addition, because
of the required usability, the system should deliver every computation result in near
real time.
3 Classification
The first step therefore calls for a system that can cleverly represent inputs from an
unknown environment internally. Such a transformation of the unprocessed, raw data
from the environment offers advantages for further processing. It yields savings in
time and resources, which are important in ubiquitous computing and are referred to as
efficiency from here on.
In most cases, the attempt is made to extract regularities from the inputs and to
transform them into a suitable internal representation, so that a gain in efficiency
can be expected without detailed prior knowledge about the tasks to be solved later.
In other words, the signals measured by the sensors are categorized in order to
simplify the subsequent processing of the data.
A further increase in efficiency can be achieved when noisy data are denoised, or when
factors that are irrelevant (for example because they are constant) are eliminated.
Such a reduction of the dimension of the input space allows the data to be processed
faster afterwards.
To achieve these goals, the concatenation of two algorithms is proposed here:
Principal Component Analysis is to denoise the sensor values and prepare the
classification. A Kohonen network is then to perform the classification.
3.1 Principal Component Analysis (PCA)
Principal Component Analysis is used for dimensionality reduction, with the aim of
filtering unimportant factors out of the sensor values. Unimportant factors show
themselves by hardly changing, or by fluctuating only within a small range. Examples
of such factors are background noise in an audio recording, minimal temperature
fluctuations caused by body heat, or simple measurement errors.
Although a large number of methods can achieve this dimensionality reduction, I
recommend PCA for the following reason:
After the dimensionality reduction, the number of classes in the input space can be
determined from the new basis obtained by PCA without much additional effort. The
number of classes is needed for correct classification with Kohonen networks, as shown
further below. An alternative classifier does not change this recommendation either,
because all autonomous classifiers known to me whose efficiency is comparable to that
of Kohonen networks require the number of classes as prior knowledge.
PCA thus implicitly assumes that the direction with the largest spread (variance) in
the input space also contains the most information and, conversely, that the direction
with the smallest spread contains the least. If the spread lies below a critical,
previously defined value, the information it contains can be regarded as dispensable
and omitted. This can be achieved (in simplified form) as follows:
1. Compute the covariance matrix
2. Compute the eigenvectors and eigenvalues of the covariance matrix
3. Sort the eigenvectors in descending order of their eigenvalues
4. Remove the eigenvectors whose eigenvalues lie below the previously defined critical
value (the remaining ones form a new basis)
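The four steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names, the Jacobi rotation method used to obtain the eigenvectors, and the threshold parameter `min_variance` are assumptions made for this sketch and are not prescribed by the text.

```python
import math

def covariance_matrix(points):
    """Step 1: sample covariance matrix of a list of equal-length vectors."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def jacobi_eigen(matrix, sweeps=50):
    """Step 2: eigenpairs of a symmetric matrix via Jacobi rotations
    (one possible method, chosen here because it needs no external library)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # rotate rows p and q
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # accumulate the eigenvectors as columns of v
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [(a[i][i], [v[k][i] for k in range(n)]) for i in range(n)]

def pca_basis(points, min_variance):
    """Steps 3 and 4: sort by eigenvalue and drop directions below the threshold."""
    pairs = jacobi_eigen(covariance_matrix(points))
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return [(value, vector) for value, vector in pairs if value >= min_variance]
```

Calling `pca_basis(points, min_variance)` returns the retained (eigenvalue, eigenvector) pairs; for points that all lie on one straight line, only a single direction survives any positive threshold.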
Fig. 1. The largest variance in the point cloud contains the most information
To determine the number of classes in the input space, we now consider the new basis
of orthogonal vectors. The new basis effects a rotation such that the direction of
largest variance coincides with the first coordinate axis, the direction of
second-largest variance with the second coordinate axis, and so on.
Next, the individual points are projected onto the new coordinate axes. The result of
this projection can be described by a function $f_n(x) = y$ for each of the n
coordinate axes, where the value y is the number of points projected onto x. As is
easily seen, $f_n(x)$ is not necessarily continuous (see Fig. 2). However, a
continuous approximation $f'_n(x)$ of $f_n(x)$ can be constructed without much effort.
The number of classes in the point cloud can now be computed from the local maxima of
these functions.
Fig. 2. Projecting the individual points yields a discontinuous function
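As a rough illustration of this idea for a single coordinate axis, the sketch below projects the points, counts them in bins (a discrete stand-in for $f_n$), smooths the counts with a moving average (a simple stand-in for $f'_n$) and counts the local maxima. The bin count, the smoothing radius and all function names are assumptions of this sketch; the text does not specify how the approximation is built or how the counts of several axes are combined.

```python
def project(points, axis):
    """Project every point onto one basis vector (dot product)."""
    return [sum(p_i * a_i for p_i, a_i in zip(p, axis)) for p in points]

def histogram(values, bins):
    """Discrete version of f_n: number of projected points per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def smooth(counts, radius=1):
    """Moving average as a simple smoothed approximation of the histogram."""
    out = []
    for i in range(len(counts)):
        window = counts[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def count_local_maxima(curve):
    """Count peaks of the curve; a flat plateau is counted once."""
    peaks = 0
    rising = True  # the curve is treated as rising at its left edge
    for prev, cur in zip(curve, curve[1:]):
        if cur > prev:
            rising = True
        elif cur < prev:
            if rising:
                peaks += 1
            rising = False
    if rising:
        peaks += 1
    return peaks
```

A possible call chain is `count_local_maxima(smooth(histogram(project(points, axis), bins)))`. The result depends strongly on the chosen bin count and smoothing radius, which would have to be tuned for real sensor data.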
3.2 Kohonen Networks
Kohonen networks are a special case of artificial neural networks. Before Kohonen
networks can be discussed, artificial neural networks must therefore be introduced:
Artificial neural networks are modelled on the human brain.
Given suitable algorithms, computers can solve most tasks faster than a human. Things
are different, however, when no suitable algorithm is known. The human brain can, for
example, recognize faces much faster, even when they have changed. It also achieves a
far better recognition rate than any known algorithm. The same holds for text that is
hard to read, for example because letters are smudged or swapped. The idea is
therefore to transfer the way the human brain works to the computer.
A neural network is a pair (N, V) consisting of a set N of neurons and a set V of
connections. It has the structure of a directed graph whose nodes are called neurons
and whose edges are called connections. Each neuron can receive its inputs via an
arbitrary set of connections and send an output via an arbitrary set of connections.
The neural network receives its inputs via connections from the 'outside world'
(input layer) and sends its outputs via connections to the 'outside world' (output
layer).
Kohonen networks are particularly well suited to classifying the sensor values of
ubiquitous systems. Not only can they learn autonomously, that is, without human
supervision; compared with other classifiers they also stand out for their simple
implementation and high efficiency. After a training phase, the Kohonen network
classifies reliably and quickly, since only the Euclidean distance between the sensor
values and every neuron in the network has to be computed and compared. For n neurons
and an exhaustive search this corresponds to a cost of O(n), which can be reduced
further under additional assumptions.
The neurons of a Kohonen network are arranged as a one-dimensional chain (see Fig. 4)
or as a two-dimensional grid (see Fig. 3), but they can arrange themselves in a
multi-dimensional input space (see Fig. 5). In contrast to ordinary artificial neural
networks, however, Kohonen networks are undirected. The weights of the neurons serve
to locate them in the input space. Each neuron receives the input via a connection to
the 'outside world'. When an input value is registered, the neuron whose Euclidean
distance to the input value is smallest responds. In this way the input value can be
classified. The result of this classification serves as input to the Learning
Classifier System discussed afterwards.
In the training phase, the neurons are initialized with random weight vectors m. The
distance between an input value and a neuron is the Euclidean distance in the input
space. Furthermore, let the distance between two neurons be defined as the number of
other neurons lying on the shortest path between them (neighbourhood function h).
For input t with input value x(t), the algorithm first determines the neuron c with
the minimal Euclidean distance to the input:
$$\forall i:\quad \|x(t) - m_c(t)\| \le \|x(t) - m_i(t)\| \qquad (1)$$
This neuron is now moved towards the input value. Other neurons are moved towards it
as well, but the less so the further they are from the excited neuron (neighbourhood
relation). Each neuron changes its weight by a learning factor λ, which decreases over
time:
$$m_i(t+1) = m_i(t) + h_{c(x),i}\,\lambda\,\bigl(x(t) - m_i(t)\bigr) \qquad (2)$$
After the learning phase is complete, the neurons correspond to the centres of the
learned classes. The number of neurons used corresponds to the level of detail of the
classification. The neighbourhood relation ensures that the neurons of similar classes
are also connected to one another.
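The winner search of eq. (1) and the update of eq. (2) can be illustrated for a one-dimensional Kohonen chain. The following is a hedged sketch, not a reference implementation: the Gaussian shape of the neighbourhood function h, the linear decay of the learning factor λ, and the parameter names `rate0`, `sigma` and `seed` are assumptions that the text leaves open.

```python
import math
import random

def winner(weights, x):
    """Eq. (1): index c of the neuron with minimal Euclidean distance to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_chain(samples, n_neurons, steps=2000, rate0=0.5, sigma=0.5, seed=0):
    """Train a one-dimensional chain of neurons on the given samples."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # random initial weight vectors m_i
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(steps):
        x = rng.choice(samples)
        c = winner(weights, x)
        rate = rate0 * (1.0 - t / steps)   # learning factor lambda shrinks over time
        for i, m in enumerate(weights):
            # assumed neighbourhood function: decays with the chain distance |i - c|
            h = math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
            for k in range(dim):
                m[k] += h * rate * (x[k] - m[k])   # eq. (2)
    return weights
```

After training, classifying a new sensor value is a single call to `winner(weights, x)`, which is the O(n) search over all neurons mentioned above.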
For illustration, see Fig. 3 to Fig. 5.
Fig. 3. Learning steps of a Kohonen network [2]
Fig. 4. Learning steps of a Kohonen chain [2]
Fig. 5. Representation of a three-dimensional input space [2]
4 Interaction
The two algorithms presented so far make it possible to classify the environment,
assuming that the ubiquitous system is able to sense it. The goal now is to enable the
ubiquitous system to solve tasks autonomously. Thanks to the preceding classification,
the environment can from now on be viewed in categories, that is, complex parts of a
state can be represented by a simple propositional formula (x or ¬x), or (1 or 0).
This new encoding of the environment shrinks the input space of every subsequent
algorithm and thus enables methods that would previously have been unthinkable in a
ubiquitous system because of the limited computing power, the large input space and
the computation time, which, as required above, must be as short as possible.
One of the methods made possible in this way is the XCS Classifier System, a
reinforcement learner.
Reinforcement learning means learning through interaction with the environment.
Interaction means acting independently with the task of reaching a particular goal.
Algorithms of this kind are therefore of general interest for ubiquitous systems.
Learning occurs when the system receives, with a delay, an evaluation of its
activities and adapts its future behaviour on the basis of this evaluation, which in
machine learning is called a reward. Reinforcement learning has a further
distinguishing feature: it always learns; there is no learning phase that ends at some
point.
The classic structure of a reinforcement learner consists of the following subsystems:
an agent (1), which takes in information from the environment (3) through a detector
(2), performs an action in the environment by means of an effector (4), and receives a
reward (5) for it. The action taken is evaluated on the basis of the reward, and
future behaviour is adapted accordingly.
Example 1. A person enters a supermarket to buy tobacco. Their smartphone registers
this by means of the GPS coordinates and a city map. It immediately sends a query to
the refrigerator at home. The refrigerator reports that the last carton of milk was
taken out this morning. The smartphone now reminds its owner to buy milk.
If the person then buys milk (and the smartphone detects this by means of some
arbitrary system of sensors), the smartphone will rate its action under these
circumstances positively and act identically next time. If the person does not buy
milk, the smartphone will rate its action under these circumstances negatively and,
depending on the sum of positive and negative ratings, possibly act differently next
time.
The difficulty here is comparable to learning to ride a bicycle. The system can be
expected to experience a whole series of failures at the beginning and to arrive at a
working solution only slowly, by inductive approximation. Depending on how the system
is used, it can therefore make sense to define a different kind of learning phase: a
period in which the system only observes and gains knowledge from its observations. In
the example above, the smartphone would first learn that milk is bought only when
there is none left in the refrigerator, when the person is in a supermarket, and when
further conditions hold, for example not at 8:30 in the morning when the person is on
the way to work. This learning phase is useful for ensuring the usability defined in
Section 2.
Fig. 6. The structure of a typical reinforcement problem [6]
4.1 Learning Classifier System (LCS)
The XCS Classifier System is an extension of the LCS. The Learning Classifier System
is a combination of reinforcement learning and evolutionary learning. The
reinforcement component selects the action to be taken and processes the reward
received. The evolutionary component generates new behavioural options from successful
old ones.
In an LCS, the learned knowledge is stored in the classifiers. A classifier consists
of a set of rules in a population. A rule consists of:
– a condition
– an action
– the expected reward
A condition λ is encoded over the alphabet {0, 1, #} as follows, where $\lambda_i$
represents the i-th class of the input space:
$$\lambda_i = \begin{cases} 0, & \text{does not apply} \\ 1, & \text{applies} \\ \#, & \text{irrelevant / don't care} \end{cases} \qquad (3)$$
An action is a binary code that either refers to a table of all possible actions or is
a binary representation of all possible activities, each of which is a disjoint part
of an action.
Rules whose conditions overlap compete with one another.
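To make the condition encoding of eq. (3) concrete, the following small sketch checks whether a rule's condition applies to a situation given as a bit string of recognized classes. Representing rules as dictionaries with the keys `condition`, `action` and `reward` is an assumption made purely for illustration.

```python
def matches(condition, situation):
    """True if every position of the condition is '#' or equals the situation bit."""
    return len(condition) == len(situation) and all(
        c == "#" or c == s for c, s in zip(condition, situation))

def matching_rules(rules, situation):
    """All rules whose condition applies to the current situation."""
    return [rule for rule in rules if matches(rule["condition"], situation)]

# Two overlapping rules: both apply to the situation "110" and therefore compete.
rules = [
    {"condition": "1#0", "action": "01", "reward": 0.8},
    {"condition": "11#", "action": "10", "reward": 0.3},
    {"condition": "0##", "action": "01", "reward": 0.5},
]
competing = matching_rules(rules, "110")
```

The wildcard `#` is what lets one rule cover several situations, and it is also the reason why several rules can apply to the same situation at once.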
The reinforcement component. For a given problem, those rules are selected whose
conditions match what the sensors perceive (the situation). It is worth recalling here
that the sensor values were classified beforehand by means of PCA and Kohonen
networks, so that the actual input is the information about which classes were
recognized and which were not. The set of rules that match the situation is called the
match set [M]. The set of actions proposed by the rules in [M] is called the action
set [A]. Next, one of the actions a ∈ [A] is selected and carried out. The selection
depends on a probability p:
$$\pi = \begin{cases} \arg\max_a\,(\text{average reward for action } a), & \text{with probability } 1-p \\ \mathrm{random}(a), & \text{otherwise} \end{cases} \qquad (4)$$
In words: with probability p a random possible action is carried out, and otherwise
the presumably best action. The presumably best action is the one that promises the
highest reward. Since it is likely that several rules propose the same action (cf.
Fig. 7) but promise different rewards, the mean value is taken as the reference.
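The selection rule of eq. (4) can be sketched as follows. The rule representation (dictionaries with `action` and `reward`) and the function name are assumptions for this illustration; ties between equally good actions are broken here by sorted order, which the text does not specify.

```python
import random
from collections import defaultdict

def select_action(match_set, p, rng=random):
    """With probability p pick a random proposed action, otherwise the action
    with the highest mean expected reward over all rules proposing it."""
    rewards_by_action = defaultdict(list)
    for rule in match_set:
        rewards_by_action[rule["action"]].append(rule["reward"])
    actions = sorted(rewards_by_action)
    if rng.random() < p:
        return rng.choice(actions)                      # explore
    return max(actions, key=lambda a: sum(rewards_by_action[a])
               / len(rewards_by_action[a]))             # exploit

match_set = [
    {"action": "a", "reward": 0.9},
    {"action": "a", "reward": 0.1},
    {"action": "b", "reward": 0.6},
]
best = select_action(match_set, p=0.0)
```

With `p=0.0` the call never explores, so action "b" (mean 0.6) is preferred over action "a" (mean 0.5), even though one rule for "a" promises the single highest reward.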
Action a is now executed by the system, and a reward is perceived by the sensors. The
expected reward of all rules in [A] is then adjusted. The direction and the weight of
the adjustment depend on the actual reward. In most cases a variant of Q-learning is
used here. Since other methods are also conceivable, Q-learning is not explained
further.
The evolutionary component. This section addresses the question of how the system can
learn new rules, that is, new reactions to known situations. For this purpose a
genetic algorithm is used, which reproduces, mutates and recombines the rules and
performs selection on them.
The genetic algorithm operates on the binary-coded rules; whether these are encoded in
plain binary or Gray code is irrelevant. In the first step, two rules are selected
according to their expected reward, where a larger expected reward means a higher
probability of being selected. This selection ensures that the system will always
improve in the medium term, because bad rules are selected only with negligibly small
probability. The two selected rules are first reproduced and then mutated and
recombined.
Fig. 7. The structure of a typical Learning Classifier System [6]
Reproduction is necessary because the two selected, evidently successful rules are to
be retained. A mutation of a rule changes an attribute ($\lambda_i$, the
representation of the i-th class) with probability µ to another possible value (1, 0,
#). A recombination swaps the i-th attribute of the two rules with probability µ. This
is applied to both the condition and the action of the two rules. Two new rules
(offspring) result, which displace two old rules. The rules to be displaced are
selected with a probability corresponding to the inverse of their expected reward and
are deleted from the system.
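One step of the genetic algorithm described above might look like the following sketch. It assumes that rules are dictionaries with string-valued `condition` and `action` fields and a strictly positive `reward`, and that offspring inherit the expected reward of their parent; these details, like all function names, are illustrative assumptions.

```python
import random

def roulette(rules, weight, rng):
    """Pick one rule with probability proportional to weight(rule)."""
    return rng.choices(rules, weights=[weight(r) for r in rules], k=1)[0]

def mutate(code, mu, rng, alphabet="01#"):
    """Change each attribute with probability mu to another possible value."""
    out = []
    for symbol in code:
        if rng.random() < mu:
            symbol = rng.choice([s for s in alphabet if s != symbol])
        out.append(symbol)
    return "".join(out)

def recombine(code_a, code_b, mu, rng):
    """Swap the i-th attribute of both codes with probability mu."""
    a, b = list(code_a), list(code_b)
    for i in range(len(a)):
        if rng.random() < mu:
            a[i], b[i] = b[i], a[i]
    return "".join(a), "".join(b)

def ga_step(population, mu, rng):
    """Select two parents, create two offspring, displace two old rules."""
    parents = [roulette(population, lambda r: r["reward"], rng) for _ in range(2)]
    conds = [mutate(p["condition"], mu, rng) for p in parents]
    acts = [mutate(p["action"], mu, rng, alphabet="01") for p in parents]
    conds = recombine(conds[0], conds[1], mu, rng)
    acts = recombine(acts[0], acts[1], mu, rng)
    children = [{"condition": c, "action": a, "reward": p["reward"]}
                for c, a, p in zip(conds, acts, parents)]
    for _ in children:
        # displacement probability proportional to the inverse expected reward
        victim = roulette(population, lambda r: 1.0 / r["reward"], rng)
        population.remove(victim)
    population.extend(children)
    return children
```

Because the parents themselves stay in the population unless they happen to be chosen for displacement, the step keeps the population size constant while adding two varied copies of successful rules.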
If at any point in time no rule is applicable to the current situation, that is, the
match set [M] is empty, the algorithm generates a new rule as follows:
1. The current situation becomes the condition of the new rule
2. ∀ i: with probability µ, $\lambda_i = \#$ is set
3. A random action becomes the action of this rule
4. After the action has been executed, the actual reward becomes the expected reward
This can happen in two cases: either the last rule applicable to this situation was
recently displaced, or no such rule existed when the system was initialized.
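The rule generation for an empty match set can be sketched in a few lines. The function name `cover` and the dictionary representation of a rule are assumptions of this sketch; the expected reward is left unset until the action has actually been executed, as in step 4.

```python
import random

def cover(situation, actions, mu, rng):
    """Create a new rule that is guaranteed to match the given situation."""
    # steps 1 and 2: copy the situation, generalizing each position with probability mu
    condition = "".join("#" if rng.random() < mu else bit for bit in situation)
    # step 3: random action; step 4: the reward is filled in after execution
    return {"condition": condition, "action": rng.choice(actions), "reward": None}

rng = random.Random(0)
new_rule = cover("1100", ["00", "01", "10"], mu=0.3, rng=rng)
```

Every position of the new condition is either the situation's own bit or `#`, so the rule always applies to the situation that triggered its creation.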
More detailed information on the Learning Classifier System can be found in [4].
4.2 XCS Classifier System
The XCS Classifier System is a further development of the Learning Classifier System.
It addresses and resolves several problems of the evolutionary component of the
Learning Classifier System.
The main differences are the following:
1. A new value is introduced: the prediction error τ. It:
– indicates how exact the prediction of the expected reward is
– is updated with the same method as the expected reward (usually Q-learning)
– serves as a factor for the fitness ζ, which is likewise newly introduced
– is used to favour those rules in the match set [M] that make more exact predictions
2. The fitness ζ indicates how exactly a rule can estimate its expected reward in
relation to other rules that overlap with it. In addition, ζ serves as the criterion
for selecting the rules to be displaced. As a result, rules that yield little reward
in absolute terms but deliver exact predictions can also remain in the population
under the genetic algorithm.
3. The genetic algorithm selects its two rules only from the current action set [A],
but displaces rules from the entire population. This means that the algorithm can also
learn on subproblems that will at best yield only a mediocre reward.
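How the prediction error τ and the fitness ζ could interact is sketched below. The concrete update rule and accuracy function are not given in this text; the sketch borrows the form commonly used in algorithmic descriptions of XCS (a learning rate `beta`, an error threshold `eps0`, and the accuracy parameters `alpha` and `nu`), and all of these names and default values are assumptions here. Rules are again hypothetical dictionaries, with `prediction` as the expected reward, `error` as τ and `fitness` as ζ.

```python
def accuracy(error, eps0=0.01, alpha=0.1, nu=5.0):
    """Accuracy derived from the prediction error: 1 below the threshold,
    otherwise dropping off steeply with growing error."""
    return 1.0 if error < eps0 else alpha * (error / eps0) ** -nu

def update_rules(action_set, reward, beta=0.2):
    """Move prediction error and prediction towards the observed reward, then
    move each fitness towards the rule's accuracy relative to the whole set."""
    for rule in action_set:
        rule["error"] += beta * (abs(reward - rule["prediction"]) - rule["error"])
        rule["prediction"] += beta * (reward - rule["prediction"])
    accuracies = [accuracy(rule["error"]) for rule in action_set]
    total = sum(accuracies)
    for rule, kappa in zip(action_set, accuracies):
        rule["fitness"] += beta * (kappa / total - rule["fitness"])

exact = {"prediction": 100.0, "error": 0.0, "fitness": 0.1}
rough = {"prediction": 0.0, "error": 0.0, "fitness": 0.1}
update_rules([exact, rough], reward=100.0)
```

In this example the rule whose prediction was exact gains fitness, while the rule that mispredicted loses fitness, independently of how large the reward itself is. This mirrors point 2 above: exactness of the prediction, not the absolute reward, decides which rules survive.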
Fig. 8. Schematic representation of the XCS Classifier System [5]
Further information on the XCS Classifier System can be found in [5–7].
5 Conclusion/Outlook
5.1 Conclusion
The methods presented provide all the tools needed to design an autonomously learning
ubiquitous system. In summary, the system looks as follows:
Assume that a ubiquitous system with all necessary sensors (detectors) and actuators
(effectors) exists. A training phase is used, during which the system can only
observe. The measured values obtained are denoised with Principal Component Analysis,
and the number of classes is determined at the same time. The system then initializes
a Kohonen network with as many neurons as classes were determined. The Kohonen network
can now be trained on the measured values collected so far, so that it can later pass
generalized/classified data to the XCS Classifier System as input values.
Like the Kohonen network, the XCS Classifier System initializes itself independently.
The rules are generated by the evolutionary component to match the recognized classes.
The XCS Classifier System can now operate as described above.
Optionally, depending on the system's area of application, all measured values can
continue to be stored, and Principal Component Analysis and the Kohonen network can be
run on these data at regular intervals. This allows the system to react to a changing
environment.
5.2 Outlook
At the Mobile World Congress 2011, the use of Near Field Communication (NFC), a
contactless radio technology, in upcoming smartphones was announced.¹ NFC offers
possibilities for connecting smartphones with their environment on a scale not seen
before [8]. In particular, since companies such as Google, which has a great deal of
experience with personalized offerings through Google AdSense, are backing NFC¹,
increasingly personalized offerings can also be expected on the smartphone market.
I assume that in the near future many rigid objects that are public, or private but
located in public space, will be equipped with NFC. I expect private objects located
in public space to be the pioneers here, motivated by advertising or customer service.
Later, I think, public objects will be equipped with NFC as well. Here the initial
motivation will presumably lie in tourism/marketing.
1
Zweiter Anlauf für NFC-Funktechnik (Andrej Sokolow, dpa)
http://www.heise.de/newsticker/meldung/Zweiter-Anlauf-fuer-NFC-Funktechnik1193329.html
Retrieved on 20.02.2011 at 19:20
Once this broad NFC network exists, the possibilities for autonomous machine
learning methods on smartphones will be nearly unlimited. The smartphone will
not only be able to perform actions of its own, it will also be able to cause
other devices to perform actions.
As experience on the Internet shows, for example with social networks, there
is great demand for and interest in personalized applications. Consequently, a
broad market for ubiquitous, autonomously learning systems will exist in the
near future.
The approach presented here for such a system needs further optimization,
however. For example, a component that determines the reward for the XCS system
is missing. So far it has been assumed that a reward for the XCS system "simply
exists". In fact, the system itself has to derive the reward from the reactions
of the environment.
A further, general problem arises in classification: if two classes lie close
together in the input space and the measurement error is expected to be roughly
Gaussian, a measurement can no longer be assigned to a class with certainty. A
related problem is known in algorithm engineering and cryptography as 'Learning
With Errors' and is conjectured to be computationally hard (it is not known to
be NP-hard). In the worst case, the system could therefore trigger an action
that is the opposite of the desired one. One possible way to prevent this would
be to detect the possibility of such incidents in advance and to transform the
input space accordingly, so that the affected classes no longer lie close
together.
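How quickly the assignment becomes unreliable can be estimated with a small calculation. The sketch below assumes, purely for illustration, scalar measurements, two classes with known means, the same Gaussian noise on both, and a decision boundary at the midpoint between the means; the function name is hypothetical.

```python
import math


def misclassification_probability(mean_a, mean_b, sigma):
    """Probability that a noisy measurement of class A lands on B's side."""
    distance = abs(mean_b - mean_a)
    # Gaussian tail beyond half the class distance: Q(distance / (2 * sigma)).
    return 0.5 * math.erfc(distance / (2.0 * sigma * math.sqrt(2.0)))


# Classes four standard deviations apart versus one standard deviation apart.
print(misclassification_probability(0.0, 4.0, 1.0))
print(misclassification_probability(0.0, 1.0, 1.0))
```

Under these assumptions the error probability depends only on the ratio of class distance to noise level, which is why moving the affected classes further apart in the input space helps.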
Finally, I see room for improvement in the evolutionary component of the XCS
system. In the current approach, random actions are used to initialize the
system. It would be more sensible, however, to execute and evaluate only
suitable actions for each situation right from initialization.
References
1. Nielsen, J.: Usability Engineering. Academic Press, Boston, ISBN 0-12-518405-0
(hardcover), 0-12-518406-9 (softcover), (1994)
2. Kohonen, T.: The self-organizing map. Proceedings of the IEEE, Volume:78 Issue:9,
1464–1480, (Sep 1990)
3. Frieden, B. Roy: Probability, Statistical Optics, and Data Testing: A Problem Solving
Approach. Springer-Verlag, ISBN 0-387-53310-9, (1991)
4. Richter, U. M. R.: Controlled Self-Organisation Using Learning Classifier Systems.
KIT Scientific Publishing, (2009)
5. Butz, M. V.: Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design. Springer-Verlag, ISBN 3-540-25379-3, (2006)
6. Bernauer, A.: Das Learning Classifier System XCS. (2007)
7. Wilson, S. W.: State of the XCS Classifier System Research. LCS’99, LNAI 1813,
63–81, Springer-Verlag (2000)
8. G. Madlmayr et al.: Eine mobile Service-Architektur für ein sicheres NFC-Ökosystem.
Elektrotechnik und Informationstechnik, Volume:127 Issue:5, 127–134, (May 2010)
Object Recognition in Surface Computing
Florian Becker
Institute of Telematics
Chair for Pervasive Computing Systems
[email protected]
Abstract. This survey covers methods and technologies used to recognize
and track objects on interactive surfaces. Interactive surfaces, and
surface computing in general, are an important branch of ubiquitous
computing. Their aim is to integrate the digital world into daily life in
the form of large, responsive displays embedded in our surroundings. The
survey first provides general background and then presents the
technologies themselves: electromagnetic approaches in general, RFID in
detail, and visual approaches based on image processing, either
marker-based or marker-less. Criteria are defined to make the methods
comparable, and a corresponding rating of the technologies is presented
together with their advantages, disadvantages and possible future
developments. A few example projects that implement one or several of the
presented technologies are shown as well.
1
Introduction
Today, digital information is ubiquitous. The development of ways to interact
with digital devices started decades ago and is unlikely to slow down, let alone
stop. Still, the most common way of interacting with the digital world is the
combination of mouse and keyboard. Although handling these devices feels
perfectly natural to many people, it did not come naturally and is not intuitive
at first. As a consequence, researchers all over the world started looking for
sensible alternatives and enhancements to this standard form of input. So far,
the techniques that have reached home users are graphic tablets, trackballs,
styluses and digital pens.
Very promising results have also been achieved with touch-sensitive surfaces
(already common among mobile devices) and interactive surfaces in general.
This technology is heading towards table-sized displays, which are grouped
under the term surface computing. Object recognition is, or is going to be, a
basic and very important part of surface computing. When one has the
opportunity to interact with intelligent devices on a natural level, with touch
and gestures, it is an obvious next step to integrate more ways of interaction,
e.g. through the use of everyday objects. This can be achieved in various ways
using different technologies.
There are, and have been, promising works by different researchers. The
so-called ‘Bricks’ [FIB95] combined ideas that anticipated multi-touch with
tangible devices. In ‘Illuminating Clay’ and ‘SandScape’ [IRP+04], users work
with tangible clay and sand that is overlaid with images, using cameras and
projectors to create an interactive environment. Also worth mentioning are the
‘Lumino’ project [BBR10], which uses an optical object recognition method, and
‘Madgets’ [WSJB10], which pursues the same goal with magnetic forces but
focuses on giving the user sensory feedback.
2
Surface Computing
The goal of ubiquitous computing research is to augment users’ access to
information in daily life, offering new possibilities and functions and making
technology available everywhere and at all times. This means we have to search
for new ways of providing information and interaction with the digital world
wherever we look. We spend much of our time at desks and tables, working with
the many kinds of objects placed on them: computers, phones, documents and much
more. Why not think further and embed new technology right there?
The definition of surface computing, taken from the definition of the
Microsoft Surface¹, describes an interactive surface as a device with which the
user interacts directly, without a mouse or keyboard. Users are presented with
a specialized graphical user interface in which traditional elements can be
replaced by intuitive, everyday objects. This means handling information on a
new scale. We already know some of the possibilities multi-touch gives us. For
some time now, people have used pens or similar objects to work directly with
their preferred computing system (e.g. PDAs, tablets, smartphones). More
complex hand gestures such as pinching or dragging are possible as well.
Surface computing enables us to use these more intuitive ways of interaction on
a display the size of a table. Complete desks or tables can be replaced by
interactive surfaces, augmenting plain furniture with new functions that the
user can use but does not have to. It also means that no additional space needs
to be occupied.
A few examples: An ordinary coffee table could be replaced by an interactive
surface. A family could sit around this table and spend the evening playing a
tabletop game with a projected board and real figures, with the option of
randomly generated scenarios. Another likely use is as a time-filler for
commercial breaks. While watching a movie on television, the time taken up by
advertisements could be put to better use by going through photos or reading
articles; people might not even switch channels, so the advertisement might
still get its message across. In office or workspace environments, interactive
surfaces could replace desktop PCs in certain fields, such as some aspects of
graphical design. They could also upgrade these systems, enabling the user to
arrange and handle many digital objects on the surface of the desk and, where
applicable, to give these objects physical form by representing them with real,
tangible ones, while the desktop system is used for detailed editing or
processing.
1
Programmerworld.net - What is Microsoft Surface?
Setups of surfaces Depending on the usage site, user group, technology and
other specifications, an interactive surface can be set up in many ways. A
large number of variations can serve as the basic environment for the final
system. Independent of the specific technology, there can be one single system
in which every component is embedded within the outer shell (e.g. a table), or
multiple external components that have to be installed near the surface. The
main reason for this distinction is the way the digital information is
displayed: an external projector can project the image onto the surface from
above, or the system can use internal (rear) projection or an attached display.
For surfaces working with cameras the same choice has to be made between
mounting cameras overhead or below the surface. In large spaces it is feasible
to install large tables or blocks, whereas in more private rooms smaller or
even wall-mounted versions would be preferred, since the hardware then takes up
less of the room.
3
The sections above give a first idea of how important the handling of real
objects on interactive surfaces can be, which in turn makes recognizing these
objects on the surface important. In a real-world scenario, where it would be
preferable to use any object available, the obstacles that have to be overcome
to enable this kind of functionality are vast. For one thing, environmental
settings and conditions differ widely and change often. Every time we want to
recognize an object, the result can be affected by changes in lighting, viewing
angle, distance, position, alignment and orientation on the surface. An object
looks different when the lights in the room are on than when the room is
darkened, but it still feels the same and has the same electromagnetic
qualities. Shadows have the same effect on appearance, whereas the position of
hands or other objects around the one to be recognized can have a more
significant impact on its electromagnetic qualities, as can electronic devices
close to the surface. Furthermore, many objects differ only slightly in color,
size, form or texture, for example:
– a dark blue and a black toy car of the same type
– dice with different styles of printed numbers,
or one with rounded edges, one without
– vegetables of different sizes
– books with the same ISBN but different layouts
– books with different ISBNs that look nearly the same
– diverse documents (similar paper, variable content)
– full glass, empty glass
– the pawns of a board game
– equal products, different brands
– new versus old (e.g. with a leather ball)
– equal objects owned by different users
– ...
Either all of these objects and variations are taken into account, or
developers concentrate on a specific kind and number of objects that can be
used for input. The first case requires developing and implementing recognition
techniques for the majority of possible scenarios, whereas in the second case
it may be better to create objects specifically for the purpose of the system.
4
Methods for Object Recognition in Surface Computing
The two major categories of object recognition methods in surface computing
are visual and electromagnetic. They approach the problem in completely
different ways, and each can be divided further into subcategories.
Electromagnetic solutions for identifying objects generally use the same
technique, which is best described by example projects. Nevertheless, one
technology stands out because it is common and growing in popularity, namely
Radio Frequency Identification (RFID), although it is limited in its level of
detail. The visual approaches are divided into marker-based and marker-less
versions, as this is the clearest distinction.
4.1
Radio Frequency Identification
Neither RFID technology nor the idea behind it is new in terms of computer
science, but it is only now emerging, and very fast. It is even said to be
capable of replacing the well-known and widespread barcode system. How is this
possible, and how does it work? The answers to both questions are quite simple.
The technology itself is not too complex, neither in its implementation nor in
computational terms. Basically, an antenna is embedded in or attached to the
product or object and combined with a circuit that allows automatic, reliable
identification at close range. This gives the same effect as barcodes, but
without the need for a line of sight or any special handling. These RFID
antennas, which we will call RFID tags, can also be embedded into the product
rather than attached from outside, which is another advantage over the barcode
system. Each object can be individually identified using relatively simple
technology without altering its appearance. Of course, this means the tags have
to be included during production, which should not be a real obstacle for
today’s industries.
In more detail, every RFID tag works in a similar way. As mentioned, the
antenna receives a signal from the device (the reader) that asks for its
identification and sends back a modulated signal, which the reader interprets
as a unique identity. Tags can be active or passive. The difference is that a
passive tag uses the energy of the reader's signal to compute its response
instead of having its own power source. This survey focuses on passive tags
because they are much more applicable to surface computing. Active tags might
be preferred for some larger electronic objects or devices, but this option
does not exist for the majority of objects used with an interactive surface,
and the maintenance that comes with active tags is unwanted as well. Also, if
RFID really replaces the barcode system, the barcodes will very likely be
replaced by passive tags. Readers of such tags usually have enough energy
resources to cope with this “power draining”, or at least the problem is less
worrisome for them than for the tagged objects.
The following diagram (Fig. 1) of a passive RFID tag shows the basics of how
this technology works. As mentioned above, the antenna receives the signal,
which also delivers the energy to power the integrated circuit. This circuit
contains a passive power converter and a logic control unit, which recovers the
bit stream from the AC voltage received by the antenna. The contained
instructions are interpreted and a corresponding action is executed: the tag
passes its own ID back to the antenna, from where it is sent to the reader. The
emerging possibility of printing RFID tags (e.g. by inkjet on stainless steel
foil) will enable smaller, very thin tags that are nearly invisible when added
later to any object.
Fig. 1: Diagram of a passive RFID tag sketching the general functionality. The
incoming signal is received by the antenna, and the energy this signal delivers
powers the internal processing and the sending of the resulting message.
[YRTT08]
With RFID, identification of prepared objects is reliable, but the big drawback
of this technology is that it provides little detail about the position, and
especially the orientation, of the identified object. The object has to be
close by, but for specific interaction on a surface this is not precise enough.
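The exchange between reader and passive tag, and the reason why it yields identity but no pose, can be illustrated with a toy model. The class `PassiveTag`, the function `read_tags` and the activation threshold are hypothetical and only mimic the behaviour described above; they do not model any real RFID protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PassiveTag:
    tag_id: str
    min_field_strength: float  # hypothetical activation threshold, arbitrary units

    def respond(self, field_strength: float) -> Optional[str]:
        """Answer with the tag ID only if the reader field delivers enough energy."""
        if field_strength >= self.min_field_strength:
            return self.tag_id
        return None


def read_tags(tags: List[PassiveTag], field_strength: float) -> List[str]:
    """Return the IDs of all tags that answered the reader."""
    answers = (tag.respond(field_strength) for tag in tags)
    return [tag_id for tag_id in answers if tag_id is not None]


tags = [PassiveTag("cup", 1.0), PassiveTag("book", 3.0)]
print(read_tags(tags, field_strength=2.0))
```

The result is a plain list of identifiers: the reader learns which objects are in range, but nothing about where on the surface they are or how they are oriented.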
4.2
Other electromagnetic approaches
In fact, one could use RFID for identification in general and additional
methods to extract position and orientation information from the objects. This
is basically how electromagnetic interaction with surfaces is realized. The
technology is well known from graphic tablets of many brands, in which objects
answer reader-sent signals passively or actively and include a unique
identification number. The position and orientation of objects are determined
using a varying number of coils inside the object or attached to its exterior.
Digital pens and styluses work this way: at least one coil inside receives
signals from the wires in the surface (i.e. the graphic tablet) and answers
them. These objects can again be active or passive; passive objects are
preferred here too, because they are simpler to handle and need less
maintenance and preparation. The more coils an object contains, the more
detailed the information we get (such as orientation), at least up to a certain
point.
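Why a second coil adds orientation information can be shown with a few lines of geometry. The sketch assumes, hypothetically, that the surface reports a two-dimensional position for each coil; a single coil then gives only a position, while two coils give a midpoint and a heading.

```python
import math


def pose_from_two_coils(coil_a, coil_b):
    """Estimate object position (midpoint) and heading in degrees from two coils."""
    (ax, ay), (bx, by) = coil_a, coil_b
    center = ((ax + bx) / 2.0, (ay + by) / 2.0)
    # Heading: direction of the vector from coil A to coil B, in [0, 360).
    heading = math.degrees(math.atan2(by - ay, bx - ax)) % 360.0
    return center, heading


center, heading = pose_from_two_coils((10.0, 20.0), (10.0, 24.0))
print(center, heading)
```

The coordinate units and the convention that the heading points from coil A to coil B are illustrative choices, not taken from any of the cited systems.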
Example: ToolStone One example is the ToolStone project [RS00], in which users
are limited to a few specially prepared objects, but responsiveness and
handling of these feel good. In future work the authors plan to use small
attachable tags instead of built-in coils and circuits to get around these
limits, and they want to increase the number of concurrently usable objects. So
far they use Wacom graphic tablets to create their surface, which limits the
number of objects usable at the same time on the area occupied by one tablet to
two ToolStones. There are three kinds of ToolStones, or at least three possible
implementations. Basically, a ToolStone is a small cuboid roughly the size of a
cigarette pack (variable). The motivation is to let the user place the
ToolStone on any of its six sides and rotate it in several directions, with
each pose selecting a different function. Icons representing these functions
are printed on the sides of the ToolStone. To realize this, an object has
either three or four coils inside, each attached to a different side or corner.
Figs. 2 and 3 depict a ToolStone’s inner layout.
Fig. 2: ToolStone: Possible implementation with three coils inside the object.
The coil closest to the surface and the corresponding angle define the
orientation of the object. [RS00]
Fig. 3: ToolStone: Four-coil design. Two coils are always closest to the
surface, defining the object's orientation and contacting side. [RS00]
In the three-coil design, each coil is shared by two of the object's sides. The
coil closest to the surface identifies the side on which the object is lying or
standing, depending on the angle between the surface and the coil. In the
four-coil design, for each of the six possible placements of the cuboid, two
coils are closest to the surface, defining the object's position and
orientation.
Example: Sensetable Another project that uses electromagnetic object
recognition on an interactive surface is the Sensetable [PIHP01]. The basics
are the same as with the ToolStones: Wacom graphic tablets serve as surfaces,
and objects are fitted with coils taken from Wacom styluses and pens. Here the
authors use the more common name for such objects, “pucks”. Such a puck, shown
in Fig. 4, is about the same size as an ordinary mouse and can have dials
attached to it. Pucks can be placed on one side only, but they can be turned
through 360 degrees and modified by the additional, optional gear. With this
implementation the authors also tried to overcome the limitation of having only
two objects on one Wacom tablet at the same time by using algorithms and
circuits that switch coils on and off depending on their usage, while still
guaranteeing a certain level of responsiveness.
Fig. 4: Sensetable pucks using the Wacom technology for interaction with the
surface, enhanced with the possibility to attach dials or modifiers for
additional functions. [PIHP01]
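One simple way to share a tablet that can sense only two coils at a time is to switch the pucks on in turns. The round-robin schedule below is purely an illustrative assumption; the source does not describe the actual Sensetable switching algorithm, and the function name and parameters are hypothetical.

```python
from itertools import cycle, islice


def polling_schedule(puck_ids, max_active=2, n_slots=4):
    """Return, for each time slot, the pucks whose coils are switched on."""
    if not puck_ids:
        return [[] for _ in range(n_slots)]
    # Never activate more pucks per slot than actually exist.
    per_slot = min(max_active, len(puck_ids))
    ring = cycle(puck_ids)
    return [list(islice(ring, per_slot)) for _ in range(n_slots)]


for slot, active in enumerate(polling_schedule(["a", "b", "c"], n_slots=3)):
    print(slot, active)
```

A usage-dependent scheme, as mentioned in the text, would presumably give recently moved pucks more slots than idle ones; plain round-robin is just the simplest baseline.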
4.3
Visual recognition
The more common way of object recognition works with something much easier to
imagine and comprehend. Not only in surface computing but in many fields,
objects or certain details of an image are recognized by methods of image
processing. Usually this process requires an image of a certain quality and
resolution, depending on the desired level of detail and precision. After the
image(s) or video feed have been obtained, processing consists of two phases.
The first is the feature recognition phase, in which as much relevant
information as possible is extracted from the source, separating the data into
information about the object(s) and irrelevant information about the background
and the like, e.g. hands. The second phase is object determination. The
features collected during the feature recognition phase are compared to a set
of reference data, resulting in a number of probabilities. The object with the
highest probability score should be the object on the surface. The necessary
reference information has to be provided beforehand (in most cases), for
example in the form of a database.
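The object determination phase can be sketched as a comparison of an extracted feature vector with reference vectors. The example below is a minimal sketch under assumptions: features are plain numeric tuples, similarity is derived from Euclidean distance, and the database contents and the name `match_object` are hypothetical.

```python
import math


def match_object(features, database):
    """Compare a feature vector to reference vectors; return (best_label, scores)."""
    # Similarity falls off with Euclidean distance to the reference vector.
    raw = {label: math.exp(-math.dist(features, ref))
           for label, ref in database.items()}
    total = sum(raw.values())
    # Normalize so the scores can be read as probabilities that sum to 1.
    scores = {label: value / total for label, value in raw.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference data: (roundness, relative size) per known object.
database = {"cup": (0.9, 0.1), "book": (0.2, 0.8)}
best, scores = match_object((0.8, 0.2), database)
print(best, scores)
```

Real systems use far richer features and matching schemes; the point here is only the structure of the second phase: score every known object and pick the one with the highest score.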
Depending on the method, single objects as well as groups of objects can be
recognized. Due to a lack of precision, visual approaches are not foolproof and
are vulnerable to many variables and conditions, much more so than the
electromagnetic ones. The restrictions on handling are especially significant
because objects can easily be occluded, either by the user or by other objects.
Depending on the technical setup around the surface, the user’s actions are
limited further. Cameras are essential for visual object recognition. They can
be mounted either above the surface, looking down, or inside the surface,
looking up. It is therefore important not to hide the objects from the camera,
either by standing in the way or by putting something like a tablecloth or
other objects between object and camera. Multiple cameras can partly circumvent
these limitations, but in general such setups are still vulnerable in the same
way.
Visual recognition - Marker based A relatively simple way to improve the
success of visual object recognition is to use visible tags. These are special
shapes and graphics attached to the objects in the form of a small sticker or
marker, to be recognized by the surface. The visible markings on these tags are
the only relevant information for the camera and recognition system. The main
features such as size and basic shape (rectangular, round, ...) are known to
the system in advance, so it can look specifically for occurrences of these
features in the images. In contrast to the tags used in RFID or other
electromagnetic methods, these tags have to be visible most of the time and
cannot be embedded or built in. This is of course a big disadvantage, as it
further limits the user in handling these kinds of objects. A further
limitation lies in the size of the tags. They need to be big enough to carry
information that can be reliably recognized with the available resources, such
as image quality, resolution or computing power, but also small enough to fit
the smaller objects in the scenario.
Nevertheless, this technology has several advantages. First, special tags
reduce the number of features that have to be considered to the required
minimum, and these features vary only in certain details. For most scenarios,
black-and-white tags with certain shapes and symbols are sufficient, enabling
simpler and therefore less expensive hardware, lower response times and higher
success rates. Even more information can be provided by carefully choosing the
right kind of tag design. A special tag shape or a special marker on the tag
can be used as a reference point for extracting the direction of the objects,
in which case the tags always have to be applied pointing the correct way. When
tags of a specified size are used, the distance of objects from the surface, or
rather from the camera, can be obtained too.
Fig. 5: Exemplary visual tag taken from the iCon project. Easily applicable
with a two-color design and strong features to clearly identify the object.
[CLC+10]
An example tag, taken from the iCon project [CLC+10], is shown in Fig. 5. The
project uses a two-color design with unique drawings and a fixed size, from
which orientation and distance can be extracted easily. Probably the most
important advantage over a marker-less variant is that identifying and
following objects takes significantly less time for the same or even less
effort.
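How a tag of known size yields distance and orientation can be sketched with the pinhole camera model. This is an illustration under assumptions, not the iCon implementation: the camera's focal length in pixels is assumed to be known from calibration, the tag is assumed to face the camera, and both function names are hypothetical.

```python
import math


def tag_distance(real_width_mm, width_px, focal_length_px):
    """Pinhole-camera estimate of the camera-to-tag distance in millimetres."""
    # A tag of fixed physical size appears smaller the further away it is.
    return focal_length_px * real_width_mm / width_px


def tag_orientation(tag_center, orientation_mark):
    """Angle in degrees of the vector from the tag centre to its orientation mark."""
    dx = orientation_mark[0] - tag_center[0]
    dy = orientation_mark[1] - tag_center[1]
    # Note: in image coordinates the y axis usually points downwards.
    return math.degrees(math.atan2(dy, dx)) % 360.0


print(tag_distance(real_width_mm=40.0, width_px=80.0, focal_length_px=800.0))
print(tag_orientation((100.0, 100.0), (100.0, 120.0)))
```

Both functions need only two or three measured values per tag, which is one reason why marker-based tracking is cheap compared with analysing an unprepared object.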
Visual recognition - Marker-less Still, the preferred kind of object to use
with an interactive surface is the unaltered one, without tags, preparations or
limitations of any kind, which can only be realized with marker-less,
image-based recognition methods. Objects can be used “as they are”, in certain
cases even without having to train the system beforehand. This is especially
useful where tags are difficult to apply or the effort is too high. All that is
given is the input feed of the camera(s), either as images or as a video
stream. From these images the features describing the interactive objects on
the surface have to be recognized and (hopefully) successfully interpreted.
Without additional technology such as RFID, the system therefore needs much
more computing power than, for example, the marker-based approach, and higher
image quality and resolution help considerably. The algorithms and technologies
are also much more complex than with the other methods.
As mentioned above, objects can vary in many ways, however slightly, and
without altering the objects to give them a point of focus and detail, the
system and its developers have to be very inventive. More than with the other
methods, a controlled environment delivers the best results, and failure rates
in common scenarios are much higher. Without specific points of interest on an
object, the system has to take every possible factor into account. The distance
of an object, which changes its size in the image, is a problem, for example,
if it is not known how big the object usually is or whether it is touching the
surface. This could be mitigated by using additional infrared sensors and data;
in general, there are always a number of possible upgrades to enhance a system.
Example of a marker-less approach An example project representing marker-less
object recognition in surface computing comes from NTT Comware Corporation
[OAN+10], published at the end of July 2010. Starting from the vision of an
interactive surface with a built-in camera, the authors point out the most
significant problem of marker-less object recognition in this setting. The most
common surface material these days is a kind of frosted glass, which reduces
the image quality and detail that can be obtained from below. This makes it
difficult to extract features and determine objects in these scenarios, and
even more difficult to follow the objects.
They present their idea of a marker-less system using “multi channel
silhouettes and quantized polar coordinates”. Basically, they try to avoid the
usual drawbacks of marker-less implementations by concentrating on what they
consider relevant information and using special algorithms and technologies for
it. It seems that they need the typical amount of previously known feature
information but less computing power than other implementations, due to the
“not so high computational complexity” of their determination phase. In their
limited study of a few flat-bottomed objects, the average success rate of 95%
looks quite promising (98% on average for a sliced blue agate). Combined with
the average processing time (response time) of 15.4 ms, the approach is a
plausible candidate for surface computing. So far there have been no further
publications by these researchers on this project, and because of the limited
information (worst-case results and results for regular objects without a flat
side are not known) a real evaluation is difficult.
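To give an impression of what a silhouette description in quantized polar coordinates could look like, the following sketch bins silhouette points by angle and radius around their centroid. It is explicitly not the algorithm of [OAN+10], whose details are not given here; the bin counts, the point-list input and the name `polar_histogram` are all assumptions.

```python
import math


def polar_histogram(points, n_angle_bins=8, n_radius_bins=3):
    """Histogram of silhouette points in quantized polar coordinates."""
    # Use the centroid of the silhouette as the origin of the polar system.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    polar = [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
             for x, y in points]
    r_max = max(r for r, _ in polar) or 1.0  # avoid division by zero
    hist = [[0] * n_angle_bins for _ in range(n_radius_bins)]
    for r, theta in polar:
        # Quantize radius (relative to the largest radius) and angle into bins.
        r_bin = min(int(r / r_max * n_radius_bins), n_radius_bins - 1)
        fraction = (theta % (2 * math.pi)) / (2 * math.pi)
        a_bin = int(fraction * n_angle_bins) % n_angle_bins
        hist[r_bin][a_bin] += 1
    return hist


silhouette = [(2.0, 1.0), (-2.0, -1.0), (0.5, 0.2), (-0.5, -0.2)]
for ring in polar_histogram(silhouette):
    print(ring)
```

Such a coarse histogram is cheap to compute and compare, which fits the low computational complexity the authors report, but any closer resemblance to their method should not be assumed.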
5
Method comparison
With this information, the different technologies and approaches can be
compared. First, adequate criteria have to be found. Considering the main
aspects of ubiquitous computing, surface computing and interfaces in general,
the following characteristics come to mind:
5.1
Criteria
– Performance - This could also be called responsiveness. It considers the time
the system needs to react to input. To be comparable, a typical amount of
effort and computing power is taken as a basis. For example, a marker-less
system usually needs much more powerful hardware to achieve the same level of
responsiveness as an equivalent marker-based system. This criterion is
obviously of great importance, as users do not want to wait long for the result
of an action. The success rate achieved with the same effort is also important
here, because high responsiveness is worthless if the outcome is not what was
expected.
– Error rate - Derived from the thoughts above, this criterion measures how
correct the output of the system under consideration is. Again the question is
how precise the systems are, assuming a comparable amount of effort. As
mentioned, this already influences the performance score, but it should be
considered separately too.
– Complexity - Given comparable levels of performance and error (or success)
rates, the next question is how complex and difficult it is to implement such a
system. The technology and many details may already be well known, so that
realization is quite simple: algorithms and software already exist and are easy
to tune and apply. Or everything is new and/or requires a lot of work to set up
a working system that achieves the same error rates and performance. Or theory
and realization are difficult to keep track of and maintain, so that small
changes and adjustments, resulting in small differences in handling and
function, require huge amounts of work from the researchers (or even users).
Another part of complexity is the general effort that has to be spent on
preparing the system, for example the development of special objects.
– Object limitation - This criterion refers only to the objects handled in
combination with the interactive surface. Can only a small number of objects be
used? How finely can they be distinguished? Can any object be used, either
after preparation or even without it?
– Intuitivity/Handling - This refers to how the interaction with the surface
using the tangible objects feels to the user. The interaction possibilities and
the ways functions are invoked are not considered, as they vary for every
project, relatively independent of the object recognition method used. The
important part is the general handling of objects with each method. Does the
user have to take special care, watch his behaviour, avoid stepping into or
occupying certain spots or positions, or accept any other limits in handling
the objects and interacting with the surface?
– Position (determination) - This point rates the ability of the approach
to determine the position of a certain object. The difficulty here is again the
comparability. With most approaches it is manageable to produce very exact
position information by using appropriate computation power and hardware.
But this is not always suitable for the current scenario.
– Orientation (determination) - The definition of this criteria consists of
nearly the same specifications as the previous point. How well is it possible
to determine the orientation of a certain object. Also considering the amount
of effort necessary for a minimum level of detail.
– Energy consume - At this point, most approaches vary much or information offers not enough detail to give a precise opinion. The amount of
energy necessary for different projects varies largely. Equally varying is the
importance of energy consume in different implementations. In general the
devices, where it is most important to take care of this, are mobile devices.
And interactive surfaces are rarely mobile.
– Cost - The cost of a system is as difficult and controversial to assess as
its energy use. Here, too, little comparable information is available, and
cost does not seem very significant in current development. Nevertheless,
one aspect is pointed out in the following table and explanations.
5.2 Comparison
Table 1 presents the results of applying the criteria. The ratings and the
corresponding conclusions are based on information collected from several
sources: for each method, mainly the papers cited in the corresponding
footnotes, and in general the following papers, which offer opinions on some
or all of the methods based on current or past research.
– Generic framework for transforming everyday objects into interactive
surfaces [MAKP+09]
– Emerging frameworks for tangible user interfaces [UI00]
– Augmented surfaces: a spatially continuous work space for hybrid computing
environments [RS99]
This section first discusses each method separately, explaining its rating
for the different criteria, and then compares the technologies.
Table 1: Comparison of the different methods providing a simple rating
to show advantages, disadvantages and important characteristics

                                                        Visual
                     RFID (1)  Electromagnetic (2)  Marker based (3)  Marker-less (4)
Performance             ++             +                   -                --
Error rate              ++             +                   -                --
Complexity              ++             -                   -                --
Obj. limitation         -              --                  -                ++
Handling                +              +                   -                +
Position                --             +                   ++               +
Orientation             --             -                   ++               +
Energy consumption      ++             +                   +                -
Cost                    ++             +/-                 +/-              +/-

++ best, + better, - worse, -- worst, +/- nondescript
Per method
Radio Frequency Identification. As described earlier, RFID technology is well
known and widespread, which makes it easy to understand and lowers its
complexity. It can be quite cheap, with an RFID tag costing only a few cents,
and although it can be used for object identification only, it is reliable
and fast at this task. This gives RFID a high rating in performance,
complexity, cost and error rate. The passive tags and low-energy reader
technology also earn it the best rating in energy consumption. It has to be
kept in mind, however, that RFID has trouble determining the orientation and
position of objects. This seems to be possible within the same
specifications, but only at a greater expense, which is too great for surface
computing and the small, relatively simple objects used for interaction. The
remaining disadvantage is the need to prepare the objects used. Because
passive, built-in tags can be used that are neither visible nor disturbing in
any way, this is bearable, and the handling of these objects is not limited.
In the future, products may even come with RFID tags embedded at the time of
manufacturing.
Other electromagnetic approaches. Projects like Sensetable [PIHP01] or
ToolStone [RS00] show that real-time interaction is achievable without too
much effort. This is also evident from the origin of most such projects, the
graphics tablet segment, where users all over the world have long used such
devices for real-time interaction with real objects such as pens, styluses or
pucks. This technology is more complex, as position and orientation
information can be extracted from the objects, although so far only from very
few, very special and very well prepared objects. For the future, it is
planned to produce simple tags that can be attached to any object, which
would improve handling quality. Depending on the realization, handling is
currently limited slightly, because objects are only usable when placed in
specific ways or on specific sides. Energy use is moderate for the complexity
of the system. Graphics tablets are known to work for hours on built-in
battery packs, so this can still be counted as a plus, as objects can also be
used passively. With these objects and this technology, position
determination is precise and relatively simple, as known from graphics
tablets, but orientation information is more difficult to obtain because it
requires more complex hardware and algorithms, which can make the success
rate of the system vulnerable. The cost cannot be evaluated because of the
great variety of projects and implementations, but the basic hardware is not
yet really widespread, and object preparation is the most expensive of all
technologies mentioned here.
Footnotes to Table 1: (1) [YRTT08], (2) [PIHP01, RS00], (3) [CLC+10, BBR10],
(4) [OW08, OAN+10]

Visual - Marker based. The visual approaches in general need much more
computation because of the image processing involved. Reaching a comparable
performance level is therefore more difficult and takes more effort. The
marker based approaches are the less extreme variant, limiting the necessary
hardware resources to a minimum by concentrating on tag based features.
Complexity is high but not the highest: image processing technology has been
developed for many years, and the constraints imposed by markers lower
complexity further. Object limitation is moderate: it is sufficient to attach
external tags to the objects, but for this the objects have to meet certain
requirements such as size or texture (surface finish). Handling objects in a
marker based scenario is rather difficult and limiting, as users have to take
care not to hide the tags from the camera(s), which produces a negative
feeling of constant vigilance. With marker based systems, extracting position
and orientation is rather simple. If the approach can identify an object by
the features of its tag, the position can be derived from the tag as well.
The same holds for orientation, which only requires sufficiently specific
features and advance knowledge about the tags. This is explained further in
Section 4.3. Energy consumption and cost are again difficult to evaluate for
this technology, as they vary between projects. As with the electromagnetic
approaches, it can be said that the complexity is manageable and that
processing time is not the highest. Preparing the objects is relatively
simple and cheap, but hardware requirements are higher.
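To illustrate why position and orientation come almost for free once a tag has been identified, the following minimal sketch derives both from the four corner points of a square marker. It is not taken from any of the cited systems. The function name `marker_pose` and the corner ordering (top-left, top-right, bottom-right, bottom-left with respect to the tag design) are assumptions made for this example, and the marker detection itself is assumed to have happened already.

```python
import math


def marker_pose(corners):
    """Estimate position and orientation of a square tag.

    corners: four (x, y) image points in the assumed order
    top-left, top-right, bottom-right, bottom-left of the tag design.
    Returns ((cx, cy), angle_in_degrees).
    """
    # Position: centroid of the four corner points.
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0

    # Orientation: direction of the tag's top edge (top-left -> top-right),
    # normalised to the range [0, 360).
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
    return (cx, cy), angle


if __name__ == "__main__":
    upright = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(marker_pose(upright))
```

The orientation is only well defined because the corner order is known in advance, which corresponds to the "information known in advance about the tags" mentioned above.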
Visual - Marker-less. Because of more difficult image processing algorithms
and high hardware requirements, marker-less visual systems have a very high
complexity and, for a comparable effort, a low performance. This probably
also leads to high cost and energy consumption, especially given the large
amount of processing time needed, but these two criteria are hard to
evaluate. Energy consumption can be a problem depending on system and
hardware, even though the objects are always passive. Passive objects in turn
lower the cost, as they need no preparation; however, their features and the
associated information have to be collected, which is also effort that has to
be spent and paid for. Still, cost and energy consumption are generally not
the main aspects of marker-less approaches, since the big advantage of such
systems is that objects can be used as they are. This means that there is no
limit on the usable objects and that handling causes fewer problems: users
only have to keep in mind not to occlude the cameras, instead of always
having to point the markers the right way. As with the marker based methods,
position and orientation of objects can be obtained easily once the objects
can be identified. Position is a matter of matching the right object with the
right position and not losing this match during movement, whereas orientation
relies more on the objects themselves having specific features that provide a
distinguishable point of reference.
Overall comparison. Looking at the combined results, RFID is attractive if
high performance, a low error rate, low energy consumption and low cost are
important. Keeping in mind that RFID can be used for identification only, it
should probably be the first choice for projects in which identification can
reasonably be separated from position and orientation determination and
tracking. For some criteria it is plainly the most advanced technology, but
it has, as mentioned, its drawback.
When the focus is on position and orientation detection, the visual methods
look more fitting than the electromagnetic ones. This is due to the difficult
trade-off in electromagnetic approaches between the level of detail of the
resulting information and the degree of complexity. Electromagnetic
approaches seem more suitable for specialised projects where their advantages
are more distinct and the use of individual objects is not wanted or its
absence is acceptable. The visual methods are difficult to implement, but
their potential gain is highest: everything can be handled, and the
limitations can be reduced to the desired degree.
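The reasoning above, choosing a method depending on which criteria matter for a project, can be sketched as a small weighted comparison over the ratings of Table 1. The numeric mapping of the symbols (++ = 2, + = 1, +/- = 0, - = -1, -- = -2), the function `rank_methods` and the example weights are assumptions introduced for illustration only; the survey itself does not define a numeric scale.

```python
# Assumed numeric scale for the rating symbols of Table 1.
SCORE = {"++": 2, "+": 1, "+/-": 0, "-": -1, "--": -2}

METHODS = ["RFID", "Electromagnetic", "Marker based", "Marker-less"]

# Ratings copied from Table 1, one list entry per method in METHODS order.
TABLE_1 = {
    "Performance":        ["++", "+", "-", "--"],
    "Error rate":         ["++", "+", "-", "--"],
    "Complexity":         ["++", "-", "-", "--"],
    "Obj. limitation":    ["-", "--", "-", "++"],
    "Handling":           ["+", "+", "-", "+"],
    "Position":           ["--", "+", "++", "+"],
    "Orientation":        ["--", "-", "++", "+"],
    "Energy consumption": ["++", "+", "+", "-"],
    "Cost":               ["++", "+/-", "+/-", "+/-"],
}


def rank_methods(weights):
    """Rank methods by the weighted sum of their ratings.

    weights: dict mapping a criterion name to its (hypothetical) weight.
    Criteria that are not listed are ignored.
    Returns a list of (method, total) pairs, best first.
    """
    totals = dict.fromkeys(METHODS, 0.0)
    for criterion, weight in weights.items():
        for method, symbol in zip(METHODS, TABLE_1[criterion]):
            totals[method] += weight * SCORE[symbol]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A project that mainly needs position and orientation tracking.
    print(rank_methods({"Position": 1, "Orientation": 1}))
```

With equal weights on position and orientation only, the marker based visual method ranks first; with weights on performance, error rate, energy consumption and cost, RFID does. Both outcomes match the qualitative conclusions drawn in the text.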
6 Discussion
Many more projects are to be expected in the future, especially more
specialised ones that are ready for use in real work situations: surfaces
designed for one purpose, optimized and really practicable. In addition to
the established platforms such as Microsoft Surface and Surface 2, there will
be at least a few more commercial products available to ordinary users at a
fair price, products that can be used in many situations, can take on many
functions and are intuitive and easy to handle. The visual approaches
currently seem more adequate for this scenario, as so far only they make the
use of individual objects manageable. Tag based versions are likely to grow
in number, and the number of marker-less versions is also expected to
increase, since working with objects in their original, unaltered state is
clearly preferred. The most promising option therefore seems to be a
combination of methods that uses the advantages of one method to eliminate
the disadvantages of others. The exemplary project below shows such an
implementation, which uses more than a single technology.
What could also become reality in final products is the ability to enable and
disable sensors as needed. Depending on the scenario and the function of the
surface, various sensors could be switched off to save energy and/or improve
performance. Upgradable devices are also imaginable, to which new sensors can
be added as they reach the market or offer functions the user desires. For
all of this to be widely successful, an intuitive and simple way of adding
new objects would be essential, without too many installation steps. Another
possibility is the introduction of applications ('apps'), which are now a
growing market for smartphones, tablet PCs and even notebooks and desktop
systems (e.g. the Mac App Store).
Exemplary Project. Very promising research and a corresponding implementation
were published in 2008 by Alex Olwal and Andrew D. Wilson under the name
SurfaceFusion [OW08]. Their approach avoids the need for visual tags as well
as for previously known information and huge databases. RFID is used for the
identification part, relying on the high performance and reliability of this
technology. In addition, image processing is used exclusively for detecting
movement.
Fig. 6: Sample camera output of the SurfaceFusion implementation.
Background and noise are removed, the image binarized and the objects
overlaid with the corresponding RFID Information. [OW08]
Fig. 7: SurfaceFusion. Example with tangible objects like ashtray, ship
or photo augmented with digital information displayed on the surface.
[OW08]
The basic procedure is as follows. An object is recognized by the RFID reader
as soon as it is close enough to the surface. The camera below the surface
then expects a new object (i.e. new patterns) on top of the display, and the
computing unit matches the RFID information with the recognized shapes.
Independent of the kind of object and its dimensions, it is recognized and
can be tracked by combining identification with the detection of changes on
the surface of the device. In this way there is no limitation on the usable
objects, and apart from attaching RFID tags, no other preparation is
necessary. Once placed on the surface, these objects can be associated with
any kind of digital information or function by the user, simply by drag and
drop.
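The procedure described above can be sketched as a small event-driven association between RFID detections and newly appearing shapes. This is an illustrative sketch, not the SurfaceFusion implementation: the class `FusionTracker`, its method names and the first-in-first-out matching of pending tags are assumptions made for this example.

```python
class FusionTracker:
    """Associates RFID tag ids with shapes newly seen by the camera."""

    def __init__(self):
        self.pending_tags = []  # tags reported by the reader, not yet matched
        self.objects = {}       # shape id -> tag id

    def on_tag_detected(self, tag_id):
        """Called when the RFID reader reports a new tag near the surface."""
        self.pending_tags.append(tag_id)

    def on_new_shape(self, shape_id):
        """Called when the camera detects a new shape on the surface.

        Returns the tag id assigned to the shape, or None if no tag is
        waiting (the shape then stays unidentified).
        """
        if not self.pending_tags:
            return None
        tag_id = self.pending_tags.pop(0)  # assumed: oldest pending tag wins
        self.objects[shape_id] = tag_id
        return tag_id


if __name__ == "__main__":
    tracker = FusionTracker()
    tracker.on_tag_detected("ship")
    print(tracker.on_new_shape(1))  # the first new shape receives the tag
    print(tracker.on_new_shape(2))  # no pending tag for this shape
```

Because the association relies on the temporal order of the two events, objects placed at exactly the same moment cannot be told apart by such a scheme.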
Fig. 6 shows a sample of the camera output and its processing. The process
uses a binarization method to process the image and to detect features of
objects placed on the table. First (from left to right) the background is
removed, then the data is binarized and cleaned of noise, and finally it is
combined with an overlay of the RFID information. As the system needs the
trigger of an object being placed on the table, two objects cannot be placed
on the table at precisely the same time. According to a survey, however, this
scenario is unlikely to occur frequently. Fig. 7 presents a possible
implementation of such a surface. Real objects, like the ship, the ashtray or
the ID card, are placed on the surface, which reacts by displaying additional
information, e.g. tagging the ashtray with "Trash" or showing pictures
previously attached to the ship as well as further details, also about the
person the ID card belongs to.
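The processing steps named above (background removal, binarization, noise reduction, overlay of RFID information) can be sketched on a small grey-value image stored as nested lists. This is a simplified illustration, not the algorithm of [OW08]: the threshold value, the noise rule (dropping isolated pixels) and all function names are assumptions chosen to keep the example short and dependency-free.

```python
def binarize(frame, background, threshold=30):
    """Subtract the background and threshold the result (1 = object pixel)."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def neighbours(y, x, h, w):
    """Yield the 4-connected neighbours of (y, x) inside an h x w image."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def remove_noise(mask):
    """Drop object pixels that have no 4-connected object neighbour."""
    h, w = len(mask), len(mask[0])
    return [[1 if mask[y][x] and any(mask[ny][nx] for ny, nx in neighbours(y, x, h, w))
             else 0 for x in range(w)] for y in range(h)]


def find_blobs(mask):
    """Return one list of (y, x) pixels per 4-connected object region."""
    h, w = len(mask), len(mask[0])
    seen, blobs = set(), []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y, x) not in seen:
                stack, blob = [(y, x)], []
                seen.add((y, x))
                while stack:  # depth-first flood fill
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny, nx in neighbours(cy, cx, h, w):
                        if mask[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs


def overlay(blobs, tag_ids):
    """Pair each blob with an RFID tag id (the pairing order is assumed)."""
    result = []
    for blob, tag in zip(blobs, tag_ids):
        cy = sum(y for y, _ in blob) / len(blob)
        cx = sum(x for _, x in blob) / len(blob)
        result.append({"tag": tag, "centroid": (cy, cx), "pixels": len(blob)})
    return result


if __name__ == "__main__":
    background = [[0] * 5 for _ in range(5)]
    frame = [row[:] for row in background]
    for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
        frame[y][x] = 200          # a 2 x 2 object
    frame[4][4] = 200              # a single noisy pixel
    mask = remove_noise(binarize(frame, background))
    print(overlay(find_blobs(mask), ["ship"]))
```

In a real system the pairing of blobs and tags would come from the placement trigger described above rather than from list order.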
7 Conclusions
This survey concentrates on object recognition in surface computing.
Different methods are compared and evaluated on the basis of a set of
criteria and corresponding ratings. The focus lies on electromagnetic and
visual approaches, which are further divided into RFID and other
electromagnetic methods, and into marker-less and marker based visual
methods. The author surveyed many papers, chose nine criteria applicable to
visual and electromagnetic approaches, and set up a system of ratings that
represents the characteristics of each method and shows its advantages and
disadvantages. To help the reader understand these ratings, the survey
briefly describes how the methods work, the importance and meaning of each
criterion, and the reasons for the ratings in each category.
The results indicate that visual approaches look most promising if a system
based on a single technology is preferred, but that, in general,
combinations of methods look best. As an outlook, the discussion sketches
possible developments inspired by past and current projects. SurfaceFusion
[OW08] is presented as one such project showing great potential.
References
BBR10. Patrick Baudisch, Torsten Becker, and Frederik Rudeck. Lumino: Tangible
blocks for tabletop computers based on glass fiber bundles. In Proceedings
of the 28th international conference on Human factors in computing
systems, CHI '10, pages 1165–1174, April 2010.
CLC+10. Kai-Yin Cheng, Rong-Hao Liang, Bing-Yu Chen, Rung-Huei Liang, and
Sy-Yen Kuo. iCon: utilizing everyday objects as additional, auxiliary and
instant tabletop controllers. In Proceedings of the 28th international
conference on Human factors in computing systems, CHI '10, pages
1155–1164. ACM, 2010.
FIB95. George W. Fitzmaurice, Hiroshi Ishii, and William Buxton. Bricks: Laying
the foundations for graspable user interfaces. In Proceedings of the
SIGCHI conference on Human factors in computing systems, CHI '95, pages
442–449, 1995.
IRP+04. H. Ishii, C. Ratti, B. Piper, Y. Wang, A. Biderman, and E. Ben-Joseph.
Bringing clay and sand into digital design - continuous tangible user
interfaces. BT Technology Journal, 22:287–299, October 2004.
MAKP+09. Elena Mugellini, Omar Abou Khaled, Stéphane Pierroz, Stefano Carrino,
and Houda Chabbi Drissi. Generic framework for transforming everyday
objects into interactive surfaces. In Proceedings of the 13th
International Conference on Human-Computer Interaction. Part III:
Ubiquitous and Intelligent Interaction, pages 473–482. Springer-Verlag,
2009.
OAN+10. Shiro Ozawa, Takao Abe, Noriyuki Naruto, Toshihiro Nakae, Makoto
Nakamura, Naoya Miyashita, Mitsunori Hirano, and Kazuhiko Tanaka.
Marker-less object recognition for surface computing. In ACM SIGGRAPH
2010 Posters, SIGGRAPH '10, pages 71:1–71:1. ACM, 2010.
OW08. Alex Olwal and Andrew D. Wilson. SurfaceFusion: Unobtrusive tracking of
everyday objects in tangible user interfaces. In Proceedings of Graphics
Interface 2008, GI '08, pages 235–242, May 2008.
PIHP01. James Patten, Hiroshi Ishii, Jim Hines, and Gian Pangaro. Sensetable: a
wireless object tracking platform for tangible user interfaces. In
Proceedings of the SIGCHI conference on Human factors in computing
systems, CHI '01, pages 253–260. ACM, 2001.
RS99. Jun Rekimoto and Masanori Saitoh. Augmented surfaces: a spatially
continuous work space for hybrid computing environments. In Proceedings
of the SIGCHI conference on Human factors in computing systems: the CHI
is the limit, CHI '99, pages 378–385. ACM, 1999.
RS00. Jun Rekimoto and Eduardo Sciammarella. ToolStone: effective use of the
physical manipulation vocabularies of input devices. In Proceedings of
the 13th annual ACM symposium on User interface software and technology,
UIST '00, pages 109–117. ACM, 2000.
UI00. Brygg Ullmer and Hiroshi Ishii. Emerging frameworks for tangible user
interfaces. IBM Systems Journal, 39:915–931, July 2000.
WSJB10. Malte Weiss, Florian Schwarz, Simon Jakubowski, and Jan Borchers.
Madgets: Actuating widgets on interactive tabletops. In Proceedings of
the 23rd annual ACM symposium on User interface software and technology,
UIST '10, pages 293–302, October 2010.
YRTT08. Li Yang, Amin Rida, Anya Traille, and Manos M. Tentzeris. RFID. In Time
Domain Methods in Electrodynamics, pages 283–301. Springer-Verlag, 2008.
Usability and Evaluation of Pervasive Games
Kevin Härtel
Advisor: Markus Scholz
Abstract. Pervasive games are digital games that try to integrate reality,
for example in the form of places or player movements, into the game
experience. These pervasive games are becoming increasingly important in the
video game industry (1), since games that permeate reality combine many
advantages of classic games. Examples are board games with a digital game
board that is projected onto a table, or games that run on smartphones and
involve every passer-by. Whether such a game is a success is ultimately
decided by each player individually, so a game should be both available and
appealing to as broad a range of players as possible. For a game to be
accepted by potential players, it must offer a high degree of intuitive
usability. Developers therefore need usability tests already in the early
development phases of a project to provide information about the usability
of the game. These usability tests and other kinds of evaluation, such as
the Pervasive GameFlow model, are methods used for user-centered
development, a successful strategy for influencing the players' experience
positively already in early development stages, by means of test plays of
the unfinished game, and thus for securing the success of the project.
Heuristic methods are chosen for the evaluation of pervasive games in order
to make use of user statements on the one hand and to compare one's own
product sensibly with competing products on the other. This paper presents
some of the usability tests and evaluation options for pervasive games,
explains their methodology and names possible applications. In addition, the
evaluation methods themselves are evaluated and compared, setting out the
advantages and disadvantages of the individual methods against each other.

(1) Microsoft enters the Guinness Book of Records with the Kinect motion
controller for the Xbox 360. According to the company, a total of 10 million
Kinect devices were sold within four months of market launch. Source:
http://www.heise.de/newsticker/meldung/Microsoft-stellt-mit-KinectGuinness-Rekord-auf-1205370.html,
accessed 10.03.2011, 'Microsoft stellt mit Kinect 'Guinness'-Rekord auf'

1 Introduction
This seminar paper deals with the evaluation of pervasive games. In
particular, it aims to introduce various evaluation methods, examine their
possible applications and compare the methods with each other. First, the
context is clarified and some game types are presented in order to create a
better basis for discussing the applications, advantages, disadvantages and
comparisons of the evaluation methods.
1.1 Definition of Pervasive Games
The term pervasive gaming, or ubiquitous gaming, refers to digital games in
which the real environment becomes part of the game.
1.2 Motivation
Pervasive games by now cover a multitude of genres and come in different
forms with different control options. C. Magerkurth [MCMN05] proposed a
classification of these games, which is briefly presented here and referred
to in the remainder of this paper. The cited paper divides pervasive games
into five subgenres:
– Smart Toys
– Affective Games
– Augmented Tabletop Games
– Augmented Reality Games
– Location-Aware Games
Smart Toys comprise all toys or objects that either contain small computers
themselves or are connected to computers and thereby influence or completely
determine the course of the game. One such smart toy is the lighthouse in
the Lighthouse game, which is explained in more detail in the section on the
Outdoor Play Observation Scheme. Better-known examples are the Nintendo Wii
or Microsoft Kinect.
Affective Games are games controlled by emotions. They are commonly
implemented by means of EEG. Further literature on this topic can be found
under the keyword brain-computer interfaces.
Augmented Tabletop Games closely resemble classic board games. The digital
component is the game board. In role-playing games, for example, a digital
game board enables randomized game scenarios and thus provides a varied game
experience.
Augmented Reality Games are considerably more complex than Augmented
Tabletop Games. They do not merely simulate a game board for the players but
display virtual three-dimensional obstacles, game objects or rooms. Visual
tags are often used for this purpose; they are attached to real objects,
captured by the computer via a camera and inserted into the game
environment.
Location-Aware Games work with the locations of the players, or with the
distances between players or between players and digital components.
Well-known representatives of these games are treasure hunts based on
geo-coordinates.
2 Methods
This chapter presents several evaluation methods that are particularly
suitable for the already very wide spectrum of pervasive games. Note that
only representative examples of the existing methods are presented; no claim
is made to cover all currently existing methods. Particular care was taken
that the methods differ in their approach and that at least one method can
be applied to each of the subgenres named above.
2.1 Pervasive GameFlow Model
The most frequently cited evaluation model is the Pervasive GameFlow model
presented by K. Jegers [Jege07]. It is a heuristic method developed from the
GameFlow model of P. Sweetser [SwWy05]. An evaluation according to the
GameFlow model requires test players and a testable version of the game. The
basic structure of the GameFlow model is described below.
The test players rate the game according to eight elements (Concentration,
Challenge, Player skills, Control, Clear goals, Feedback, Immersion, Social
Interaction), each of which contains more specific sub-criteria.
Concentration covers all questions that document whether the players are
sufficiently challenged and whether their concentration is overtaxed, for
example by too many visual stimuli. The second element, Challenge, serves to
determine how well the challenges are matched to the players. Games should
challenge players without overwhelming them. Among players, the most
attractive games are those that are easy to play but difficult to master.
The element Player skills stands for the skills of the players; the game is
expected to develop these skills and to support the players in mastering the
game. Control means that the players are given the feeling of being in
control of their own actions in the game. Whether the game under evaluation
presents its goals clearly is addressed by the questions of the element
Clear goals. The Feedback element records whether the game gives sufficient
feedback on the players' performance. This is usually done via rankings,
leveling systems or scores at the end of a completed scenario. The element
Immersion is meant to clarify how strongly players are drawn into the game,
but also how far they are mentally distracted. A good game manages to
immerse players in the game as a good book does. The last element, Social
Interaction, indicates how much social exchange takes place between the
players in the game.
The Pervasive GameFlow model contains the same eight elements, but with
sub-criteria extended for pervasive games. In Table 1 these added criteria
are marked with an asterisk (*). Sweetser's rating scheme assigns 0 to 5
points to each sub-criterion (0 - not available or not assessable, 1 - not
at all, 2 - below average, 3 - average, 4 - above average, 5 - very good).
From the points (1-5) of the sub-criteria, the average is determined for
each of the eight elements. Then the average of the eight element averages
is calculated. A criterion rated 0 does not affect the averages. In the
Pervasive GameFlow model, K. Jegers departs from the 5-point system and uses
a score from 0 to 100 (%) instead. This score ultimately indicates how well
the game was received by the rating test player. The advantage of K. Jegers'
approach is a broader range of results, from which trends can be read more
easily than from a narrow range such as the one chosen by P. Sweetser. It
allows statements such as: the game matched the taste of player A to 67%.
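The averaging scheme described above can be written down in a few lines. The following is a minimal sketch of the 0-5 scoring as summarized in this section; the function names and the decision to skip an element whose sub-criteria are all rated 0 are assumptions made for this example and are not specified in the text.

```python
from statistics import mean


def element_average(points):
    """Average the sub-criteria points of one element, ignoring 0 ratings.

    A rating of 0 means 'not available or not assessable' and therefore
    does not enter the average. Returns None if nothing was rated.
    """
    rated = [p for p in points if p != 0]
    return mean(rated) if rated else None


def gameflow_score(ratings):
    """Compute per-element averages and the overall average.

    ratings: dict mapping an element name to a list of sub-criteria points.
    Elements without any rated sub-criterion are skipped (assumption).
    """
    element_means = {}
    for element, points in ratings.items():
        avg = element_average(points)
        if avg is not None:
            element_means[element] = avg
    overall = mean(list(element_means.values()))
    return element_means, overall


if __name__ == "__main__":
    example = {
        "Concentration": [4, 0, 2],  # the 0 is ignored
        "Challenge": [5, 3],
    }
    print(gameflow_score(example))
```

The same two-stage averaging can be applied unchanged to points given on a 0-100 scale, which would yield percentage statements of the kind quoted above.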
Table 1: The Pervasive GameFlow model [Jege07]. For each element, its
description is followed by its criteria; criteria added for pervasive games
are marked with *.

Concentration
Games should require concentration and the player should be able to
concentrate on the game.
– games should provide a lot of stimuli from different sources.
– games must provide stimuli that are worth attending to.
– players shouldn't be burdened with tasks that don't feel important.
– games should have a high workload while still being appropriate for the
players' perceptual, cognitive and memory limits.
– * Pervasive games should support the player in the process of switching
concentration between in-game tasks and surrounding factors of importance.

Challenge
Games should be sufficiently challenging and match the player's skill level.
– challenges in games must match the players' skill levels.
– games should provide different levels of challenge for different players.
– * Pervasive games should stimulate and support the players in their own
creation of game scenarios and pacing.
– * Pervasive games should help the players in keeping a balance in the
creation of paths and developments in the game world, but not put too much
control or constraints on the pacing and challenge evolving.

Player skills
Games must support player skill development and mastery.
– players should be able to start playing the game without reading the
manual.
– learning the game should not be boring, but be part of the fun.
– games should include online help so players don't need to exit the game.
– players should be taught to play the game through tutorials or initial
levels that feel like playing the game.
– players should be rewarded appropriately for their effort and skill
development.
– game interfaces and mechanics should be easy to learn and use.
– * Pervasive games should be very flexible and enable the players' skills
to be developed in a pace set by the players.

Control
Players should feel a sense of control over their actions in the game.
– players should feel a sense of control over their characters or units and
their movements and interactions in the game world.
– players should feel a sense of control over the game interface and input
devices.
– players should not be able to make errors that are detrimental to the game
and should be supported in recovering from errors.
– players should feel a sense of control and impact onto the game world
(like their actions matter and they are shaping the game world).
– players should feel a sense of control over the actions that they take and
the strategies that they use and that they are free to play the game the
way that they want (not simply discovering actions and strategies planned
by the game developers).
– * Pervasive games should enable the players to easily pick up game play in
a constantly ongoing game and quickly get a picture of the current status
in the game world (in order to assess how the state of the game has evolved
since the player last visited the game world).

Clear goals
Games should provide the player with clear goals at appropriate times.
– overriding goals should be clear and presented early.
– * Pervasive games should support the players in forming and communicating
their own intermediate goals.

Feedback
Players must receive appropriate feedback at appropriate times.
– players should receive feedback on progress toward their goals.
– players should receive immediate feedback on their actions.
– players should always know their status or score.

Immersion
Players should experience deep but effortless involvement in the game.
– players should become less self-aware and less worried about everyday life
or self.
– players should experience an altered sense of time.
– players should feel emotionally involved in the game.
– players should feel viscerally involved in the game.
– * Pervasive games should support a seamless transition between different
everyday contexts, and not imply or require player actions that might
result in a violation of social norms in everyday contexts.
– * Pervasive games should enable the player to shift focus between the
virtual and physical parts of the game world without losing too much of the
feeling of immersion.

Social Interaction
Games should support and create opportunities for social interaction.
– games should support competition and cooperation between players.
– games should support social interaction between players (chat, etc.).
– games should support social communities inside and outside the game.
– * Pervasive games should support and enable possibilities for game
oriented, meaningful and purposeful social interaction within the gaming
system.
– * Pervasive games should incorporate triggers and structures (e.g. quests
and events, factions, guilds or gangs) that motivate the players to
communicate and interact socially.

2.2 Outdoor Play Observation Scheme
The Outdoor Play Observation Scheme, presented by S. Bakker [BaMK08], is a further heuristic method for evaluating pervasive games. It is specifically suited to the evaluation of Head-Up Games. Head-Up Games are a special kind of pervasive game for children, designed predominantly for outdoor play. Most of these games are Smart Toys or Location Aware Games. Given this target group, questionnaires about the respective game can be expected to yield results of little use. The method therefore relies on observers who watch and rate the game either live or from a recording. The observers are given questionnaires that cover the criteria (Physical Activity, Focus, Social Interaction, General) from Table 2. Statistics containing relative values are compiled from these questionnaires; they give insight into the course of the game.
S. Bakker presents the model using the previously mentioned Lighthouse Game as an example. The goal of this game is to recover treasures from treasure islands and bring them to one's own pirate ship. The game is influenced by a lighthouse, which is the digital component here. The lighthouse guards the treasure islands by always illuminating one of the pirate ships, which forbids the pirates of that ship from collecting treasure. In addition, the lighthouse can randomly make a sea monster appear. This is realised through audio signals and in turn permits special actions in the game. The islands and ships are simple circles drawn on the ground with chalk. Figure 1 shows the observers' view of the Lighthouse Game.
Figure 1. Observers' view of the running Lighthouse Game.
For this game, the observers rate under Physical Activity all movements that are made, according to their intensity. Running, walking, jumping, waving and so on by the players are noted here. In the evaluation, this criterion shows how strenuous or physically demanding the game is. Focus is somewhat harder to rate, because a spectator can only ever guess what the players are currently looking at. Under this criterion the observers record whether the players focus on objects that belong to the game and, if so, which ones. This criterion reveals the stimuli acting on the players and how demanding the stimuli generated by the game are. Possible boredom can also be detected here. Social Interaction is a criterion that is easier to observe. Patting team members on the shoulder, high-fives, tripping someone up, or kicking game objects (for example the lighthouse) are documented under it. Under General, observers only note whether players are within the field of view or outside it. How the statistics are assessed is up to the developers, i.e. a conclusion has to be drawn from the collected data individually for each case. The Outdoor Play Observation Scheme prescribes no procedure for this.
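How relative values can be derived from coded observations may be illustrated with a minimal Python sketch. It is not taken from [BaMK08]: the input format (class, behaviour, duration in seconds), the function name relative_shares and all numbers are assumptions made purely for illustration. The sketch computes, for each class, the share of the coded time that falls on each behaviour.

```python
from collections import defaultdict

# Hypothetical coded observations: (opos_class, behaviour, duration_seconds).
# Class and behaviour names follow Table 2; the durations are invented.
observations = [
    ("Physical Activity", "Intensive physical activity", 120),
    ("Physical Activity", "Non-intensive physical activity", 300),
    ("Physical Activity", "No physical activity", 180),
    ("Focus", "Looking at other players", 240),
    ("Focus", "Looking at game objects", 360),
]


def relative_shares(coded):
    """Return each behaviour's share of the coded time within its class."""
    totals = defaultdict(float)
    per_behaviour = defaultdict(float)
    for opos_class, behaviour, seconds in coded:
        totals[opos_class] += seconds
        per_behaviour[(opos_class, behaviour)] += seconds
    return {
        key: seconds / totals[key[0]]
        for key, seconds in per_behaviour.items()
        if totals[key[0]] > 0
    }


shares = relative_shares(observations)
for (opos_class, behaviour), share in sorted(shares.items()):
    print(f"{opos_class:18s} {behaviour:34s} {share:6.1%}")
```

Because the shares are normalised per class, test runs of different length remain comparable; what a given share means for the game is, as stated above, left to the developers.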
Table 2: Table of the Outdoor Play Observation Scheme [BaMK08]

Class | Behaviour | Explanation
Physical Activity | Intensive physical activity | Exhausting physical activity that one cannot keep doing for a long period of time. For example: running, jumping or skipping.
Physical Activity | Non-intensive physical activity | Physical activity that one can keep doing for a longer period of time. For example: walking, moving arms or legs, bending and standing up, crawling, moving while staying on the same location, etc.
Physical Activity | No physical activity | Standing, lying or sitting still. Very small movements such as coughing, yawning, putting your hands in your pocket, looking at your watch, etc. while being still should also fall in this category.
Focus | Looking at other players | The player is looking at one or more other players. This does not only include looking at the face, but also looking at other parts of the body.
Focus | Looking at game objects | The player is looking at one or more game objects. All things that are part of the game besides players and surroundings are game objects. For example a ball, a goal, a chalked circle on the ground, a hand held object, a token, etc.
Focus | Looking at something out of sight, possibly part of the game | When game objects or players are out of sight of the camera and the observed player is looking in the direction in which these player(s) or object(s) likely are.
Focus | Looking at something else | Looking at objects, people or surroundings that are not part of the game.
Social Interaction | Functional, with another player | All interactions (verbal and nonverbal) that are functional for playing the game and directed to one or more other players or to no-one. For example instructions such as 'give me the ball!', 'get the monster-coin!' and 'tag him!', or expressions like 'John is it!', 'tag!' or counting points aloud, or physical contact such as tagging, holding hands, etc. that are needed to play the game.
Social Interaction | Non-functional positive/neutral, with another player | All interactions (verbal and nonverbal) that are not functional for playing the game, that are positive or neutral and directed to one or more other players or to no-one. For example communication about subjects that are not related to the game, showing results to other players, cheering, screaming, expressions of enjoyment and physical contact not required for playing the game such as holding hands, high five, etc.
Social Interaction | Non-functional negative, with another player | All interactions (verbal and nonverbal) that are not functional for playing the game, that are negative and directed to one or more other players or to no-one. For example negative communication such as swearing and bullying, expressions of pain or negative physical contact such as kicking or hitting.
Social Interaction | With a non-player | All interactions (verbal and nonverbal) that are directed to someone who is not a player in the game. This can be a researcher, a teacher, a parent, a peer who is watching the game, etc.
Social Interaction | Unintended physical contact | Physical contact that is not intended, such as accidentally bumping into another child.
General | In sight | In sight of the camera.
General | Out of sight | Out of sight of the camera.

Figure 2. Paper prototyping for a mobile phone application.

2.3
Paper Prototyping
Paper prototyping is a method generally used in the user-centred design of software. It can be applied as soon as an idea for the layout of a project exists. In this test method, printouts of the actual layout, or sketches of it, are presented to the test persons. Figure 2 shows a paper prototype for mobile phones.
The test persons then describe what they see, which actions they would like to perform, and what they expect to happen. Tests of this kind are evaluated by interviewing the test persons. The method is also used in the area of pervasive games, where it is a simple way to test the game idea or the implementation of games. In special cases, a game scenario or a chapter of the game is realised for a selected group of test players. This is done with administrators who take over the tasks of the system (telling the game story, administering rules, controlling player interaction, etc.), together with pens and sheets of paper, which may already carry printed graphics or simple rules and instructions.
In the paper prototyping carried out by E. Koivisto [KoEl06], an MMORPG (Massively Multiplayer Online Role-Playing Game) with a pervasive component, a Location Aware Game, was tested in the form of a pen-and-paper role-playing game (a role-playing game that is led by a narrator and gets by with paper and pens). Afterwards the players had the opportunity to ask direct questions about the implementation and to start discussions about their concerns and impressions. Among other things, this method is a usability test, which makes it particularly well suited to testing the usability of a game or program with users.
2.4
Individual Test Runs
The pervasive component quite often gives rise to special game ideas or game characteristics that standardised evaluation methods do not take into account. It is therefore conceivable that an evaluation by questionnaire yields a rather poor score although the players might actually prefer the game with that special feature or idea. For this reason, developers often tailor individual test methods and scenarios to their developments and their rules. This is easier to explain with an example: the evaluation of the game Power Agent carried out by A. Gustafsson [GuBå08] follows exactly this idea.
Power Agent is a pervasive game intended to encourage energy saving. The game is made for Java-enabled mobile phones. In addition, data is processed on servers: on the one hand there are servers that read out the electricity meters in each household, on the other hand a game server that manages this data. The game server compiles overviews of the relative energy consumption values from the meter readings, and the rankings of the participants are in turn derived from these. The game server is usually not located in the players' households, so direct attempts to manipulate it are ruled out. New energy-saving missions reach the players as messages on their phones, sent by the game server. The left image of Figure 3 illustrates the architecture of the game; the right image shows the interface on the phone.
The software on the phone lets the players find out more about the mission, or play test levels in jump-and-run form and collect special tips for the mission there. The missions range from 'saving energy through lighting' and 'saving energy in the bathroom' to 'reducing the total energy consumption of the household'.
Figure 3. Impressions of the Power Agent game.
The test scenario therefore has to be geared specifically to the fact that Power Agent is meant to encourage and educate players to save energy in real life. The game was tested by two teams from different cities in Sweden. Each team consisted of three to four players living in different households. For each player it was recorded how many other people lived in the house and how many of them were willing to support the missions. The test scenario was played over several days. There were six missions: five day missions and one concluding weekend mission. The electricity and gas meters were read out before, during and after the missions, and the readings were transmitted to the game server in order to collect reliable data for the rankings. The difficulty of the missions increased from one mission to the next.
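How a ranking might follow from relative consumption values can be sketched in a few lines of Python. This is a minimal sketch and not the Power Agent implementation described in [GuBå08]: the data layout, the team and household names, all numbers, and the rule "consumption during the mission divided by consumption in an equally long reference period" are assumptions chosen for illustration.

```python
# Hypothetical meter data in kWh per household: (reference period, mission period).
# Team names, household names and all numbers are invented for illustration.
readings = {
    "Team A": {"household_1": (12.0, 9.0), "household_2": (8.0, 7.2)},
    "Team B": {"household_3": (10.0, 9.5), "household_4": (15.0, 12.0)},
}


def relative_consumption(baseline_kwh, mission_kwh):
    """Mission consumption as a fraction of the baseline (below 1.0 means energy was saved)."""
    if baseline_kwh <= 0:
        raise ValueError("baseline consumption must be positive")
    return mission_kwh / baseline_kwh


def team_ranking(data):
    """Rank teams by the mean relative consumption of their households, lowest first."""
    scores = {}
    for team, households in data.items():
        ratios = [relative_consumption(b, m) for b, m in households.values()]
        scores[team] = sum(ratios) / len(ratios)
    return sorted(scores.items(), key=lambda item: item[1])


ranking = team_ranking(readings)
for place, (team, score) in enumerate(ranking, start=1):
    print(f"{place}. {team}: {score:.3f} of baseline consumption")
```

Working with ratios rather than absolute kWh values keeps households of different size comparable, which is one plausible reason for ranking on relative values.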
The players could contact the observers by instant messenger (e.g. Skype). Because of the large physical distance, evaluation and interviews were also conducted directly over this channel. The evaluation paid particular attention to the following points:
What was done to save energy?
Motivation
What was learned?
Player opinion.
Because the point 'What was done to save energy?' was important for evaluating the game's special characteristic, it was in part even documented with photographic material. Motivation covered, on the one hand, the motivation arising from the rankings, i.e. doing better in a mission than the opposing team, and on the other hand, to the observers' surprise, the enquiries from the players' immediate environment. These mainly reflect interest aroused in outsiders, for example in sports clubs or school classes where the players talked about the test game and were subsequently asked about their ranking and the measures they had taken.
'What was learned?' was of particular interest because this point might well have long-term effects; Power Agent would thereby achieve its goal and could clearly set itself apart from other games in its value. With the player opinions, the observers were looking for constructive criticism or ideas raised by the players during the running game. No questionnaire-based scores were produced in this test scenario. All questions mainly concerned the game and the game environment.
3
Evaluation
The evaluation methods presented above are compared in the following.
First the question arises what such different methods might have in common and which criteria matter in an evaluation. An important aspect in assessing pervasive games is comparability: comparability in the sense that points or scores show directly where a game stands in comparison with other game developments. A game production may also depend heavily on its budget; in that case it would be useful to know during development whether the game programmed so far can succeed at all in its unfinished version. It would also still be possible to take the unfinished project in a different direction so as to better meet the players' taste. This leads directly to the next point: player opinion. The players ultimately decide whether the game sells well and whether the development effort was worthwhile, so it is very important to obtain the opinion of the game's target group. Player opinion also depends on usability, because players who have problems handling or operating the game will not be able to enjoy it much. Finally, there is the question of the effort the evaluation entails. The assessments of the individual approaches are brought together in Table 3.
3.1
Pervasive GameFlow
The Pervasive GameFlow model is a heuristic evaluation model, i.e. it can be assumed that every evaluation with this model will proceed very similarly. Awarding points on a common scale (e.g. 0-100) makes it easy to create rankings, which clearly provides comparability. Test-playing the game in an unfinished state, for example in an alpha or beta version, and then having the test players fill in the questionnaires is entirely feasible with this model. Evaluation can thus already take place during development, and the players' wishes can be taken into account. As already mentioned, because the test players are free to distribute their points according to their impressions, player opinion is captured directly as well. The only point of criticism that stands out in this limited assessment is that usability is not tested directly with this method. The effort required by the Pervasive GameFlow model is also moderate: a playable version of the project must exist, along with test players and questionnaires, which, thanks to their structure, are easy to evaluate.
3.2
Outdoor Play Observation Scheme
The Outdoor Play Observation Scheme was also described above as a heuristic model. This means that any two evaluations with this method can be expected to proceed very similarly. The major difference with this method is that relative statistics are produced but no unambiguous scores. Since the observers nevertheless follow clear guidelines, we assume a comparable evaluation method here as well. Because the method is more often applied to games for children, and children cannot play unfinished games, it can be assumed that this model cannot be applied to games in development without considerable effort. A direct player opinion can likewise only be inferred to the extent that it is recognisable from the test players' facial expressions and gestures. Even a yawn raises problems here, since a yawn cannot be unambiguously attributed to tiredness or boredom. Classifying player opinion is thus at the observers' discretion. In return, usability is tested almost directly, because it can be clearly observed from outside whether the players cope with the game or not. The effort of carrying out the method varies greatly: for a test run of one hour, one hour has to be observed; for two hours, two hours have to be observed, and so on. Assuming a single observation pass, the effort thus grows linearly with the duration of the test runs. It is also conceivable that several perspectives are recorded by video camera, which takes more evaluation time. This is a clear disadvantage of the method.
3.3
Paper Prototyping
In contrast to the preceding methods, paper prototyping does not produce scores from awarded points, which unfortunately rules out a sensible comparison. However, since the test players are interviewed, the method does reveal direct player opinion. A further advantage of paper prototyping is that usability is tested as part of the evaluation, because the player has to interact with the paper layout through their own descriptions. It is also conceivable to test an idea for a project whose rough layout is fixed without any programming effort at all; that is, the method can be applied not only during development but already after the first development steps, in order to test how well the game or the game idea would be received. With such a test at the beginning of development, the thoughts and suggestions of the test persons can also be incorporated into the development.
3.4
Individual Test Methods
For individual test methods, the advantages and disadvantages named above vary greatly. One thing is certain, however: every test method tailored individually to a game requires at least a fair amount of intellectual effort. In this seminar paper, such an individually created test scenario is therefore rated negatively in terms of effort. In order nevertheless to evaluate such a test method more closely, this section considers the evaluation procedure of the previously presented game Power Agent.
In this test run, direct player opinion was again obtained by interview. Despite targeted questions, however, no scores or unambiguous point totals were produced, so comparability by points was not possible here. Feasibility during development is not discussed; at least in early development phases, testing with this method is not possible, because a great deal of technical infrastructure must already have been set up and the mobile phone app must already be programmed in order to test the respective missions. Since mainly younger players (teenagers) were chosen for the test, who can nowadays be assumed to know how to handle a mobile phone, usability was not explicitly tested here either. Apart from the high effort of creating this test scenario, considerable evaluation effort was also necessary. Support via instant messenger alone meant a high effort; in addition, special interviews had to be developed in order to evaluate the special game idea, here saving energy and educating players to save energy.
Table 3 briefly summarises the discussion above.
Table 3: Assessment in tabular form.
Comparability
Possible during development
Direct player opinion
Effort
Usability is tested
PP
OPOS PGF
ITS
+
+
+
+
+
?
?
?
?
+
+
+
+
-
+ : advantage; - : disadvantage; ? : varies
Paper Prototyping - PP; Outdoor Play Observation Scheme - OPOS; Pervasive GameFlow - PGF; Individual Test Scenarios - ITS
This table represents only a limited part of the aspects of the individual methods. The following sections therefore discuss the methods in more detail.
4
Discussion
4.1
Pervasive GameFlow
K. Jegers, who has studied this model extensively, found something interesting when evaluating this evaluation method: one of the eight elements was not important to the players at all, whereas three of the eight elements were especially important. In his publication Elaborating eight elements of fun [Jege09], Jegers describes the application of the Pervasive GameFlow model to a game and evaluates the model itself at the same time. The test players were asked to rate the tested game and, in an additional column of the questionnaire, to rate how important the respective criterion was to them. This was again done with scores from 0 to 100% and by averaging over the elements and over the whole questionnaire. It turned out that the element Social Interaction was not very important to the players, while they rated the elements Concentration, Challenge and Immersion very highly. This means that plain averaging without any particular weighting of the individual elements does not lead to an ideal result. A good score therefore does not necessarily mean that the players actually like the game to the X% achieved. This shows the potential for further improvement of the methodology.
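The effect of weighting can be made concrete with a small Python sketch. It is an illustration only: the importance-weighted mean is one possible weighting, assumed here and not a procedure proposed in [Jege09]; only four of the eight elements are shown, and all scores and importance values are invented.

```python
# Hypothetical per-element results on the 0-100 scale mentioned in the text.
# "score" is the rating of the game, "importance" the rating from the extra
# questionnaire column; all numbers are invented for illustration.
elements = {
    "Concentration": {"score": 60, "importance": 90},
    "Challenge": {"score": 55, "importance": 85},
    "Immersion": {"score": 50, "importance": 95},
    "Social Interaction": {"score": 95, "importance": 20},
}


def plain_mean(ratings):
    """Unweighted average over all elements."""
    return sum(r["score"] for r in ratings.values()) / len(ratings)


def weighted_mean(ratings):
    """Average in which each element counts in proportion to its importance."""
    total_weight = sum(r["importance"] for r in ratings.values())
    if total_weight == 0:
        raise ValueError("at least one element needs a non-zero importance")
    weighted_sum = sum(r["score"] * r["importance"] for r in ratings.values())
    return weighted_sum / total_weight


print(f"plain mean:    {plain_mean(elements):.1f}")
print(f"weighted mean: {weighted_mean(elements):.1f}")
```

With these invented numbers, a high rating on an element the players care little about lifts the plain mean, while the weighted mean stays closer to the ratings of the elements the players consider important.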
4.2
Outdoor Play Observation Scheme
Although the table does not portray it particularly favourably, the Outdoor Play Observation Scheme is an important evaluation instrument. As mentioned in the description, it is particularly suited to Head-Up Games, i.e. children's games. Players with language differences, who could not cope with questionnaires, could also be included in evaluations with this method. It would even be possible to test games for players with physical or mental impairments, which would otherwise at most be possible with an individually designed test scenario.
For target groups in which adults or teenagers take part in the evaluation, interviews or questionnaires could be fed into the analysis in addition to observation, and direct player opinion could be captured in this way.
4.3
Paper Prototyping
With paper prototyping, reading the table immediately reveals the disadvantage of low or absent comparability. This need not be a weighty disadvantage, however. Since the method is possible at very early stages of development, the project does not yet have to be compared. That early, questions of usability and player opinion are usually far more important, and paper prototyping answers them very well. The method also has an enormous advantage over the other methods: it can be carried out without any programming effort. As mentioned in the description, it gets by with printouts or pencil sketches of the possible interface layout. In this way it is entirely possible to test a game idea before major programming effort, intellectual effort and high development costs are incurred. Among other things, this is an economic advantage.
4.4
Individual Test Scenarios
With individual test scenarios, designing such a test always means a great deal of effort. For interviews the right questions have to be asked, the test environment has to be created, the technical infrastructure may have to be set up from scratch, and the scenarios must be flexible enough to keep working when unforeseen events occur. The economic aspect is probably also significant here: the test method must deliver the right results at the lowest possible cost. This succeeded well in the evaluation of Power Agent, helped above all by the very high motivation of the test players. It is, however, also conceivable that a chosen test scenario does not produce the desired result at the end of a long development phase. In that case, the effort of developing the test scenario would only mean additional cost and trouble. The enormous advantage is equally obvious, though: quite special features of games can be tested, as was done with Power Agent. It is not uncommon for games, including those for children and teenagers, to have educational elements that inform about certain topics and are thus also meant to influence the lives of the players. No other evaluation method captures such game elements.
5
Summary
Ultimately, all of the evaluation methods presented are important, even if not every method suits every one of the given categories of pervasive games. While the Pervasive GameFlow model, for example, is very easy to apply to any Smart Toy, Affective Game, Augmented Tabletop Game, Augmented Reality Game or Location Aware Game without special features, paper prototyping is already somewhat more difficult for Smart Toys or Location Aware Games.
Given the large and ever-growing number of pervasive games, it is quite possible that a combination of the different evaluation methods yields a better result than using one alone. For larger projects involving high cost and labour, it is always advisable to test current states or even the underlying idea. Paper prototyping would suit this purpose. It would also be conceivable to run simple surveys to evaluate the game idea and to collect comments and ideas for extending or improving it. Internet polls, questionnaires at trade-fair stands, short street interviews, postal questionnaires, telephone surveys and the like would also suit this. For projects that are further along, an alpha or beta test is appropriate, which could be realised with Pervasive GameFlow. If this method is not sufficient to test a special feature, however, an individually designed test in the form of an interview might be appropriate. At the end of development, larger evaluation runs are then conceivable, which could be realised with the Outdoor Play Observation Scheme, Pervasive GameFlow or an individual test scenario. Note that the options just mentioned are only some of the conceivable combinations. The Pervasive GameFlow model does very well in the comparisons made here, apart from a few possible improvements and the fact that it is not suited to special features of games. It is the most cited and most frequently mentioned, and probably also the most frequently used, evaluation method among those listed here.
Nevertheless, it is important to get to know other evaluation methods as well and to broaden one's view of what is possible. As games and game types keep evolving, evaluation methodologies have to evolve too. Many pervasive games could not be evaluated with older methods, or the evaluation would not do justice to the pervasive component of the games. The original GameFlow model is a good example of this.
A promising evaluation option (see footnote 2) has recently been under investigation at Valve Software, the developer of many well-known game titles: measuring and analysing the players' biosignals (in this case pulse, electrical skin resistance, facial expressions, eye movements and brain waves) while they play the game under test. By recording the biosignals and synchronising the curves derived from the measurements, the aim is currently to identify a pattern from which enjoyment of the game can be read. At present it is already possible to make out a clear difference between phases of arousal and phases of relaxation during play. The vision Valve Software is pursuing would mean influencing running games during play by means of simple sensors. For the evaluation of games, this might mean being able to arrive at precise definitions of game enjoyment and to make it measurable.
5.1
Closing Remarks
This seminar paper presented several evaluation methods that are particularly suited to assessing pervasive games. It then attempted to reduce the advantages and disadvantages of the methods to a common denominator and to compare the methods. All aspects that are not directly comparable were subsequently explained, and possible misunderstandings that might have arisen in the comparison were cleared up. Finally, ideas for combining these methods were offered. This seminar paper is largely a condensed study of several publications from the area of pervasive games; after reading it, studying the individual publications is only necessary to gain deeper insight. Nevertheless, studying the individual publications is well worth it, even though some papers and studies are already more than five years old.

Footnote 2: Source: http://www.heise.de/newsticker/meldung/GDC-Valve-nutzt-Biosignale-zur-Spielverbesserung-1201960.html, as of 10.03.2011, 'GDC: Valve nutzt Biosignale zur Spielverbesserung'
References
BaMK08. Saskia Bakker, Panos Markopoulos and Yvonne de Kort. OPOS: an observation scheme for evaluating head-up play. In Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges, NordiCHI '08, New York, NY, USA, 2008. ACM, pp. 33-42.
GuBå08. Anton Gustafsson and Magnus Bång. Evaluation of a pervasive game for domestic energy engagement among teenagers. In Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology, ACE '08, New York, NY, USA, 2008. ACM, pp. 232-239.
Jege07. Kalle Jegers. Pervasive game flow: understanding player enjoyment in pervasive gaming. Comput. Entertain., vol. 5, January 2007.
Jege09. Kalle Jegers. Elaborating eight elements of fun: Supporting design of pervasive player enjoyment. Comput. Entertain., vol. 7, June 2009, pp. 25:1-25:22.
KoEl06. Elina M. I. Koivisto and Mirjam Eladhari. Paper prototyping a pervasive game. In Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology, ACE '06, New York, NY, USA, 2006. ACM.
MCMN05. Carsten Magerkurth, Adrian David Cheok, Regan L. Mandryk and Trond Nilsen. Pervasive games: bringing computer entertainment back to the real world. Comput. Entertain., vol. 3, July 2005, pp. 4-4.
SwWy05. Penelope Sweetser and Peta Wyeth. GameFlow: a model for evaluating player enjoyment in games. Comput. Entertain., vol. 3, July 2005, pp. 3-3.
Telecooperation Office (TecO)
Universität Karlsruhe (TH)
Seminar "Ubiquitous Systems"
Winter semester 09/10
Overview of the Positioning Methods Used in Industry Based on 802.15.4 and 802.15.4A
Author: Michael Quednau
Supervisor: Markus Scholz
Date: 19 February 2010
1 Introduction
At the end of the 1990s, the demand for wireless data transmission and the requirements placed on it kept growing. [varb] More and more end devices could be networked and were expected to communicate with one another. A new standard was needed that could remain permanently ready to send and receive while consuming as little energy as possible.
The standards available at the time, Bluetooth and 802.11 (WLAN), did not meet these requirements. [varb]
Consequently, a new task group, 802.15.4, was formed within the IEEE working group 802.15, which deals with wireless Personal Area Networks (PANs). From then on it worked on a new standard that was to be simple, cheap to manufacture, robust and energy efficient. [vara]
But what exactly distinguishes these standards from others, and how are they structured?
Besides data transmission, the standards are also used for positioning. Companies such as Ubisense, Nanotron, Awarepoint and Freescale have brought location systems based on these standards to market.
This paper gives an insight into the two standards 802.15.4 and 802.15.4a, presents these systems and compares them.
Chapter 2 covers the standards themselves; Chapters 3 and 4 present the systems in more detail, and Chapter 5 compares them. Finally, Chapter 6 gives an overview of the current state and future prospects.
2 Standards
The 802.15.4-2003 standard was published in 2003. It provides fast, energy-efficient wireless data transmission. [IEE06]
The low energy consumption is achieved by giving the individual nodes in the network long idle phases. Whenever they are not needed, they enter a sleep state, and as soon as they are needed, they wake up within a few milliseconds. This allows battery-powered nodes to run for between 6 months and 2 years. [varb]
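The effect of long sleep phases on battery life can be illustrated with a rough estimate. The following is a minimal sketch; the battery capacity, the current draw in the active and sleep states, and the duty cycle are assumed example values and are not taken from the standard or from the cited sources.

```python
def battery_life_days(capacity_mah, active_ma, sleep_ma, duty_cycle):
    """Estimate battery life in days for a node that is active for the
    fraction `duty_cycle` of the time and asleep otherwise."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    # Average current is the time-weighted mean of both states.
    average_ma = duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma
    hours = capacity_mah / average_ma
    return hours / 24.0


# Assumed example: 1000 mAh battery, 20 mA when active, 0.005 mA asleep.
always_on = battery_life_days(1000, 20.0, 0.005, duty_cycle=1.0)
duty_cycled = battery_life_days(1000, 20.0, 0.005, duty_cycle=0.005)
print(f"always on: {always_on:.1f} days, 0.5% duty cycle: {duty_cycled:.0f} days")
```

With these assumed numbers, a node that never sleeps drains the battery within days, while a node that is active only 0.5% of the time lasts on the order of a year, which is the range quoted above.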
When task group 802.15.4b was formed in 2004 to revise the 802.15.4-2003 standard, the original 802.15.4 group ceased its work.
In May 2005, the IEEE task group 802.15.4a was founded to develop even more precise ranging (below one meter).
Before it published its results, 802.15.4-2006, the revised version of 802.15.4, was published in 2006.
In 2007 the 802.15.4a-2007 standard was published and recognized as a further development of 802.15.4-2006. [vara]
(In the following, 802.15.4 stands for 802.15.4-2006 and 802.15.4a stands for 802.15.4a-2007.)
2.1 802.15.4
Structure
The standard defines two different node types:
– Reduced Function Device (RFD)
RFDs implement only part of the functionality and cannot communicate with each other.
They communicate exclusively with FFDs.
In return, they are very inexpensive.
They are used mainly where sending and receiving is infrequent.
– Full Function Device (FFD)
FFDs have the full range of functions.
In each network, one FFD acts as the Personal Area Network (PAN) coordinator.
The PAN coordinator separates its network from other networks within radio range by means of the PAN identifier it assigns.
In slotted mode, which is discussed further below, it also synchronizes the nodes. [varb]
Based on the OSI model, the standard covers the two lowest layers, the physical layer and the data link layer.
Physical layer
802.15.4 comprises four physical layers: three based on the spread-spectrum technique Direct Sequence Spread Spectrum (DSSS) and one based on the spread-spectrum technique Parallel Sequence Spread Spectrum (PSSS).
Radio transmission usually takes place in a small (narrow) frequency range. The problem is that this makes the transmission very susceptible to interference: with narrowband communication, a single interfering signal can render a considerable part of the transmitted data unusable.
For this reason, several spreading techniques were developed which ensure that communication takes place over a wider spectrum, so that interfering signals cannot do serious damage. [varc]
– Direct Sequence Spread Spectrum
In this technique, spreading is achieved by combining every bit with an 11-bit code using exclusive OR (XOR) (see Figure 1).
The amount of data to be transmitted is thus multiplied by eleven.
Using the same code, the receiver can recover the original data. [Man02]
– Parallel Sequence Spread Spectrum
This technique is similar to DSSS. Here, too, data is encoded and then transmitted. However, whereas DSSS transmits one spreading sequence after another, PSSS transmits several sequences (cyclically shifted against each other) in parallel. [ITW]
Figure 1. Principle of Direct Sequence Spread Spectrum [Man02]
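The XOR spreading described for DSSS can be sketched in a few lines. This is an illustration only: the concrete 11-chip code word (the Barker sequence commonly cited for 802.11) and the majority-vote decision in the receiver are assumptions for the example, not a description of the chip sequences actually specified in 802.15.4.

```python
# Assumed 11-chip spreading code (Barker sequence as commonly cited for 802.11).
BARKER_11 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]


def spread(bits, code=BARKER_11):
    """XOR every data bit with each chip of the code (11 chips per bit)."""
    return [bit ^ chip for bit in bits for chip in code]


def despread(chips, code=BARKER_11):
    """XOR each received block with the same code and decide each bit
    by majority vote, so a few corrupted chips do not flip the bit."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        ones = sum(chip ^ key for chip, key in zip(block, code))
        bits.append(1 if ones > n // 2 else 0)
    return bits


data = [1, 0, 1, 1]
transmitted = spread(data)
# Simulate narrowband interference by flipping a few chips.
received = list(transmitted)
for position in (0, 5, 13):
    received[position] ^= 1
recovered = despread(received)
```

The example shows both properties mentioned in the text: the transmitted sequence is eleven times as long as the data, and the receiver recovers the data with the same code even when some chips are disturbed.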
Data link layer
Standardized interfaces
Each layer uses the functions provided by the layer below it and in turn offers its own functions to the layer above it.
Since the standard defines only the two lowest layers, the functions of the higher layers have to be provided by other protocols that build on the 802.15.4 standard.
One example is the ZigBee standard. [Hes05]
The exact implementation of functions may differ between manufacturers. However, so that devices from different manufacturers can be combined, the 802.15.4 standard clearly specifies the functional scope. [varb]
Transmission modes
As mentioned above, there are different transmission modes. 802.15.4 comprises the unslotted mode and the slotted mode. [varb]
– Unslotted Mode
Here transmission is asynchronous. Each node uses Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA) to check whether the channel is free. If it is free, the node sends; if it is busy, the node waits for a randomly chosen time and tries again. If it does not succeed after several attempts, it gives up.
The transmitted packet can also indicate that the receiver should send an acknowledgement (ACK).
In unslotted mode there are three communication scenarios:
1. Node to PAN coordinator
2. Node to node
3. PAN coordinator to node
The PAN coordinator therefore has to be active at all times, but it has no management overhead, since the nodes take care of this themselves. [varb]
– Slotted Mode
Here the PAN coordinator synchronizes the transmissions of the nodes. It therefore has a management overhead and, as in unslotted mode, must always be active. [varb]
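The channel access in unslotted mode can be sketched as a small backoff loop. This is a simplified sketch, not the normative algorithm: the function name, the callback `channel_is_free` and the default parameter values are assumptions for illustration, and the random delay is drawn before each channel check, as is usual for CSMA/CA with binary exponential backoff.

```python
import random


def unslotted_csma_ca(channel_is_free, max_backoffs=4, min_be=3, max_be=5,
                      rng=random):
    """Try to access the channel.

    Returns the total number of backoff periods waited before the channel
    was found free, or None if the node gives up.
    `channel_is_free` is a hypothetical callback performing the channel check.
    """
    backoff_exponent = min_be
    waited = 0
    for _ in range(max_backoffs + 1):
        # Wait a random number of backoff periods in [0, 2^BE - 1].
        waited += rng.randint(0, 2 ** backoff_exponent - 1)
        if channel_is_free():
            return waited  # channel is idle, the node may send now
        # Channel busy: widen the random window and try again.
        backoff_exponent = min(backoff_exponent + 1, max_be)
    return None  # too many busy channel checks, give up


print(unslotted_csma_ca(lambda: True))   # channel always free
print(unslotted_csma_ca(lambda: False))  # channel always busy -> None
```

Because every node draws its own random delay, two nodes that find the channel busy at the same moment are unlikely to retry at the same time, which is why no coordinator is needed in this mode.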
Topology
The nodes can be arranged in three different network topologies.
– Star
All nodes (both RFDs and FFDs) communicate exclusively with the PAN coordinator.
– Peer to Peer
All nodes can, within the limits of their functional scope, communicate with the other nodes in addition to the PAN coordinator.
– Tree
This topology is a mixture of the first two.
The leaves of the tree are the RFDs. They communicate with an FFD that coordinates one part of the network.
These FFDs in turn communicate with the PAN coordinator. [varb]
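The constraints of the tree topology follow directly from the two node types: RFDs can only be leaves, and the root must be an FFD acting as PAN coordinator. A minimal sketch of such a consistency check is shown below; the data layout (two dictionaries) and the function name are hypothetical and chosen only for illustration.

```python
def tree_problems(kinds, parent):
    """Check a tree topology.

    kinds:  node name -> "FFD" or "RFD"
    parent: node name -> parent node name (None for the PAN coordinator)
    Returns a list of human-readable problems (empty if the tree is valid).
    """
    problems = []
    roots = [node for node, p in parent.items() if p is None]
    if len(roots) != 1:
        problems.append("expected exactly one PAN coordinator")
    for node, p in parent.items():
        if p is None:
            if kinds[node] != "FFD":
                problems.append(f"{node}: the PAN coordinator must be an FFD")
        elif kinds[p] != "FFD":
            # RFDs talk only to FFDs, so an RFD can never have children.
            problems.append(f"{node}: parent {p} is an RFD")
    return problems


kinds = {"pc": "FFD", "router": "FFD", "a": "RFD", "b": "RFD"}
parent = {"pc": None, "router": "pc", "a": "router", "b": "pc"}
print(tree_problems(kinds, parent))  # valid tree
```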
2.2 802.15.4a
The 802.15.4a standard is an extension of the 802.15.4 standard.
It was published in 2007 and adds two further physical layers to the previously defined ones (three DSSS and one PSSS): one based on Ultra Wideband (UWB) and one based on Chirp Spread Spectrum (CSS). [vard]
– Ultra Wideband
As with the spreading techniques, Ultra Wideband technology transmits data over a very wide spectrum. In contrast to DSSS and PSSS, however, the signal is not widened artificially but is wideband from the outset.
UWB is based on pulse transmission. Each pulse represents one bit.
Since the pulses are to be as short as possible, a wide frequency spectrum is used. In addition, a maximum power density of 70 nW/MHz is prescribed.
As a result, the energy demand is also very low, which is one of the main advantages of this technology. [Wen06]
UWB operates in the frequency range from 3.1 to 10.6 GHz [vare]. The short pulses produce a noise-like signal that is hard to eavesdrop on, since an outsider cannot tell whether it is noise or a data transmission. [MW05]
– Chirp Spread Spectrum
This technique operates in the 2.45 GHz ISM band and was developed by Nanotron. Like UWB, it makes full use of the available bandwidth.
This follows from the technique itself: sinusoidal signals are generated whose frequency sweeps across the entire band. The amplitude remains constant throughout (see Figure 2). [Nana]
The data is transmitted as a sequence of chirp pulses.
Figure 2. Chirp pulses [Nana]
The advantage of CSS is its robustness against interference. It is insensitive to broadband, narrowband and multipath interference, and in particular also to the Doppler effect. This is because only the change of frequency over time matters for the transmission, not the exact frequency itself. [varf]
This technique has a model in nature: bats and dolphins also use frequency changes for communication. [Nana] Interestingly, the technique is currently used exclusively by Nanotron itself, whose efforts led to its inclusion in the 802.15.4a standard in the first place. [varg]
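The chirp pulses described for CSS can be illustrated with a linear frequency sweep of constant amplitude. The sketch below works at a low, assumed baseband frequency range (1 kHz to 5 kHz) purely for illustration; the real system sweeps within the 2.45 GHz ISM band, and the function names are not taken from any Nanotron documentation.

```python
import math


def sweep_rate(f_start_hz, f_stop_hz, duration_s):
    """Frequency change per second of a linear chirp."""
    return (f_stop_hz - f_start_hz) / duration_s


def linear_chirp(f_start_hz, f_stop_hz, duration_s, sample_rate_hz):
    """Sample a sinusoid whose frequency rises (or falls) linearly from
    f_start_hz to f_stop_hz; the amplitude stays constant at 1."""
    k = sweep_rate(f_start_hz, f_stop_hz, duration_s)
    count = round(duration_s * sample_rate_hz)
    samples = []
    for i in range(count):
        t = i / sample_rate_hz
        # The phase is the integral of the instantaneous frequency f_start + k*t.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * k * t * t)
        samples.append(math.sin(phase))
    return samples


up_chirp = linear_chirp(1000.0, 5000.0, 0.001, 100000.0)
# A constant frequency offset shifts start and stop frequency alike,
# but leaves the sweep rate untouched.
same_rate = sweep_rate(1000.0, 5000.0, 0.001) == sweep_rate(1200.0, 5200.0, 0.001)
```

The last line hints at the robustness argument made above in a simplified way: a constant frequency offset, such as a Doppler shift would cause, changes the absolute frequencies but not the change of frequency over time.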
3 Systems for Positioning Based on 802.15.4
3.1 Freescale
In 2004 Motorola spun off its semiconductor division and founded the subsidiary Freescale. Motorola had been manufacturing semiconductors since 1953, so Freescale had a large knowledge base in this field from the start.
The company's headquarters are in Texas, but it has more than 20 branch offices worldwide. [varh] [HJCB09]
Freescale uses 802.15.4 and ZigBee technology for its Position Location Monitoring System.
ZigBee is a standard that provides the higher layers of the OSI model and builds on 802.15.4.
With this combination Freescale achieves ranges of up to 300 m. Inside buildings the range drops to 25-75 m; the exact range depends on the construction of the building. [HJCB09]
The network contains three kinds of nodes (see Figure 3):
– Mobile nodes
– Static nodes
– Access nodes
Figure 3. ZigBee network [HJCB09]
The position of the mobile nodes in the network can be determined with an accuracy of 3 m.
For this purpose the signal strength is used as an indicator (Received Signal Strength Indication, RSSI).
The signal strength is measured and used to calculate how far away the sender is. [HJCB09]
However, signal strength is not a deterministic quantity and therefore not a reliable measure. [Red06]
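How a distance can be derived from a measured signal strength is sketched below using the log-distance path loss model. This model is a common textbook choice and an assumption here; the sources do not state which model Freescale uses. The reference power at 1 m and the path loss exponent are likewise assumed example values that would have to be calibrated for a real environment.

```python
def rssi_to_distance_m(rssi_dbm, rssi_at_1m_dbm=-45.0, path_loss_exponent=2.5):
    """Estimate the sender distance in meters from a received signal strength.

    Log-distance model: rssi = rssi_at_1m - 10 * n * log10(distance)
    """
    return 10.0 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


print(rssi_to_distance_m(-45.0))  # reference power -> 1 m
print(rssi_to_distance_m(-70.0))  # 25 dB weaker
print(rssi_to_distance_m(-73.0))  # only 3 dB difference to the previous value
```

Because the model is exponential, a fluctuation of a few dB, as caused by walls, people or reflections, changes the estimated distance by several meters. This illustrates why signal strength alone is not a reliable measure.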
At least three static nodes must be able to measure the signal strength in order to locate a mobile node with reasonable accuracy.
In addition, data recorded at known locations within the area is stored and compared with the measured values in order to verify the position of the mobile node. [HJCB09]
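Why three static nodes are needed can be seen from a simple trilateration: each distance estimate defines a circle around a static node, and three circles intersect in one point. The following is a minimal 2D sketch under the assumption of exact distances; the function name is hypothetical, and a real system would have to cope with noisy distances, for example by a least-squares fit over more than three nodes.

```python
def trilaterate(anchors, distances):
    """Compute the 2D position from three anchor positions and the
    distances to them.

    Subtracting the circle equation of anchor 1 from those of anchors
    2 and 3 yields two linear equations in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("the three anchors must not lie on one line")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Static nodes at three corners of a room, mobile node at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, 65.0 ** 0.5, 45.0 ** 0.5]
position = trilaterate(anchors, distances)
```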
The system is used to locate people, for example in a school (see Figure 4).
The static nodes are distributed over the entire area to be monitored in such a way that every mobile node is within range of three static nodes at all times. The mobile nodes are embedded in cards that every teacher and every student carries at all times.
Figure 4. Freescale application example: school [HJCB09]
3.2 Awarepoint
The company was founded in San Diego in 2002. The founders took the view that "success requires focus", and so Awarepoint concentrated on developing a real-time positioning system for use in healthcare, above all in hospitals. [Awa]
The resulting system, Active RFID, stands out for its simplicity.
It comprises sensors, tags, bridges and a software platform (see Figure 5). The simplest application (locating equipment) can even be used from a web browser. [Red06]
Figure 5. Awarepoint Real Time Awareness Solution [Awa]
Because Awarepoint specializes in the healthcare sector, the tags in particular are designed for clinical use.
Besides the normal tags, which are attached to objects or worn by people, there is also a tag that is heat-resistant, waterproof and sterile, which makes it suitable even for use in operating rooms.
There is also a temperature-monitoring tag that works with an accuracy of 1 °C in the range from -28 °C to +90 °C (see Figure 6).
Like Freescale, Awarepoint uses ZigBee technology on top of 802.15.4.
Here, too, the signal strength is measured to determine the distance (RSSI); the Awarepoint system achieves an accuracy of about 5 m this way.
That is not very precise, but it is currently the simplest and cheapest approach.
The alternative would be the Time of Flight method, which requires very precise clocks. This method is discussed further below. [Red06]
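Why Time of Flight needs very precise clocks becomes clear from a short calculation: radio signals travel at roughly the speed of light, so every nanosecond of timing error corresponds to about 30 cm of distance error. The sketch below assumes propagation at the vacuum speed of light; the function names are chosen for illustration only.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(t_send_s, t_receive_s):
    """Distance covered by a signal between sending and receiving."""
    return (t_receive_s - t_send_s) * SPEED_OF_LIGHT_M_PER_S


def ranging_error_m(clock_error_s):
    """Distance error caused by a given timing error."""
    return abs(clock_error_s) * SPEED_OF_LIGHT_M_PER_S


print(tof_distance_m(0.0, 100e-9))  # 100 ns flight time, about 30 m
print(ranging_error_m(1e-9))        # 1 ns clock error, about 0.3 m
print(ranging_error_m(1e-6))        # 1 microsecond clock error, about 300 m
```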
Figure 6. Awarepoint tags, from left: Standard, Wearable, Temperature Monitoring [Awa]
4 Systems for Positioning Based on 802.15.4A
4.1 Ubisense
About Ubisense
Ubisense is a company that develops location systems. It was founded in Cambridge in 2002.
The Ubisense location system uses UWB technology and can locate objects as well as people with an accuracy of 15 cm and display them in 3D.
A single system can monitor more than 1000 objects.
Two sensors are already enough to determine the position of a tag precisely.
The Ubisense system is used in logistics, healthcare, manufacturing, offices, hazardous goods handling and the military. [Ubi]
The Ubisense location system
The system consists of three components:
– Sensors
The sensors receive the short UWB signals emitted by the tags.
They determine the angle of incidence of the signal (Angle of Arrival, AoA).
They also measure the time that elapses between sending and receiving (Time of Flight, ToF). Using these two methods, a single sensor can already determine the position of a tag with sufficient accuracy, provided the height (Z coordinate) is assumed.
Adding a second sensor makes full 3D localization possible. The accuracy of the 3D localization can be increased to 15 cm if the sensors are connected to each other by a so-called timing cable, so that the difference between the signal arrival times can be determined (Time Difference of Arrival, TDoA). [War07]
Figure 7. Ubisense sensor [Ubi]
Method          Sensors      Result
AoA             1            3D with assumed Z coordinate
AoA             2 or more    3D
AoA and TDoA    2 or more    3D with 15 cm accuracy
Table 1. Overview of the Ubisense measurement methods [War07]
– Tags
The tags transmit the signals. They offer an update rate of up to 20 times per second.
The tags come in two versions: the Slim Tag and the Compact Tag (see Figure 8).
The Slim Tag is intended to be worn on the body. It has two programmable push buttons, two LEDs and a buzzer, so the tag can, for example, sound an alarm when its wearer enters a hazardous area.
The Compact Tag, by contrast, is smaller and intended to be mounted horizontally on objects. Both tags have robust housings. [Ubi]
Figure 8. Ubisense tags, from left: Slim, Compact [Ubi]
– Software
The software visualizes all data in a 3D environment (see Figure 9).
Data can also be stored and analyzed. [Ubi]
Figure 9. Visualization of the data by the Ubisense software [Ubi]
Positioning methods
– Angle of Arrival (AoA)
This method measures the angle between two lines: one runs from the sender to the receiver, the other points in a defined direction, for example north (see Figure 10).
The difficulty is that determining the angle requires several antenna arrays in the receiver. The more antenna arrays are available, the more accurate the result. [Nan07]
– Time of Flight (ToF)
The time of transmission is compared with the time of arrival. Assuming a known signal propagation speed, the distance between sender and receiver can be determined with an accuracy of 1 m to 2 m.
The drawback is that very precise clocks are needed, and the more precise they are, the more expensive they are. In addition, the signal can be disturbed, which distorts the result.
Despite these limitations, ToF is one of the most accurate and most reliable methods. [Nan07]
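How two angle measurements yield a position can be sketched as the intersection of two bearing lines. The following 2D example assumes that each sensor reports the bearing to the tag as an angle measured clockwise from north (the positive y axis); the function name and the coordinate convention are assumptions for illustration and do not describe the Ubisense implementation.

```python
import math


def locate_by_aoa(sensor_a, bearing_a_deg, sensor_b, bearing_b_deg):
    """Intersect the two bearing lines starting at sensor_a and sensor_b.

    Bearings are measured clockwise from north (+y), in degrees.
    Returns the (x, y) position of the tag.
    """
    ax, ay = sensor_a
    bx, by = sensor_b
    # Unit direction vectors of the two bearing lines.
    dax, day = math.sin(math.radians(bearing_a_deg)), math.cos(math.radians(bearing_a_deg))
    dbx, dby = math.sin(math.radians(bearing_b_deg)), math.cos(math.radians(bearing_b_deg))
    # Solve sensor_a + s * dir_a = sensor_b + t * dir_b for s.
    det = dbx * day - dax * dby
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are parallel, no unique intersection")
    s = (dbx * (by - ay) - dby * (bx - ax)) / det
    return ax + s * dax, ay + s * day


# Two sensors 10 m apart; the tag is seen at 45 and 315 degrees.
tag = locate_by_aoa((0.0, 0.0), 45.0, (10.0, 0.0), 315.0)
```

The example also shows why accurate angle measurement matters: the further the tag is from the sensors, the more a small angular error shifts the intersection point.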
Figure 10. The Angle of Arrival principle [Nan07]
4.2 Nanotron
Nanotron, a company from Berlin, develops, like Ubisense, systems for locating objects, animals and people.
Its main product is the nanoLOC TRX system. It is based on CSS technology and offers an accuracy of 1 m.
Unlike Ubisense, Nanotron does not supply a finished product but only the underlying technology.
Possible applications are likewise found in logistics, healthcare, security and the transport of hazardous goods. [Nanb]
Figure 11. nanoLOC modules [Nanb]
The Nanotron location system
The nanoLOC TRX transceiver is the heart of the system. It is a chip that provides robust wireless communication as well as ranging.
Ranging runs alongside the actual communication, so no additional channels or infrastructure are required.
This is made possible by the method used, Symmetric Double Sided Two Way Ranging (SDS-TWR) (see Figure 12).
The method is based on ToF. To avoid depending on precise clocks, it measures the time a signal needs to travel to the receiver and back (Two Way). To increase the accuracy further, the measurement is carried out twice, symmetrically (Symmetric): once starting from the sender and once from the receiver (Double Sided).
The complete measurement proceeds as follows:
Node 1 sends a data packet to node 2. Node 2 measures the time that passes between receiving and sending on its side and returns an acknowledgement (ACK) as well as a data packet to node 1.
Node 1 receives the acknowledgement from node 2 and thus knows the time needed for the outbound and return trip as well as the time that passed between reception and transmission at node 2, so it can calculate the time of flight.
It then receives the data packet from node 2 and sends an acknowledgement to node 2, which can then calculate the time of flight as well. [Nan07] [Sch07]
Figure 12. The SDS-TWR method [Nan07]
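The benefit of measuring in both directions can be illustrated numerically. The sketch below uses the commonly cited SDS-TWR evaluation, in which the two round-trip times minus the two reply times are averaged; this formula and the simulated clock drifts are assumptions for illustration and are not taken from the cited Nanotron documents.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def single_sided_distance_m(t_round_1, t_reply_2):
    """Plain two-way ranging: one round trip minus the reply delay."""
    return (t_round_1 - t_reply_2) / 2.0 * SPEED_OF_LIGHT_M_PER_S


def sds_twr_distance_m(t_round_1, t_reply_2, t_round_2, t_reply_1):
    """Symmetric double-sided ranging: average over both directions."""
    tof = ((t_round_1 - t_reply_2) + (t_round_2 - t_reply_1)) / 4.0
    return tof * SPEED_OF_LIGHT_M_PER_S


def simulate(true_distance_m, reply_time_s, drift_1, drift_2):
    """Simulate both measurements; each node times intervals with its own
    clock, which runs too fast or too slow by the given relative drift."""
    tof = true_distance_m / SPEED_OF_LIGHT_M_PER_S
    true_round = 2.0 * tof + reply_time_s
    t_round_1 = true_round * (1.0 + drift_1)    # measured by node 1
    t_reply_1 = reply_time_s * (1.0 + drift_1)  # measured by node 1
    t_round_2 = true_round * (1.0 + drift_2)    # measured by node 2
    t_reply_2 = reply_time_s * (1.0 + drift_2)  # measured by node 2
    return (sds_twr_distance_m(t_round_1, t_reply_2, t_round_2, t_reply_1),
            single_sided_distance_m(t_round_1, t_reply_2))


# 30 m distance, 1 ms reply delay, clocks off by +20 ppm and -20 ppm.
sds, single = simulate(30.0, 1e-3, 20e-6, -20e-6)
print(f"SDS-TWR: {sds:.2f} m, single-sided: {single:.2f} m")
```

In this simulation the single-sided estimate is off by several meters because the reply delay is timed by a different clock than the round trip, while the errors of the two directions largely cancel in the symmetric variant. This is one way to see why inexpensive, unsynchronized clocks suffice.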
Advantages:
– no precise clocks required
– nodes do not have to be synchronized
– sender and receiver can be swapped, since the method is symmetric
– it is suitable for both data transmission and positioning
– low infrastructure requirements
5 Comparison of the Systems Presented
System        Standard     Physical layer    Frequency band
Freescale     802.15.4     DSSS              2.4 GHz ISM
Awarepoint    802.15.4     DSSS              2.4 GHz ISM
Ubisense      802.15.4a    UWB               3.1 to 10.6 GHz
Nanotron      802.15.4a    CSS               2.45 GHz ISM
Table 2. Technical summary

System        Positioning by    Range    Accuracy    Field of application
Freescale     RSSI              75 m     3 m         Localization of people
Awarepoint    RSSI              100 m    5 m         Healthcare
Ubisense      AoA and ToF       160 m    15 cm       Localization of people, monitoring of hazardous goods, military
Nanotron      SDS-TWR           100 m    1 m         Localization of people, monitoring of hazardous goods
Table 3. Functional summary
This paper examined four different systems. Two operate on the 802.15.4 standard (Freescale and Awarepoint) and two on the 802.15.4a standard (Ubisense and Nanotron).
Both 802.15.4 systems use Direct Sequence Spread Spectrum in the 2.4 GHz ISM band.
The 802.15.4a systems differ in the physical layer they use. While Ubisense uses UWB, Nanotron operates on the CSS layer, which it played a major part in developing.
While the two 802.15.4 systems use signal strength to determine position, Ubisense uses a combination of angle of arrival and time of flight. Nanotron sidesteps the problem of precise clocks by using the SDS-TWR method.
All systems are used primarily for tracking people. Awarepoint concentrates on the healthcare sector, and Ubisense and Nanotron are also used for monitoring hazardous goods.
The range of the systems inside buildings lies between 75 m and 160 m, and the accuracy between 0.15 m and 5 m.
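The figures of the comparison can also be used to narrow down the candidates for a given application. The following sketch encodes the range and accuracy values from the functional summary and filters them by the requirements of a hypothetical application; the data structure and function name are chosen for illustration only.

```python
# Values as listed in the functional summary of the four systems.
SYSTEMS = {
    "Freescale":  {"method": "RSSI",        "range_m": 75,  "accuracy_m": 3.0},
    "Awarepoint": {"method": "RSSI",        "range_m": 100, "accuracy_m": 5.0},
    "Ubisense":   {"method": "AoA and ToF", "range_m": 160, "accuracy_m": 0.15},
    "Nanotron":   {"method": "SDS-TWR",     "range_m": 100, "accuracy_m": 1.0},
}


def candidates(required_accuracy_m, required_range_m):
    """Names of all systems that are at least as accurate as required
    and reach at least the required range, in alphabetical order."""
    return sorted(
        name for name, spec in SYSTEMS.items()
        if spec["accuracy_m"] <= required_accuracy_m
        and spec["range_m"] >= required_range_m
    )


print(candidates(required_accuracy_m=1.0, required_range_m=100))
print(candidates(required_accuracy_m=5.0, required_range_m=50))
```

Such a filter covers only the technical figures; cost and service, which are discussed in the conclusions, are deliberately left out.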
6 Conclusions and Outlook
Looking at the four systems, none of them is clearly the best; in the end it depends on the application. Ubisense already offers a working, simple system today and additionally provides installation as a service. This comes at a price, however: a Ubisense system with 4 sensors, 10 tags and the software costs about EUR 20,000 including installation, training and maintenance.
The other systems are cheaper, as far as prices can be determined (a Nanotron module costs about EUR 70, the Freescale software about EUR 400), but they come without any service. Comparing the systems from a financial point of view is therefore very laborious and is not attempted here.
Nanotron supplies only the technology and does not target end users. In return it delivers a robust system, in particular through the use of CSS in combination with the SDS-TWR method.
Freescale was first to market and thus well ahead of the competition. It received approval from the Federal Communications Commission (FCC) as early as August 2004; the competitors followed only some months later.
Even though UWB is hailed as the best technology, there are points of criticism. The most important one is that the use of UWB has not yet been approved in all areas in Europe. As a direct consequence, little practical experience with UWB technology has been gathered so far, and many of its advantages remain rather theoretical. On the other hand, the accuracy that Ubisense already achieves today is an advantage that cannot be dismissed. Yet since this precision is not required in all areas, in particular when locating people, this is not too serious a disadvantage for the other vendors.
It will certainly become interesting once UWB is fully approved by the FCC and is therefore used more widely. Then it will become clear whether the technology delivers what it promises.
References
Awa. Awarepoint. Awarepoint real-time awareness solutions. http://www.awarepoint.com, (17.02.2010).
Hes05. Andre Hesse. Ieee 802.15.4 und zigbee. http://zack1.e-technik.tu-ilmenau.de/~webkn/Arbeiten/Hauptseminarreferat/hs%20zigbee-802.15.4%20V2.pdf, pages 4–5, 13.07.2005.
HJCB09. Oziel Hernandez, Varun Jain, Suhas Chakravarty, and Pashant Bhargava. Position location monitoring. Beyond bits, 4:67–73, 2009.
IEE06. IEEE. Ieee 802.15.4, 05.09.2006.
ITW. ITWissen. It wissen - das große online-lexikon für informationstechnologie. http://www.itwissen.info/definition/lexikon/parallel-sequence-spread-spectrum-PSSS.html, (16.02.2010).
Man02. Thilo Manske. Wireless local area networks; kapitel 3.1: Spread spectrum. http://einstein.informatik.uni-oldenburg.de/lehre/semester/seminar/02ss/wlan/WLAN.html, 13.08.2002.
MW05. Rudolf Zetik Mike Wolf, Jürgen Sachs. Ultra-breitband: Highspeed für funknetzwerke, teil i. http://www.tecchannel.de/netzwerk/wlan/429761/ultra_breitband_highspeed_fuer_funknetzwerke_teil_i/, 12.04.2005.
Nana. Nanotron. Nanotron technology - chirp spread spectrum (css). http://www.nanotron.com/EN/CO_techn-css.php, (16.02.2010).
Nanb. Nanotron. Reliable real time location systems and location-aware wsn with nanotron technologies gmbh. http://www.nanotron.com, (17.02.2010).
Nan07. Nanotron. Real time location systems (rtls). http://www.nanotron.com/EN/pdf/WP_RTLS.pdf, (16.02.2010), pages 2–11, 30.05.2007.
Red06. Joseph Reddy. Localization in wireless networks leveraging zigbee. http://www.zigbee.org/imwp/idms/popups/pop_download.asp?contentID=10042, pages 2–12, 27.11.2006.
Sch07. Dr. Frank Schlichting. Präzise abstandsbestimmung und lokalisierung mittels laufzeitmessungen (rtof) durch einsatz der 2,4 ghz chirp spreiztechnologie (css). Wireless Technologies Kongress, pages 17–26, 28.09.2007.
Ubi. Ubisense. Ubisense german site. http://www.ubisense.de, (17.02.2010).
vara. various. 802.15. http://en.wikipedia.org/wiki/IEEE_802.15, (16.02.2010).
varb. various. 802.15.4. http://de.wikipedia.org/wiki/IEEE_802.15.4, (16.02.2010).
varc. various. 802.15.4. http://en.wikipedia.org/wiki/IEEE_802.15.4, (16.02.2010).
vard. various. 802.15.4a. http://en.wikipedia.org/wiki/IEEE_802.15.4a, (16.02.2010).
vare. various. 802.15.4a. http://de.wikipedia.org/wiki/Ultrawideband, (17.02.2010).
varf. various. 802.15.4a. http://de.wikipedia.org/wiki/Chirp_Spread_Spectrum, (16.02.2010).
varg. various. Chirp spread spectrum. http://en.wikipedia.org/wiki/Chirp_Spread_Spectrum, (16.02.2010).
varh. various. Freescale semiconductor. http://de.wikipedia.org/wiki/Freescale_Semiconductor, (17.02.2010).
War07. Andy Ward. In-building location systems. http://ieeexplore.ieee.org/iel5/4446146/4449096/04449097.pdf?arnumber=4449097, pages 3–10, 06.12.2007.
Wen06. Marco Wenzel. Uwb - ultrawideband-kommunikation. http://zack1.e-technik.tu-ilmenau.de/~webkn/Arbeiten/Hauptseminarreferat/UWB/index.html, 11.10.2006.
A Survey on Brain-Computer-Interface
Rayan Merched El Masri
Chair for Pervasive Computing Systems
Abstract. Brain-computer interaction is a relatively new field of research. Its aim is to replace motor activity in controlling robots or devices, such as computers or phones, by interpreting brain activity. A brain-computer interface (BCI) translates the neural signals of the brain (EEG signals) into digital output, through which the user can operate a device or a robot.
In chapter 1 we introduce brain-computer interfaces and the components of which these systems consist. In chapter 2 we introduce and classify brain activities and their corresponding signals. Furthermore, two methods of signal acquisition are presented: the first uses electrodes and the second uses fiber-optic sensors to measure EEG signals. In chapter 3 we introduce independent component analysis and beamforming, two methods for extracting signals from the brain signals. Some application classes and corresponding examples are presented in chapter 4. Chapter 5 discusses the state of the art in BCI and the challenges it faces.
1 Brain Computer Interface
A brain computer interface (BCI), also called a brain machine interface, is a new method for communicating with and controlling machines and computers. BCI has attracted great interest over the last twenty years [1]. It was originally intended to let people with disabilities, who have lost control over their bodies but retain the ability to think, control computers and machines. This could improve their quality of life and help them participate more fully in social life. Although BCI is mainly intended for disabled people, able-bodied users can use it as well.
The main aim of BCI is to build a new output path or communication channel between the brain and the environment in place of the normal pathways, which run through the muscles and the peripheral nerves, in order to provide communication and environmental control without movement [2]. Communication with computers and computer-controlled devices is achieved by modulating brain activity and measuring specific features of it, which are then translated into device control signals.
BCI systems consist of a set of sensors and signal processing components that acquire and analyze brain activity. A standard BCI system is composed of four components [3], as shown in Fig. 1. The signal acquisition module records the brain activity with electrodes and amplifies it to enhance the signal-to-noise ratio (SNR) and remove artifacts caused by muscular or cardiac activity. The resulting signal is then digitized and passed through filters that detect and extract significant features which encode the user's commands and motor intentions. A translation algorithm converts the extracted features into device commands, which are then sent to the external device that performs the desired action.
Fig. 1. Components of a standard BCI system [4]
2
Brain signals and signal extraction
Brain computer interaction is based on extracting and analyzing brain activities. In the following chapter we will introduce the brain signals on which most
of the BCI applications are based. Furthermore, two ways of detecting brain
activities will be introduced. The first method uses electrodes for detecting electroencephalogram signals. The second method is by using a new technology,
which is based on fiber optics and electrolyte substances.
2.1 Brain signals
Research in the early 1920s showed that the brain produces electrical activity
whose potential varies with time. These variations are called brain signals.
Brain signals are weak signals that can be measured on the scalp, with an
amplitude of about 100 µV and a frequency between 0.5 Hz and 100 Hz, depending
on the activity of the brain. The signals are mostly irregular, but some of
their components are regular and are classified into alpha, beta, theta, delta,
gamma, and mu waves. Next we introduce the alpha, beta, theta and delta waves,
which are of interest for BCI applications.
Alpha waves. Alpha waves are rhythmic waves with a frequency between
8 and 13 Hz and an amplitude between 20 and 200 µV. Alpha waves
are mostly detected over the occipital region of the scalp (see Fig. 2) when an
individual is awake, resting and quiet. These waves disappear
completely when the individual is asleep. Visual perception or concentration
also makes alpha waves disappear; they are replaced by asynchronous
waves of higher frequency and lower voltage.
Fig. 2. Diagram of the brain [5]
Beta waves. Beta waves have a higher frequency than alpha waves, ranging
between 14 and 30 Hz when the individual is at rest. During intense mental
activity, however, they may exceed 50 Hz. Beta waves are mostly detected over
the parietal and frontal regions of the scalp and are divided into two types,
Beta-I and Beta-II. Beta-I waves have about twice the frequency of alpha waves.
Like alpha waves, Beta-I waves disappear during mental activity and are replaced
by asynchronous low-voltage waves. Beta-II waves, in contrast, appear during
intense brain activity and tension.
Theta waves. Theta waves have a frequency between 4 and 7 Hz. They are
often recorded in children over the parietal and temporal regions of the scalp.
In adults they arise during emotional stress such as disappointment and frustration.
Delta waves. Delta waves arise in deep sleep, in infancy, or with brain diseases.
They have a frequency of less than 3.5 Hz.
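The frequency ranges quoted above can be collected in a small lookup. The following is a minimal sketch, not part of any cited system: the function name `classify_band` is hypothetical, the lower delta limit of 0.5 Hz is taken from the overall range given in section 2.1, the limits are treated as inclusive, and frequencies that fall between the quoted resting-state ranges are reported as unclassified.

```python
# Band limits in Hz as quoted in section 2.1 (resting-state values).
# The 0.5 Hz lower limit for delta is an assumption taken from the
# overall 0.5-100 Hz range stated in the text.
BANDS = [
    ("delta", 0.5, 3.5),
    ("theta", 4.0, 7.0),
    ("alpha", 8.0, 13.0),
    ("beta", 14.0, 30.0),
]


def classify_band(frequency_hz):
    """Return the name of the band containing frequency_hz, or None."""
    for name, low, high in BANDS:
        if low <= frequency_hz <= high:
            return name
    return None  # outside or between the quoted ranges


print(classify_band(10.0))  # alpha
```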
2.2 Electroencephalogram
An electroencephalogram (EEG) is a complete record of brain signals. EEG signals
last for a long duration (30-500 ms). They are caused by a long-lasting
depolarization of cell membranes or by the summation of several short responses.
2.3 EEG electrodes
In most clinical and BCI applications, electrodes are used to detect brain
signals. The EEG electrodes are placed on the scalp according to the
“International Federation 10-20 System” (Fig. 3), which standardizes the
placement of the electrodes by using landmarks. Table 1 lists the different
representations of EEG channels, also called montages. With differential
recordings, far-field activity common to two electrodes is canceled and
responses are localized more sharply.
Fig. 3. The 10-20 system [6]
Montage       Measures the difference between one electrode and:
Bipolar       An adjacent electrode
Referential   A reference electrode
Average       Average of all other electrodes
Laplacian     Weighted average of surrounding electrodes
Table 1. The different types of montage
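The four montages of Table 1 differ only in what is subtracted from the electrode of interest. The sketch below illustrates this for a single time sample; the function names, the electrode selection and the potential values are made up for illustration and are not taken from the cited sources.

```python
def mean(values):
    return sum(values) / len(values)


def bipolar(v, ch, adjacent):
    """Difference to one adjacent electrode."""
    return v[ch] - v[adjacent]


def referential(v, ch, reference):
    """Difference to a common reference electrode."""
    return v[ch] - v[reference]


def average_reference(v, ch):
    """Difference to the average of all other electrodes."""
    others = [p for name, p in v.items() if name != ch]
    return v[ch] - mean(others)


def laplacian(v, ch, weights):
    """Difference to a weighted average of surrounding electrodes.
    weights maps each neighbour name to its (positive) weight."""
    total = sum(weights.values())
    weighted = sum(w * v[n] for n, w in weights.items()) / total
    return v[ch] - weighted


# Made-up potentials in microvolts for one time sample.
sample = {"C3": 12.0, "Cz": 10.0, "C4": 8.0, "A1": 2.0}
print(bipolar(sample, "C3", "Cz"))                      # 2.0
print(referential(sample, "C3", "A1"))                  # 10.0
print(laplacian(sample, "C3", {"Cz": 1.0, "C4": 1.0}))  # 3.0
```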
For comfortable use of BCI in wearable and pervasive applications, the
electrodes have to be small and easy to fix on the scalp without the help of
expert operators. Moreover, once installed, they have to remain in place
(as shown in Fig. 3). Unfortunately, the current state of the art fulfills none
of these requirements. Fixing the electrodes on the scalp is time-consuming and
often difficult for untrained persons. The area of fixation must first be
cleaned with alcohol, after which a conductive paste is applied to the scalp.
The electrodes are glued on with a special glue named collodion and held in
place by rubber straps or rubber caps containing all electrodes.
2.4 Wearable brain cap
The wearable brain cap described in [3] is an alternative to EEG electrodes.
It requires no electrical contact with the scalp, so neither electrolyte gel
nor attachments or straps are needed. The cap consists of a polymeric
layer containing sensors based on an electroactive hydrogel, polyacrylamide
hydrogel (PAAM), which is cheap and easy to modify. The layer may contain
actuators and other elements as well, and is covered by synthetic layers
(Fig. 4). As in EEG, the brain activity is measured as a potential relative to
a reference electrode placed on the earlobe or on the back of the neck,
locations that are mostly unaffected by electrical activity. This method is
called “one-reference-methodology” (Fig. 5). The sensors are fiber-optic based,
with PAAM at their end. A light source generates optical signals, which are
modulated by the PAAM and then analyzed. When the PAAM is exposed to an
electromagnetic field caused by brain signals, its mass and volume change. This
changes the amount of light transmitted back to the photodetector, from which
the potential of the signal can be calculated (Fig. 6). The procedure shows
results similar to those of methods based on EEG electrodes, which makes it a
potential solution for easy and flexible brain-signal extraction in BCI
applications.
Fig. 4. 3-layer structure of the wearable brain cap [3]
Fig. 5. One-reference approach [3]
Fig. 6. PAAM fiber-based sensor [3]
3 Methods for signal extraction
As mentioned in section 2.1, brain signals comprise different types of waves
which arise at distinct regions of the brain. Performing different tasks results
in different EEG signals. Localizing and detecting these signals are the primary
steps before the information extraction and classification necessary for BCI
applications.
EEG signals are the superposition of several signals. Besides the brain signals
(the useful signals), EEG signals contain noise. Possible noise sources are,
for instance, the electrical activity of skeletal muscles (electromyography,
EMG), the potential of the retina (electrooculography, EOG), the electrical
activity of the heart (electrocardiography, ECG), and other environmental
noise [7]. Signals recorded by electrodes (see section 2.3) are the summation of
the independent useful and noise signals at different locations of the scalp.
BCI applications use various methods for recovering biosignals. Because they
are frequently used, we introduce independent component analysis and
beamforming.
3.1 Independent Component Analysis
The first use of independent component analysis (ICA) to separate a set of
signals from a mixed signal (known as blind source separation, BSS) goes back
to the early 1980s [8]. Since then, a number of approaches have been
developed to optimize ICA.
Before describing ICA, we have to introduce some essential terms. The
random variables $y_1, y_2, \dots, y_m$ are said to be statistically independent when
$$f(y_1, y_2, \dots, y_m) = f_1(y_1) \cdot f_2(y_2) \cdots f_m(y_m) \tag{1}$$
where $f(y_1, y_2, \dots, y_m)$ is the joint density and $f_i(y_i)$ is the marginal
density of $y_i$ [9].
In the following definitions, $x = (x_1, \dots, x_m)^T$ denotes an m-dimensional
random vector, $A$ an $m \times n$ mixing matrix, and $s = (s_1, \dots, s_n)^T$ the
n-dimensional source vector whose components $s_i$ are independent.
The general definition of ICA is as follows:
ICA of the random vector $x$ consists of finding a linear transformation
$s = W x$ so that the components $s_i$ are as independent as possible,
in the sense of maximizing some function $F(s_1, \dots, s_m)$ that measures
independence [9].
An estimation-theoretically oriented definition of ICA is
$$x = A \cdot s + v \tag{2}$$
where $v$ is an m-dimensional random noise vector. Including the noise vector $v$
complicates this definition, so most studies use a noise-free definition
to simplify the calculation:
$$x = A \cdot s \tag{3}$$
For computing the mixing matrix $A$ and the source vector $s$ from equation
(3) in ICA, three restrictions [9] should be fulfilled:
1. All independent components of $s$ ($s_1, \dots, s_n$) must be non-Gaussian.
2. $m \geq n$, where $m$ is the number of linear mixtures and $n$ is the number of
independent source components.
3. $A$ must have full column rank (i.e. its columns are linearly independent).
In noisy ICA, the second restriction is not fulfilled and $m < n$, because the
components of the noise vector $v$ are treated as additional independent
components.
Prewhitening-based ICA. The estimation of the mixing matrix $A$ and the source
vector $s$ is often carried out by a two-stage algorithm called
prewhitening-based ICA, which is described in [10]. As the name indicates, the
first stage is the whitening stage and the second is the higher-order stage (HOS).
Let $\{x_t\}_{1 \le t \le T}$ be $T$ samples of $x = A \cdot s + v$ (see eq. (2)), and let
$M_x = [x_1, x_2, \dots, x_T]$.
Prewhitening step: The aim of this step is to transform the vector $x$ into a
vector $z$ having unit covariance. This requires multiplying the vector $x$ by
the inverse of the square root of its covariance matrix $C_x$. As mentioned
previously, noise-free ICA is assumed to simplify the estimation.
For noise-free ICA the covariance matrix of $x$ is
$$C_x = A \cdot C_s \cdot A^T \tag{4}$$
with $C_s$ the covariance matrix of $s$.
Assuming that the source signals have unit variances, this becomes
$$C_x = A \cdot A^T \tag{5}$$
Substituting the singular value decomposition (SVD) of the mixing matrix $A$,
$$A = U \cdot S \cdot V^T \tag{6}$$
into eq. (5) shows that the eigenvalue decomposition (EVD) of the covariance
$C_x$ allows the estimation of $U$ and $S$, while $V$ stays unknown. $C_x$ can
then be written as
$$C_x = U \cdot S^2 \cdot U^T = (U S) \cdot (U S)^T \tag{7}$$
and is estimated as
$$C_x = \tilde{M}_x \cdot \tilde{M}_x^T \tag{8}$$
The $m \times T$ matrix $\tilde{M}_x$ consists of $T$ realizations of $x$, divided by
$\sqrt{T-1}$ after subtracting the sample mean. For the further estimation of
$z$, the SVD of $\tilde{M}_x$,
$$\tilde{M}_x = U \cdot S \cdot \tilde{V}^T \tag{9}$$
is used for computing $U$ and $S$. $z$ can then be defined as
$$z = S^{-1} \cdot U^T \cdot x \tag{10}$$
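The prewhitening step can be sketched with NumPy as follows. This is a minimal illustration of eqs. (8) to (10) under the noise-free assumption with as many sensors as sources; the function name `prewhiten`, the mixing matrix and the uniformly distributed unit-variance sources are hypothetical test data, not values from [10].

```python
import numpy as np


def prewhiten(X):
    """Whiten an (m, T) data matrix X whose columns are samples of x.
    Returns the whitened data Z and the factors U and s (diagonal of S)."""
    T = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)            # subtract the sample mean
    M = Xc / np.sqrt(T - 1)                           # scaled data matrix, eq. (8)
    U, s, _ = np.linalg.svd(M, full_matrices=False)   # M = U S V~^T, eq. (9)
    Z = (U / s).T @ Xc                                # z = S^-1 U^T x, eq. (10)
    return Z, U, s


rng = np.random.default_rng(0)
# Hypothetical non-Gaussian sources with unit variance.
S_src = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(3, 2000))
# Hypothetical full-rank mixing matrix.
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
X = A @ S_src                                         # noise-free model, eq. (3)
Z, U, s = prewhiten(X)
C_z = np.cov(Z)                                       # sample covariance of z
```

The sample covariance `C_z` of the whitened data equals the identity matrix up to rounding, which is the defining property of the whitened vector $z$.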
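Higher-order stage (HOS). So far we have computed the whitened vector $z$, which
in the noise-free case equals the source vector $s$ up to the unknown orthogonal
matrix $V$ ($z = V^T \cdot s$). In the HOS we compute the matrices $V$ and $A$.
The higher-order cumulants of $z$ are given by
$$\zeta_z^{(N)} = \zeta_s^{(N)} \times_1 V^T \times_2 V^T \cdots \times_N V^T \tag{11}$$
and are related to the Nth-order cumulant of the observations $x$ by the
multilinearity property
$$\zeta_z^{(N)} = \zeta_x^{(N)} \times_1 (U S)^{-1} \times_2 (U S)^{-1} \cdots \times_N (U S)^{-1} \tag{12}$$
The notation $\times_i$ denotes the multiplication of the cumulant tensor with
a matrix along its i-th index, here of $\zeta_s^{(N)}$ with $V^T$. The matrix
$V$ can be estimated by applying algorithms such as FastICA, INFOMAX, or other
implemented approaches. By substituting $U$, $S$, and the estimate $\tilde{V}$ of $V$ in
$$\tilde{A} = U \cdot S \cdot \tilde{V}^T \tag{13}$$
we get an approximation for the mixing matrix $A$.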
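To make the role of the higher-order stage concrete, the sketch below estimates the missing orthogonal factor for two sources. It is deliberately not FastICA, INFOMAX or any algorithm from [8] or [10]: it is a plain grid search over rotation angles that maximizes the summed squared excess kurtosis (a fourth-order cumulant) of the rotated whitened signals. All names and the test data are hypothetical.

```python
import numpy as np


def excess_kurtosis(y):
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def estimate_rotation(Z, steps=180):
    """Grid search for the rotation of whitened 2-D data Z (2, T) that
    maximizes the summed squared excess kurtosis of its components."""
    angles = np.linspace(0.0, np.pi / 2, steps, endpoint=False)

    def score(theta):
        Y = rotation(theta) @ Z
        return sum(excess_kurtosis(row) ** 2 for row in Y)

    return rotation(max(angles, key=score))


rng = np.random.default_rng(1)
S_src = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))  # unit variance
A = np.array([[1.0, 0.6], [0.4, 1.0]])                        # hypothetical mixing
X = A @ S_src

# Prewhitening as in eqs. (8)-(10).
Xc = X - X.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(Xc / np.sqrt(X.shape[1] - 1), full_matrices=False)
Z = (U / s).T @ Xc

R = estimate_rotation(Z)   # plays the role of V, up to order and sign
Y = R @ Z                  # estimated sources
A_est = (U * s) @ R.T      # estimate of A as in eq. (13)
```

The recovered rows of `Y` match the original sources only up to permutation and sign, which is the usual ambiguity of ICA.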
Many optimized ICA algorithms have been implemented. In [8], several approaches
are tested and compared with respect to complexity and performance.
The authors conclude that SOBI, COM2, JADE, and ICAR require fewer computations
than FastICA and INFOMAX. Concerning performance,
COM2, JADE, and FastICA show the best results. Nevertheless, INFOMAX
and FastICA are the most widely used implementations of ICA.
ICA in BCI applications. Figure 7 illustrates the use of ICA for separating
EEG signals in clinical and BCI applications. The dipole sources in the
human brain generate brain waves. The EEG sensors placed on the scalp detect
the mixture of the brain waves and the noise sources (not shown in the figure).
The detected signals represent the vector $x$ in eq. (3). ICA approximates the
mixing matrix $A$ and delivers as output the vector $s$, which represents the
individual independent dipole sources. The computed source vector can be used
for classification and pattern recognition in BCI applications.
Fig. 7. Using ICA for EEG signal separation [11]
3.2 Beamforming
EEG signals are mostly noisy due to environmental noise. This noise leads
to inadequate classifications with ICA. In this section we introduce
a method called “beamforming” that shows better classification results even on
noisy EEG signals. The beamforming method extracts signal sources from a
particular region of the brain. This procedure is reasonable in BCI, because
signals and their origins can be correlated to recognize the context. For
different patterns, different regions of interest (ROI) can be chosen to
construct filters that weaken EEG signals carrying no useful information.
Completely discarding EEG sources outside the ROI (i.e. distorting components)
is impossible due to the inverse problem of EEG. The sources outside the ROI
should therefore be weakened as much as possible, which means that the ratio of
the variance of the EEG signals arising inside the ROI to the variance of those
arising outside should be maximized. Next, we show the steps for computing such
a filter as described in [12].
The filtered EEG signal $y(t) \in \mathbb{R}$ is
$$y(t) = w^{*T} \cdot x(t) \tag{14}$$
with $x(t) \in \mathbb{R}^M$ one data sample at time $t$ recorded at the $M$
electrodes placed on the scalp. $w \in \mathbb{R}^M$ is the spatial filter and
$w^*$ is defined as follows:
$$w^* = \operatorname*{argmax}_{w \in \mathbb{R}^M} \left\{ \frac{w^T \cdot R_{ROI} \cdot w}{w^T \cdot R_{OUT} \cdot w} \right\} \tag{15}$$
with $R_{ROI}$ and $R_{OUT}$ the covariance matrices of the EEG components
arising inside and outside the ROI, respectively, as measured at the electrodes.
The solutions of eq. (15) are given by the eigenvectors of the generalized
eigenvalue problem
$$R_{ROI} \cdot w = \lambda \cdot R_{OUT} \cdot w \tag{16}$$
The beamformer is the eigenvector $w^*$ that belongs to the largest
eigenvalue $\lambda^*$, with
$$\lambda^* = \frac{w^{*T} \cdot R_{ROI} \cdot w^*}{w^{*T} \cdot R_{OUT} \cdot w^*} \tag{17}$$
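Eqs. (14) to (17) translate directly into a few lines of NumPy. The sketch below assumes that $R_{ROI}$ and $R_{OUT}$ are already known and that $R_{OUT}$ is invertible; the function name `spatial_filter` and the two diagonal example matrices are hypothetical and chosen so that the result can be checked by hand.

```python
import numpy as np


def spatial_filter(R_roi, R_out):
    """Solve R_roi w = lam R_out w, eq. (16), and return the unit-length
    eigenvector with the largest eigenvalue together with that eigenvalue."""
    evals, evecs = np.linalg.eig(np.linalg.solve(R_out, R_roi))
    k = np.argmax(evals.real)
    w = evecs[:, k].real
    w = w / np.linalg.norm(w)
    lam = (w @ R_roi @ w) / (w @ R_out @ w)   # variance ratio, eq. (17)
    return w, lam


# Hand-checkable example: channel 0 has variance ratio 4, channel 1 has 0.5.
R_roi = np.diag([4.0, 1.0])
R_out = np.diag([1.0, 2.0])
w, lam = spatial_filter(R_roi, R_out)

x_t = np.array([0.5, -0.2])   # one made-up EEG sample from M = 2 electrodes
y_t = w @ x_t                 # filtered signal, eq. (14)
```

In this example the filter selects the first channel, because it has the largest ratio of ROI variance to non-ROI variance.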
$R_{ROI}$ and $R_{OUT}$ cannot be computed directly from the measured data and
must be approximated. The EEG signal $x(t)$ recorded at the electrodes is
$$x(t) = \int_V L(r, r') \cdot P(r', t)\, dV(r') \tag{18}$$
with
$V$ the volume of the brain,
$P: \mathbb{R}^3 \times \mathbb{R} \mapsto \mathbb{R}^3$ the source strength in x-, y-, z-direction at
position $r'$ and time $t$,
$r \in \mathbb{R}^{3M}$ the vector describing the x-, y-, z-position of the $M$ electrodes placed
on the scalp,
$L: \mathbb{R}^3 \times \mathbb{R}^3 \mapsto \mathbb{R}^{M \times 3}$ the projection strength of a source with a source
strength in x-, y-, z-direction at position $r'$ to the measured electric potential at
the electrode locations $r$.
$x(t)$ can be written as
$$x(t) = x_{ROI}(t) + x_{OUT}(t) \tag{19}$$
$$x(t) = \int_{ROI} L(r, r') \cdot P(r', t)\, dV(r') + \int_{OUT} L(r, r') \cdot P(r', t)\, dV(r') \tag{20}$$
The covariance matrix $R_x$ of the EEG signals, which can be estimated from the
EEG recordings, is written as
$$R_x = R_{ROI} + R_{OUT} \tag{21}$$
Substituting $R_{OUT} = R_x - R_{ROI}$ into eq. (16) gives
$(1 + \lambda) \cdot R_{ROI} \cdot w = \lambda \cdot R_x \cdot w$, i.e.
$$R_{ROI} \cdot w = \tilde{\lambda} \cdot R_x \cdot w \tag{22}$$
with $\tilde{\lambda} = \lambda/(1 + \lambda)$. $R_{ROI}$ is determined by approximating $x_{ROI}(t)$
as
$$x_{ROI}(t) = \alpha \sum_{j=1}^{J} L(r, r'_j) \cdot P(r'_j, t) \tag{23}$$
with $r'_j$ ($j = 1, \dots, J$) the points of an equally spaced grid with $J$ points
within the ROI and $\alpha$ a constant.
In matrix form, this approximation of $x_{ROI}(t)$ is
$$x_{ROI}(t) = \alpha \cdot L \cdot p(t) \tag{24}$$
with $L \in \mathbb{R}^{M \times 3J}$ the projection in the x-, y-, z-direction of the sources at the $J$ grid
points to the $M$ electrodes. $p(t) \in \mathbb{R}^{3J}$ is the source strength of the $J$ sources.
The covariance matrix of $x_{ROI}$ is
$$R_{ROI} = \alpha^2 \cdot L \cdot R_p \cdot L^T \tag{25}$$
with $R_p$ the covariance matrix of the sources within the ROI.
The filter is the eigenvector with the largest eigenvalue of
$$L \cdot R_p \cdot L^T \cdot w = \hat{\lambda} \cdot R_x \cdot w \tag{26}$$
which results from substituting eq. (25) into eq. (22), with $\hat{\lambda} = \tilde{\lambda}/\alpha^2$.
$L$ is the projection of the sources within the ROI to the EEG electrodes.
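The final filter of eq. (26) needs only the projection matrix $L$, a source covariance $R_p$ and the measured covariance $R_x$. The following sketch uses random stand-ins for all three, since no real lead field or recording is available here: `L`, the identity choice for `R_p`, the random "recordings" `X` and the function name `roi_beamformer` are assumptions made purely for illustration.

```python
import numpy as np


def roi_beamformer(L, R_p, R_x):
    """Return the unit-length eigenvector with the largest eigenvalue of
    L R_p L^T w = lam_hat R_x w, eq. (26), and that eigenvalue."""
    G = L @ R_p @ L.T
    evals, evecs = np.linalg.eig(np.linalg.solve(R_x, G))
    k = np.argmax(evals.real)
    w = evecs[:, k].real
    return w / np.linalg.norm(w), evals[k].real


rng = np.random.default_rng(2)
M, J = 4, 2                            # electrodes, grid points in the ROI
L = rng.normal(size=(M, 3 * J))        # hypothetical projection (lead field)
R_p = np.eye(3 * J)                    # assumed uncorrelated unit-strength sources
X = rng.normal(size=(M, 500))          # stand-in for EEG recordings
R_x = np.cov(X)                        # covariance estimated from the data
w, lam_hat = roi_beamformer(L, R_p, R_x)
```

The returned pair satisfies eq. (26) numerically; with a real head model, `L` would come from the forward model of the grid points inside the ROI.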
4 Applications
In this chapter we introduce a number of applications that use BCI as a
human-machine interface (HMI). Depending on the target that
the system controls, we classify the applications into five classes:
robots and wheelchairs, PCs and phones, home automation, authentication, and
entertainment.
Before illustrating the different classes of applications, we
introduce the P300 procedure for intention recognition. Most of the applications
presented here use this procedure because of its simplicity. Campbell defines
P300 in the sixth edition of the “Psychiatric Dictionary” [13] as:
A late-appearing component of the event-related potential. P300 stands
for a positive deflection in the event-related voltage potential at 300
millisecond poststimulus. Its amplitude increases with unpredictable, unlikely, or highly significant stimuli and thereby constitutes an index of
mental activity.
4.1 Robots and Wheelchairs
This class contains applications that use BCI to control devices which can move
from one place to another. Robots and wheelchairs are examples of such
devices. The user should be able to control the robot or electric wheelchair
precisely in order to avoid collisions with objects in the environment. This
means that the input commands should be precise and the system should respond
quickly with a very low error rate.
Service robot system. Next, we present a service robot system
based on BCI as described in [14]. This system enables the user to control a
service robot. The structure of the system is shown in Fig. 8. The electrodes
measure the EEG signals on the scalp. The processing module extracts the
features from these signals and transforms them into control signals. The control
interface in turn translates the signals into device commands and sends them
to the robot, which then performs the desired action.
Fig. 8. Structure of the service robot system [14]
The robot can move in four different directions and can also stop. The
control panel (state selector in Fig. 8) is shown in Fig. 9. The board contains five
LEDs. Four of them represent the directions and the middle one represents the
activation state (motion or stop). The four LEDs blink in sequence. When the
LED of the desired direction flashes, the user closes his/her eyes, which
increases the amplitude of the α-waves. This increase is interpreted as a command
to start the movement. The movement is stopped in the same way. The user can
generate a new command about every 4 seconds, with an execution accuracy
of about 91%.
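The selection logic of the state selector can be sketched as a scan over the blinking LEDs. This is only an illustration of the principle described above, not the implementation of [14]: the direction names, the amplitude values and the threshold are made up, and a real system would derive the threshold from the user's resting alpha amplitude.

```python
def select_direction(observations, threshold_uv):
    """observations: list of (led_name, alpha_amplitude_uv) in blink order.
    Returns the first LED during whose flash the alpha amplitude exceeds
    the threshold, or None if no command was issued in this cycle."""
    for led, amplitude in observations:
        if amplitude > threshold_uv:
            return led
    return None


# Made-up alpha amplitudes (in microvolts) measured while each LED flashes;
# the user closes the eyes while the "backward" LED is lit.
cycle = [("forward", 22.0), ("right", 25.0), ("backward", 61.0), ("left", 24.0)]
print(select_direction(cycle, threshold_uv=50.0))  # backward
```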
Fig. 9. State selector [14]
Wheelchair. An application for controlling an electric wheelchair by BCI is
shown in [15]. The user is able to move the wheelchair inside a building
from one room to another. A cap fitted with electrodes measures the EEG
signals and sends them to a laptop. On the display of the laptop, the possible
actions are flashed in sequence. When the user sees the
desired action highlighted, the P300 peak arises. The system recognizes this
peak and performs the action that was displayed 300 milliseconds earlier.
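Mapping a detected P300 peak back to the action that caused it amounts to looking 300 ms into the past. The sketch below shows this step in isolation, assuming the peak time has already been detected; the function name `select_action`, the flash schedule and the tolerance window are hypothetical and not taken from [15].

```python
def select_action(flashes, peak_time, latency=0.3, tolerance=0.1):
    """flashes: list of (onset_time_s, action). Returns the action whose
    onset lies closest to peak_time - latency, or None if no onset falls
    within the tolerance window."""
    expected_onset = peak_time - latency
    onset, action = min(flashes, key=lambda f: abs(f[0] - expected_onset))
    if abs(onset - expected_onset) <= tolerance:
        return action
    return None


# Made-up flash schedule: one action is highlighted every 0.5 s.
flashes = [(0.0, "forward"), (0.5, "left"), (1.0, "right"), (1.5, "stop")]
print(select_action(flashes, peak_time=1.32))  # right
```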
Fig. 10 shows a simplified structure of the graphical user interface (GUI)
displayed on the monitor. When the wheelchair is not moving, the user can choose
between several possible actions (left side of the figure). When the wheelchair
is in motion, the only available action is “stop”. This enables a quick
response to a stop command, which helps to avoid collisions with other objects.
Fig. 10. GUI for controlling the wheelchair [15]
Selecting one action out of nine takes about 8 seconds with a 10% error rate.
The error rate can be decreased to 2.5%, but the response time then increases
to 20 seconds. Selecting the stop action takes the user about 6 seconds in
65% of the trials and never more than 13 seconds.
Conclusion. Both applications show that the error rate is rather
high (about 10%) for such applications. This could be critical for
the safety of the user and the device. In addition, the time needed to perform
an action grows with the number of possible actions, from 4
seconds for 4 possible actions to 8 seconds for selecting one action out of 9. This
could be another factor that reduces the safety of the system.
4.2 PCs and Phones
BCI can be used as an alternative input device for a PC or a smart phone. In
this section, we introduce applications that use BCI to move
a cursor on the screen or to choose a contact from the phone book and call
that contact.
Input device for PCs. BCI devices could replace the conventional input
devices of a PC. [16] shows how a user can move a cursor (represented as a
small car in Fig. 11) on a screen. The cursor can be moved in four directions:
up, down, left and right.
Fig. 11. Moving a cursor on a screen [16]
The EEG signals are acquired with electrodes and passed to a BSS
algorithm that extracts the features from them. A fuzzy neural network classifier
then determines the user's intentions by detecting steady-state visual evoked
potentials (SSVEP) in the EEG signals.
The checkerboards surrounding the car (in Fig. 11) blink at different but fixed
frequencies. In the low frequency (LF) range the boards blink at 5, 6, 7 and 8 Hz
for up, left, down and right respectively. In the medium frequency (MF) range
they blink at 12, 13.3, 15 and 17 Hz for up, left, down and right respectively.
When the user focuses his attention on one of these checkerboards, an SSVEP
can be observed in the EEG signals. The SSVEP differs depending on the
frequency at which the checkerboard blinks. This difference in frequency
allows the detection of the desired direction, but the classifier has to be
trained first. The user has to perform a training session, in which he
concentrates for 6 seconds on each of the checkerboards. A further 6 seconds are
needed to measure the non-SSVEP response, for which all checkerboards are removed.
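The idea that the attended checkerboard can be identified from its blink frequency can be illustrated with a simple spectral comparison. The sketch below is a stand-in for illustration only: it replaces the BSS and fuzzy neural network pipeline of [16] by a direct comparison of the signal power at the four MF frequencies, and it runs on a synthetic signal. The sampling rate, noise level and function name are assumptions.

```python
import numpy as np

# Blink frequencies (Hz) of the MF range and their directions, as given above.
MF_TARGETS = {12.0: "up", 13.3: "left", 15.0: "down", 17.0: "right"}


def detect_direction(signal, fs, targets=MF_TARGETS):
    """Return the direction whose blink frequency carries the most power."""
    t = np.arange(len(signal)) / fs

    def power(f):
        # Squared magnitude of the Fourier component at frequency f.
        return np.abs(np.sum(signal * np.exp(-2j * np.pi * f * t))) ** 2

    return targets[max(targets, key=power)]


fs = 256.0                                  # assumed sampling rate in Hz
t = np.arange(int(4 * fs)) / fs             # 4 seconds of data
rng = np.random.default_rng(3)
# Synthetic "EEG": a 15 Hz SSVEP component plus noise.
eeg = np.sin(2 * np.pi * 15.0 * t) + 0.5 * rng.normal(size=t.size)
print(detect_direction(eeg, fs))            # down
```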
In the LF range a success rate up to 93% is achievable with an execution delay
of 3.7 ± 1 seconds and a bit rate of 26 bits/min. Better results could be achieved
in the MF range. The success rate is 96% with an execution delay of 3.5 ± 0.8
seconds and bit rate of 30 bits/min.
Applications for phones. BCI enables hands-free and silent human-phone
interaction as described in [17]. The NeuroPhone consists of a wireless EEG
headset (costing 200 to 500 $), an iPhone that flashes the contacts' pictures on
its display, and a laptop on which the feature extraction and classification
take place. The system works in two modes: the think mode and the wink mode. In
the think mode, the user concentrates on the picture of the desired contact when
it is shown on the display. This leads to a P300 peak in the EEG signal. The
NeuroPhone recognizes the peak and dials the contact. In the wink mode, the
user winks when he sees the picture of the contact he wants to call. The wink
causes an EMG signal (see chapter 3) which can be easily detected in the EEG
signal, and the NeuroPhone calls the contact. The complexity is lower than
in the think mode because only the two channels from the electrodes placed
above the eyes are required (see Fig. 3).
As mentioned before, the wink mode detects EMG signals caused by the eye.
However, EMG signals can also be caused by other muscular activities,
which may lead to a misinterpretation of these signals.
The NeuroPhone was tested under two conditions: sitting and walking. In the
wink mode it showed an accuracy of about 95% while sitting: 99% of the winks were
classified as winks, and 92% of the actions that were classified as winks were
really winks. While walking, the accuracy decreased to 92%, where 96%
of the winks were classified as winks and 86% of the actions that were classified
as winks were really winks. The noise caused by other EMG components is
noticeable, but the results are still acceptable.
Compared to the wink mode, the think mode is more sensitive to
interference. While sitting, the accuracy was about 78% if
the user was concentrating only on the application. It dropped
to 44% if the user was listening to music. An accuracy of only 33% was achieved if
the user was walking, because the SNR decreases drastically.
Conclusion. BCIs are suitable as input devices for PCs, and they require
no hardware extensions or modifications. For applications running on phones,
however, an external processing unit (PC or laptop) is required because of the
complexity of the signal acquisition and classification algorithms. This should
no longer be a problem once smart phones are equipped with high
performance processors.
4.3 Home automation
Smart homes can be controlled by means of brain signals. [18] and [19] present
a BCI system designed for home automation. The system acquires EEG
signals and detects P300 peaks in order to recognize the user's intentions.
Several tasks can be controlled, and for each of them a control mask is designed
that is similar to the music mask shown in Fig. 12. TV, temperature, light
and phone control are examples of the other tasks. The system also supports
navigation and movement inside the house.
Fig. 12. Music control mask [19]
The user has to concentrate on the desired task, which is represented by a
symbol in the control mask. The symbols are highlighted in random order with
equal probability. When the expected symbol, for instance TV, is highlighted,
the brain produces a P300 peak and the system recognizes the user's intention.
The classifier has to be trained first to detect these peaks.
The tests show that the users were able to select a symbol (out of 25) from the
light mask within about 33 seconds with an accuracy of 100%. Selecting
a symbol out of 50 from the music mask, on the other hand, took about 67 seconds
with an accuracy of 89%. The results are shown in Table 2.
Mask          Number of symbols   Time per character (sec)   Accuracy in %
Light         25                  33.750                     100
Phone         30                  40.500                     100
Temperature   38                  51.300                     100
TV            40                  54.000                     83.3
Music         50                  67.500                     89.6
Table 2. Experimental results for the different masks [18]
The time needed to perform a task increases linearly (1.200 seconds × number
of symbols) with the number of possible tasks, which leads to a long delay
while performing actions and may frustrate the user. Highlighting the symbols
with different probabilities, such that a symbol is highlighted more often
the more the user uses it, decreases the delay for frequently performed tasks
but causes a longer delay for rarely performed tasks.
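The linear relation can be checked against Table 2 by dividing the selection time of each mask by its number of symbols. The short script below does only this arithmetic on the table values; the variable names are hypothetical. Note that the table values yield 1.35 seconds per symbol for every mask, which differs from the factor of 1.200 seconds quoted in the text.

```python
# (mask, number of symbols, time per selection in seconds) from Table 2.
TABLE_2 = [
    ("Light", 25, 33.750),
    ("Phone", 30, 40.500),
    ("Temperature", 38, 51.300),
    ("TV", 40, 54.000),
    ("Music", 50, 67.500),
]

# Time per symbol for each mask; a constant value confirms the linear growth.
per_symbol = {mask: seconds / symbols for mask, symbols, seconds in TABLE_2}
for mask, value in per_symbol.items():
    print(f"{mask}: {value:.3f} s per symbol")
```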
4.4 Authentication
The fourth class of applications is authentication, where security plays a
major role. We introduce two examples
that use BCI to log into a secured PC or account.
Gaze directed interaction. Gaze directed interaction is described in [20].
The system consists of a cap equipped with electrodes to record the EEG signals,
an EEG amplifier, a wearable computer and sensors for indoor localization.
SSVEP is used to classify the signals. Logging into a remote
computer is possible by analyzing the user's brain response to a visual stimulus.
The user has to find and identify an object at a particular location. The
different objects blink at different frequencies, which makes them identifiable in
the EEG signals. The responses to these stimuli are biometric and unique to
the individual. The connection between the wearable computer and the remote
computer is established by gazing at the screen of the desired computer, which
displays a unique screen saver. The localization sensors deliver the location of
the user in the room. A combined frequency-position code is transmitted to a
server, which translates the code into a unique ID that enables the login to the
computer.
The experimental tests show that the users were able to log into the computer
within 15 seconds to three minutes, with a median of 27.5 seconds and a standard
deviation of 51.2 seconds. The number of true positive (TP) logins was between
9 and 20 out of 20 possible TPs, with an average
of about 81% successful logins. On average, the users failed to log in 8 times
(false negatives), and they were able to log into the computer up to nine times,
with an average of 4 times per user, even though a login was not allowed
(false positives).
Pass-thought. The Pass-thought, described in [21], is a P300-based
authentication mechanism. Like other P300-based methods, it uses the P300
peaks in the EEG signal to identify the characters of the password.
With the help of a screen showing Latin letters and Arabic numerals (Fig. 13),
the user chooses the characters of his password in order. The characters are
highlighted at random with equal probability. When the right character is
highlighted, a P300 peak arises. If the peaks match the saved pass-thought,
the login succeeds. This method makes the authentication safer
against shoulder-surfing. BCI systems using P300 have an accuracy of 90% and
a transfer rate of about 5 characters per minute, which means that a login
with an 8-character password takes roughly 100 seconds (8 characters at 5
characters per minute is 96 seconds).
Conclusion. BCI could increase the security of logging into computers because
it resists unauthorized logins and shoulder-surfing. Nevertheless, some
methods do not ensure full security and allow false logins. In some areas this
is unacceptable, even with low probability.
Fig. 13. Screen with Latin letters and Arabic numbers [21]
4.5 Entertainment
The last class we introduce is entertainment and gaming.
The RaviDuel [22] is an example of how BCI can be used as a game controller.
The RaviDuel is a simple multi-player game (Fig. 14). The players have to hit
the animated creatures with a hot ball whose direction and speed they control
by creating EEG commands. A player loses as soon as he is
hit by at least one of the creatures moving towards him.
Fig. 14. RaviDuel [22]
As mentioned before, EEG headsets cost between 200 $ and 500 $.
Games have to offer a minimal level of complexity, fun and interaction speed
in order to encourage players to buy the headsets. Finally, gaming remains an
area where BCI may flourish and reach many
users.
5 Conclusion
BCI systems are not yet widespread. Most of the systems and applications
designed so far still serve research goals. Several challenges face BCI and
hinder it from achieving many of its goals [23].
Most applications require extensions to existing systems. Besides caps
or headsets to acquire the EEG signals, most systems require processing units,
such as PCs or laptops, to perform the complex feature extraction and
classification. This limits the user's mobility and
reduces ubiquitous use.
Moreover, users do not have complete control over the systems they are
using. In most cases a trained person has to fix the electrodes in the right
positions to enable correct signal acquisition and correct classification
of the intentions. In addition, the systems cannot be turned on and off by the
user, so all brain activities are interpreted as input.
The complex situations that the user faces in real life (outside the
labs), such as emotional stress, interaction with people, walking, or listening
to music or other sounds in the environment, lead to high error rates (see
section 4.2, NeuroPhone). This is not acceptable in cases where safety is at
stake (sections 4.1 and 4.4). For that reason, methods for preventing and
detecting errors are considered important and urgent.
Another factor hindering natural BCI interaction is the low information
transfer rate. Transfer rates of 5 to 10 characters per minute (depending on the
word's length and accuracy) and long response times may frustrate
the user, or could even be dangerous in cases where the task to be performed
is time sensitive, such as stopping a robot or wheelchair.
Despite these challenges, which stem from technical restrictions of the
methods and devices used in most applications, BCI has great potential
in pervasive applications and as an alternative HCI input device. Due to the
rapid development of processing units, which provide high processing power even
in small controllers, computing the complex algorithms for signal
or feature extraction at high speed and accuracy will no longer be an obstacle.
In line with Moore's law, costs are also expected to fall continuously, which
would make BCI devices affordable for consumers (users) and profitable for
producers.
Besides the examples mentioned in chapter 4 and the many other similar
applications that have been implemented, BCI will find use in other fields too,
such as pervasive health care or as an alternative pathway to speech for those
who have lost the ability to speak.
References
1. Hamadicharef, B.; “Brain-Computer Interface (BCI) literature - a bibliometric
study,” Information Sciences Signal Processing and their Applications (ISSPA),
2010 10th International Conference on, pp.626-629, 10-13 May 2010
2. Leuthardt EC, Schalk G, Roland J, Rouse A, Moran DW; “Evolution of brain-computer interfaces: going beyond classic motor physiology,” Neurosurg Focus,
July 2009
3. Fernandes, M.; Dias, N.S.; Nunes, J.S.; El Tahchi, M.; Lanceros-Mendez, S.; Correia, J.H.; Mendes, P.M.; “Wearable brain cap with contactless electroencephalogram measurement for brain-computer interface applications,” Neural Engineering,
2009. NER ’09. 4th International IEEE/EMBS Conference on, pp.387-390, April 29 2009-May 2 2009
4. www.fidis.net
5. www.sciencecases.org
6. www.neurologie.onlinehome.de
7. Xiaopei Wu; Xiaojing Guo; “Mental EEG analysis based on independent component analysis,” Image and Signal Processing and Analysis, 2003. ISPA 2003.
Proceedings of the 3rd International Symposium on, vol.1, pp.327-331,
18-20 Sept. 2003
8. Kachenoura, A.; Albera, L.; Senhadji, L.; Comon, P.; “ICA: a potential tool for
BCI systems,” Signal Processing Magazine, IEEE, vol.25, no.1, pp.57-68, 2008
9. Aapo Hyvärinen; “Survey on Independent Component Analysis”, Neural Computing
Surveys 2, 94-128, 1999
10. Lieven De Lathauwer, Bart De Moor, Joos Vandewalle; “An introduction to independent component analysis”, Journal of Chemometrics 2000; vol. 14, pp. 123-149
11. Fachgebiet Elektronik und medizinische Signalverarbeitung der TU Berlin
[www.emsp.tu-berlin.de]
12. Grosse-Wentrup, M.; Liefhold, C.; Gramann, K.; Buss, M.; “Beamforming in Noninvasive Brain-Computer Interfaces,” Biomedical Engineering, IEEE Transactions
on, vol.56, no.4, pp.1209-1219, April 2009
13. Leland Earl Hinsie, Robert Jean Campbell, “Psychiatric dictionary”, sixth edition, 1989, Oxford University Press
14. Li Zhao; Chuo Li; Shigang Cui; “Service Robot System Based on Brain-computer
Interface Technology,” Natural Computation, 2007. ICNC 2007. Third International Conference on, vol.2, pp.349-353, 24-27 Aug. 2007
15. Rebsamen, B.; Burdet, E.; Cuntai Guan; Chee Leong Teo; Qiang Zeng; Ang, M.;
Laugier, C.; “Controlling a wheelchair using a BCI with low information transfer rate,” Rehabilitation Robotics, 2007. ICORR 2007. IEEE 10th International
Conference on, pp.1003-1008, 13-15 June 2007
16. Pablo Martinez, Hovagim Bakardjian, Andrzej Cichocki; “Fully Online Multicommand Brain-Computer Interface with Visual Neurofeedback Using SSVEP
Paradigm”, Computational Intelligence and Neuroscience, Volume 2007
17. Andrew T. Campbell, Tanzeem Choudhury, Shaohan Hu, Hong Lu, Matthew K.
Mukerjee, Mashfiqui Rabbi, and Rajeev D. S. Raizada; “NeuroPhone: Brain-Mobile Phone Interface using a Wireless EEG Headset”, MobiHeld 2010, August
30, 2010, New Delhi, India
18. Edlinger, G.; Holzner, C.; Guger, C.; Groenegress, C.; Slater, M.; “Brain-computer
interfaces for goal orientated control of a virtual smart home environment,” Neural
Engineering, 2009. NER ’09. 4th International IEEE/EMBS Conference on,
pp.463-465, April 29 2009-May 2 2009
19. Holzner, C.; Guger, C.; Edlinger, G.; Groenegress, C.; Slater, M.; “Virtual Smart
Home Controlled by Thoughts,” Enabling Technologies: Infrastructures for Collaborative Enterprises, 2009. WETICE ’09. 18th IEEE International Workshops on,
pp.236-239, June 29 2009-July 1 2009
20. Dieter Schmalstieg, Alexander Bornik, Gernot Mueller-Putz, Gert Pfurtscheller,
“Gaze-Directed Ubiquitous Interaction Using a Brain-Computer Interface”, Augmented Human Conference, April 2010
21. Julie Thorpe, P.C. van Oorschot, Anil Somayaji, “Pass-thoughts: Authenticating
with Our Minds”, New Security Paradigms Workshop 2005, September 2005
22. Unjoo Lee; Seung Hoon Han; Han Sup Kim; Young Bum Kim; Hyun Gi Jung;
Hyun-joo Lee; Yiran Lang; Daehwan Kim; Meiying Jin; Jungwha Song; Sungho
Song; Chang Geun Song; Hyung-Cheul Shin; “Development of a Neuron Based
Internet Game Driven by a Brain-Computer Interface System,” Hybrid Information
Technology, 2006. ICHIT ’06. International Conference on, vol.2, pp.600-604,
9-11 Nov. 2006
23. Moore, M.M.; “Real-world applications for brain-computer interface technology,”
Neural Systems and Rehabilitation Engineering, IEEE Transactions on, vol.11,
no.2, pp.162-165, June 2003