Proceedings
of
Pervasive 2004 Workshop on
Memory and Sharing of Experiences
April 20, 2004, Vienna, Austria
http://www.ii.ist.i.kyoto-u.ac.jp/~sumi/pervasive04/
Proceedings
of
Pervasive 2004 Workshop on
Memory and Sharing of Experiences
April 20th, 2004
Vienna, Austria
http://www.ii.ist.i.kyoto-u.ac.jp/~sumi/pervasive04/
Organizers
Kenji Mase (Nagoya University / ATR)
Yasuyuki Sumi (Kyoto University / ATR)
Sidney Fels (University of British Columbia)
Program Committee
Kiyoharu Aizawa (The University of Tokyo)
Jeremy Cooperstock (McGill University)
Richard DeVaul (Massachusetts Institute of Technology)
Jim Gemmell (Microsoft)
Yasuyuki Kono (Nara Institute of Science and Technology )
Bernt Schiele (ETH Zurich)
Thad Starner (Georgia Institute of Technology)
Terry Winograd (Stanford University)
Supported by
ATR Media Information Science Laboratories
2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288 JAPAN
Phone: +81-774-95-1401
Fax : +81-774-95-1408
http://www.mis.atr.jp/
ISBN: 4-902401-01-0
Preface
Welcome to the Pervasive 2004 Workshop on Memory and Sharing of Experiences (MSE2004).
The purpose of the workshop is to provide an opportunity to exchange research results
and to foster ideas in the emerging field of ubiquitous experience recording technologies
with the goal of effective experience sharing.
Pervasive computing environments provide essential infrastructure to record
experiences of people working and playing in the real world. Underlying the infrastructure
for recording experiences into extensive logs are ubiquitous sensor networks and effective
tagging systems.
This recorded experience becomes a life-memory, and, as a
communication medium, is sharable with others to enhance our sense of community.
Memory and sharing of experience is emerging as an important application of the pervasive
computing era. Moreover, research on Memory and Sharing of Experience encompasses many
exciting areas in both science and technology, such as multimedia memory aids, reference for
context recognition, life-pattern modeling, and the storytelling of life.
The workshop addresses the following topics: method and devices to capture the
experience; storage and database of experience for recollection; experience and interaction
corpora; experience log applications; privacy issues and other related areas.
The workshop consists of 17 interesting presentations selected by peer review from the
submitted articles. The selection was made by the Program Committee to meet the limited
time and space allotted to the one-day workshop associated with Pervasive 2004.
Unfortunately, many interesting works among the unselected papers could not fit into the
program. Each submission was reviewed by two or more PC members. We very much
appreciate and thank all the participants who put in their time and effort to submit papers,
and the PC members for their reviews.
We would like to thank ATR Media Information Science Laboratories and NICT
(National Institute of Information and Communications Technology) for their support in
publishing this workshop record in printed form.
We look forward to the workshop providing a rich environment for academia and
industry to foster active collaboration in the development of ubiquitous media technologies
focused on Memory and Sharing of Experience.
MSE2004 Workshop Program Committee Co-chairs
Kenji Mase
Nagoya University / ATR Media Information Science Laboratories
Yasuyuki Sumi
Kyoto University / ATR Media Information Science Laboratories
Sidney Fels
University of British Columbia
Pervasive 2004 Workshop on Memory and Sharing of Experiences
SCHEDULE
Tuesday, April 20, 2004
9:00-10:35
Session 1: Introduction & Capturing Experiences
Introduction to Memory and Sharing of Experiences
Kenji Mase, Yasuyuki Sumi, Sidney Fels
Collaborative Capturing and Interpretation of Interactions (L)
Yasuyuki Sumi, Sadanori Ito, Tetsuya Matsuguchi, Sidney Fels,
Kenji Mase
Context Annotation for a Live Life Recording (L)
Nicky Kern, Bernt Schiele, Holger Junker, Paul Lukowicz, Gerhard
Tröster, Albrecht Schmidt
Capture and Efficient Retrieval of Life Log (L)
Kiyoharu Aizawa, Tetsuro Hori, Shinya Kawasaki, Takayuki
Ishikawa
11:05-12:25
Session 2: Recollecting Personal Memory
Exploring Graspable Cues for Everyday Recollecting
Elise van den Hoven
Remembrance Home: Storage for Re-discovering One's Life (L)
Yasuyuki Kono, Kaoru Misaki
An Object-centric Storytelling Framework Using Ubiquitous Sensor
Technology (L)
Norman Lin, Kenji Mase, Yasuyuki Sumi
Storing and Replaying Experiences in Mixed Environments using
Hypermedia
Nuno Correia, Luis Alves, Jorge Santiago, Luis Romero
Storing, Indexing and Retrieving My Autobiography
Alberto Frigo
14:00-15:35
Session 3: Utilizing Experiences
Sharing Experience and Knowledge with Wearable Computers (L)
Marcus Nilsson, Mikael Drugge, Peter Parnes
Sharing Multimedia and Context Information between Mobile
Terminals
Jani Mäntyjärvi, Heikki Keränen, Tapani Rantakokko
Using an Extended Episodic Memory Within a Mobile Companion
(L)
Alexander Kröner, Stephan Baldes, Anthony Jameson, Mathias
Bauer
u-Photo: A Design and Implementation of a Snapshot Based Method
for Capturing Contextual Information (L)
Takeshi Iwamoto, Genta Suzuki, Shun Aoki, Naohiko Kohtake,
Kazunori Takashio, Hideyuki Tokuda
The Re: living Map - an Effective Experience with GPS Tracking and
Photographs
Yoshimasa Niwa, Takafumi Iwai, Yuichiro Haraguchi, Masa
Inakage
16:05-17:30
Session 4: Fundamental and Social Issues & Discussion
Relational Analysis among Experiences and Real World Objects in
the Ubiquitous Memories Environment
Tatsuyuki Kawamura, Takahiro Ueoka, Yasuyuki Kono, Masatsugu Kidode
A Framework for Personalizing Action History Viewer
Masaki Ito, Jin Nakazawa, Hideyuki Tokuda
Providing Privacy While Being Connected (L)
Natalia A. Romero, Panos Markopoulos
Capturing Conversational Participation in a Ubiquitous Sensor
Environment
Yasuhiro Katagiri, Mayumi Bono, Noriko Suzuki
* "L" denotes long presentation
Pervasive 2004 Workshop on Memory and Sharing of Experiences
Table of Contents
Collaborative Capturing and Interpretation of Interactions ········································· 1
Yasuyuki Sumi, Sadanori Ito, Tetsuya Matsuguchi, Sidney Fels, Kenji Mase
Context Annotation for a Live Life Recording ···························································· 9
Nicky Kern, Bernt Schiele, Holger Junker, Paul Lukowicz, Gerhard Tröster,
Albrecht Schmidt
Capture and Efficient Retrieval of Life Log ····························································· 15
Kiyoharu Aizawa, Tetsuro Hori, Shinya Kawasaki, Takayuki Ishikawa
Exploring Graspable Cues for Everyday Recollecting ················································ 21
Elise van den Hoven
Remembrance Home: Storage for Re-discovering One's Life ······································· 25
Yasuyuki Kono, Kaoru Misaki
An Object-centric Storytelling Framework Using Ubiquitous Sensor Technology ·········· 31
Norman Lin, Kenji Mase, Yasuyuki Sumi
Storing and Replaying Experiences in Mixed Environments using Hypermedia ··········· 35
Nuno Correia, Luis Alves, Jorge Santiago, Luis Romero
Storing, Indexing and Retrieving My Autobiography ················································ 41
Alberto Frigo
Sharing Experience and Knowledge with Wearable Computers ·································· 47
Marcus Nilsson, Mikael Drugge, Peter Parnes
Sharing Multimedia and Context Information between Mobile Terminals ···················· 55
Jani Mäntyjärvi, Heikki Keränen, Tapani Rantakokko
Using an Extended Episodic Memory Within a Mobile Companion ····························· 59
Alexander Kröner, Stephan Baldes, Anthony Jameson, Mathias Bauer
u-Photo: A Design and Implementation of a Snapshot Based Method for Capturing
Contextual Information ························································································ 67
Takeshi Iwamoto, Genta Suzuki, Shun Aoki, Naohiko Kohtake, Kazunori Takashio,
Hideyuki Tokuda
The Re: living Map - an Effective Experience with GPS Tracking and Photographs ······· 73
Yoshimasa Niwa, Takafumi Iwai, Yuichiro Haraguchi, Masa Inakage
Relational Analysis among Experiences and Real World Objects in the Ubiquitous Memories
Environment ······································································································ 79
Tatsuyuki Kawamura, Takahiro Ueoka, Yasuyuki Kono, Masatsugu Kidode
A Framework for Personalizing Action History Viewer ·············································· 87
Masaki Ito, Jin Nakazawa, Hideyuki Tokuda
Providing Privacy While Being Connected ······························································ 95
Natalia A. Romero, Panos Markopoulos
Capturing Conversational Participation in a Ubiquitous Sensor
Environment ···································································································· 101
Yasuhiro Katagiri, Mayumi Bono, Noriko Suzuki
Collaborative Capturing and Interpretation of Interactions
Yasuyuki Sumi†‡
Sadanori Ito‡
Tetsuya Matsuguchi‡ß
Sidney Fels¶
Kenji Mase§‡
†Graduate School of Informatics, Kyoto University
‡ATR Media Information Science Laboratories
¶The University of British Columbia
§Information Technology Center, Nagoya University
ßPresently with University of California, San Francisco
[email protected], http://www.ii.ist.i.kyoto-u.ac.jp/~sumi
ABSTRACT
This paper proposes a notion of interaction corpus, a
captured collection of human behaviors and interactions
among humans and artifacts. Digital multimedia and
ubiquitous sensor technologies create a venue to capture
and store interactions that are automatically annotated.
A very large-scale accumulated corpus provides an important infrastructure for a future digital society for
both humans and computers to understand verbal/nonverbal mechanisms of human interactions. The interaction corpus can also be used as a well-structured stored
experience, which is shared with other people for communication and creation of further experiences. Our
approach employs wearable and ubiquitous sensors, such
as video cameras, microphones, and tracking tags, to
capture all of the events from multiple viewpoints simultaneously. We demonstrate an application of generating
a video-based experience summary that is reconfigured
automatically from the interaction corpus.
KEYWORDS:
interaction corpus, experience capturing, ubiquitous sensors
INTRODUCTION
Weiser proposed a vision where computers pervade our
environment and hide themselves behind their tasks[1].
To achieve this vision, we need a new HCI (Human-Computer Interaction) paradigm based on embodied interactions, beyond existing HCI frameworks based on the desktop metaphor and GUIs (Graphical User Interfaces).
A machine-readable dictionary of interaction protocols
among humans, artifacts, and environments is necessary
as an infrastructure for the new paradigm.
As a first step, this paper proposes to build an interaction corpus, a semi-structured set of a large amount of
interaction data collected by various sensors. We aim to
use this corpus as a medium to share past experiences
with others. Since the captured data is segmented into
primitive behaviors and annotated semantically, it is
easy to collect the action highlights, for example, to generate a reconstructed diary. The corpus can, of course,
also serve as an infrastructure for researchers to analyze
and model social protocols of human interactions.
Our approach for the interaction corpus is characterized by the integration of many sensors (video cameras
and microphones), ubiquitously set up around rooms
and outdoors, and wearable sensors (video camera, microphone, and physiological sensors) to monitor humans
as the subjects of interactions1 . More importantly, our
system incorporates ID tags with an infrared LED (LED
tags) and infrared signal tracking device (IR tracker)
in order to record positional context along with audio/video data. The IR tracker gives the position and
identity of any tag attached to an artifact or human in
its field of view. By wearing an IR tracker, a user’s
gaze can also be determined. This approach assumes
that gazing can be used as a good index for human
interactions[2]. We also employ autonomous physical
agents, like humanoid robots[3], as social actors to proactively collect human interaction patterns by intentionally approaching humans.
Use of the corpus allows us to relate the captured event
to interaction semantics among users by collaboratively
processing the data of users who jointly interact with
each other in a particular setting. This can be performed without time-consuming audio and image processing as long as the corpus is well prepared with fine-grained annotations. Using the interpreted semantics,
we also provide an automated video summarization of
1 Throughout this paper, we use the term “ubiquitous” to describe sensors set up around the room and “wearable” to specify
sensors carried by the users.
individual users’ interactions to show the accessibility of
our interaction corpus. The resulting video summary itself is also an interaction medium for experience-sharing
communication.
CAPTURING INTERACTIONS BY MULTIPLE SENSORS
We developed a prototype system for recording natural interactions among multiple presenters and visitors in an exhibition room. The prototype was installed and tested in one of the exhibition rooms during our research laboratories' two-day open house.
Figure 1: Architecture of the system for capturing interactions. (Figure labels: wearable sensors (head-mounted camera, headset microphone, IR tracker, physiological sensors) feeding portable capturing PCs over a wireless connection; stationary sensors (stationary cameras, stationary microphones, IR trackers, omni-directional camera, stereo cameras, tactile sensors) feeding stationary capturing PCs over an Ethernet connection; a communication robot (humanoid robot with head-mounted camera, headset microphone, IR tracker, ultrasonic sensors); and a captured data server with raw AV data and an SQL DB serving an application server.)

Figure 1 illustrates the system architecture for collecting interaction data. The system consists of sensor clients ubiquitously set up around the room and wearable clients to monitor humans as the subjects of interactions. Each client has a video camera, microphone, and IR tracker, and sends its data to the central data server. Some wearable clients also have physiological sensors.

RELATED WORKS

There have been many works on smart environments for supporting humans in a room by using video cameras set around the room, e.g., the Smart Rooms [4], the Intelligent Room [5], the Aware Home [6], the KidsRoom [7], and EasyLiving [8]. The shared goal of these works was the recognition of human behavior using computer vision techniques and the understanding of the human's intention. Our interest, on the other hand, is to capture not only an individual human's behavior but also the interactions among multiple humans (the networking of their behaviors). We therefore focus on understanding and utilizing human interactions by employing an infrared ID system that simply identifies a human's presence.

There has also been work on wearable systems for collecting personal daily activities by recording video data, e.g., [9] and [10]. Their aim was to build an intelligent recording system used by a single user. We, however, aim to build a system used collaboratively by multiple users to capture their shared experiences and promote further creative collaboration. With such a system, our experiences can be recorded from multiple viewpoints, and the differences between individual viewpoints become apparent.

This paper shows a system that automatically generates video summaries for individual users as an application of our interaction corpus. In relation to this system, several systems have been proposed to extract the important scenes of a meeting from its video data, e.g., [11]. These systems extract scenes according to changes in physical quantities of the video data captured by fixed cameras. Our interest, in contrast, is not to detect changes in visual quantities but to segment human interactions (perhaps driven by the humans' intentions and interests) and then extract scene highlights from a meeting naturally.
IMPLEMENTATION
Figure 2 is a snapshot of the exhibition room set up for
recording an interaction corpus. There were five booths
in the exhibition room. Each booth had two sets of
ubiquitous sensors that include video cameras with IR
trackers and microphones. LED tags were attached to
possible focal points for social interactions, such as on
posters and displays.
The principal data are the video data sensed by the cameras and microphones. Along with the video stream data, the IDs of the LED tags captured by the IR trackers and the physiological data are recorded in the database as indices to the video data.
Each presenter at their booth carried a set of wearable
sensors, including a video camera with an IR tracker,
a microphone, an LED tag, and physiological sensors
(heart rate, skin conductance, and temperature). A visitor could choose to carry the same wearable system as
the presenters, just an LED tag, or nothing at all.
The humanoid robots in the room record their own behavior logs and the reactions of the humans with whom
the robots interact.
One booth had a humanoid robot for its demonstration that was also used as an actor to interact with visitors and record interactions using the same wearable system as the human presenters.

Figure 2: Setup of the ubiquitous sensor room. (Figure labels: ubiquitous sensors (video camera, microphone, IR tracker), LED tags attached to objects, and a humanoid robot carrying a video camera, IR tracker, LED tag, microphone, and PC.)
The clients for recording the sensed data were Windows-based PCs. In order to incorporate data from multiple sensor sets, time is an important index. We installed NTP (Network Time Protocol) on all the client PCs to synchronize their internal clocks to within 10 ms.
Recorded video data were gathered to a UNIX file server via a samba server. Index data given to the video data were stored in an SQL server (MySQL) running on another Linux machine. In addition, we had another Linux-based server, called an application server, for generating video-based summaries by using the MJPEG Tools2.

At each client PC, video data was encoded as MJPEG (320 x 240 resolution, 15 frames per second) and audio data was recorded as PCM, 22 kHz, 16 bit, monaural.

Figure 3: IR tracker and LED tag. (Figure labels: the LED tag consists of an LED and a micro computer; the IR tracker combines a CMOS camera for ID tracking with a CCD camera for video recording.)

Figure 3 shows the prototyped IR tracker and LED tag. The IR tracker consists of a CMOS camera for detecting the blinking signals of LEDs and a micro computer for controlling the CMOS camera. The IR tracker was embedded in a small box together with another CCD camera for recording video contents.

Each LED tag emits a 6-bit unique ID, allowing for 64 different IDs, by rapidly flashing. The IR trackers recognize the IDs of LED tags within their view in a range of 2.5 meters and send the detected IDs to the SQL server. Each tracker record consists of spatial data, the two-dimensional coordinate of the tag detected by the IR tracker, and temporal data, the time of detection, in addition to the ID of the detected tag (see Figure 4).

A few persons attached three types of physiological sensors – a pulse sensor, a skin conductance sensor, and a temperature sensor – to their fingers3. These data were also sent to the SQL server via the PC.

2 A set of tools that can do cut-and-paste editing and MPEG compression of audio and video under Linux. http://mjpeg.sourceforge.net
3 We used Procomp+ as an A/D converter for transmitting sensed signals to the carried PC.
Figure 4: Indexing by visual tags. (The figure shows an IR tracker's view containing LED tags, and the resulting tracker records, each consisting of a tag ID, a timestamp, and X/Y coordinates.)

Figure 5: Interaction primitives. (The figure depicts staying, coexistence, gazing at an object, joint attention (attention focus on a socially important event), and conversation.)

Eighty users participated during the two-day open house, providing roughly 300 hours of video data and 380,000 tracker records along with the associated physiological data. The major advantage of the system is the relatively short time required to analyze tracker data compared to processing the audio and images of all the video data.
INTERPRETING INTERACTIONS
To illustrate how our interaction corpus may be used, we constructed a system that provides users with a personal summary video, generated on the fly, at the end of their tour of the exhibition room. We developed a method to
segment interaction scenes from the IR tracker data. We
defined interaction primitives, or “events”, as significant
intervals or moments of activities. For example, a video
clip that has a particular object (such as a poster, user,
etc.) in it constitutes an event. Since the location of
all objects is known from the IR tracker and LED tags,
it is easy to determine these events. We then interpret
the meaning of events by considering the combination
of objects appearing in the events.
Figure 5 illustrates basic events that we considered.
stay A fixed IR tracker at a booth captures an LED
tag attached to a user: the user stays at the booth.
coexist A single IR tracker captures LED tags attached
to different users at some moment: the users coexist in
the same area.
gaze An IR tracker worn by a user captures an LED
tag attached to someone/something: the user gazes at
someone/something.
attention An LED tag attached to an object is simultaneously captured by the IR trackers worn by two users: the users jointly pay attention to the object. When many users pay attention to the object, we infer that the object plays a socially important role at that moment.

facing Two users' IR trackers detect each other's LED tags: the users are facing each other.
Raw data from the IR trackers are just a set of intermittently detected IDs of LED tags. Therefore, we first group the discrete data into interval data, implying that a certain LED tag stays in view for a period of time. Then, these interval data are interpreted as one of the above events according to the combination of entities to which the IR tracker and LED tag are attached.
In order to group the discrete data into interval data, we
assigned two parameters, minInterval and maxInterval.
A captured event is at least minInterval in length, and
times between tracker data that make up the event are
less than maxInterval. The minInterval allows elimination of events too short to be significant. The maxInterval value compensates for the low detection rate of the
tracker; however, if the maxInterval is too large, more
erroneous data will be utilized to make captured events.
The larger the minInterval and the smaller the maxInterval are, the fewer the significant events that will be
recognized.
For the first prototype, we set both the minInterval and
maxInterval at 5 sec. However, a 5 sec maxInterval was
too short to extract events having a meaningful length
of time. As a result of the video analyses, we found an
appropriate value of maxInterval: 10 sec for ubiquitous
sensors and 20 sec for wearable sensors. The difference
of maxInterval values is reasonable because ubiquitous
sensors are fixed and wearable sensors are moving.
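To make the grouping step concrete, the following minimal Python sketch groups intermittent tag detections into interval events using the two parameters described above. It assumes tracker records of the form (tag_id, timestamp in seconds); the record layout and function names are illustrative, not the authors' implementation.

from collections import defaultdict

def group_into_events(records, min_interval=5.0, max_interval=10.0):
    # records: iterable of (tag_id, timestamp_seconds) tuples
    by_tag = defaultdict(list)
    for tag_id, t in records:
        by_tag[tag_id].append(t)
    events = []
    for tag_id, times in by_tag.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > max_interval:        # gap too long: close the current event
                if prev - start >= min_interval:
                    events.append((tag_id, start, prev))
                start = t
            prev = t
        if prev - start >= min_interval:       # close the trailing event
            events.append((tag_id, start, prev))
    return events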
VIDEO SUMMARY
We were able to extract appropriate “scenes” from the
viewpoints of individual users by clustering events having spatial and temporal relationships.
Figure 6: Interpreting events as scenes by grouping spatio-temporal co-occurrences. (The figure shows a timeline of events such as "Talk to A", "Talk to B", "Talk to C", "Visit X", "Visit Y", "Visit Z", and "Look into W" being grouped into scenes such as "Talk to A about Z", "Talk to B & C about Y", and "Watch W at X".)
A scene is made up of several basic interaction events and is defined based on time. Because of the setup of the exhibition room, in which five separate booths had a high concentration of sensors, scenes were location-dependent to some extent as well. Precisely, all the events that overlap by at least minInterval / 2 were considered to be part of the same scene (see Figure 6).
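The scene-grouping rule above can be sketched as follows in Python, assuming each interpreted event is a (label, start, end) tuple; this is an illustration of the stated rule, not the actual implementation.

def overlap(a, b):
    # length of the time overlap between two (label, start, end) events
    return min(a[2], b[2]) - max(a[1], b[1])

def group_into_scenes(events, min_interval=5.0):
    scenes = []
    for ev in sorted(events, key=lambda e: e[1]):
        for scene in scenes:
            if any(overlap(ev, other) >= min_interval / 2 for other in scene):
                scene.append(ev)
                break
        else:
            scenes.append([ev])       # no sufficiently overlapping scene found
    return scenes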
Scene videos were created in a linear time fashion using
only one source of video at a time. In order to decide
which video source to use to make up the scene video, we
established a priority list. In creating the priority list,
we made a few assumptions. One of these assumptions
was that the video source of a user associated with a captured event of UserA shows the close-up view of UserA.
Another assumption was that all the components of the
interactions occurring in BoothA are captured by the
ubiquitous cameras set up for BoothA.
The actual priority list used was based on the following
basic rules. When someone is speaking (the volume of
the audio is greater than 0.1 / 1.0), a video source that
shows the close-up view of the speaker is used. If no one
that is involved in the event is speaking, the ubiquitous
video camera source is used.
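The priority rule for choosing a video source can be summarized by the sketch below; the volume() helper and the source labels are assumptions for illustration, while the 0.1 threshold on a 0-1 scale follows the description above.

def choose_source(participants, booth_camera, volume, threshold=0.1):
    # participants: users involved in the event; volume(p) returns the
    # normalized audio level (0.0-1.0) of participant p's microphone
    speakers = [p for p in participants if volume(p) > threshold]
    if speakers:
        loudest = max(speakers, key=volume)
        return ("closeup", loudest)          # camera showing the speaker's close-up view
    return ("ubiquitous", booth_camera)      # fall back to the booth's fixed camera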
Figure 7 shows an example of video summarization for a
user. The summary page was created by chronologically
listing scene videos, which were automatically extracted
based on events (see above). We used thumbnails of
the scene videos and coordinated their shading based
on the videos’ duration for quick visual cues. The system provided each scene with annotations, i.e., time,
description, and duration. The descriptions were automatically determined according to the interpretation of
extracted interactions by using templates, as follows.
TALKED WITH I talked with [someone].
WAS WITH I was with [someone].
LOOKED AT I looked at [something].
In the time intervals where more than one interaction
event has occurred, the following priority was used: TALKED
WITH > WAS WITH > LOOKED AT.
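A small sketch of the template selection with this priority might look as follows; event_types is assumed to be the set of interaction types detected within one time interval.

TEMPLATES = [                        # list order encodes the priority
    ("TALKED WITH", "I talked with {}."),
    ("WAS WITH", "I was with {}."),
    ("LOOKED AT", "I looked at {}."),
]

def describe(event_types, target):
    for event_type, template in TEMPLATES:
        if event_type in event_types:
            return template.format(target)
    return None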
We also provided a summary video for a quick overview
of the events the users experienced. To generate the
summary video, we used a simple format in which at
most 15 seconds of each relevant scene was put together
chronologically with fading effects between the scenes.
The event clips used to make up a scene were not restricted to those captured by a single resource (video
camera and microphone). For example, for a summary
of a conversation TALKED WITH scene, the video clips
used were recorded by the camera worn by the user
him/herself, the camera of the conversation partner, and
a fixed camera on the ceiling that captured both users.
Our system selects which video clips to use by consulting the volume levels of the users’ individual voices. The
worn LED tag is assumed to indicate that the user’s face
is in the video clip if the associated IR tracker detects it.
Thus, by interchangeably integrating video and audio from different worn sensors, the system could generate a scene showing a speaking face captured by one camera together with a clearer voice from the speaker's own microphone.
CORPUS VIEWER: TOOL FOR ANALYZING INTERACTION
PATTERNS
The video summarizing system was intended to be used
as an end-user application. Our interaction corpus is
also valuable for researchers to analyze and model human social interactions. In such a context, we aim to
develop a system that researchers (HCI designers, social scientists, etc.) can query for specific interactions
quickly with simple commands that provides enough
flexibility to suit various needs. To this end, we prototyped a system called the Corpus Viewer, as shown in
Figure 8.
This system first visualizes all interactions collected from
the viewpoint of a certain user. The vertical axis is time.
Vertical bars correspond to IR trackers (red bars) that
capture the selected user’s LED tag and LED tags (blue
bars) that are captured by the user’s IR tracker. Many
horizontal lines on the bars imply IR tracker data.
By viewing this, we can easily grasp an overview of the
user’s interactions with other users and exhibits, such as
mutual gazing with other users and staying at a certain
booth. The viewer’s user can then select any part of the
bars to extract a video corresponding to the selected
time and viewpoint.
Figure 7: Automated video summarization. (The summary page shows a summary video of the user's entire visit, a list of highlighted scenes during the user's visit, annotations for each scene (time, description, duration), and a video example of a conversation scene composed from the overhead camera, the partner's camera, and the user's own camera.)

Figure 8: Corpus viewer for facilitating an analysis of interaction patterns. (The viewer supports (1) selecting a period of time to extract video, (2) confirmation and adjustment of the selected target, and (3) viewing the extracted video.)
We have just started to work together with social scientists to identify patterns of social interactions in the exhibition room using our interaction corpus augmented by the Corpus Viewer. The social scientists actually used our system to roughly locate points of interest within a large amount of data by browsing clusters of IR tracking data.
CONCLUSIONS
This paper proposed a method to build an interaction
corpus using multiple sensors either worn or placed ubiquitously in the environment. We built a method to segment and interpret interactions from the large amount of collected data
in a bottom-up manner by using IR tracking data. At
the two-day demonstration of our system, we were able
to provide users with a video summary at the end of
their experience on the fly. We also developed a prototype system to help social scientists analyze our interaction corpus to learn social protocols from the interaction
patterns.
ACKNOWLEDGEMENTS
We thank our colleagues at ATR for their valuable discussion and help on the experiments described in this
paper. Valuable contributions to the systems described
in this paper were made by Tetsushi Yamamoto, Shoichiro
Iwasawa, and Atsushi Nakahara. We also would like to
thank Norihiro Hagita, Yasuhiro Katagiri, and Kiyoshi
Kogure for their continuing support of our research.
This research was supported in part by the Telecommunications Advancement Organization of Japan.
REFERENCES
1. Mark Weiser. The computer for the 21st century. Scientific American, 265(3):94–104, 1991.
2. Rainer Stiefelhagen, Jie Yang, and Alex Waibel. Modeling focus of attention for meeting indexing. In ACM
Multimedia ’99, pages 3–10. ACM, 1999.
3. Takayuki Kanda, Hiroshi Ishiguro, Michita Imai, Tetsuo Ono, and Kenji Mase.
A constructive approach for developing interactive humanoid robots. In
2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002), pages 1265–
1270, 2002.
4. Alex Pentland. Smart rooms. Scientific American, 274(4):68–76, 1996.
5. Rodney A. Brooks, Michael Coen, Darren Dang,
Jeremy De Bonet, Josha Kramer, Tomás Lozano-Pérez,
John Mellor, Polly Pook, Chris Stauffer, Lynn Stein,
Mark Torrance, and Michael Wessler. The intelligent
room project. In Proceedings of the Second International Cognitive Technology Conference (CT’97), pages
271–278. IEEE, 1997.
6. Cory D. Kidd, Robert Orr, Gregory D. Abowd, Christopher G. Atkeson, Irfan A. Essa, Blair MacIntyre, Elizabeth Mynatt, Thad E. Starner, and Wendy Newstetter. The aware home: A living laboratory for ubiqui-
tous computing research. In Proceedings of CoBuild’99
(Springer LNCS1670), pages 190–197, 1999.
7. Aaron F. Bobick, Stephen S. Intille, James W. Davis,
Freedom Baird, Claudio S. Pinhanez, Lee W. Campbell,
Yuri A. Ivanov, Arjan Schütte, and Andrew Wilson.
The KidsRoom: A perceptually-based interactive and
immersive story environment. Presence, 8(4):369–393,
1999.
8. Barry Brumitt, Brian Meyers, John Krumm, Amanda
Kern, and Steven Shafer. EasyLiving: Technologies for
intelligent environments. In Proceedings of HUC 2000
(Springer LNCS1927), pages 12–29, 2000.
9. Steve Mann. Humanistic intelligence: WearComp as a
new framework for intelligent signal processing. Proceedings of the IEEE, 86(11):2123–2125, 1998.
10. Tatsuyuki Kawamura, Yasuyuki Kono, and Masatsugu
Kidode. Wearable interfaces for a video diary: Towards
memory retrieval, exchange, and transportation. In The
6th International Symposium on Wearable Computers
(ISWC2002), pages 31–38. IEEE, 2002.
11. Patrick Chiu, Ashutosh Kapuskar, Sarah Reitmeier,
and Lynn Wilcox. Meeting capture in a media enriched conference room. In Proceedings of CoBuild’99
(Springer LNCS1670), pages 79–88, 1999.
Context Annotation for a Live Life Recording
Nicky Kern, Bernt Schiele
Perceptual Computing and
Computer Vision
ETH Zurich, Switzerland
{kern,schiele}@inf.ethz.ch
Holger Junker,
Paul Lukowicz,
Gerhard Tröster
Wearable Computing Lab
ETH Zurich, Switzerland
{junker,lukowicz,troester}@ife.ee.ethz.ch
Albrecht Schmidt
Media Informatics Group
Universität München
[email protected]
ABSTRACT
We propose to use wearable sensors and computer systems to generate personal contextual annotations in
audio-visual recordings of a person’s life. In this paper we argue that such annotations are essential and
effective to allow retrieval of relevant information from
large audio-visual databases. The paper summarizes
work on automatically annotating meeting recordings,
extracting context from body-worn acceleration sensors
alone, and combining context from three different sensors (acceleration, audio, location) for estimating the
interruptability of the user. These first experimental results indicate that it is possible to automatically find useful annotations for a lifetime's recording, and the paper discusses what can be achieved with particular sensors and sensor configurations.
INTRODUCTION
Interestingly, about 500 terabytes of storage are sufficient to record all the audio-visual information a person perceives during an entire lifespan1. This amount of
storage will be available even for an average person in
the not so distant future. A wearable recording and
computing device therefore might be used to ’remember’ any talk, any discussion, or any environment the
person saw.
For annotating an entire lifetime it is important that the recording device with the attached sensors can be worn by the user in any situation. Although it is possible to augment certain environments, this will not be sufficient. Furthermore, wearable computers allow a truly personal audio-visual record of a person's surroundings in any environment. Using a hat- or glasses-mounted camera and microphones attached to the chest or shoulders of the person enables recording from a first-person perspective.

1 Assuming a lifespan of 100 years, 24 h of recording per day, and 10 MB per minute of recording results in approximately 500 TB.
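Spelled out as a quick back-of-the-envelope check in Python, the footnote's estimate is:

minutes = 100 * 365 * 24 * 60      # minutes in a 100-year lifespan (~52.6 million)
total_mb = minutes * 10            # at 10 MB per minute of recording
print(total_mb / 1e6)              # ~525 TB, i.e., on the order of 500 terabytes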
Today however, the usefulness of such data is limited by
the lack of adequate methods for accessing and indexing large audio-visual databases. While humans tend
to remember events by associating them with personal
experience and contextual information, today’s archiving systems are based solely on date, time, location and
simple content classification. As a consequence even in
a recording of a simple event sequence such as a short
meeting, it is very difficult for the user to efficiently retrieve relevant events. Thus for example the user might
remember a particular part of the discussion as being a
heated exchange conducted during a short, unscheduled
coffee break. However he is unlikely to remember the
exact time of this discussion, which is typically required today for retrieval from audio-visual recordings.
In this paper we propose to use wearable sensors in
order to enhance the recorded data with contextual,
personal information to facilitate user friendly retrieval.
Sensors, such as accelerometers and biometric sensors,
can enhance the recording with information on the user’s
context, activity and physical state. That sensor information can be used to annotate and structure the data
stream for later associative access.
This paper summarizes three papers [1, 2, 3] in which
we have worked towards extracting such context and
specifically using it for retrieving information. The second section of this paper summarizes [1], in which context annotations from audio and acceleration are used
to annotate meeting recordings. The third section introduces [2], in which context from audio, acceleration
and location is used to mediate notifications to the user.
The fifth section examines in detail how much information can be extracted from acceleration sensors alone. A
discussion of these three in the context of life recording
concludes the paper.
RELATED WORK
Recently, the idea of recording an entire lifetime of information has received great attention. The UK Computing Research Committee formulated as part of the
Grand Challenges Initiative a number of issues arising
from recording a lifetime [4]. Microsoft’s MyLifeBits [5]
project tries to collect and store any digital information about a person, but leaves the annotation to the
user. Finally, DARPA’s LifeLog initiative [6] invites
researchers to investigate the issues of data collection,
automatic annotation, and retrieval.
The idea of computer-based support for human memory and retrieval is not new. Lamming and Flynn for
example point out the importance of context as a retrieval key [7] but only used cues like location, phone
calls, and interaction between different PDAs. The conference assistant [8] supports the organization of a conference visit, annotation of talks and discussions, and
retrieval of information after the visit. Again, the cooperation and communication between different wearables
and the environment is an essential part of the system.
Rhodes proposed the text-based remembrance agent [9]
to help people to retrieve notes they previously made
on their computer.
In speech recognition, the automatic transcription of meetings is an extremely challenging task due
to overlapping and spontaneous speech, large vocabularies, and difficult background noise [10, 11]. Often,
multiple microphones are used such as close-talking, table microphones, and microphone arrays. The SpeechCorder project [12] for example aims to retrieve information from roughly transcribed speech recorded during a meeting. Summarization is another topic, which is
currently under investigation in speech recognition [13]
as well as video processing. We strongly believe, however, that summarization is not enough to allow effective and in particular associative access to the recorded
data.
Richter and Le [14] propose a device which will use
predefined commands to record conversations and take
low-resolution photos. At the University of Tokyo [15], researchers investigate the possibility of recording subjective experience by recording audio and video, as well as
heartbeat or skin conductance so as to recall one’s experience from various aspects. StartleCam [16] is a wearable device which tries to mimic the wearer’s selective
memory. The WearCam idea of Mann [17] is also related to the idea of constantly recording one’s visual
environment.
WEARABLE SENSING TO ANNOTATE MEETING RECORDINGS
In order to give first experimental evidence that context
annotations are useful, we recorded meetings and annotated them using audio and acceleration sensors [1]. In
particular, we extracted information such as walking,
standing, and sitting from the acceleration sensors, and
speaker changes from the audio. Thus we facilitate the
associative retrieval of the information in the meetings.
Looking at the meeting scenario we have identified four classes of relevant annotations: different meeting phases, the flow of discussion, user activity and reactions, and interactions between the participants. The meeting phases include the times of presentations and breaks, and when somebody comes or leaves during the meeting. The flow-of-discussion annotations attach speaker identities and speaker changes to the audio stream and indicate the level of intensity of the discussion. They can also help to differentiate single-person presentations, interactive questions and answers, and heated debate. User activity and reactions indicate the user's level of interest, focus of attention, and agreement or disagreement with particular issues and comments. By tracking the interaction of the user with other participants, personal discussions can be differentiated from general discussions.

Acceleration Context.
We use two 3D-accelerometers to detect the user's activity. The sensors are attached above the right knee and on the right wrist of the user. We classify the user's activity into sitting, walking, standing, and shaking hands. The first three tell a lot about the user's activity, while the last one allows us to find interactions with others. First experiments with our HMM-based classifier yielded some 88-98% recognition score.

Audio Context.
To detect different speakers and find speaker changes, we have implemented an HMM-based speaker segmentation algorithm based on [18]. A model is trained for every speaker using labelled training data; the final segmentation is done by combining all models. First results yielded recognition rates of some 85-95%. We proposed a scheme to facilitate retrieval using these segmentation results by trading error rate against the time accuracy of the segmentation.
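As an illustration of this kind of per-speaker model combination (not the authors' implementation), the following sketch trains one Gaussian HMM per speaker on labelled feature frames (e.g., MFCCs, assumed to be precomputed) and labels each window of a recording with the best-scoring speaker. The hmmlearn dependency and the window size are assumptions.

from hmmlearn.hmm import GaussianHMM   # assumed third-party dependency

def train_speaker_models(labelled_features, n_states=3):
    # labelled_features: dict mapping speaker -> array of shape (n_frames, n_dims)
    return {speaker: GaussianHMM(n_components=n_states, covariance_type="diag").fit(feats)
            for speaker, feats in labelled_features.items()}

def segment(features, models, window=100):
    # assign each window of frames to the speaker whose model scores it best
    labels = []
    for start in range(0, len(features) - window + 1, window):
        chunk = features[start:start + window]
        labels.append(max(models, key=lambda s: models[s].score(chunk)))
    return labels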
By clipping the recording microphone to the collar of the user, we can tell the user apart from the rest of the world by thresholding the energy of the audio signal. This allows us to further increase the recognition rate of the speaker segmentation.
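A minimal sketch of this energy-based "Me vs. the World" decision, assuming mono audio samples normalized to [-1, 1]; the frame length and threshold values are assumptions for illustration.

import numpy as np

def me_vs_world(samples, frame_len=1024, threshold=0.02):
    # returns one boolean per frame: True means the wearer is speaking
    n_frames = len(samples) // frame_len
    frames = np.asarray(samples[:n_frames * frame_len]).reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))    # per-frame energy
    return rms > threshold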
To show how to use our proposed annotations, we have recorded a set of three short meetings (3-4 min) and evaluated both the speaker identification and the activity recognition on them. The recognition results for the respective sensors are similar to those obtained in our previous experiments.

Figure 1: Retrieval application for a meeting, with the presentation part (standing, one speaker) and the discussion part (sitting, two speakers) highlighted; graphs from top to bottom: audio signal, speaker recognition for the first speaker, speaker recognition for the second speaker, Me vs. the World, activity recognition.

Figure 1 shows a screen shot of our retrieval application. The audio stream is displayed on top, followed by the results of the speaker identification algorithm (with the ground truth in blue). The "Me vs. the World" row shows the result of the energy thresholding algorithm. Finally, the bottom-most block shows the activity of the user. We can clearly tell the presentation phase of the meeting from the discussion phase by looking at both the number of speaker changes and the fact that the presenter is standing during the presentation.
CONTEXT–AWARE NOTIFICATION FOR WEARABLE COMPUTING
For the automatic mediation of notifications, we have
investigated the inference of complex context information (namely the personal and social interruptability of
the user) from simpler contexts, such as user activity,
social situation, and location [2].
The cost of a notification mainly depends on the interruptability of the user. However, we have to distinguish between the interruptability of the user and that of his environment. We refer to the Personal Interruptability as the interruptability of the user. With the term Social Interruptability we indicate the interruptability of the environment of the user. These two interruptabilities are depicted in a two-dimensional space (see Figure 2).

Figure 2: Personal and Social Interruptability of the User. (Example situations such as a boring talk, a bar, sitting in a tram, skiing, having a coffee, a restaurant, walking in the street, riding a bike, a lecture, driving a car, and a waiting room are placed in a two-dimensional space whose axes, personal and social interruptability, each range from "don't disturb" to "interruption no problem".)
Considering the ‘lecture’ situation in Figure 2, the user
is little interruptible, because he follows the lecture, and
his environment is equally little interruptible. However,
if the lecture was boring, the user would probably appreciate an interruption, while the environment should
still not be interrupted. As shown in Figure 3, this
space can also be used to select notification modalities,
by discretizing it and assigning a notification modality
to every bin.
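A sketch of this bin-based selection is given below; the 3x3 grid follows the figure, but the particular modality placed in each bin is an illustrative assumption rather than the paper's exact assignment.

# rows: personal interruptability from freely interruptible (0) to don't disturb (2)
# cols: social interruptability from freely interruptible (0) to don't disturb (2)
MODALITIES = [
    ["speech + HMD", "beep + HMD",         "HMD + vibration"],
    ["beep",         "vibration + watch",  "vibration"],
    ["ring",         "vibration (subtle)", "don't notify"],
]

def select_modality(personal, social, scale=3.0):
    # personal/social in [0, scale]: 0 = interruption no problem, scale = don't disturb
    row = min(int(personal / scale * 3), 2)
    col = min(int(social / scale * 3), 2)
    return MODALITIES[row][col]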
We use three different sensors, namely acceleration, audio, and location, from which we extract low-level context information. We use a single dual-axis accelerometer mounted above the user’s knee, and classify its data
into walking, sitting, standing, and stairs. The audio
data is classified into street, restaurant, conversation,
lecture, and other. Finally, we use the closest wireless
LAN access point as location information. We have
grouped the available access points into Office, Lab, Lecture Hall, Cafeteria, and Outside.
Figure 3: Selecting Notification Modalities using the User's Social and Personal Interruptability. (The space spanned by intensity for the user and intensity for the environment is discretized into bins ranging from "don't notify" through modalities such as Ring, Vibration + Watch, Beep, Beep + HMD, Speech + HMD, and HMD + Vibration, up to "grab entire attention".)

Figure 4: Tendencies for Combining Low-Level Contexts into the Interruptability of the User. (One tendency plot per low-level context: Acc: Sitting, Standing, Walking, Stairs; Audio: Conversation, Restaurant, Street, Lecture, Other; Location: Office, Lab, Lecture Hall, Cafeteria, Outdoor.)
We found that modelling situations such as those in Figure 2 is inappropriate for estimating the user's interruptability from sensor data. Situations are too general, and thus their corresponding interruptabilities cover too large an area of the space. Increasing the level of detail of the situations would help, but would make the number of situations unmanageable.
Instead, we infer the interruptability directly from low-level sensors. For each context, we define a tendency indicating where in the interruptability space the interruptability is likely to lie. See Figure 4 for the tendencies we used in our experiments. The final interruptability is then found by weighting the tendencies with the respective sensor recognition scores, summing all tendencies together, and finding the maximum within the interruptability space.
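The combination step can be sketched as follows; representing each tendency as a Gaussian bump over a discretized interruptability space is an assumption for illustration, while the weighting by recognition score, the summation, and the arg-max follow the description above.

import numpy as np

GRID = 30                                   # resolution of the [0, 3] x [0, 3] space
AXIS = np.linspace(0.0, 3.0, GRID)

def tendency(center, sigma=0.7):
    # a smooth bump centred on the (personal, social) point favoured by one context
    p, s = np.meshgrid(AXIS, AXIS, indexing="ij")
    cp, cs = center
    return np.exp(-((p - cp) ** 2 + (s - cs) ** 2) / (2 * sigma ** 2))

def estimate_interruptability(recognized):
    # recognized: list of (tendency_center, recognition_score) pairs,
    # one per currently recognized low-level context
    acc = np.zeros((GRID, GRID))
    for center, score in recognized:
        acc += score * tendency(center)
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return AXIS[i], AXIS[j]                 # (personal, social) estimate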
We have experimentally shown the feasibility of the approach on a 37-minute stretch of data covering all three modalities, using the tendencies depicted in Figure 4. Since we wanted to use the interruptability for notification modality selection, we consider the error sufficiently small if the estimated interruptability falls within the same 'bin' of the 3x3 grid. Using this error measure, we could estimate the Personal Interruptability sufficiently well 96.3% of the time, and the Social Interruptability 88.5% of the time. The Social Interruptability mainly depends on the audio classification, which, in itself, had a lower recognition score than the acceleration classification.
MULTI–SENSOR ACTIVITY CONTEXT DETECTION FOR
WEARABLE COMPUTING
We have started to investigate how much information can be extracted from acceleration sensors alone [3]. In particular, we investigated the number of sensors required for detecting a certain context and their best placement. To this end we have developed a hardware platform that allows acceleration readings to be taken from 12 positions on the user's body.
We investigated both simple activities such as sitting,
walking, standing, or walking stairs up and down, and
more complex ones, such as shaking hands, typing on a
keyboard, and writing on a white board. The sensors
were attached to all major body joints, namely both
shoulders, elbows, wrists, both sides of the hip, both
knees and ankles. We recorded some 19 minutes of data for the above activities. The data was classified using a Naïve Bayes classifier with 5-fold cross-validation.
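A sketch of this classification setup is shown below; the windowed mean/variance features are an assumption, while the Naive Bayes classifier and the 5-fold cross-validation follow the text (here via scikit-learn).

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def window_features(acc, window=50):
    # acc: array of shape (n_samples, n_axes); returns (n_windows, 2 * n_axes)
    n = len(acc) // window
    wins = acc[:n * window].reshape(n, window, acc.shape[1])
    return np.hstack([wins.mean(axis=1), wins.var(axis=1)])

def evaluate(X, y):
    # X: feature matrix, y: one activity label per window (assumed given)
    return cross_val_score(GaussianNB(), X, y, cv=5).mean()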
Figure 5 shows some recognition results for different sub-sets of sensors. The right-most sets of bars show that the recognition rate decreases as the number of sensors used is reduced. The recognition rates for the 'leg-only' activities are very similar for all sets of sensors, with the exception of the upper-body sensor set, which seems obvious. More detailed experiments [3] show that reducing the number of sensors either works well for simpler contexts (such as walking, sitting, standing) or reduces the recognition score. Depending on the complexity of the activity, the drop in recognition score can be slight, e.g. for activities of medium complexity such as upstairs or downstairs, or significant for complex or subtle activities, such as shaking hands or typing on a keyboard.

Figure 5: Recognizing user activity from body-worn acceleration sensors. (Recognition rates for different sub-sets of sensors (all sensors, right, left, upper body, lower body), for the leg-only activities (sitting, standing, walking, upstairs, downstairs, and their average) and for the other activities (shake hands, write on board, keyboard typing, and the average over all activities).)
DISCUSSION AND OUTLOOK
Recording part of or even an entire lifetime is becoming
feasible in the near future. Retrieval within and structuring of such large data collections is a critical challenge for this vision to come true. We propose to use
wearable sensor and computer systems to annotate the
recorded data automatically with personal information,
and allow for associative retrieval. We present three applications in this context. Firstly, a Meeting Recorder
that automatically annotates recordings using context
from body-worn acceleration and audio sensors. Sec-
ondly, we have used low-level context information from
acceleration, audio, and location, to estimate the user’s
social and personal interruptability — a high-level context information that can both be used for retrieval
and to drive a context-aware application. Thirdly, we
have investigated how much information can be gathered from acceleration sensors alone, specifically how
many sensors are required and where they could be
placed for the recognition of a certain context.
With the technology presented, we can capture personal
information of the user. This personal information can
be extended in two ways, either using the user’s interactions with other users or using his digital footprint
in the environment. The user’s interaction with others could be detected using the physical presence of the
other user’s personal device, and could for example be
used to find a specific discussion with the other user.
The user’s digital footprint includes not only e-mail,
but also his interaction with other electronic devices
such as printers, beamers, etc.
Projects such as Microsoft MyLifeBits [5] currently concentrate on collecting all digitally accessible information
from the user’s environment such as telephone calls, letters, e-mails, etc., and making it accessible by explicit
user annotation. This is complemented by other initiatives, such as DARPA's LifeLog, which rather focus on sensor-augmented wearable technologies and try to automatically find structure (events and episodes) in the data to facilitate subsequent retrieval. While the
work presented fits very well in the latter direction, it
is but a first step towards recording a lifetime.
REFERENCES
1. N. Kern, B. Schiele, H. Junker, P. Lukowicz, and
G. Tröster. Wearable sensing to annotate meetings
recordings. In Proc. ISWC, pages 186–193, 2002.
2. N. Kern and B. Schiele. Context–aware notification
for wearable computing. In Proc. ISWC, pages
223–230, White Plains, NY, USA, October 2003.
3. N. Kern, B. Schiele, and A. Schmidt. Multi–sensor
activity context detection for wearable computing.
In Proc. EUSAI, LNCS, volume 2875, pages
220–232, Eindhoven, The Netherlands, November
2003.
4. Memories for life, CRC Grand Challenges
Initiative. http://www.csd.abdn.ac.uk/~ereiter/memories.html.
5. Microsoft MyLifeBits project.
http://research.microsoft.com/barc/mediapresence/MyLifeBits.aspx.
6. DARPA LifeLog initiative.
http://www.darpa.mil/ipto/programs/lifelog/.
7. M. Lamming and M. Flynn. Forget-me-not:
intimate computing in support of human memory.
In FRIENDS21, pages 125–128, 1994.
8. A.K. Dey, D. Salber, G.D. Abowd, and
M. Futakawa. The conference assistant:
Combining context-awareness with wearable
computing. In ISWC, pages 21–28, 1999.
9. B. Rhodes. The wearable remembrance agent: A
system for augmented memory. In ISWC, pages
123–128, 1997.
10. ICSI Berkeley, The Meeting Recorder Project at
ICSI. http://www.icsi.berkeley.edu/Speech/mr/.
11. NIST Automatic Meeting Transcription Project.
http://www.itl.nist.gov/iad/894.01/.
12. A. Janin and N. Morgan. Speechcorder, the
portable meeting recorder. In Workshop on
Hands-Free Speech Communication, 2001.
13. A. Waibel, M. Bett, and M. Finke. Meeting
browser: Tracking and summarizing meetings. In
Proceedings of the DARPA Broadcast News
Workshop, 1998.
14. T. Kontzer. Recording your life.
http://www.informationweek.com, Dec, 18 2001.
15. R. Ueoka, M. Hirose, K. Hirota, A. Hiyama, and
A. Yamamura. Study of experience recording and
recalling for wearable computer. Correspondences
on Human Interface, 3(1):13–16, 2001.02.
16. J. Healey and R. Picard. Startlecam: A cybernetic
wearable camera. In ISWC, pages 42–49, 1998.
17. S. Mann. Smart clothing: The wearable computer
and wearcam. Personal Technologies, 1(1), 1997.
18. D. Kimber and L. Wilcox. Acoustic segmentation
for audio browsers. In Proc. Interface Conference,
1996.
Capture and Efficient Retrieval of Life Log
Kiyoharu Aizawa, Tetsuro Hori, Shinya Kawasaki, Takayuki Ishikawa
Department of Frontier Informatics,
The University of Tokyo
+81-3-5841-6651
{aizawa,t_hori, kawasaki,ishikawa}@hal.t.u-tokyo.ac.jp
ABSTRACT
In ``Wearable computing'' environments, digitization of
personal experiences will be made possible by continuous
recording using a wearable video camera. This could lead to an ``automatic life-log application''. It is evident that the resulting amount of video content will be enormous. Accordingly, to retrieve and browse desired scenes, a vast quantity of video data must be organized using structural information. In this paper, we develop a ``context-based video retrieval system for life-log applications''. This system can capture not only video and audio but also various sensor data, and it provides functions that make efficient video browsing and retrieval possible by using data from these sensors, some databases, and various document data.
Keywords
life log, retrieval, context, wearable
INTRODUCTION
The custom of writing a diary is common all over the
world. This fact shows that many people like to log their
everyday lives. However, to write a complete diary, a
person must recollect and note what was experienced
without missing anything. For an ordinary person, this is
impossible. It would be nice to have a secretary who
observed your everyday life and wrote your diary for you.
In the future, a wearable computer may become such a
secretary-agent. In this paper, we aim at the development
of a ``life-log agent'' (that operates on a wearable
computer). The life-log agent logs our everyday life on
storage devices instead of paper, using multimedia such as
a small camera instead of a pencil.
There has been work on logging a person's life in the areas of mobile computing, wearable computing, video retrieval, and databases [1,2,3,8,9,10,11]. A person's experiences or
activities have been captured from many different points of
view. In one of the earliest works [7], various personal
activities were recorded such as personal location and
encounters with others, file exchange, workstation
activities, etc. Diary recording using additional sensors has been attempted in the wearable computing area. For example, in [2], a person's skin conductivity was captured to provide video retrieval keys. In [11], not only wearable sensors but also RFIDs for object identification were utilized. Meetings have also been recorded using sensors for speaker identification [9]. In the database area, the MyLifeBits project attempts to exhaustively record a person's activities such as document processing, web browsing, etc.
We focus on continuously capturing our experiences with wearable sensors, including a camera. In our previous work [4,5], we used a person's brain waves and motion to retrieve videos. In this paper, we describe our latest work, which can retrieve scenes using a richer set of contexts.
PROBLEMS IN BROWSING LIFE-LOG VIDEO
A life-log video can be captured using a small wearable camera whose field of view is equivalent to the user's field of view. Video is the most important content of life-log data: by continuously capturing life-log video, personal experiences of everyday life can be recorded in video, the most popular medium. Instead of writing a diary, a person can simply order the life-log agent to start capturing a life-log video at the beginning of every day.
With a conventional written diary, a person can look back on a year at its end simply by reading the diary, and can quickly review the events of that year. Watching life-log videos, however, poses a critical problem: it would take another year to watch the entire life-log video of one year. Digesting or editing the life-log video is therefore necessary, but editing takes even more time. It is thus most important to be able to process a vast quantity of video data automatically.
Conventional Video Retrieval Systems
A variety of video retrieval systems already exists. Conventional systems take a content-based approach: they digest or edit videos by processing features extracted from the image or audio signals, for example color histograms computed from the image signal. However, even with such information, computers do not understand the contents of the videos, and they can seldom help their users retrieve and browse the desired scenes in life-log videos. In addition, such image signal processing is computationally very expensive.
Our Proposed Solution to this Problem
Life-log videos are captured by the user himself. Therefore, as the life-log video is captured, various data other than video and audio, such as GPS position and motion, can be recorded simultaneously. With this information, computers can use contexts as well as contents; in this respect our approach is very different from conventional video retrieval technologies.
CAPTURING SYSTEM
The life-log agent is a system that can capture data from a wearable camera, a microphone, and various sensors that reflect the user's context. The sensors we use are a brain-wave analyzer, a GPS receiver, an acceleration sensor and a gyro sensor. All of these sensors are attached to a notebook PC through serial ports, USB and PCMCIA slots (Figures 1 and 2).
Next, using a modem, the agent can connect to the Internet almost anywhere via the PHS (Personal Handyphone System, a versatile cordless/mobile system developed in Japan) network of NTT DoCoMo. By referring to data on the Internet, the agent records the present weather at the user's location, various news of the day offered by news sites or e-mail magazines, all web pages (*.html) that the user browses, and all emails that the user transmits and receives.
Figure 2. Capturing System
Finally, the agent monitors and controls the following applications: Microsoft Word, Microsoft Excel, Microsoft PowerPoint and Adobe Acrobat. In addition to web browsing and the transmission and reception of emails, these applications are the main software people use while working on a computer. By monitoring and controlling them, whenever the user opens a document file (*.doc, *.xls, *.ppt, *.pdf) of one of these applications, the agent can order that application to copy the file and save it as text data.
The user can use his cellular phone as a controller for start/stop life-log operations. The agent recognizes the user's operations on the cellular phone via PHS.
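The capturing side can be pictured as a simple polling-and-timestamping loop. The following minimal Python sketch is illustrative only and is not the authors' implementation; read_gps, read_motion and read_brainwave are hypothetical stand-ins for the real device drivers.

```python
import json
import time
from datetime import datetime

# Hypothetical stand-ins for the real device drivers (GPS receiver,
# acceleration/gyro sensors, brain-wave analyzer); each returns the
# latest reading as a plain dictionary.
def read_gps():
    return {"lat": 35.7128, "lon": 139.7621}

def read_motion():
    return {"accel": [0.0, 0.0, 9.8], "gyro": [0.1, 0.0, 0.0]}

def read_brainwave():
    return {"alpha_power": 0.42}

def log_sensors(path="lifelog_sensors.jsonl", interval_s=1.0, duration_s=5.0):
    """Append one timestamped JSON record per polling interval."""
    end = time.time() + duration_s
    with open(path, "a") as f:
        while time.time() < end:
            record = {
                "timestamp": datetime.now().isoformat(),
                "gps": read_gps(),
                "motion": read_motion(),
                "brainwave": read_brainwave(),
            }
            f.write(json.dumps(record) + "\n")
            time.sleep(interval_s)

if __name__ == "__main__":
    log_sensors()
```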
RETRIEVAL OF LIFE-LOG VIDEO
We human beings store many experiences as a vast quantity of memories over many years of life, arranging and selecting them as we go, and we can quickly retrieve and use the necessary information from our memory. Psychological research suggests that we manage our memories based on the contexts in which they were formed. When we want to remember something, we can often use such contexts as keys and recall the memories by association with these keys.
Figure 1. Diagram of Capturing System
For example, to recollect the scenes of a conversation, the typical keys used in the memory recollection process are context information such as ``what, where, with whom, when, how''.
A user may pose the following query (Query A): “On a cloudy day in mid-May when the Lower House general election was held, after making my presentation about life-log, I was called to Shinjuku by an email from Kenji, and I talked with him while walking in a department store in Shinjuku. The conversation was very interesting! I want to see the scene to remember the contents of the conversation.” In conventional video retrieval, the low-level features of the image and audio signals are used as retrieval keys. They are unlikely to be suitable for queries like Query A, which resemble the way we query our own memories. However, data from the brain-wave analyzer, the GPS receiver, the acceleration sensor and the gyro sensor correlate highly with the user's contexts. The life-log agent estimates its user's contexts from these sensor data and some databases, and uses them as keys for video retrieval. Thus, the agent retrieves life-log videos by imitating the way a person recollects experiences from his memories. It is conceivable that by using such context information, the agent can produce more accurate retrieval results than by using only audiovisual data. Moreover, each input from these sensors is a one-dimensional signal, and the computational cost of processing them is low.
Keys Obtained from Motion Data
The life-log agent feeds the data from the acceleration sensor and the gyro sensor into the k-means method and an HMM, and estimates the user's motion state. The details are given in our previous paper [5]. In Query A, the conversation was held while the user was walking.
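As a rough illustration of this kind of motion-state estimation, the sketch below quantizes windowed acceleration features with k-means; it is not the authors' implementation, and for brevity it omits the HMM stage that the paper uses for temporal modeling.

```python
import numpy as np
from sklearn.cluster import KMeans

def window_features(accel_mag, win=50):
    """Mean and standard deviation of acceleration magnitude per window."""
    n = len(accel_mag) // win
    w = accel_mag[: n * win].reshape(n, win)
    return np.column_stack([w.mean(axis=1), w.std(axis=1)])

def quantize_motion(accel_mag, n_states=3, win=50):
    """Cluster windowed features into candidate motion states
    (e.g. resting / walking / running); the resulting label sequence
    would then be smoothed temporally, e.g. with an HMM."""
    feats = window_features(np.asarray(accel_mag, dtype=float), win)
    km = KMeans(n_clusters=n_states, n_init=10, random_state=0)
    return km.fit_predict(feats)

# Toy usage: low-variance samples followed by high-variance samples.
rng = np.random.default_rng(0)
signal = np.concatenate([9.8 + 0.05 * rng.standard_normal(500),
                         9.8 + 2.0 * rng.standard_normal(500)])
print(quantize_motion(signal))
```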
Keys Obtained from Face Detection
The life-log agent detects a person's face in life-log videos by processing the color histogram of the video image. Our method uses only very simple processing of the color histogram. Accordingly, even if there is no person in the image, the agent makes a false detection when skin color is predominant. However, the agent shows its user the frame images and the times of the scenes in which a face was detected; if a detection is wrong, the user can simply ignore it or delete it, and if the image is detected correctly, the user can look at it and judge who it is. Therefore, face identification is unnecessary and simple detection is sufficient here. In Query A, the conversation was held when the user was with Kenji.
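A minimal sketch of this style of skin-color-based detection is shown below, assuming OpenCV; the HSV thresholds and the skin-ratio cut-off are illustrative assumptions, not the values used by the authors.

```python
import cv2
import numpy as np

# Rough HSV skin-color bounds; these thresholds are an assumption for
# illustration, not the authors' settings.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def frame_has_face_candidate(frame_bgr, min_skin_ratio=0.05):
    """Flag a frame when a large enough fraction of pixels is skin-colored.
    False positives are acceptable: the user simply ignores them."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    ratio = np.count_nonzero(mask) / mask.size
    return ratio >= min_skin_ratio
```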
Keys Obtained from Brain-Wave Data
The 8-12 Hz sub-band of the brain waves is called the α wave, and it clearly reflects a person's arousal status. When the α wave is low (α-blocking), the person is aroused, in other words, interested in or paying attention to something. In [4] we demonstrated that a person's brain waves can be used to effectively retrieve the scenes that interested him. In Query A, the conversation was very interesting.
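One simple way to derive such a key is to compute the 8-12 Hz band power per window and flag low-alpha windows, as in the following sketch; the sampling rate, window length and quantile threshold are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(eeg, fs=128.0):
    """Average spectral power in the 8-12 Hz (alpha) band."""
    f, pxx = welch(eeg, fs=fs, nperseg=min(len(eeg), 256))
    band = (f >= 8.0) & (f <= 12.0)
    return float(np.mean(pxx[band]))

def interesting_windows(eeg, fs=128.0, win_s=5.0, quantile=0.25):
    """Windows whose alpha power falls in the lowest quantile are
    treated as candidate 'interesting' (alpha-blocking) moments."""
    win = int(win_s * fs)
    powers = [alpha_power(eeg[i:i + win], fs)
              for i in range(0, len(eeg) - win + 1, win)]
    threshold = np.quantile(powers, quantile)
    return [i for i, p in enumerate(powers) if p <= threshold]
```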
Figure 3. Interface for playing the video
Figure 4. Interface for managing videos
Figure 5. A result of face detection
Keys Obtained from GPS Data
From the GPS signal, the life-log agent acquires the position of its user as longitude and latitude while capturing a life-log video, and the contents of the videos and the location information are automatically associated. Longitude and latitude are numerical data that identify positions on the Earth's surface relative to a datum, so they are not intuitively readable for users. However, using a special database, the agent can convert longitude and latitude into addresses with a hierarchical structure, for example ``7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan''. The results are information familiar to us, and we use them as keys for video retrieval.
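A hedged sketch of such a conversion is shown below; the grid-based ADDRESS_DB lookup is a hypothetical stand-in for the special database mentioned above.

```python
# Hypothetical local reverse-geocoding table: grid cells (coordinates
# rounded to about 100 m) mapped to a hierarchical address.
ADDRESS_DB = {
    (35.713, 139.762): ("Tokyo", "Bunkyo-ku", "Hongo", "7-3-1"),
}

def address_keys(lat, lon):
    """Return hierarchical retrieval keys for one GPS fix, e.g.
    'Tokyo', 'Tokyo/Bunkyo-ku', 'Tokyo/Bunkyo-ku/Hongo', ..."""
    cell = (round(lat, 3), round(lon, 3))
    parts = ADDRESS_DB.get(cell)
    if parts is None:
        return []
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]

print(address_keys(35.7128, 139.7621))
```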
Latitude and longitude also become intuitively understandable when plotted on a map as the user's footprints, and these footprints likewise become keys for video retrieval. ``What did I do when capturing this life-log video?'' A user may be able to recollect it by seeing his footprints. The agent draws the footprint of the video under playback as a thick light-blue line, and other footprints as thin blue lines on the map. By simply dragging his mouse on the map, the user can change the displayed area; he can also jump to another area by clicking any of the addresses at which footprints were recorded. The user can watch the desired scenes by choosing arbitrary points on the footprints.
Figure 6. Interface for retrieval using a map
Moreover, the agent has a town directory database. The database holds information on one million or more public institutions, stores, companies, restaurants, and so on, in Japan. Except for individual dwellings, the database covers almost all places in Japan, including small shops and small companies run by individuals. For each site, the database stores its name, address, telephone number and category in a layered structure.
Using this database, a user can retrieve his life-log videos as follows. He can enter the name of a store or an institution, the category, or both. For example, assume that the user wants to review the scene in which he visited the supermarket called ``Shop A'' and enters the category keyword ``supermarket''. To narrow the retrieval results, the user can also enter the rough location of Shop A, for example ``Shinjuku-ku, Tokyo''.
Figure 7. Retrieval using the town directory
Because the locations of all the supermarkets he visited are listed in the town directory database, the agent searches the directory and finds one or more supermarkets near his footprints, including Shop A. The agent then shows the user the formal names of all the supermarkets he visited and the times of the visits as retrieval results. The user presumably chooses Shop A from these results. Finally, the agent knows the time of the visit to Shop A and displays the desired scene. In Query A, the conversation was held at a shopping center in Shinjuku.
The agent may make mistakes with such a query: even if the user has not actually entered Shop A but only passed in front of it, the agent will list that event among the retrieval results. To cope with this problem, the agent checks whether the GPS signal was received during the event. If the GPS became unreceivable, it is likely that the user went into Shop A; the agent measures the length of the period during which the GPS was unreceivable and treats it as the time spent in Shop A. If the GPS signal never dropped out, the user most likely did not enter Shop A.
Figure 8. Retrieval experiments
We examined the validity of this retrieval technique. First, we went to Ueno Zoological Gardens, the supermarket ``Summit'', and the drug store ``Matsumoto-Kiyoshi''. The technique proved very effective: when we queried with the name keyword ``Summit'', we found, as the result, the scene captured when the user was just about to enter ``Summit''. When we queried with the category keyword ``drug store'', we found the scene captured when the user was just about to enter ``Matsumoto-Kiyoshi'', and similarly for Ueno Zoological Gardens. These retrievals completed quickly; retrieval from three hours of video took less than one second.
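The footprint-against-directory matching and the GPS-dropout heuristic could be sketched roughly as follows; the DIRECTORY entries, the search radius and the data layout are illustrative assumptions rather than the authors' implementation.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical town-directory entries: (name, category, lat, lon).
DIRECTORY = [
    ("Shop A", "supermarket", 35.6900, 139.7005),
    ("Matsumoto-Kiyoshi", "drug store", 35.7120, 139.7740),
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def find_visits(footprints, category=None, name=None, radius_m=50.0):
    """footprints: list of (timestamp, lat, lon, gps_ok). A candidate
    visit is reported when a footprint passes near a matching entry;
    the length of the following GPS dropout (counted in footprints
    without a fix) is used as the estimated stay."""
    results = []
    for i, (t, lat, lon, gps_ok) in enumerate(footprints):
        for poi_name, poi_cat, plat, plon in DIRECTORY:
            if category and poi_cat != category:
                continue
            if name and poi_name != name:
                continue
            if haversine_m(lat, lon, plat, plon) <= radius_m:
                dropout = 0
                for _, _, _, ok in footprints[i + 1:]:
                    if ok:
                        break
                    dropout += 1
                results.append((poi_name, t, dropout))
    return results
```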
Keys Obtained from Time Data
The agent records the time by asking the operating system
for the present time, and associates contents of life-log
videos with the time when they were captured. In Query A,
the conversation was held in mid-May.
Keys Obtained from the Internet
The life-log agent records the weather and news of the day, the web pages that the user browses, and the emails that the user transmits and receives. These data are automatically associated with time data and can afterwards be used as keys for life-log video retrieval. In Query A, the conversation was held after the user received the email from Kenji, on a cloudy day when the Lower House general election was held.
Figure 9. A result of retrieval from a Web document
Keys Obtained from Various Applications
All the document files (*.doc, *.xls, *.ppt, *.pdf) that the user opens are copied and saved as text. These copied document files and text data are automatically associated with time data and can afterwards be used as keys for life-log video retrieval. In Query A, the conversation was held after the user made his presentation about life-log (we assume that PowerPoint was used for the presentation). Conversely, the agent can also perform video-based retrieval of such documents, including web pages and emails.
Figure 10. A result of retrieval from a PowerPoint document
Keys Added by the User
The user can order the life-log agent to add retrieval keys (annotations) with arbitrary names by simple operations on his cellular phone while the agent is capturing a life-log video. This lets the agent mark a scene that the user wants to remember throughout his life, so the user can easily access the videos captured during precious experiences.
Retrieval with a Combination of Keys
Consider Query A again. The user may have met Kenji many times during some period; he may have gone to a shopping center many times during that period; he may have made presentations about life-log many times during that period; and so on. Accordingly, if a user uses only one kind of key among the various kinds when retrieving life-log videos, too many undesired results will appear. By combining as many different keys as possible, only the desired result may be obtained, or at least most of the undesired results can be eliminated.
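One simple way to combine keys is to treat each key as a set of matching time intervals and intersect them, as in the following sketch; the interval values are made up for illustration and this is not the authors' code.

```python
def intersect(a, b):
    """Intersect two lists of (start, end) time intervals (seconds)."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out

def combine_keys(*interval_lists):
    """Keep only the time spans matched by every retrieval key."""
    if not interval_lists:
        return []
    result = interval_lists[0]
    for intervals in interval_lists[1:]:
        result = intersect(result, intervals)
    return result

# Hypothetical matches for three keys from Query A.
walking = [(0, 600), (1800, 2400)]
with_face = [(500, 700), (2000, 2200)]
in_shinjuku = [(400, 2500)]
print(combine_keys(walking, with_face, in_shinjuku))
# -> [(500, 600), (2000, 2200)]
```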
CONCLUSION
By using the data acquired from various sources while capturing videos, and by combining these data with data from several databases, the agent can estimate its user's various contexts with an accuracy and speed that do not seem achievable with conventional methods. This is why the agent can respond correctly and flexibly to video retrieval queries of various forms.
REFERENCES
1. S. Mann, 'WearCam' (The Wearable Camera), In Proc. of ISWC 1998, 124-131.
2. J. Healey, R. W. Picard, A Cybernetic Wearable Camera, In Proc. of ISWC 1998, 42-49.
3. J. Gemmell, G. Bell, R. Lueder, S. Drucker, C. Wong, MyLifeBits: fulfilling the Memex vision, In Proc. of ACM Multimedia 2002, 235-238.
4. K. Aizawa, K. Ishijima, M. Shiina, Summarizing Wearable Video, In Proc. of IEEE ICIP 2001, 398-401.
5. Y. Sawahata, K. Aizawa, Wearable Imaging System for Summarizing Personal Experiences, In Proc. of IEEE ICME 2003.
6. T. Hori, K. Aizawa, Context-based Video Retrieval System for the Life-log Applications, In Proc. of ACM MIR 2003, 31-38.
7. M. Lamming, M. Flynn, Forget-me-not: intimate computing in human memory, In Proc. of FRIEND21, Int. Symp. on Next Generation Human Interface, Feb. 1994.
8. B. J. Rhodes, The wearable remembrance agent: a system for augmented memory, In Proc. of ISWC 1997.
9. N. Kern et al., Wearable sensing to annotate meeting recordings, In Proc. of ISWC 2002.
10. A. Dey et al., The conference assistant: combining context-awareness with wearable computing, In Proc. of ISWC 1999.
11. T. Kawamura, Y. Kono, M. Kidode, Wearable interface for a video diary: towards memory retrieval, exchange and transportation, In Proc. of ISWC 2002.
Figure 11. Interface of the life-log agent for browsing and retrieving life-log videos
Exploring Graspable Cues for Everyday Recollecting
Elise van den Hoven
Industrial Design Department
Eindhoven University of Technology
P.O.Box 513, Den Dolech 2,
5600 MB Eindhoven, The Netherlands
+31 40 247 8360
[email protected]
ABSTRACT
This paper gives a short overview of a four-year PhD project concerning several aspects of a device that helps people recollect personal memories in the context of the home. Several studies were done on related topics, such as autobiographical memory cuing, the use of souvenirs in the home, and the user-system interaction design of a portable digital photo browser.
Keywords
Everyday Recollecting, Ambient Intelligence, Recollection-Supporting Device, Digital Photo Browser, Graspable User Interfaces, Tangible Souvenirs.
INTRODUCTION
Most people actively deal with their personal memories. Take for example a woman who has just returned from a holiday. This person probably talks about her experiences with various people, which in effect is the rehearsal, and perhaps the fixation, of her holiday memories. When she refers to other holidays in the same conversation she is trying to relate her new memories to existing ones, so she is working on her old memories at the same time; and there is a fair chance that her listeners are doing the same thing. Since most people reminisce every day, and the results of this process shape their personal histories and thus their identities, this is an important process, which often goes unnoticed.
Today, with the increasing digitization of memory carriers, such as digital photos, this remembering or reminiscing can be aided in ways previously impossible. In this paper the possibilities of supporting people in dealing with their memories through such digital means are investigated.
Context
The work described in this paper was done as a four-year
PhD-study [1] both at Philips Research Laboratories
Eindhoven and at the Eindhoven University of Technology.
Currently the author is continuing this work at the
Eindhoven University of Technology as an assistant
professor in the Industrial Design department.
The work was concerned with the topic of supporting in-home recollecting. Its content was influenced by both the project context and the industrial context. The project team decided together on the aim of the work, which was to build a demonstrator of a “Recollection-Supporting Device”. The industrial context was that the project was part of the Ambient Intelligence research program at Philips Research.
Paper Outline
The following section of this paper gives an overview of
the abovementioned PhD-thesis, which is followed by
some sections on relevant topics worked out in more detail.
THESIS OVERVIEW
Several studies were performed in order to explore the
wide area of recollecting memories in the home context.
The first study tested with questionnaires how people use
souvenirs in the home. It confirmed that souvenirs can be
seen as external memory and that they are suitable
candidates to be used as tangibles in a graspable user
interface for the Recollection-Supporting Device. The
second study focused on the analysis, design,
implementation and evaluation of a user interface for
browsing and viewing digital photos on a touch screen
device. This user interface consisted of a graphical and a
graspable part, the latter using personal souvenirs as
tangible user interface controls. The research into the use
of tangibles led to an extension of the current Graspable
UI-categorization, which mentioned only so-called
“generic” objects. The souvenirs showed that, compared to generic objects, personal objects have the benefit that users already have a mental model of them and that the objects are embedded in the user’s personal environment. The
cuing. Therefore, an experiment was conducted which
compared the effect of modality (odor, physical object,
photo, sound and video) on the number of memories people
had from a unique one-day event. During this event all
above-mentioned modalities were present and they were
later used to cue the participants. Against expectation, the
no-cue condition (in effect only a text cue) created on
average significantly more memories than any of the cued
conditions. The given explanation for this effect is that
“specific cues” can make people focus on the perceived
information, whereas text leaves space for reflection. Given the inherent qualities of souvenirs as mementos for storing and stimulating memories, the physical-object cue condition was expected to do better than it did in practice. Before concluding that this expectation was not confirmed, it was tested whether the
participants in the cuing study indeed viewed their
personally handmade artefacts as souvenirs. It turned out
that most of them did and therefore it had to be concluded
that souvenirs cued fewer memory details than text-only
cues.
All the information from the above-mentioned studies
served as input for the last part of the thesis, which
summarizes guidelines for designers who want to realize a
future Recollection-Supporting Device. This part comprises
a literature overview, a lessons-learned section and some
future directions.
Although this thesis answers a lot of questions about
several aspects of a Recollection-Supporting Device, still a
lot of work has to be done in order to realize one, because
this multidisciplinary area appeared to be rather
unexplored.
AUTOBIOGRAPHICAL MEMORY THEORY
Recollecting personal experiences concerns Autobiographical Memory (AM), which is defined as “memory for the events of one’s life” [2]. AM, which is part of Long-Term Memory, includes all the memories people have that have something to do with themselves, including traumatic experiences.
According to Cohen [3], six functions of Autobiographical Memory can be distinguished:
1. The construction and maintenance of the self-concept and self-history, which shapes personal identity;
2. Regulating moods;
3. Making friends and maintaining relationships by sharing experiences;
4. Problem-solving based on previous experiences;
5. Shaping likes, dislikes, enthusiasms, beliefs and prejudices, based on remembered experiences;
6. Helping to predict the future based on the memories of the past.
Personal memories are important to people, as can be seen from this range of functions, from purely internal use to communication between people.
Cuing memories is one way of retrieving autobiographical memories. A cue (or trigger) is a stimulus that can help someone retrieve information from Long-Term Memory, but only if the cue is related to the to-be-retrieved memory. The stimuli most often used in studies are photos, smells or text labels, but anything can be a cue (a spoken word, a color, an action or a person), as long as there is a link between the cue and the to-be-remembered event. A combination of cues increases the chance of retrieving a memory, especially when a subject in a cued-recall experiment had to perform activities, such as writing with a pen or closing a door (e.g. [4]). One example of a memory cue is a souvenir.
SOUVENIRS
The word souvenir originates from the Middle French (se) souvenir (de), meaning “to remember”, which in turn comes from the Latin subvenire, meaning “to come up, come to mind”.
From a questionnaire study with 30 participants it was concluded that many people have a collection of souvenirs at home. This collection contained on average over 50 souvenirs in the following three categories: holiday souvenirs, heirlooms and gifts. All three categories made the participants recollect memories when they looked at their most valuable souvenirs, meaning the souvenirs serve as external memory for these people.
Three quarters of the participants brought souvenirs back from their holidays, but most of them did not throw any away during the last year. Eighty percent of the participants thought self-made objects could be souvenirs. When participants were asked to name their most valuable souvenir, only half of these objects were from a holiday.
Neisser (1982) describes a study on external memory aids used by students. They were asked what aids they used to remember future or past events, and one of the results was that students do not know which types of external memory they use unless these are explicitly mentioned, for instance “do you use diaries for remembering”. This result is consistent with the results of the investigation presented here, because the souvenir-questionnaire participants did not mention remembering as a function of their souvenirs. Yet they apparently did use their souvenirs as external memory, because when they were asked what happened when they looked at their most-cherished souvenirs, half of the participants mentioned that memories popped up or were relived.
DIGITAL PHOTO BROWSER
After learning that souvenirs can serve as external memory for their owners, it was decided to build a demonstrator with souvenirs. Together with the project team it was decided to focus on digital photos and to implement a Digital Photo Browser (see Figure 1, [5]). This device and its user interface were designed and implemented based on requirements derived from a scenario and a focus group. Based on these requirements a user-interface concept was designed that reminds people of their photos by continuously scrolling them along the display.
The user interface of the Digital Photo Browser (see Figure
2) consists of three areas: 1 - an area on the left which
shows a moving photo roll, 2 - a central area which allows
enlarging individual photos, both in landscape and portrait
format, 3 - an area on the right where icons of the current
user (3a), of other display devices (3b) or of detected
graspable objects (3c) can be shown. The roll (1), which
shows on average eight thumbnails on-screen, consists of
two layers: the first layer shows an overview of all the
albums owned by the current user and the second layer
shows the contents of each album. This second layer is accessible by clicking on an album icon; one can return to the first layer by clicking the “back” button.
In short, the Digital Photo Browser is a portable, wireless touch-screen device. The user can interact via touch (drag and drop) and via physical objects. These objects are RFID-tagged and are recognized when placed on a special table; the corresponding photos are then immediately shown on the portable device. Via a simple drag and drop, a photo can be enlarged on this device or on any other screen available for viewing photos.
When brought into an intelligent room, the implemented Digital Photo Browser is able to recognize the presence of people, graspable objects, and available output devices. Since souvenirs are suitable for use in a Graspable User Interface and they have the ability to cue recollections, souvenirs are used as shortcuts to sets of digital photos. (A similar interaction was presented in scenarios of the POEMs project, which stands for Physical Objects with Embedded Memories [6].)
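A minimal sketch of this souvenir-to-album shortcut is given below; the tag IDs, album names and the show_album callback are hypothetical, and the real system's RFID and display infrastructure is not modeled.

```python
# Hypothetical mapping from RFID tag IDs of souvenirs to photo albums;
# the tag IDs and album names are illustrative only.
SOUVENIR_ALBUMS = {
    "tag:0a1b2c": ["Holiday_Italy_2002"],
    "tag:9f8e7d": ["Grandmother_heirloom", "Family_1985"],
}

def on_tag_detected(tag_id, show_album):
    """Called when the table detects a tagged souvenir: look up the
    associated albums and push them to the browser display."""
    for album in SOUVENIR_ALBUMS.get(tag_id, []):
        show_album(album)

# Example: print instead of driving a real display.
on_tag_detected("tag:9f8e7d", show_album=lambda a: print("showing", a))
```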
CUING AUTOBIOGRAPHICAL MEMORIES
From the AM-theory and from observing people using the
Digital Photo Browser it was learned that the ideal
“recollection-supporting” device cannot and should not
contain the memories, but the cues to the memories. That is
why the suitability of several types of recollection cues
including photos, audio, video, odor, and graspable objects
was investigated [7,8]. In order to test this, 70 participants
joined in a standardized real-life event and one month later
they were cued in a laboratory living room setting, when
filling out questionnaires, either without a cue or with a
photo, object, odor, audio or video cue. In addition, a
special method was developed in order to analyze the
number of recollection details in these written free recall
accounts.
Fig. 1. The Digital Photo Browser and some souvenirs in an intelligent living room.
Although this study is presumably the first to compare recollections of a real-life event quantitatively across different media types, it is perhaps also the first to find a negative effect of cues on the number of memories produced compared to a no-cue situation: the main result shows that, for the recall of a real-life event, the no-cue condition generated significantly more memory details than any of the cue conditions (object, picture, odor, sound and video). This is against expectation, since the encoding specificity principle, which states that environmental cues matching information encoded in a stored event or memory trace cue recollection of the complete memory (see [9] for an overview of context-dependent memory), and several other studies (see [1] for an overview) predict and show a positive cuing effect on memory recall. To explain this result, it is hypothesized that cues may have a filtering effect on the internal memory search, resulting in fewer memories being recalled with a cue than without one.
LESSONS LEARNED
Fig. 2. A sketch of the Digital Photo Browser user interface (for an explanation see text).
The recommendations of Stevens et al. [10], which were derived from their study but some of which were independently uncovered in the work presented in this paper, are listed here, since they are important for the design of a recollection-supporting device:
• Develop the process of annotating or organizing memories into an activity of personal expression.
• Make the inclusion of practically any object possible.
• Bring the interaction away from the PC.
• Develop “natural” interactions (i.e. touch and voice).
• Encourage storytelling at any point.
• Assure the capability of multiple “voices”.
• Create unique experiences, especially for creating and viewing annotations.
The design recommendations given by Stevens et al. [10] were the starting point for the lessons learned mentioned here, which are based on all the chapters of the thesis [1]:
• Include souvenirs in a Recollection-Supporting Device (RSD).
• Souvenirs should be used as tangibles in a Graspable User Interface of an RSD.
• Support the personal identity of the user and the communication with other people.
• More media types than just text should be used in the RSD.
• The RSD should not pretend to know the truth, since this might interfere with the needs of the user.
• Create a metadata system that can be changed easily by the user.
ACKNOWLEDGMENTS
The author would like to thank her supervisors, prof. Eggen, prof. Kohlrausch and prof. Rauterberg, as well as the other members of the project team that created the Digital Photo Browser: E. Dijk, N. de Jong, E. van Loenen, D. Tedd, D. Teixeira and Y. Qian.
REFERENCES
1. Hoven, E. van den (2004). Graspable Cues for
Everyday Recollecting, Ph.D. thesis, Eindhoven
University of Technology, The Netherlands, May 2004,
ISBN 90-386-1958-8.
2. Conway, M. A. and Pleydell-Pearce, C. W. (2000). The Construction of Autobiographical Memories in the Self-Memory System, Psychological Review, 107 (2), 261-288.
3. Cohen, G. (1996). Memory in the real world, Hove,
UK: Psychology Press.
4. Engelkamp, J. (1998). Memory for actions, Hove, UK:
Psychology Press.
5. Hoven, E. van den and Eggen, B. (2003). Digital Photo
Browsing with Souvenirs, Proceedings of the
Interact2003 (videopaper), 1000-1004.
6. Ullmer, B. (1997). Models and Mechanisms for
Tangible User Interfaces, Masters thesis, MIT Media
Laboratory, Cambridge, USA.
7. Hoven, E. van den and Eggen, B. (2003). The Design of
a Recollection Supporting Device: A Study into
Triggering Personal Recollections, Proceedings of the
Human-Computer Interaction International (HCI-Int.
2003), part II, 1034-1038.
8. Hoven, E. van den, Eggen, B., and Wessel, I. (2003).
Context-dependency in the real world: How different
retrieval cues affect Event-Specific Knowledge in
recollections of a real-life event, 5th Biennial Meeting of
the Society for Applied Research in Memory and
Cognition (SARMAC V), Aberdeen, Scotland, July
2003.
9. Smith, S. M., and Vela, E. (2001). Environmental context-dependent memory: A review and meta-analysis, Psychonomic Bulletin and Review, 8 (2), 203-220.
10. Stevens, M. M., Abowd, G. D., Truong, K. N., and
Vollmer, F. (2003). Getting into the Living Memory
Box: Family Archives & Holistic Design, Personal and
Ubiquitous Computing, 7 (3-4), 210-216.
Remembrance Home: Storage for re-discovering one’s life
Yasuyuki KONO
Graduate School of Information Science, NAIST
Keihanna Science City, 630-0192, JAPAN
[email protected]
http://ai-www.aist-nara.ac.jp/~kono/
ABSTRACT
Remembrance Home is a project that supports one's remembrance throughout his/her life by employing his/her house as a storage medium for memorizing, organizing and remembering his/her everyday activity. The Remembrance Home stores everyday memories consisting of digital data of both what he/she has ever seen and what he/she has ever generated. He/she can augment his/her memory by passively viewing slide-shown images played on ubiquitously arranged displays in the house. Experiments have shown that the prototype system, which contains over 570,000 images, 35,000 titles of hypertext data, and 250,000 hyperlinks among them, augments the resident's remembering activity.
Kaoru MISAKI
office ZeRO
2-25-27 Motoizumi, Komae 201-0013, JAPAN
[email protected]
http://homepage3.nifty.com/misaki_kaoru/
Author Keywords
Augmented Memory, Remembrance Home, LifeLog, Passive Browsing
INTRODUCTION
The Remembrance Home stores one's (and his/her family's) digitized lifetime memories. Information technologies can provide virtually unlimited storage for one's life, i.e., both what he/she has ever experienced or seen (documents, photos, movies, graphics, books, notes, pictures, etc.) and what he/she has generated (text articles, drawings, etc.). Such a digital record can enrich his/her life, because he/she augments his/her memory by re-discovering past experiences. A house has, in all ages, been a medium for storing its residents' memories, e.g., portraits on furniture or children's doodles on a wall, and viewing such a record in the real world triggers one's remembering of the experiences associated with it. The Remembrance Home is a prototype house of the next era that augments human memory by naturally integrating digital devices into the house.
The project started in the year 2000. We have employed Kaoru Misaki's house as the prototype. It is equipped with several LCDs and some video projectors embedded into walls and furniture. These display devices continuously and automatically slide-show digitized and stored still images of his life-slice, e.g., photos, books, notebooks and letters. The number of images exceeds 570,000 and is increasing by 20,000 per month on average. The hyperlink structure, which currently consists of the images and 35,000 titles of texts he has written, keeps growing through his re-discovering of his past triggered by the slide-shown images. We have empirically found that browsing one's past by passively viewing digitized images activates his/her remembrance activity.
THE REMEMBRANCE HOME PROJECT
Overview of Lifetime Memories
Kaoru Misaki's house was rebuilt to set up his lifetime memory storage and to install a memory browsing and re-discovering environment. (He is a technical journalist who usually works in his library writing articles.) His lifetime memory consists of everything that he has ever seen or generated and that can be digitized. What he has seen mainly consists of 1) photo images he has taken, and 2) paper materials stored either in his house or his parents' house, such as books, magazines, leaflets, textbooks and letters. The paper materials were taken apart into sheets and each page was digitized as a JPEG image file with digital scanners. What he has generated mainly consists of 1) digitally written documents such as articles, diaries and e-mails, and 2) paper materials he wrote or drew, such as diaries, letters, articles and notebooks, which were also digitized into JPEG images.
Digitizing of the paper materials has been outsourced and is still in progress. The number of digitized images increases by about 20,000 files a month. Because the pace of the increase is so rapid and the digitizing is not performed by himself, it is impossible to attach symbolic annotations to each image at digitizing time, as is done in the MyLifeBits project [1]. Instead, the images and texts are manually linked in the course of his daily activity: whenever he is inspired by viewing digitized images, he manually establishes hyperlinks between the images and associable texts. His lifetime memory consists of over 570,000 images, 35,000 titles of texts, and 250,000 hyperlinks among them. About 100,000 of the images are newly taken digital photos; the rest are scanned materials that belong to his past. The data structure lives in a BTRON-based environment, where the user can easily establish hyperlinks among data on its GUI.
Figure 1. LCDs embedded into the library desk. The left display slide-shows digitized images.
Figure 2. LCDs settled in the dining-kitchen.
Figure 3. PC screen projected on the wall in the library.
Figure 4. Bookshelf before the project.
Figure 5. Current bookshelf. Papers have gone away.
Furnishings and Their Settings
In one's living space, objects that are necessary for daily living but obtrusive, such as paper files, computing devices, cables and audiovisual equipment, should be transparent or invisible to the resident. In the Remembrance Home, the computers, storage devices, audiovisual equipment and most keyboards are set under the floor, and most cables are embedded in the walls and ceilings. Several LCDs and some video projectors are ubiquitously placed in the house so as to merge naturally into the environment (see Figures 1-3). The amount of documents in the bookshelves and cabinets has been reduced dramatically, because they were discarded after digitizing (see Figures 4-5).
WORKING/LIVING IN THE REMEMBRANCE HOME
The Remembrance Home project was started in the year 2000 by Kaoru Misaki. Paper materials have been continuously digitized month by month and stored in a Windows-based file system. At the beginning of the project, he already had daily diary texts, started in June 1986, stored in the BTRON-based file system, which is suitable for making annotations and hyperlinks among data (see Figure 6). Each digitized image (page) was originally annotated with only two features: 1) the day and time it was scanned, as the time-stamp of the image file, and 2) the title of the set of pages, as the folder name (such as the book title). Additionally, the following digital data were kept in the storage on average: (a) 20 e-mail texts a day, (b) 10 web pages a day, and (c) 100 digital photo images a day. In the early stage of the project, each image was manually browsed by him and was hyperlinked from existing text data with an additional text annotation. Sometimes, inspired by a browsed image, he created a new text file to write down the re-discovered event, identifying the era, year, month or day associated with it. We call such hypertexts, which describe re-discovered past experiences, the "past diary." Figure 7 shows an example of a digitized image.
Figure 6. Embedded hyperlinks in a past diary text.
Figure 7. Example of a digitized image (a notebook page from a class he took as a high school student).
Developing Memory Browsing Environment
Although symbolic annotation creation is crucial for active browsing of non-text media [2], the rapid increase of scanned images prevented him from creating annotations on time. The time reference of most of the images, e.g., the day he obtained the original material, the day the original material was distributed, or the day of the event it reported, was ambiguous or unknown. Merely viewing all 570,000 images for 2 seconds each would take around 36 days, at 8 hours per day.
The passive browsing method, in which he views periodically slide-shown images, was adopted after active browsing, in which he actively selected a folder and viewed its thumbnails, had been tried. Active browsing became harder as the number of images increased, and the difficulty undermined his motivation for re-discovering in April 2002, when the number exceeded 100,000. We therefore employed "JPEG Saver," a freeware screensaver, to randomly show images on the ubiquitously settled screens in the house [3]. By switching the browsing style, his re-discovering activity has been dramatically activated. Inspired by randomly and daily shown images, he has re-discovered his past experiences step by step, and his past diary has become more detailed and accurate. By remembering the details of each past experience, he has split his diary files into months, whereas his past diary was divided by years or school periods before the passive method was employed (see Figures 8 and 9). Before April 2002, he had 129 diary texts, among which 12 (9%) were past diaries; their total size was approximately 230K bytes. After the switch, he has created 68 diary files, among which 48 (72%) are past diaries; their total size is approximately 855K bytes. Furthermore, 33 files (66%) of the past diaries are divided by month, i.e., each title contains not only the year but also the month, as depicted in Figure 9.
Figure 8. List of past diary titles before Apr. 2002: Elementary School Times, 1978-1980, 1981-1983, 1984, 1993, 1985, 1986, 1987, 1988, 1989, 1992, 1994.
Figure 9. List of past diary titles created after Apr. 2002: 1965-1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1977/06, 1977/07, 1977/08, 1977/09, 1977/10, 1978, 1978/04, 1978/08, 1979, 1979/08, 1979/12, 1980, 1980/02, 1980/04, 1980/05, 1980/06, 1980/07, 1980/08, 1980/09, 1980/10, 1980/11, 1980/12, 1981, 1981/02, 1981/03, 1981/04, 1981/06, 1981/07, 1981/08, 1982, 1982/04, 1982/10, 1983, 1984/06, 1984/11, 1984/12, 1988/05, 1988/06.
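The passive browsing loop and the manual past-diary linking described above could be sketched as follows; the file names, the in-memory hyperlink list and the diary titles are illustrative assumptions, not the BTRON-based implementation.

```python
import random

# Hypothetical in-memory stand-ins for the image store and the
# hyperlink structure described above.
images = [f"scan_{i:06d}.jpg" for i in range(1000)]
hyperlinks = []  # (image, diary_text) pairs

def next_slide():
    """Pick a random image for the ubiquitous displays (passive browsing)."""
    return random.choice(images)

def link_to_past_diary(image, diary_title):
    """Record a manually created hyperlink from an image to a past-diary text."""
    hyperlinks.append((image, diary_title))

# Example: the resident is inspired by a slide-shown image and links it.
img = next_slide()
link_to_past_diary(img, "past_diary/1980/08")
print(img, "->", hyperlinks[-1][1])
```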
Explosion of Past-Rediscovering Activity
We have empirically found that browsing his past by passively viewing digitized images in daily life dramatically activates his past-rediscovering activity. By switching the style of browsing digitized images to the passive one, the descriptions in his past diary have become more detailed, as mentioned above. Figure 10 shows the trend of the total size of past diary texts: the amount of description of his past experiences increased sharply after April 2002, the total size of past diary texts being approximately 4 times that before the switch. Figure 11 shows the trend of the numbers of hyperlinks from/to past diary texts. This indicates that passively viewing slide-shown images contextually associated with past experiences explosively activates his past-rediscovering activity, i.e., referring to past experiences and annotating the past diary.
Figure 10. Total size of past diary texts (generated past-diary bytes per month, and cumulative total in bytes).
Figure 11. Total numbers of hyperlinks from diary texts and to past diary texts.
Most of the digital contents in the storage are in either text or JPEG image format. A photo image generally captures the surroundings of its subject(s). By repeatedly viewing an image over a certain period of time, the viewer's attention moves to the details of the surroundings. Such transitions further activate his past-rediscovering activity.
CONCLUDING REMARKS
This paper introduced the concept of the Remembrance Home, which supports one's remembrance throughout his/her life by employing his/her house as a storage medium for memorizing, organizing and remembering his/her everyday activity. This paper also described the design and current implementation of the Remembrance Home. We have been digitizing Kaoru Misaki's lifetime memories and storing them in the house. The collection must be one of the biggest personal digital memory archives, although large-scale social digital logging projects are also in progress [5, 6]. Passively viewing the memories augments his memory and activates his past-rediscovering activity. The digitizing is still in progress, at a pace over 100 times faster than that of MyLifeBits [2]. The Remembrance Home is expected to store around 3 million images in 10 years.
We are also planning to extend the triggers for one's remembrance activity from PC screens to the real world, by providing means for hyperlinking between his external memory elements and real-world indexes. As depicted in Figure 5, the paper materials have disappeared through the project. This means that contexts have been replaced by symbolic annotations and that only indexical objects whose shapes or very existence have some meaning for him are left. We should therefore have means for annotating one's memory with things in the real world. We have already proposed a framework for memory albuming systems, named SARA, that employs real-world objects as media for augmenting human memory, providing its users with functions for memory retrieval, transportation, editing and exchange [4]. We believe that integrating this framework into the Remembrance Home brings a new vision for augmenting one's memory. The Remembrance Home must also provide family members with means for sharing digitally augmented memories.
REFERENCES
1. MyLifeBits Project. http://research.microsoft.com/research/barc/MediaPresence/MyLifeBits.aspx
2. Gemmell, J., Lueder, R., and Bell, G. Living with a Lifetime Store. Proc. ATR Workshop on Ubiquitous Experience Media, Sept. 2003. http://www.mis.atr.jp/uem2003/WScontents/dr.gemmell.html
3. JPEG Saver. http://hp.vector.co.jp/authors/VA016442/delphi/jpegsaverhp.html (in Japanese)
4. Kono, Y., Kawamura, T., Ueoka, T., Murata, S. and
Kidode, M. Real World Objects as Media for
Augmenting Human Memory, Proc. Workshop on
Multi-User and Ubiquitous User Interfaces (MU3I
2004), 37-42, 2004. http://www.mu3i.org/
5. American Memory. http://memory.loc.gov/
6. Wikipedia. http://en.wikipedia.org/wiki/Main_Page
An Object-centric Storytelling Framework Using Ubiquitous Sensor Technology
Norman Lin
ATR Media Information Science Laboratories
Seika-cho, Soraku-gun, Kyoto 619-02 JAPAN
[email protected]
Kenji Mase
Nagoya University
Furu-cho, Chigusa-ku, Nagoya City 404-8603 JAPAN
[email protected]
Yasuyuki Sumi
Kyoto University
Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501 JAPAN
[email protected]
ABSTRACT
Using ubiquitous and wearable sensors and cameras, it is possible to capture a large amount of video, audio, and interaction
data from multiple viewpoints over a period of time. This paper
proposes a structure for a storytelling system using such captured data, based on the object-centric idea of visualized object
histories. The rationale for using an object-centric approach is
discussed, and the possibility of developing an observational
algebra is suggested.
Author Keywords
ubiquitous sensors, storytelling, co-experience, experience
sharing
ACM Classification Keywords
H.5.1. Information Interfaces and Presentation (e.g., HCI): Multimedia Information Systems
INTRODUCTION
Previous work has developed a ubiquitous sensor room and wearable computer technology capable of capturing audio, video, and gazing information of individuals within the ubiquitous sensor environment [4, 6]. Ubiquitous machine-readable ID tags, based on infrared light-emitting diodes (IR LEDs), are mounted throughout the environment on objects of interest, and a wearable headset captures both a first-person video stream and continuous data on the ID tags currently in the field of view (Figure 1). The data on the ID tags in the field of view represent, at least at a coarse level, which objects the user was gazing at during the course of an experience.
In this paper, we propose using an object-centric approach to
organizing and re-experiencing captured experience data. The
result should be a storytelling system structure based on visualized object histories. In the following sections we explore
what is meant by an object-centric organizational approach,
and present a structure for a storytelling system based on this
object-centric idea.
Figure 1: Head-mounted sensors for capturing video and
gazing information.
GOALS
The long-term goal of this research is to develop a paradigm
and the supporting technology for experience sharing based on
data captured by ubiquitous and personal sensors. In a broad
sense, the paradigm and technology should assist users in (a)
sharing, (b) contextualizing, and (c) re-contextualizing captured
experiences. Ubiquitous sensors should automatically capture
content and context of an experience. A storytelling system
should then allow users to extract and interact with video-clip
based representations of the objects or persons involved in the
original experience, in a virtual 3D stage space (Figure 2).
AN OBJECT-CENTRIC APPROACH
The central structuring idea is to focus on objects – physical artifacts in the real world, tagged with IR tags and identifiable via gazing – as the main mechanism or agent of experience generation. Other projects using objects to collect histories include StoryMat [5] and Rosebud [2]; [7] also discusses the importance of using objects to share experience. By focusing on objects in this way, ubiquitous and wearable sensors and cameras allow the capture and playback of personalized object histories from different participants in the experience. An object “accumulates” a history based on persons’ interactions with it, and the ubiquitous sensor and capture system records this history. A storytelling system should allow playback and sharing of these personalized object histories. By communicating personalized object histories to others, personal experience can be shared.
Figure 2: Virtual 3D stage space for storytelling using visualized object histories.
Storytelling: Visualizing and Interacting with Object Histories
Having captured video of personalized object histories, we would like to allow interaction with those objects and their personalized histories in a multi-user environment to facilitate storytelling and experience sharing. Currently, a 3D stage space based on 3D game engine technology is being implemented (Figure 2). Within this 3D stage space, users can navigate an avatar through a virtual “stage set” and interact with video-billboard representations of objects in the captured experiences.
Contextualization and Re-contextualization
Objects are typically not observed or interacted with in isolation; instead, they are typically dealt with in groups. In terms of the physical sensor technology, this means that when one (tagged) object A of interest is being gazed at, another object B which is also in the current field of view becomes associated with object A. This is one form of context: object A was involved with object B during the course of the experience. The current working thesis is that an object-centric storytelling system should remember the original context, but should also separate or loosen an object from its original context. The reasoning is that by remembering but loosening the original context, we can both remind the user of the original context (contextualization) and allow the user to reuse the object in another context (re-contextualization).
Concretely, for instance, consider the case where we have two objects, A and B, which are recorded on video by a personal head-mounted camera and which are seen simultaneously in the field of view. Later, if the storyteller plays back a video clip of object A in order to share his experience about object A with someone else, the storyteller should be reminded in some way that object B is also relevant, because it was observed or gazed at together with object A. This is what is meant by system support for contextualization of experience. Essentially, the storytelling system serves as a memory aid to remember the original context of certain objects with respect to a personalized experience.
Re-contextualization, on the other hand, would involve using object A in a new context. Suppose that a second user never saw object A and object B together, but instead saw object A and object C together. From this second user’s personal perspective, A and C are related, but from the first person’s personal perspective, A and B are related. By allowing video clips of A, B, and C to be freely combined in a storytelling environment, and by comparing the current context with pre-recorded and differing personal contexts, we allow the storytellers and audience to illustrate and discover new perspectives, or new contexts, on objects of interest. New stories and new contexts about the objects can be created by combining their captured video histories in new ways.
Towards an Algebra of Observations
The object-centric idea presented above is that an object accumulates history, and that this object history is an agent for
generating experience and an agent for transmitting experience
to others. Part of this idea is that not only do individual objects have experiential significance, but also groups of objects
carry some semantic meaning. A group of objects can be considered to be a “configuration” or a “situation” - in other words,
a higher-level semantic unit of the experience. An object’s
history should be associated with the situations in which that
object was seen. For example, the fact that objects A and B
are observed together by a user means that object A is related,
through situation AB, with object B. The situation AB is the
higher-level grouping mechanism which relates objects to one
another through their common observational history.
If we accept this, then, just as we can speak of the history of an
object, we can also speak of the history of a situation. Just as
we can relate objects with one another, we can relate situations
with one another, or objects with situations. These situations
can furthermore be grouped into even larger situations. This
leads to a sort of hierarchy or continuous incremental experience structure, and suggests the possibility of developing an
algebra for describing and reasoning about observations, situations, and higher-level groups. As an example of the kind of
questions which such an observational algebra might answer,
consider the case of three objects A, B, and C. User 1 observes
objects A and B together, forming situation AB. User 2 observes objects B and C together, forming situation BC. Then, in
the storytelling environment, user 1 and user 2 collaboratively
talk about objects A and C together, forming new situation AC.
What then is the relationship among situations A, AB, B, BC,
C, and AC? Future work will explore this idea further.
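As a rough illustration of how such situations and their relations might be represented, the following sketch encodes situations as sets of object IDs and uses set overlap (Jaccard similarity) as a simple relatedness measure; this is an assumption for illustration, not a definition of the proposed observational algebra.

```python
# Observations recorded per user: sets of tag IDs seen together (situations).
situations = {
    "user1": [{"A", "B"}],   # user 1 saw A and B together
    "user2": [{"B", "C"}],   # user 2 saw B and C together
    "story": [{"A", "C"}],   # A and C combined while storytelling
}

def jaccard(s1, s2):
    """Set-overlap measure used here as a simple relatedness metric."""
    return len(s1 & s2) / len(s1 | s2)

def related_situations(query, min_overlap=0.0):
    """All recorded situations sharing objects with the query group."""
    hits = []
    for owner, sits in situations.items():
        for s in sits:
            overlap = jaccard(query, s)
            if overlap > min_overlap:
                hits.append((owner, sorted(s), round(overlap, 2)))
    return hits

print(related_situations({"A", "C"}))
```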
The Value of Context
By capturing the context of objects as observed, we provide for
the later possibility to understand the original context of objects
or object groups. We aim to answer questions of the following
forms: In what situations was a particular object involved? In
what situations was a particular group of objects involved? To
what degree are other situations related to the currently chosen
object or object group? By capturing context and defining a
comparison metric, these types of questions can be answered.
The value of answering these questions is that it provides an
intuitive, object-centric way of understanding, organizing, and
telling stories about experience. It also provides a method of
showing both strong and weak relations to other parts of the
experience. The MyLifeBits project [1] also emphasizes the
importance of “linking” to allow the user to understand context
of captured bits of experience.
As an example, one can imagine a person who works as a home
decorator, who uses a variety of furnishings meant to decorate
the interior of a home. The decorator has built up an experience of creating several different configurations of objects in
different situations. When trying to create a new decoration,
it can be useful to try to group objects together (e.g. potted
plant, shelf, and lamp), then see what previous situations, from
the personalized experience corpus, have used this object group
before. When illustrating to a client the decoration possibilities, the decorator, as a storyteller, could select candidate object
groups and tell a story (by playing back related, captured video
clips in a 3D stage space) about how those object groups have
been used in previous designs. This also points out the value
of using other persons’ experience corpora, as it can provide
new and different perspectives on how those objects might be
combined.
The preceding example raises a subtle point not yet addressed,
namely, that object types, and not just objects themselves, can
also be important in classifying experience. In the above example, the decorator may be less interested in the history of
one particular furnishing (e.g. one particular potted plant), but
rather may be more interested in past related experiences using
some types of furnishings (e.g. past decoration designs using
potted plants in general). On the other hand, there are also situations where we are indeed interested in the history of one
particular instance of an object. For instance, if a home decorator is involved with regularly re-decorating several homes,
then within one specific home it can be useful to understand
the specific object history of a specific furnishing.
A STORYTELLING SYSTEM
Based on the previous object-centric paradigm for experience,
this section presents a proposed structure for a storytelling system using visualized object histories. Core technology for this
system is currently being implemented. The structure consists
of five phases, each of which will be described separately. The
five phases are capture, segmentation, clustering, primitive extraction, and storytelling.
Figure 3: Role of the experience map in the storytelling process. Objects are grouped into situations based on subjective observation. The experience map shows the situations
and allows planning and telling a story about the objects in
the situations.
Capture
In the capture phase, a user captures an experience by wearing a head-mounted camera which records a video stream as well as continuous gazing data, in other words, which tagged objects were seen at any particular point in time during the experience. A tagged object can be either an inanimate artifact or another person; the only requirement is that the object has a tag on it so that it can be recognized and recorded by the capture system.
Segmentation
The goal of the segmentation phase is to break up the captured video data into chunks or segments which can then be compared (the comparison measure is discussed in the next section) and clustered. Two main approaches have been developed for the segmentation. The first approach is simply to divide the captured video into equally-sized segments. The second approach is to define an observation vector for each instant1 of the captured video, and to form a new video segment whenever the observation vector “significantly” changes. The observation vector for an instant is the set of all objects observed by the user during that instant. Therefore, this second approach reasons that whenever the set of observed objects “significantly” changes, a new situation has in some sense occurred.
The reason that two approaches have been considered is that they tend to yield units (video clips) of different lengths. With the first approach, all resulting video clips are of equal, and short, length. With the second approach, resulting video clips tend to be longer; a new clip starts only when the situation changes. The first approach is more likely to uncover “hidden” patterns in the data because it imposes little structure on the data; the second approach introduces some sort of algorithmic bias, due to the more complicated decision on when a segment ends, but the hope is that this will yield longer, more semantically meaningful segments. The reason that longer video segments may be desirable is that they may serve as a better basis for extracting useful primitives which can be used in storytelling.
1 Technically, due to sampling issues, the observation vector cannot be measured instantaneously, but is instead aggregated over a small time-slice of epsilon duration.
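As an illustration of the second approach, the following is a minimal sketch in Python. It assumes the gaze log has already been reduced to one set of observed tag identifiers per time-slice, and a Jaccard-overlap threshold stands in for the unspecified test of when the observation vector changes “significantly”.

def jaccard(a, b):
    # overlap between two sets of observed tag IDs
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def segment(slices, threshold=0.5):
    """slices: list of sets of tag IDs, one per epsilon time-slice.
    Returns a list of (start_index, end_index_exclusive) segments."""
    segments = []
    start = 0
    for i in range(1, len(slices)):
        if jaccard(slices[i - 1], slices[i]) < threshold:
            segments.append((start, i))   # observation vector changed "significantly"
            start = i
    segments.append((start, len(slices)))
    return segments

# example: two slices looking at tags {A, B}, then one looking at {C}
print(segment([{"A", "B"}, {"A", "B"}, {"C"}], threshold=0.5))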
Clustering
Given the segments from the previous segmentation step, the clustering phase compares the segments and clusters them together. The idea is that groups of similar segments form situations. Again, this is based on the object-centric organizing principle discussed earlier. For each segment under consideration, we first generate the observation vector over the entire time interval of that segment. An observation vector for a particular interval of time is a binary-valued vector representing, for each object under consideration, whether that object was seen or not. Then, we use a clustering algorithm to compare the similarity of segments by comparing their observation vectors. To compare similarity of observation vectors, we have chosen to use the Tanimoto similarity measure [3, p. 16-17], ST(a, b) = (a · b)/((a · a) + (b · b) − (a · b)), with the · operator representing the inner dot product. This is essentially the ratio of the number of elements the two vectors have in common to the total number of distinct elements in either.
Clusters represent situations in the original captured experience; they are groups of video segments, each involving a similar set of objects. By displaying the clusters on a 2D map, and by mapping similar clusters close to each other, we can create a “map” of the experience which can serve as a structural guide and memory aid during storytelling. Figure 3 shows the conceptual role of the experience map, and Figure 4 shows a sample interactive experience map created using magnetic attractive/repulsive forces.
Figure 4: A sample experience map illustrating clusters forming situations.
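The following minimal Python sketch computes the Tanimoto measure on binary observation vectors and groups segments with a simple greedy threshold pass; the text does not name the clustering algorithm used, so the grouping step here is only an illustrative stand-in.

def tanimoto(a, b):
    # a, b: binary observation vectors (lists of 0/1), one entry per tagged object
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 1.0

def greedy_cluster(vectors, threshold=0.6):
    """Assign each segment's observation vector to the first cluster whose
    representative it resembles; otherwise start a new cluster."""
    clusters = []  # list of (representative_vector, [member segment indices])
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if tanimoto(rep, v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return clusters

# the first two segments share most objects and cluster together; the third does not
print(greedy_cluster([[1, 1, 0], [1, 1, 1], [0, 0, 1]]))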
Primitive Extraction
Given the clusters from the previous clustering step, the primitive extraction phase aims to extract reusable video primitives from the situation clusters. By “reusable” we refer to the “loosening of context” discussed earlier. We aim to extract temporal and spatial subsets of the video which can be used in a variety of contexts to tell many stories relating to the original captured experience. The output of this phase should be a pool of video clips which represent object histories. This phase requires human intervention to decide which video clips from the situation clusters are representative of the experience and which have communicative value.
Storytelling
In this final phase, a storyteller uses the video primitives extracted from the previous phase to tell a story to others. Within a virtual 3D stage space, video billboards representing the objects are placed in the environment and can be moved and activated by storytellers or participants in the space. Video billboards of objects can be activated in order to play back their object histories which were extracted in the primitive extraction phase. The current object configuration can be measured in the 3D stage space by generating an observation vector, just as is done in physical space during experience capture; this virtual observation vector defines a current “storytelling situation” which can be mapped onto the experience map to illustrate the storyteller’s “location” in conceptual “story space.”
CONCLUSION
Given a ubiquitous sensor room capable of capturing video, audio, and gazing data, this paper described the use of an object-centric approach to organizing and communicating experience
by visualizing personalized object histories. A storytelling system structure based on this object-centric idea was proposed.
Core technology for this storytelling system is being developed,
and work continues on gaining insight into a reasoning framework or algebra for observations.
ACKNOWLEDGEMENTS
This research was supported by the Telecommunications Advancement Organization of Japan. Highly valuable contributions to this work were made by Sadanori Itoh, Atsushi Nakahara, and Masahi Takahashi.
REFERENCES
1. J. Gemmell, G. Bell, R. Lueder, S. Drucker, and C. Wong.
MyLifeBits: Fulfilling the Memex vision, 2002.
2. J. Glos and J. Cassell. Rosebud: Technological toys for
storytelling. In Proceedings of CHI 1997 Extended
Abstracts, pages 359–360, 1997.
3. Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag
Berlin Heidelberg, 1995.
4. Tetsuya Matsuguchi, Yasuyuki Sumi, and Kenji Mase. Deciphering interactions from spatio-temporal data. IPSJ SIGNotes Human Interface, (102), 2002.
5. Kimiko Ryokai and Justine Cassell. StoryMat: A play space with narrative memories. In Intelligent User Interfaces, page 201, 1999.
6. Yasuyuki Sumi. Collaborative capturing of interactions by wearable/ubiquitous sensors. The 2nd CREST Workshop on Advanced Computing and Communicating Techniques for Wearable Information Playing, Panel “Killer Applications to Implement Wearable Information Playing Stations Used in Daily Life”, Nara, Japan, May 2003.
7. Steve Whittaker. Things to talk about when talking about
things. Human-Computer Interaction, 18:149–170, 2003.
Storing and Replaying Experiences in Mixed
Environments using Hypermedia
Nuno Correia1
Luis Alves1
Jorge Santiago1
Luis Romero1,2
[email protected]
[email protected]
[email protected]
[email protected]
1 Interactive Multimedia Group, DI and CITI, New University of Lisbon, Portugal
2 School of Technology and Management, Viana do Castelo Polytechnic Institute, Portugal
ABSTRACT
This paper describes a model and tools to store and replay
user experiences in mixed environments. The experience is
stored as a set of hypermedia nodes and links, with the
information that was displayed along with the video of the
real world that was navigated. It uses a generic hypermedia
model implemented as software components developed to
handle mixed reality environments. The mechanisms for
storing and replaying the experience are part of this model.
The paper presents the goals of the system, the underlying
hypermedia model, and the preliminary tools that we are
developing.
Keywords
Store/replay user experience, mixed reality, hypermedia,
video.
INTRODUCTION
Storing photos, videos, and objects that help to remember
past experiences is an activity that almost everyone has
done at some point in their lives. Sometimes these materials
are also augmented with annotations that help to remember
or add personal comments about the situation and events
that took place. The content that is stored is mostly used to
remember the events but also to compose them in new
ways and create new content. This activity is becoming
increasingly dependent on technological support, and
multiple media can currently be used. In a mixed reality
environment, where users are involved in live activities, the
replay and arrangement of such experiences is definitely a
requirement. In mixed reality, people can participate in
gaming or exploration activities either alone or involving
other people and this is a perfect setting for generating
interesting activities that people want to remember at a later
time.
Previous work in this area of mixed reality includes [7]. Other related systems, involving storing user annotations or repurposing of captured materials, include [2, 5, 6]. The
Ambient Wood project described in [7] introduces an
augmented physical space to enable learning experiences
by children that take readings of moisture and light levels.
The activities of the children that participate are recorded in
log files. These log files are later replayed to enable further
reflection on the experiences they had in the physical
augmented outdoor environment.
In [6] the authors present a system for capturing public
experiences and personalizing the results. The system
accepts the different streams that are generated, including
speaker slides and notes, student notes, visited Web pages,
and it stores all this information along with the timestamps
that enable synchronization. The playback interface has features for rapid browsing, enabling the user to locate a point of interest in the streams that were captured.
VideoPaper [2] is a system for multimedia browsing,
analysis, and replay. It has been used for several
applications including meetings, news, oral stories and
personal recordings. The system captures audio and video
streams and key frames if slides are used. VideoPaper uses
this data to produce a paper document that includes
barcodes that can give access to digital information. This
information can be accessed in a PDA or in a PC connected
to a media server.
The SHAPE project [3] had the goal of designing novel
technologies for interpersonal communication in public
places, such as museums and galleries. The users can learn
about antique artifacts and their history. Although the main focus of the system is not on replay of experiences, some of its features are related. The users can search for objects in
a physical setting and their positions are tracked. Later they
can continue their exploration and obtain more information
about the objects that they were searching for, in a mixed reality setting with projection screens.
This area of research, storing the memory of past
experiences, was also identified as one of the “Grand
Challenges for Computing Research” [1]. The workgroup
that produced the report identified several problems, related to data storage and analysis, interaction with sensors, and human-computer interaction, that will shape future research
in this area and that are key questions for the work that we
describe here.
This paper presents an approach for storing the user
experience using hypermedia structures, much in the way
the history mechanism (that allows accessing previously
visited nodes) is used in Web browsers. In this case the
activities take place in the real world and may involve accessing digital documents and entering and navigating virtual worlds. The paper presents the scenario of usage,
the underlying hypermedia model, the specific mechanism
for storing/replaying, and the tools that we are developing
to support these mechanisms.
SCENARIO
The scenario of usage is a physical space, e.g. a museum,
an art gallery, or even an outdoor space, where there are
several interest points, e.g. paintings, objects, detected by
the system. The user carries a portable wearable system that
is able to capture the video of the real world scene, detect objects that are close or within the field of view, access a database
using a wireless network, and display additional
information over the real video.
If the option for storing the user experience is on, then as the user moves around the space the video is captured along with the information about interest points. The information presented at each interest point, and thus stored for later replay, can be video, audio, text, images, or virtual worlds. The user experience involves the visualization of the physical space, the augmented information, and the navigated virtual worlds. All this data is stored as a hypermedia network using the mechanisms described in the next sections.
HYPERMEDIA MODEL
Storing and displaying information is supported by a
hypermedia model defined by a set of reusable components
for application programming. The model includes the
following types of components:
• Atomic: It represents the basic data types, e.g., text and image.
• Composite: It is a container for other components, including Composites, and it is used to structure an interface hierarchically.
• Link: It establishes relations among components.
Every component includes a list of Anchors and a
Presentation Specification. Anchors allow referencing part
of a component and are used in specifiers, a triplet
consisting of anchor, component and direction, used in
Links to establish relations between the different
components of a hypermedia graph. The Presentation
Specification describes the way the data is presented in an
augmented interface. The interface structure is built with
Composite objects that establish a hierarchy of visual
blocks. Interfaces are presented (and removed) according to
the sequence of events, as described next.
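As a rough illustration of these component types, here is a minimal Python sketch; the class and field names are assumptions made for exposition only, not the components of the actual implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Anchor:
    # references part of a component (e.g., a time range in a video stream)
    selector: str

@dataclass
class PresentationSpec:
    # describes how the data is presented in the augmented interface
    layout: str = "overlay"

@dataclass
class Component:
    anchors: List[Anchor] = field(default_factory=list)
    presentation: PresentationSpec = field(default_factory=PresentationSpec)

@dataclass
class Atomic(Component):
    # basic data types, e.g. text and image
    media_type: str = "text"
    data: str = ""

@dataclass
class Composite(Component):
    # container used to structure an interface hierarchically
    children: List[Component] = field(default_factory=list)

@dataclass
class Specifier:
    # triplet (anchor, component, direction) used by Links to relate components
    anchor: Optional[Anchor]
    component: Component
    direction: str

@dataclass
class Link(Component):
    # establishes relations among components of the hypermedia graph
    specifiers: List[Specifier] = field(default_factory=list)

# e.g. a text block attached to an interest point, nested in a composite interface
painting_text = Atomic(media_type="text", data="About painting 3")
info_panel = Composite(children=[painting_text])
link = Link(specifiers=[Specifier(None, painting_text, "destination")])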
Anything that happens and changes the information that is presented is considered an event. There are three main types of events, as follows:
• Location of the user in a space.
• Recognition of an interest point, identified by an optical marker or an RFID tag.
• User navigation or choice.
Events
The position of a user in the space can also define an interest point. If the space has several subspaces (rooms, floors), moving from one to another will generate an event. Whenever an event is generated, new information is displayed and the interface changes. A location event does not necessarily generate a change in the information, and it can also occur in virtual spaces. In the physical space where the user is, there are interest points that are detected by the system. When one of these points is detected, new information is displayed on the mobile device of the user. When this point of interest is no longer detected the information ceases to be available, unless this was a manual choice by the user. An information block that is displayed as a result of an event can be browsed by the user, thus originating a change in the content. Each navigation action made by the user creates a new event.
APPLICATIONS
We are developing several applications to test the
hypermedia model in context aware augmented
environments. The two applications where the development
is more advanced are a museum/gallery information
assistant and a game that takes place in the gallery
environment.
The gallery experimental space consists of a room with
subdivisions to create a navigational need. The physical
entities consist of paintings, and each painting is positioned
in a different part of the room. Virtual 3D models related to
these paintings have been created and used to enrich the
information setting, augment the user interface and allow
navigation within those worlds in search of new
experiences and knowledge.
The user set up consists of a portable PC, with a wireless
LAN card, and a camera to capture the real world video.
There are two alternative user set-ups: in the first the
visualization and interaction is done directly on the PC; and
the other uses a Head Mounted Display, and a 2-3-button
device for interaction purposes. We are currently using the
Cy-Visor DH-4400VP video see through display. The main
recognition process is accomplished through the camera
device. There are markers associated with each painting
that are optically recognized through an augmented reality
toolkit (ARToolkit) developed at the University of
Washington. The system uses this recognition process to
know the user position and orientation, although the
components of the hypermedia model can accept input
from different devices for location purposes. Once objects
are recognized, media data is added to the real world video
capture, by accessing the remote hypermedia graph. When
manipulating 3D data, such as the worlds that represent the
paintings, a 3D behavior toolkit is used to superimpose the
models over the real world video and navigate in them.
There is one ARToolkit marker located near each painting.
When this marker is recognized the system presents
information about the painting and an iconic simplified 3D
representation. If the user selects this model she will enter a
complex and detailed virtual world representing the
painting where navigation is possible and the game
described next takes place. Adding to the gallery
information setting, a mystery game was also developed.
The story consists of solving a robbery that took place in
the gallery. The user has to gather clues and interact with
virtual characters to find the stolen item. To do so, the
player has to move around the physical and virtual spaces.
The game features several objects to be accessed or
navigated during playtime, namely worlds, characters and
clues.
Figure 1: Application Architecture
HISTORY STRUCTURE
In a context aware application, the experience can be divided into several scenes, each of which is triggered by a particular action. Each such scene has two main elements: the content of the interface, and the action that triggered the interface. Associated with the action is also its life period. Each scene is presented before another action is processed that leads to another interface content. For instance, in an augmented reality environment, the real video is needed to replay the experience, as well as the augmented information. The action is the command (or event) that caused the augmented content to be displayed. The experience history is built up of several scenes.
The hypermedia system models scenes with the Story and Entity components. Each scene is a Story link that points to an Entity component. The Story link contains the action and duration attributes. The Entity component is associated with a set of links that specify the data elements needed to replay the scene. The result of navigating in the system, a history instance, is a linked list of Story/Entity component pairs. The Entity components are linked together by Story components, forming a path of links and nodes. Figure 2 illustrates this structure.
Figure 2: History Structure
The Story components in the scene list contain the events and the associated information. For each event a new Story component is created with the event parameters. Simultaneously an Entity component is created with the necessary Content connections to reproduce the state of the interface after processing the event.
The state of the interface is a set of Content connections with the corresponding dynamic behaviors, for each information item displayed in the interface at a given instant. The necessary data for later replay are copied from the original components and referenced in Content links through specifiers. The behavior of the interface is reproduced with the Presentation Specification of the Content links.
This method of specifying history through a sequence of scenes yields obvious possibilities of arranging it in a different order or introducing new media elements. This is a generic mechanism for repurposing these types of materials and building new applications (e.g., storytelling).
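A minimal Python sketch of this history structure follows; the class names and the flat list used here are illustrative assumptions for exposition, not the system's actual components.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ContentLink:
    # copy of the data needed to reproduce one displayed information item
    media_type: str
    data: str

@dataclass
class Entity:
    # state of the interface after processing an event
    contents: List[ContentLink] = field(default_factory=list)

@dataclass
class Story:
    # scene: the action (event) that triggered the interface, plus its life period
    action: str
    duration: float
    target: Entity = field(default_factory=Entity)

history: List[Story] = []

def record_event(action, duration, displayed_items):
    """Append a Story/Entity pair for each event, as the history is built up."""
    entity = Entity([ContentLink(t, d) for t, d in displayed_items])
    history.append(Story(action, duration, entity))

record_event("marker:painting-3", 12.5, [("text", "About painting 3"), ("model", "painting3.wrl")])
record_event("navigation:enter-world", 40.0, [("world", "painting3-world")])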
Storage Requirements
It is necessary to store different sets of data in order to
represent and later replay the user experience:
• Video of the real world scenes
• Events
• Information that is displayed
The main storage requirement is related to the video
capture. This has to be continuously stored when the events
are occurring. The only exception happens when the user
navigates in virtual worlds. The real world video is stored
in streams and it is referenced by the Entity component,
through Anchors and Content components, for each event
that occurs. The streams are interrupted whenever the
interface is a virtual world where the user navigates.
REPLAYING THE EXPERIENCE
In order to replay the experience we are developing a set of
applications. These applications assume that as a result of a
previous navigation session a hypermedia graph, as
described above, was produced and added to the main
graph, that contains the overall information (Figure 3). This
hypermedia graph describes the experience, including
video, events and user interfaces. It includes all the
structural and timing information needed to provide a view
of a past experience in an augmented environment.
In order to replay the experience we are considering
different levels and tools, described in the next subsections.
Figure 3: Information storage and repurpose scheme
PC Player
A player/editor of the experiences stored as a hypermedia history, for later browsing. This is a tool that enables browsing the stored materials at a later stage, in a setting that can be different from the one where the experience took place. The typical usage is on someone’s personal computer. This is the first tool that we are implementing, and it is essentially a video player that displays the video that was captured during the navigation in the real world. Superimposed on this video, the augmented materials that were presented in the original navigation are shown. Besides the traditional video player control buttons (play, stop, pause, resume) it includes buttons for Next/Previous, which allow going back and forth in the history list. This navigation mode is exactly the same as going to the next/previous interest point that was defined in the physical space. Additionally, the tool allows adding annotations to a given point in the video stream.
Mixed Reality Player
This player allows access to stored experiences when the user is in the physical setting. When the user reaches an interest point she can follow the links for further information, or she can access previous content that she or others have browsed at that point. This mechanism can be used in gaming settings, or it can also be used to leave personal information attached to physical spaces, as memory for future visitors.
Movie
The navigation in the real world combined with access to virtual worlds can be viewed as a movie. As such, we intend to explore this option by defining a set of montage and editing rules that can be applied to the overall hypermedia network in order to generate a movie. This movie will integrate the different elements: the original video, augmented information, and navigation in virtual worlds into a coherent narrative structure.
Authoring Environment
Besides these applications we are building an authoring environment for mixed reality applications that will also be used to browse and edit hypermedia networks that resulted from previous experiences. This allows adding additional materials at a later stage, for example, to provide more insights about a given interest point. This corresponds to adding more components to the hypermedia graph or removing them. The authoring environment includes a graph browser and space representations (2D and 3D). It allows attaching Entity components to physical spaces and additional content.
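As a rough sketch of the stepping behaviour described above for the PC player, the following Python fragment replays a recorded history; the tuple-based history and the function names are assumptions for illustration, not the tool's implementation.

import time

def replay(history, speedup=100.0):
    """history: list of (action, duration_seconds, displayed_items) tuples, in the
    order the events occurred, e.g. ("marker:painting-3", 12.5, [("text", "About painting 3")])."""
    for index, (action, duration, items) in enumerate(history):
        print(f"[scene {index}] action={action}")
        for media_type, data in items:
            print(f"  show {media_type}: {data}")   # stand-in for overlaying the stored content
        time.sleep(duration / speedup)              # hold the scene for its (scaled) life period

def scene_at(history, index):
    # Next/Previous buttons: clamp the index and jump to that scene in the history list
    return history[max(0, min(index, len(history) - 1))]

replay([("marker:painting-3", 12.5, [("text", "About painting 3")]),
        ("navigation:enter-world", 40.0, [("world", "painting3-world")])])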
CONCLUSIONS AND FUTURE WORK
This paper presents an approach for replaying experiences
in mixed reality using hypermedia mechanisms. The main
advantage of using hypermedia as support for this type of
activity comes from the fact that hypermedia mechanisms
provide a powerful and well-tested way to structure
information and provide support for navigation. When
extending the existing hypermedia mechanisms to the real
world many concepts can be used including navigation
aids, annotations or bookmarks, and path/history
mechanisms. The history mechanism, common in most
hypermedia systems, namely Web browsers, is the main
concept that supports the work that we are doing. The
history, as a list of visited nodes, provides a simple and
flexible mechanism for structuring information captured
from the real world along with virtual elements. The
applications that we are building explore different ways to
replay the events, ranging from a player to be used in a
normal PC to exploration in the place where past events
took place. Additionally, we are exploring the possibility of
generating a movie out of the raw materials that were
captured and stored.
The current status of the system includes an
implementation of the hypermedia model, testing
applications (the information and gaming environment
mentioned above), and preliminary tools for replay (the
player for later browsing, in a PC setting). Further work
includes the tools for replaying the experience where it
took place. Also, editing the result of a session is something that
we want to explore in the context of the authoring
environment that we are building. Editing can be as simple
as adding or removing materials, but it can include
transforming and repurposing the materials using
storytelling and cinematographic techniques.
ACKNOWLEDGEMENTS
We are grateful for the financial support given by “PRODEP III,
Medida 5, Acção 5.3”, a FSE education program.
REFERENCES
1. Fitzgibbon, A., Reiter, E. “Memories for Life” – Managing Information Over a Human Lifetime. In Grand Challenges for Computing Research, sponsored by the UK Computing Research Committee, with support from EPSRC and NeSC, http://www.nesc.ac.uk/esi/events/Grand_Challenges/
2. Graham, J., Erol, B., Hull, J., and Lee, D. The Video Paper Multimedia Playback System. ACM Multimedia 2003, Berkeley, USA, November 2003
3. Hall, T. et al. The Visitor as Virtual Archaeologist: Explorations in Mixed Reality Technology to Enhance Educational and Social Interaction in the Museum. Proceedings of the 2001 conference on Virtual reality, archeology, and cultural heritage, Glyfada, Greece, 2001
4. Pea, R., Mills, M., Rosen, J., Dauber, K., Effelsberg, W., Hoffert, E. The Diver Project: Interactive Digital Video Repurposing. IEEE Multimedia, Jan/Mar 2004
5. Romero, L., and Correia, N. HyperReal: A Hypermedia Model for Mixed Reality. ACM Hypertext’03, Nottingham, UK, August 2003
6. Truong, K., Abowd, G., and Brotherton, J.
Personalizing the Capture of Public Experiences.
UIST’99, Asheville, USA, November 1999
7. Weal, M., Michaelides, D., Thompson, M., and De
Roure, D. The Ambient Wood Journals – Replaying the
Experience. ACM Hypertext’03, Nottingham, UK,
August 2003
Storing, indexing and retrieving my autobiography
Alberto Frigo
Innovative Design, Chalmers University of Technology
412 96 Gothenburg, Sweden
[email protected]
ABSTRACT
This paper describes an ongoing experiment consisting of
photographing each time my right hand uses an object in
order to create my autobiography for self-reflection and
enforcing my identity. This experiment has now been
carried out for six months. The daily sequences of photos
are linked together on a portable computer based on the
typology of the object represented. With this structure I can
review the database and retrieve a record of my past
activities both chronologically and in an associative
manner. The portable database is also used to support
communication with persons in my proximity. Finally I
consider a scenario where several users carrying such a
database could interact with one another.
Keywords
Autobiography, photography of object engagement, object
typologies, portable database.
INTRODUCTION
Since the 24th of September 2003 I have been
photographing and cataloguing each time my right hand has
used an object. The images are chronologically collected in
a portable database [1] I constantly carry with me. Here
they are linked to one another based on the object
represented.
The visualization of the database attempts to codify the
whole of my life patching together and associating every
single event. Each of my life-events finds a representative
symbol in the images of the objects used while
accomplishing it. These symbols are meant to be a direct
stimulus for me to actively remember my past.
As I am reviewing this paper (24th of March 2004), 9536
activities from 181 days have been indexed in 124
categories stored on my portable database. The experience
of writing this paper has been stored by photographing my
hand using my computer mouse (see the first image of the
third row in fig. 2). Later I will retrieve today’s images to
quickly index them with a binary code of eight digits
corresponding to a matrix of icons representing the objects
(fig.1).
fig. 1 By combining an icon from the matrix on the left with
an icon from the matrix on the right I can assign to every
typology of objects eight binary digits which are different
for each category and easy to remember.
On the subway on my way to work I am likely to extract my
database from my pocket and browse it, to recollect my
history and myself.
I might also retrieve it in front of a colleague of mine
asking me: “How are you?” and my answer: “Good!” would
be accompanied by a rapid slide-show of the recent
photographed objects I have engaged with. The photo of
my hand engaging with the database on the subway would
appear as well (see the first image of the first row in fig. 3).
BACKGROUND
Throughout history humans have been using images as
cognitive tools to assist memory. For this purpose images
have been both mentally internalized and externalized as for
instance in an alphabet (fig. 4).
fig. 2 The images are organized per typology of object starting left to right with the most recent photographed engagement.
fig. 3 The daily sequence of objects used on the 12th of February 2004, see http://www.id.gu.se/~alberto/12.02.04.html.
Besides, as life events within an inorganic context can
repeat themselves identically, the same repetition of a
typology of object can establish a link between situations
distant in time but where I had to follow an identical
procedure.
An example of this would be that of myself brushing my
teeth after every meal. Those identical situations might
frequently look redundant though the photographic
representation of this activity can reveal some unusual
situations. An example could be that of a strangely shaped
toothbrush that immediately reminds me that I used it at a
friend's place when a snow storm prevented me from driving home, so I was invited to stay over and was given a brand
new toothbrush.
fig. 4 A 15th century example of an alphabet where the
representations of objects were used by a priest to remember
his sermons.
In both cases, whether the memory image is internal or
external, a sequential order to move from one image to
another is needed. In the practice of the Ars Memorativa
this order was given by mentally dispersing the images
within a familiar architecture, to then mentally move from
one room to another in a predetermined way [2].
MOTIVATIONS
My method of photographing each object I engage with was
created as a medium to recollect myself and my personal
history, which I see as very fragmented, interrupted by a
technology that allows different realities, different selves
that I am not able to express as a whole.
In today’s artificially mediated reality, a continuous
narration of my life is not possible, too many inorganic
interruptions have occurred. The inorganic interruptions can
be symbolized by the objects I voluntarily or involuntarily,
consciously or unconsciously engage with. The objects are
the artificial tools that allow me to access different contexts.
For example a magnetic card allows me to enter the gym
where I will exercise and become a macho, the mobile
phone allows me to contact my relatives in my native
country and suddenly re-become part of them, a pen allows
me to sign a contract that allows me to get an apartment in
an upper class neighborhood, and so change my social
status.
The objects I have been engaging with throughout my life
symbolize those drastic, therefore inorganic changes.
The objects can then be seen as joining together life events
that most likely have no organic connections between one
another. If my life and my effort of existing is worth being
remembered and communicated, a visual inventory of those
objects is, in my opinion, its most accurate and immediate
representation.
The objects I photograph, while used, represent single
specific activities that from a more general perspective can
visualize how, throughout my life, my intentions, my
desires, my sorrows have mutated. The objects become my
emblems, the code through which the whole of me can be
reconstructed, interpreted.
POTENTIAL
The utility of this mnemonic mechanism can be found in the
way it provides a language of interpretation of a person’s
life. Through genetic code we are able to trace our organic
evolution yet within an inorganic context, genetic code is
not sufficient to trace what I would call our inorganic
evolution, our artificially mediated way of being.
On a macro level, the sequences of objects a hypothetical person might have used could be examined based on their frequency.
On a micro level each object represents an objective
landmark to the psychological, physical state of this person
around the moment in which the object was used.
THE DESIGN
For the physical design of the database the concept was to
make it self-contained, portable and self-sustainable. The
result is a mixture of a few commercially available electronic
accessories that consist of:
• A low cost digital camera, and its battery and charger.
• A handheld PC and its charger.
• A memory card.
fig. 5 The camera and the pocket PC.
The memory card allows me to transfer the images from the camera to the handheld PC, where an application developed within the project will link them to both the day sequence (fig. 3) and the object sequence (fig. 2) via the eight digits I input (fig. 1).
fig. 6 The figure illustrates how the handheld PC buttons are arranged both to catalogue and to show images.
The central button of the handheld PC (fig. 6) allows me to slide-show the images chronologically when going left or right (respectively more remote and more recent). When showing the image of an object, by pressing up or down I can slide-show the images of the same type of object captured respectively after or before. This last method associates those life-events symbolized by the same type of object engagement.
The buttons 0 and 1 allow me to input the eight digits corresponding to a category (fig. 1), and by pressing + on the central button I can retrieve the most recent image of this category. Images are labeled in the same way.
By pressing the white button I switch to label mode. In this mode the application picks the first image stored in the memory card of the portable camera and shows it on screen. The application, after I input 0 and/or 1 digits and press +, updates the database and proposes a new image on screen until there are no more. With the black button I can delete the eight-digit label just entered or, in show mode, exit the program.
The whole configuration is fitted in a pouch around my waist (fig. 7). The pouch has been adapted to be protective, practical to use and easy to wear in both the labeling and showing modes.
fig. 7 The bag.
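A minimal Python sketch of this dual indexing follows; the file names, date format, and data structures are illustrative assumptions, not the actual application running on the handheld PC.

from collections import defaultdict

day_sequence = defaultdict(list)        # date -> photos in chronological order (fig. 3)
object_sequence = defaultdict(list)     # 8-digit category code -> photos over time (fig. 2)

def label_photo(photo_file, date, left_code, right_code):
    """Combine two 4-digit halves (fig. 1) into one 8-digit category code
    and link the photo into both browsing structures."""
    code = left_code + right_code       # e.g. "0001" + "1011" -> "00011011"
    assert len(code) == 8 and set(code) <= {"0", "1"}
    day_sequence[date].append(photo_file)
    object_sequence[code].append(photo_file)
    return code

label_photo("img_3021.jpg", "2004-02-12", "0001", "1011")   # e.g. a toothbrush engagement
print(object_sequence["00011011"])      # all engagements with this typology of object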
CONTEXTUALISATION
My method should be referred to as a strictly visual attempt
build a memory-aid. Other research has been carried out on
the same agenda, also exploring wearable and portable
technologies. However this research has primarily focused
either on text based approaches as in the case of Augmented
Reality where the user annotates conversations [3], or on a
continuous video and/or audio recording of reality as in the
case of wearable surveillance devices [4].
My approach differs from these by being selective in time.
This selection is neither completely subjective nor
completely objective. The usage of an object tells me
exactly when it is time to capture, yet it is my decision to use it.
My approach is not a substitute for my memory, but it assists it by showing a sequential order of activities that triggers an internal recollection of the situations around them.
On the same level, a related portable system was designed at Xerox PARC [5] a decade ago. This system collected graphic icons based on the user's activities within an office space. Although the activities that could be tracked were limited and the graphic icons repetitive, this project was an important predecessor of my work.
CONCEPT IMPLEMENTATION
In a scenario where a whole crowd of people have their own
portable database containing the objects representing their
activities, I can imagine persons encountering each other
and quickly getting an overview of their objects’ similarities
or differences and perhaps distributing themselves
according to those.
If I for instance dislike smokers, I would try to avoid
persons whose database contains many lighters and instead
approach those that have swim goggles (I like to swim).
Still I would be really curious about the persons around me,
diminishing the sense of alienation typical of metropolitans.
The sum of each of our autobiographical databases could
become the medium of a global history, an authentic record
where it would be possible to determine our present
consequences and perhaps even predict future ones. It is my
opinion that this distributed method, where each individual
in society would have to contribute to a global history, is
both a more concrete and less overwhelming possibility
than a surveillance system distributed in the environment.
fig. 8 A visualization of possible interactions between persons carrying the database and encountering each other.
CONCLUSION
To conclude I would like to stress my position that the
design of a memory-aid and sharing of experiences should
involve a high degree of participation from the user without
aiming to simulate his or her life.
On the contrary this design should be of existential and
intellectual stimulus. Existential stimulus in the sense that
the user should look for exciting experiences worth being
remembered and narrated, intellectual stimulus in the sense
that he or she, while retrieving the recorded experiences,
should contribute with his or her own memory.
REFERENCES
1. To access my current project please visit: http://www.id.gu.se/~alberto/
2. Frances A. Yates, The Art of Memory, The University of Chicago Press, Chicago, 1966.
3. Thad Starner, Steve Mann, Bradley Rhodes, Jeffrey
Levine, Jennifer Healey, Dana Kirsch, Rosalind W.
Picard, Alex Pentland, Augmented Reality Through
Wearable Computing, Presence, Vol. 6, No. 4, August
1997, pp. 386-398.
4. Steve Mann, Wearable Computing: A First Step Toward
Personal Imaging, Computer, February 1997, pp. 25-31.
5. Mik Lamming, Mike Flynn, "Forget-me-not" Intimate
Computing in Support of Human Memory, FRIEND21
Symposium, Next Generation Human Interfaces, Tokyo
Japan, 1994.
Sharing Experience and Knowledge with Wearable
Computers
Marcus Nilsson, Mikael Drugge, Peter Parnes
Division of Media Technology
Department of Computer Science & Electrical Engineering
Luleå University of Technology
SE-971 87 Luleå, Sweden
{marcus.nilsson, mikael.drugge, peter.parnes}@ltu.se
ABSTRACT
Wearable computers have mostly been looked at when used in isolation. But a wearable computer with an Internet connection is a good tool for communication and for sharing knowledge and experience with other people. The unobtrusiveness of this type of equipment makes it easy to communicate in most types of locations and contexts. The wearable computer makes it easy to be a mediator of other people's knowledge and to become a knowledgeable user. This paper describes the experience gained from testing the wearable computer as a communication tool and acting as the knowledgeable user at different fairs.
Keywords
Group communication, wearable computer.
INTRODUCTION
Wearable computers can today be built from off-the-shelf equipment and are becoming more commonly used in areas such as construction and health care. Researchers in the wearable computing area believe that the wearable computer will become equipment for everyone, aiding the user throughout the day in areas where computers are better suited than humans, for example memory tasks. Wearable computer research has been focusing on the usage of wearable computers in isolation [5]. The Media Technology group at Luleå University of Technology believes that a major use of the wearable computer will be the connections it makes possible, both with people and with the surrounding environment. Research on this is being conducted in what we call Borderland [12], which concerns the wearable computer and the tools it needs to communicate with people and technology. A wearable computer with a network connection makes it possible to communicate with people at distant locations, independent of the user's current location. This is of course possible today with mobile phones, but a significant difference with the wearable computer is the possibility of a broader use of media and the unobtrusiveness of using a wearable computer.
One of the goals for wearable computers is that the user can operate them without diminishing his presence in the real world [4]. This, together with the wearable computer as a tool for rich1 communication, makes new ways of communicating possible. A wearable computer user could become a beacon of several people's knowledge and experience, a knowledgeable user. The wearable computer would not just be a tool for receiving expert help [8] but a tool to give other people the impression that the user has the knowledge himself.
The research questions this brings forward include by what means communication can take place and what types of media are important for this type of communication.
There is also the question of how this way of communicating will affect the participants involved, and what advantages and disadvantages there are with this form of communication.
In this paper we present experiences gained from using wearable computers as a tool to communicate knowledge and experience, from both the user and other participants, over the network or locally.
Environment for Testing
The usage of wearable computers for communication was tested at different fairs that the Media Technology group attended. The wearable computer was part of the group's exhibition and was used to communicate with the immobile part of the exhibition. Communication was also established with remote persons from the group who were not attending the fairs. Both the immobile and remote participants could communicate with the wearable computer user through video, audio and text.
The fairs ranged from small local fairs at the university for attracting new students to bigger fairs where research was presented to investors and other interested parties.
1
By rich we mean that several different media are used, such as audio, video, text, etc.
RELATED WORK
Collaborative work using wearable computers has been discussed in several publications [2, 3, 13]. The work has focused on how several wearable computers and/or computer
users can collaborate. Not much work has been done on how
the wearable computer user can be a mediator for knowledge
and experience of other people. Lyons and Starner's work on capturing the experience of the wearable computer user [10] is interesting, and some of that work can be used for sharing knowledge and experience in real time. But it is also important to consider the other direction, where people are sharing with the wearable computer user.
As pointed out in [5], wearable computers tend to be most
often used in isolation. We believe it is important to study
how communication with other people can be enabled and
enhanced by using this kind of platform.
THE MOBILE USER
We see the mobile user as one using a wearable computer
that is seamlessly connected to the Internet throughout the
day, regardless of where the user is currently situated. In
Borderland we currently have two different platforms which
both enable this; one is based on a laptop and the other is
based on a PDA. In this section we discuss our current hardware and software solution used for the laptop-based prototype. This prototype is also the one used throughout the
remainder of this paper, unless explicitly stated otherwise.
Hardware Equipment
The wearable computer prototype consists of a Dell Latitude C400 laptop with a Pentium III 1.2 GHz processor, 1
GB of main memory and built-in IEEE 802.11b. Connected
to the laptop is a semi-transparent head-mounted display by
TekGear called the M2 Personal Viewer, which provides the
user with a monocular full color view of the regular laptop
display in 800x600 resolution. Fit onto the head-mounted
display is a Nogatech NV3000N web camera that is used to
capture video of what the user is currently looking at or aiming his head at. A small wired headset with an earplug and
microphone provides audio capabilities. User input is received through a PS/2-based Twiddler2 providing a mouse
and chording keyboard via a USB adapter. The laptop, together with a USB hub and a battery for the head-mounted display, is placed in a backpack for convenience of carrying
everything. A battery for the laptop lasts about 3 hours while
the head-mounted display can run for about 6 hours before
recharging is needed. What the equipment looks like when
being worn by a user is shown in figure 1.
Note that the hardware consists only of standard consumer components. While it would be possible to make the wearable computer less physically obtrusive by using more specialized custom-made hardware, that is not a goal in itself at this time. We do, however, try to reduce its size as new
consumer components become available.
Figure 1: The Borderland laptop-based wearable computer.
There is work being done on a PDA-based wearable, which can be seen in figure 2. The goal is that it will be much more useful outside the Media Technology group at Luleå University of Technology and thereby make it possible to do some real-life tests on the knowledgeable user.
Figure 2: The Borderland PDA-based wearable computer.
Software Solution
The commercial collaborative work application Marratech
Pro2 running under Windows XP provides the user with the
ability to send and receive video, audio and text to and from
other participants using either IP-multicast or unicast. In addition to this there is also a shared whiteboard and shared
web browser. An example of what the user may see in his
head-mounted display is shown in figure 3.
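As a rough illustration of the kind of group communication the platform relies on, the following Python sketch sends and receives short text messages over IP multicast. It is shown only to make the idea concrete; it is not Marratech Pro's API, and the group address and port are arbitrary examples.

import socket
import struct

GROUP, PORT = "239.1.2.3", 5007   # example multicast group, not taken from the paper

def sender(text):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(text.encode("utf-8"), (GROUP, PORT))

def receiver():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # join the multicast group so packets from other participants are delivered
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = sock.recvfrom(1500)
        print(addr, data.decode("utf-8"))

# run receiver() in one process and, e.g., sender("hello from the fair") in another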
BEYOND COMMUNICATION
With a wearable computer, several novel uses emerge as a
side effect of the communication ability that the platform allows. In this section we will focus on how knowledge and
experiences can be conveyed between users and remote participants. Examples will be given on how this sharing of
information can be applied in real world scenarios.
2 http://www.marratech.com
Figure 3: The collaborative work application Marratech Pro as seen in the head-mounted display.
Becoming a Knowledgeable User
One of the key findings at the different fairs was how easily a single person could represent the entire research group, provided he was mobile and could communicate with them.
When meeting someone, the wearable computer user could
ask questions and provide answers that may in fact have
originated from someone else at the division. As long as
the remote information, e.g. questions, answers, comments
and advice, was presented to our user in a non-intrusive
manner, it provided an excellent way to make the flow of
information as smooth as possible.
For example, if a person asked what a certain course or program was like at our university, the participants at the division would hear the question as it was asked and could
respond with what they knew. The wearable computer user
then just had to summarize those bits of information in order
to provide a very informative and professional answer.
This ability can be further extended and generalized as in
the following scenario. Imagine a person who is very charismatic, who is excellent at holding speeches and can present
information to an audience in a convincing manner. However, lacking technical knowledge, such a person would not
be very credible when it comes to explaining actual technical
details that may be brought up. If such a person is equipped
with a wearable computer, he will be able to receive information from an expert group of people and should thus be
able to answer any question. In effect, that person will now
know everything and be able to present it all in a credible
manner, hopefully for the benefit of all people involved.
Further studies are needed to find out whether and how this scenario would work in real life — can, for example, an external person convey the entire knowledge of a research group, and can this be done without the other party noticing it? From a technical standpoint this transmission of knowledge is possible with Borderland today, but would an audience socially accept it, or would they feel they are being deceived?
Another, perhaps more important, use for this way of conveying knowledge is in health-care. In rural areas there may
be a long way from hospital to patients’ homes, and resources in terms of time and money may be too sparse to
let a medical doctor visit all the patients in person. However, a nurse who is attending a patient in his home can
use a wearable computer to keep in contact with the doctor who may be at a central location. The doctor can then
help make diagnoses and advise the nurse on what to do.
He can also ask questions and hear the patient answer in
his own words, thereby eliminating risks of misinterpretation and misunderstanding. This allows the doctor to virtually visit more patients than would have been possible using conventional means, it serves as an example on how the
knowledge of a single person can be distributed and shared
over a distance.
Involving External People in Meetings
When in an online meeting, it is sometimes desirable for an ordinary user to be able to jump into the discussion and say a few words. Maybe a friend of yours comes by your office while you are in a conversation with some other people, and you invite him to participate for some reason; maybe he knows a few of them and just wants to have a quick chat. While this is trivial to achieve at a desktop — you just turn over your camera and hand a microphone to your friend — this is not so easily done with a wearable computer, for practical reasons.
Even though this situation may not be common enough to deserve any real attention, we have noticed an interesting trait of mobile users participating in this kind of meetings. The more people you meet when you are mobile, the bigger the chance that some remote participant will know someone among those people, and thus the desire for him to communicate with that person becomes more prevalent. For this reason, it has suddenly become much more important to be able to involve ordinary users — those you just meet by happenstance — in the meeting without any time to prepare the other person for it.
A common happening at the different fairs was that the wearable computer user met or saw a few persons whom some participant turned out to know and wanted to speak with. Lacking any way besides the headset to hear what the remote participants said, the only way to convey information was for our user to act as a voice buffer, repeating the spoken words in the headset to the other person. Obviously, it would have been much easier to hand over the headset, but several people seemed intimidated by it. They would all try on the head-mounted display, but were very reluctant to speak into the headset.3
To alleviate this problem, we found it would likely be very useful to have a small speaker as part of the wearable computer through which the persons you meet could hear the participants. That way, the happenstance meeting can take place immediately and the wearable computer user need not even take part in any way; he just acts as a walking beacon through which people can communicate. Of course, a side effect of this novel way of communicating may well be that the user gets to know the other person as well and thus, in the end, builds a larger contact network of his own.
We believe that with a mobile participant, this kind of unplanned meeting will happen even more frequently. Imagine, for example, all the people you meet when walking down a street or entering a local store. Being able to involve such persons in a meeting the way it has been described here may be very socially beneficial in the long run.
When Wearable Computer Users Meet
Besides being able to involve external persons as discussed
in the section before, there is also the special case of inviting
other wearable computer users to participate in a meeting.
This is something that can be done using the Session Initiation Protocol (SIP)[7].
A scenario that exemplifies when meetings between several
wearable computer users at different locations would be highly
useful is in the area of fire-fighting.4 When a fire breaks out,
the first team of firefighters arrives at the scene to assess the
nature of the fire and proceed with further actions. Often
a fire engineer with expert knowledge arrives at the scene
some time after the initial team in order to assist them. Upon
arrival he is briefed of the situation and can then provide advice on how to best extinguish the fire. The briefing itself is
usually done in front of a shared whiteboard on the side of
one of the fire-fighting vehicles. Considering the amount of
time the fire engineer spends while being transported to the
scene, it would be highly beneficial if the briefing could start
immediately instead of waiting until he arrives.
By equipping the fire engineer and some of the firefighters with wearable computers, they would be able to start communicating as soon as the first team arrives. Not only does this allow the fire engineer to be briefed on the situation in advance, but he can also get a first-person perspective of the scene and assess the whole situation better. Just as in Kraut's work [9], the fire engineer, as an expert, can assist the less knowledgeable before reaching the destination. As the briefing is usually done with the help of a shared whiteboard
— which also exists in the collaborative work application
in Borderland — there would be no conceptual change to
their work procedures other than the change from a physical
3 Another exhibitor of a voice-based application mentioned they had the same problem when requesting people to try it out; in general, people seemed very uncomfortable speaking into unknown devices.
4 This scenario is based on discussions with a person involved in fire-fighting methods and procedures in Sweden.
whiteboard to an electronic one. This is important to stress
— the platform does not force people to change their existing
work behavior, but rather allows the same work procedures
to be applied in the virtual domain when that is beneficial.
In this case the benefit lies in the briefing being done remotely, thereby saving valuable time. It may even be the case that the fire engineer no longer needs to travel physically to the scene,
but can provide all guidance remotely and serve multiple
scenes at once. In a catastrophe scenario, this ability for a
single person to share his knowledge and convey it to people
at remote locations may well help in saving lives.
EVALUATION
Our findings are based on experiences from the fairs and exhibitions we have attended so far, as well as on pilot studies conducted in different situations at our university.
The communication that the platform enables allows a user to receive information from remote participants and convey it to local peers. As participants can get a highly realistic feeling of “being there” when experiencing the world
from the wearable computer user’s perspective, the distance
between those who possess knowledge and the user who
needs it appears to shrink. Thus, not only is the gap of physical distance bridged by the platform, but so is the gap of
context and situation.
While a similar feeling of presence might be achieved through
the use of an ordinary video camera that a person is carrying around together with a microphone, there are a number
of points that dramatically set the wearable computer user apart from such a setup.
• The user will eventually become more and more used to
the wearable computer, thus making the task of capturing
information and conveying this to other participants more
of a subconscious task. This means that the user can still
be an active contributing participant, and not just someone
who goes around recording.
• As the head-mounted display aims in the same direction as
the user's head, a more realistic feeling of presence is conveyed, as subtle glances, deliberate stares, seeking looks, and other kinds of unconscious behavior are transmitted. The camera movement and what is captured on video thus become more natural in this sense.
• The participants could interact with the user and tell him
to do something or go somewhere. While this is possible even without a wearable computer, this interaction in combination with the already existing feeling of presence amplified the experience considerably. Not only did they experience
the world as seen through the user’s eyes, but they were
now able to remotely “control” that user.
The Importance of Text
Even though audio may be well suited for communicating
with people, there are occasions where textual chat is preferable. The main advantage of text, as we see it, is that unlike audio, the processing of the information can be postponed until later. This has three consequences, all of which
are very beneficial for the user.
1. The user can choose when to process the information, unlike a voice, which requires immediate attention. This also means processing can be done in a more arbitrary, non-sequential order than audio allows.
2. The user may be in a crowded place and/or talk to other
people while the information is received. In such environments, it may be easier to have the information presented
as text rather than in an audible form, as the former would
interfere less with the user’s normal task.
3. The text remains accessible for a longer period of time, meaning the user does not need to memorize the information at the pace it is given. For things such as URLs, telephone numbers, mathematical formulas and the like, a textual representation is likely to be of more use than the same spoken information.
While there was no problem in using voice when talking
with the other participants, on several occasions the need
to get information as text rather than voice became apparent. Most of the time, the reason was that while in a live
conversation with someone, the interruption and increased
cognitive workload placed upon the user became too difficult to deal with. In our case, the user often turned off the
audio while in a conversation so as not to be disturbed. The
downside of this was that the rest of the participants in the
meeting no longer had any way of interacting or providing
useful information during the conversation.5
There may also be privacy concerns that apply; a user standing in a crowd or attending a formal meeting may need to
communicate in private with someone. In such situations,
sending textual messages may be the only choice. This means
that the user of a wearable computer need not only be able
to receive text, he must also be able to send it. We can even imagine a meeting with only wearable computer participants, which makes it clear that sending text will definitely remain an important need.
Hand-held chord keyboards such as the Twiddler have been shown to give good results for typing [11]. But these types of devices still take time to learn, and for those who seldom need to use them, the motivation to learn to type efficiently may never come. Other alternatives that provide a regular keyboard layout, such as the Canesta Keyboard™ Perception Chipset™, which uses IR to track the user's fingers on a projected keyboard, also exist and may well be a viable option. Virtual keyboards shown on the display may be another
alternative and can be used with a touch-sensitive screen or
eye-tracking software in the case of a head-mounted display.
Voice recognition systems translating voice to text may be of
some use, although these will not work in situations where
5 This was our first public test of the platform in an uncontrolled environment, so none of the participants was sure of the best thing to do in the hectic and more or less chaotic world that emerged. Still, much was learnt thanks to exactly that.
privacy or quietness is of concern. It would, of course, also
be possible for the user to carry a regular keyboard with him,
but that can hardly be classified as convenient enough to be
truly wearable.
There is one final advantage of text compared to audio, and that is its lower bandwidth requirement. On some occasions there may simply not be enough bandwidth, or the bandwidth may be too expensive, for communicating by any means other than text.
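As a back-of-the-envelope illustration (the codec bit rate below is a typical figure, not a measurement from our platform), the following snippet contrasts the data volume of one minute of speech with that of a single chat line.

```java
// Rough bandwidth comparison between speech and text chat.
// The bit rate is a typical speech-codec figure, not a measurement from the platform.
public class BandwidthSketch {
    public static void main(String[] args) {
        int speechKbps = 13;                       // e.g., a GSM-style speech codec
        int seconds = 60;
        long speechBytes = (long) speechKbps * 1000 / 8 * seconds;

        String chatLine = "Meet me at booth 42, the demo starts at 14:00.";
        int chatBytes = chatLine.getBytes().length;

        System.out.println("One minute of speech: ~" + speechBytes + " bytes");
        System.out.println("One chat line:        " + chatBytes + " bytes");
    }
}
```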
Camera and Video
Opinions about the placement of the camera on the user’s
body varied among the participants. Most of them liked having the camera always pointing in the same direction as the
user’s head, although there were reports of becoming disoriented when the user turned his head too frequently.
Some participants wanted the camera to be more body stabilized, e.g. mounted on the shoulder, in order to avoid this
kind of problem. While this placement would give a more
stable image it may reduce the feeling of presence as well
as obscure the hints of what catches the user’s attention. In
fact, some participants expressed a desire to be given an even
more detailed view of what the user was looking at by tracking his eye movements, as that is something which can not
be conveyed merely by having the camera mounted on the
user's head. As Fussell et al. point out [6], head-mounted cameras come with problems of their own. Some of these problems may be solved by changing the camera's placement on the body. However, further studies are needed before any real conclusions can be drawn about the effects of the different choices in this kind of situation.
Some participants reported a feeling of motion sickness at the higher framerate (about 5 Hz), and for that reason preferred a lower framerate (about 1 Hz), which provided almost a slideshow of still images. However, those who had no tendency toward motion sickness preferred as high a framerate as possible, because otherwise it became difficult to keep track of the direction when the user moved or looked around suddenly.
In [1] it is stated that a high framerate (15 Hz) is desirable
in immersive environments to avoid motion sickness. This
suggests that our notion of a high framerate was still too low, and that increasing it further might have helped eliminate this kind of problem.
Microphone and Audio
Audio was deemed very important. Through the headset microphone the participants would hear much of the random noise from the remote location as well as discussions with persons the user met, thereby enhancing the feeling of “being there” tremendously.
Of course, there are also situations in which participants are only interested in hearing the user when he speaks, thereby pointing out the need for good silence suppression to reduce any background noise.
Transmission of Knowledge
Conveying knowledge to a user at a remote location seems
in our experience to be highly useful. So far, text and audio
have most of the time been enough to provide a user with
the information needed, but we have also experienced a few
situations calling for visual aids such as images or video.
CONCLUSIONS
We have presented our prototype of a mobile platform in the form of a wearable computer that allows its user to communicate with others. We have discussed how remote participants can provide a single user with information in order to represent a larger group, and also how a single expert user can share the knowledge he possesses in order to assist multiple persons at a distance. The benefits of this sharing have been exemplified with scenarios taken from health-care and fire-fighting situations. The platform serves as a proof-of-concept that this form of communication is possible today.
Based on experiences from fairs and exhibitions, we have
found and identified a number of areas that need further refinement in order to make this form of communication more
convenient for everyone involved. The importance of text
and the configuration and placement of video has been discussed.
The equipment used in these trials is not very specialized and can be bought and built by anyone. The big challenge in wearable computing today lies in how it is used, and in this paper we have presented one such use: the wearable computer as a tool for sharing knowledge and experience.
Future Work
We currently lack quantitative measures for our evaluation. For this, a wearable computer that ordinary people will accept and use in their everyday life is needed. We believe that the PDA-based wearable mentioned earlier in this paper is that kind of wearable computer, and we plan to conduct user tests for some of the scenarios mentioned earlier in the paper.
There are also plans to extend the prototype with more tools for improving the sharing of experience and knowledge. One feature currently being worked on is a telepointer overlaid on the video, so that distant participants can show the wearable computer user what they are talking about or what has their attention at the moment.
ACKNOWLEDGEMENTS
This work was sponsored by the Centre for Distance-spanning Technology (CDT) and Mäkitalo Research Centre (MRC) under the VINNOVA RadioSphere and VITAL project, and by the Centre for Distance-spanning Health care (CDH).
REFERENCES
1. Bierbaum, A., and Just, C. Software tools for virtual reality application development, 1998. Applied Virtual Reality, SIGGRAPH 98 Course Notes.
2. Billinghurst, M., Bowskill, J., Jessop, M., and Morphett, J. A wearable spatial conferencing space. In Proc. of the 2nd International Symposium on Wearable Computers (1998), pp. 76–83.
3. Billinghurst, M., Weghorst, S., and Furness, T. A. Wearable computers for three dimensional CSCW. In Proc. of the International Symposium on Wearable Computers (1997), pp. 39–46.
4. Brewster, S., Lumsden, J., Bell, M., Hall, M., and Tasker, S. Multimodal 'eyes-free' interaction techniques for wearable devices. In Conference on Human Factors in Computing Systems (2003), pp. 473–480.
5. Fickas, S., Kortuem, G., Schneider, J., Segall, Z., and Suruda, J. When cyborgs meet: Building communities of cooperating wearable agents. In Proc. of the 3rd International Symposium on Wearable Computers (October 1999), pp. 124–132.
6. Fussell, S. R., Setlock, L. D., and Kraut, R. E. Effects of head-mounted and scene-oriented video systems on remote collaboration on physical tasks. In CHI 2003 (April 2003).
7. Handley, M., Schulzrinne, H., Schooler, E., and Rosenberg, J. SIP: Session Initiation Protocol, March 1999. IETF RFC 2543.
8. Kortuem, G., Bauer, M., Heiber, T., and Segall, Z. Netman: The design of a collaborative wearable computer system. ACM/Baltzer Journal on Mobile Networks and Applications (MONET) 4, 1 (1999).
9. Kraut, R. E., Miller, M. D., and Siegel, J. Collaboration in performance of physical tasks: Effects on outcomes and communication. In Computer Supported Cooperative Work (1996).
10. Lyons, K., and Starner, T. Mobile capture for wearable computer usability testing. In International Symposium on Wearable Computers (ISWC 2001) (October 2001), pp. 69–76.
11. Lyons, K., Starner, T., Plaisted, D., Fusia, J., Lyons, A., Drew, A., and Looney, E. Twiddler typing: One-handed chording text entry for mobile phones. Technical report, Georgia Institute of Technology, 2003.
12. Nilsson, M., Drugge, M., and Parnes, P. In the borderland between wearable computers and pervasive computing. Research report, Luleå University of Technology, 2003. ISSN 1402-1528.
13. Siegel, J., Kraut, R. E., John, B. E., and Carley, K. M. An empirical study of collaborative wearable computer systems. In Conference Companion on Human Factors in Computing Systems (1995), ACM Press, pp. 312–313.
Sharing Multimedia and Context Information Between
Mobile Terminals
Jani Mäntyjärvi, Heikki Keränen, and Tapani Rantakokko
VTT Electronics, Technical Research Centre of Finland,
P.O. Box 1100, FIN-90571 Oulu, Finland
{Heikki.Keranen, Tapani.Rantakokko, Jani.Mantyjarvi}@vtt.fi
ABSTRACT
Mobile terminal users have needs for sharing experiences
and common interests in a context sensitive manner.
However, because the creation, delivery, and access of multimedia are currently divided among separate applications, much user effort is needed to communicate efficiently. In this paper an approach to a user interface for mobile terminals for sharing multimedia and context information is presented and discussed. A map-based interface and a domain-object-model-based user interface technique are utilized.
INTRODUCTION
Sharing of experiences using mobile technology is
becoming more common since current mobile terminals
enable capturing and delivery of multimedia content.
However, due to the physical limitations of mobile terminals in presenting and processing multimedia, they require particular user interface (UI) solutions. Current user interfaces do not provide means to share multimedia content effectively in real time, since creating, delivering, and managing multimedia documents requires considerable effort.
Context awareness of mobile terminals enables novel
dimensions for mobile communication. Mobile terminals
can share and present contexts by showing contexts of their
members as symbols in a phonebook [12]. The sharing of
context information enables the extension of the basic
applications of mobile terminals with context features, for
example context based call operation [13] and messaging
[7]. Sharing of context information creates potential for
more efficient multimedia distribution, augmentation, and
content management.
In this paper we present and discuss an approach for a user
interface that supports the presentation and sharing of
multimedia and context information together on a context
aware map. Furthermore, we discuss technologies for enabling the user interface solution. A UI solution for an online
community is presented in more detail in [15].
TECHNIQUES FOR CREATING AND SHARING MULTIMEDIA
Crossing application boundaries
Applications are an artificial concept of computer science
and for users there are often artificial boundaries between
applications. In our case distinct applications exist for map-
based positioning, taking photos or video shots, playing the
media, sharing the media files created and showing the
current context of each user.
A great amount of user effort is required to cross those boundaries, as discussed in [10]. To deliver
information to a community about what is happening and at
which location a user needs to copy location information
from the positioning application and a media file from the
camera application to a message to be sent in an instant
messaging application. On the receiving side user effort is
needed to figure out the user's position relative to the
position of his friend, because the position of his friend is in
the received message in the messaging application and his
own position is in the positioning application. Further effort
is needed to figure out what his friend is doing right now by
looking at the context sensitive phonebook if the sender
didn't bother to write it directly in the message.
As a solution to the problems caused by applications, Raskin stated that there should not be any separate applications, but rather objects and operations that can manipulate those objects [10]. One of the intriguing technologies in this direction is the Naked Objects framework [8], which maintains a one-to-one correspondence between the domain (or business) object model and the UI by generating the UI automatically from the domain object model.
Our object model for enabling multimedia communication
in an online community consists of people in the
community, the multimedia files they create and share, and a
shared map acting as a container object for people and
multimedia files.
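To make this object model more concrete, the plain-Java sketch below shows the three kinds of domain objects just described. The class and field names are our own illustration; the actual implementation, built on the Naked Objects framework discussed above, differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the domain object model: people, the media they share,
// and a shared map that acts as a container for both. Names are illustrative.
class Person {
    String name;
    double latitude, longitude;     // current position
    String activity;                // e.g., "Walking" (see Table 1)
    List<MediaItem> createdMedia = new ArrayList<>();
}

class MediaItem {
    String title;
    String mimeType;                // e.g., "video/3gpp"
    double latitude, longitude;     // where it was created
    String contextAtCreation;       // e.g., "Loud, Bright"
    Person author;
}

class SharedMap {
    List<Person> members = new ArrayList<>();
    List<MediaItem> mediaObjects = new ArrayList<>();

    // Adding a media item to the map also records it for its author.
    void add(MediaItem item) {
        mediaObjects.add(item);
        item.author.createdMedia.add(item);
    }
}
```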
Objects in the UI should look and function in a similar way
regardless of the context where they are used and the size
rendered [10]. By using icons it is possible to present
objects in a very small space [3], which is important to fit
more objects to a map being displayed on a small screen. By
using the same icon in larger representations of the object
the user easily associates the object with the one presented
by the small icon.
Maps
Geographical maps have unique advantages: they are direct representations of the real world, are already familiar to users, and exploit human spatial memory. Positioning
applications showing your place and route to a destination
have been popular especially in car and boat navigation. The
impression of connectivity to the real world can be enhanced by using positioning techniques to provide a real-time, up-to-date "you are here" position symbol on the map, and by using an electronic compass to keep the map parallel to the real world regardless of device orientation.
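The sketch below illustrates the kind of computation this involves: converting a geographic offset from the map centre into screen coordinates and rotating the view by the compass heading so that the map stays parallel to the real world. The simple equirectangular projection and the parameter names are our own simplifications, not the prototype's actual code.

```java
// Sketch: place a "you are here" symbol and keep the map aligned with the
// real world using a compass heading. Equirectangular approximation only;
// names and parameters are illustrative, not taken from the prototype.
public class MapAlignmentSketch {
    static final double METERS_PER_DEG_LAT = 111_320.0;

    // Convert a position to screen pixels relative to the map centre,
    // rotating by the compass heading (degrees clockwise from north).
    static double[] toScreen(double lat, double lon,
                             double centerLat, double centerLon,
                             double headingDeg, double pixelsPerMeter) {
        double dy = (lat - centerLat) * METERS_PER_DEG_LAT;
        double dx = (lon - centerLon) * METERS_PER_DEG_LAT
                    * Math.cos(Math.toRadians(centerLat));
        double a = Math.toRadians(-headingDeg);   // rotate the map against the heading
        double sx = dx * Math.cos(a) - dy * Math.sin(a);
        double sy = dx * Math.sin(a) + dy * Math.cos(a);
        // Screen y grows downwards, so what is ahead of the user ends up at the top.
        return new double[] { sx * pixelsPerMeter, -sy * pixelsPerMeter };
    }

    public static void main(String[] args) {
        double[] p = toScreen(65.0601, 25.4700, 65.0600, 25.4690, 30.0, 1.0);
        System.out.printf("friend at %.1f, %.1f pixels from centre%n", p[0], p[1]);
    }
}
```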
The idea of capturing position and context during
multimedia creation and using that information to lay multimedia objects onto geographical maps has been used successfully for multimedia retrieval [1]. It is easy to find images and video clips about a certain situation or place from a map.
The distinction from real-world and augmented reality solutions is that maps help users to see farther than physically possible and to get an overview of the environment faster than physically possible. This feature has been utilised for many navigational purposes, such as finding a route from place to place.
Geographical maps are also useful for presenting and
finding electronic services having an unambiguous
geographical location [9], as opposed to some kind of context-aware menu that is constantly changing while you are walking in the city.
Maps have the ability to visualise very heterogeneous objects, whether physical, like people, or immaterial, like video clips. The only requirement is that objects must have
location information. Putting heterogeneous objects from
different sources to a geographical map can help the user to
get a good overview of how things are related to each other,
which may help in decision making.
We utilise this feature in our user interface. In our case there
are terminals which share their context information and the multimedia objects they have created on a map online. Minimal user effort is needed to communicate their position, their context, and the context of the multimedia created. New media objects can be represented by a blinking icon, and when the map becomes crowded the oldest media objects can be removed from the map, much as instant messaging applications remove the oldest messages.
Here we discuss the presentation of context information available in mobile terminals to support online communities.
As discussed in the previous section, the UI solution for online communities should present many types of information, including various multimedia documents, context information, and group interests and preferences, in an online manner, and at the same time keep the UI clear and easy to use.
Context information represents the current state of the
object or its environment and can be presented as pictures.
The classification of UI pictures for small interfaces is
provided in [4]. Their explanation, based on [2], indicates that the picture classes for small UIs are Iconic, Index, and Symbolic pictures. Most UI pictures are Index pictures, as they are associated with a function.
In the work of Schmidt et al. [12], availability and location information are presented as pictures in the phonebook.
availability is presented as Symbolic color codes similar to
traffic lights while the location is presented as Index
pictures of a house indicating ‘at home’, a factory indicating
‘at work’ and a car indicating ‘on the way’.
In our UI context information describing a person's state is
coded into the Iconic picture of that person as presented in
Table 1. Animation can be used to reflect user activity like
walking, running etc. People can express themselves by
selecting the icon set representing them, which brings
challenges and possibilities for graphic artists.
Table 1. Context information with classes used in the user interface.
  User activity:    Standing, Walking, Running, Chatting
  Environment:      Silent, Loud, Dark, Bright, Cold, Warm, Hot
  Device activity:  Call, Browse, Chat, Idle
CONTEXT INFORMATION
A mobile terminal may be aware of the context of its user
[6,11]. Data provided by several onboard sources, e.g.
various types of sensors, and remote sources, e.g. location
services, can be processed into a context representation in which context abstractions describe concepts from the real world, for example loud, warm, or at home. This facilitates the utilization of context information, e.g., in various applications and in communicating context to other terminals [5,6]. Describing the context information using a commonly agreed ontology is one way to achieve this. The sharing of context information between several terminals can be realized using the latest standard communication protocols, e.g., GPRS, third-generation networks, and Bluetooth.
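As an illustration of how raw sensor readings might be abstracted into such shared terms, the sketch below maps a few numeric values onto labels like those in Table 1. The thresholds and vocabulary are invented for the example; in a real system they would be taken from the commonly agreed ontology.

```java
// Sketch: abstracting raw sensor values into context labels that can be
// shared between terminals. Thresholds and labels are illustrative only;
// in practice they would come from a commonly agreed context ontology.
public class ContextAbstractionSketch {
    static String soundLevel(double dB)       { return dB > 70 ? "Loud" : "Silent"; }
    static String illumination(double lux)    { return lux > 200 ? "Bright" : "Dark"; }
    static String temperature(double celsius) {
        if (celsius < 10) return "Cold";
        if (celsius > 25) return "Hot";
        return "Warm";
    }

    public static void main(String[] args) {
        // Example readings from onboard sensors.
        String context = soundLevel(82) + ", " + illumination(40) + ", " + temperature(18);
        System.out.println("Shared environment context: " + context);  // Loud, Dark, Warm
    }
}
```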
Context information related to the environment and the device is more challenging, because these are not first-class objects with icons in the UI. Therefore we present context information related to the environment and the device only on request, as Index and Symbolic icons (Table 1), in the same way as done in [12].
PROTOTYPE
We have created a context-aware map-based interface for
accessing situated services with mobile terminals [9]. The
current prototype, which is built on the Compaq iPAQ 3660
PDA, includes positioning via WLAN and context based
control via an external sensor box [14]. XML-based maps
are rotated with the aid of a compass sensor, and zooming
and scrolling can be performed by a user's gestures derived
from proximity and accelerometer sensors, respectively. An
ontology for describing sensor-based context information is
used in sharing context data [16].
We are exploring the Naked Objects framework [8] as a user interface solution for interacting with objects, and we are extending the Naked Objects platform by implementing an object viewing mechanism (OVM) for PocketPC-style devices, because the original framework contains an OVM only for the desktop PC.
Figure 2. A screenshot of online information sharing (context information window). Context data is shown with clear visualisations.
With the map-based interface, the author and current
location are associated with the multimedia document
(Fig.1b), and the document is added on the map (presented
as an icon, Fig. 1a). Fig. 1a presents a map-based view seen
by User 1. His position is shown in the middle of the screen,
in the center of sight. Two other users are also in the visible
area, and part of their route is illustrated with broken lines.
The pull-down menu shows available communication
operations. The context view (Fig. 1c) shows the detailed
context. The context represented by several symbols
provides a partial description of the situation, yet the
interpretation and understanding of the overall situation is
the user’s task.
In Fig. 2 a screenshot of online information sharing is presented using a context information window. Context data is shown with clear visualisations.
Figure 1. Screenshots of a UI. A user has created a video
clip and placed it on a map. Arrows between screenshots
describe navigation achieved by clicking at the starting
point of the arrow.
The representation and sharing of multimedia and context information with this UI solution does not require any effort to switch between various types of applications. To get rid of the concept of applications, which creates artificial boundaries for users, a lot of research and development is needed to make current computing systems support the division of software into objects and operations.
SUMMARY AND DISCUSSION
A UI solution for mobile terminals for presenting and sharing multimedia with context information has been introduced. The approach utilizes object-oriented UI techniques and a shared geographical map to present multimedia objects and the contexts of group members in the same view. The UI solution satisfies the following needs:
• Sharing interesting findings from the environment by using multimedia, and effortless communication of the current group situation.
• Multimedia documents are presented on the map as icons to compress the information representation and to provide easy access to the full content of objects.
• Online sharing of context information (activity, device and environment) with simple but descriptive symbols.
One concern with this approach is that the map becomes crowded due to active multimedia production and the bringing of other objects and services to the map. This can be helped to some extent by map labeling algorithms, but some kind of map filtering methods will be needed in the long run. Other issues requiring more attention (for the technical implementation) include:
• deciding how the messages and context information are delivered and stored in the network,
• where the maps are loaded,
• who creates maps, and in which format.
Moreover, aspects that need further investigation comprise:
• how the access of users to shared information can be limited,
• how to handle terminals that do not have a map-based interface,
• how to provide support for representing more multiform context information.
In the future, we will continue the integration of the map interface with the Naked Objects platform. Moreover, user tests are required to obtain experiences in real usage situations and an understanding of the symbols used.
REFERENCES
1. Hewagamage, K.P., Hirakawa, M., "Augmented Album: situation-dependent system for a personal digital video/image collection", IEEE Intl. Conference on Multimedia and Expo, Vol. 1, pp. 232-236, 2000.
2. Hietala, V., Kuvien todellisuus, Gummerus, Helsinki, 1993 (in Finnish).
3. Horton, W., "Designing Icons and Visual Symbols", Proc. of CHI '96, ACM Press, New York, USA, pp. 371-372, 1996.
4. Makarainen, M., Isomursu, P., "Exploiting Multimedia Components in Small User Interfaces", IEEE Intl. Conference on Multimedia and Expo, Vol. 1, pp. 749-752, 2002.
5. Mäntyjärvi, J., et al., "Collaborative Context Determination to Support Mobile Terminal Applications", IEEE Wireless Communications, Vol. 9(5), New York, pp. 39-45, 2002.
6. Mäntyjärvi, J., Seppänen, T., "Adapting Applications According to Fuzzy Context Information", Interacting with Computers, Vol. 15(3), Elsevier, Amsterdam, to appear, 2003.
7. Nakanishi, Y., et al., "Context-aware Messaging Service: A Dynamical Messaging Delivery using Location Information and Schedule Information", Journal of Personal Technologies, Vol. 4, Springer, pp. 221-224, 2000.
8. Pawson, R., and Matthews, R., "Naked Objects: A Technique for Designing More Expressive Systems", ACM SIGPLAN Notices, Vol. 36(12), New York, USA, pp. 61-67, 2001.
9. Rantakokko, T., and Plomp, J., "An Adaptive Map-Based Interface for Situated Services", submitted to Smart Objects Conference, Grenoble, France, 2003.
10. Raskin, J., The Humane Interface: New Directions for Designing Interactive Systems, ACM Press, 2000.
11. Schmidt, A., et al., "Advanced Interaction in Context", LNCS 1927, 2nd Intl. Symposium on Hand Held and Ubiquitous Computing, pp. 89-101, 1999.
12. Schmidt, A., et al., "Context-Phonebook - Extending Mobile Phone Applications with Context", 3rd Intl. Workshop on HCI with Mobile Devices, Lille, France, 2001.
13. Schmidt, A., et al., "Context-Aware Telephony over WAP", Personal Technologies, Vol. 4(4), pp. 225-229, 2000.
14. Tuulari, E., and Ylisaukko-oja, A., "SoapBox: A Platform for Ubiquitous Computing Research and Applications", Intl. Conference on Pervasive Computing, Zürich, Switzerland, pp. 125-138, 2002.
15. Keränen, H., Rantakokko, T., Mäntyjärvi, J., "Presenting and Sharing Multimedia within Online Communities Using Context-Aware Mobile Terminals", IEEE International Conference on Multimedia and Expo, Vol. 2, pp. 641-644, 2003.
16. Korpipää, P., Mäntyjärvi, J., Kela, J., Keränen, H., Malm, E.J., "Managing Context Information in Mobile Devices", IEEE Pervasive Computing, Vol. 2(3), pp. 42-51, 2003.
Using an Extended Episodic Memory Within a Mobile
Companion
Alexander Kröner, Stephan Baldes, Anthony Jameson, and Mathias Bauer
DFKI, German Research Center for Artificial Intelligence
Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany
<first name>.<last name>@dfki.de
ABSTRACT
We discuss and illustrate design principles that have
emerged in our ongoing work on a context-aware, user-adaptive mobile personal assistant in which an extended episodic memory—the personal journal—plays a central role. The prototype system SPECTER keeps track of its user's actions and affective states, and it collaborates with the user to create a personal journal and to learn a persistent user model. These sources of information in turn allow SPECTER to help the user with the planning and execution of actions, in particular in instrumented environments. Three principles appear to offer useful guidance in the design of this and similar systems: 1. an emphasis on user-controlled collaboration as opposed to autonomous system
initiatives; 2. provision of diverse, multiple benefits to the
user as a reward for the effort that the user must inevitably
invest in collaboration with the system; and 3. support for
diverse forms of collaboration that are well suited to different settings and situations. We illustrate the way in which
these principles are guiding the design of S PECTER by discussing two aspects of the system that are currently being implemented and tested: (a) The provision of multiple,
qualitatively different ways of interacting with the personal
journal allows the user to contribute to its construction in
various ways, depending on the user’s current situation—
and also to derive multiple benefits from the stored information. (b) S PECTER’s collaborative methods for learning
a user model give the user different ways in which to contribute essential knowledge to the learning process and to
control the content of the learned model.
INTRODUCTION
There is growing agreement, reflected in the very existence
of this workshop, that an extended episodic memory can
constitute a valuable component of systems that serve as
personal companions. But there remain numerous open
questions about how such a memory can be acquired and
exploited. How much work will the user have to do to
ensure that the extended memory is sufficiently complete
and accurate; what form should this work take; and how
can the user be motivated to do it? How can the system
analyze the contents of the episodic memory so as to learn
useful regularities that can in turn be exploited for the assistance of the user?
In this contribution, we discuss and illustrate three of the principles that we have found useful in our ongoing work on the prototype system SPECTER. After sketching SPECTER's functionality and comparing it with that
of some representative related systems, we formulate and
briefly justify three principles which appear to constitute a
useful approach to addressing these requirements. Then,
two aspects of S PECTER are discussed which illustrate
how these principles can serve as a guide to the many interrelated design decisions that need to be made with systems
that feature extended episodic memories.
BRIEF OVERVIEW OF SPECTER
Basic Functionality
S PECTER is a mobile personal assistant that is being developed and tested for three mutually complementary scenarios, involving shopping, company visits, and interaction at
trade fairs. S PECTER exhibits the following characteristic
set of interrelated functions: It extends its user’s perception
by acquiring information from objects in instrumented environments and by recording (to the extent that is feasible)
information about the user’s actions and affective states. It
builds up a personal journal that stores this information.
It uses the personal journal as a basis for the learning of
a user model, which represents more general assumptions
about the user (e.g., the user’s preferred ways of performing particular tasks). S PECTER refers to the information in
the personal journal and user model when helping the user
(a) to create plans for future actions and (b) to adapt and
execute these plans when the time comes.
Relationships to Previous Work
The idea of building up a personal journal figured prominently in the early system Forget-Me-Not ([4]), though at that time the technology for communicating with objects in instrumented environments and for sensing the user's affective states was much less well developed than it is now. The much more recent project MyLifeBits ([3]) has similarly explored the possibility of maintaining an extensive record of a user's experience, but here the emphasis is more on managing recordings of various sorts (e.g., videos) than on storing more abstract representations. The idea of having a personal assistant learn a persistent user model can be found to some extent in many systems, such as the early Calendar Apprentice ([6]); but these systems have not used multifaceted personal journals as a basis for the learning, and there has been little emphasis on involving the user in the learning process. The idea of providing proactive, context-dependent assistance is reflected in many context-aware systems, such as shopping assistants and tourist guides; but there is much less emphasis on basing such assistance on a rich user model or on active collaboration by the user. The idea of collaboration between system and user has been emphasized in several projects; in particular, the COLLAGEN framework (see, e.g., [8], [9]) is being used explicitly within SPECTER.
DESIGN PRINCIPLES
In this section, we present and briefly justify three design principles which have proven useful in the design of SPECTER and which should be applicable to some extent to related systems.
User-System Collaboration as the Basic Interaction Model
In the foreseeable future, no computing device will be able to perform the functions listed above without receiving a significant amount of help from the user at least some of the time. For example, a system cannot in general record all actions of the user that do not involve an electronic device; and the user's affective reactions and evaluations are even more likely to be unrecognizable. Therefore, the user will have to provide some explicit input if a useful personal journal is to be built up. More generally, it is realistic to see each of the general functions served by the system as involving collaboration between system and user, although the exact division of labor can vary greatly from one case to the next.
A different justification for an emphasis on collaboration is the assumption that, even in cases where help by the user is not required, users will often want to be involved in the system's processing to some extent, so as to be able to exert some control over it.
Provision of Multiple Functions
The collaboration effort implied by the previous principle will not in general be invested by users unless they see the effort as (indirectly) leading to benefits that clearly justify the investment. Designers of a system that requires such collaboration should therefore try to ensure that the system provides multiple benefits as a reward for the user's investment. Even if only one or two particular types of benefit constituted the original motivation for the system's design, it may be possible and worthwhile to look for additional functions that take advantage of the same user input.
Flexible Scheduling and Realization of Collaboration
In addition to the user’s motivation, another obstacle to obtaining adequate collaboration from the user is created by
situational restrictions. For example, when the user is performing attention-demanding activities and/or interacting
with S PECTER via a limited-bandwidth device, she may
be able to provide little or no input.
A strategy for overcoming this problem is to look for ways
of shifting the work required by the user to a setting where
the user will have more attentional and computational resources available. For example, if the system can help the
user to plan a shopping trip in advance, she may be able
to use a high-bandwidth device and to supply information
that she would not have time to supply while actually doing
the shopping. Similarly, if S PECTER makes it worthwhile
for the user to look back reflectively at the shopping trip
after completing it, the user may be able to fill in some of
the gaps in the record that the system has built up about the
trip.
INTERACTION WITH THE PERSONAL JOURNAL
In accordance with the principle of multi-functionality, the
data collection represented by the personal journal should
be exploited in various ways. By providing methods for information retrieval, the journal may serve as an extension of the user's personal memory for individual events and objects. A quite different type of application may use these
objects. A quite different type of application may use these
data to provide feedback on how the user is spending her
time, suggesting how she could adjust her time allocation
in order to achieve her goals more effectively. Yet another
way of exploiting the personal journal, discussed in the
final major section below, is for S PECTER to mine its data
in order to learn regularities that can serve as a basis for
assistive actions.
The basis of these high-level interactions is formed by so-called journal entries, which are created either from signal input retrieved from an instrumented environment or by means of abstraction. In the former case, fine-grained symbolic data are taken as input from sensors and are directly stored in the journal. An exemplary setup of such an environment
is shown in Figure 1, where SPECTER has been connected with the RFID infrastructure created in the project REAL ([10]). The recorded signals serve as input for abstraction methods, which may range from syntactical, hard-coded translation to machine learning techniques.
Figure 1: An environment for testing SPECTER with RFID input: an instrumented shelf with RFID-enriched products. On the right-hand side is a laptop which handles shop communication and provides SPECTER with input. End-user display and interaction are performed via a PDA.
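A minimal sketch of what a journal entry produced from such input might look like is given below. The field names and the hard-coded abstraction rule are our own illustration, not SPECTER's actual data model.

```java
import java.util.Date;

// Sketch of a journal entry derived from instrumented-environment input.
// Field names and the abstraction rule are illustrative, not SPECTER's schema.
class JournalEntry {
    Date timestamp;
    String source;        // e.g., "RFID shelf"
    String rawEvent;      // fine-grained symbolic data from the sensor
    String abstraction;   // higher-level description derived from the raw event

    JournalEntry(String source, String rawEvent) {
        this.timestamp = new Date();
        this.source = source;
        this.rawEvent = rawEvent;
        // Hard-coded "syntactical" abstraction; machine learning could replace this.
        this.abstraction = rawEvent.startsWith("PICK_UP")
                ? "User examined a product" : "Unclassified event";
    }
}

public class JournalEntrySketch {
    public static void main(String[] args) {
        JournalEntry e = new JournalEntry("RFID shelf", "PICK_UP tag=4711");
        System.out.println(e.timestamp + " | " + e.source + " | " + e.abstraction);
    }
}
```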
Journal entries are usually created automatically, which leads to several requirements on SPECTER's user interface. Firstly, the kind of data recorded, and potentially the way they have been retrieved, should be transparent in order to strengthen the user's trust in the system. Hence the user needs a facility for inspecting the journal. Furthermore, content incorporated through entries may contain errors: measurement errors and wrong abstractions may occur, and the user herself might, over time, change her opinion about the correctness of previously created entries. Accordingly, SPECTER requires a user interface which enables the modification of journal content. Finally, with respect to the goal of flexible scheduling of collaboration, these requirements are complemented by the need for an interface which is adaptable to varying application scenarios.
Figure 2: The S PECTER browser, with a viewer for listing
journal entries. In the upper area controls for navigation,
in the lower area the viewer display.
Interface Approach
That need for flexibility is taken into account by a journal browser, which enables accessing the personal journal
via so-called viewers. These realize varying data views of
the information stored in the journal, a popular approach
known from systems such as [3], [7], and [11]. An example of S PECTER’s journal browser and a viewer for displaying lists of journal entries is shown in Figure 2.
In this framework, the browser as well as the viewers may be exchanged with respect to the given platform and interaction task. The browser is a central component that serves content requests from viewers and provides a repository of resources shared by several viewers. These resources include shared data, such as display preferences, and shared user interface elements, such as access to common navigation facilities and the viewer selection. That selection is in general performed automatically by the browser with respect to the display request, but it may also be performed manually by the user if the viewer has registered itself within the browser's user interface.
Due to their varying functions, viewers may differ in their
interaction not only with the user but also with the system itself. For instance, when displaying a list of journal
entries, a viewer may be updated automatically when new
entries (e.g., from sensors) arrive. That behavior might be
confusing if the user is just entering data using a form-like
viewer. Therefore the browser relies on a feature mechanism to configure itself with respect to the viewer's preferences: a viewer's configuration includes a list of feature triggers, which may be applied by SPECTER components such as the browser in order to adapt their behavior to the given viewer. Following our previous example, the form editor may in this way indicate that display updates are not granted while the viewer is active.
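The sketch below illustrates the feature-trigger idea as we describe it here: a viewer declares the features it requires or forbids, and the browser consults that list before acting. The interface and feature names are invented for the example and do not reproduce SPECTER's API.

```java
import java.util.Set;

// Sketch of the feature-trigger idea: a viewer publishes preferences that
// other components consult before acting. Names are illustrative only.
interface Viewer {
    String name();
    Set<String> featureTriggers();   // e.g., "NO_AUTO_UPDATE"
    void display(String content);
}

class FormViewer implements Viewer {
    public String name() { return "annotation form"; }
    public Set<String> featureTriggers() { return Set.of("NO_AUTO_UPDATE"); }
    public void display(String content) { System.out.println("[form] " + content); }
}

public class BrowserSketch {
    // The browser adapts its behavior to the active viewer's triggers.
    static void onNewJournalEntry(Viewer active, String entry) {
        if (active.featureTriggers().contains("NO_AUTO_UPDATE")) {
            System.out.println("Entry queued; " + active.name() + " blocks live updates.");
        } else {
            active.display(entry);
        }
    }

    public static void main(String[] args) {
        onNewJournalEntry(new FormViewer(), "RFID: product picked up");
    }
}
```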
Navigation and Annotation
Navigation in the personal journal relies in the first place on requests, which provide functionality similar to hyperlinks. Instead of Web addresses, they make reference to particular SPECTER components, optionally further described using form parameters. Additionally, they may carry a complex value encoded in XML that is submitted to the requested component. The user may apply these requests to browse the journal much as they would browse the Web, and may organize frequently used requests in a list of bookmarks.
An alternative way of navigating the journal is provided by
the so-called reminder points. This specific kind of journal
entry is created by the user during interaction with the environment with only one click (see the “!” button in the upper right corner of Figure 2). The rationale of these points
is that the user might be too busy or distracted to provide
detailed feedback. Nevertheless she might notice the need
to adjust the system's behavior, and this need can be expressed via a reminder point. Later on, at a time and location more appropriate for introspection, she may inspect the recorded reminder points and, in collaboration with SPECTER, perform the required adjustments.
Another way of dealing with journal entries is annotation.
It provides a means of associating information with entries
quite similar to the approach applied in [3]. In S PECTER,
annotations serve in the first place as storage for information about how an entry performs with respect to selected
aspects of the user model. Accordingly annotations include free text, references to other journal entries or Web
pages, content categories, and ratings. Here content categories represent predefined content descriptions, provided
by S PECTER for quick (and less precise) description of the
kind of content. A rating expresses the performance of an
entry with respect to a rating dimension selected from a
predefined set (e.g., importance or evaluation).
Annotations are further described by a fixed set of meta
data. These capture information about the annotation such
as a privacy level and the source of the annotation. The latter is of particular importance, since the user has to stay informed about who created an annotation: she herself, or SPECTER. An editor for entry annotations is shown in Figure 3. The form-like viewer provides feedback about the annotations associated with an entry and enables editing them in part.
Figure 3: A form-like viewer that enables annotating a journal entry: a field for entering a free-text comment, check boxes for content category selection, and select boxes for performing ratings. The selected values are marked with their sources (see “by”).
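A compact sketch of such an annotation is shown below. The fields mirror the description above (free text, content categories, a rating, plus metadata for privacy level and source), but the names and types are our own, not SPECTER's real classes.

```java
import java.util.List;

// Sketch of an annotation as described above: content plus metadata that
// records privacy and provenance. Names mirror the description, not SPECTER's code.
class Annotation {
    enum Source { USER, SYSTEM }

    // Content of the annotation.
    String freeText;
    List<String> contentCategories;          // predefined, quick descriptions
    String ratingDimension;                  // e.g., "importance"
    int rating;                              // value on that dimension

    // Fixed metadata.
    int privacyLevel;                        // higher = more private
    Source source;                           // who created it: the user or the system

    Annotation(String text, List<String> categories, String dimension,
               int rating, int privacyLevel, Source source) {
        this.freeText = text;
        this.contentCategories = categories;
        this.ratingDimension = dimension;
        this.rating = rating;
        this.privacyLevel = privacyLevel;
        this.source = source;
    }

    public static void main(String[] args) {
        Annotation a = new Annotation("Great demo at this booth",
                List.of("shopping"), "importance", 4, 1, Source.USER);
        System.out.println(a.freeText + " (rated " + a.rating + " by " + a.source + ")");
    }
}
```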
COLLABORATIVE LEARNING OF THE USER MODEL
As was mentioned above, one of the benefits offered by the
personal journal is the ability of the system to learn a user
model that can in turn serve as a knowledge source for intelligent assistance. Just as the acquisition of data for the
personal journal is best viewed as a collaborative process,
the same is true of the process of learning the user model.
The system brings to this process (a) a large amount of
data from the personal journal, (b) a repertoire of learning
techniques, and (c) a large amount of computing capacity
(and unlimited “patience”) that can be applied to the learning task. But the user’s input will in general also be necessary: Her common-sense knowledge and introspective
abilities can help to filter out spurious additions to the user
model that would simply reflect chance regularities in her
behavior (cf. [1]). Moreover, when there are several possible learned models that are equally well supported by the
data, the user may reasonably prefer the model that makes
the most intuitive sense to her. In short, the basic conception of S PECTER gives rise to a novel interface design
challenge: How can a user who has no technical knowledge of machine learning or user modeling be allowed to
collaborate in the process of learning a user model from
data?
We will look at this problem in connection with one particular function of S PECTER’s user model: that of triggering
the offering of services to the user.
S PECTER offers several types of service to the user. Some
of these make use of external resources (e.g., technical devices such as printers), while others make use of internal
functions of the system (e.g., retrieval of facts from the
personal journal). If S PECTER simply waited for the user
to request each possible service explicitly, many opportunities would be lost, simply because the user is not in general aware of all currently available and relevant services.
Therefore, S PECTER tries to learn about regularities in the
user’s behavior that will allow it to offer services at appropriate times. For example, if the learned user model
indicates that the user is likely to want to perform a particular action within the next few minutes, S PECTER may
offer to activate a service that will facilitate that action.
While there exist a number of approaches for collaborative
learning that involve a human in the process of constructing a classification model (e.g., a decision tree), these approaches focus on supporting data analysts as opposed to
essentially naive users (see, e.g., [2]). We are currently
investigating the use of machine learning tools like TAR2
([5]) that apply heuristics to produce imperfect, but easily understandable—and thus, modifiable—classification
rules. Here we present an assistant component, the trigger editor, which gives the user intelligent suggestions for
creating and modifying trigger rules for services.
Example: The EC Card Purchase Service
Our discussion will refer to the following example. Suppose that the user sometimes pays in stores with an EC
card.1 On one occasion the cashier rejects her card, telling her that her bank account has insufficient funds to pay for her shopping. In order to prevent this embarrassing experience in the future, the user sets a reminder point, thus
marking the current situation—and the resulting entry in
the personal journal—to be dealt with later.
The rationale is that the user has decided to create an automated service that triggers a status check of her bank account—a basic functionality provided by SPECTER—whenever an EC card payment is likely to occur. In order to do so, she will create an abstract model of this particular type of situation using SPECTER's machine-learning
capabilities. Whenever a new shopping situation occurs,
S PECTER will use this model to classify the situation and
trigger the bank account check in case this classification
indicates a high probability of the user using her EC card.
Identifying Training Examples
In a first step, S PECTER’s machine-learning component
needs a number of training examples—previous shopping
episodes stored in the personal journal that can be used to
distinguish EC payments from “non-EC payments”. The
system displays the entry marked with the reminder point
and asks the user to indicate what is special about it. The
user indicates the use of the EC card (“MeansOfPayment
= EC card” in the personal journal) whereupon S PECTER
looks for previous entries of the same category (shopping)
with identical and differing values for MeansOfPayment
and classifies these examples according to this value (“positive” for EC card, “negative” for all other values).
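A sketch of this labelling step is shown below: entries of the same category as the marked one are collected and labelled according to their MeansOfPayment value. The entry representation is a simplified stand-in for SPECTER's journal interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of building labelled training examples from journal entries.
// The entry representation is a simplified stand-in for SPECTER's journal.
public class TrainingExampleSketch {
    record Entry(String category, Map<String, String> attributes) {}
    record Example(Entry entry, boolean positive) {}

    // Label shopping entries as positive if they were paid with the EC card.
    static List<Example> collect(List<Entry> journal) {
        List<Example> examples = new ArrayList<>();
        for (Entry e : journal) {
            if (!"shopping".equals(e.category())) continue;
            boolean positive = "EC card".equals(e.attributes().get("MeansOfPayment"));
            examples.add(new Example(e, positive));
        }
        return examples;
    }

    public static void main(String[] args) {
        List<Entry> journal = List.of(
            new Entry("shopping", Map.of("MeansOfPayment", "EC card", "Price", "117.50")),
            new Entry("shopping", Map.of("MeansOfPayment", "cash",    "Price", "8.20")));
        collect(journal).forEach(ex ->
            System.out.println((ex.positive() ? "positive: " : "negative: ") + ex.entry()));
    }
}
```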
Learning a Decision Tree
Once the training data have been identified, S PECTER applies a machine-learning algorithm to create an appropriate
classifier. One useful learning technique in this context is
decision tree learning, which yields a relatively comprehensible type of model (see Figure 4). Even though users
would seldom be willing or able to define a reasonably accurate decision tree entirely by hand, critiquing a decision
tree proposed by the system may be a reasonably easy—
and perhaps even enlightening—activity, if the user interface is designed well.
Figure 4 shows two decision trees that the user might deal
with in connection with the EC Card Purchase service.
Each node of a tree is labeled with an attribute, and each
edge specifies a possible value (or range of values) for the
attribute. Each leaf of the tree is labeled as positive or negative, indicating the decision that results if a path is traversed through the tree that leads to this leaf. In the case
of service triggering, a positive result means that S PECTER
should establish the goal of invoking the service. (Whether
or not the service is actually invoked can depend on other
factors, such as the existence of competing goals.)2
1 For non-European readers: An EC card is like a credit card, except that the funds are transferred to the recipient directly from the purchaser's bank account.
Figure 4: Two examples of decision trees that arise during the collaborative specification of a rule for triggering the service EC card purchase. (a) Initially generated decision tree. (b) Tree generated after the attribute "Store" has been specified by the user to be irrelevant.
When the system presents a learned tree such as the one in
Figure 4 (a), the user can critique it in any of several ways,
including: eliminating irrelevant attributes, selecting paths
from the tree, and modifying split decisions. The question
of what interface designs are best suited for this type of
critiquing requires further exploration and user testing; the
next subsection describes the critiquing interface currently
being tested in S PECTER.
Critiquing of Decision Trees
Figure 5 shows the two main dialog boxes of the current
decision tree editor, which is implemented as a viewer that
runs within the browser. The interface allows the user to
critique the current decision tree for a given type of decision step by step until she is satisfied with the result.
The standard interface hides the potential complexity of a
decision tree as depicted in Figure 4 by merely listing the
set of attributes used by the machine-learning component
(see Figure 5 (a)). Depending on regularities occurring
in the training data, some of these attributes—although
well-suited to discriminate positive and negative examples in the decision tree—might make little sense from
the user’s perspective. For example, if the user happened
to use her EC card only in the morning in all shopping
episodes recorded by S PECTER, then the attribute TimeOfDay will almost inevitably be used in the decision tree.
The user’s background knowledge, however, enables her
to easily identify such “meaningless” aspects and remove
this attribute altogether, thus preventing its use for classification purposes.
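The sketch below captures one such critiquing step in simplified form: the user's feedback removes an attribute from the candidate set before the tree is learned again. The learner itself is only stubbed out; SPECTER's actual machine-learning component is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one critiquing step: drop an attribute the user considers
// meaningless, then re-learn the classifier on the remaining attributes.
// The learner is a stub; SPECTER's real machine-learning component is not shown.
public class CritiquingSketch {
    static Object learnDecisionTree(List<String> attributes) {
        // Placeholder for the actual learning algorithm (e.g., decision tree induction).
        return "tree over " + attributes;
    }

    public static void main(String[] args) {
        List<String> attributes = new ArrayList<>(List.of("TimeOfDay", "Store", "Price"));
        System.out.println("initial: " + learnDecisionTree(attributes));

        // User critique: "TimeOfDay" only reflects a chance regularity.
        attributes.remove("TimeOfDay");
        System.out.println("revised: " + learnDecisionTree(attributes));
    }
}
```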
2 Attributes other than the ones shown in this example
may also be relevant—e.g., the total cost of the other items
that are still on the user’s shopping list and which therefore
remain to be purchased on the same day.
Another way of critiquing the decision tree is to replace
an attribute by another one which is semantically related.
To this end, whenever the user presses the “Add related
attribute” button, S PECTER will identify concepts in the
domain ontology that are in close proximity to the one represented by the attribute under consideration and generate
appropriate attributes to be used in the decision tree. In this way, the system's capability to deal with regularities and statistical relationships among the training data is complemented by the user's ability to deal with semantic interrelations.
Advanced users have even more options to influence the machine-learning component of SPECTER. They can directly inspect the classification model (either depicted as a decision tree, as a set of rules (see Figure 5 (b)), or visualized in some other way yet to be investigated) and change the split criteria, i.e., the attribute values tested in a rule or tree (e.g., change Price from 117.50 to 100, as depicted in
accuracy, i.e. the percentage of correct classifications of
episodes from the personal journal. The user is informed
about the current quality of the classification model and
can bias the system to produce “false positives” rather than
“false negatives” (which would mean that the bank account
is checked even in some situations when the user will not
use her EC card) or vice versa, depending on which error
is more serious for the user.
One of the many design issues that we are exploring in
connection with this interface is the question of whether
(a) to present to the user a graphical depiction of the decision tree (as shown in Figure 4), (b) to stick to dialog
boxes such as those in Figure 5, or (c) to offer a selection
of interfaces. The principle of flexible scheduling and realization of collaboration suggests providing several different (though fundamentally consistent) views of a decision
tree that are appropriate for different usage situations (e.g., a quick check on a rather unimportant decision tree vs. in-depth analysis and editing of a highly important one). The two different dialog boxes shown in Figure 5 represent a step in this direction.

(a) Simple critiquing options; (b) Advanced critiquing options
Figure 5: The basic (left) and advanced (right) dialog boxes in the current version of the SPECTER decision tree editor.
CONCLUSION AND FUTURE WORK
In this contribution we have described part of our ongoing work on SPECTER, a system that aims at assisting users in instrumented environments by means of an episodic memory. First results include a set of design principles, which have already proven their value as guides through a potentially immense design space. We believe they may also be helpful to designers of other systems featuring advanced personal memories. Recently, a commercial product (Nokia LifeBlog3) was released that allows the user to create a simple version of what we call a personal journal; this is a hint that this kind of functionality may sooner or later enter our daily lives.
3 www.Nokia.com/lifeblog
We have illustrated how these design principles have been applied during the development of one of SPECTER's most important components: the personal journal. We then concentrated on the interface approach consisting of a journal browser and varying viewers: the browser allows navigating the journal contents using a hyperlink-like approach, and the viewers provide varying views of the data.
An application of this interface is the decision tree editor. By means of our prototype implementation, we illustrated how the end user may construct triggers for services provided by SPECTER. This construction is basically an iterative process in which SPECTER creates candidate decision trees that might serve as triggers and the user critiques these trees. This process is supported by the editor in various ways, including attribute selection, biasing the learning component so as to minimize the consequences of classification errors, and manual modification of the generated models.
Our next steps will include the extension of the view-based
interface. For instance, we have to acquire information
about which kinds of viewers are actually required by the
end user, and we have to evaluate implemented viewers.
Additionally, the machine-learning component will need
an interface that makes its complicated inferences and the
resulting models accessible to even naive users.
ACKNOWLEDGMENTS
This research was supported by the German Ministry of Education and Research (BMB+F) under grant 52440001-01 IW C03 (project SPECTER). Furthermore, we would like to thank the REAL team for their valuable advice and support.
REFERENCES
1. Gediminas Adomavicius and Alexander Tuzhilin.
User profiling in personalization applications through
rule discovery and validation. In Proceedings of
the 5th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD’99),
pages 377–381, San Diego, CA, 1999.
2. M. Ankerst, C. Elsen, M. Ester, and H.-P. Kriegel.
Visual classification: An interactive approach to decision tree construction. In Proceedings of the 5th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’99), pages
392–396, 1999.
3. J. Gemmell, G. Bell, R. Lueder, S. Drucker, and
C. Wong. MyLifeBits: Fulfilling the Memex vision.
ACM Multimedia, pages 235–238, 2002.
4. Mik Lamming and Mike Flynn. “Forget-me-not”: Intimate computing in support of human memory. In
Proceedings of FRIEND21, the 1994 International Symposium on Next Generation Human Interface, Meguro Gajoen, Japan, 1994.
5. T. Menzies, E. Chiang, M. Feather, Y. Hu, and J. D.
Kiper. Condensing uncertainty via incremental treatment learning. Annals of Software Engineering, Special issue on Computational Intelligence, 2002.
6. Tom Mitchell, Rich Caruana, Dayne Freitag, John
McDermott, and David Zabowski. Experience with
a learning personal assistant. Communications of the
ACM, 37(7):81–91, 1994.
7. D. Quan, D. Huynh, and D. R. Karger. Haystack: A
platform for authoring end user semantic web applications. In Proceedings of the 2nd International Semantic Web Conference (ISWC2003), pages 738–753,
Sanibel Island, Florida, USA, 2003.
8. Charles Rich and Candace L. Sidner. COLLAGEN: A
collaboration manager for software interface agents.
User Modeling and User-Adapted Interaction, 8:315–
350, 1998.
9. Charles Rich, Candace L. Sidner, and Neal Lesh.
COLLAGEN: Applying collaborative discourse theory to human-computer interaction. AI Magazine,
22(4):15–25, 2001.
10. M. Schneider. Towards a transparent proactive user
interface for a shopping assistant. In A. Butz, C. Kray,
A. Krüger, and A. Schmidt, editors, Proceedings of
the Workshop on Multi-User and Ubiquitous User Interfaces (MU3I 2004) SFB 378, Memo Nr. 83, 2004.
11. C. Shen, B. Moghaddam, N. Lesh, and P. Beardsley.
Personal Digital Historian: User interface design. In
Extended Abstracts of the 2001 Conference on Human
Factors in Computing Systems, 2001.
u›Photo: A Design and Implementation of a Snapshot
Based Method for Capturing Contextual Information
Takeshi Iwamoto
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
Shun Aoki
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
Kazunori Takashio
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
Genta Suzuki
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
Naohiko Kohtake
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
Hideyuki Tokuda
Graduate School of Media and Governance
Keio University
5322 Endo Fujisawa Kanagawa
[email protected]
ABSTRACT
In this paper, we propose u-Photo, a method that uses the action of taking a photograph as a metaphor for capturing contextual information. A u-Photo is a digital photo image that can store not only the visible picture but also invisible information collected from the sensors and devices embedded in a ubiquitous environment. Using u-Photo, a user can intuitively capture and view contextual information. Moreover, we present several applications that u-Photo makes possible: remote control of devices, remote monitoring of the environment, and suspend/resume of a user's task.
Keywords
Pervasive Computing Architecture, Contextual Information, Sensors, Smart Appliances
INTRODUCTION
Today, ubiquitous computing is becoming a popular research field and various issues are being addressed. In a ubiquitous computing environment, many sensors and devices are spread around a user and can be embedded in environments such as a living room, an office and so on. These invisible sensors and devices can obtain much information about users or the environment and can provide it as contextual information to applications or middleware. In this paper, we propose a suitable method for capturing such contextual information, named u-Photo.
A u-Photo is a digital photo image which contains contextual information about an environment. In other words, a u-Photo can store invisible information obtained from embedded devices or sensors along with the ordinary photo image. Furthermore, the objects in the picture are used as keys for controlling those objects and for obtaining information about their surrounding environment. When taking a u-Photo, the "viewing finder" and "releasing shutter" actions are identical to those of an ordinary digital camera. Viewing the finder determines the target area for capturing contextual information, and releasing the shutter determines the timing. Through these actions, namely "taking a photograph", contextual information can be stored into a digital photo image as a u-Photo.
In order to provide an intuitive method of viewing contextual information, we present the "u-Photo Viewer", a viewer application for u-Photo. The u-Photo Viewer provides easy access to the contextual information stored in a u-Photo. Users are provided with a GUI for viewing contextual information and controlling the devices captured in the u-Photo; the viewer places the GUI for controlling a device over the corresponding object in the picture.
In this paper, we present the design and implementation of our system to realize u-Photo. The remainder of this paper is structured as follows: Section 2 presents scenarios of using u-Photo. Design issues of our research are described in Section 3, and the design and implementation are described in Sections 4 and 5. Related work is summarized in Section 6.
SCENARIO
In order to clarify our research goal, we now present several scenarios of using u-Photo.
Scenario1: Controlling Remote Devices using u-Photo
Bob takes pictures of his room, which are stored as u-Photos on his PDA. He goes out to work, forgetting to turn off the room light. After finishing work, he realizes he might have left the room light on. To check whether the light is on, he uses the u-Photo Viewer on his PDA and taps the light icon displayed on top of the light's image in the u-Photo (shown in Figure 1). The u-Photo Viewer responds and shows that the room light's status is on. He then taps the "OFF" button displayed in the u-Photo to turn off the room light.
Figure 2: GUI and Environmental Information
Figure 1: (a) taking a u-Photo; (b) GUI for controlling the light
Scenario2: Capturing/Showing Environmental Information
After turning off the light, Bob decides to go home. Wanting the room to be comfortable when he gets home, he views the environmental information of his room, such as temperature and brightness (shown in Figure 2). Clicking the icon of each appliance on the u-Photo Viewer displays the working condition of that appliance. He controls the air conditioner as he did the room light to make the room temperature more comfortable before reaching home.
Scenario3: Suspend/Resume User’s Task
On another day, Bob is watching a video at home, but must
go out for an appointment. He takes a screen shot of the TV
with u-Photo before suspending the show. After his appointment, he goes to a cafe to relax. There, he looks up a “Public
Display Service”. To use the service, he opens the u-Photo
that was taken when he was watching the video at home. By operating the GUI on the u-Photo Viewer, he can easily migrate the state of the show to the public display. As a result, he can watch the rest of the show on the public display (shown in Figure 3).
Figure 3: (a) taking a u-Photo to save the task state; (b) resuming the task using a public TV

DESIGN ISSUES
In this section, we consider design issues for u-Photo. First, we address the issue concerning the snapshot-based method adopted by u-Photo. We present the reason we adopted this method, since several methods for dealing with contextual information exist in previous research. Second, we address the issue of what domain of information should be treated as contextual information.
Snapshot Based Method
In the previous scenarios, the snapshot-based method gave users several benefits, such as referring to the contextual information of their room, controlling devices remotely, and resuming suspended tasks. When designing a system dealing with contextual information, how the system decides the target area and the timing for capturing contextual information tends to be the most important issue. Next we discuss these two issues: target and timing.
In general, there are two approaches for target determination.
In one approach, the system automatically decides the target
without the user intervening. In another approach, the user
decides on the target area. Evidently, the first approach of
obtaining contextual information automatically is easier for
the user, since no interaction or operation is necessary between the user and the system. However, a system taking
this approach needs to decide on the appropriate range of
capturing contextual information considering user’s requirements. This is difficult, for user’s requirements are prone to
change depending on their situation. The second approach
allows the user to specify the target area. Therefore, forcing an undesired target area on the user is avoided. However, this approach involves complicated operations if the system cannot provide an appropriate method for specifying a target area.
To solve this problem, we use “snapshot” as a metaphor for
“capturing contextual information”; a user can specify a target area through an intuitive operation similar to taking an
ordinary photograph. Although this method is more complicated compared to systems that adopt an automatic capturing mechanism, our system has the advantage of providing
a method of capturing contextual information intuitively. To
be more specific, by taking a picture using a digital camera, a
user can take a “u-Photo” which contains various contextual
information about the area within the range of the finder.
Ordinarily, when taking a photograph, a user focuses on objects that are in sight. Similarly, when taking a u-Photo, a user captures an area of contextual information based on objects, namely devices or sensors. We call the objects that serve as landmarks in the user's sight "key objects". In the u-Photo system, stored contextual information is indexed by key objects, so users may view contextual information near a key object and control devices that are designated as key objects.
The second issue, the timing at which the system captures contextual information, also needs to be discussed. We found three approaches to this problem. In the first approach, the system continuously captures contextual information about a user and searches for an appropriate context later, when the user wants to refer to particular information. This approach causes several problems, such as scalability, since much disk space is necessary, and the difficulty of searching for appropriate information in the massively stored data. The second approach is to let the system decide on the timing. This approach causes problems similar to those that arise when the system automatically decides on the target area. The third approach, which is the one we chose, is to let the user himself decide on the timing. The action of taking the photo is interpreted by the system as the timing decision, so users may intuitively decide the timing of capture.
To summarize the discussion of the target area and the timing of capturing, our approach, which uses the action of taking a photo as a metaphor, is a reasonable method for dealing with contextual information.
Turning now to the method of viewing contextual information: in our approach, the information appears on top of the photo image. As described in the scenario, users can control devices remotely by touching the icon corresponding to a device, and can recognize context such as temperature or brightness by reading the information indicated on the u-Photo.
Contextual Information in u-Photo
Next, we discuss what kind of information u-Photo should treat as contextual information. Previous research has used the term with several different meanings, so we first give our definition for this research. Roughly speaking, definitions of contextual information fall between two extremes: highly integrated and fundamental. Fundamental information can be obtained directly from devices or sensors and is usually represented as numerical values or raw data, depending on the individual sensor or device. Highly integrated contextual information is produced by aggregating and interpreting fundamental information, for example "the user is working" or "the user is talking with another person". In our research, we focus on the method of capturing contextual information from the real world, so our system deals with individual sensors and devices that provide only simple information about the environment. Dealing with highly integrated contextual information in u-Photo is future work.
We assume that fundamental information can be classified into the following three groups:
Device Information:
Device information describes the kinds of devices available in the environment captured in a u-Photo. It is needed to indicate and display the available devices on a u-Photo. By clicking the icon displayed in the u-Photo, the user can control the corresponding device, as presented in the scenario. To create this icon, the u-Photo system needs to recognize the available devices in the environment when the u-Photo is created.
Sensor Information:
Information from sensors embedded in a room or a device is sensor information. It is obtained directly from sensors or devices, so its data format depends on the source. For u-Photo to be usable in various environments, an abstracted interface for handling the data, as well as a representation-dependent interface, needs to be provided.
Task Information:
Task information describes the tasks being performed by the user when taking a u-Photo. This information contains the execution status of the devices within a u-Photo. Using the stored information, a user can resume the task in another environment.
In the next section, we describe the design of the mechanisms for managing this information in detail.
DESIGN
System Overview
In the previous section, we discussed the design issues that should be addressed to develop a practical system for u-Photo. The components of the system and the relations between them are shown in Figure 4. Most components belong to the u-Photo Creator, the software for creating u-Photos. Each component gathers the proper information for creating a u-Photo.
When a user takes a u-Photo, the u-Photo system must obtain the locations of objects in the focus area, because contextual information has to be mapped onto the appropriate locations on the image. Therefore, it is necessary to provide a mechanism in the u-Photo Creator for recognizing key objects by image processing.
Device and Task Information Management
We classify the devices treated by u-Photo into two types. One type deals with media-related data, for example televisions, speakers and so on. The other is a simple controllable device, such as a light or an air conditioner.
Figure 4: Design of u-Photo Creator
We adopt the Wapplet framework [5] to develop the first type of device, namely devices that deal with media data. In the Wapplet framework, devices are abstracted as "Service Providers", middleware running on the individual devices. A Service Provider has several types of interfaces, one for each media type the device can handle. For example, a television can handle two media types, video and sound output; thus, the Service Provider of a television has two interfaces, one for each media type. We define four media types: video, audio, text, and image, and design interfaces for controlling devices for each media type. If two devices provide the same interface, users can use them interchangeably even if the actual devices are different. When a user moves to another environment after taking a u-Photo, it is usually not certain that the same devices are available. In the Wapplet framework, not only the interfaces but also the format of the stored information is unified for every media type so that suspended tasks can migrate between devices using the stored information.
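The following sketch illustrates this idea of uniform per-media-type interfaces (in Python rather than the Java of the actual implementation, and with class and method names of our own choosing): any two providers exposing the same interface can take over the same suspended task.

from abc import ABC, abstractmethod

class VideoOutput(ABC):
    # Uniform interface for the "video" media type (names are illustrative).
    @abstractmethod
    def play(self, source: str, position_sec: int) -> None: ...

class TelevisionProvider(VideoOutput):
    def play(self, source: str, position_sec: int) -> None:
        print(f"TV resumes {source} at {position_sec}s")

class PublicDisplayProvider(VideoOutput):
    def play(self, source: str, position_sec: int) -> None:
        print(f"Public display resumes {source} at {position_sec}s")

def resume(task_state: dict, target: VideoOutput) -> None:
    # Because the stored task state is unified per media type, it can be
    # replayed on any provider that implements the same interface.
    target.play(task_state["source"], task_state["position_sec"])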
Description Format of Contextual Information
The u-Photo Creator needs to store the information gathered from each component into an image file. Since the description of the information written in a u-Photo should be readable and well-formatted, we chose XML for describing the various information obtained from each component. A sample XML description is shown in Figure 5.
<?xml version="1.0" encoding="shift_jis" ?>
<u_photo xsize="640" ysize="480">
<timestamp>Tue Jan 13 03:48:05 JST 2004</timestamp>
<location>Keio University SFC SSLab</location>
<devices>
<device id="1" name="PSPrinter">
<coordinate><x>51</x><y>161</y></coordinate>
<wapplet name="PSPrinter">
<media_type>text</media_type><status>100000</status><time>0</time>
<service_provider>PrinterProvider</service_provider>
<ip>dhcp120.ht.sfc.keio.ac.jp</ip>
</wapplet>
<sensors></sensors>
</device>
<device id="2" name="CDPlayer">
<coordinate><x>349</x><y>343</y></coordinate><wapplet name="CDPlayer">
<media_type>audio</media_type><status>100000</status><time>0</time>
<service_provider>AudioProvider</service_provider>
<ip>dhcp120.ht.sfc.keio.ac.jp</ip></wapplet>
<sensors></sensors>
</device>
<device id="3" name="ColorProinter"><coordinate><x>542</x><y>308</y>
</coordinate><wapplet name="ColorProinter">
<media_type>text</media_type>
<status>100000</status><time>0</time>
<service_provider>PrinterProvider</service_provider>
<ip>dhcp120.ht.sfc.keio.ac.jp</ip></wapplet>
<sensors></sensors>
</device>
</devices><sensors></sensors>
</u_photo>
Figure 5: Sample XML Format
The other type of devices are those that provide only a simple interface for controlling themselves, such as the air conditioner shown in the scenario. Devices of this type can be controlled through a command-based interface with commands such as "on", "off", and so on. For the u-Photo Creator, these devices provide a description of a GUI, written in XML, for controlling themselves. Each description corresponds to one command that drives an action of the device. A sample description of the "light on" button is shown in Figure 6.
<button name="ON">
<ip>131.113.209.87</ip><port>34567</port>
<command>LIGHT_ON</command>
</button>
Figure 6: Sample Description of a Button
Both types of devices should also reply to requests from the u-Photo Creator for obtaining the service state or the command list.
Object Recognizer
As previously described, users decide the target scope by viewing the finder of a camera, and several key objects may be contained in one u-Photo. A key object is a landmark for deciding the area in which a user wants to capture contextual information as a u-Photo. An object that meets the requirements of a key object is visible and intelligent, such as a device, a PC and so on. In the scenario, the room light and the air conditioner correspond to key objects.
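As an illustration of how a viewer application might consume the description format of Figures 5 and 6 once the XML has been extracted from the image, the following minimal sketch (our own, using Python's standard xml library; only the element names follow the samples above) collects the devices and their on-image coordinates:

import xml.etree.ElementTree as ET

def parse_u_photo(xml_bytes):
    # 'xml_bytes' is the raw u_photo description extracted from the image.
    # Returns (device name, (x, y), service provider, ip) tuples as in Figure 5.
    root = ET.fromstring(xml_bytes)
    devices = []
    for dev in root.iter("device"):
        x = int(dev.findtext("coordinate/x"))
        y = int(dev.findtext("coordinate/y"))
        wapplet = dev.find("wapplet")
        devices.append((dev.get("name"), (x, y),
                        wapplet.findtext("service_provider"),
                        wapplet.findtext("ip")))
    return devices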
Sensor Information Management
When a user takes a u-Photo, the area is decided by the actions of viewing the finder and releasing the shutter. The environmental information stored in the u-Photo must cover the area the user desires to capture. Therefore, u-Photo needs a sensor system that can represent various areas at a higher level of abstraction. For example, in the scenario, the sensor system needs to represent areas such as "around the object" or "in the scope of the finder". With these abstract representations of areas, u-Photo can provide intuitive environmental information to the user.
MARS is a sensor system that can meet requests from applications that specify a sensing area, and it provides the sensor data acquired from the sensors in the specified area. MARS supports the following representations of an area:
"In the scope of the finder"
"Around an object"
This enables applications to acquire sensor data for the target area without considering individual sensors.
In MARS, every sensor has its own meta-information, which the sensor notifies to MARS. On the basis of the meta-information, MARS determines whether the sensor's data is associated with the application's specified area. MARS defines the meta-information listed in Table 1.
meta-information                              examples
The area of sensor existence                  roomname
The object to which the sensor is attached    chair, bed, user
Location information of the sensor            (x, y, z)
The type of the sensor                        temperature, humidity
The format of the sensor data                 8byte, 0.1/sec
Table 1: Meta-information of sensors
MARS has a database that manages the meta-information of the various sensors. When an application requests its own area, MARS searches the database for the meta-information associated with the application's request. If some meta-information fits the request, MARS provides the corresponding sensor data to the application. For example, when there are a room and a display in a u-Photo, the u-Photo Creator specifies its areas as "in roomA" and "around displayB". MARS searches for meta-information containing "roomA" or "displayB"; if MARS discovers such meta-information, the sensor that holds it provides its data through MARS.
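A minimal sketch of this area matching (our own illustration; the field names loosely follow Table 1, everything else, including the sensor registry, is invented):

# Each sensor registers meta-information similar to Table 1.
SENSORS = [
    {"id": "temp-1", "area": "roomA", "attached_to": None, "type": "temperature"},
    {"id": "light-7", "area": "roomA", "attached_to": "displayB", "type": "brightness"},
]

def sensors_for(area=None, around=None):
    # Return sensors whose meta-information matches a request such as
    # "in roomA" (area) or "around displayB" (around).
    hits = []
    for meta in SENSORS:
        if area is not None and meta["area"] == area:
            hits.append(meta)
        elif around is not None and meta["attached_to"] == around:
            hits.append(meta)
    return hits

# sensors_for(area="roomA")       -> both sensors
# sensors_for(around="displayB")  -> the brightness sensor only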
IMPLEMENTATION
Currently, our implementation is organized as shown in Figure 7. We assume that a PDA can be used as a digital camera; however, image processing on the PDA is difficult due to its limited computation power. Therefore, the components for capturing images and detecting objects are separated from the PDA in our prototype implementation. Users can nevertheless determine the scope of a u-Photo by using the PDA screen as a finder and can release the shutter on the PDA. As a result of this action of taking a photo, the u-Photo Creator creates a u-Photo and sends it to the u-Photo Viewer running on the same PDA.
Figure 7: System Architecture
u-Photo Creator
To recognize a key object, an LED Tag, shown in Figure 8, is attached to the corresponding object. The LED Tag Recognizer captures images from a USB camera and processes them to detect the locations of objects in the image. Each LED Tag appears in different colors that represent the ID of the object. The LED Tag Recognizer looks up the directory service to obtain information about the object, which is then used by the Device/Task Information Manager. Using the information resulting from this lookup, the u-Photo Creator checks the device status and obtains the command list to control the device.
Figure 8: LED Tag on the light device
All parts of the u-Photo Creator are implemented in Java. For image processing, we used JMF 2.1.1e (Java Media Framework). We use EXIF [2] to embed all of the information from each component into the JPEG image.
MARS
Implementation of MARS was done on linux 2.4.18-0vl3,
using J2SDK1.4.1 and PostgreSQL7.3.4. In this implementation, mica2[4] developed by UC Berkeley was used for the
sensor, and temperature and brightness were acquired.
When mica2 starts, it notifies its meta-information to MARS,
and MARS registers the meta-information into a database.
MARS is able to offer the sensor data according to the demand from u-Photo.
u-Photo Viewer
The u-Photo Viewer was implemented on a PDA (Zaurus SL-860) and was written in Java. The current u-Photo Viewer provides the following functionalities presented in the scenario: controlling devices, viewing environmental information, and resuming the task stored in a u-Photo.
RELATED WORK
There has been similar research on capturing contextual information. NaviCam [6] displays situation-sensitive information by superimposing messages on its video see-through displays, using PDAs, head-mounted displays, CCD cameras and color-code IDs. InfoScope [3] is also an information augmentation system using a camera and a PDA's display, without attaching any tags to objects. When the user points the PDA at buildings or places, the system displays the name of the place, or of the stores in the building, on the PDA. DigiScope [1] annotates images using a visual see-through tablet; the user can interact with embedded information related to a target object by pointing to the object. Although these systems are similar to u-Photo in that they annotate images, they focus on real-time use in which users interact with a target object currently in front of them. We concentrate on recording contextual information and reusing it in a different environment.
Truong et al. [7] have developed applications in which tasks are recorded as streams of information that flow through time. Classroom 2000, one of their applications, captures a fixed view of the classroom, the lecture, and other web-accessible media the lecturer may want to present. In this approach, what to record and when to record streams depend on each application. In addition, since the tasks they target are never executed again, every state of the task needs to be recorded as streams. In contrast, the tasks we target are reproducible, since we only note the status of tasks captured when the user releases the shutter on a digital photo.
Focusing on recording contextual information in digital photos, several products have already been provided. The status of the camera (e.g., focal length, zoom, and flash) is provided by digital cameras, and global positioning system (GPS) information is provided by cellular phones. However, present products and photo formats do not provide methods for noting the status of tasks or for using photos as user interfaces to a target object in the photograph.
CONCLUSION
In this paper, we presented u-Photo, which provides an intuitive method for capturing contextual information. With this
snapshot based method, users can easily determine a target
and timing of capturing contextual information as desired.
By using u-Photo, users can view contextual information, easily control devices, and suspend/resume their tasks. In our current implementation, we used mica2 as the sensor, a Zaurus as the PDA with camera, and several controllable devices. We realized the applications presented in the scenario with this implementation.
REFERENCES
1. Alois Ferscha and Markus Keller. Digiscope: An invisible worlds window. In Adjunct Proceedings of The Fifth International Conference on Ubiquitous Computing, pages 261–262. ACM, 2003.
2. Exchangeable Image File Format. http://www.exif.org.
3. Ismail Haritaoglu. Infoscope: Link from real world to
digital information space. In Proceedings of the 3rd international conference on Ubiquitous Computing, pages
247–255. Springer-Verlag, 2001.
4. Jason Hill and David Culler. A wireless embedded sensor architecture for system-level optimization. Technical
report, U.C. Berkeley, 2001.
5. Takeshi Iwamoto, Nobuhiko Nishio, and Hideyuki
Tokuda. Wapplet: A media access framework for wearable applications. In Proceedings of International Conference on Information Networking, volume II, pages
5D4.1–5D4.11, 2002.
6. Jun Rekimoto and Katashi Nagao. The world through
the computer: Computer augmented interaction with real
world. In Proceedings of Symposium on User Interface
Software and Technology, pages 29–36. acm, 1995.
7. Khai N. Truong, Gregory D. Abowd, and Jason A. Brotherton. Who, what, when, where, how: Design issues
of capture & access applications. In Proceedings of the
3rd international conference on Ubiquitous Computing,
pages 209–224. Springer-Verlag, 2001.
The Re: living Map - an effective experience
with GPS tracking and photographs
Yoshimasa Niwa*, Takafumi Iwai*, Yuichiro Haraguchi**, Masa Inakage*
Keio University *Faculty of environmental information
**Graduate school of media and governance
{niw, takafumi, hrgci, inakage}@imgl.sfc.keio.ac.jp
ABSTRACT
This paper proposes an application, The Re: Living Map,
which provides an effective city experience using a mobile
phone, GPS tracking and photographs, and describes a new
method for constructing the system, named “gpsfred”.
Keywords
GPS, Tracking, Mobile Phone, Information Design
INTRODUCTION
The proliferation of network-connected, GPS-enabled mobile phones has allowed people to utilize positional information via GPS. While accessing information with mobile phones is becoming routine, the use of GPS information has so far been limited largely to navigation. However, as it is possible to obtain GPS information from anywhere while connected to the network, we can expect various other practical uses to emerge in the future. The mobile phone's functions as a digital camera are evolving rapidly as well.
There are also some applications that enable communication through the use of GPS information [3, 6]. One such example, The Living Map (Figure 1), is previous research of ours. It is an online community tool that enables users to exchange information about the city using a mobile phone. In that research, we used a city map as the interface for exchanging city information and creating network communities based on people's interests.
Although these applications utilize GPS and/or photographs, to our knowledge no previous application has put GPS tracking at its basis. We propose three phases that allow us to effectively experience the city with GPS tracking and photographs: Packaging, Reliving, and Sharing.
Against this background, we propose an application that allows users to effectively experience (relive) the city from a new perspective using photographs and a GPS tracking system named gpsfred. The project is based on our previous research, The Living Map [3].
BACKGROUND
By linking photographs and transforming them in a similar
fashion as Hayahito Tanaka's Photo Walker [4], we can
create a virtual three-dimensional space that users are able
to vicariously experience. Noriyuki Ueda's GIS with
cellular phone+WebGIS [8] also creates a virtual city by
relating photographs and GPS position data.
Figure 1: The Living Map
EFFECTIVE EXPERIENCE WITH GPS TRACKING
We propose three phases to realize effective city
experiences. In the city, users can package their own
experiences through taking photographs and using the GPS
tracking systems embedded in the mobile phones
(Packaging). After returning home, they use our proposed
application, to relive their experiences (Reliving), and
finally, to share their city experiences with others and
experience those of others online (Sharing).
1. Packaging
In the Packaging phase, users are able to take photographs
any time they like. These photographs, while sequentially
discrete, reflect the users’ display of strong interest (Figure
2). In contrast, information obtained through GPS tracking
is sequential because it is always enabled in this phase, and
it provides a common attribute for every user - position and
time. For these reasons, the combination of GPS tracking
and photographs allows users to package their experiences
for sharing.
Figure 2: Experience package

Figure 3: Adding effects to photographs to enhance the reliving
2. Reliving
In the Reliving phase, The Re: Living Map gives users a
richer city experience than a collection of still photographs
can give. This is accomplished by automatically adding
effects to photographs and playing them back in intervals
proportional to the actual time intervals in which the
original pictures were taken. The effects are calculated
from the GPS tracking data. (Figure 3 and Figure 4).
This effect is created in accordance to the users' actions.
For example, if the tracking data shows that the user turned
right, we will see the next photograph push the previous
photograph off the screen.
Playing them back in intervals proportional to real time prompts the users' memories to fill in the intervals between them. Through these effects, users are able to effectively relive their activities in the city.
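The direction-dependent transition effects can be derived from the change of bearing between consecutive GPS fixes; the rough sketch below is our own illustration (the 45° threshold is arbitrary), not the actual gpsfred implementation:

import math

def bearing(p, q):
    # Approximate bearing (degrees) from point p to q, each a (lat, lon) pair.
    d_lon = math.radians(q[1] - p[1])
    lat1, lat2 = math.radians(p[0]), math.radians(q[0])
    y = math.sin(d_lon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
    return math.degrees(math.atan2(y, x)) % 360

def turn_effect(prev_pt, photo_pt, next_pt, threshold=45.0):
    # Classify the user's movement at a photo location into an effect name.
    change = (bearing(photo_pt, next_pt) - bearing(prev_pt, photo_pt) + 180) % 360 - 180
    if abs(change) >= 180 - threshold:
        return "go back"
    if change > threshold:
        return "turn right"
    if change < -threshold:
        return "turn left"
    return "go straight"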
Figure 4: Automatic direction detection
3. Sharing
In the Sharing phase, we propose a method for sharing experiences named "Intersect". With Intersect, the intersections of GPS tracking data between users act as starting points for the sharing of experiences. If the owner of the intersecting experience has allowed others access to it, users are able to experience it themselves, using the intersections as their entryways (Figure 5).
The Intersect method provides users a new type of experience, attainable only through the sharing of individual experiences via digital means.
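The Intersect detection can be approximated by looking for pairs of tracking points from two users that lie within a small distance of each other; the sketch below is ours (the 30 m radius and the flat-earth distance are simplifications, not the server's actual algorithm):

import math

def approx_distance_m(p, q):
    # Rough planar distance in metres between two (lat, lon) points.
    k = 111_000  # metres per degree of latitude (approximation)
    d_lat = (p[0] - q[0]) * k
    d_lon = (p[1] - q[1]) * k * math.cos(math.radians((p[0] + q[0]) / 2))
    return math.hypot(d_lat, d_lon)

def intersections(track_a, track_b, radius_m=30.0):
    # Return index pairs (i, j) where two GPS tracks come close enough to
    # count as an "Intersect" entry point into the other user's experience.
    return [(i, j)
            for i, p in enumerate(track_a)
            for j, q in enumerate(track_b)
            if approx_distance_m(p, q) <= radius_m]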
Figure 5: Shared experience using "Intersect"

With The Re: Living Map, we propose an online application implementing these three phases with GPS tracking and photographs to give users a (re)living of their city experiences.
APPLICATION
In this section, we propose an application, The Re: Living Map, which provides effective city experiences based on the aforementioned three phases. This application is implemented with gpsfred, an application framework we designed.
We will first give an overview of The Re: Living Map, and
then go into the details of each of the features.
Figure 7: Application interface for the PC
Overview
The Re: living Map consists of two interfaces, one for the
mobile phone and one for the personal computer (Figure 6
and 7). The interface for the mobile phone provides users
an interface mainly to package their own experiences. The
interface for the personal computer provides an interface to
relive and share their experiences.
Package experience on the mobile phone
The interface for the mobile phone has two functions: GPS tracking and taking photographs (Figure 6). Users may switch the GPS tracking on or off and take photographs any time they wish. The photographs and GPS tracking data, which constitute the users' experiences, are stored in the users' database via the Internet.
Figure 6: Application interface for the mobile phone
Relive experiences on the personal computer
The interface for the personal computer has three views: the Photo View, the Intersect View and the Package View (Figure 7).
In the bottom view, the Package View, users can select their own experiences that have been stored from the mobile phone. In the middle view, the Intersect View, users are able to select a path from the experience selected in the bottom view. Finally, the top view, the Photo View, shows photographs with effects in accordance with the users' activities, automatically calculated from the selected path.
Photo view
Figure 8 shows the Photo View. This view usually plays back photographs enhanced with effects, in time intervals proportional to real time.
Users are also able to experience the paths using a mouse. In this view, the mouse cursor tells users where they have been and where they went by changing its shape. Users come to see the cursor as a representation of themselves, and the photo effects tell them the positional relationships of the photographs.
Combined, these effects prompt the users' memories to fill
in the blanks between the photographs and effectively
relive their experiences of the city.
Figure 8: Photo View
Intersect view and Intersect method
Users are able to select a tracking path to show in the Photo View. Users can also choose whether the selected path should accept Intersects from other users. If the user allows it, the application sends information about the selected path to the application server, which is shared by all users. The server checks the other paths and broadcasts the detected intersections, together with the related path and photograph information, as a shared experience.
If the path has intersections and a user selects an intersection in the view, photographs from both experiences are presented in the Photo View (Figure 9).
Figure 9: Intersect View
GPS TRACKING FRAMEWORK DESIGN
Although there are many applications which utilize GPS, no previous application, to our knowledge, has put GPS tracking at its basis. In this section, we describe a new application framework for GPS tracking.
The framework has four layers: the network layer, the hardware layer, the middleware layer, and the application layer. These layers make the experience of the users more effective. The most important components are gpsfred in the middleware layer and the application in the application layer (Figure 10).
Figure 10: Application framework for GPS tracking
Features of gpsfred
As gpsfred, we have implemented a middleware that makes it easy to implement GPS tracking with mobile phones and applications that use it. Developers who plan to use gpsfred to implement applications are able to extend any functionality of gpsfred to suit their needs through plug-ins. Figure 11 shows the five features of gpsfred.
Generate tracking data - 1
The first step is repeatedly detecting the current position, generating the tracking data. Developers are able to change the repetition interval in units of 1/100 second.
Join photographs and tracking data - 2
A photograph taken while GPS tracking is running is associated with the tracking data and stored. This adds chronological information to the photographs, allowing developers to handle them in that order.
Store into a database - 3
All data are stored into a database accessed via a network. For that reason, developers are able to flexibly handle both the tracking data and the photographs.
Reuse tracking data and photographs - 4
gpsfred provides the tracking and photograph data in an XML format. Developers are then able to perform operations such as converting the axes of the tracking data, normalizing them, resizing the photographs to fit, and so on.
Support Plug-in - 5
Plug-ins are supported at every level of gpsfred, so developers are able to extend any functionality of gpsfred. In fact, some of the methods gpsfred provides by default are implemented as plug-ins.
Figure 11: Features of gpsfred
FUTURE WORK
In this research, the primary issue for the future is the operation and evaluation of the application. Currently, several problems must be solved before large-scale testing can be attempted. One such problem is the phone bills generated by usage of the program; solutions such as fixed-rate communication services are on the horizon, however.
The current GPS tracking method of gpsfred has the problem that position detection takes at least approximately 15 seconds. We regard this as an implementation problem and will modify the implementation to avoid it.
REFERENCES
1. Fujihata M. Ikedorareta Sokudo. ICC Gallery, 1994.
2. Fujihata M., and Kawashima T. Field-Work@Alsace.
2002.
3. Haraguchi Y., and Shinohara T., and Niwa Y., and
Iguchi K., and Ishibashi S., and Inakage M. “The Living
Map - A communication tool that connects real world
and online community by using a map.” Journal of the
Asian Design International Conference Vol.1, Oct.,
2003, K-49.
4. Photo Walker. Available at http://www.photowalker.
net/
5. Sasaki M., Affordance – Atarashi Ninchi no Riron,
Iwanami Shoten, 1994
6. Takahashi K., and Tsuji T., and Nakanishi Y., and
Ohyama M., and Hakozaki K. "iCAMS : Mobile
Communication Tool using Location Information and
Schedule Information." School of Information
Environment, Tokyo Denki University, 2003.
7. Tsuchiya J., and Tsuji H. GPS Sokuryo no Kiso. Nihon Sokuryo Kyokai, 1999, pp. 57-80.
8. Ueda N., and Nakanishi Y., and Manabe R., and Motoe
M., and Matsukawa S. "GIS with cellular phone +
WebGIS - Construction of WebGIS using the GPS
camera cellular phone." The Institute of Electronics,
Information and Communication Engineers, 2003.
Relational Analysis among Experiences and Real World
Objects in the Ubiquitous Memories Environment
Tatsuyuki KAWAMURA, Takahiro UEOKA, Yasuyuki KONO, Masatsugu KIDODE
Graduate School of Information Science, Nara Institute of Science and Technology (NAIST)
Keihanna Science City, 630-0192 Nara, Japan
{kawamura, taka-ue, kono, kidode}@is.naist.jp
ABSTRACT
This paper introduces the plan of an experiment to analyze relations among user experiences and real world objects in the Ubiquitous Memories system. By finding such characteristics, we could develop functions for automatically linking experiences with objects and for recommending experiences linked with several objects. In order to conduct this experiment, we attached 2,257 RFID tags to real world objects. We also categorized the objects into 21 purpose-based object types, and investigated the share-ability of experiences by a questionnaire in advance. Both tasks are important for focusing the analysis parameters of the experiment. In this paper we present basic results of the categorization of the tagged objects and of the questionnaire.
Author Keywords
Ubiquitous Memories, Real World Object, Augmented
Memory, Wearable Computer.
INTRODUCTION
The research area of computational augmentation of human memory has been studied extensively in recent years. Rhodes termed this augmentation of human memory "augmented memory" [1]. In particular, technology for the sharing of experiences in everyday life attracts researchers, who expect that we would gain richer knowledge if such technology were accomplished. The technology would give us solutions to difficult matters that we rarely experience and do not know how to overcome. Researchers, however, do not yet know what support techniques exist for collaboration among people, nor what support techniques should be implemented to realize the sharing of experiences in the real world. We believe this is currently the most important issue for accomplishing the sharing of experiences in everyday life.
We have studied the Ubiquitous Memories project since fall
1999. The overall aim of the project is to realize a digital
nostalgia. The digital nostalgia would be created as an
autobiographic history by linking a human experience with
a real world object. The project first proposed its concept in
2001 [2], and implemented the prototype system that can
operate in everyday life in 2002 [3]. We have conducted experiments to evaluate the performance of Ubiquitous Memories under the stand-alone user condition [4]. In 2004,
we are conducting an experiment to analyze relations
among user’s experiences and real world objects. This
paper mainly introduces the plan of the experiment.
We are planning to conduct a long-term experiment in the
real world. The aim of this experiment is to investigate
relations among experiences linked with objects and the
objects. The experiment also would give us characteristics
of the object-object relations. By finding out the
characteristics, we could develop functions for automatic
linking method among experiences and objects, and
recommending experiences linked with several objects. For
the experiment, we attach 2,257 RFID tags to real world
objects. We then categorize the objects into 21 purposebased object types distinguished by functional attributes,
and into 116 role-based sets contained the classes. The
categorization of the objects will be used to clarify the
identity of the objects by analyzing video data sets linked
with objects in each category.
We also investigate share-ability of experiences by a
questionnaire including four questions. The questionnaire
gives us a direction to find out what objects would be useful
to analyze logs before we conduct the experiment.
Furthermore, we do not know how to discover the
mechanisms of the relations among people, objects, and
experiences because of 2,257 objects, huge video data sets,
and operations logs.
UBIQUITOUS MEMORIES
We have proposed a conceptual design for ideally and naturally bridging the space between augmented memory and human memory by regarding each real world object as an augmented memory archive. To seamlessly integrate human experience and augmented memory, we consider it important to provide users with natural actions for storing and retrieving augmented memories. A "human hand" plays an important role in integrating augmented memory into objects: in our design, the human body is used as a medium both for perceiving the current context (event) as a memory and for propagating the memory to an object, i.e., the memory travels all over the body like electricity and runs out of one of the hands. The terms of the latest version of the conceptual actions [5] are defined as follows:
Figure 1. The Ubiquitous Memories Equipment
• Enclose action is shown by two steps of behavior. 1) A
person implicitly/explicitly gathers current context
through his/her own body. 2) He/She then arranges
contexts as ubiquitous augmented memory with a real
world object using a touching operation.
Figure 2. The Location of the Experiment
• Accumulate denotes a situation where augmented
memories are enclosed in an object. The situation
functionally means that the augmented memories are
stored in computational storages somewhere on the
Internet with links to the object.
• Disclose action is a reproduction method where a person
recalls the context enclosed in an object. The
“Disclosure” has a similar meaning of replaying media
data.
Equipment
Figure 1 depicts the equipment worn for Ubiquitous Memories. The user wears a Head-mounted Display (HMD;
SHIMADZU, DataGlass2) to view augmented memories
(video data) and a wearable camera (KURODA
OPTRONICS, CCN-2712YS) to capture video data of
his/her viewpoint. The user also wears a Radio Frequency
Identification (RFID; OMRON, Type-V720) tag
reader/writer on his/her wrist. Additionally, the wearer uses
a VAIO jog remote controller (SONY, PCGA-JRH1). In
order to control the system, the wearer attaches RFID
operation tags to the opposite side of wrist from the RFID
tag reader/writer. The wearer carries a wearable computer
on his/her hip. The RFID device can immediately read an
RFID tag data when the device comes close to the tag. The
entire system connects to the World Wide Web via a
wireless LAN.
System Operations
The Ubiquitous Memories system has five operational
modes: ENCLOSE, DISCLOSE, MOVE, COPY, and
DELETE. There are two basic operation tags and three
additional operation tags for changing the mode. The user
can select one of the following types:
Figure 3. The Seating Chart
ENCLOSE: By touching the “Enclose” tag and an object
sequentially, the wearer encloses augmented memory to
an object.
DISCLOSE: The user can disclose an augmented
memory from a certain real world object.
Using additional operation tags, the user can operate an
augmented memory in the real world in the similar way as
files in a PC by using the “DELETE,” “MOVE,” and
“COPY” tags.
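The five operations can be pictured as manipulations of a mapping from RFID tag IDs to lists of augmented-memory (video) references; the sketch below is our own abstraction for illustration, not the actual wearable implementation:

from collections import defaultdict

class MemoryArchive:
    # Maps each real world object's RFID tag ID to the augmented memories
    # (video references) enclosed in it.
    def __init__(self):
        self.store = defaultdict(list)

    def enclose(self, tag_id, video_ref):
        self.store[tag_id].append(video_ref)

    def disclose(self, tag_id):
        return list(self.store[tag_id])          # replay all linked videos

    def move(self, src, dst):
        self.store[dst].extend(self.store.pop(src, []))

    def copy(self, src, dst):
        self.store[dst].extend(self.store[src])

    def delete(self, tag_id):
        self.store.pop(tag_id, None)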
A SUBSTANTIATION EXPERIMENT PROGRAM
Purpose
The aim of this experiment is to investigate the relations between experiences and the objects linked with them. The experiment could also give us characteristics of the object-object relations. By finding such characteristics, we could develop functions for automatically linking experiences with objects and for recommending experiences linked with several objects. In order to achieve this aim, we are gathering data on linking, rearranging, and referring behaviors with the RFID-tagged objects.
Subjects and Locations
This experiment is being conducted at the Nara Institute of Science and Technology (NAIST) in Nara, Japan, with three graduate students of the Information Science Department as subjects. They belong to the same laboratory and research group and know each other well. Subject1 is the eldest of the subjects and Subject3 is the youngest; all subjects are Ph.D. course students. Figure 2 illustrates the environment of the experiment. The location is the building of the Graduate School of Information Science at NAIST, on the 7th floor of building B, and is composed of room B708 and rooms B711 through B715. In the experiment, we labeled the area of B711 through B715 "room A" and the area of B708 "room B." Additionally, the hallway connecting room A and room B is included in the experiment. Figure 3 shows the detailed layout of room A; the figure also indicates the desks the subjects usually use.
Questionnaire Survey for Share-ability of Experiences
We investigated the share-ability of experiences among users by a questionnaire. In order to design the experiment, we needed to know in advance how the subjects usually use real world objects and which objects they would employ to link experiences with. The three test subjects answered the questionnaire. They answered the following questions on a 5-level scale for each of the 2,257 objects to which RFID tags are attached.
Q1: How is the object shared? (1: individual ~ 5: shared)
Q2: How often do you use the object? (1: never ~ 5: often)
Q3: How many experiences will you link with the object? (1: nothing ~ 5: a lot)
Q4: How many experiences linked with an object will be shared? (1: nothing ~ 5: a lot)

Score (Avg)  1.0    -2.0   -3.0  -4.0  -5.0
Rate (%)     48.74  20.07  5.01  4.25  21.93
Table 2. Average Scores and Score Ratio in Q1

Score (Avg)  1.0    -2.0   -3.0   -4.0  -5.0
Rate (%)     23.84  50.02  17.28  7.00  1.86
Table 3. Average Scores and Score Ratio in Q2
Target Real World Objects
This section presents basic information about the 2,257 RFID tags attached to real world objects for the experiment. Subject1 has 538 belongings, while Subject2 and Subject3 have 170 and 93 belongings respectively. Shared objects number 561, and others' belongings 895. In general one tag is attached to one object, although plural tags are attached to an object whose elements can themselves be regarded as objects; e.g., each bay of a bookcase has its own tag. Table 1 (see the last page of this paper) describes the 116 classes of objects and the 21 categories. Note that a "Class" is a set of objects distinguished by a functional attribute, and a "Category" is a role-based set containing classes. Additionally, the PC peripheral Type-A class contains "keyboard," "mouse" and "display," while the PC peripheral Type-B class covers the other devices that are not used daily. The class columns in Table 1 show the number of objects and the "Mobility." The mobility indicates how often an object is moved from one point to another (1: never ~ 5: often). The categories also show the ratio of the number of objects and the average mobility. The subjects are allowed to attach an additional RFID tag to an object when they find an object they want to link an experience with.

Pattern    Low    S1     S2     S3    S12   S13   S23   S123
Number     1,007  529    257    103   159   11    5     186
Ratio (%)  44.62  23.44  11.39  4.56  7.05  0.49  0.02  8.24
Table 4. Number and Ratio of Group Patterns in Q2
Table 2 shows the average scores and the ratio of each score in Q1. Here, "-2.0" means that 1 < Avg Score ≤ 2; the remaining notations have the same meaning: "-3.0" means 2 < Avg Score ≤ 3, "-4.0" means 3 < Avg Score ≤ 4, and "-5.0" means 4 < Avg Score ≤ 5. The result shows that the objects to which we attached RFID tags include 68.81% individual objects (1.0 and -2.0) and 26.18% highly shared objects (-4.0 and -5.0). This means that we can widely investigate the relations between experiences and objects at each score.
Table 3 describes the average scores and the ratio of each score in Q2. The notation of the score in Table 3 is the same as in Table 2. This result shows that nobody uses 23.84% (1.0) of the objects, and nobody would use them in the experiment either. However, 8.86% of the objects (-4.0 and -5.0) would be used often. We should investigate the relations between experiences and this type of object, because such objects would be more useful than others for a user rearranging his/her experiences in everyday life. On the other hand, the 67.30% of the objects scored -2.0 and -3.0 are the next most important, because fewer opportunities to use them would make us easily forget the experiences linked with them.
Table 4 shows the number of objects that would be used by the subject(s), and the ratio of the group patterns of subjects who gave a high score (over 3 points). Note that "Low" means that nobody gave a high score; "S1," "S2," and "S3" mean that only Subject1, Subject2, or Subject3 gave a high score; "S12," "S13," and "S23" mean that two of the subjects gave a high score; and "S123" means that all subjects gave a high score. Fortunately, all subjects would use 8.24% of the objects, and Subject1 and Subject2 would also use 7.05% of the objects.
Table 5 shows the average scores and the ratio of each score in Q3. The notation of the score in Table 5 is the same as in Table 2. Unfortunately, there were no objects that the subjects would link with a lot of experiences. In contrast, all subjects answered that they would never use 13.56% of the objects.

Score (Avg)  1.0    -2.0   -3.0   -4.0  -5.0
Rate (%)     13.56  62.87  21.18  2.39  0.00
Table 5. Average Scores and Score Ratio in Q3

Table 6 shows the number and ratio of the group patterns of subjects who gave a high score in Q3. The labels "Low" through "S123" have the same meaning as in Table 4. In total, the subjects would link experiences with 13.1% of the same objects (S12, S13, S23, and S123).

Pattern   Low    S12    S13   S23   S123
Number    1,431  586    27    0     213
Rate (%)  63.40  25.96  1.20  0.00  9.44
Table 6. Number and Ratio of Group Patterns in Q3

Table 7 shows the average scores and the ratio of each score in Q4. The notation of the score in Table 7 is the same as in Table 2. The subjects would use 45.90% of the objects neutrally; few objects would be employed for individual use (0.18%) or for sharing experiences (2.53%).

Score (Avg)  1.0   -2.0   -3.0   -4.0   -5.0
Rate (%)     0.18  29.11  45.90  22.29  2.53
Table 7. Average Scores and Score Ratio in Q4

Table 8 shows the number of objects that would be used by the subject(s), and the ratio of the group patterns of subjects who gave a high score in Q4. The labels "Low" through "S123" have the same meaning as in Table 4. 63.4% of the objects would not be employed to share experiences. The result shows that Subject1 and Subject2 would share experiences via 799 objects, while Subject3 would share experiences almost only via the objects in pattern S123.

Pattern    Low    S1     S2     S3    S12   S13   S23   S123
Number     1,146  335    366    108   112   7     64    119
Ratio (%)  50.78  14.84  16.22  4.79  4.96  0.03  2.84  5.27
Table 8. Number and Ratio of Group Patterns in Q4

Figure 4. The Object Layout Plan in Room B
Figure 5. The Object Layout Plan in the Room A
Subject1
Subject2
Subject3
Q1 vs. Q2
0.56
0.62
0.50
Q1 vs. Q3
0.33
0.56
0.54
Q1 vs. Q4
0.47
0.94
0.64
Q2 vs. Q3
0.80
0.82
0.95
Q2 vs. Q4
0.04
0.59
0.90
Q3 vs. Q4
-0.12
0.55
0.94
Table 9. Correlations among the Questions
Subject2 would share experiences via 799 objects. Subject3
almost share experiences via the objects that are in the
pattern S123.
Figure 4. and Figure 5. are the location of objects that are
voted the high score in both Q3 and Q4. Totally, 233
numbers of objects were selected. 109 numbers of objects
are in S12, 5 numbers of objects are S13, and 119 numbers
of objects are contained in S123. The objects would be
linked with experiences, and the subjects will employ them
for sharing experiences higher probability than other
objects.
We computed correlations among Q1, Q2, Q3, and Q4 (see Table 9). The result for Q2 vs. Q3 means that all subjects would link experiences with the objects that they often use. Subject1 has little policy for sharing experiences when linking experiences with objects (Q2 vs. Q4 and Q3 vs. Q4). Subject2 considers that a shared object would be linked with sharable experiences (Q1 vs. Q4). Subject3 would link experiences with the objects that he often uses, and those experiences would be shared (Q2 vs. Q3, Q2 vs. Q4, and Q3 vs. Q4).
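The paper does not state which correlation coefficient was used for Table 9. Assuming Pearson's r computed per subject over the per-object answers to two questions, the computation would look roughly like the sketch below; this is our illustration, not the authors' code.

// Illustrative sketch: Pearson correlation between two score vectors,
// e.g. one subject's Q2 scores and Q3 scores over the same 2,257 objects.
// (The paper does not specify the coefficient; Pearson's r is our assumption.)
public final class Correlation {
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }
}

If Pearson's r was indeed used, a call such as pearson(q2OfSubject3, q3OfSubject3) would reproduce a value like the 0.95 reported for Subject3.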
The Plan of Relational Analysis
Parameters
In order to analyze relations among subjects, objects, and
experiences in the experiment, we are planning to employ
the following five parameters:
1) Operations: Enclose, Disclose, Move, Copy, Delete
2) Logs: Time, Referring user, Linking user, Referred object
3) Defined object categories: Category, Class, Mobility
4) Video contexts: the linked video, and the video that was captured when a user refers to a linked video
5) Questionnaire data: the four questions conducted in this paper
Note that the video contexts will be divided into categories defined by the authors. (A sketch of a log record combining these parameters is given below.)
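As a concrete illustration of how a single operation-log entry could combine these parameters, consider the following sketch. The field names follow the terminology above; the class itself is our own illustration and is not part of the Ubiquitous Memories implementation.

// Illustrative only: one log entry combining the parameters listed above.
// This class is not taken from the Ubiquitous Memories system.
public class ExperienceLogEntry {
    public enum Operation { ENCLOSE, DISCLOSE, MOVE, COPY, DELETE }

    public Operation operation;      // 1) Operations
    public java.util.Date time;      // 2) Logs: Time
    public String referringUser;     // 2) Logs: Referring user
    public String linkingUser;       // 2) Logs: Linking user
    public String referredObject;    // 2) Logs: Referred object (e.g. RFID tag ID)
    public String category;          // 3) Category (Table 1)
    public String objectClass;       // 3) Class (Table 1)
    public int mobility;             // 3) Mobility: 1 (never moved) .. 5 (often moved)
    public String linkedVideoId;     // 4) Video contexts: the linked video
}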
Prepared Analyzing Topics
We are mainly employing the following three analysis topics:
• What kinds of video data are linked with a certain object? (We expect that the identity of an object could be computed from the set of contexts of the video data linked with the object.)
• What kinds of purposes does a user have when choosing an object to link an experience with? (The system could give the user heterogeneous services depending on the user's purpose.)
• What kinds of objects (or categories) are employed for sharing experiences? (This topic has approximately the same meaning as the second topic.)
We are, however, not sure how many kinds of relations exist, or which kinds of relations would be reliable enough to make a user satisfied with the Ubiquitous Memories system. Therefore, we must investigate other relations in the experiment at the same time. Furthermore, we must clarify what kinds of parameters should be employed to find valuable relations in the next stage of the experiment for supporting the user of the system.
CONCLUSION
We introduced a plan of an experiment to analyze relations among user experiences and real-world objects on the Ubiquitous Memories system. We also investigated the share-ability of experiences with a questionnaire in advance. In order to accomplish the experiment, we should know how the subjects usually use real-world objects and which objects they would employ to link experiences with. The results of the questionnaire give us a direction for finding out where we should analyze the logs that will be recorded in the experiment.
We are continuing the relation analysis from the questionnaire. Although this paper described only the basic results of the questionnaire in Table 2 through Table 9, the questionnaire can give us more detailed relations among subjects, objects, and experiences; for instance, we can analyze the relations at the "class" level shown in Table 1. In addition, we are sure that the operation logs and video data in the experiment will be huge. The storage size of the server holding the logs and video data will be over 5 TB if all subjects link ten two-minute video clips with objects every day for a year, and the total number of linked video clips would be over 10,000. Discovering the mechanisms of the relations among people, objects, and experiences is difficult with 2,257 objects, over 10,000 video clips, and the operation logs. Therefore we must analyze the relations among subjects, objects, and experiences using the questionnaire in advance and in parallel with the experiment.
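For concreteness, the clip count behind this estimate works out as follows; the implied per-clip size is our back-calculation from the 5 TB figure (roughly DV quality) and is not a value stated above.

\[
3 \times 10 \times 365 = 10{,}950 \ \text{clips}, \qquad
\frac{5\ \text{TB}}{10{,}950\ \text{clips}} \approx 460\ \text{MB per two-minute clip}.
\]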
ACKNOWLEDGMENTS
This research is supported by Core Research for Evaluational
Science and Technology (CREST) Program “Advanced Media
Technology for Everyday Living” of Japan Science and
Technology Agency (JST).
REFERENCES
1. Rhodes, B. The Wearable Remembrance Agent: a System for Augmented Memory, Proc. 1st International Symposium on Wearable Computers (ISWC'97), 123-128, 1997.
2. Fukuhara, T., Kawamura, T., Matsumoto, F., Takahashi,
T., Terada, K., Matsuzuka, T. and Takeda, H.
Ubiquitous Memories: Human Memory Support System
Using Physical Objects, Proc.15th Annual Conference
JSAI, 2001. (in Japanese)
3. Kawamura, T., Kono, Y. and Kidode, M. Wearable
Interfaces for Video Diary: towards Memory Retrieval,
Exchange, and Transportation. Proc. 6th IEEE
International Symposium on Wearable Computers
(ISWC2002), 31-38, 2002.
4. Kawamura, T., Fukuhara, T., Takeda, H., Kono, Y. and
Kidode, M. Ubiquitous Memories: Wearable Interface
for Computational Augmentation of Human Memory
based on Real World Objects. Proc. 4th International
Conference on Cognitive Science (ICCS2003), 273—
278, 2003.
5. Kono, Y., Kawamura, T., Ueoka, T., Murata, S. and Kidode, M. Real World Objects as Media for Augmenting Human Memory, Proc. Workshop on Multi-User and Ubiquitous User Interfaces (MU3I 2004), 37-42, 2004.
A Framework for Personalizing Action History Viewer
Masaki Ito
Jin Nakazawa
Hideyuki Tokuda
[email protected]
[email protected]
[email protected]
Graduate School of Media and Governance
Keio University
5322, Endo, Fujisawa, Kanagawa, Japan
ABSTRACT
This paper presents a programmable analysis and visualization framework for action histories, called the mPATH framework. In a ubiquitous computing environment, it is possible to infer human activities through various sensors and to accumulate them. The visualization of such human activities is one of the key issues for memory and the sharing of our experiences, since it acts as a memory aid when we recall, talk about, and report what we did in the past. However, current approaches to analysis and visualization are designed for a specific use and therefore cannot be applied to diverse uses. Our approach provides users with programmability through a visual language environment for analyzing and visualizing action histories. The framework includes icons representing data sources of action histories, analysis filters, and viewers. By composing them, users can create their own action history viewers. We also demonstrate several applications on the framework. The applications show the flexibility of creating action history viewers on the mPATH framework.
Keywords
Action history, visualization, visual language
INTRODUCTION
In the ubiquitous computing environment, where computers and sensors are embedded in our surroundings, it will be possible to recognize our actions and record them. From the accumulated action history, we will be able to obtain highly abstracted context information about human activity. This information is used to develop context-aware applications and also to provide us with useful information. Well-presented action histories support our lives, for example in retrieving memories and sharing our experiences.
Several applications for representing action histories have also been proposed[3][6][8][9][10][11]. These systems represent a user's activity to provide navigation functionality and indexed action histories. However, these representations are designed for a specific use, and hence users cannot customize them to acquire a personalized view of their action histories. For example, though PEPYS[8] can organize human action histories along the location and time axes, it cannot handle additional information, such as images, related to an action history item. Activity Compass[9] provides a navigation functionality based on location track analysis, which also lacks diversity in location history analysis. Action history viewers, therefore, should provide users with personalized views enabling them to analyze, recall, talk about, and report their action histories.
In this paper, we propose a new framework for creating the
personalized action history viewer. The framework provides
users with programmability by a visual language environment for analyzing and visualizing the action histories. The
framework includes icons representing data sources of action
histories, analysis filters, and viewers. By composing them,
users can create their own action history viewers.
This paper is organized as follows. In the next section, we introduce scenarios in which various visualization techniques for analyzing action histories are needed. The third section reviews current techniques for analysis and visualization and clarifies the requirements. In the fourth section, we introduce the mPATH framework as a framework for personalizing action history viewers. The next four sections introduce the usage of the mPATH framework and show several applications. We then evaluate the system and introduce related work. In the final section, we conclude this paper and suggest future work.
SCENARIOS
We introduce two scenarios in which visualization methods are shared and easily developed by users. These scenarios show the usage of action history viewers with a personalizing function. In the scenarios, this feature helps our communication and a deeper understanding of past activities.
Navigation of a Travel Memory
Last week, Alice traveled to Kyoto, Japan. When she arrived at Kyoto station, she borrowed a PDA with GPS as a sightseeing guide. While she was in Kyoto, the PDA assisted her in planning her travel and gave her guidance at tourist attractions. When she left Kyoto, she returned the PDA and received a small memory card on which her location track data, the names of the tourist attractions she visited, and the photos she took were recorded.
Today she is talking about her travel with her boyfriend, Bob.
She inserted the memory card to her PC and showed a map of
Kyoto which was overlaid with lines of location track data.
She started talking about her experience from the beginning of her travel, but immediately found that the map was not designed to represent the temporal aspects of her travel. She searched for visualization methods for travel experiences and downloaded them.
One method analyzed her travel log and calculated a weight for each tourist attraction she visited from her walking speed and the number of pictures she took. The method visualized the map of Kyoto with a distortion that stands for the weights, so she could intuitively grasp her travel.
Since she thought shopping was also important for calculating the weights, she changed the parameters of the analysis and generated a map that closely reflected her impressions. The map helped Bob smoothly understand her travel.
Now Bob wants to go to Kyoto. He asked her to go with him, but she refused because she had just been there. He then decided to show her attractions of Kyoto that she did not know.
Development of Analysis and Visualization Method
Since he is an amateur programmer, he decided to develop a visualization method in which attractive places that she did not visit are emphasized. He first searched for a web guide of Kyoto in which tourist attractions are ranked and many pictures are included. Then he developed an algorithm to find attractions she did not visit by comparing the web guide with her tour data. He designed it to visualize a map of Kyoto with many photos of the attractions.
He uploaded his algorithm and asked her to download it and apply it to her memory. She noticed attractions in Kyoto that she had not known about and decided to visit again.
VISUALIZATION OF ACTION HISTORY
In this section, we define action history, and mention current techniques of visualization and analysis of action history. Then we clarify requirements of the visualization system.
In this paper, an action history is an aggregated form of information which contains location, date and description of
action about a certain person. Location track data obtained
by GPS is one example of action history. Digital photo data
is also an action history if it contains a time stamp and location information as its meta information.
Current Visualization
There are several visualization methods for representing action histories. We introduce some of them and discuss their features.
Text-based Visualization  Text-based visualization is a simple technique to represent daily and special experiences. Without machinery, some people keep diaries so as not to forget daily events and, in some situations, to share a secret with an intimate friend by exchanging a diary book. The text-based style has no restrictions on format and content, so we can easily represent various experiences in this style. However, it is difficult to find specific information from a particular point of view in a diary.
List  A list is a structured format of text which shows a specific aspect of experiences in order. A chronology is one example of a list, showing the temporal aspect of a history. This style helps intuitive understanding of a specific feature of an action history, such as times, events, and names of places.
PEPYS[8] represents a user's activity as a list in temporal order. We can know the temporal context of each action. However, it is difficult to know the spatial aspect of the actions, since the rooms in which he or she was are represented only by their names.
Map-based Visualization  A map, which is widely used to represent geographic information, can also be used to represent the spatial aspect of an action history. Overlaying a ready-made map with points and lines that suggest a certain action history is a popular method for representing action histories. We can easily understand an action history in its geographic context by using such a map. Map-based visualization is utilized in several researches[9][11]. However, a ready-made map contains only common objects like restaurants, gas stations and hotels, and is not enough to show personal experiences.
3D Map Visualization  Especially for hikers and climbers who log their tracks with handy GPS devices, there are several applications in which mountains and valleys are shown as 3D graphics and overlaid with lines of the tracks. KASHMIR 3D[10] is widely used in Japan, and Wissenbach Map3D[3] was developed for the same purpose. These applications utilize a digital elevation model of a certain area for creating the terrain model.
Photo-based Visualization  Photographs taken by a user represent his or her interests during an activity such as travel. Simply placing many thumbnails of photos on a screen is a widely utilized technique. However, the spatial and temporal aspects of the photos disappear in this visualization.
STAMP[6] represents the spatial relationship of photos by linking the same object in two photos. We can browse photographs by following the links with a mouse. This method simultaneously visualizes the user's points of view and the spatial structure.
Analysis of Action History
As most visualization techniques represent only a few aspects of an action history, analysis methods that extract certain aspects of an action history are also important in the visualization process. Currently, analysis and visualization are tightly combined; we discuss them separately in this paper.
Time and space are basic aspects of an action history. We often utilize them to order action histories and as clues for retrieving a certain history. Most visualization techniques are designed to represent both or either of them.
Component Based Architecture  To reduce the cost of developing a new analysis and visualization method, we divided the method into several components. We defined three types of components: data source, viewer and filter. Users can create a visualization method by combining existing components.
Standardization of Internal Data  We defined a unified type of data in the system. The descriptions of location, date and other information are unified. When we input action histories or geographic information, they must be converted into this single type of data.
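A minimal sketch of what such a unified record could contain is given below. The names ActionElement and GeoShape appear later in this paper; the concrete fields shown here are our assumption, not the mPATH definition.

import java.util.Date;

// Sketch of a unified internal record exchanged between components.
// ActionElement and GeoShape are names used in this paper; the concrete
// fields are our assumption.
interface GeoShape { }  // stand-in for a point, track segment or region

public class ActionElement {
    private final Date time;          // when the action happened
    private final GeoShape location;  // where it happened
    private final String description; // what the person did (e.g. "taking pictures")

    public ActionElement(Date time, GeoShape location, String description) {
        this.time = time;
        this.location = location;
        this.description = description;
    }

    public Date getTime() { return time; }
    public GeoShape getLocation() { return location; }
    public String getDescription() { return description; }
}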
By analyzing an action history, we can acquire highly abstracted information such as the frequency of visits to a certain place, a daily movement pattern, and the user's interests and preferences. There are several researches that analyze location track data captured by GPS and extract such information [1][9][12]. Some of these researches utilize additional geographic information to detect an activity at a certain place.
To utilize action histories as an assistant for our memory and communication, we should understand various aspects of an action history, as the scenarios show. However, current visualization systems are designed for specific uses of action history. To represent action histories in their various aspects, flexible analysis and visualization features are required.
Requirements
For the flexible programming of visualization mentioned in the scenarios, the following features are required of the system.
Flexibility of Data Input  The system must treat the various types of data available in the ubiquitous computing environment simultaneously. Action histories are characterized by their description of location, the contents of what we did there, and the way the data were acquired.
The unified data type is used to exchange action histories between components. Since the data type is unified, we can easily design a component for several kinds of action history. This feature also realizes flexible combination of components.
Using Data Flow Style Visual Language  To control the combination of components, we provide a visual programming system in a data flow style. Visual programming reduces the difficulty of creating an original visualization method by combining components.
Implementation
We implemented a prototype of the mPATH framework in Java. Our system is implemented as a GUI application and consists of 14,000 lines of Java code. Figure 1 shows a screen shot of the system.
The system also needs to treat geographic information for the analysis of action histories. A digital map and the Yellow Pages with addresses are examples of geographic information.
Providing Programmability of Analysis and Visualization  For all users of the system, it must be easy to create their original visualization methods. For skillful users, the system must provide flexible programmability. Even for unskilled users, the system must provide the possibility of changing a visualization method.
Sharing existing methods programmed by third parties also increases the programmability of the system. Using existing methods reduces the cost of creating new analysis and visualization methods.
Figure 1: Screen shot of the system
mPATH FRAMEWORK
We developed a programmable analysis and visualization framework for action histories named the mPATH framework. In this section we introduce the features and implementation of the mPATH framework.
In the current implementation, we mainly focus on realizing visual programming for data analysis and visualization. This version works as a platform for creating analysis methods by data-flow-style programming. Exporting and importing of programs is still under construction; we are planning to use XML to exchange them.
Approach
The approaches taken by the mPATH framework to accomplish the aforementioned flexibility are the component-based architecture, the standardization of internal data, the data-flow-style visual language, and the programmability of analysis and visualization described above.
Experiment
Since June 2003, Ito, the first author of this paper, has been carrying a "Garmin eTrex Legend"[5], a handy GPS receiver. Table 1 shows the amount of captured data. During the experiment, he also took pictures as action histories using a digital camera. The track data from the handy GPS and the pictures taken are used as data for the development and evaluation of this system.
Table 1: Captured Data of GPS and Digital Camera
Date        size(byte)   track    point   picture
Jun. 2003      321,310     286    5,893       515
Jul. 2003      294,518     297    5,336       131
Aug. 2003      473,739     412    8,676       218
Sep. 2003      365,855     298    6,730        37
Oct. 2003      307,187     287    5,608       342
Nov. 2003      193,772     153    3,573        54
Dec. 2003      205,621     208    3,727         7
Jan. 2004      278,071     215    5,135       108
Feb. 2004      292,319     231    5,392         5
Total        2,732,392   2,387   50,070     1,417
Average        303,599     265    5,563       157
COMPONENTS FOR mPATH FRAMEWORK
We developed several components for mPATH framework.
In this section, we classify them into data source, filter and
viewer, and introduce components we developed.
Data Source
To input various action histories, we developed several components as data sources. These components access an action history and transform the data into the standard data format.
GPS Location Track Data Source  The GPS location track data source accesses files of location track information, or a GPS device through an RS-232C interface. The acquired data are transformed into a group of points that the actor passed by.
Photo Data Source  The photo data source deals with image files taken by a digital camera. By reading the time stamp and location in the EXIF[7] information of the files, the component generates an action history of "taking pictures" in the standard data format. The component can also accept location track data; by matching the timestamps of photos with the location track data, it estimates the locations of the pictures (a sketch of this matching is given after the list of data sources).
Map Data Source  The map data source inputs map data in a vector format, such as points for stations, lines for roads and polygons for buildings. Users can detect details of an action history by comparing it with the map data. Since the data were designed for GIS, we can obtain a generic map by rendering them and use the map as a background for a visualization.
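The Photo Data Source above estimates picture locations by matching photo timestamps against the location track. A minimal sketch of such a nearest-timestamp match is shown below; it is our illustration, not the mPATH implementation.

import java.util.Date;
import java.util.List;

// Sketch: estimate a photo's location by finding the GPS fix whose
// timestamp is closest to the photo's EXIF timestamp.
// (Our illustration; not the actual mPATH code.)
final class PhotoLocator {
    static final class Fix {
        final Date time; final double lat; final double lon;
        Fix(Date time, double lat, double lon) { this.time = time; this.lat = lat; this.lon = lon; }
    }

    static Fix nearestFix(List<Fix> track, Date photoTime) {
        Fix best = null;
        long bestDiff = Long.MAX_VALUE;
        for (Fix f : track) {
            long diff = Math.abs(f.time.getTime() - photoTime.getTime());
            if (diff < bestDiff) {
                bestDiff = diff;
                best = f;
            }
        }
        return best; // null if the track is empty
    }
}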
Filters
Filter components have one or more inputs and outputs. They input data in the standard format and output the result of their processing in the same format.
Time Filter  The time filter extracts the action histories during a specified term. Through the GUI of the filter, we can change the term. The operation immediately affects the output on the viewer.
Speed Filter  The speed filter filters data by speed. It is especially useful for inferring the means of transportation from location track data.
Formalize Filter  The formalize filter classifies a group of point data into movements and stops, and outputs them as track and point data through two separate sockets. Users can change the time threshold used to detect the actor's stops (a sketch of this stop detection is given after the list of filters).
Matching Filter  The matching filter has two input sockets: one for map data and the other for a geographic coordinate. The matching filter calculates the name of the specified coordinate from the specified map.
Count Filter  Inside the count filter, the geographical region is divided into a grid. The filter counts the input data in every grid cell. Users can know how many times he or she visited a certain point, i.e. the weight of the point, in an action history.
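One possible reading of the formalize filter's stop detection is sketched below, under our own assumptions (a fixed spatial radius in addition to the time threshold mentioned above); this is an illustration, not the mPATH implementation.

import java.util.ArrayList;
import java.util.List;

// Sketch of a formalize-style filter: split a sequence of timestamped GPS
// points into "stops" (little movement for longer than a time threshold)
// and "movements". Our illustration, not the mPATH code.
final class StopDetector {
    static final class Fix {
        final long timeMillis; final double lat; final double lon;
        Fix(long t, double lat, double lon) { this.timeMillis = t; this.lat = lat; this.lon = lon; }
    }

    // Returns the indices where a detected stop begins; points in between are movement.
    static List<Integer> detectStops(List<Fix> fixes, double radiusDeg, long minStayMillis) {
        List<Integer> stops = new ArrayList<>();
        int i = 0;
        while (i < fixes.size()) {
            int j = i;
            // extend the window while the actor stays within radiusDeg of fixes.get(i)
            while (j + 1 < fixes.size()
                    && Math.hypot(fixes.get(j + 1).lat - fixes.get(i).lat,
                                  fixes.get(j + 1).lon - fixes.get(i).lon) < radiusDeg) {
                j++;
            }
            if (fixes.get(j).timeMillis - fixes.get(i).timeMillis >= minStayMillis) {
                stops.add(i); // stayed long enough: treat this window as a stop
            }
            i = j + 1;
        }
        return stops;
    }
}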
Viewer
We developed three viewers to visualize action histories.
Normal Map Viewer  To visualize the spatial aspect of an action history, we developed the normal map viewer. In this viewer, every input is arranged by its geographic coordinates, so that a generic map-like visualization is realized. This viewer accepts multiple data and overlays them. Figure 2 shows an image of the normal map viewer.
Figure 2: Normal Map Viewer
Table Viewer  We used the table type viewer shown in Figure 3 to visualize the data output by the matching filter. In the left column, the names of the places the user visited are listed; in the center, the times of stay are shown; in the right column, the dates the user began to stay are shown.
Figure 3: Table Viewer
Weight Map Viewer  To represent highly abstracted contexts of action, we developed the weight map viewer. The viewer accepts weights of regions in addition to the normal action history or geographic information. This viewer visualizes the weight as a color or scale of each region, as shown in Figure 4.
Figure 4: Weight Map
Figure 5: Visual Programming Window
Figure 6: Example of Visual Programming
VISUAL PROGRAMMING
In this section we introduce the visual programming manner of the mPATH system by creating a map-like viewer of an action history. The main window of the mPATH framework consists of mainly two parts: one is a palette and the other is a canvas. Figure 5 shows the detail.
In the palette, components of data sources, viewers and filters are registered as icons. By dragging and dropping an icon, a component is copied and registered on the canvas.
On the canvas, every data flow is shown as lines from left to right. Each component icon has data sockets: a right socket means an output of data, and a left socket means an input. Icons with only a right socket are data sources, and icons without a right socket are viewers. Icons with both left and right sockets are components of analysis methods and work as filters of data. By clicking two icons on the canvas, two components can be connected and disconnected.
Figure 6-(1) is a simple visualization of the GPS location track data. In this program, the GPS data are input to the formalize filter. Only the track data are input to a normal map viewer and shown in geographic coordinates. The result of the visualization is Figure 2-(1). When we connect both the track and point sockets to the viewer, we can see the stop point data together with the track data, as shown in Figure 2-(2). An operation on the canvas is immediately reflected on the viewer, so interactive operation is accomplished.
By inserting a time filter between the GPS data source and the formalize filter, as shown in Figure 6-(2), we can control the term of the location track data.
To visualize the details of the location track data, we use a map as a background. Figure 2-(3) is the result of connecting the map data source to the normal map viewer directly. By inputting the output of the map data source to the viewer of the track data, as in Figure 6-(3), we can see the track data together with the map, as in Figure 2-(4).
We can join the data of two inputs by comparing the geographic coordinates of each input with a matching filter. By utilizing this filter, we can acquire the name of a stop point from a map. Figure 6-(4) is a program which utilizes the matching filter to acquire the names of stop points by inputting map data and the output of the formalize filter. In this program, we visualize the output as a list by using the table viewer shown in Figure 3.
DEVELOPMENT OF A DATA SOURCE, A FILTER AND A VIEWER
If ready-made data sources, filters or viewers do not satisfy users, they can also develop original ones in Java code. We provide a skeleton for them; users can create a new data source, filter or viewer by extending four functions in the skeleton.
Inside the mPATH framework, data sources, filters and viewers are designed as the same kind of object, named ActionFilter. An ActionFilter can have any number of input and output connectors. If an ActionFilter has no input connector, it works as a data source. If an ActionFilter has no output connector, it works as a viewer. An ActionFilter with both input and output connectors works as a filter.
Since the data transfer mechanism in the mPATH framework is designed in a demand-driven style, the filtering logic is implemented in a function which returns the result of the filter. A messaging mechanism that notifies of changes in upper ActionFilters is provided to realize data-driven analysis. When the lowest ActionFilter receives a change event, it demands newer results from the upper ActionFilters and thus accomplishes data-driven analysis.
The following functions are prepared for developers.
1. getActionElement(GeoShape area):
This function is called when an analysis result is required by lower ActionFilters. Area information is given as the argument, and the function must return all analyzed results in the ActionElement format. When developing a filter, the developer implements the logic of the filter in this function. When developing a data source, this function is used to acquire the action history and form it into the unified data type.
2. afterConnectFrom(ActionFilter fromFilter):
This function is called when another ActionFilter is connected to an upper socket. The connected ActionFilter is given as the argument. When developing a viewer, the output of the viewer needs to be updated when this function is called.
3. afterDisconnectFrom(ActionFilter fromFilter):
This function is called when a connected upper ActionFilter is disconnected. The disconnected ActionFilter is given as the argument. When developing a viewer, the output of the viewer needs to be erased when this function is called.
4. preNotification(ActionFilter fromFilter):
This function is called when the state of the upper ActionFilter given as the argument has changed. After this function is processed, the change event is transmitted to the lower ActionFilters. When developing a viewer, the output of the viewer needs to be refreshed in this function. (The sketch below puts the four callbacks together.)
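Putting the four callbacks together, the ActionFilter skeleton might look roughly like the sketch below. The four method names come from the description above; the connector bookkeeping and the connectTo helper are our assumptions, not the actual mPATH code.

import java.util.ArrayList;
import java.util.List;

// Rough sketch of the ActionFilter skeleton described above. The four
// callback names are from the paper; everything else is our assumption.
abstract class ActionFilter {
    interface GeoShape { }          // stand-in for the paper's GeoShape type
    static class ActionElement { }  // stand-in for the unified data record

    private final List<ActionFilter> upper = new ArrayList<>();  // input connectors
    private final List<ActionFilter> lower = new ArrayList<>();  // output connectors

    // 1. Demand-driven: lower filters pull analyzed results for an area.
    public abstract List<ActionElement> getActionElement(GeoShape area);

    // 2./3. Connection management callbacks.
    public void afterConnectFrom(ActionFilter fromFilter) { }
    public void afterDisconnectFrom(ActionFilter fromFilter) { }

    // 4. Data-driven: an upper filter changed; refresh and propagate downstream.
    public void preNotification(ActionFilter fromFilter) {
        for (ActionFilter f : lower) {
            f.preNotification(this);
        }
    }

    // Hypothetical wiring helper: connect this filter's output to another's input.
    public void connectTo(ActionFilter downstream) {
        lower.add(downstream);
        downstream.upper.add(this);
        downstream.afterConnectFrom(this);
    }
}

An ActionFilter with an empty upper list plays the role of a data source, and one with an empty lower list plays the role of a viewer, mirroring the description above.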
APPLICATIONS
In addition to the normal map viewer application mentioned as an example, we developed two applications on the mPATH framework. The first is a system that lists points of interest by analyzing stops at certain places. The second is a visualization system for travel activity focusing on the traveler's interest in places.
Listing Points of Interest System
We developed a system that extracts the user's points of interest on the mPATH framework. An interest point is a place where the user did something or stayed for a long time. We also detect the name of the point using the matching filter. In this system, we visualized the interest points in a table. Figure 3 shows the result of the visualization.
Visualization with Weight
We developed a weighting method to classify regions by the user's activity. In our algorithm, we first divide the region into small grid cells. In each cell, we count the number of times we visited, the number of times we took pictures, and the number of times we shopped. Then we add these values and obtain the weight of each cell as a number.
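A literal reading of this weighting algorithm is sketched below (our illustration, not the authors' code): the bounding region is divided into a grid, events are counted per cell, and the cell weight is the sum of its counts.

// Sketch of the grid-based weighting described above (our illustration).
// Each event (a visit, a picture taken, a purchase) falls into a grid cell;
// the weight of a cell is the sum of its event counts.
final class GridWeight {
    private final double minLat, minLon, cellSize;
    private final int rows, cols;
    private final int[][] weight;

    GridWeight(double minLat, double minLon, double maxLat, double maxLon, double cellSize) {
        this.minLat = minLat;
        this.minLon = minLon;
        this.cellSize = cellSize;
        this.rows = (int) Math.ceil((maxLat - minLat) / cellSize);
        this.cols = (int) Math.ceil((maxLon - minLon) / cellSize);
        this.weight = new int[rows][cols];
    }

    // Count one event (a visit, a picture, or a purchase) at (lat, lon).
    void addEvent(double lat, double lon) {
        int r = (int) ((lat - minLat) / cellSize);
        int c = (int) ((lon - minLon) / cellSize);
        if (r >= 0 && r < rows && c >= 0 && c < cols) {
            weight[r][c]++;
        }
    }

    int weightAt(int row, int col) {
        return weight[row][col];
    }
}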
We implemented this algorithm on the mPATH framework and developed a visualization system reflecting the weight of each region. Figure 4 shows an example. In the example, we weighted each cell mainly by the number of pictures the actor took, and the weight of each cell is visualized as a scale. We also made it possible to show a photo by clicking on the map, so this system can be used as a photo viewer.
EVALUATION
In this section we evaluate the performance of the mPATH framework. The evaluation shows whether the response of the system is sufficient for the interactive development of visualization methods. It can also be used to find functions which should be improved.
For measuring the performance, we used the environment shown in Table 2 and the data shown in Table 3.
Table 2: Evaluation Environment
CPU      Pentium4 2.53GHz
memory   1024MB
OS       Linux 2.4.22
JDK      J2SDK 1.4.2_03
Table 3: Data
Type                    Description              Size
GPS location tracking   Jun. 2003 – Nov. 2003    1,956,381 byte
Map Data                Fujisawa city            9,261,000 byte
Overhead of the Component Architecture
We measured the performance of the component architecture, because the architecture could be an overhead compared to a hard-coded implementation. We measured a simple visualization method with several time filters, in which we changed the number of time filters.
Figure 7: Measurement of the Overhead
Figure 7 shows the result of the measurement. While we increased the number of time filters from zero to 15, the growth of the processing time is small. This result shows that the overhead of connecting analysis components is small.
Evaluation of the Performance of Visualization on the System
We measured the performance of a visualization system constructed on the mPATH framework. We used a simple visualization application in which location track data and map data are rendered in the same window, and measured the time required for rendering. Figure 8 shows the result. It also shows the results of rendering the location data or the map alone.
Figure 8: Application Performance
The lowest rendering time is about 800 ms, and in the case of small-scale maps it takes more than 1,500 ms. In the overlaid case, the rendering time for the map is twice that of rendering the map alone, and could be reduced.
RELATED WORKS
We introduce several researches and products as related work of the mPATH framework. These systems do not treat action histories, but they provide flexibility of data analysis and representation in a visual language manner.
DFQL
To improve the query languages of databases, various graphical query languages have been proposed instead of text-based languages like SQL. DFQL[4] is one of the graphical query languages for use with scientific databases. DFQL uses a data flow language and enables analysis and visualization in addition to the data retrieval on which general graphical query languages focus.
Max/MSP
Max/MSP[2] is a visual programming system for MIDI and audio signals. The system enables interactive processing of music streams with a graphical data flow language, and it is used to create electronic musical instruments and effectors with original sound algorithms, as well as interactive media systems.
The visual language is used by many creators of electronic media, and the fruits of their programming are exchanged widely on the Internet. Since modules of the visual language can be developed in the C language, many original modules are also distributed.
CONCLUSION
In this paper, we presented a programmable analysis and visualization framework for action histories, called the mPATH framework. The mPATH framework provides a data flow visual language and enables flexible and interactive analysis by connecting analysis components through mouse operations. Thanks to its component architecture, the framework can provide various visualizations which represent various aspects of an action history. We implemented the mPATH framework in the Java language and demonstrated the construction of various viewer applications. We also evaluated the performance of the framework and showed its interactivity. We are planning to extend the mPATH framework with a focus on the following issues.
Implementation of a Sharing Mechanism  We are now implementing a mechanism to share programs on the mPATH framework. We designed an XML-based description as a file format for programs on the mPATH framework. We will implement functions to import and export the XML. We are also planning to provide a server on the Internet to upload and download the XML files.
Enable Visual Programming of Parameters of Modules  We will extend the visual language and enable visualization of the parameters of each module. We will enable the parameters of modules to be treated as modules in the system. This feature will realize coordination between modules and enable more complex analysis and visualization.
REFERENCES
1. D. Ashbrook and T. Starner. Learning Significant Locations and Predicting User Movement with GPS. In
Sixth International Symposium on Wearable Computers(ISWC 2002), pages 101–108, October 2002.
2. Cycling’74.
Max/MSP.
http://www.cycling74.com/products/maxmsp.html.
3. Dave Wissenbach.
Wissenbach Map3D.
http://myweb.cableone.net/cdwissenbach/map.html.
4. S. Dogru, V. Rajan, K. Rieck, J. R. Slagle, B. S. Tjan,
and Y. Wang. A Graphical Data Flow Language for
Retrieval, Analysis, and Visualization of a Scientific
Database. Journal of Visual Languages & Computing,
7(3):247–265, 1996.
5. Garmin Ltd.
Garmin eTrex Legend, 2001.
http://www.garmin.com/products/etrexLegend/.
6. Hiroya Tanaka and Masatoshi Arikawa and Ryosuke
Shibazaki. Extensive Pseudo 3-D Spaces with Superposed Photographs. In Proceedings of Internet Imaging
III and SPIE Electronic Imaging, pages 19–25, January
2002.
7. JEIDA. Digital Still Camera Image File Format Standard (Exchangeable image file format for Digital Still
Cameras: Exif) Version 2.1. 1998.
8. W. M. Newman, M. A. Eldridge, and M. G. Lamming. PEPYS: Generating Autobiographies by Automatic Tracking. In Proceedings of ECSCW ’91, pages
175–188, September 1991.
9. D. J. Patterson, L. Liao, D. Fox, and H. Kautz. Inferring High-Level Behavior from Low-Level Sensors. In
Proceedings of The Fifth International Conference on
Ubiquitous Computing (UBICOMP2003), pages 73–
89, 2003.
10. Tomohiko Sugimoto. KASHMIR 3D. http://www.kashmir3d.com.
11. K. Toyama, R. Logan, and A. Roseway. Geographic
Location Tags on Digital Images. In Proceedings of the
eleventh ACM international conference on Multimedia,
pages 156–166. ACM Press, 2003.
12. J. Wolf, R. Guensler, and W. Bachman. Elimination of
the travel diary: An experiment to derive trip purpose
from GPS travel data. Notes from Transportation Research Board, 80th annual meeting, January 2001.
Providing Privacy While Being Connected
Natalia Romero, Panos Markopoulos,
Eindhoven University of Technology
Den Dolech 2, 5600MB Eindhoven
The Netherlands
[email protected]
INTRODUCTION
Privacy is typically studied in terms of conflicts over information and data security or as a human rights issue. However, a broader view of privacy focuses on how people choose to share, as well as to keep to themselves, personal information.
This research examines this latter view, studying what people want to share, when, how and with whom, in the context of Awareness Systems.
SUPPORTING INFORMAL SOCIAL COMMUNICATION
We describe research into supporting the leisure (non work
related) use of communication media and more specifically
of Awareness Systems. Awareness Systems are meant to
provide a low effort, background communication channel
that occupies the periphery of the attention of the user, and
which helps this person stay aware of the activity of another
person or group. Awareness Systems do not aim directly to
support information exchange tasks, as for example e-mail
and telephone calls do. Rather, the awareness they aim to
create is similar to the awareness of people in surrounding
offices at work or of one’s neighbours at home. Such
awareness is built by tacitly synthesizing cues of people’s
presence and activities, e.g., footsteps on the corridor, and
discussions in the street outside. In many cases, these cues
have very low accuracy, e.g., we can notice that there are
people talking but not what they say, but this low accuracy
is sufficient for providing this awareness [4, 8].
Awareness Systems have been studied in the work
environment, starting from the Media Spaces work [3], at
Xerox. Research into leisure and especially domestic use of
Awareness Systems is more recent. In our research, we
study and envision the use of such systems to support the
communication between people with an existing and close
social relationship. A solution to providing an awareness
system for helping family members stay in touch through
the day is the ASTRA system described in [6]. Here we are
concerned mostly with the privacy issues arising in the
context of Awareness Systems and more specifically in the
context of ASTRA.
The ASTRA System
The operation of the ASTRA prototype described in [6, 11] is shown in Figure 1. The system helps family members who do not live in the same household to communicate asynchronously. An individual takes a picture of a situation she would like to share right away with a specific person, or with everyone at another household. She composes a message with the picture and a personal note and sends it. A person at home can, at any moment, check the messages sent.
The technology used consists of a mobile device (a mobile
phone with camera on and GPRS functionality) that
captures and sends pictures and notes to a home device (a
portable display with touch screen capabilities) that
continuously shows the collection of messages that have
been sent by members of the other household. For more
details of the implementation please refer to [6, 11].
The home device uses a spiral visualization to place the messages in a timeline structure in which the user at home can navigate between previous and more recent messages. The display offers a shared space where all members of the family can see the messages that have been sent to the family. It also offers a personal space where each member can view the messages that have been sent only to her/him.
Figure 1: Connecting mobile user with the household through
the ASTRA prototype
Pictures plus notes may be used to trigger communication
or as conversation props during other communication
activities.
A field test [6] was executed as part of the ASTRA project, which confirmed (also with quantitative evidence) that the system indeed helps related, distributed households to stay in touch and get more involved in each other's lives. On the sender side, the results show that by taking pictures and writing handwritten notes, the system supports mobile individuals in sharing moments throughout the day that they might not feel are sufficiently noteworthy for sharing by more intrusive means of communication. On the receiver side, participants indicated that receiving messages regularly gives them a lasting sense of awareness of the members of the other household and therefore makes them feel much closer to each other's lives.
ADDING PERVASIVE FEATURES TO AWARENESS
SYSTEMS
By adding automation, the question is how to deal with the balance between convenience and control when interacting with Awareness Systems. On the one hand, automation may relieve the user of undesired tasks and therefore support her in focusing on the meaningful ones. On the other hand, it may easily become a surveillance system, where the user is unable to control what information about him is captured and delivered to others.
The ASTRA system offers a simple and explicit form of peripheral awareness between family members, based on explicit picture-based communication and on manually inputting one's availability status to the system, e.g., for email, telephone, instant messaging, etc. While the field study has shown that ASTRA provides measurable affective benefits to its users, from a research point of view it is interesting to study to what extent adding more flows of communication and some degree of automation to the system will add more benefits without incurring too many costs. Relevant costs that may be experienced are the loss of autonomy because someone feels watched, the feeling of being obliged to return calls, the disappointment of not receiving an expected answer to a message, etc. Other costs relate to the effort of continually updating one's status or taking pictures, or being disrupted by the arrival of new messages.
To a large extent, improvements to ASTRA can help alleviate some of the costs mentioned above. Possible extensions include:
• Automatic notification to the sender when a picture has been viewed by the receiver(s).
• Peripheral awareness of the history of use of the home device by a user.
• Automatic presence capturing when a user is using/looking at the home device.
• Machine perception (e.g. sensors) to support users in managing their reachability information, which could help family members control the disclosure of and access to information while avoiding excessive interaction workload.
However, besides these benefits, automation also incurs costs. For example, there is a tension between providing the desired level of control over what is captured, shared and displayed and the level of convenience of interaction a user wants to engage in.
Automatic capture can help users to effortlessly maintain a sense of each other's context and activities. For example, an Awareness System can provide cues about the type of situation in which users find themselves (e.g. a private or public context). This information can help users adapt their behaviours between different situations [2]. For example, in an elderly care situation, a daughter who is concerned about how her elderly father is doing can refrain from phone calls that resemble a 'doctor interview' conversation, because she will be able to answer those questions directly from an Awareness System and can therefore concentrate on nicer and more meaningful talks.
Automatic capture offers a good variety of techniques to support these goals. Sensors and activity logs are some of the techniques we want to explore.
PRIVACY IN AWARENESS SYSTEMS
Awareness System technologies provide access to an increased amount of information that is captured by sensing the context of the physical environment and the social situation. This capability is potentially valuable for the consumer, who has access to increasing volumes of content through numerous media, places and times of the day. Critical for ensuring user acceptance is finding a balance between the amount of personal information captured, how it is captured, the way it will be used, etc., and the protection of the user's privacy.
Typically, questions of privacy are interpreted by people to refer to: (a) undue continuous surveillance by a third party (what is known as 'Big Brother'); (b) unauthorised access to private information.
These views are rather restricted, as we can see by a simple
consideration of privacy issues in the ASTRA system.
In the ASTRA system every user of the home device has
their own “area”. This is where they can see postcards sent
only to them instead of the household. In the design phase,
we decided not to use any authorization process for
accessing this area (e.g. login/password). Within a family
we can rely on social norms. Family members will
normally respect each other’s privacy and refrain from
opening doors that they should not [1] or peeping into
drawers or personal objects (e.g. a teenager’s diary).
Protective and security mechanisms seemed an unnecessary
interaction cost inconsistent with the idea of having a low
cost/effort communication medium.
During the field tests, mobile individuals did not send messages to individual persons at all. They indicated that nothing was too personal in nature, as they appreciated communicating at once with the whole household and not just one member.
In conclusion, privacy issues emerge even without introducing surveillance; however, a feeling of being under surveillance can arise when a constantly open communication channel exists and when a feeling of obligation to interact is felt. Also, privacy and information security are not synonymous. Privacy management can be achieved through social interactions and social rules within a group of people, and may equally well concern the will to share information rather than only protecting it.
Awareness Systems can lead to us knowing more than we want about friends and family, breaching their privacy or creating embarrassment. For example, parents may unintentionally obtain an overview of their teenage daughter's social network, or grandma may find out that the grandchild who said she couldn't visit because of a school trip is at home listening to music.
Besides providing an inappropriate amount of information, another privacy aspect of Awareness Systems concerns the failure to establish appropriate interaction/communication patterns. E.g., an always-on channel for communication between a mother and her son who is far from home tells her a lot about his daily routine and activities when he is at home. This can give her a sense of connectedness, but it may also give rise to an undesired level of engagement whenever the son is at home, even if he just wants to stay there without having to interact with anyone.
Looking at these two kinds of privacy failures in Awareness
Systems, it seems crucial to enable users to regulate the
process of privacy management. Our aim is to design a
“Privacy Profile Interface” (PPI) for Awareness Systems
that helps the user determine their own balance between
their needs for communication and privacy.
PRIVACY IN SOCIAL PSYCHOLOGY
Early works like Westin's [14] and Altman's [1] theories study privacy from a social perspective, i.e. pertaining to unmediated human-human social interactions. Both of these works conceptualise privacy as a dynamic process between the desire to be alone and the desire to interact with others.
Westin's theory of privacy states and functions [14] has been an influential discussion of the ways people might want to achieve privacy, focusing on the different ways and reasons for individuals to be alone or to be left alone. He identifies four different types of privacy (solitude, anonymity, intimacy and reserve) used as mechanisms to achieve four purposes or ends of privacy (personal autonomy, emotional release, self-evaluation, and limited and protected communication). Without getting into the details of his theory, we can clearly see that privacy may refer to groups as well as to individuals, and can be effected through physical separation or through people's behavioural mechanisms.
Altman takes a broader view than Westin, considering privacy as a dialectic process by which people manage the extent to which they are accessible to their environment. He defines behavioural mechanisms (verbal and non-verbal behaviour, personal space and territory, and culturally defined norms and practices) for privacy regulation. He includes in his theory both social and environmental psychological concepts and describes how people's use of the environment serves to manage privacy (e.g. territory, personal space) and how these mechanisms affect the regulation of social interaction, looking at both the input (e.g. regulating who visits, being observed) and output (e.g. disclosing to another) aspects of privacy.
Although these theories do not cover the high complexity that privacy brings about when trying to study its impact on awareness systems, they give us a good framework to conceptualize privacy in the context of human social behaviours and human social needs.
Mediated Social Communication
Social communication can be characterized as an interaction need of users to exchange information, and an outeraction [9] need that comprises several conversational processes outside the exchange of information to reach out to others for communication.
Following the same idea, privacy concerns can be divided under two perspectives: an information control perspective and an interactional control perspective. The information perspective addresses privacy of the information content communicated: what information do users want to exchange? How? When? With whom? The interactional perspective addresses privacy of the outeraction needs for communication: what behavioural mechanisms do users need? In which context? How can they be supported? Rather than controlling access to Personal Information (PI), an interaction control perspective encourages users to develop their own social mechanisms to address the problem of interruptions and undesired communication.
PRIVACY MANAGEMENT
Recent studies [7, 12, 15], mainly focused on the mobile communication domain, have developed several techniques to address these conflicts. We view the state of the art of privacy in Awareness Systems in terms of the distinction between the information and interaction control perspectives.
Information Control Perspective
Most existing work tries to facilitate communication by helping users to control their own PI and to access others' PI. Two examples of such systems are Personal-level Routing and Presence Cues, described below.
Personal-level Routing [12] is a personal proxy to maintain person-to-person reachability. It is a rule-based engine that, by asking users to set their own rules, offers them a routing service that tracks location, converts message formats and forwards messages to the proper communication medium. It protects privacy by hiding location information and by filtering and routing incoming messages according to the user's desires. A clear constraint of this solution is that users need to interact with complex interfaces to explicitly set their own rules.
The Presence Cues project [7] offers presence cues for telephone users that display dynamic information about the recipient's reachable number and how available s/he is for the next call, in what they call a "life address book". In this case, presence information is based on availability, the currently reachable number and personalized status messages. It requires users to explicitly update their own presence information when a potential updating situation is automatically detected, and it also offers multiple-device access to actually perform the update. By this means it tries to address the trade-off between overhead vs. control of information. Although this solution provides a good balance between automatic and manual updating, it underestimates the highly dynamic nature of availability information, which needs to be constantly updated. In consequence, it was not valued as a reliable and useful social cue in their tests.
Interaction Control Perspective
An interaction control perspective on mediated social communication activities faces two major privacy conflicts:
1. Interactional commitment, or the attentional contract [15], refers to the level of engagement both the recipient and the initiator are willing to convey in their current communication activity. For example, it could be phrased in terms of the desired effort to put in: 'a short chat', 'a long talk', 'just a note', or in terms of which medium is chosen: 'only text', 'only voice', 'only image', 'video', etc. A typical conflict scenario is how to negotiate the initiator's intention for communication with the recipient's desired level of commitment.
2. The natural asymmetry between initiator and recipient refers to the unbalanced power that the initiator has over the recipient, mainly when starting a communication activity.
Push-to-talk [15] represents the idea of protecting privacy through interactive negotiation. Based on cellular radio technology, it offers a direct and accessible communication channel between small groups of people. It covers several styles of conversation, such as bursty, intermittent and focused. Instead of relying on automatic management of users' reachability, it relies upon lightweight social interaction mechanisms to avoid undesired levels of engagement when communicating.
For example, plausible deniability of presence by the recipient helps to negotiate the intention of the initiator with the commitment desired by the recipient at that time, at low social cost. Delaying or omitting responses provides a more relaxed protocol where expectations or obligations are not strong enough to overrule the personal desire to interact with another person at that time. Decreased costs for openings and closings make it easier for both the initiator and the recipient to propose and/or reject the initiation of a conversation without feeling too much responsibility for that action. While it seems to be a very effective solution to protect privacy, its success is mostly based on supporting only small groups of people where a (high) level of social knowledge already exists. The design question here is: to what extent can awareness systems afford sufficiently numerous and flexible such mechanisms to support users in controlling their social interactions?
FUTURE WORK
The everyday perception of the term privacy is associated with threats, violations, misuse, etc. of personal information. Answering the question of what individuals do NOT want to share would clearly lead us to an unlimited list of issues. Our approach proposes to observe users' attitudes and behaviours when using awareness systems. This can help identify privacy requirements for such systems, answering the question of what information about themselves individuals DO want to share, with whom, in what contexts, at what times and for what purposes. In this sense, awareness systems provide a sociable way to study privacy requirements. We examine the sharing of information and the negotiation of information communication channels when a social purpose is pursued and when the social image of a person is concerned (how their "self" is presented).
Two major design tradeoffs play a crucial role:
• Informativeness vs. privacy: how much personal information a user needs and wants to convey without violating his/her own privacy.
• Overhead vs. control: how a user wants to maintain his/her own personal information.
We aim to investigate to what extent information management becomes an excessive workload for the user, and whether people can, and are willing to, take control over the privacy management of awareness systems.
The two perspectives to study privacy
As introduced and explained in the previous sections, we propose two different perspectives to study privacy in awareness systems: the information and interactional control perspectives. Based on the literature findings previously described, and taking advantage of an existing awareness system, the following proposal describes how the ASTRA system could be extended to address privacy from these two different angles.
The ASTRA system will be extended with a PPI (Privacy Profile Interface) to allow for the management of a person's privacy using mechanisms that correspond to both of these perspectives. The extended system will be tested in order to validate and generalize concepts of privacy regulation that help users with the dynamic process of privacy management in awareness systems.
Information Control of PPI
The main objective is to address privacy concerns based on the disclosure of, control over, and access to information, depending on the type of information exchanged (a minimal sketch follows this list):
• Information awareness that facilitates communication, which can be provided by means of personal information (e.g. availability, location, reachability), context cues (e.g. office hours, traffic jam, sports night, holidays, etc.) and social cues (e.g. dinner time, social evening, family meeting, etc.).
• Information content that is exchanged during communication, where factors like sensitivity, relevance, temporality, etc. influence how to deal with privacy.
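As an illustration only, the following Python sketch shows how per-audience disclosure rules for these two kinds of information might be represented. All names (PrivacyRule, Disclosure, the audience labels) are hypothetical and are not part of ASTRA or of the proposed PPI; this is a sketch of the idea, not an implementation.

from dataclasses import dataclass
from enum import Enum

class InfoType(Enum):
    # Awareness information that facilitates communication
    AVAILABILITY = "availability"
    LOCATION = "location"
    REACHABILITY = "reachability"
    CONTEXT_CUE = "context_cue"   # e.g. office hours, holidays
    SOCIAL_CUE = "social_cue"     # e.g. dinner time, family meeting
    # Content exchanged during communication
    CONTENT = "content"

class Disclosure(Enum):
    HIDE = 0       # never shared
    ABSTRACT = 1   # shared at a coarse level (e.g. "busy" instead of exact location)
    FULL = 2       # shared as captured

@dataclass
class PrivacyRule:
    info_type: InfoType
    audience: str              # e.g. "family", "colleagues", "strangers"
    disclosure: Disclosure
    automatic: bool            # True: system updates/shares without asking
    expires_after_s: int | None = None   # temporality: for time-sensitive content

def allowed(profile: list[PrivacyRule], info_type: InfoType, audience: str) -> Disclosure:
    """Return the disclosure level for a given information type and audience.
    Defaults to HIDE when no rule matches (a conservative default)."""
    for rule in profile:
        if rule.info_type == info_type and rule.audience == audience:
            return rule.disclosure
    return Disclosure.HIDE

# Example: share availability with family automatically, share only an abstracted
# location with colleagues, and ask before sharing content with family.
profile = [
    PrivacyRule(InfoType.AVAILABILITY, "family", Disclosure.FULL, automatic=True),
    PrivacyRule(InfoType.LOCATION, "colleagues", Disclosure.ABSTRACT, automatic=True),
    PrivacyRule(InfoType.CONTENT, "family", Disclosure.FULL, automatic=False, expires_after_s=86400),
]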
Interactional Control of PPI
The main objective is to address privacy concerns based on the user's choice and use of awareness-mediated mechanisms (a minimal sketch follows this list):
• From the initiator's point of view, a major privacy need relates to control over connection failure. This can be supported by "preambles", whereby the initiator can be informed of the readiness of the recipient for communication before attempting to make contact. The chance of easily switching media can be another solution, helping the initiator to choose the proper medium for a successful connection.
• From the recipient's point of view, a major privacy need relates to controlling the timing of a communication. For this purpose several mechanisms can be used: screening of messages, so that messages can be easily masked without interrupting other activities of the recipient; plausible deniability of presence, by which the recipient decides whether or not to show the initiator that she is there; and delaying/omitting responses, by which the recipient can decide whether or not to react to a message without incurring a high cost for not answering it.
• From both the recipient's and the initiator's points of view, the possibility to collectively control interactional commitment and the desired level of engagement are other mechanisms for the regulation of privacy. Interesting examples are: (1) lightweight openings and closings with no need for fixed protocols ("how are you", "I need to hang up now", etc.), which make it easier for the initiator to propose a contact and easier for the recipient to engage in or reject it; (2) lightweight swapping of activities; (3) reduced feedback/accountability, where less awareness may lead to fewer expectations and obligations.
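As an illustration only, the sketch below models the recipient-side mechanisms listed above as a simple choice among screening, plausible deniability of presence, delaying, or accepting. The function and its decision rule are hypothetical, intended only to make the mechanisms concrete; they are not part of ASTRA or the proposed PPI.

from enum import Enum, auto

class RecipientAction(Enum):
    ACCEPT = auto()         # engage at the proposed level of commitment
    SCREEN = auto()         # inspect the preamble without signalling presence
    DENY_PRESENCE = auto()  # plausible deniability: appear unavailable
    DELAY = auto()          # postpone the response at low social cost

def handle_preamble(busy: bool, show_presence: bool, wants_to_talk: bool) -> RecipientAction:
    """Toy decision rule for a recipient who receives an initiator's preamble."""
    if not show_presence:
        return RecipientAction.DENY_PRESENCE
    if busy:
        # Screening lets the recipient look at the request without interrupting
        # the current activity; delaying keeps the option of a later reply open.
        return RecipientAction.SCREEN if wants_to_talk else RecipientAction.DELAY
    return RecipientAction.ACCEPT if wants_to_talk else RecipientAction.DELAY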
Expectations from the Workshop
The focus of this research can be described by the following list of research questions:
Information perspective
• What information do people want to be captured implicitly by automatic capturing techniques, and what explicitly through input devices? How can information about one's actions be represented with respect to a specific receiver?
• What information is temporally sensitive (becomes history) when log applications are provided? How can a proper interpretation of past, present and future actions be provided?
Interaction perspective
• What levels of feedback (accountability) do users want when sensor capturing occurs? How can understanding and anticipation of how one's actions appear to others be provided?
• What levels of control does the receiver want over the information displayed, i.e., deciding what to view, and when and how to view it?
CONCLUSION
This project aims to define a set of policies that will ensure a proper balance between the communication benefits and the privacy costs experienced by users of awareness systems. These policies should support different levels of automation when sharing information, based on the content shared, the circumstances and the audience involved. This should guide us in the creation of a proper design interaction framework that offers a privacy model which can be built up when designing awareness systems.
REFERENCES
[1] Altman, I. The environment and social behaviour.
Brooks/Cole., Monterey, CA, 1975.
[2] Anne Adams, M.A.S., Privacy Issues in Ubiquitous
Multimedia Environments: Wake Sleeping Dogs, or
Let Them Lie? Proceedings of Interact '99,
International Conference on Human-Computer
Interaction, Edinburgh, UK, 1999, IOS Press, IFIP
TC.13, 214-221.
[3] Bly, S., Harrison, S.R. and Irwin, S., Media Spaces:
Bringing People Together in a Video, Audio and
Computing Environment. in Communications of the
ACM, (1993), 28-47.
[4] Eggen, B., Hollemans, G. and Sluis, R.v.d. Exploring
and enhancing the home experience. Journal of
Cognition Technology and Work, 5, 44-54, 2001.
[5] Langheinrich, M., Privacy by Design - Principles of
Privacy-Aware Ubiquitous Systems. Proceedings of
International Conference on Ubiquitous Computing (Ubicomp), Atlanta, Georgia, 2001, Springer.
[6] Markopoulos, P., Romero, N., Baren, J.v.,
IJsselsteijn, W., de Ruyter, B. and Farshchian, B.,
Keeping in Touch with the Family: Home and Away
with the ASTRA Awareness System. To appear in
Proceedings CHI 2004, Extended Abstracts, Vienna,
2004, ACM Press.
[7] Milewski, A.E. and Smith, T.M., Providing presence
cues to telephone users. Proceedings of the 2000
ACM conference on Computer supported
cooperative work, Philadelphia, Pennsylvania,
United States, 2000, ACM Press, 89-96.
[8] Mynatt, E.D., Back, M. and Want, R., Designing
Audio Aura. Proceedings of CHI 98, Los Angeles,
CA, USA, 1998, 556-573.
[9] Nardi, B.A., Whittaker, S. and Bradner, E.,
Interaction and Outeraction: Instant Messaging in
Action. Proceedings of the 2000 ACM conference on
Computer supported cooperative work, Philadelphia,
Pennsylvania, United States, 2000, ACM Press, 79-88.
[10] Palen, L. and Dourish, P., Unpacking privacy for a
networked world. Proceedings of CHI’03, Ft.
Lauderdale, Florida, USA, 2003, ACM Press.
[11] Romero, N., van Baren, J., Markopoulos, P., de
Ruyter, B. and IJsselsteijn, W., Addressing
interpersonal
communication
needs
through
ubiquitous connectivity: Home and away.
Proceedings of Ambient Intelligence, 2003, Springer-Verlag, 419-431.
[12] Roussopoulos, M., Maniatis, P., Swierk, E., Lai, K.,
Appenzeller, G. and Baker, M., Person-level Routing
in the Mobile People Architecture. Proceedings of
2nd USENIX Symposium on Internet Technologies
and Systems, Boulder, Colorado, USA, 1999.
[13] Sven Meyer, A.R., A survey of research on context-aware homes. Proceedings of Australasian
information security workshop conference on ACSW
frontiers, (Adelaide, Australia, 2003), Australian
Computer Society, Inc, 159 - 168.
[14] Westin, A.F. Privacy and Freedom. Atheneum, New
York NY, 1967.
[15] Woodruff, A. and Aoki, P.M., How push-to-talk
makes talk less pushy. Proceedings of the 2003
international ACM SIGGROUP conference on
Supporting group work, Sanibel Island, Florida, USA,
2003, ACM Press, 170 - 179.
Capturing Conversational Participation in a Ubiquitous
Sensor Environment
Yasuhiro Katagiri
ATR Media Information
Science Labs.
2-2-2 Hikaridai Keihanna
Science City Kyoto Japan
+81 774 95 1480
[email protected]
Mayumi Bono
ATR Media Information
Science Labs.
2-2-2 Hikaridai Keihanna
Science City Kyoto Japan
+81 774 95 1466
[email protected]
Noriko Suzuki
ATR Media Information
Science Labs.
2-2-2 Hikaridai Keihanna
Science City Kyoto Japan
+81 774 95 1422
[email protected]
ABSTRACT
We propose the application of ubiquitous sensor technology
to capturing and analyzing the dynamics of multi-party
human-to-human conversational interactions. Based on a
model of conversational participation structure, we present
an analysis of conversational interactions in the open
interaction space of a poster presentation session. We
argue that the patterns of transition of the conversational
roles each participant plays in conversational interactions
can be captured through analysis of the participants'
exchange of verbal and non-verbal information in
conversations. Furthermore, we suggest that this dynamics
of conversation participation structures captured in the
ubiquitous sensor environment provides us with a new
method for summarizing and displaying human memories
and experiences.
Keywords
Conversation participation, ubiquitous sensor environment,
human behavior analysis, non-verbal information
INTRODUCTION
A natural human conversation requires more than a mere
exchange of utterances between conversational participants.
Conversational participants first need to be established and
mutually admitted into the conversation before they can
engage in it. Participants play certain roles in the
conversation, and these roles change during the course of
the conversation. New participants may join and old
participants may leave. These dynamic changes are signaled
and managed through the use of various verbal and non-verbal cues.
Studies on the structure of conversations and social
interactions have been conducted in the fields of sociology
and anthropology. However, little work has been done in
the CHI community, despite the increasing recognition of
the importance of non-verbal information and its functions
in human-to-human interactions. Several attempts have
recently been made to automatically extract conversational
events from speech in two-person dialogues (Basu 2002,
Choudhury 2004). We argue that ubiquitous sensor
technology provides a useful set of tools that facilitate the
empirical examination of human conversational processes
from real conversation data, through systematic collection
and analysis of non-verbal as well as verbal information
exchanged in multi-party conversations. Furthermore, it
offers a novel opportunity to share our memories and
experiences by exploiting information on fine-grained
verbal and non-verbal exchanges in conversations. This
data can be utilized in both summary creation and
presentation of captured experience data.
Dynamic
information on the structure of conversation participation
can be used to organize and summarize one's personal
history of social interaction (Hagita et al., 2003).
Elucidating the dynamic transitions of conversation
participation structures is also essential to developing
robotic/electronic agents that can interact with and help
humans in daily life situations.
We first present our attempt to capture the dynamics of
human conversation participation structures through the
analysis of speech turn structures and eye gaze distributions
produced by the participants in conversations obtained in
our Ubiquitous Sensor Room environment. We then present
a simple system that provides conversational participants
with real-time information feedback on the status of
ongoing conversations. Finally, we show, based on our
experiment on collecting poster presentation conversation
data, that interaction metrics, associated with conversational
participation structures, obtained from ubiquitous sensors
provide a good measure of people's interest toward objects
and events in their interactive experiences.
PARTICIPATION STRUCTURE IN CONVERSATION
Participation Structure
To engage in a conversational interaction, people first need
to establish a conversational space together with their
conversational partners, or otherwise enter an existing one,
before they can actually talk to each other. Conversational
space formation normally proceeds by conversational
partners first approaching each other to form a spatial
aggregate and then exchanging eye gaze and various forms
of greetings. Goffman (1981) analyzed the phases of
conversational interaction and defined the internal structure
of conversational space as 'participation structure' or
'participation framework.' In conversation, the participants
exchange their roles, such as 'Speaker' and 'Addressee,' by
exchanging the right of utterance for a moment. The
structure of conversation (participation structure) consists
of components, e.g.,
participants, and their
interrelationships. Both can change dynamically through
the course of conversational progressions. Clark (1996)
proposed a model of participant relationships as shown in
Figure 1.
Figure 1. Participation structure.
Clark defined Speaker as the agent of the illocutionary act
and Addressee as the participant who is the partner of the
joint action that the Speaker is projecting for them to
perform. Side Participants take part in the conversation but
are not currently being addressed. All other listeners are
Overhearers who have no rights or responsibilities in the
conversation, that is, they don't take part in it. There are two
main types of Overhearers: Bystanders are openly present
but not part of the conversation, while Eavesdroppers listen
in without the speaker's awareness.
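To keep these categories easy to refer to later, the following minimal Python sketch encodes them as a data type. This is our own illustration of Clark's taxonomy as described above, not anything defined by Clark or implemented in the system discussed in this paper.

from enum import Enum, auto

class Role(Enum):
    """Conversational roles in Clark's participation structure."""
    SPEAKER = auto()           # agent of the illocutionary act
    ADDRESSEE = auto()         # partner of the joint action the Speaker projects
    SIDE_PARTICIPANT = auto()  # takes part but is not currently addressed
    BYSTANDER = auto()         # openly present, not part of the conversation
    EAVESDROPPER = auto()      # listens in without the Speaker's awareness

PARTICIPANTS = {Role.SPEAKER, Role.ADDRESSEE, Role.SIDE_PARTICIPANT}

def is_participant(role: Role) -> bool:
    """Overhearers (Bystanders and Eavesdroppers) have no rights or
    responsibilities in the conversation."""
    return role in PARTICIPANTS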
Audience Design and Interactivity
In a conversation with more than two participants, one person speaks to the others at a time. While a group of listeners is collectively called an 'audience,' its members do not have equal rights and responsibilities in listening to the Speaker's utterance. The Speaker can exert control over the progress of the conversation by selecting, from among the members of her audience, who is to be Addressee, and she then directs her speech to him. Clark & Carlson (1982) introduced the notion of audience design to capture this phenomenon. The Speaker designs an utterance for a specified listener who is assigned the role of Addressee, making her utterance easily accessible to him through the common background between them, for example by including topics that only that listener knows.
Audience design involves both verbal and non-verbal information exchange, and it is expected to create specific interactions between Speaker and Addressee, different from those between the Speaker and the other, non-Addressee members of the audience. Although studies have been made on these issues, such as knowledge and information in the participant's mind, by observing the contents and forms of utterances to derive a pragmatic interpretation, there has been little work on non-verbal behaviors. In this study we attempt to elucidate, with the help of ubiquitous sensor technologies, how non-verbal cues, such as body postures and eye gaze distributions, are utilized in the process of audience design, and what effects they have on conversational interactions.
CAPTURING ENVIRONMENT
In order to empirically examine conversational participation
processes from real conversation data and to investigate the
possibilities of incorporating conversational participation
information in experience sharing technologies, we have set
up a Ubiquitous Sensor Room environment and have been
collecting a corpus of conversational interaction data in
poster presentation settings (Hagita et al., 2003).
Figure 2 shows a schematic layout of the Ubiquitous Sensor
Room environment. It has several presentation booths,
each with its own set of posters and demonstrations, where
the exhibitors give poster presentations. The room has a
number of cameras and sensors for recording the behaviors
of both exhibitors and visitors.
Figure 2. Ubiquitous sensor environment.
Two cameras are fixed to the ceiling of each booth to
capture human behaviors: placement, inter-personal
distance and posture. Furthermore, each participant is
equipped with a headset microphone with sensors and a
camera that captures speech and approximate gaze
direction. We observed the relationships of speech and
gaze directions in the conversation by using the recorded
data, which indicate patterns of interaction in the
participants' verbal and non-verbal behaviors.
An interaction corpus in this environment has been
collected during the ATR Exhibitions of 2002 and 2003,
when a wide variety of people from outside ATR visited
poster and oral presentations and joined demonstrations.
We have also conducted experimental corpus collection
with a limited number of participants.
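As a rough illustration of the kind of data such an environment yields, the sketch below shows a per-participant, time-stamped record combining voice activity and approximate gaze direction, and one way speech turns could be derived from it. The field names and the record layout are invented for illustration and do not reflect the actual corpus format.

from dataclasses import dataclass

@dataclass
class SensorFrame:
    """One time-slice of data for one participant (illustrative schema only)."""
    t: float                  # seconds from the start of the session
    participant: str          # e.g. "E", "VA", "VB"
    speaking: bool            # voice activity from the headset microphone
    gaze_target: str | None   # approximate gaze direction from the head-mounted camera
    booth: str                # booth id from ceiling cameras / placement tracking

def speech_turns(frames: list[SensorFrame], who: str) -> list[tuple[float, float]]:
    """Collapse consecutive speaking frames of one participant into (start, end) turns."""
    own = sorted((f for f in frames if f.participant == who), key=lambda f: f.t)
    turns, start = [], None
    for f in own:
        if f.speaking and start is None:
            start = f.t
        elif not f.speaking and start is not None:
            turns.append((start, f.t))
            start = None
    if start is not None and own:
        turns.append((start, own[-1].t))
    return turns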
DYNAMICS OF PARTICIPATION
Here, we present our initial results of analyzing the
Interaction Corpus (Hagita et al., 2003, Bono et al., 2003a)
by using concepts of participation structure. We focus on
situations in which a third participant (second visitor) joins
the already established conversation between two
participants (poster exhibitor and first visitor).
Two Phases of Participation
Clark's model of participation structure indicates a natural
organization of two phases of participation: participation in
the conversational space and participation in the
conversation itself. In a poster presentation conversation,
visitors first approach the poster to hear the exhibitor's
speech and to look at the poster contents in detail. This
reduction of physical distance amounts to the initial
participation in the conversational space, i.e., being
promoted from a non-participant to a bystander participant.
Beyond participation in the conversational space, the
participant needs to be further promoted to enter the
conversation itself. The participant either takes the floor of
the conversation himself/herself (i.e., being promoted to
Speaker), is assigned the role of Addressee by the current
Speaker (i.e., being promoted to Addressee), or is admitted
to join by receiving the Speaker's gaze, namely, the
recognition of his/her existence by the existing participants
(i.e., being promoted to Side Participant). These are some
of the possibilities explaining how participation progresses
in conversation.
Figure 3. Participation in conversational space.
Participation in the conversational space does not
necessarily lead to participation in the conversation. Figure
3 shows a conversational scene taken by a fixed camera
during a poster presentation. Here, the Exhibitor (E) of the
poster and the first visitor (Visitor A: VA) are already
engaged in a conversational interaction, which the second
visitor (Visitor B: VB) is attempting to join. The scene
indicates that VB has approached E and VA, thereby
achieving the initial participation in the conversational
space. However, the video clip of the scene shows that VB
just stayed silently there for about 27 seconds and then left
the scene without actually participating in the conversation
itself. The activities of the two visitors, VA and VB, were
quite different. VA was playing the role of Addressee,
while VB was acting as a Bystander. The difference was
manifested in the non-verbal behaviors of the participants.
E and VA exchanged their eye gazes frequently while they
were talking, whereas E did not direct his eye gaze toward
VB. E and VA also directed their body postures to each
other, away from VB, as if to prevent the newcomer from
cutting off their talk. VB, after failing to find a chance to
enter the conversation, gave up and left the scene. This
small episode clearly indicates that people implicitly rely on non-verbal as well as verbal information exchanges in managing their conversational participation.
Transition of Participation Role Assignment
Figure 4. Dynamics of participation structure transitions.
Figure 4 shows an example of the interplay between non-verbal information exchanges and the switch of
conversational roles among participants. The figure shows
a sequence of events that took place in a conversational
interaction.
Each row indicates a new speech turn
beginning. The second column from the left indicates the
role of each participant in the conversation. The next three
columns show the subjects' view data taken by the wearable
cameras of three participants.
The person standing in front of the poster is the Exhibitor
(E), who initially has an absolute right to be a speaker
owing to his knowing the contents of the poster and the
generally accepted social hierarchy in this situation. The
others are visitors who came to listen to the presentation.
One of them (Visitor A: VA) came to this booth before the
other visitor (Visitor B: VB) (in scene 1). After E and VA
exchanged utterances and shared the floor of the
conversation for a while, VB arrived (in scene 2). In scene
2, E is Speaker (SPK). Since E is directing his eye gaze
toward VA, as is observed from E's view image data at
scene 2, VA is assigned an Addressee (ADR) role, and VB
is a Side Participant (SPT) in Clark's categorization.
In this sequence of events, audience behaviors, that is, the
behaviors of VA and VB, are exactly opposite: VA is
passive and VB is active. In scene 4, the role of VA was
demoted from ADR to SPT, and she did not have the right
of turn. On the other hand, VB was promoted from SPT to
SPK, so he produced speech and directed it to E. Even
though VA stays at this booth longer than VB, VB is more
active than VA. This suggests that the duration of staying
time alone is not sufficient information for understanding
participant activities, particularly their interests in their
experiences.
Figure 5 summarizes the patterns of transitions in the
conversational participation structure. The process of
participation in conversation is closely tied to audience
design by the Speaker, in the sense that some of the
transitions (e.g., Bystander to Side Participant and Side
Participant to Addressee) need to be sanctioned by the
Speaker.
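As one way to make this explicit, the sketch below records which role promotions require the Speaker's sanction and which a participant can initiate by his or her own behavior. The transition sets are read off the discussion above (and Figure 5); the code itself is our own illustration under those assumptions, not an implementation used in the study.

from enum import Enum, auto

class Role(Enum):
    SPEAKER = auto()
    ADDRESSEE = auto()
    SIDE_PARTICIPANT = auto()
    BYSTANDER = auto()
    NON_PARTICIPANT = auto()

# Transitions that need to be sanctioned by the current Speaker,
# e.g. by the Speaker directing her gaze toward the participant.
SPEAKER_SANCTIONED = {
    (Role.BYSTANDER, Role.SIDE_PARTICIPANT),
    (Role.SIDE_PARTICIPANT, Role.ADDRESSEE),
}

# Transitions a participant can initiate through his/her own behavior,
# e.g. approaching the poster or taking the floor.
SELF_INITIATED = {
    (Role.NON_PARTICIPANT, Role.BYSTANDER),   # entering the conversational space
    (Role.SIDE_PARTICIPANT, Role.SPEAKER),    # taking the floor
}

def requires_speaker_sanction(src: Role, dst: Role) -> bool:
    """True if the promotion from src to dst must be granted by the Speaker."""
    return (src, dst) in SPEAKER_SANCTIONED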
Figure 5. Dynamic transitions of conversational participation structure.
AUTOMATIC DETECTION AND DISPLAY FEEDBACK OF CONVERSATIONAL MODES
Through examination of many poster presentation sessions,
we found that we could distinguish two phases in most of
the poster presentations (Bono et al., 2003b). A typical
poster presentation starts with the exposition of the contents
of the poster by the exhibitor, which is then followed by
discussions between the exhibitor and the visitors. In the
first exposition phase, the exhibitor takes the Speaker role
for most of its entire duration, while in the second
discussion phase, the exhibitor and the visitors take turns
and the Speaker/Addressee/Side Participant roles switch
frequently and rapidly among them. We call the first one-way exposition phase a lecture mode (L-mode) conversation and the second two-way interactive discussion phase an interactive mode (I-mode) conversation.
Figure 6 shows a typical pattern of speech pause durations inserted in the exhibitor's talk in both L-mode and I-mode conversations in a presentation session. The figure indicates that the exhibitors continue to speak with only short pauses in L-mode, while they interleave speaking and pausing, e.g., take turns with other participants, in I-mode. This difference in speaking style suggests that it is relatively straightforward to distinguish these two conversational modes in terms of the turn dominance ratio. Figure 7 shows an example of the conversational mode display we implemented at ATR Exhibition 2003. The display indicates in real time, for each poster presentation booth, whether the conversation taking place is in L-mode (one-person sign) or in I-mode (two-person sign). Visitors can choose to visit a lecture session, which is more static and probably easier to join, or to go to a discussion session, which is more active and could be more entertaining.
Figure 6. Exhibitor turn dominance ratio in lecture mode and interaction mode conversations.
Figure 7. Conversation mode display.
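A minimal sketch of such a classifier is given below. The turn dominance ratio is the quantity named in the text; the 0.8 threshold, the window-based input format and the function names are assumptions made for illustration, not values or code from the system we deployed.

def exhibitor_dominance_ratio(speaking_time: dict[str, float], exhibitor: str) -> float:
    """speaking_time maps each participant to his/her total speaking time (seconds)
    within a sliding window; returns the exhibitor's share of all speech."""
    total = sum(speaking_time.values())
    return speaking_time.get(exhibitor, 0.0) / total if total > 0 else 0.0

def conversation_mode(speaking_time: dict[str, float], exhibitor: str,
                      threshold: float = 0.8) -> str:
    """Classify the current window as lecture mode (L-mode) or interactive mode
    (I-mode). The 0.8 threshold is an assumed value for illustration."""
    return "L-mode" if exhibitor_dominance_ratio(speaking_time, exhibitor) >= threshold else "I-mode"

# Example: in the last 60 s the exhibitor spoke for 50 s and a visitor for 4 s.
print(conversation_mode({"E": 50.0, "VA": 4.0}, exhibitor="E"))   # -> "L-mode"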
AUDIENCE INTEREST THROUGH INTERACTION
Figure 8. Audience interest and interaction: (a) audience interest and time spent in front of posters; (b) audience interest and verbal response frequencies.
It has often been suggested that the amount of time people
spend in front of some material, be it a web page,
merchandise, or an exhibit, can be used to measure how
much interest they have in it. Different from these
durational measures, our analysis of conversational
participation dynamics suggests another set of interest
measures from the perspective of interaction. The more
involved people get in interactions the more interested they
are in the people, objects and events in the interactions.
Participation structure dynamics could produce a good
measure for human interest in interactive experiences such
as conversations.
In order to investigate the relationship between
conversational participation and audience interest, we
conducted an experimental data collection of poster
presentation sessions in the Ubiquitous Sensor Room. The
experimental set up was exactly the same as that used for
our ATR Exhibit corpus collection, but the subjects were
recruited and paid to participate in the experiment. Three
poster booths were set up with the exhibitors and their
posters. A total of 24 subjects participated in the
experiment as visitors. After the poster presentation
sessions, they were given a questionnaire to gauge their
interest in each of the three posters.
Figure 8 shows a comparison between a durational measure
and an interaction measure for subject interest. Figure 8(a)
indicates, for each of the three posters, the number of
subjects who showed the most interest in the poster as well
as the number of subjects who spent the longest time in
front of it. No specific correlations can be seen between
interest and duration. Figure 8(b) indicates, for each of the
three posters, the number of subjects who showed the most
interest in the poster and the number of subjects who
produced the largest number of verbal responses, including
both speech turns and backchannels, in the interaction with
the exhibitor of the poster. We can see that, contrary to the
ineffectiveness of the durational measure, the amount of
interactive responses makes a good measure for subject
interest.
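The following sketch illustrates how such an interaction-based interest measure could be computed by counting speech turns and backchannels per visitor, as in Figure 8(b). The event encoding and function names are hypothetical; this is a sketch of the measure, not the analysis code used for the experiment.

def verbal_response_counts(events: list[tuple[str, str]]) -> dict[str, int]:
    """events is a list of (participant, event_type) items, where event_type is
    'turn' or 'backchannel'; both count as verbal responses."""
    counts: dict[str, int] = {}
    for participant, event_type in events:
        if event_type in ("turn", "backchannel"):
            counts[participant] = counts.get(participant, 0) + 1
    return counts

def most_engaged_visitor(events: list[tuple[str, str]], exhibitor: str) -> str | None:
    """Return the visitor with the largest number of verbal responses, used here
    as an interaction-based proxy for interest (rather than time spent)."""
    counts = {p: c for p, c in verbal_response_counts(events).items() if p != exhibitor}
    return max(counts, key=counts.get) if counts else None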
When people get involved in conversational interactions,
they invariably play the Addressee role, as well as the
Speaker role, a number of times during the course of the
conversation. Since the Speaker, as part of her audience design, selects an Addressee by directing her gaze toward him, Speaker gaze allocation could be another candidate for
the measure of subject interest. Figure 9 shows the
relationship between the visitors' interest toward posters
and the temporal duration for which they were given the
Speaker's gaze, and hence for which they played the
Addressee role. The figure indicates that visitors who
ranked a poster the most interesting actually played the
Addressee role the most frequently by receiving the
Speaker’s gaze for the longest duration in the presentation
sessions. There was a trend toward a significant difference among the three rankings of interest (F = 2.82, p = .09) in a full factorial ANOVA with between-subject factors. Multiple comparisons revealed a significant difference between interest rankings No. 1 and No. 3 (p < .05).
Figure 9. Effect of audience design on audience interest to poster 1 (time in the Addressee role for Poster 1, in %, by ranking of interest).
These results suggest that the dynamics of conversational
participation structures can provide us with a good measure
to gauge people's interest in various objects and events.
Aspects of this dynamics can be captured with our
Ubiquitous Sensor Room environment by examining both
verbal and non-verbal signals exchanged in human
conversational interactions.
These measures could
effectively be employed in experience technologies, both in
producing summaries of our memories and in providing
assistance in presenting and discussing our experiences,
through the extraction of interesting episodes that have
significant meanings for us.
CONCLUSIONS
We proposed an application of ubiquitous sensor
technology for capturing and analyzing the dynamics of
multi-party human-to-human conversational interactions.
We presented the results of our analysis of conversational interactions carried out in the open interaction space of a poster
presentation session. We showed that multiple channel
speech and view data collected for each of the
conversational participants, together with pictures taken by
ceiling-mounted cameras, provide us with a good source of
information from which to identify patterns of dynamic
transitions of conversational participation structures. We
then argued that this dynamics of conversational structures
can be utilized to identify objects and events that are
significant for people’s interactive experiences.
The implications of our study, although still preliminary
and restricted to a small range of interaction types, are
promising for future extensions. We believe the
methodology developed in this paper, that is, elucidating
the use of verbal and non-verbal cues in human-to-human
interactions with the help of ubiquitous sensor environment
technologies, has huge potential in developing technologies
for sharing memories and experiences, particularly when it
is combined with automatic signal processing techniques.
ACKNOWLEDGMENTS
This research was supported
in part by the
Telecommunications Advancement Organization of Japan.
REFERENCES
1. Basu, S. 2002. Conversational Scene Analysis. Ph.D. thesis, Massachusetts Institute of Technology.
2. Bono, M., Suzuki, N. and Katagiri, Y. 2003a. An analysis of
participation structure in conversation based on Interaction
Corpus of ubiquitous sensor data. M.Rauterberg et al. (Eds.)
INTERACT 03: Proceedings of the Ninth IFIP TC13
International Conference on Human-Computer Interaction.
713-716. IOS Press.
3. Bono, M., Suzuki, N. and Katagiri, Y. 2003b. An analysis of
non-verbal cues for turn-taking through observation of
speaker behaviors. ICCS/ASCS-2003: Proceedings of the
Joint International Conference on Cognitive Science (CD-ROM), Elsevier.
4. Choudhury, T. K. 2004. Sensing and modeling human networks. Ph.D. thesis, Massachusetts Institute of Technology.
5. Clark, H. H. and Carlson, T. B. 1982 Hearers and speech acts.
Language, 58: 332-373.
6. Clark, H. H. 1996 Using language. Cambridge University
Press.
7. Goffman, E. 1981 Forms of talk. University of Pennsylvania
Press.
8. Goodwin, C. 1981 Conversational organization: Interaction
between speakers and hearers. New York: Academic Press.
9. Hagita, N., Kogure, K., Mase, K. and Sumi, Y. 2003
Collaborative Capturing of Experiences with Ubiquitous
Sensors and Communication Robots. 2003 IEEE International
Conference on Robotics and Automation (IEEE ICRA 2003).
10. Kendon, A. 1990 Conducting interaction. Cambridge
University Press.