Towards a Smart Wearable Tool to Enable People with SSPI

Transcription

Towards a Smart Wearable Tool to Enable People with SSPI
Challenge Handicap et Technologie
2014, May 26-27, Lille, France
Towards a Smart Wearable Tool to Enable
People with SSPI to
Communicate by Sentence Fragments
https://www.youtube.com/watch?v=qzPAY4f8Wc8
András Németh (presenter, constructor, ELTE) – on movie
Anita Verő (natural language processing, ELTE) – on movie
András Sárkány (natural language processing, ELTE) – on movie
Gyula Vörös (natural language processing, ELTE) – on movie
Brigitta Miksztay-Réthey (special need expert, ELTE)
Takumi Toyama (eye-tracking solution, DFKI)
Daniel Sonntag (head of the German group, DFKI)
András Lőrincz (project leader, ELTE)
Motivation
• The ability to communicate with others is of paramount importance for mental
well-being.
• SSPI: Severe speech and physical impairments / communication disorders
– Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spacticity
etc.
• People with SSPI face enormous challenges in daily life activities.
• A person who cannot speak may be forced to only communicate directly to
his closest relatives, thereby completely relying on them for interacting with
the external world.
• Understanding people using traditional alternative and augmentative
communication (AAC) methods (such as gestures and communication
boards) requires training.
• These methods restrict possible communication partners to those who are
already familiar with AAC.
Motivation
• The ability to communicate with others is of paramount importance for mental
well-being.
• In AAC, utterances consisting of multiple symbols are often telegraphic: they
are unlike natural sentences, often missing words to allow for successful and
detailed communication. (“Water now!” instead of “I would appreciate a glass of water,
immediately.”)
– Some systems allow users to produce whole utterances or sentences
that consist of multiple words. The main task of the AAC system is to
store and retrieve such utterances.
– However, using a predefined set of sentences severely restricts the
applicability.
• Other approaches allow for a generation of utterances from an unordered,
incomplete set of words, but they use predefined rules that constrain
communication.
Augmented Reality in Medicine
Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4
4
System Architecture: Mobile Web and App Design
Speech-based interaction
XML-RPC
ERmed Proxy
Ermed Bridge
TCP
Pen-based
interaction
HMD & Gaze-based
interaction
Goals of presented work
• To enable broad accessibility and
communication possibilities for people with
SSPI
• Technical Challenges:
– Overcome physical impairments (for user
input)
– Help non-trained people to understand people
with SSPI (towards generation)
Approach
•
The most effective way for people to communicate would be spontaneous
language generation.
•
Novel utterance generation: the ability to say “almost everything,” without a
strictly predefined set of possible utterances/generations/productions.
•
We attempt to give people with SSPI the ability to say almost anything. For
this reason, we would like to build a general system that produces novel
utterances without predefined rules. We chose a data-driven approach in the
form of statistical language modeling.
•
In some conditions (e.g., cerebral palsy), people suffer from communication
and very severe movement disorders at the same time. For them, special
peripherals are necessary. Eye tracking provides a promising alternative to
people who cannot use their hands.
7
Fourfold solution
(1) Smart glasses with gaze trackers (thereby
extending
) (gaze tracking and
interpretation/graphical symbol selection)
(2) Symbol / Utterance selection
(3) Language generation (sentence generation,
thereby using language models to propose best
hypotheses)
(4) Text-To-Speech functionality (TTS)
8
Hardware components
• Eye tracking glasses (ETG)
– Forward-looking camera
– Eye-tracking cameras
• Head mounted display (HMD)
– See-through
• Motion Processing Unit (MPU)
– Capture head gestures that indicate
recalibration need
Setup:
Moverio Glass with Display
Inertial Motion Unit
WheelPhone
WheelPhone + Smart Phone + Constructor
10
Setup with eye-tracking
AiRScouter Head Mounted Display
Brother Industries
Eye Tracking Glasses by
SensoMotoric Instruments GmbH
MPU-9150 Motion Processing Unit
InvenSense Inc.
In the experiments
Communication Board
12
Main functions
• Symbol selection with gaze tracking
– Calibration is crucial and poses a problem
– Adapt or recalibrate?
• Utterance generation
– Based on selected symbols
– Uses natural language models
Usage scenario and steps
Experiment 1
Retina HMD Setup
15
Symbol selection on HMD
• Idea: the user selects communication
symbols on the HMD, using eye tracking
• Crucial problems with calibration
– Eye tracking has to be calibrated
– Calibration may degrade over time
– Adapt or recalibrate?
• User might tolerate calibration errors (when their
are small and negligible)
• User should be able to initiate recalibration (head
gesture)
Symbol selection on HMD
Symbol selection game on HMD
• Goal: select the numbers in correct order (1234123…)
• Each selection adds a small amount of error
• Participants indicate when error is too significant to be
tolerated
• 4 participants, 4 experiments each
• On average, errors up to 2.7° deviation from actual eye
focus are tolerable.
Real communication experiments
• The Brother Retina HMD was too small to show a fullsized communication board (CB) (although resolution is
quite good)
• We use a projector / beamer to display the CB
– Simulate a large HMD
• Recognition Feedback is provided (to the user)
– Estimated gaze position + selection
• Speech generation
– word for the selected symbol synthesized
Experiment 2
Projector as HMD Surrogate
Real patient(s)
20
Participant
• The participant of this example test is a 30 year old man
with cerebral palsy.
• He usually communicates with a headstick and an
alphabetical communication board, or with a PC-based
virtual keyboard controlled by head tracking.
• Mousense is already an improvement.
https://www.youtube.com/watch?v=C4DOmMtgqz0
21
Bliss symbols and words (in Hungarian) on
board
tea, two, sugar
(tea, kettő, cukor)
tea, lemon, sugar
(tea, citrom, cukor)
one, glass, soda
(egy, pohár, kóla)
23
Communication Setting
• The participant could move his head
• The head position must be accounted for (MPU)
• Fiducial markers around the board are recognized by
vision-based pattern recognition techniques
Usage and HCI details
• The estimated gaze position was indicated as a small red
circle on the projected board (similar to the previous test).
• A symbol was selected by keeping the red circle on it for
two seconds.
• The eye tracker sometimes needs recalibration;
– the user could initiate recalibration by raising his
head up (detected by the MPU.)
– Once the recalibration process was triggered, a
distinct sound was played, and an arrow indicated
where the participant had to look for doing RC.
25
Communication Setup
Communication experiments
• Goal: communicate with a partner
• Two situations (with different boards)
– Buying food
– Discussing an appointment
Experimental results
• Verification
– To verify that communication was successful, the
participant indicated misunderstandings using wellknown yes-no gestures, which were quick and
reliable. Moreover, a certified expert in AAC was
present to indicate apparent communication
problems.
– 205 symbol selections happened
– 23 of them were incorrect
• 89% accuracy
– The error rate was acceptable
• Real communication took place!
Experiment 3
External Symbols (towards mixed
reality setup)
29
External symbols
• Idea
– Not all symbols are present in the system
– Optical character recognition can help
– (Object recognition can help)
• Example
– The user wants to buy a certain type of
sandwich
– In the store, there are labels with the names of
the sandwich types on it
• The OCR was simulated
External symbols - Communication Setting
32
Results
• Wizard-of-Oz experiment for OCR
• Similar “good” results as in experiment 2
33
Technical input processing
and sentence generation
methods
34
Sentence fragment generation
•
A word guessing game
– Good afternoon, how are
you?
sorry!
– I apologize for being late, I am very
– My favorite OS is [Linux, Mac OS, Windows XP, Win CE].
•
A symbol guessing game
•
LM needs to help
– To select words with the right sense (disambiguate homonyms: e.g., river
bank versus money bank)
– To select hypothesis where graphical symbols (words) are ‘in the
right place’
– To increase cohesion between words (agreement)
35
Sentence fragment generation
• Understanding symbol communication requires
practice
• Symbol communication is non-syntactical
– Function words are rarely used
– Order of symbols may vary
• e.g., {lemon, sugar, tea} -> tea with lemon and sugar
• Idea
– Generate fragments by inserting function words
– Rank them based on language models
– User should select from top 4 variants
Language Models
• Estimate the probabilities of n-grams (sequences of
words)
• P(tea with sugar) > P(tea the sugar)
– Use a corpus (collection of texts)
• Sparsity problem
– Long n-grams tend to be rare
– Smoothing and backoff methods are used to deal with
this problem
37
Stupid backoff
• Let
•
denote a string of L tokens of a fixed vocabulary
approximation reflects the Markov assumption that only the most recent n-1
tokens are relevant when predicting the next word.
• For any substring wij of w1L let
denote the frequency of occurrence of that
substring in the longer training data. The maximum likelihood probability estimates for
the n-grams are given by their relative frequencies
•
• Problematic because (de-)nominator can be zero.
• Appropriate conditions are needed on the r.h.s.
• Noisy: estimates need smoothing
Language corpora in use
• Google Books n-gram corpus
– Collection of digitized books from Google
– Very large, freely available
– Represents written language
• OpenSubtitles corpus
– Collection of film and TV subtitles
– Moderate size, freely available
– Represents spoken language
Language modeling tools used
• Google Books n-gram corpus
– Software: BerkeleyLM
•
Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual Meeting of the ACL:
Human Language Technologies, Vol. 1, pp. 258--267. ACL, Stroudsburg, USA (2011)
– Method: Stupid Backoff
•
•
Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine translation. In:
EMNLP 2007, pp. 858--867.
OpenSubtitles corpus
– Software: KenLM
•
Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011 Sixth
Workshop on Statistical Machine Translation, pp. 187--197. ACL, Stroudsburg, USA (2011)
– Method: Modified Kneser-Ney smoothing
•
Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney Language
Model Estimation. In: 51st Annual Meeting of the Association for Computational Linguistics, pp.
690--696. Curran Associates, Inc., New York, USA (2013)
Prefix tree building algorithm (1)
• Representation
– Work with a prefix tree, where each path represents an n-gram
• Input
– Set of named entities (e.g., tea, sugars)
– Set of function words (e.g., you, with, for)
– Desired length of sentence fragment (e.g., 3 words)
• Parameters
– Score threshold (minimal score, e.g., 10-30)
– Leaf limit (maximal number of open leafs, e.g., 200)
• Output
– Tree (or a list) of potential sentence fragments
• Contain all the important words
• Ordered by estimated probability (score)
Prefix tree building algorithm (2)
• Essentially a breadth-first search, with
some constraints
• Start with an empty tree, root node is open
• While there is an open node:
– Extend all open leafs with all available words, if
the resulting n-gram’s “score” is above a
certain threshold
– Discard every open leaf node that cannot be
extended (falls below threshold)
– Close every open leaf except the highest
scoring ones (leaf limit)
Initial state
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Generated text fragments examples
Conclusion on NLP
• A system to enable people with SSPI to
communicate with natural language
sentences
• Demonstrated the feasibility of the
approach
Our contribution
• An interaction system to reduce communication barriers for people with
severe speech and physical impairments (SSPI) such as cerebral palsy.
• The system consists of two main components: (i) the head-mounted humancomputer interaction (HCI) part consisting of smart glasses with gaze trackers
and text-to-speech functionality (which implement a communication board
and the selection tool), and (ii) a natural language processing pipeline in the
backend in order to generate complete sentences from the symbols on the
board.
• We developed the components to provide a smooth interaction between the
user and the system thereby including gaze tracking, symbol selection,
symbol recognition, and sentence generation.
• Our results suggest that such systems can dramatically increase
communication efficiency of people with SSPI.
Eye Tracking Requirements
• Eye gaze is a compelling interaction
modality but requires user calibration
before interaction can commence.
• State of the art procedures require the user
to fixate on a succession of calibration
markers, a task that is often experienced
as difficult and tedious.
53
Possible improvements and outlook
•
3D gaze tracker that estimates the part of 3D space observed by the user
•
OCR, signs and object recognition methods to
convert “symbols” to the communication board
•
new HMDs and eye tracker setups (ergonomic)
•
Advanced NLP to transform series of symbols to whole domain sentences
within the context
•
integrated calibration: e.g., https://www.andreasbulling.de/fileadmin/docs/pfeuffer13_uist.pdf
•
personalisation according to patient record
•
CPS integration
54
The Narrative Clip is a tiny, automatic camera and app
that gives you a searchable and shareable photographic
memory.
http://getnarrative.com/
55
56
https://www.spaceglasses.com
57
Thank you for your attention!