Towards a Smart Wearable Tool to Enable People with SSPI
Challenge Handicap et Technologie 2014, May 26-27, Lille, France

Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments

András Németh (presenter, constructor, ELTE) – on movie
Anita Verő (natural language processing, ELTE) – on movie
András Sárkány (natural language processing, ELTE) – on movie
Gyula Vörös (natural language processing, ELTE) – on movie
Brigitta Miksztay-Réthey (special needs expert, ELTE)
Takumi Toyama (eye-tracking solution, DFKI)
Daniel Sonntag (head of the German group, DFKI)
András Lőrincz (project leader, ELTE)

Motivation
• The ability to communicate with others is of paramount importance for mental well-being.
• SSPI: severe speech and physical impairments / communication disorders
  – Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spasticity, etc.
• People with SSPI face enormous challenges in daily life activities.
• A person who cannot speak may be forced to communicate only with their closest relatives, thereby relying completely on them to interact with the external world.
• Understanding people who use traditional augmentative and alternative communication (AAC) methods (such as gestures and communication boards) requires training.
• These methods restrict possible communication partners to those who are already familiar with AAC.

Motivation
• In AAC, utterances consisting of multiple symbols are often telegraphic: they are unlike natural sentences and often lack the words needed for successful and detailed communication. (“Water now!” instead of “I would appreciate a glass of water, immediately.”)
  – Some systems allow users to produce whole utterances or sentences that consist of multiple words. The main task of such an AAC system is to store and retrieve these utterances.
  – However, using a predefined set of sentences severely restricts the applicability.
• Other approaches generate utterances from an unordered, incomplete set of words, but they use predefined rules that constrain communication.

Augmented Reality in Medicine
[Video]

System Architecture
[Diagram labels: Mobile Web and App Design; speech-based interaction; pen-based interaction; HMD & gaze-based interaction; XML-RPC; TCP; ERmed Proxy; ERmed Bridge]

Goals of presented work
• To enable broad accessibility and communication possibilities for people with SSPI
• Technical challenges:
  – Overcome physical impairments (for user input)
  – Help non-trained people to understand people with SSPI (towards generation)

Approach
• The most effective way for people to communicate would be spontaneous language generation.
• Novel utterance generation: the ability to say “almost everything,” without a strictly predefined set of possible utterances/generations/productions.
• We attempt to give people with SSPI the ability to say almost anything. For this reason, we would like to build a general system that produces novel utterances without predefined rules. We chose a data-driven approach in the form of statistical language modeling.
• In some conditions (e.g., cerebral palsy), people suffer from communication disorders and very severe movement disorders at the same time. For them, special peripherals are necessary. Eye tracking provides a promising alternative for people who cannot use their hands.
Fourfold solution
(1) Smart glasses with gaze trackers (thereby extending ) (gaze tracking and interpretation/graphical symbol selection)
(2) Symbol / utterance selection
(3) Language generation (sentence generation, using language models to propose the best hypotheses)
(4) Text-to-speech functionality (TTS)

Hardware components
• Eye tracking glasses (ETG)
  – Forward-looking camera
  – Eye-tracking cameras
• Head mounted display (HMD)
  – See-through
• Motion Processing Unit (MPU)
  – Captures head gestures that indicate the need for recalibration

Setup
[Photo labels: Moverio Glass with Display; Inertial Motion Unit; WheelPhone; WheelPhone + Smart Phone + Constructor]

Setup with eye-tracking
[Photo labels: AiRScouter Head Mounted Display (Brother Industries); Eye Tracking Glasses (SensoMotoric Instruments GmbH); MPU-9150 Motion Processing Unit (InvenSense Inc.)]

In the experiments
[Photo label: Communication Board]

Main functions
• Symbol selection with gaze tracking
  – Calibration is crucial and poses a problem
  – Adapt or recalibrate?
• Utterance generation
  – Based on selected symbols
  – Uses natural language models

Usage scenario and steps
[Figure]

Experiment 1
Retina HMD Setup

Symbol selection on HMD
• Idea: the user selects communication symbols on the HMD, using eye tracking
• Crucial problems with calibration
  – Eye tracking has to be calibrated
  – Calibration may degrade over time
  – Adapt or recalibrate?
    • The user might tolerate calibration errors (when they are small and negligible)
    • The user should be able to initiate recalibration (head gesture)

Symbol selection game on HMD
• Goal: select the numbers in the correct order (1234123…)
• Each selection adds a small amount of error
• Participants indicate when the error becomes too large to be tolerated
• 4 participants, 4 experiments each
• On average, errors of up to 2.7° deviation from the actual eye focus are tolerable.
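
The "adapt or recalibrate?" question from the symbol selection game can be illustrated with a small sketch. It assumes that the estimated gaze and the true target are available as 3D direction vectors from the eye, computes the angle between them, and flags recalibration once the deviation exceeds the 2.7° average tolerance reported above. The function names and the vector representation are illustrative assumptions, not the implementation used in the experiments.

```python
import math

TOLERANCE_DEG = 2.7  # average tolerable deviation reported for Experiment 1


def angular_error_deg(gaze, target):
    """Angle in degrees between two 3D direction vectors (hypothetical helper)."""
    dot = sum(g * t for g, t in zip(gaze, target))
    norm = math.sqrt(sum(g * g for g in gaze)) * math.sqrt(sum(t * t for t in target))
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def needs_recalibration(gaze, target, tolerance_deg=TOLERANCE_DEG):
    """True when the gaze estimate deviates from the target by more than the tolerance."""
    return angular_error_deg(gaze, target) > tolerance_deg


# Target straight ahead; gaze estimates that have drifted by 2 and 4 degrees.
ahead = (0.0, 0.0, 1.0)
small_drift = (math.tan(math.radians(2.0)), 0.0, 1.0)
large_drift = (math.tan(math.radians(4.0)), 0.0, 1.0)
print(needs_recalibration(small_drift, ahead))  # False
print(needs_recalibration(large_drift, ahead))  # True
```

In a real setup the threshold would likely be tuned per user, since 2.7° is only an average over four participants.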
Real communication experiments
• The Brother Retina HMD was too small to show a full-sized communication board (CB), although its resolution is quite good
• We use a projector / beamer to display the CB
  – Simulates a large HMD
• Recognition feedback is provided to the user
  – Estimated gaze position + selection
• Speech generation
  – The word for the selected symbol is synthesized

Experiment 2
Projector as HMD Surrogate
Real patient(s)

Participant
• The participant of this example test is a 30-year-old man with cerebral palsy.
• He usually communicates with a headstick and an alphabetical communication board, or with a PC-based virtual keyboard controlled by head tracking.
• Mousense is already an improvement.

Bliss symbols and words (in Hungarian) on board
tea, two, sugar (tea, kettő, cukor)
tea, lemon, sugar (tea, citrom, cukor)
one, glass, soda (egy, pohár, kóla)

Communication Setting
• The participant could move his head
• The head position must be accounted for (MPU)
• Fiducial markers around the board are recognized by vision-based pattern recognition techniques

Usage and HCI details
• The estimated gaze position was indicated as a small red circle on the projected board (similar to the previous test).
• A symbol was selected by keeping the red circle on it for two seconds.
• The eye tracker sometimes needs recalibration:
  – The user could initiate recalibration by raising his head (detected by the MPU).
  – Once the recalibration process was triggered, a distinct sound was played, and an arrow indicated where the participant had to look in order to recalibrate.

Communication Setup
[Figure]

Communication experiments
• Goal: communicate with a partner
• Two situations (with different boards)
  – Buying food
  – Discussing an appointment

Experimental results
• Verification
  – To verify that communication was successful, the participant indicated misunderstandings using well-known yes-no gestures, which were quick and reliable.
    Moreover, a certified expert in AAC was present to indicate apparent communication problems.
  – 205 symbol selections were made
  – 23 of them were incorrect
    • 89% accuracy
  – The error rate was acceptable
• Real communication took place!

Experiment 3
External Symbols (towards mixed reality setup)

External symbols
• Idea
  – Not all symbols are present in the system
  – Optical character recognition (OCR) can help
  – (Object recognition can help)
• Example
  – The user wants to buy a certain type of sandwich
  – In the store, there are labels with the names of the sandwich types on them
• The OCR was simulated

External symbols - Communication Setting
[Figure]

Results
• Wizard-of-Oz experiment for OCR
• Similarly “good” results as in Experiment 2

Technical input processing and sentence generation methods

Sentence fragment generation
• A word guessing game
  – Good afternoon, how are you?
  – I apologize for being late, I am very ___ (sorry!)
  – My favorite OS is [Linux, Mac OS, Windows XP, Win CE].
• A symbol guessing game
• The language model (LM) needs to help
  – To select words with the right sense (disambiguate homonyms: e.g., river bank versus money bank)
  – To select hypotheses where graphical symbols (words) are ‘in the right place’
  – To increase cohesion between words (agreement)

Sentence fragment generation
• Understanding symbol communication requires practice
• Symbol communication is non-syntactical
  – Function words are rarely used
  – Order of symbols may vary
    • e.g., {lemon, sugar, tea} -> tea with lemon and sugar
• Idea
  – Generate fragments by inserting function words
  – Rank them based on language models
  – The user should select from the top 4 variants

Language Models
• Estimate the probabilities of n-grams (sequences of words)
  – P(tea with sugar) > P(tea the sugar)
  – Use a corpus (collection of texts)
• Sparsity problem
  – Long n-grams tend to be rare
  – Smoothing and backoff methods are used to deal with this problem

Stupid backoff
• Let $w_1^L = (w_1, \ldots, w_L)$ denote a string of $L$ tokens over a fixed vocabulary.
• $P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} \hat{P}(w_i \mid w_{i-n+1}^{i-1})$
• The approximation reflects the Markov assumption that only the most recent $n-1$ tokens are relevant when predicting the next word.
• For any substring $w_i^j$ of $w_1^L$, let $f(w_i^j)$ denote the frequency of occurrence of that substring in the (longer) training data. The maximum likelihood probability estimates for the n-grams are given by their relative frequencies:
• $r(w_i \mid w_{i-n+1}^{i-1}) = \dfrac{f(w_{i-n+1}^{i})}{f(w_{i-n+1}^{i-1})}$
• Problematic because the numerator or the denominator can be zero.
• Appropriate conditions are needed on the r.h.s.
• Noisy: estimates need smoothing

Language corpora in use
• Google Books n-gram corpus
  – Collection of digitized books from Google
  – Very large, freely available
  – Represents written language
• OpenSubtitles corpus
  – Collection of film and TV subtitles
  – Moderate size, freely available
  – Represents spoken language

Language modeling tools used
• Google Books n-gram corpus
  – Software: BerkeleyLM
    • Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258--267. ACL, Stroudsburg, USA (2011)
  – Method: Stupid Backoff
    • Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine translation. In: EMNLP 2007, pp. 858--867.
• OpenSubtitles corpus
  – Software: KenLM
    • Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011 Sixth Workshop on Statistical Machine Translation, pp. 187--197. ACL, Stroudsburg, USA (2011)
  – Method: Modified Kneser-Ney smoothing
    • Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney Language Model Estimation. In: 51st Annual Meeting of the Association for Computational Linguistics, pp. 690--696.
      Curran Associates, Inc., New York, USA (2013)

Prefix tree building algorithm (1)
• Representation
  – Work with a prefix tree, where each path represents an n-gram
• Input
  – Set of named entities (e.g., tea, sugars)
  – Set of function words (e.g., you, with, for)
  – Desired length of sentence fragment (e.g., 3 words)
• Parameters
  – Score threshold (minimal score, e.g., 10^-30)
  – Leaf limit (maximal number of open leaves, e.g., 200)
• Output
  – Tree (or a list) of potential sentence fragments
    • Contain all the important words
    • Ordered by estimated probability (score)

Prefix tree building algorithm (2)
• Essentially a breadth-first search, with some constraints
• Start with an empty tree; the root node is open
• While there is an open node:
  – Extend all open leaves with all available words, if the resulting n-gram’s “score” is above a certain threshold
  – Discard every open leaf node that cannot be extended (falls below the threshold)
  – Close every open leaf except the highest-scoring ones (leaf limit)

[Figures: tree-building walkthrough, from the initial state through Steps 1 to 6]

Generated text fragments examples
[Figure]

Conclusion on NLP
• A system to enable people with SSPI to communicate with natural language sentences
• Demonstrated the feasibility of the approach

Our contribution
• An interaction system to reduce communication barriers for people with severe speech and physical impairments (SSPI) such as cerebral palsy.
• The system consists of two main components: (i) the head-mounted human-computer interaction (HCI) part, consisting of smart glasses with gaze trackers and text-to-speech functionality (which implement a communication board and the selection tool), and (ii) a natural language processing pipeline in the backend that generates complete sentences from the symbols on the board.
• We developed the components to provide a smooth interaction between the user and the system, including gaze tracking, symbol selection, symbol recognition, and sentence generation.
• Our results suggest that such systems can dramatically increase the communication efficiency of people with SSPI.

Eye Tracking Requirements
• Eye gaze is a compelling interaction modality but requires user calibration before interaction can commence.
• State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious.

Possible improvements and outlook
• 3D gaze tracker that estimates the part of 3D space observed by the user
• OCR, sign, and object recognition methods to bring external “symbols” onto the communication board
• New HMDs and eye tracker setups (ergonomic)
• Advanced NLP to transform a series of symbols into whole, in-domain sentences within the context
• Integrated calibration: e.g.,
• Personalisation according to patient record
• CPS integration

The Narrative Clip is a tiny, automatic camera and app that gives you a searchable and shareable photographic memory.

Thank you for your attention!
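
Supplementary sketch for the "Stupid backoff" and "Prefix tree building algorithm" slides: the following is a minimal illustration, not the presented system, which used BerkeleyLM and KenLM on large corpora. It assumes a toy in-memory corpus, the Stupid Backoff score of Brants et al. (relative frequency of the longest seen n-gram, multiplied by a backoff factor of 0.4 for each shortened context), and a breadth-first expansion limited by a score threshold and a leaf limit. All function names, the toy sentences, and the extra pruning of branches that can no longer fit every content word are assumptions made for illustration.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested by Brants et al. (2007)


def count_ngrams(sentences, max_n=3):
    """Count all n-grams up to max_n; also return the total token count."""
    counts, total = Counter(), 0
    for sentence in sentences:
        tokens = sentence.split()
        total += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts, total


def stupid_backoff(word, context, counts, total):
    """Score S(word | context), backing off to shorter contexts when unseen."""
    factor, context = 1.0, tuple(context)
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return factor * counts[ngram] / counts[context]
        context, factor = context[1:], factor * ALPHA
    return factor * counts[(word,)] / total


def generate_fragments(content, function_words, length, counts, total,
                       max_n=3, threshold=1e-30, leaf_limit=200):
    """Breadth-first fragment search with a score threshold and a leaf limit."""
    open_leaves = [((), 1.0)]  # root: empty fragment with score 1
    for depth in range(length):
        extended = []
        slots_left = length - depth
        for words, score in open_leaves:
            missing = [w for w in content if w not in words]
            if len(missing) > slots_left:
                continue  # cannot fit all content words any more
            # When every remaining slot is needed, only content words are tried.
            candidates = missing if len(missing) == slots_left \
                else missing + list(function_words)
            context = words[-(max_n - 1):] if max_n > 1 else ()
            for word in candidates:
                new_score = score * stupid_backoff(word, context, counts, total)
                if new_score > threshold:  # discard low-scoring extensions
                    extended.append((words + (word,), new_score))
        extended.sort(key=lambda leaf: leaf[1], reverse=True)
        open_leaves = extended[:leaf_limit]  # keep only the best open leaves
    return [(" ".join(words), score) for words, score in open_leaves
            if all(w in words for w in content)]


# Toy corpus standing in for a real n-gram collection (illustrative only).
corpus = ["tea with lemon and sugar", "tea with sugar",
          "coffee with sugar and milk", "tea with lemon", "sugar and lemon"]
counts, total = count_ngrams(corpus)
fragments = generate_fragments(["lemon", "sugar", "tea"],
                               ["with", "and", "the"], 5, counts, total)
print(fragments[0][0])  # tea with lemon and sugar
```

In the approach described in the slides, the user would then choose among the top-ranked variants (for example, `fragments[:4]`) rather than accept the single best one.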