Linguistic analysis of keystroke logging data

Transcription

Presentation reference:
Van Waes, L., & Leijten, M. (2016). Linguistic analysis: Analyzing keystroke logging from a linguistic perspective. Presentation at Workshop: Using Keystroke Logging in Writing Research, Boston, MIT. http://www.inputlog.net/MIT_workshop.html
Trainingsschool on Keystroke Logging | Antwerp
Linguistic analysis
INPUTLOG 7.1: a research tool for logging and analyzing writing process data
From character-level analyses to word-level analyses
Linguistic analysis
Analyzing keystroke logging data from a linguistic perspective: Dutch versus English expository texts
Mariëlle Leijten & Luuk Van Waes
Linguistic Analyses
Flow of the linguistic analyses
The concept explained
- Aggregate letter to word level
- Parsing the S-notation
- Enriching process data with linguistic information
Mariëlle Leijten
Flanders Research Foundation (FWO)
University of Antwerp
Luuk Van Waes
University of Antwerp
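Aggregate letter to word level
Part of speech tagging and chunking 1
- Extract words, word groups and sentences
- Tokenize sentences
There/EX is/V a/DT man/NN sleeping/V in/IN an/DT easy/JJ chair/NN .
Noun phrases (NP): [a man], [an easy chair]
(EX = existential "there", V = verb, DT = determiner, NN = noun, IN = preposition, JJ = adjective)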
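The tag sequence on this slide is enough to derive the noun-phrase chunk labels used in these slides: B-NP marks the first word of a chunk and I-NP the words inside it. The sketch below is a hypothetical illustration, not Inputlog's tagger or chunker. It hard-codes the slide's tags. The function name `np_chunks` and its simplified rule (a determiner, any number of adjectives, then a noun) are assumptions.

```python
# Tokens and PoS tags copied from the example sentence on the slide.
TOKENS = ["There", "is", "a", "man", "sleeping", "in", "an", "easy", "chair"]
TAGS = ["EX", "V", "DT", "NN", "V", "IN", "DT", "JJ", "NN"]


def np_chunks(tags: list[str]) -> list[str]:
    """Assign B-NP / I-NP / O labels with a simplified (assumed) NP rule:
    a determiner, followed by zero or more adjectives, followed by a noun."""
    labels = ["O"] * len(tags)
    i = 0
    while i < len(tags):
        if tags[i] == "DT":
            j = i + 1
            while j < len(tags) and tags[j] == "JJ":  # skip adjectives
                j += 1
            if j < len(tags) and tags[j] == "NN":     # the NP must end in a noun
                labels[i] = "B-NP"
                for k in range(i + 1, j + 1):
                    labels[k] = "I-NP"
                i = j + 1
                continue
        i += 1
    return labels


if __name__ == "__main__":
    for token, tag, chunk in zip(TOKENS, TAGS, np_chunks(TAGS)):
        print(f"{token:10} {tag:4} {chunk}")
```

Running it labels "a man" as B-NP I-NP and "an easy chair" as B-NP I-NP I-NP. Every other word gets O.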
Enrichment with process data 1
Part of speech tagging and chunking 2
Before-word pauses (-1, -2)
Thre<<ere is a_man sleapp<<ping in an easy chair.
There/EX is/V a/DT man/NN sleeping/V in/IN an/DT easy/JJ chair/NN .
Chunks: a man = B-NP I-NP; an easy chair = B-NP I-NP I-NP
- The first pause before a word (-1)
- The second pause before a word (-2)
In the example, the two pauses before "man" (140 ms and 593 ms) add up to an aggregated before-word pause of 733 ms.
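The before-word pauses can be read directly off a timestamped keystroke sequence. This is a minimal sketch under four assumptions:
- A pause is the interval between two successive key-down times.
- -1 is the pause between the space and the first letter of the word.
- -2 is the pause between the last letter of the previous word and that space.
- The `Key` record, `before_word_pauses` and the timestamps are hypothetical.

The timestamps were invented so that the two pauses before "man" in "a_man" reproduce the slide's 140 ms and 593 ms.

```python
from typing import NamedTuple


class Key(NamedTuple):
    char: str     # character produced by the keystroke
    down_ms: int  # hypothetical key-down time in milliseconds


def before_word_pauses(keys: list[Key], first_char: int) -> tuple[int, int]:
    """Return the (-1, -2) pauses for the word that starts at keys[first_char].

    Assumed layout: <last char of previous word> <space> <first char of word>.
    """
    if first_char < 2 or keys[first_char - 1].char != " ":
        raise ValueError("word must be preceded by a character and a space")
    pause_1 = keys[first_char].down_ms - keys[first_char - 1].down_ms      # space -> first letter
    pause_2 = keys[first_char - 1].down_ms - keys[first_char - 2].down_ms  # previous word -> space
    return pause_1, pause_2


# "a_man" with invented timestamps.
KEYS = [Key("a", 1000), Key(" ", 1140), Key("m", 1733), Key("a", 1850), Key("n", 1960)]

if __name__ == "__main__":
    p1, p2 = before_word_pauses(KEYS, first_char=2)
    print(f"-1: {p1} ms, -2: {p2} ms, aggregated: {p1 + p2} ms")
```

With these values the script prints `-1: 593 ms, -2: 140 ms, aggregated: 733 ms`. Swapping which pause is called -1 and which -2 changes only the labels, not the aggregated value.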
Enrichment with process data 2: Word production
Thre<<ere is a man sleapp<<ping in an easy chair.
Production time of a word:
[EndTime of last character of word - StartTime of first character of word]
Example values in the figure: 7207 and 546
Enrichment with process data 3: Within-word pause
The sum of the pauses within a word:
[WithinWordPause 1 + WithinWordPause 2 + ... + WithinWordPause N]
Example values in the figure: 7145 and 499
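Both word-level measures follow from the keystrokes that belong to one word, including any deletions. This is a minimal sketch under these assumptions:
- Each keystroke has a key-down and a key-up time.
- Each `<` in "Thre<<ere" is one backspace, logged here as a hypothetical `BACK` key.
- A within-word pause is the interval between successive key-down times.

None of these names come from Inputlog's output format, and the timings are invented.

```python
from typing import NamedTuple


class Keystroke(NamedTuple):
    key: str      # a character, or "BACK" for a backspace (hypothetical encoding)
    down_ms: int  # key-down time in milliseconds
    up_ms: int    # key-up time in milliseconds


def replay(keys: list[Keystroke]) -> str:
    """Rebuild the text the keystrokes leave behind.

    Assumes the cursor stays at the end of the text (no mouse or arrow-key moves).
    """
    text: list[str] = []
    for k in keys:
        if k.key == "BACK":
            if text:
                text.pop()
        else:
            text.append(k.key)
    return "".join(text)


def production_time(word_keys: list[Keystroke]) -> int:
    """EndTime of the last keystroke minus StartTime of the first keystroke."""
    return word_keys[-1].up_ms - word_keys[0].down_ms


def within_word_pause_sum(word_keys: list[Keystroke]) -> int:
    """Sum of the intervals between successive key-down times inside the word."""
    return sum(b.down_ms - a.down_ms for a, b in zip(word_keys, word_keys[1:]))


# "Thre<<ere" typed with invented timings: four letters, two backspaces, three letters.
THERE = [
    Keystroke("T", 0, 80), Keystroke("h", 150, 220), Keystroke("r", 300, 370),
    Keystroke("e", 420, 490), Keystroke("BACK", 900, 960), Keystroke("BACK", 1100, 1160),
    Keystroke("e", 1400, 1470), Keystroke("r", 1550, 1610), Keystroke("e", 1700, 1760),
]

if __name__ == "__main__":
    print(replay(THERE), production_time(THERE), within_word_pause_sum(THERE))
```

Under this pause definition, the within-word pause sum equals the production time minus the hold time of the word's final key (here 1760 - 1700 = 60 ms).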
Enrichment with process data 4
After-word pause (+1)
Thre<<ere is a_man_sleapp<<ping in an easy ch...
- The first pause after a word (+1); the figure shows +1 pauses of 140 ms and 234 ms.
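The +1 pause mirrors the -1 pause: it is the interval between the last keystroke of a word and the next keystroke. This minimal sketch assumes a key-down-to-key-down pause definition. `Key`, `after_word_pause` and the timestamps are hypothetical. The timestamps were chosen so that the +1 pauses after "a" and "man" come out at 140 ms and 234 ms, which is an assumed reading of the slide.

```python
from typing import NamedTuple


class Key(NamedTuple):
    char: str
    down_ms: int  # hypothetical key-down time in milliseconds


def after_word_pause(keys: list[Key], last_char: int) -> int:
    """+1 pause: interval from the word's last keystroke to the next keystroke."""
    if last_char + 1 >= len(keys):
        raise ValueError("nothing was typed after this word, so there is no +1 pause")
    return keys[last_char + 1].down_ms - keys[last_char].down_ms


# "a_man_" with invented timestamps.
KEYS = [Key("a", 1000), Key(" ", 1140), Key("m", 1733), Key("a", 1850),
        Key("n", 1960), Key(" ", 2194)]

if __name__ == "__main__":
    print(after_word_pause(KEYS, 0), after_word_pause(KEYS, 4))  # after "a", after "man"
```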
Read more
- Leijten, M., Van Horenbeeck, E., & Van Waes, L. (2015). Analyzing writing process data: A linguistic perspective. In G. Cislaru (Ed.), Writing(s) at the crossroads: The process-product interface (pp. 277-302). Amsterdam/Philadelphia: John Benjamins Publishing Company. ISBN: 978 90 272 5802 1. DOI: 10.1075/Z.194
- Macken, L., Hoste, V., Leijten, M., & Van Waes, L. (2012). From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information. Paper presented at the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
- Leijten, M., Macken, L., Hoste, V., Van Horenbeeck, E., & Van Waes, L. (2012). From character to word level: Enabling the linguistic analyses of Inputlog process data. Paper presented at the European Association for Computational Linguistics, EACL - Computational Linguistics and Writing (CL&W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering, Avignon.
Linguistic perspective in L1 and L2 writing
Introduction
Analyzing keystroke logging data from a linguistic perspective
- Linguistic proficiency is an important factor in writing
- Describe the cognitive costs of the formulation process
  - Inter-word pausing dynamics
  - Word patterns
- Research technique: linguistic analysis
  - Semi-automatic analysis in Inputlog 7.1
  - Combination of linguistics and processes
Method
- Quasi-experiment ~ within-subjects design
- 48 students of the Master in Multilingual Professional Communication
- 2 expository writing tasks:
  - Description of last holiday (2’ planning + max 8’ writing)
  - Distraction task
  - Description of last weekend (2’ planning + max 8’ writing)
- Data collection: Inputlog 5
- Data preparation and analysis: Inputlog 7
Relevant analyses:
- Summary analysis (thresholds 30 & 2000 ms)
- Pause analysis (thresholds 30 & 2000 ms)
- S-notation
- Linguistic analysis
Method
Both groups needed about 8 minutes to describe their holiday/weekend.
- S-notation
- Linguistic analysis (manually corrected): error rate 14% for Dutch and 12% for English
Final text (product)
Example texts, Dutch (L1) followed by English (L2):
Mijn laatste vakantie was naar Tenerife. Dit was van 16 t.e.m. 21 september. Ik
ben hier met mijn vriend XXXX naartoe gegaan. We zijn bijna 3 jaar samen en
vonden het dus wel eens tijd wordne om samen op reis te gaan.
Op Tenerife was het prachtig weer, in tegenstelling tot België. in ons kleine
landje hebben we deze zomer vooral wolken en regen gezien. Onze vlucht
vertrok heel vroeg. Als ik me goed herinner, zijn we opgestaan om 3u om op tijd
op het vliegveld te geraken. We waren dus nogal moe toen we aankwamen op
het eiland. De warmte die ons tegemoet kwam toen we van het vliegtuig
stapten, veranderde dit meteen. We waren ongelooflijk blij dat we eindelijk
een week zouden kunnen genieten van de zon, de zee en het strand.
We hebben slechts twee uitstpajes gedaan, omdat ons budget beperkt was. We
moesten als student alles zelf betalen en de reis zelf kostte al redelijk wat.
uiteindelijk hebben we ervoor gekozen om de vulkaan, de Teide, te bezichtigen
en om een boottochtje te maken om dolfijnen en walvissen te spotten.
Een bijzondere gebeurtenis is er niet echt geweest, behalve dat we in de zee
waren aan het zwemmen en Hans plots een rog onder ons zag zwemmen. We
waren beide erg verschoten en liepen zeer snel het water uit.
This weekend was a long weekend, because Friday was a holiday and we didn't
have to got to school.
On Friday I haven't really done anything. I went to a party on Thursdaynight, so I
was tired and all I've done that day, was watching television with my sisters. She
had just downloaded the film Pocahontas and it was such a long time since I had
since this film.
On saturday I realised I had to do my homework, otherwise I wouldn't get it all
done in time. That night I went to my boyfriend's. It was really cosy at his place
because he had put on the fireplace. He recently got a new cat and it's so little,
so I played with the kitten for a very long time. It wasn't planned, but I stayed
over, because my boyfriend din't want to drive me home.
Sunday, I had to get up at 8 o'clock, because my boyfriend had to go to Brussels
with his familiy. That afternoon, I went to the swimming pool because I had to
be there to assisist during a competition. I give swimming lessons on Friday and
this was 'my children's' first competition. They were all very nervous, but
everything worked out well.
Dutch: average 256 words, 36.5 words per minute
English: average 239 words, 32.9 words per minute
Fragmentation
Example of a Dutch text (L1)
Mijn laatste vakantie was naar Tenerife.  Dit was van 16 t.e.m. 21 september...
P-bursts: 21 p-bursts, 112 characters, 26.1 seconds
Example of an English text (L2)
This weekend  was a long weekend  , because Friday was a holiday  and
P-bursts: 25 p-bursts, 81 characters, 22.8 seconds
General pause results
Probability of a pause longer than 30 ms within and between words
Pause threshold > 2000 ms
In general, students' pauses are shorter in L1 than in L2. This confirms previous findings.
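Both fragmentation measures on these slides can be computed from a timestamped keystroke sequence. The sketch below is hypothetical and is not Inputlog's implementation. It assumes that:
- A pause is the interval between successive key-down times.
- A p-burst ends at every pause longer than 2000 ms, the upper threshold used in the analyses.
- As a simplification, an interval counts as between-word when either keystroke is a space, and as within-word when both keystrokes are letters.

```python
from typing import NamedTuple


class Key(NamedTuple):
    char: str
    down_ms: int  # hypothetical key-down time in milliseconds


class Burst(NamedTuple):
    text: str
    n_chars: int
    duration_ms: int


def p_bursts(keys: list[Key], threshold_ms: int = 2000) -> list[Burst]:
    """Split the keystrokes into bursts at every pause longer than threshold_ms."""
    bursts: list[Burst] = []
    start = 0
    for i in range(1, len(keys) + 1):
        # A burst ends at the last keystroke or right before a long pause.
        if i == len(keys) or keys[i].down_ms - keys[i - 1].down_ms > threshold_ms:
            chunk = keys[start:i]
            bursts.append(Burst("".join(k.char for k in chunk), len(chunk),
                                chunk[-1].down_ms - chunk[0].down_ms))
            start = i
    return bursts


def pause_probabilities(keys: list[Key], threshold_ms: int = 30) -> dict[str, float]:
    """Proportion of within-word and between-word intervals longer than threshold_ms."""
    long_counts = {"within": 0, "between": 0}
    totals = {"within": 0, "between": 0}
    for prev, cur in zip(keys, keys[1:]):
        if prev.char == " " or cur.char == " ":
            kind = "between"
        elif prev.char.isalpha() and cur.char.isalpha():
            kind = "within"
        else:
            continue  # punctuation and other keys are ignored in this sketch
        totals[kind] += 1
        if cur.down_ms - prev.down_ms > threshold_ms:
            long_counts[kind] += 1
    return {kind: (long_counts[kind] / totals[kind] if totals[kind] else 0.0)
            for kind in totals}


if __name__ == "__main__":
    sample = [Key(c, t) for c, t in zip("Hi there", [0, 100, 200, 2500, 2600, 2700, 2800, 2900])]
    for burst in p_bursts(sample):
        print(burst)
    print(pause_probabilities(sample))
```

Per text, the number of bursts, characters per burst and burst duration can then be summarized. How the slide's figures (21 vs. 25 p-bursts) were aggregated is not specified here.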
Linguistic Analysis
The concept explained (read more: Inputlog manual and articles)
Part of speech tagging and chunking
- Aggregating letter to word level
- Parsing the S-notation
- Enriching process data with linguistic information (PoS tags, lemmas, chunks, frequencies, ...)
There/EX is/V a/DT man/NN sleeping/V in/IN an/DT easy/JJ chair/NN .
Word classes (Part-of-Speech)
- Mean number of words per class (based on two tasks)
- Mean pause duration before word classes (L1 versus L2)
In general, pauses increase by 26% when writing in L2.
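The word-class results combine two steps. First, each word's before-word pause is grouped by its PoS tag and averaged. Second, the L2 mean is expressed as a percentage change relative to the L1 mean. The sketch below is hypothetical: the function names and all numbers are invented. The L1/L2 means passed to `percent_increase` were picked only to show how a 26% increase would be computed; they are not data from the study.

```python
from collections import defaultdict
from statistics import fmean


def mean_pause_by_class(words: list[tuple[str, int]]) -> dict[str, float]:
    """Mean before-word pause (ms) per word class.

    `words` holds one (PoS tag, before-word pause in ms) pair per word.
    """
    by_class: dict[str, list[int]] = defaultdict(list)
    for tag, pause in words:
        by_class[tag].append(pause)
    return {tag: fmean(pauses) for tag, pauses in by_class.items()}


def percent_increase(l1_mean: float, l2_mean: float) -> float:
    """Relative change from the L1 mean to the L2 mean, in percent."""
    return 100.0 * (l2_mean - l1_mean) / l1_mean


if __name__ == "__main__":
    print(mean_pause_by_class([("DT", 300), ("NN", 500), ("DT", 500), ("NN", 700)]))
    print(percent_increase(500.0, 630.0))  # invented means
```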
Word classes (Part-of-Speech)
- Proportional increase of the initial word pause for each word class (English versus Dutch)
- Mean pause duration before prepositions, pronouns and conjunctions (L1 versus L2; * = significant difference)
Students have significantly longer pauses before prepositions, pronouns and conjunctions in English than in Dutch.
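The slides mark the differences before prepositions, pronouns and conjunctions as significant but do not say which test was used. In a within-subjects design like this one, a paired t-test on each writer's mean pause in L1 and in L2 is one common option. The sketch below computes only the t statistic and its degrees of freedom; a p-value would additionally require the t distribution, for example from SciPy. The function name and the per-writer values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(l1: list[float], l2: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-writer means in L1 and L2.

    l1[i] and l2[i] must belong to the same writer (within-subjects design).
    """
    if len(l1) != len(l2) or len(l1) < 2:
        raise ValueError("need two equally long lists with at least two writers")
    diffs = [b - a for a, b in zip(l1, l2)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return fmean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


if __name__ == "__main__":
    # Invented mean pauses (ms) before prepositions for four writers.
    t, df = paired_t([400, 500, 450, 520], [520, 610, 530, 660])
    print(f"t({df}) = {t:.2f}")
```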
Patterns: "The house" (L1 versus L2)
Patterns: "My house" (L1 versus L2)
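A pattern profile such as ART-N versus PREP-ART-N can be obtained by locating each tag sequence in a tagged text and averaging the before-word pause at every position. The sketch below is hypothetical: the tags, pauses and the function name `pattern_profile` are invented. ART-N occurrences directly after a preposition are excluded so that the two contexts do not overlap.

```python
from statistics import fmean
from typing import Optional


def pattern_profile(
    tags: list[str],
    pauses: list[int],
    pattern: tuple[str, ...],
    not_after: Optional[str] = None,
) -> list[float]:
    """Mean before-word pause at each position of `pattern`.

    Occurrences directly preceded by the tag `not_after` are skipped.
    Returns an empty list when the pattern does not occur.
    """
    n = len(pattern)
    hits = [
        i
        for i in range(len(tags) - n + 1)
        if tuple(tags[i:i + n]) == pattern
        and not (not_after is not None and i > 0 and tags[i - 1] == not_after)
    ]
    if not hits:
        return []
    return [fmean([pauses[i + k] for i in hits]) for k in range(n)]


# Invented example: "The house is in the garden" with made-up pauses (ms).
TAGS = ["DT", "NN", "V", "IN", "DT", "NN"]
PAUSES = [900, 200, 150, 400, 120, 250]

if __name__ == "__main__":
    print("ART-N:", pattern_profile(TAGS, PAUSES, ("DT", "NN"), not_after="IN"))
    print("PREP-ART-N:", pattern_profile(TAGS, PAUSES, ("IN", "DT", "NN")))
```

Comparing the article position across the two profiles (900 ms vs. 120 ms in this made-up text) is the kind of contrast summarized as PREP-ART-N ≠ ART-N.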
Patterns | Alzheimer project
"My house" (L1 versus L2)
Evidence for parallel processing
Taken from a presentation by Thierry Olive (Antwerp training school on keystroke logging, March 2016)
Figure labels: H and CI; children and adults; 1st clause and 2nd clause
Patterns: "In the house" (L1 versus L2)
Conclusions
1. Students in L2 produce shorter texts, write in shorter bursts
and pause longer within and between words.
2. Students in L2 especially pause longer before pronouns,
prepositions and conjunctions.
3. Constituents follow a different pause distribution in
different linguistic contexts (e.g. PREP-ART-N ≠ ART-N).
Take-home message:
Pauses between words need to be differentiated to fully
understand the cognitive effort of text production.
More information (@uantwerpen.be)
Mariëlle Leijten
University of Antwerp
Research Foundation – Flanders
Luuk Van Waes
University of Antwerp
www.inputlog.net