Translation-oriented corpus construction

CONTRASTING POST-EDITING AND
HUMAN TRANSLATION
Oliver Čulo, Jean Nitzke
Universität Mainz
MT @ work, Brussels
December 5th, 2014
CRITT TPR DATABASE

translation process database with key-logging and eye-tracking data

coordinator: Copenhagen Business School

English-German data collection at FTSK in Germersheim

First run: 6 source texts (newspaper) with different complexity
levels, 12 professional translators, 12 semi-professional translators,
translation vs. post-editing vs. monolingual editing

Second run: 6 source texts (3 manuals, 3 package leaflets), 12 semi-professional translators, translation vs. full post-editing vs. light post-editing

MT system: Google Translate

eye-tracking (Tobii TX 300), key-logging (Translog II),
retrospective questionnaires
FIRST RUN: RESEARCH BEHAVIOUR AND
EXEMPLARY OBSERVATIONS
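WEBSITE USAGE: PROPORTIONS
[Figure: three charts showing the proportions of website types consulted, one each for Post-Editing, Monolingual Editing and Human Translation. Legend: bilingual dictionary, monolingual dictionary, synonyms, machine translation, encyclopedia, search engine, news. Values not recoverable from the transcript.]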
WEBSITE USAGE: TOTALS
[Figure: two bar charts, "Website Use per Task - Status" (legend: Professionals, Students) and "Website Use per Task". Categories in both charts: Monolingual Post-Editing, Post-Editing, Human Translation. Bar values not recoverable from the transcript.]
SEMANTIC ERRORS THROUGH ‘BLIND’
(MONOLINGUAL) EDITING
                                      HT(8)   ME(8)   PE(7)
Incorrect translation (Gefährdung)    0       4       0
Incorrect translation (other)         0       2       0
correct                               8       2       7
EO: Increasing mobility and technological advances resulted in the
increasing exposure of people to cultures and societies different
from their own.
MT: Zunehmende Mobilität und der technologische Fortschritt
führte zu der zunehmenden Gefährdung von Personen...
Lit.: ‘... led to the increasing endangerment of people.’
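Because the three groups differ in size (8, 8 and 7 translations), the counts in the table are easier to compare as shares. The following is a minimal sketch that only re-expresses the slide's counts; the dictionary keys and the function name are illustrative and not part of the original study.

```python
# Counts copied from the table above; keys are illustrative labels.
counts = {
    "HT": {"incorrect_gefaehrdung": 0, "incorrect_other": 0, "correct": 8},
    "ME": {"incorrect_gefaehrdung": 4, "incorrect_other": 2, "correct": 2},
    "PE": {"incorrect_gefaehrdung": 0, "incorrect_other": 0, "correct": 7},
}


def correct_share(row):
    """Share of correct renderings among all renderings in one task."""
    total = sum(row.values())
    return row["correct"] / total


shares = {task: correct_share(row) for task, row in counts.items()}

for task, share in shares.items():
    print(f"{task}: {share:.2f} correct")
```

With these counts, only the monolingual editing condition falls below a share of 1.0.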
LACK OF CONSISTENCY (1)
                       HT(7)   ED(7)   PE(8)
nurse inconsistency    0       0       4
EO: Killer nurse receives four life sentences. Hospital nurse C.N. was
imprisoned for life today for the killing of four of his patients.
PE: Killer-Krankenschwester zu viermal lebenslanger Haft verurteilt. Der
Krankenpfleger C.N. wurde heute auf Lebenszeit eingesperrt für die
Tötung von vier seiner Patienten.
‘Killer-nurse.FEM to four times lifetime imprisonment sentenced. The
nurse.MASC C.N. was today on lifetime imprisoned for the killing of four
his.MASC patients.’
SECOND RUN: EXEMPLARY OBSERVATIONS
LACK OF CONSISTENCY (2)
                             HT(3)   LPE(3)   FPE(3)
dish washer inconsistency    1       3        2
EO: 5x dish washer
MT: 1x Geschirrspülmaschine
2x Geschirrspüler
2x Spülmaschine
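A lexical consistency check of this kind can be sketched as a count of distinct target renderings per source term. The snippet below is an illustration only: the function names are hypothetical, and the order of the five MT renderings is assumed, since the slide reports only their frequencies.

```python
from collections import Counter

# The five MT renderings of "dish washer" as reported on the slide.
# Their order in the text is not given; it does not affect the counts.
mt_renderings = (
    ["Geschirrspülmaschine"]
    + ["Geschirrspüler"] * 2
    + ["Spülmaschine"] * 2
)


def rendering_counts(renderings):
    """Frequency of each distinct target rendering of one source term."""
    return Counter(renderings)


def is_consistent(renderings):
    """True if the source term is always rendered the same way."""
    return len(set(renderings)) <= 1


variants = rendering_counts(mt_renderings)
print(dict(variants))
print("consistent:", is_consistent(mt_renderings))
```

Applied to one translator's output, the same check would flag a text as inconsistent as soon as a second variant for the term appears.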
LACK OF CONSISTENCY?
EO:  Locate sharp items
MT:  Suchen Sie scharfer Gegenstände
     Look-for you sharp items
HT:  Scharfe Gegenstände so positionieren
     Sharp items so position
FPE: Plazieren Sie scharfe Gegenstände so
     Place you sharp items such-that
polite imperative (13)
    HT         MT    LPE        FPE
    P08: 6     11    P10: 12    P09: 12
    P12: 8           P16: 12    P14: 11
    P17: 6           P22: 12    P21: 12
    P25: 11          P30: 12    P29: 11
PRIMING
- strong indicators for syntactic priming in post-editing
(Bangalore et al. submitted)
- indicators for lexical priming
- no. of lexical types (nouns, adjectives, verbs, adverbs)
realised in the second run: HT > FPE > LPE > MT
- further statistical tests based on word alignment

type of translation    no. of lexical types
MT                     277
LPE                    330
FPE                    384
HT                     488
Bangalore, Srinivas, Bergljot Behrens, Michael Carl, Maheshwar Gankhot, Arndt Heilmann, Jean Nitzke,
Moritz Schaeffer, Annegret Sturm. submitted. The role of syntactic choices in translation and post-editing.
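How a count of lexical types might be obtained can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure: the transcript does not say whether types were counted over word forms or lemmas, the POS labels and the sample tokens are hypothetical, and lowercasing is a simplification. The reported totals are copied from the table above.

```python
# Content-word classes named on the slide; the tag names are assumed.
LEXICAL_POS = {"NOUN", "ADJ", "VERB", "ADV"}


def count_lexical_types(tagged_tokens):
    """Count distinct (lowercased form, POS) pairs among content words."""
    return len(
        {(form.lower(), pos) for form, pos in tagged_tokens if pos in LEXICAL_POS}
    )


# Hypothetical tagged sample: a repeated adjective counts once,
# and the pronoun is ignored.
sample = [
    ("Scharfe", "ADJ"),
    ("Gegenstände", "NOUN"),
    ("so", "ADV"),
    ("positionieren", "VERB"),
    ("scharfe", "ADJ"),
    ("Sie", "PRON"),
]
n_types = count_lexical_types(sample)

# Totals reported on the slide for the second run.
reported = {"MT": 277, "LPE": 330, "FPE": 384, "HT": 488}
ranking = sorted(reported, key=reported.get, reverse=True)
print(n_types, ranking)
```

Sorting the reported totals reproduces the ordering stated on the slide, HT > FPE > LPE > MT.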
CONCLUSIONS AND FUTURE WORK
OPEN QUESTIONS
• somehow, translators ‘forget’ about lexical consistency –
cognitive load problem? taking over more than they admit
(or realise)? i.e. lexical priming (besides syntactic priming)?
• post-editing has to be approached and probably taught
differently, but exactly how is a matter of future research
• will productivity gains hold if we make post-edited texts
comparable to human translations?
• if we produce more post-edited material and feed it into
MT systems, will we run into a garbage-in-garbage-out
problem over time?
• Ottmann & Canfora [1] propose to make a risk assessment for
every scenario and then to decide whether to send a
translation through an MT or a human process
[1] http://tagungen.tekom.de/fileadmin/tx_doccon/slides/351_Auf_eigenes_Risiko_Wie_Sie_durch_Risikoanalysen_gute_bersetzungen_bekommen.pdf