Translation-oriented corpus construction
CONTRASTING POST-EDITING AND HUMAN TRANSLATION

Oliver Čulo, Jean Nitzke
Universität Mainz
MT @ work, Brussels, December 5th, 2014

CRITT TPR DATABASE

- translation process database with key-logging and eye-tracking data
- coordinator: Copenhagen Business School
- English-German data collection at FTSK in Germersheim
- First run: 6 source texts (newspaper) with different complexity levels, 12 professional translators, 12 semi-professional translators; translation vs. post-editing vs. monolingual editing
- Second run: 6 source texts (3 manuals, 3 package leaflets), 12 semi-professional translators; translation vs. full post-editing vs. light post-editing
- MT system: Google Translate
- eye-tracking (Tobii TX 300), key-logging (Translog II), retrospective questionnaires

FIRST RUN: RESEARCH BEHAVIOUR AND EXEMPLARY OBSERVATIONS

WEBSITE USAGE: PROPORTIONS

[Figure: proportions of website types consulted in post-editing, monolingual editing and human translation. Categories: bilingual dictionary, monolingual dictionary, synonyms, machine translation, encyclopedia, search engine, news.]

WEBSITE USAGE: TOTALS

[Figure: bar charts "Website Use per Task - Status" and "Website Use per Task". Legend: professionals, students. Tasks: monolingual post-editing, post-editing, human translation.]

SEMANTIC ERRORS THROUGH ‘BLIND’ (MONOLINGUAL) EDITING

                                      HT (8)   ME (8)   PE (7)
Incorrect translation (Gefährdung)      0        4        0
Incorrect translation (other)           0        2        0
Correct                                 8        2        7

EO: Increasing mobility and technological advances resulted in the increasing exposure of people to cultures and societies different from their own.
MT: Zunehmende Mobilität und der technologische Fortschritt führte zu der zunehmenden Gefährdung von Personen...
Lit.: `... led to the increasing endangerment of people.`

LACK OF CONSISTENCY (1)

                 HT (7)   ED (7)   PE (8)
nurse-incons.      0        0        4

EO: Killer nurse receives four life sentences. Hospital nurse C.N. was imprisoned for life today for the killing of four of his patients.
PE: Killer-Krankenschwester zu viermal lebenslanger Haft verurteilt. Der Krankenpfleger C.N. wurde heute auf Lebenszeit eingesperrt für die Tötung von vier seiner Patienten.
    ‘Killer-nurse.FEM to four times lifetime imprisonment sentenced. The nurse.MASC C.N. was today on lifetime imprisoned for the killing of four his.MASC patients.’

SECOND RUN: EXEMPLARY OBSERVATIONS

LACK OF CONSISTENCY (2)

                             HT (3)   LPE (3)   FPE (3)
dish washer inconsistency      1         3         2

EO: 5x dish washer
MT: 1x Geschirrspülmaschine, 2x Geschirrspüler, 2x Spülmaschine

LACK OF CONSISTENCY?

EO:  Locate sharp items
MT:  Suchen Sie scharfer Gegenstände
     ‘Look-for you sharp items’
HT:  Scharfe Gegenstände so positionieren
     ‘Sharp items so position’
FPE: Plazieren Sie scharfe Gegenstände so
     ‘Place you sharp items such-that’

Polite imperatives per participant:

HT    P08: 6    P12: 8 (13)   P17: 6    P25: 11
MT    11
LPE   P10: 12   P16: 12       P22: 12   P30: 12
FPE   P09: 12   P14: 11       P21: 12   P29: 11

PRIMING

- strong indicators for syntactic priming in post-editing (Bangalore et al. submitted)
- indicators for lexical priming
- no. of lexical types (nouns, adjectives, verbs, adverbs) realised in the second run: HT > FPE > LPE > MT
- further statistical tests based on word alignment

type of translation   no. of lexical types
MT                    277
LPE                   330
FPE                   384
HT                    488

Bangalore, Srinivas, Bergljot Behrens, Michael Carl, Maheshwar Gankhot, Arndt Heilmann, Jean Nitzke, Moritz Schaeffer, Annegret Sturm. submitted. The role of syntactic choices in translation and post-editing.

CONCLUSIONS AND FUTURE WORK

OPEN QUESTIONS

• somehow, translators ‘forget’ about lexical consistency: is this a cognitive load problem? are they taking over more from the MT output than they admit (or realise), i.e. lexical priming (besides syntactic priming)?
• post-editing has to be approached and probably taught differently, but exactly how is a matter of future research
• will productivity gains hold if we make post-edited texts comparable to human translations?
• if we produce more post-edited material and feed it into MT systems, will we run into a garbage-in-garbage-out problem over time?
• Ottmann & Canfora [1] propose to make a risk assessment for every scenario and then to decide whether to send a translation through an MT or a human process

[1] http://tagungen.tekom.de/fileadmin/tx_doccon/slides/351_Auf_eigenes_Risiko_Wie_Sie_durch_Risikoanalysen_gute_bersetzungen_bekommen.pdf
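
The lexical-consistency observations (for instance, five occurrences of "dish washer" rendered by three different German terms in the MT output) could be quantified along the following lines. This is only a minimal sketch and not the procedure used in the study: it assumes that source and target terms have already been aligned into pairs, and the function names `variant_counts` and `inconsistent_terms` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter, defaultdict


def variant_counts(aligned_pairs):
    """Map each source term to a Counter of its target-language renderings.

    `aligned_pairs` is assumed to be an iterable of
    (source_term, target_term) tuples from a prior alignment step.
    """
    variants = defaultdict(Counter)
    for source_term, target_term in aligned_pairs:
        variants[source_term.lower()][target_term] += 1
    return variants


def inconsistent_terms(aligned_pairs):
    """Return only the source terms rendered with more than one variant."""
    return {
        source: dict(counts)
        for source, counts in variant_counts(aligned_pairs).items()
        if len(counts) > 1
    }


# Counts taken from the "dish washer" example in the second run (MT output).
mt_pairs = (
    [("dish washer", "Geschirrspülmaschine")] * 1
    + [("dish washer", "Geschirrspüler")] * 2
    + [("dish washer", "Spülmaschine")] * 2
)

report = inconsistent_terms(mt_pairs)
print(report)
```

Running the same check on each participant's final text would show how many MT variants survive light or full post-editing, which is one way to approach the question of whether translators take over more than they realise.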