Natural Language Processing >> Machine Translation <<
Winter / fall term 2014/2015
41.4268
Prof. Dr. Bettina Harriehausen-Mühlbauer
Univ. of Applied Science, Darmstadt, Germany
www.fbi.h-da.de/~harriehausen

NLP – Machine Translation WS 14/15

content
1 How did it all start?
2 The human translator
3 Phases and attempts in MT
4 Comparison: human vs. machine translation
5 Quality of MT
6 NEW: Statistical Machine Translation

How did it all start?
• 3400 B.C.: writing systems for graphical writing
• 2000 B.C.: clay tablets
• 196 B.C.: Rosetta Stone
• 400 A.D.: Saint Jerome (Hieronymus)
• today: century of translations
  – commercialisation
  – exponentially growing translation volume
  – globalisation
  – own theory (translatology)

How did it all start? : How did we / do we translate?
• from 1390: pencil and paper
• from 1870: mechanical typewriter
• 1960: first mainframe computer
• 1970: dictation devices
• approx. 1970: terminology database on mainframe computers
• approx. 1975: electric typewriter
• approx. 1985: PCs with text processing
  – terminology processing systems (approx. 1987)
  – spell checker (approx. 1988)
  – electronic data sources (dictionaries, thesauri, ...) (approx. 1990)
  – translation memory (approx. 1991)
  – first translation tools (approx. 1995)
  – voice recognition (approx. 1999)

How did it all start? : How did we / do we translate?
electronic data sources (dictionaries, thesauri, ...):
abc Word; DUDEN: Das Wörterbuch der medizinischen Fachausdrücke; AlphaSoft-Wörterbuch (ASW); DUDEN Oxford Großwörterbuch; American Heritage Dictionary; Elsevier's Dictionary of ....; American Heritage Talking Dictionary; EP; Collins 5 Language Multi Dictionary; AURÉLIO Dicionário eletronico; Ernst: Wörterbuch der industriellen Technik; Berlitz Interpreter; Euroglot Compact 2.0; Berlitz Synonyms; Euroglot Compact 3.0; CD ROM Bibliothek für DOS Anwender; Collins Electronic English Dictionary & Thesaurus; Collins COBUILD CD ROM; FB WinVokabel; Collins On Line v2.2; Collins COBUILD Student´s Dictionary Online; CompLEX; Concise Oxford Dictionary Electronic Edition; Concise Oxford Dictionary and Oxford Thesaurus; Computer Desktop Encyclopedia; DUDEN Band 1; DUDEN Band 5; Euroglott Professional 2.0 Macintosh; European Business WHO'S WHO 1995; Collins COBUILD E-Dict; Collins Series 100 v1.1; Euroglott Professional 1.0 - 3.0; German Business; GlobeDisk Editor; Hexaglot Quicktionary II; Knaurs elektronisches Lexikon von A bis Z; Langenscheidts Euro-Set Version 2.0; Langenscheidts Eurowörterbücher; Langenscheidts Fachwörterbücher; Langenscheidts Handwörterbücher; DUDEN Band 8; ...

How did it all start? : How did we / do we translate?
terminology processing systems:
AISYTERM; Term Base for Windows; CATS® 5.3; TermISys 1.0; EDIPOLE; TERM Manager V2.2e; KeyTerm; TermStar Professional 3.0; LexiGraf 2.1; TermStar Viewstation 3.0; LingTools; TermStar Workstation 3.0; MoBiDic; Term Tools; MTX v2.20; TermTracer; MultiTerm 2.0; TMS; MultiTerm '95 Plus for Windows; TransDict; MultiTerm for Windows Lite Edition; TransLexis; polyglott; TWIN (Term Base for Windows) V. 1.0; Superlex; Vocabulator 1.5; Superlex for Windows 1.0; WebTerm 2.5; SystemQuirk; WhoTerm; Termbase

How did it all start? : How did we / do we translate?
Automatic Translation Tools:
Accent Special Edition 2.0; AMPAR; ARIANE; ATLAS; ATLAS/Win; CULT; Der Übersetzungsprofi; Eurobrief 3.1; EUROLANG EUREKA research project; Expertrad; GerRus; EZ Japanese Writer; FB Translator 4.13; FB Translator 4.13 Profi Version; FRAP; HICATS; Globalink Power Translator für Windows 2.0; Globalink Power Translator Professional; Globalink Web-Translator 1.1; GlobeDisk Translation Assistant; Hexaglot Quicktionary II; IKARUS Translator Pro; KANT; Langenscheidts T1 Standard 3.0; Langenscheidts T1 Plus 3.0; Langenscheidts T1 Professional 3.0; Language Assistant; LMT; LOGOS; Multilingual Document Translation Software; METAL; MZ Translator; MZ WIN-Translator; PC-Transer; PC Translator; PENSEE; Personal Translator PT 2008 Home; Personal Translator PT 2008 Home Französisch; Personal Translator PT 2008 Office; ...

The human translator : human translation speed
facts

medium                                                              pages/day
• pencil + paper (without text processing)                          5-6
• electric typewriter                                               5-10
• dictation                                                         10-20
• PC (with text processing and terminology processing)              5-20
• PC with translation memory                                        5-200
• PC with full-text machine translation (without post-editing)      500-1000 („endless garbage production“)

The human translator : the translator today
main task: transfer of information between various cultures
• sensible use of the native language
• active and/or passive competence in one or more foreign languages
• text-relevant knowledge (cultural competence) and the ability to acquire such knowledge from additional material (e.g. online)
• efficient use of MT technology (digitizing material, text preprocessing, information research, terminology databases, online information services, ... machine translation)
• efficient use and knowledge of work / project procedures (e.g. DIN standard)
• using / adhering to professional quality measurements and guidelines

http://www.zeit.de/1991/38/schragen-mittuecken

quality of machine translation
Alltag einer Übersetzerin
Schragen mit Tücken.
Ich arbeite als Industrieübersetzerin. Ich bin Philologin, verstehe also etwas von Syntax und unregelmäßigen Deklinationen. Im Russischen. Aber das hilft leider auch nicht immer. Denn ich übersetze Technik. Und die hat ihre Tücken. Da war zum Beispiel der Text über Hängeschleifenförderer von der Textilfirma. Na schön, wir hatten schon viele Förderer, warum nicht diesmal einen Hängeschleifenförderer? Zuerst will der Text verstanden sein. Also, ein Hängeschleifenförderer ist entweder ein Förderer für Hängeschleifen oder ein Förderer in Form einer Hängeschleife. Oder, Variante der zweiten Lesart: in Form mehrerer Hängeschleifen.

quality of machine translation
Warum halte ich mich schließlich auch nicht an den gern erteilten Rat technisch bewanderter Menschen auf meine Verständnisfragen: "Machen Sie es doch nicht so kompliziert! Übersetzen Sie einfach wörtlich!!"
[Why, after all, do I not simply follow the advice that technically versed people like to give in answer to my comprehension questions: "Don't make it so complicated! Just translate literally!!"]

Result of German-English translation; PT 1998:
Weekday of a translator. Trestle with malices. I work as industry translator. I am philologist, well understand something of syntax and irregular declinations. By the Russian. However, this doesn't always help either, unfortunately. Because I translate technology. And this one has her malices. The text was for example over hanging bow sponsors of the textile company there. Had beautiful, we already many sponsors, why not this time one hanging bow sponsor, for Na? The text claims to be understood first. Well, a hanging bow sponsor is either a sponsor for hanging bows or a sponsor in form of a hanging bow. Variant of the second version or: In form of several hanging bows.

quality of machine translation : ambiguities
The ambiguities of this text are hidden in the compounds.
example: Hängeschleifenförderer
(a) a Förderer for Hängeschleifen
(b) a Förderer in the form of a Hängeschleife
(c) a Förderer in the form of many Hängeschleifen
Förderer = sponsor, patron, conveyor belt
Hänge- = hanging
Schleife = bow, loop, ribbon

phases and attempts in MT
phase I (1949 until mid-1960s)
– 1949: „Translation“ memo by Warren Weaver
– 1954: Georgetown-IBM experiment
– until 1966: euphoria ... many MT projects
technology
– unreliable, expensive and weak (no memory) computers; additional hardware necessary
– no programming experience in NLP
– linguistics: descriptive, hardly any formal basis
MT scientists
– mathematicians, electrical engineers, computer specialists
– no translators (translation was not regarded as a science), no linguists
problems
– inadequate word-for-word translation
– syntax: no basis
– semantics: no basis

phases and attempts in MT
phase II (mid-1960s until mid-1970s)
1966: ALPAC report
recommendations of the report:
– support research in computational linguistics
– improve the quality of existing translation (techniques)
– no further support for MT
results
– stop / freeze of public MT support -> stagnation (USA)
– but also: shift of interest to basic research in computational linguistics
sample MT systems:
– SYSTRAN (-> US Air Force) - start 63/64
– METAL (Univ. of Texas) - 79/84
– TAUM (Projet de Traduction Automatique de l‘Université de Montréal) - start 65
– CETA (Univ. Grenoble) - start 71
– LOGOS - 64
– Susy - 67

phases and attempts in MT
phase III (mid-1970s until mid-1980s): „restoration period“
technology
– better computers (price, quality, availability, ...)
– growing memory capacity
– special software (e.g. for text processing ...)
– user friendly
results
– linguists manage MT groups (Europe)
– again: governmental support in the USA
– rapid development in Japan
– first computational-linguistic MT model
– formal syntax in MT
sample MT systems
– GETA (Grenoble) - 71
– EUROTRA (EC) - 78...
– ...

phases and attempts in MT
phase IV (since early 1980s)
facts
– growing memory
– logic programming, AI
– computational linguistics
AI model in MT
– represent and process syntactic-functional, semantic-referential, argumentative etc. knowledge; text-specific knowledge and background knowledge / world knowledge
– going beyond sentence level (e.g. the question of reference!)
– transfer component becomes more relevant: integration of translation specialists and „translation science“
– neural networks (multiple processors should speed up the translation process)
sample MT systems
– KBMT (Carnegie Mellon University)
– LMT (IBM) -> PT (Personal Translator) -> base for IBM‘s MT technology

phases and attempts in MT : types of MT systems
three models for the translation process
• direct translation model
  – language-pair dependent, complex system, very rigid, not universally applicable
• interlingua model
  – „indirect“ translation via an intermediary (formal, artificial) universal language (e.g. Esperanto)
• transfer model
  – 3-stage translation process: analysis - transfer - synthesis/generation
[Diagram labels: universal; language-pair specific rules; universal; „Black Box“]

Comparison : human vs. machine translation
traditional (human) translation
+ perfect results, even for idiomatic, stylistically demanding texts
+ cultural, text-specific and pragmatic variants can be considered during the translation process
- quality is expensive in terms of time and cost!

Comparison : human vs. machine translation
machine translation
+ universally available
+ fast and inexpensive
+ revolution of modern look-up possibilities (online spell checking, monolingual dictionaries/thesauri, encyclopedias, bilingual dictionaries, WWW, ...): richness of information at the click of the mouse!
+ consistency of terminology
- varying translation quality (high expectations regarding the post-editing work of the human translator)

let‘s be honest...
...when someone is startled by a translation, it‘s only because it‘s a bad one. We never recognize good translations as being translations (let alone „machine translations“).
Use bottom cushion for floatation -> Benutzen Sie das untere Kissen für das Flottmachen
(back-translated: „Use the lower cushion to speed up.“)
Producing clear / unambiguous texts is a serious art!
• errors in documentation lead to follow-on errors (vicious circle)
• errors can be tiresome but also deadly ... in all cases they have to be avoided!

Samples of ambiguities
• Kürzlich erst hatte sie den Drucker eingestellt.
• a) Jetzt kündigte er schon wieder.
• b) Jetzt war er schon wieder defekt.
• It was only recently that she had hired / adjusted the printer.
• a) Now he was already quitting again.
• b) Now it was already defective again.

Comparison of MT software
Newspaper article:
Graffiti, with its raw and defiant nature, still has the ability to raise eyebrows, but as street art becomes mainstream and the quality of the work increases, the debate on its legitimacy as an art form is finally beginning to settle. (Graffiti – Art or Vandalism? - DER OBSIDIAN)

Human translation:
Graffiti kann in seiner rauen und trotzigen Art immer noch Stirnrunzeln verursachen, aber indem Street Art sich etabliert und die Qualität der Arbeit besser wird, hört schließlich auch die Debatte auf, ob Graffiti überhaupt eine Kunstform ist.

Linguatec:
Graffiti hat mit seiner rohen und herausfordernden Natur noch die Fähigkeit, Augenbrauen anzuheben, aber als Straße wird Kunst konventionell, und die Qualität der Arbeit nimmt zu, die Debatte über seine Rechtmäßigkeit als eine Kunstform beginnt schließlich, sich zu legen.

PROMT:
Graffiti, mit seiner rohen und aufsässigen Natur, sind noch in der Lage, Augenbrauen zu erheben, aber weil Straßenkunst Hauptströmung und die Qualität der Arbeitszunahmen, der Debatte über seine Gesetzmäßigkeit wird, weil eine Kunstform schließlich beginnt sich niederzulassen.

SYSTRAN:
Graffiti, mit seiner rohen und aufsässigen Natur, haben noch die Fähigkeit, Augenbrauen hochzuziehen, aber, während Straßenkunst Mainstream wird und die Qualität der Arbeit sich erhöht, die Debatte auf seiner Legitimität, wie eine Kunstform schließlich anfängt zu vereinbaren.

TRIDENT:
Aufschriften an Wänden, mit seiniger feuchter und aufreizender Natur, all noch hat Fähigkeit abliefert Augenbrauen, aber als Straßekunst hinstellt hauptsächliche Richtung und Qualität des Arbeitens ansteigt, Debatten an seiniger Gesetzlichkeit als Kunst bilden endgültig anfängt, zu bereinigen.

lingenio:
Graffiti hat mit ihrer rohen und herausfordernden Natur noch die Fähigkeit, Augenbrauen anzuheben, aber, wie Straßenkunst konventionell wird und die Qualität der Arbeit zunimmt, beginnt die Debatte über ihre Rechtmäßigkeit als eine Kunstform schließlich, sich zu legen.

SkyCode:
Wandschmierereien, mit sein Sämtlich und trotzig Vehement, dennoch hat die Befähigung hochhebt die Augenbrauen, aber wie Straßeskunst wird zu Mainstream und die Farbqualität von die Arbeit steigt, die Debatte auf seine Legitimität als eine Vorlagesform ist ein endlich aufbrechen entscheidet.

criteria that influence the quality of machine translation
• algorithms (MT engine)
• lexical information (glossaries ...)
• quality of the source text (e.g. e-mail!)
• post-editing

criteria that influence the quality of machine translation
quality of the source text (e.g. e-mail!)
Source text (errors as in the original e-mail):
Fallebeschreibung Tobias und Informatiker und entwickelt Sofware für Roboter. Tobias arbeitet an einem Projekt der von Juliane koordiniert ist . Durch seine Intersse an seinem Abeitsgebiet lernt er James kennen, ein anderer Informatiker der roboter entwichkelt. Der Ideenaustausch zwischen Tobias und James Funktioniert hervorragen, und beide dank der Hilfe von Juliane arbeiten intensiv zusammen

MT result:
Falling description Tobias and computer scientist and software develops for robots. Tobias works this one is coordinated by Juliane at a project. He gets to know interest in its field of work for James, another computer scientist of the robots develops through his. The exchange of ideas between Tobias and James works jut out, and both cooperate intensively thanks to the help of Juliane

criteria that influence the quality of machine translation
What are the current challenges?
source language: words – syntax – semantics <-> target language: semantics – syntax – words
(easy (?)) examples

current quality of open-source MT
http://www.worldlingo.com/products_services/worldlingo_translator.html
Original (German): Der Hund jagt die Katze.
Translation (English): The dog hunts the cat.
Back-Translation (German): Der Hund jagt die Katze.

Original (German): Der Hund jagt die Katze.
Translation (Farsi): مندلسونhund jagt katze بميرد.
Back-Translation (German): Mendelssohn hund jagt katze die.

Original (German): Der Hund jagt die Katze.
Translation (French): Le chien chasse le chat.
Back-Translation (German): Der Hund verjagt die Katze.

quality of machine translation
„tricky“ examples : idiomatic expressions
Some idiomatic expressions can be translated directly (word by word) into other languages:
Busenfreund = bosom friend
Others cannot:
If Bill kicks the bucket, her children will be rich.
*Wenn Bill den Eimer tritt... (???)
Wenn Bill den Löffel abgibt... (!!!)

Original (German): Du hast wohl nicht alle Tassen im Schrank ! (idiom: "You must be out of your mind!")
Translation (English): You do not have probably all cups in the cabinet!
Back-Translation (German): Sie haben nicht vermutlich alle Schalen im Schrank!
Translation (Farsi): !شما هم که احتماال در كابينه جامهاي
Back-Translation (German): Im diesem Schrank Sie vermutlich Schalen.
Translation (French): Tu n'as pas probablement toutes les tasses dans le coffret !
Back-Translation (German): Du hast nicht wahrscheinlich alle Tassen im Kasten!

current quality of open-source MT
http://www.foreignword.com/tools/transnow.htm
Free online translation service with Reverso translation solutions
Original (English): he kicked the bucket.
Translation (German): er kickte den Eimer.
Back-Translation (English): he(it) kicked the bucket.

Dictionary entry from the same site:
kick
1 n
a (=act of kicking) Tritt m, Stoß m, Kick m inf
  to take a kick at sb/sth: nach jdm/etw treten
  to give sth a kick: einer Sache (dat) einen Tritt versetzen
  he gave the ball a tremendous kick: er trat mit Wucht gegen den Ball
  a tremendous kick by Beckenbauer: ein toller Schuss von Beckenbauer
  to get a kick on the leg: einen Tritt ans Bein bekommen, gegen das or ans Bein getreten werden
  what he needs is a good kick up the backside or in the pants inf: er braucht mal einen kräftigen Tritt in den Hintern inf
b inf (=thrill)
  she gets a kick out of it: es macht ihr einen Riesenspaß inf, (physically) sie verspürt einen Kitzel dabei
  to do sth for kicks: etw zum Spaß or Jux inf or Fez inf tun
  just for kicks: nur aus Jux und Tollerei inf
  how do you get your kicks?: was machen Sie zu ihrem Vergnügen?
c no pl inf (=power to stimulate) Feuer nt, Pep m inf
  this drink hasn't much kick in it: dieses Getränk ist ziemlich zahm inf
  he has plenty of kick left in him: er hat immer noch viel Pep inf
d [+of gun] Rückstoß m
2 vi
  [person] treten; (=struggle) um sich treten; [baby, while sleeping] strampeln; [animal] austreten, ausschlagen; [dancer] das Bein hochwerfen; [gun] zurückstoßen or -schlagen, Rückstoß haben inf; [engine] stottern inf
  kicking and screaming fig: unter großem Protest
  he kicked into third inf: er ging in den dritten (Gang)
3 vt
a (person, horse) [sb] treten, einen Tritt versetzen (+dat); [door] treten gegen; [football] kicken inf; [object] einen Tritt versetzen (+dat), mit dem Fuß stoßen
  to kick sb's backside: jdn in den Hintern treten
  to kick sb in the head/stomach: jdm gegen den Kopf/in den Bauch treten
  to kick sb in the teeth fig: jdn vor den Kopf stoßen inf
  to kick a goal: ein Tor schießen
  to kick one's legs in the air: die Beine in die Luft werfen
  to kick the bucket inf: abkratzen inf, ins Gras beißen inf
  I could have kicked myself inf: ich hätte mich ohrfeigen können, ich hätte mir in den Hintern beißen können inf
b inf (=stop)
  to kick heroin: vom Heroin runterkommen inf
  to kick the habit: es sich (dat) abgewöhnen

quality of machine translation : ambiguities
lexically
  The pipe was brand new. (Oil or smoke?)
structurally
  I saw the man with the telescope. (Who is holding the telescope?)
  Er erschlug den Mann mit dem Apfel. [He slew the man with the apple.] (Who is holding the apple?)
  They are flying planes.
  They are riding horses.
  They were milking cows.
  Running lights can be hazardous.
  They were inspiring musicians.
deep structure
  She got ready for the picture. (Who is taking the picture?)
semantically
  Bob wants to marry an Italian. (Any Italian woman, or is his fiancée an Italian?)
pragmatically
  When he went from the gate to the house, it collapsed. (What collapsed?)

quality of machine translation : ambiguities
The crane flew over the plain. The builder operated the crane. (crane = Kranich; Kran)
She's a curious person. (curious = neugierig; kurios)
Do you know what happened? Do you know this man? (know = wissen; kennen)

Source sentences:
• I saw the man with the telescope.
• They are riding horses.
• They are eating apples.
• They are diving people.
PT 2008:
• Ich sah den Mann mit dem Teleskop.
• Sie reiten Pferde.
• Sie sind Essäpfel.
• Sie tauchen Leute.

quality of machine translation : ambiguities
We encounter referential ambiguities when an expression (e.g. a pronoun) can refer to more than one referent (or: reference object), e.g.:
Put the paper in the printer. Then switch it on.
Schröder trat Fischer in die Waden. Der Staatsmann fand das gar nicht lustig. [Schröder kicked Fischer in the calves. The statesman did not find that funny at all.]
In den amerikanischen Nationalparks gibt es viele Plumpsklos. Dort kann man sich wunderbar entspannen. [There are many pit toilets in the American national parks. One can relax wonderfully there.]

quality of machine translation : references / reference resolution
(1) Bob put the suitcase onto the table. ??? It fell down, because it was crooked.
(2) The trolley carried the food into the car. ??? It collapsed because it was heavy.

quality of machine translation : references / reference resolution / semantic analysis of anaphora
(A) syntactic filtering (e.g. gender):
  The man is next to the table. Is he big? Is it big?
(B) semantic selectional restrictions:
  Our porter never walks without a dog. Does he carry a gun? Does he often bark?
(C) world knowledge / expectation patterns:
  Dona wanted to go to the disco, but her mother said she was too young.
  Dona wanted to go to the zoo, but her mother said she didn‘t have money for it.

quality of machine translation : references / reference resolution / semantic analysis of anaphora
(3) The processor is a new invention. It has ... (pronominal reference; here: ambiguous; in German: unambiguous because of different gender)
(4) The mayor and the headmaster .... The latter .... (unambiguous)
(5) Bob went home. The poor boy .... (NP paraphrase; here: unambiguous)
(6) Bob wants to become a pianist. He thought it was such a nice instrument. (hidden anaphora; here: unambiguous)
(7) The network broke down. It caused the loss of data. (sentence pronoun; here: unambiguous)
(8) She hit him. One has a nervous breakdown every now and then. (one-anaphora)
(9) It is raining. (pronoun without reference)
(10) Bob is ordering a pizza. Emma does the same. (pro-verbs)
(11) In 1966 ELIZA was developed. A little later ... (time adverbs)

quality of machine translation : ambiguities
• Ambiguities need to be resolved.
• Assume a sentence with 4 words where each word has 2 readings: then there are 2*2*2*2 = 16 different combinations, but only 1 is correct.

quality of machine translation : lexical hole
Definition
When a word from the source language has no corresponding word in the target language, but only a paraphrase, we speak of a lexical hole (also called a lexical gap).
Je l'ignore. (I don't know.)

quality of machine translation : lexical hole
[Diagram: English "sky" and "heaven" both correspond to German "Himmel"]

quality of machine translation : lexical hole
Eskimos (Inuit) and snow: many different forms / phrases
Samples from Kalaallisut (Greenlandic):
1. ‘sea-ice’ – siku (in plural = drift ice)
2. ‘pack-ice/large expanses of ice in motion’ – sikursuit, pl. (compacted drift ice/ice field = sikut iqimaniri)
3. ‘new ice’ – sikuliaq/sikurlaaq (solid ice cover = nutaaq)
4. ‘thin ice’ – sikuaq (in plural = thin ice floes)
5. ‘rotten (melting) ice floe’ – sikurluk
6. ‘iceberg’ – iluliaq (ilulisap itsirnga = part of iceberg below waterline)
7. ‘(piece of) fresh-water ice’ – nilak
8. ‘lumps of ice stranded on the beach’ – issinnirit, pl.
9. ‘glacier’ (also ice forming on objects) – sirmiq (sirmirsuaq = inland ice)
10. ‘snow blown in (e.g. doorway)’ – sullarniq
11. ‘rime/hoar-frost’ – qaqurnak/kanirniq/kaniq
12. ‘frost (on inner surface of e.g. window)’ – iluq
13. ‘icy mist’ – pujurak/pujuq kanirnartuq
14. ‘hail’ – nataqqurnat
15. ‘snow (on ground)’ – aput (aput sisurtuq = avalanche)
16. ‘slush (on ground)’ – aput masannartuq
17. ‘snow in air/falling’ – qaniit (qanik = snowflake)
18. ‘air thick with snow’ – nittaalaq (nittaallat, pl. = snowflakes; nittaalaq nalliuttiqattaartuq = flurries)
19. ‘hard grains of snow’ – nittaalaaqqat, pl.
20. ‘feathery clumps of falling snow’ – qanipalaat
21. ‘new fallen snow’ – apirlaat
22. ‘snow crust’ – pukak
23. ‘snowy weather’ – qannirsuq/nittaatsuq
24. ‘snowstorm’ – pirsuq/pirsirsursuaq
25. ‘large ice floe’ – iluitsuq
26. ‘snowdrift’ – apusiniq
27. ‘ice floe’ – puttaaq
28. ‘hummocked ice/pressure ridges in pack ice’ – maniillat/ingunirit, pl.
29. ‘drifting lump of ice’ – kassuq (dirty lump of glacier-calved ice = anarluk)
30. ‘ice-foot (left adhering to shore)’ – qaannuq
31. ‘icicle’ – kusugaq
32. ‘opening in sea ice’ – imarnirsaq/ammaniq (open water amidst ice = imaviaq)
33. ‘lead (navigable fissure) in sea ice’ – quppaq
34. ‘rotten snow/slush on sea’ – qinuq
35. ‘wet snow falling’ – imalik
36. ‘rotten ice with streams forming’ – aakkarniq
37. ‘snow patch (on mountain, etc.)’ – aputitaq
38. ‘wet snow on top of ice’ – putsinniq/puvvinniq
39. ‘smooth stretch of ice’ – manirak (stretch of snow-free ice = quasaliaq)
40. ‘lump of old ice frozen into new ice’ – tuaq
41. ‘new ice formed in crack in old ice’ – nutarniq
42. ‘bits of floating ice’ – naggutit, pl.
43. ‘hard snow’ – mangiggal/mangikaajaaq
44. ‘small ice floe (not large enough to stand on)’ – masaaraq
45. ‘ice swelling over partially frozen river, etc. from water seeping up to the surface’ – siirsinniq
46. ‘piled-up ice-floes frozen together’ – tiggunnirit
47. ‘mountain peak sticking up through inland ice’ – nunataq
48. ‘calved ice (from end of glacier)’ – uukkarnit
49. ‘edge of the (sea) ice’ – sinaaq

quality of machine translation : lexical hole
[Diagram: colour terms. English: green, blue, gray, brown. Welsh: gwyrdd, glas, llwyd. The boundaries between the terms do not coincide.]

[Table: Deictic expressions in 5 languages. Distance categories: near to the speaker; nearer to the speaker; near to the hearer and speaker; away from the speaker; away & close to the hearer; same distance from the speaker and hearer; away from the speaker and hearer; away from the speaker and hearer and visible; away from the speaker and hearer and invisible]

[Figure: Number in Bayso (Ethiopia) and German]

quality of machine translation : overlap
[Figure from Jurafsky & Martin 2000, 806]

quality of machine translation : structural differences
• Sam likes to swim.
• Sam schwimmt gerne.
„... a structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose“ (Arnold 1994, 110).
Arnold, Douglas [et al.] 1994: Machine Translation: An Introductory Guide, London: NCC Blackwell, available online at: http://www.essex.ac.uk/linguistics/clmt/MTbook/PostScript/

quality of machine translation : structural differences
• Multiple words for 1 word
Zeitmangel erschwert das Problem.
Word for word: Lack of time makes more difficult the problem.
Correct: Lack of time makes the problem more difficult.
MT: Time makes the problem.

quality of machine translation : structural differences
• Phrasal translation
Eine Diskussion erübrigt sich demnach.
Word for word: A discussion is made unnecessary itself therefore.
Correct: Therefore, there is no point in a discussion.
MT: A debate turned therefore.
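
The word-for-word glosses on the two preceding slides can be reproduced with a very small sketch. It is only an illustration of the direct, dictionary-lookup style of translation, not any of the MT systems named in these slides. The lexicon `TOY_LEXICON` and the function `gloss_word_for_word` are hypothetical names, and the entries are assumptions chosen to match the glosses shown above.

```python
# Toy German-to-English lexicon (illustrative assumption, not a real resource).
# Note that one source word may map to several target words.
TOY_LEXICON = {
    "zeitmangel": "lack of time",
    "erschwert": "makes more difficult",
    "das": "the",
    "problem": "problem",
    "eine": "a",
    "diskussion": "discussion",
    "erübrigt": "is made unnecessary",
    "sich": "itself",
    "demnach": "therefore",
}


def gloss_word_for_word(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate each token on its own and keep the source word order."""
    tokens = sentence.rstrip(".!?").split()
    if not tokens:
        return ""
    # Words missing from the lexicon are passed through, marked with '?'.
    glossed = [lexicon.get(tok.lower(), "?" + tok) for tok in tokens]
    result = " ".join(glossed)
    # Capitalise the first letter and restore a final full stop.
    return result[0].upper() + result[1:] + "."


if __name__ == "__main__":
    print(gloss_word_for_word("Zeitmangel erschwert das Problem.", TOY_LEXICON))
    print(gloss_word_for_word("Eine Diskussion erübrigt sich demnach.", TOY_LEXICON))
```

Because each token is translated in isolation and the German word order is kept, the output is exactly the kind of unidiomatic gloss shown on the slides. Getting from there to the correct translation requires reordering and phrase-level knowledge that a plain lexicon lookup does not have.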
• Syntactic transformations
Das ist der Sache nicht angemessen.
Literal: That is the matter not appropriate.
Correct: That is not appropriate for this matter.
MT: That is the thing is not appropriate.

Den Vorschlag lehnt die Kommission ab.
Literal: The proposal rejects the commission off.
Correct: The commission rejects the proposal.
MT: The proposal rejects the commission.

But we may discuss whether some problems are lexical or syntactic in nature:
Er heißt Sam.
His name is Sam.
Il s'appelle Sam.

The translation of linguistic features that the languages do not share is especially problematic:
Mein Zug fährt um 8.30 Uhr.
Mein Zug fährt gerade ab.
The adverbial ("gerade") has to be translated into English by using the progressive aspect of the verb:
My train leaves at 8.30.
My train is leaving.

Look at the mountains back there. (dort hinten) … over there

quality of machine translation : collocations
Collocations refer to the habitual co-occurrence of words in a specific context:
Die Butter ist ranzig/*sauer. The butter is rancid/*sour.
Die Milch ist sauer/*ranzig. The milk is sour/*rancid.
ein starker Raucher / a heavy smoker / un grand fumeur
butter is rancid BUT milk is sour
a strong smoker / a heavy smoker
fast food BUT a quick meal
a fast train BUT a quick shower

quality of machine translation : garden path sentences
1. The horse raced past the barn fell.
2. The old man the boat.
3. The cotton clothing is usually made of grows in Mississippi.
4. Until the police arrest the drug dealers control the street.
5. The man who hunts ducks out on weekends.
6. When Fred eats food gets thrown.
7. Mary gave the child the dog bit a bandaid.
8. The girl told the story cried.
9. I convinced her children are noisy.
10. The prime number few.
11. I know the words to that song about the queen don't rhyme.
12. She told me a little white lie will come back to haunt me.
13. Fat people eat accumulates.
14. The raft floated down the river sank.
15. We painted the wall with cracks.

The same sentences with the intended reading made explicit:
1. The horse (which was) raced past the barn, fell (down).
2. The old (people) man the boat.
3. The cotton (that) clothing is usually made of grows in Mississippi.
4. Until the police (make the) arrest, the drug dealers control the street.
5. The man, who hunts (animals), ducks out on weekends.
6. When Fred eats (his dinner) food gets thrown.
7. Mary gave the child (that) the dog bit a bandaid.
8. The girl (who was) told the story, cried.
9. I convinced her (that) children are noisy.
10. The prime (people) number few.
11. I know (that) the words to that song about the queen don't rhyme.
12. She told me (that) a little white lie will come back to haunt me.
13. (The) fat (that) people eat accumulates (in their bodies).
14. The raft (that was) floated down the river, sank.
15. We painted the wall (that was covered) with cracks.

quality of machine translation : garden path sentences – local vs. global ambiguity
Garden path sentences normally have local ambiguity.
Locally ambiguous: The old train...
"Train" could be a noun ("The old train left the station") or a verb ("The old train the young").
Globally ambiguous: I know more beautiful women than Julia Roberts.
This could mean "I know women more beautiful than Julia Roberts" or "I know more beautiful women than Julia Roberts does".

quality of machine translation : incorrect machine translations
What are the reasons?
(a) lexical ambiguities: pipe, pen
(b) phrasal ambiguities / compounds: riding horses (noun vs. ing-form: The beautiful riding horses are in the barn. vs. The children enjoy riding horses. BUT: They are riding horses. ???)
(c) syntactic ambiguities: ... with the telescope
(d) semantic-pragmatic-referential ambiguities
(e) opposite translations: to make invalid / to cancel (e.g. ticket) – validez (French = to make valid)
(f) ethnic differences: the Inuit differentiate (also lexically!) the various forms/types of snow
English-German: answer/reply -> antworten (verb), Antwort (noun)

quality of machine translation : incorrect machine translations (test run in the mid-1990s)
Subjunctive instead of simple past:
He could hardly believe he had made the mistake.
A: Er konnte kaum glauben, dass er den Fehler gemacht hatte.
B: Er könnte kaum glauben, dass er den Fehler gemacht hatte.

Middle constructions:
Computers sell well.
A: Computer lassen sich gut verkaufen.
B: Rechner-Verkaufs-Brunnen.

"like":
Which books do you like reading?
A: Welche Bücher liest du gerne?
B: Welche Bücher tun Ihnen wie Lesen?

I am sorry that it is raining.
A: Ich bedaure, daß es regnet.
B: Ich bin eine Verzeihung, daß es regnet.

Each character used by the printer and the terminal.
A: Jedes vom Drucker und dem Terminal verwendete Zeichen.
B: Jeder Charakter, der vom Drucker und dem Endstück benutzt wird.

Turn on the computer and follow the instructions.
A: Schalte den Computer ein und befolge die Anweisungen.
B: Drehen Sie sich auf dem Rechner und folgen Sie den Anweisungen.

Infinitival constructions:
Dem Mann verspreche ich, Karla zu helfen.
A: I promise the man to help Karla.
B: I, Karla promise the man to help.
Extraposition:
Den Wagen erlaube ich dem Mann zu kaufen.
A: I permit the man to buy the car.
B: I allow the cars to buy for the man.

Business:
Die Inspektion brachte keine äußeren Beschädigungen zum Vorschein.
A: The inspection didn't reveal any external damages.
B: inspection brought no external harm to the pre-appearance.

quality of machine translation : incorrect machine translations (enhancements 2001)
enhanced syntactic analysis
Vorsicht, das ist ein bissiger Hund.
1998: Caution, this are a vicious dog.
2001: Caution, this is a vicious dog.

Er liefert uns das, was wir bestellt haben.
1998: He delivers this what we have ordered to us.
2001: He delivers to us what we have ordered.

Dies ist genau das, was wir wollten.
1998: This is exactly this what we wanted.
2001: This is exactly what we wanted.

Er spricht über das, was er denkt.
1998: He speaks about this what he thinks.
2001: He speaks about what he thinks.

Warum hast Du nicht auf das gehört, was ich Dir gesagt habe?
1998: Why haven't you listened to this what I have told you?
2001: Why haven't you listened to what I have told you?

Nach Erklärungen des Bevollmächtigten der argentinischen Regierung kommen als Einwanderer neben Italienern, Spaniern und Franzosen auch Deutsche in Frage.
1998: After explanations of the assignee of the Argentine government Germans are also considered as immigrant next to Italians, Spaniards and Frenchmen.
2001: According to explanations of the assignee of the Argentine government Germans are also considered as immigrants next to Italians, Spaniards and Frenchmen.
use of the article
Sie ist Mutter und Hausfrau.
1998: She is mother and housewife.
2001: She is a mother and a housewife.

Er ist als Techniker beschäftigt.
1998: He is employed as technician.
2001: He works as a technician.

Wichtigste Regel bei Winterglätte ist, abrupte Brems- und Lenkmanöver zu vermeiden.
1998: It is most essential rule on icy roads to avoid abrupt braking and steering actions.
2001: It is the most important rule on icy roads to avoid abrupt braking and steering actions.

word order
Ich esse immer mageres Fleisch.
1998: I eat always lean meat.
2001: I always eat lean meat.

progressive
Er lernt jetzt Altgriechisch.
1998: He studies ancient Greekly now.
2001: He is learning classical Greek now.

Ich ziehe mich gerade an.
1998: I just get dressed.
2001: I am just getting dressed.

present -> present perfect
Seine Krankheit ärgert ihn seit Jahren.
1998: His illness annoys him for years.
2001: His illness has bothered him for years.

translations of idiomatic expressions
Das ist nur ein aus der Luft gegriffenes Gerücht.
1998: This is only a rumor gripped from the air.
2001: This is only a rumor unfounded.

Sie mussten für immer Abschied nehmen.
1998: You had to take discharge for always.
2001: They had to say goodbye for ever.
time
Es ist halb sechs.
1998: It is half six.
2001: It is half past five.

lexical ambiguities, e.g. drehen / to turn, to revolve around, to concern
Die Erde dreht sich um die Sonne.
1998: The earth concerns the sun.
2001: The earth revolves around the sun.

Die behenden Kunstreiter drehten sich in der Arena.
1998: The swift trick riders revolved in the arena.
2001: The swift trick riders turned round in the arena.

quality of machine translation : criteria in detail…lexical information
What are the challenges?
• algorithms
• lexical information
• quality of the source text
• post editing
specialist / technical dictionaries! problem: time => money

quality of machine translation : criteria in detail…lexical information (compounds, idioms, phrases,…)
examples:
Helmut Kohl -> Helmut cabbage (new: Helmut Kohl)
Pfannenstielsche Inzision -> Pan handle shear Inzision (new: Pfannenstielsche Inzision)
Taschendolmetscher -> Bag interpreter
Ich glaube mein Schwein pfeift. -> I believe my pig whistles.
Können Sie mir bitte die Tür aufhalten? Ja, natürlich. -> Please, can you hold back the door for me? Yes, natural. (new: Please can you hold back me the door? Yes, of course.)
(Would you mind opening the door for me? No, not at all.)
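The effect of a missing multi-word entry, as in the "Helmut Kohl" example above, can be sketched with a toy word-by-word translator. This is only an illustration under stated assumptions: the two lexicons, the function name `translate`, and the greedy longest-match strategy are hypothetical and do not describe how the tested MT systems worked.

```python
# Toy lexicons; every entry is a hypothetical illustration,
# not data taken from a real MT system.
WORD_LEXICON = {"kohl": "cabbage"}
PHRASE_LEXICON = {("helmut", "kohl"): "Helmut Kohl"}


def translate(tokens, phrase_lexicon=None):
    """Translate left to right; the longest matching phrase entry wins."""
    phrase_lexicon = phrase_lexicon or {}
    longest = max((len(key) for key in phrase_lexicon), default=1)
    output, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in phrase_lexicon:
                output.append(phrase_lexicon[key])
                i += n
                break
        else:
            # No phrase entry matched: fall back to single-word lookup
            # and pass unknown words through unchanged.
            output.append(WORD_LEXICON.get(tokens[i].lower(), tokens[i]))
            i += 1
    return " ".join(output)


print(translate(["Helmut", "Kohl"]))                  # word-by-word lookup only
print(translate(["Helmut", "Kohl"], PHRASE_LEXICON))  # with the multi-word entry
```

Without the phrase entry the surname is looked up as the common noun "Kohl"; with it, the name is kept as a unit. Real systems need far more than this (morphology, tokenization, context), so the sketch only shows why lexical coverage matters.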
Source text:
Unistar
Baumaterialien aus Metall; transportable Bauten aus Metall; Schienenbaumaterial aus Metall; Kabel und Drähte aus Metall (nicht für elektrische Zwecke); Schlosserwaren und Kleineisenwaren; Metallrohre; Waren aus Metall (soweit in Klasse 6 enthalten); Laufschienen und Kurven für Einschienenhängebahnen; Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke, Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges Verbindungs- und Befestigungsmaterial; Ortsbrustsicherungen; Werkzeugmaschinen; Maschinen; Motoren sowie Kupplungen und Vorrichtungen zur Kraftübertragung (soweit in Klasse 07 enthalten), einschließlich Antriebseinheiten für Einschienenhängebahnzüge;

MT output without a special glossary:
Unistar
-> University starling
Baumaterialien aus Metall;
-> building materials of metal;
transportable Bauten aus Metall;
-> Transportable buildings of metal;
Schienenbaumaterial aus Metall;
-> splinting building material of metal (new: rail building material made of metal);
Kabel und Drähte aus Metall (nicht für elektrische Zwecke);
-> Cable and wires from metal (not for electrical purposes);
Schlosserwaren und Kleineisenwaren;
-> fitter goods and little girl iron goods (new: Locksmith goods and small iron goods);
Laufschienen und Kurven für Einschienenhängebahnen;
-> guide rails and curves for one splinting hanging trains;
Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke, Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges Verbindungs- und Befestigungsmaterial;
-> Eisenbahn-Upper building material, particularly sides intersections, bufferses splinting statement devices, Turntables as well as accompanying connection and fastening material;

after adding a special glossary (domain specific dictionary):
Unistar
-> Unistar
Baumaterialien aus Metall;
-> metal building materials;
transportable Bauten aus Metall;
-> transportable buildings of metal;
Schienenbaumaterial aus Metall;
-> material of metal for railway tracks;
Kabel und Drähte aus Metall (nicht für elektrische Zwecke);
-> non-electric cables and wires of common metal;
Schlosserwaren und Kleineisenwaren;
-> ironmongery and small items of metal hardware;
Laufschienen und Kurven für Einschienenhängebahnen;
-> Running rails and curves for suspended monorail systems;
Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke, Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges Verbindungs- und Befestigungsmaterial;
-> Railway superstructure material, particularly points crossings, buffers Schienenauszugsvorrichtungen, Turntables as well as accompanying connecting and fixing elements;

quality of machine translation : quality enhancements
quality enhancements are possible by:
• constantly updating the glossaries / dictionaries / rules
• adding special / technical glossaries / dictionaries
• adding special phrases / fixed expressions / idiomatic expressions
-> adding "typical phrases" / expressions into the translation memory (machine learning of frequently used phrases)

quality of machine translation : source text
general rule: garbage in -> garbage out
Whatever a human being is not capable of writing, a machine cannot translate!
samples: e-mail, text messages,…
quality enhancements possible by:
• enhancing / correcting the input texts (spell checking, grammar checker)

quality of machine translation : post editing
(Yet another) challenge for the (human) translator!
As of today, a machine cannot perform efficient post-editing!
quality enhancements possible by:
• expanding / updating the translation memory

SMT (statistical machine translation)
A relatively new field. (Statistical machine translation was re-introduced in 1991 by researchers at IBM's Thomas J. Watson Research Center.)
An "expansion" of rule-based MT.
In SMT, translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora.
The translation of text from one human language to another by a computer that learned how to translate from vast amounts of translated texts.

The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution p(e|f) that a string e in the target language is the translation of a string f in the source language.
Future: The next step will be to exploit non-parallel corpora, i.e. use frequencies on the web.

summary: What can we expect a machine to do and how should a human translator deal with the technology?
The professional use of MT/MAT only works when (view of the MT-technology developers):
• a clear specification of the content / special vocabulary can be determined
• special/technical dictionaries are developed and added to the basic vocabulary
• tools (e.g. for testing) are constantly in use
• all components are under permanent quality control
• the glossaries, dictionaries, and the translation memory, as well as the algorithms, are permanently expanded and enhanced
• the human translator accepts the electronic medium!

Last but not least...
Language is a barrier as well as a challenge, not only for us human beings but also for the machine.
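As a concrete illustration of the SMT slides: choosing a translation according to p(e|f) is commonly done in the noisy-channel form, where Bayes' rule gives p(e|f) ∝ p(f|e) · p(e), i.e. a translation model times a language model. The slides do not state this decomposition, so the sketch below is one standard reading, not the course's method. All probabilities are invented toy values, and the names `TRANSLATION_MODEL`, `LANGUAGE_MODEL` and `best_translation` are hypothetical.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# Translation model p(f | e): how likely source f is, given target e.
TRANSLATION_MODEL = {
    ("Es ist halb sechs.", "It is half six."): 0.6,
    ("Es ist halb sechs.", "It is half past five."): 0.4,
}
# Language model p(e): how plausible e is as a target-language sentence.
LANGUAGE_MODEL = {
    "It is half six.": 0.01,
    "It is half past five.": 0.09,
}


def best_translation(f, candidates):
    """Return the candidate e that maximizes p(f|e) * p(e)."""
    def score(e):
        # Sum of logs instead of a product, to avoid numeric underflow
        # when many small probabilities are multiplied.
        return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)


choice = best_translation(
    "Es ist halb sechs.",
    ["It is half six.", "It is half past five."],
)
print(choice)
```

With these toy numbers the word-for-word candidate has the higher translation-model score, but the language model outweighs it, so the second candidate is selected. In a real system both models would be estimated from bilingual and monolingual corpora rather than written by hand.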