Natural Language Processing >> Machine Translation <<

winter / fall 2014/2015
41.4268
Prof. Dr. Bettina Harriehausen-Mühlbauer
Univ. of Applied Science, Darmstadt, Germany
www.fbi.h-da.de/~harriehausen
content
1 How did it all start?
2 The human translator
3 Phases and attempts in MT
4 Comparison: human vs. machine translation
5 Quality of MT
6 NEW: Statistical Machine Translation
NLP – Machine Translation
WS 14/15
How did it all start?
[Timeline figure: 3400 B.C. / 2000 B.C.: writing systems for graphical writing (clay plates); 196 B.C.: Rosetta Stone; 400 A.D.: Saint Hieronymus (St. Jerome); today: the century of translations]
century of translations:
• commercialisation
• exponentially growing translation volume
• globalisation
• own theory (translatology)
How did it all start?
How did we / do we translate?
• from 1390: pencil and paper
• from 1870: mechanical typewriter
• 1960: first mainframe computer
• 1970: dictation devices
• approx. 1970: terminology database on mainframe computers
• approx. 1975: electric typewriter
• approx. 1985: PCs with text processing
– terminology processing systems (approx. 1987)
– spell checker (approx. 1988)
– electronic data sources (dictionaries, thesaurus, ...) (approx. 1990)
– translation memory (approx. 1991)
– first translation tools (approx. 1995)
– voice recognition (approx. 1999)
How did it all start?
How did we / do we translate?
abc Word
DUDEN: Das Wörterbuch der medizinischen Fachausdrücke
AlphaSoft-Wörterbuch (ASW)
DUDEN Oxford Großwörterbuch
American Heritage Dictionary
Elsevier's Dictionary of ....
American Heritage Talking Dictionary
EPCollins 5 Language Multi Dictionary
AURÉLIO Dicionário eletrônico
Ernst: Wörterbuch der industriellen Technik
Berlitz Interpreter
Euroglot Compact 2.0
Berlitz Synonyms
Euroglot Compact 3.0
CD ROM Bibliothek für DOS Anwender
Collins Electronic English Dictionary & Thesaurus
Collins COBUILD CD ROM
FB WinVokabel
Collins On Line v2.2
Collins COBUILD Student's Dictionary Online
CompLEX
Concise Oxford Dictionary Electronic Edition
Concise Oxford Dictionary and Oxford Thesaurus
Computer Desktop Encyclopedia
DUDEN Band 1
DUDEN Band 5
Euroglott Professional 2.0 Macintosh
European Business WHO'S WHO 1995
Collins COBUILD E-Dict
Collins Series 100 v1.1
Euroglott Professional 1.0 - 3.0
German Business
GlobeDisk Editor
Hexaglot Quicktionary II
Knaurs elektronisches Lexikon von A bis Z
Langenscheidts Euro-Set Version 2.0
Langenscheidts Eurowörterbücher
Langenscheidts Fachwörterbücher
Langenscheidts Handwörterbücher
DUDEN Band 8
...
How did it all start?
How did we / do we translate?
terminology processing systems
AISYTERM
Term Base for Windows
CATS® 5.3
TermISys 1.0
EDIPOLE
TERM Manager V2.2e
KeyTerm
TermStar Professional 3.0
LexiGraf 2.1
TermStar Viewstation 3.0
LingTools
TermStar Workstation 3.0
MoBiDic
Term Tools
MTX v2.20
TermTracer
MultiTerm 2.0
TMS
MultiTerm '95 Plus for Windows
TransDict
MultiTerm for Windows Lite Edition
TransLexis
polyglott
TWIN (Term Base for Windows) V. 1.0
Superlex
Vocabulator 1.5
Superlex for Windows 1.0
WebTerm 2.5
SystemQuirk
WhoTerm
Termbase
How did it all start?
How did we / do we translate?
Accent Special Edition 2.0
Automatic Translation Tools
AMPAR
ARIANE
ATLAS
ATLAS/Win
CULT
Der Übersetzungsprofi
Eurobrief 3.1
EUROLANG EUREKA research project
Expertrad GerRus
EZ Japanese Writer
FB Translator 4.13
FB Translator 4.13 Profi Version
FRAP
HICATS
Globalink Power Translator für Windows 2.0
Globalink Power Translator Professional
Globalink Web-Translator 1.1
GlobeDisk Translation Assistant
Hexaglot Quicktionary II
IKARUS Translator Pro
KANT
Langenscheidts T1 Standard 3.0
Langenscheidts T1 Plus 3.0
Langenscheidts T1 Professional 3.0
Language Assistant
LMT
LOGOS Multilingual Document Translation Software
METAL
MZ Translator
MZ WIN-Translator
PC-Transer
PC Translator
PENSEE
Personal Translator PT 2008 Home
Personal Translator PT 2008 Home Französisch
Personal Translator PT 2008 Office
...
The human translator: human translation speed
facts (pages per day, by medium):
• pencil + paper (without text processing): 5-6
• electric typewriter: 5-10
• dictation: 10-20
• PC (with text processing and terminology processing): 5-20
• PC with translation memory: 5-200
• PC with full-text machine translation (without post-editing): 500-1000, „endless garbage production“
The human translator: the translator today
main task: transfer of information between different cultures
• sensitive use of the native language
• active and/or passive competence in one or more foreign languages
• text-relevant knowledge (cultural competence) and the ability to acquire such additional knowledge via additional material (e.g. online)
• efficient use of MT technology (digitizing material, text preprocessing, information research, terminology databases, online information services, ..., machine translation)
• efficient use and knowledge of work/project procedures (e.g. DIN standards)
• using / adhering to professional quality standards and guidelines
http://www.zeit.de/1991/38/schragen-mittuecken
quality of machine translation
Alltag einer Übersetzerin
Schragen mit Tücken.
Ich arbeite als Industrieübersetzerin. Ich bin Philologin, verstehe
also etwas von Syntax und unregelmäßigen Deklinationen. Im
Russischen. Aber das hilft leider auch nicht immer. Denn ich
übersetze Technik. Und die hat ihre Tücken. Da war zum Beispiel
der Text über Hängeschleifenförderer von der Textilfirma. Na
schön, wir hatten schon viele Förderer, warum nicht diesmal einen
Hängeschleifenförderer?
Zuerst will der Text verstanden sein. Also, ein
Hängeschleifenförderer ist entweder ein Förderer für
Hängeschleifen oder ein Förderer in Form einer Hängeschleife.
Oder, Variante der zweiten Lesart: in Form mehrerer
Hängeschleifen.
quality of machine translation
Result of German-English translation; PT 1998
Warum halte ich mich schließlich auch nicht an den gern erteilten Rat
technisch bewanderter Menschen auf meine Verständnisfragen: "Machen Sie
es doch nicht so kompliziert! Übersetzen Sie einfach wörtlich!!"
Weekday of a translator.
Trestle with malices.
I work as industry translator. I am philologist, well understand something of
syntax and irregular declinations. By the Russian. However, this doesn't
always help either, unfortunately. Because I translate technology. And this
one has her malices. The text was for example over hanging bow sponsors
of the textile company there. Had beautiful, we already many sponsors, why
not this time one hanging bow sponsor, for Na?
The text claims to be understood first. Well, a hanging bow sponsor is either
a sponsor for hanging bows or a sponsor in form of a hanging bow. Variant of
the second version or: In form of several hanging bows.
quality of machine translation: ambiguities
The ambiguities of this text are hidden in the compounds.
example: Hängeschleifenförderer
(a) a Förderer for Hängeschleifen
(b) a Förderer in the form of a Hängeschleife
(c) a Förderer in the form of several Hängeschleifen
Förderer = sponsor, patron, conveyor belt
Hänge- = hanging
Schleife = bow, loop, ribbon
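Before any sense can be chosen, a system first has to split the compound into its parts. The following minimal Python sketch assumes a toy lexicon containing only the three parts and senses listed above, plus a small, hypothetical set of German linking elements; it segments Hängeschleifenförderer and enumerates the word-level sense combinations (1 × 3 × 3 = 9), before the structural readings (a) to (c) are even considered.

```python
from itertools import product

# Toy lexicon: the compound parts and English senses listed on the slide.
LEXICON = {
    "hänge": ["hanging"],
    "schleife": ["bow", "loop", "ribbon"],
    "förderer": ["sponsor", "patron", "conveyor belt"],
}

# Assumption: a small, incomplete set of German linking elements (Fugenelemente).
LINKERS = ["", "n", "s", "en", "es"]


def split_compound(word):
    """Return every segmentation of `word` into lexicon entries,
    allowing a linking element after each non-final part."""
    word = word.lower()
    splits = [[word]] if word in LEXICON else []
    for end in range(1, len(word)):
        stem = word[:end]
        if stem not in LEXICON:
            continue
        for linker in LINKERS:
            rest = end + len(linker)
            # The linker must really occur here and leave a non-empty remainder.
            if word[end:rest] != linker or rest >= len(word):
                continue
            for tail in split_compound(word[rest:]):
                splits.append([stem] + tail)
    return splits


def word_level_readings(parts):
    """All sense combinations for the parts (structural readings not included)."""
    return list(product(*(LEXICON[p] for p in parts)))


parts = split_compound("Hängeschleifenförderer")[0]
readings = word_level_readings(parts)
print(parts)          # ['hänge', 'schleife', 'förderer']
print(len(readings))  # 9
```

The PT 1998 rendering „hanging bow sponsor“ is just one of these nine combinations; a real splitter would additionally need a full lexicon and a way to rank competing segmentations.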
phases and attempts in MT
phase I (1949 until the mid-1960s)
– 1949 „Translation“ memo by Warren Weaver
– 1954 Georgetown-IBM experiment
– until 1966: euphoria, many MT projects
technology
– unreliable, expensive and weak computers (hardly any memory), additional hardware necessary
– no programming experience in NLP
– linguistics: descriptive, hardly any formal basis
MT scientists
– mathematicians, electrical engineers, computer specialists
– no translators and no linguists; translation was not regarded as a science
problems
– inadequate word-for-word translation
– syntax: no basis
– semantics: no basis
phases and attempts in MT
phase II (mid-1960s until mid-1970s)
1966 ALPAC report
recommendations of the report:
– support research in computational linguistics
– improve the quality of present translation (techniques)
– no further support for MT
results
– public MT funding stopped / frozen -> stagnation (USA)
– but also: interest shifted to basic research in computational linguistics
sample MT systems:
– SYSTRAN (-> US Air Force) - start 63/64
– METAL (Univ. of Texas) - 79/84
– TAUM (Projet de Traduction Automatique de l'Université de Montréal) - start 65
– CETA (Univ. Grenoble) - start 71
– LOGOS - 64
– Susy - 67
phases and attempts in MT
phase III (mid-1970s until mid-1980s): „restoration period“
technology
– better computers (price, quality, availability, ...)
– growing memory capacity
– special software (e.g. for text processing...)
– user-friendly
results
– linguists manage MT groups (Europe)
– again: governmental support in the USA
– rapid development in Japan
– first computational-linguistic MT model
– formal syntax in MT
sample MT systems
– GETA (Grenoble) - 71
– EUROTRA (EC) - 78...
– ...
phases and attempts in MT
phase IV (since the early 1980s)
facts
– growing memory
– logic programming, AI
– computational linguistics
AI model in MT
– represent and process syntactic-functional, semantic-referential, argumentative etc. knowledge; text-specific knowledge and background knowledge / world knowledge
– going beyond sentence level (e.g. the question of reference!)
– transfer component becomes more relevant: integration of translation specialists and „translation science“
– neural networks (multiprocessors should speed up the translation process)
sample MT systems
– KBMT (Carnegie Mellon University)
– LMT (IBM) -> PT (Personal Translator) -> base for IBM's MT technology
phases and attempts in MT: types of MT systems
three models for the translation process
• model of a direct translation
– language-pair dependent, complex system, very rigid, not universally applicable
• interlingua model
– „indirect“ translation via an intermediary (formal, artificial) universal language (e.g. Esperanto)
• transfer model
– 3-stage translation process: analysis - transfer - synthesis/generation
[Figure: the three models compared; labels: universal, specific, language-pair-specific rules, „Black Box“]
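The three stages of the transfer model can be illustrated with a deliberately tiny Python sketch. The mini-lexicon, the lemma table and the single rule for „gerne“ are hypothetical toys covering only the sentence pair Sam schwimmt gerne. / Sam likes to swim. Note that analyse() and generate() are monolingual; only transfer() holds language-pair-specific knowledge.

```python
# Stage 1 (analysis): hypothetical mini-lexicon of German word forms -> (POS, lemma).
DE_LEXICON = {
    "sam": ("PROPN", "Sam"),
    "schwimmt": ("VERB", "schwimmen"),
    "gerne": ("ADV", "gerne"),
}

# Stage 2 (transfer): hypothetical German -> English lemma table.
DE_EN_LEMMAS = {"Sam": "Sam", "schwimmen": "swim"}


def analyse(sentence):
    """Shallow analysis of subject-verb(-adverb) sentences into a clause structure."""
    clause = {"subj": None, "verb": None, "adverbs": []}
    for token in sentence.rstrip(".").split():
        pos, lemma = DE_LEXICON[token.lower()]
        if pos == "PROPN":
            clause["subj"] = lemma
        elif pos == "VERB":
            clause["verb"] = lemma
        else:
            clause["adverbs"].append(lemma)
    return clause


def transfer(clause):
    """Map lemmas and apply one structural rule: 'X verb gerne' -> 'X like to verb'.
    Adverbs other than 'gerne' are ignored in this toy."""
    target = {"subj": DE_EN_LEMMAS[clause["subj"]],
              "verb": DE_EN_LEMMAS[clause["verb"]],
              "infinitive": None}
    if "gerne" in clause["adverbs"]:
        target["infinitive"] = target["verb"]  # the content verb becomes 'to swim'
        target["verb"] = "like"                # a new head verb is introduced
    return target


def generate(target):
    """Stage 3 (synthesis): naive 3rd person singular present, English SVO order."""
    words = [target["subj"], target["verb"] + "s"]
    if target["infinitive"]:
        words += ["to", target["infinitive"]]
    return " ".join(words) + "."


print(generate(transfer(analyse("Sam schwimmt gerne."))))  # Sam likes to swim.
```

Without „gerne“, the same pipeline yields Sam swims. A direct, word-by-word system would instead produce something like „Sam swims gladly.“; the structural rule in transfer() is what introduces the new head verb like.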
Comparison: human vs. machine translation
traditional (human) translation
+ perfect results, even for idiomatic, stylistically demanding texts
+ cultural, text-specific and pragmatic variants can be considered during the translation process
- quality is expensive in terms of time and cost!
Comparison: human vs. machine translation
machine translation
+ universally available
+ fast and inexpensive
+ revolution of modern look-up possibilities (online spell checking, monolingual dictionaries/thesauri, encyclopedias, bilingual dictionaries, www, ...): a wealth of information at the click of a mouse!
+ consistency of terminology
- varying translation quality (high demands on the post-editing work of the human translator)
let's be honest...
...when someone stumbles over a translation, it's only because it's a bad one: we never recognize good translations as being translations (let alone „machine translations“).
Use bottom cushion for floatation -> Benutzen Sie das untere Kissen für das Flottmachen
(back-translated: „Use the lower cushion to speed up.“)
The generation of clear/unambiguous texts is a serious art!
• errors in documentation lead to follow-up errors (vicious circle)
• errors can be tiresome but also deadly... in all cases they have to be avoided!
Samples of ambiguities
• „Kürzlich erst hatte sie den Drucker eingestellt.“
• a) Jetzt kündigte er schon wieder.
• b) Jetzt war er schon wieder defekt.
• It was only recently that she had hired/adjusted the printer.
• a) Now he was already quitting again.
• b) Now it was already defective again.
Comparison of MT software
Newspaper article
Graffiti, with its raw and defiant nature, still
has the ability to raise eyebrows, but as
street art becomes mainstream and the
quality of the work increases, the debate on
its legitimacy as an art form is finally
beginning to settle. (Graffiti – Art or
Vandalism? - DER OBSIDIAN)
Human translation:
Graffiti kann in seiner rauen und trotzigen
Art immer noch Stirnrunzeln verursachen,
aber indem Street Art sich etabliert und die
Qualität der Arbeit besser wird, hört
schließlich auch die Debatte auf, ob Graffiti
überhaupt eine Kunstform ist.
Linguatec:
Graffiti hat mit seiner rohen und
herausfordernden Natur noch die Fähigkeit,
Augenbrauen anzuheben, aber als Straße
wird Kunst konventionell, und die Qualität
der Arbeit nimmt zu, die Debatte über seine
Rechtmäßigkeit als eine Kunstform beginnt
schließlich, sich zu legen.
PROMT:
Graffiti, mit seiner rohen und aufsässigen
Natur, sind noch in der Lage, Augenbrauen
zu erheben, aber weil Straßenkunst
Hauptströmung und die Qualität der
Arbeitszunahmen, der Debatte über seine
Gesetzmäßigkeit wird, weil eine Kunstform
schließlich beginnt sich niederzulassen.
SYSTRAN:
Graffiti, mit seiner rohen und aufsässigen
Natur, haben noch die Fähigkeit,
Augenbrauen hochzuziehen, aber, während
Straßenkunst Mainstream wird und die
Qualität der Arbeit sich erhöht, die Debatte
auf seiner Legitimität, wie eine Kunstform
schließlich anfängt zu vereinbaren.
TRIDENT:
Aufschriften an Wänden, mit seiniger
feuchter und aufreizender Natur, all noch hat
Fähigkeit abliefert Augenbrauen, aber als
Straßekunst hinstellt hauptsächliche
Richtung und Qualität des Arbeitens
ansteigt, Debatten an seiniger Gesetzlichkeit
als Kunst bilden endgültig anfängt, zu
bereinigen.
lingenio:
Graffiti hat mit ihrer rohen und
herausfordernden Natur noch die Fähigkeit,
Augenbrauen anzuheben, aber, wie
Straßenkunst konventionell wird und die
Qualität der Arbeit zunimmt, beginnt die
Debatte über ihre Rechtmäßigkeit als eine
Kunstform schließlich, sich zu legen.
SkyCode:
Wandschmierereien, mit sein Sämtlich und
trotzig Vehement, dennoch hat die
Befähigung hochhebt die Augenbrauen,
aber wie Straßeskunst wird zu Mainstream
und die Farbqualität von die Arbeit steigt, die
Debatte auf seine Legitimität als eine
Vorlagesform ist ein endlich aufbrechen
entscheidet.
criteria that influence the quality of machine translation
• algorithms (MT engine)
• lexical information (glossaries...)
• quality of the source text (e.g. e-mail!)
• post-editing
quality of the source text (e.g. e-mail!)
Fallebeschreibung
Tobias und Informatiker und entwickelt Sofware für Roboter. Tobias
arbeitet an einem Projekt der von Juliane koordiniert ist . Durch seine
Intersse an seinem Abeitsgebiet lernt er James kennen, ein anderer
Informatiker der roboter entwichkelt. Der Ideenaustausch zwischen
Tobias und James Funktioniert hervorragen, und beide dank der Hilfe
von Juliane arbeiten intensiv zusammen
Falling description
Tobias and computer scientist and software develops for robots. Tobias
works this one is coordinated by Juliane at a project. He gets to know interest
in its field of work for James, another computer scientist of the robots develops
through his. The exchange of ideas between Tobias and James works jut
out, and both cooperate intensively thanks to the help of Juliane
criteria that influence the quality of machine translation
What are the current challenges?
source language: words – syntax – semantics <-> target language: semantics – syntax – words
(easy (?)) examples
current quality of free online MT
http://www.worldlingo.com/products_services/worldlingo_translator.html
Original (German):
Der Hund jagt die Katze.
Translation (English):
The dog hunts the cat.
Back-Translation (German):
Der Hund jagt die Katze.
Original (German):
Der Hund jagt die Katze.
Translation (Farsi):
‫ مندلسون‬hund jagt katze ‫بميرد‬.
Back-Translation (German):
Mendelssohn hund jagt katze die.
Original (German):
Der Hund jagt die Katze.
Translation (French):
Le chien chasse le chat.
Back-Translation (German):
Der Hund verjagt die Katze.
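Comparing an original with its back-translation can be approximated mechanically. A minimal Python sketch, assuming whitespace tokenisation and Jaccard overlap of word unigrams and bigrams (a crude similarity chosen for illustration, not an established MT metric such as BLEU), applied to the three round trips above:

```python
def tokens(sentence):
    """Lower-cased whitespace tokens without sentence-final punctuation."""
    return sentence.lower().rstrip(".!?").split()


def ngrams(words, n):
    """Set of word n-grams, as tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def round_trip_scores(original, back_translation):
    """Unigram and bigram overlap between a sentence and its back-translation."""
    o, b = tokens(original), tokens(back_translation)
    return jaccard(ngrams(o, 1), ngrams(b, 1)), jaccard(ngrams(o, 2), ngrams(b, 2))


original = "Der Hund jagt die Katze."
back_translations = {
    "English": "Der Hund jagt die Katze.",
    "French": "Der Hund verjagt die Katze.",
    "Farsi": "Mendelssohn hund jagt katze die.",
}
for pivot, back in back_translations.items():
    uni, bi = round_trip_scores(original, back)
    print(f"{pivot}: unigram {uni:.2f}, bigram {bi:.2f}")
```

Unigram overlap rates the French and Farsi round trips equally (0.67 each), while bigram overlap separates them (0.33 vs. 0.14) because it is sensitive to word order. A perfect round trip (English: 1.00) still does not prove that the forward translation was correct, and a changed back-translation (jagt -> verjagt) may stem from the backward step alone.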
quality of machine translation
„tricky“ examples: idiomatic expressions
Some idiomatic expressions can be translated directly (word by word) into other languages:
Busenfreund
bosom friend
Others cannot:
If Bill kicks the bucket, his children will be rich.
*Wenn Bill den Eimer tritt... (???)
Wenn Bill den Löffel abgibt...(!!!)
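A common remedy is to look up multi-word expressions in a phrase lexicon before falling back to word-by-word translation. The Python sketch below is a minimal illustration under stated assumptions: both tables are hypothetical, the single idiom entry follows the dictionary sense to kick the bucket = ins Gras beißen, and matching is greedy longest-match on exact word forms.

```python
# Hypothetical phrase table: multi-word idioms, matched before single words.
PHRASES = {
    ("kicked", "the", "bucket"): "biss ins Gras",
}
# Hypothetical word-by-word dictionary (the "direct" fallback).
WORDS = {"he": "er", "kicked": "trat", "the": "den", "bucket": "Eimer"}
LONGEST_PHRASE = max(len(p) for p in PHRASES)


def translate(sentence, use_phrases=True):
    words = sentence.lower().rstrip(".").split()
    out, i = [], 0
    while i < len(words):
        # Greedy longest match: try the longest phrase starting at position i first.
        for n in range(min(LONGEST_PHRASE, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if use_phrases and key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No idiom matched: word-by-word lookup (unknown words are copied).
            out.append(WORDS.get(words[i], words[i]))
            i += 1
    return " ".join(out) + "."


print(translate("he kicked the bucket.", use_phrases=False))  # er trat den Eimer.
print(translate("he kicked the bucket."))                     # er biss ins Gras.
```

The sketch neither lemmatises (kicks would not match kicked) nor reorders (a German subordinate clause needs Wenn Bill den Löffel abgibt), and it always prefers the idiomatic reading, even where a literal bucket is meant.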
Original (German):
Du hast wohl nicht alle Tassen im
Schrank !
Translation (English):
You do not have probably all cups in
the cabinet!
Back-Translation (German):
Sie haben nicht vermutlich alle
Schalen im Schrank!
Translation (Farsi):
‫!شما هم که احتماال در كابينه جامهاي‬
Back-Translation (German):
Im diesem Schrank Sie vermutlich
Schalen.
Translation (French): Tu n'as pas
probablement toutes les tasses
dans le coffret !
Back-Translation (German):
Du hast nicht wahrscheinlich alle
Tassen im Kasten!
current quality of free online MT
http://www.foreignword.com/tools/transnow.htm
Free online translation service
with Reverso translation solutions
Original (English): he kicked the bucket.
Translation (German): er kickte den Eimer.
Back-translation (English): he (it) kicked the bucket.
kick
1 n
a (=act of kicking) Tritt m , Stoß m , Kick m inf
to take a kick at sb/sth nach jdm/etw treten
to give sth a kick einer Sache ( dat ) einen Tritt versetzen
he gave the ball a tremendous kick er trat mit Wucht gegen den Ball
a tremendous kick by Beckenbauer ein toller Schuss von Beckenbauer
to get a kick on the leg einen Tritt ans Bein bekommen, gegen das or
ans Bein getreten werden
what he needs is a good kick up the backside or in the pants inf er braucht mal einen
kräftigen Tritt in den Hintern inf
b inf
(=thrill)
she gets a kick out of it es macht ihr einen Riesenspaß inf , (physically) sie verspürt
einen Kitzel dabei
to do sth for kicks etw zum Spaß or Jux inf or Fez inf tun
just for kicks nur aus Jux und Tollerei inf
how do you get your kicks? was machen Sie zu ihrem Vergnügen?
c no pl
inf (=power to stimulate) Feuer nt , Pep m inf
this drink hasn't much kick in it dieses Getränk ist ziemlich zahm inf
he has plenty of kick left in him er hat immer noch viel Pep inf
d [+of gun] Rückstoß m
2 vi
[person] treten
(=struggle) um sich treten
[baby, while sleeping] strampeln
[animal] austreten, ausschlagen
[dancer] das Bein hochwerfen
[gun] zurückstoßen or -schlagen, Rückstoß haben
inf
[engine] stottern inf
kicking and screaming fig unter großem Protest
he kicked into third inf er ging in den dritten (Gang)
3 vt
a (person, horse) [sb] treten, einen Tritt versetzen ( +dat )
[door] treten gegen
[football] kicken inf
[object] einen Tritt versetzen ( +dat ), mit dem Fuß stoßen
to kick sb's backside jdn in den Hintern treten
to kick sb in the head/stomach jdm gegen den Kopf/in den Bauch treten
to kick sb in the teeth fig jdn vor den Kopf stoßen inf
to kick a goal ein Tor schießen
to kick one's legs in the air die Beine in die Luft werfen
to kick the bucket inf abkratzen inf , ins Gras beißen inf
I could have kicked myself inf ich hätte mich ohrfeigen können, ich hätte mir in
den Hintern beißen können inf
b inf
(=stop)
to kick heroin vom Heroin runterkommen inf
to kick the habit es sich ( dat ) abgewöhnen
quality of machine translation: ambiguities
lexically
The pipe was brand new. (Oil or smoke?)
structurally
I saw the man with the telescope. (Who is holding the telescope?)
Er erschlug den Mann mit dem Apfel. („He killed the man with the apple.“ Who is holding the apple?)
They are flying planes.
They are riding horses.
They were milking cows.
Running lights can be hazardous.
They were inspiring musicians.
deep structure
She got ready for the picture. (Who is taking the picture?)
semantically
Bob wants to marry an Italian. (Any Italian woman, or is his fiancée an Italian?)
pragmatically
When he went from the gate to the house, it collapsed. (What collapsed?)
quality of machine translation: ambiguities
The crane flew over the plain.
The builder operated the crane.
(crane = Kranich; Kran)
She's a curious person.
(curious = neugierig; kurios)
Do you know what happened?
Do you know this man?
(know = wissen; kennen)
PT 2008:
• I saw the man with the telescope. -> Ich sah den Mann mit dem Teleskop.
• They are riding horses. -> Sie reiten Pferde.
• They are eating apples. -> Sie sind Essäpfel.
• They are diving people. -> Sie tauchen Leute.
quality of machine translation: ambiguities
We encounter referential ambiguities when an expression can refer to more than one referent (or: reference object), e.g.:
• Put the paper in the printer. Then switch it on.
• Schröder trat Fischer in die Waden. Der Staatsmann fand das gar nicht lustig. (Schröder kicked Fischer in the calves. The statesman did not find that funny at all.)
• In den amerikanischen Nationalparks gibt es viele Plumpsklos. Dort kann man sich wunderbar entspannen. (There are many pit latrines in the American national parks. There one can relax wonderfully.)
quality of machine translation: references / reference resolution
(1) Bob put the suitcase onto the table. It fell down, because it was crooked. (???)
(2) The trolley carried the food into the car. It collapsed because it was heavy. (???)
quality of machine translation: references / reference resolution / semantic analysis of anaphora
(A) syntactic filtering (e.g. gender):
The man is next to the table.
Is he big?
Is it big?
(B) semantic selectional restrictions:
Our porter never walks without a dog.
Does he carry a gun?
Does he often bark?
(C) world knowledge / expectation patterns:
Dona wanted to go to the disco, but her mother said she was too young.
Dona wanted to go to the zoo, but her mother said she didn't have money for it.
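Filters (A) and (B) can be expressed as two successive list filters over the candidate antecedents. A minimal Python sketch; the entity features (which pronouns agree with each antecedent, and a coarse semantic class) and the predicate restrictions are hypothetical, hand-coded assumptions for the example sentences above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    pronouns: frozenset  # pronouns that can agree with this antecedent
    kind: str            # coarse semantic class used by selectional restrictions


# Hypothetical selectional restrictions: subject classes each predicate accepts.
SUBJECT_KINDS = {
    "be big": {"human", "dog", "object"},
    "carry a gun": {"human"},
    "bark": {"dog"},
}


def resolve(pronoun, predicate, antecedents):
    """Return the names of antecedents that survive filters (A) and (B)."""
    # (A) syntactic filtering: the pronoun must agree with the antecedent
    agreeing = [e for e in antecedents if pronoun in e.pronouns]
    # (B) selectional restriction of the predicate on its subject
    return [e.name for e in agreeing if e.kind in SUBJECT_KINDS[predicate]]


man = Entity("man", frozenset({"he"}), "human")
table = Entity("table", frozenset({"it"}), "object")
porter = Entity("porter", frozenset({"he", "she"}), "human")
dog = Entity("dog", frozenset({"he", "she", "it"}), "dog")

print(resolve("he", "be big", [man, table]))        # ['man']
print(resolve("it", "be big", [man, table]))        # ['table']
print(resolve("he", "carry a gun", [porter, dog]))  # ['porter']
print(resolve("he", "bark", [porter, dog]))         # ['dog']
```

For the porter sentences, filter (A) alone keeps both porter and dog, since English he can refer to a dog; only (B) decides. Filter (C) is not covered: deciding who is too young in the Dona sentences needs world knowledge that no simple feature check provides.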
quality of machine translation: references / reference resolution / semantic analysis of anaphora
(3) The processor is a new invention. It has ... (pronominal reference; here: ambiguous; in German: unambiguous because of different gender)
(4) The mayor and the headmaster.... The latter.... (unambiguous)
(5) Bob went home. The poor boy.... (NP paraphrase; here: unambiguous)
(6) Bob wants to become a pianist. He thought it was such a nice instrument. (hidden anaphora; here: unambiguous)
(7) The network broke down. It caused the loss of data. (sentence pronoun; here: unambiguous)
(8) She hit him. One has a nervous breakdown every now and then. (one-anaphora)
(9) It is raining. (pronoun without reference)
(10) Bob is ordering a pizza. Emma does the same. (pro-verbs)
(11) In 1966 ELIZA was developed. A little later... (time adverbs)
quality of machine translation: ambiguities
• Ambiguities need to be resolved.
• Assume a sentence has 4 words and each word has 2 readings: then there are 2*2*2*2 = 16 different combinations, but only 1 is correct.
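The arithmetic generalises: the number of combinations is the product of the number of readings per word, so it grows exponentially with sentence length (2^4 = 16 here, 2^10 = 1024 for ten such words). A minimal Python sketch, using the printer sentence from the earlier ambiguity slide and a hypothetical selectional restriction as a filter:

```python
from itertools import product
from math import prod


def count_combinations(readings_per_word):
    """Number of reading combinations = product of the per-word reading counts."""
    return prod(readings_per_word)


# Senses of the two ambiguous words in "Kürzlich erst hatte sie den Drucker eingestellt."
SENSES = {
    "Drucker": ["printing device", "print worker"],
    "eingestellt": ["adjusted", "hired"],
}
# Hypothetical selectional restriction: the object kind each verb sense accepts.
VERB_OBJECT = {"adjusted": "printing device", "hired": "print worker"}

all_combos = list(product(*SENSES.values()))
plausible = [(obj, verb) for obj, verb in all_combos if VERB_OBJECT[verb] == obj]

print(count_combinations([2, 2, 2, 2]))  # 16
print(len(all_combos), len(plausible))   # 4 2
```

The filter removes the two incoherent combinations, but it cannot choose between the remaining two; that needs the following sentence (er kündigte vs. er war defekt).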
quality of machine translation: lexical hole
Definition
When a word from the source language doesn't have a corresponding word in the target language,
but only a paraphrase, we speak of a lexical hole.
Je l'ignore. (I don't know.)
quality of machine translation: lexical hole
English: sky, heaven -> German: Himmel (one German word for two English words)
quality of machine translation: lexical hole
Eskimos (Inuit) and snow: many different forms / phrases
Samples from Kalaallisut (Greenlandic):
1.‘sea-ice’ — siku (in plural = drift ice)
2.‘pack-ice/large expanses of ice in motion’ — sikursuit, pl. (compacted drift
ice/ice field = sikut iqimaniri)
3.‘new ice’ — sikuliaq/sikurlaaq (solid ice cover = nutaaq)
4.‘thin ice’ — sikuaq (in plural = thin ice floes)
5.‘rotten (melting) ice floe’ — sikurluk
6.‘iceberg’ — iluliaq (ilulisap itsirnga = part of iceberg below waterline)
7.‘(piece of) fresh-water ice’ — nilak
8.‘lumps of ice stranded on the beach’ — issinnirit, pl.
9.‘glacier’ (also ice forming on objects) — sirmiq (sirmirsuaq = inland ice)
10.‘snow blown in (e.g. doorway)’ — sullarniq
11.‘rime/hoar-frost’ — qaqurnak/kanirniq/kaniq
12.‘frost (on inner surface of e.g. window)’ — iluq
13. ‘icy mist’ — pujurak/pujuq kanirnartuq
14. ‘hail’ — nataqqurnat
15. ‘snow (on ground)’ — aput (aput sisurtuq = avalanche)
16. ‘slush (on ground)’ — aput masannartuq
17. ‘snow in air/falling’ — qaniit (qanik = snowflake)
18. ‘air thick with snow’ — nittaalaq (nittaallat, pl. = snowflakes; nittaalaq
nalliuttiqattaartuq = flurries)
19. ‘hard grains of snow’ — nittaalaaqqat, pl.
20. ‘feathery clumps of falling snow’ — qanipalaat
21. ‘new fallen snow’ — apirlaat
22. ‘snow crust’ — pukak
23. ‘snowy weather’ — qannirsuq/nittaatsuq
24. ‘snowstorm’ — pirsuq/pirsirsursuaq
25. ‘large ice floe’ — iluitsuq
26. ‘snowdrift’ — apusiniq
27. ‘ice floe’ — puttaaq
28. ‘hummocked ice/pressure ridges in pack ice’ — maniillat/ingunirit, pl.
29. ‘drifting lump of ice’ — kassuq (dirty lump of glacier-calved ice = anarluk)
30. ‘ice-foot (left adhering to shore)’ — qaannuq
31. ‘icicle’ — kusugaq
32. ‘opening in sea ice’ — imarnirsaq/ammaniq (open water amidst ice = imaviaq)
33. ‘lead (navigable fissure) in sea ice’ — quppaq
34. ‘rotten snow/slush on sea’ — qinuq
35. ‘wet snow falling’ — imalik
36. ‘rotten ice with streams forming’ — aakkarniq
37. ‘snow patch (on mountain, etc.)’ — aputitaq
38. ‘wet snow on top of ice’ — putsinniq/puvvinniq
39. ‘smooth stretch of ice’ — manirak (stretch of snow-free ice = quasaliaq)
40. ‘lump of old ice frozen into new ice’ — tuaq
41. ‘new ice formed in crack in old ice’ — nutarniq
42. ‘bits of floating ice’ — naggutit, pl.
43. ‘hard snow’ — mangiggal/mangikaajaaq
44. ‘small ice floe (not large enough to stand on)’ — masaaraq
45. ‘ice swelling over partially frozen river, etc. from water seeping up to the
surface’ — siirsinniq
46. ‘piled-up ice-floes frozen together’ — tiggunnirit
47. ‘mountain peak sticking up through inland ice’ — nunataq
48. ‘calved ice (from end of glacier)’ — uukkarnit
49. ‘edge of the (sea) ice’ — sinaaq
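quality of machine translation: lexical hole
[Figure: English vs. Welsh colour terms, whose boundaries do not coincide: Welsh gwyrdd covers part of English green; glas covers part of green, blue and part of gray; llwyd covers part of gray and brown]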
Deictic expressions in 5 languages
[Table: deictic distinctions by position relative to speaker and hearer: near to the speaker; nearer to the speaker; near to the hearer and speaker; away from the speaker; away from the speaker and close to the hearer; same distance from the speaker and hearer; away from the speaker and hearer; away from the speaker and hearer and visible; away from the speaker and hearer and invisible]
Number in Bayso (Ethiopia) and German
quality of machine translation: overlap
Jurafsky & Martin 2000, 806
quality of machine translation: structural differences
• Sam likes to swim.
• Sam schwimmt gerne.
„... a structural mismatch occurs where two languages use the
same construction for different purposes, or use different
constructions for what appears to be the same purpose“
(Arnold 1994, 110).
Arnold, Douglas [et al.] 1994: Machine Translation: An Introductory Guide, London: NCC Blackwell, online at: http://www.essex.ac.uk/linguistics/clmt/MTbook/PostScript/
quality of machine translation :
structural differences
• Multiple words for one word
Zeitmangel erschwert das Problem.
Literal: Lack of time makes more difficult the problem.
Correct: Lack of time makes the problem more difficult.
MT: Time makes the problem.
quality of machine translation :
structural differences
• Phrasal translation
Eine Diskussion erübrigt sich demnach.
Literal: A discussion is made unnecessary itself therefore.
Correct: Therefore, there is no point in a discussion.
MT: A debate turned therefore.
quality of machine translation :
structural differences
• Syntactic transformations
Das ist der Sache nicht angemessen.
Literal: That is the matter not appropriate.
Correct: That is not appropriate for this matter.
MT: That is the thing is not appropriate.
Den Vorschlag lehnt die Kommission ab.
Literal: The proposal rejects the commission off.
Correct: The commission rejects the proposal.
MT: The proposal rejects the commission.
quality of machine translation :
structural differences
However, it is debatable whether some of these problems are lexical rather than syntactic:
Er heißt Sam.
His name is Sam.
Il s'appelle Sam.
quality of machine translation :
structural differences
Translating linguistic features that the two languages do not share is especially problematic:
Mein Zug fährt um 8.30 Uhr.
Mein Zug fährt gerade ab.
The adverb „gerade“ has to be rendered in English by the progressive aspect of the verb:
My train leaves at 8.30.
My train is leaving.
quality of machine translation :
structural differences
Look at the mountains back there. (literal rendering of German „dort hinten“; idiomatic English: over there)
quality of machine translation :
collocations
Collocations are words that habitually occur together in a specific context:
Die Butter ist ranzig/*sauer.
The butter is rancid/*sour.
Die Milch ist sauer/*ranzig.
The milk is sour/*rancid.
ein starker Raucher
a heavy smoker
un grand fumeur
butter is rancid BUT milk is sour
*a strong smoker / a heavy smoker
fast food BUT a quick meal
a fast train BUT a quick shower
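One way to picture how an MT system can avoid calques such as "a strong smoker" for „ein starker Raucher“ is a transfer lexicon keyed on adjective-noun pairs rather than on single words. The sketch below is a minimal illustration under stated assumptions, not how any particular system works: the table entries, the fallback dictionary and the function name `translate_adj` are hypothetical, and German adjectives are assumed to arrive already lemmatised (stark, not starker).

```python
# Hypothetical collocation table: (German adjective lemma, English noun) -> English adjective.
COLLOCATIONS = {
    ("ranzig", "butter"): "rancid",
    ("sauer", "milk"): "sour",
    ("stark", "smoker"): "heavy",
}

# Hypothetical word-for-word fallback, used when no collocation entry matches.
DEFAULT_ADJ = {"ranzig": "rancid", "sauer": "sour", "stark": "strong"}


def translate_adj(adj_de: str, noun_en: str) -> str:
    """Choose the English adjective, preferring a collocation entry for this noun."""
    return COLLOCATIONS.get((adj_de, noun_en), DEFAULT_ADJ[adj_de])


print(translate_adj("stark", "smoker"), translate_adj("stark", "coffee"))
```

With this table, stark + smoker yields "heavy", while a pair without an entry, such as stark + coffee, falls back to the word-for-word "strong".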
quality of machine translation :
garden path sentences
1. The horse raced past the barn fell.
2. The old man the boat.
3. The cotton clothing is usually made of grows in Mississippi.
4. Until the police arrest the drug dealers control the street.
5. The man who hunts ducks out on weekends.
6. When Fred eats food gets thrown.
7. Mary gave the child the dog bit a bandaid.
8. The girl told the story cried.
9. I convinced her children are noisy.
10. The prime number few.
11. I know the words to that song about the queen don't rhyme.
12. She told me a little white lie will come back to haunt me.
13. Fat people eat accumulates.
14. The raft floated down the river sank.
15. We painted the wall with cracks.
quality of machine translation :
garden path sentences
1. The horse (which was) raced past the barn, fell (down).
2. The old (people) man the boat.
3. The cotton (that) clothing is usually made of grows in Mississippi.
4. Until the police (make the) arrest, the drug dealers control the street.
5. The man, who hunts (animals), ducks out on weekends.
6. When Fred eats (his dinner) food gets thrown.
7. Mary gave the child (that) the dog bit a bandaid.
8. The girl (who was) told the story, cried.
9. I convinced her (that) children are noisy.
10. The prime (people) number few.
11. I know (that) the words to that song about the queen don't rhyme.
12. She told me (that) a little white lie will come back to haunt me.
13. (The) fat (that) people eat accumulates (in their bodies).
14. The raft (that was) floated down the river, sank.
15. We painted the wall (that was covered) with cracks.
quality of machine translation :
garden path sentences – local vs. global ambiguity
Garden-path sentences are normally locally ambiguous.
Locally ambiguous: The old train...
"Train" could be a noun ("The old train left the station") or a verb
("The old train the young").
Globally ambiguous: I know more beautiful women than Julia Roberts.
This could mean "I know women more beautiful than Julia Roberts"
or "I know more beautiful women than Julia Roberts does".
quality of machine translation :
incorrect machine translations
What are the reasons ?
(a) lexical ambiguities: pipe, pen
(b) phrasal ambiguities / compounds: riding horses (noun compound vs. -ing form: The beautiful riding horses are in the barn. vs. The children enjoy riding horses. BUT: They are riding horses. ???)
(c) syntactic ambiguities ...with the telescope
(d) semantic-pragmatic-referential ambiguities
(e) opposite translations: English "to cancel" (e.g. a ticket, i.e. to make it invalid) vs. French "validez" (= to make valid) for the same action
(f) cultural differences: the Inuit differentiate (also lexically!) the various forms/types of snow
Engl-German: answer/reply -> antworten (verb), Antwort (noun)
quality of machine translation :
incorrect machine translations (test run in the mid-1990s)
Subjunctive instead of simple past
He could hardly believe he had made the mistake.
A: Er konnte kaum glauben, dass er den Fehler gemacht hatte.
B: Er könnte kaum glauben, dass er den Fehler gemacht hatte.
Medial constructions:
Computers sell well.
A: Computer lassen sich gut verkaufen.
B: Rechner-Verkaufs-Brunnen.
„like“:
Which books do you like reading?
A: Welche Bücher liest du gerne?
B: Welche Bücher tun Ihnen wie Lesen?
quality of machine translation :
incorrect machine translations (test run in the mid-1990s)
I am sorry that it is raining.
A: Ich bedaure, daß es regnet.
B: Ich bin eine Verzeihung, daß es regnet.
Each character used by the printer and the terminal.
A: Jedes vom Drucker und dem Terminal verwendete Zeichen.
B: Jeder Charakter, der vom Drucker und dem Endstück benutzt wird.
Turn on the computer and follow the instructions.
A: Schalte den Computer ein und befolge die Anweisungen.
B: Drehen Sie sich auf dem Rechner und folgen Sie den Anweisungen.
quality of machine translation :
incorrect machine translations (test run in the mid-1990s)
Infinitival constructions
Dem Mann verspreche ich, Karla zu helfen.
A: I promise the man to help Karla.
B: I, Karla promise the man to help.
Extraposition:
Den Wagen erlaube ich dem Mann zu kaufen.
A: I permit the man to buy the car.
B: I allow the cars to buy for the man.
Business:
Die Inspektion brachte keine äußeren Beschädigungen zum Vorschein.
A: The inspection didn‘t reveal any external damages.
B: inspection brought no external harm to the pre-appearance.
quality of machine translation :
incorrect machine translations (enhancements 2001)
enhanced syntactic analysis
Vorsicht, das ist ein bissiger Hund.
1998: Caution, this are a vicious dog.
2001: Caution, this is a vicious dog.
Er liefert uns das, was wir bestellt haben.
1998: He delivers this what we have ordered to us.
2001: He delivers to us what we have ordered.
Dies ist genau das, was wir wollten.
1998: This is exactly this what we wanted.
2001: This is exactly what we wanted.
quality of machine translation :
incorrect machine translations (enhancements 2001)
Er spricht über das, was er denkt.
1998: He speaks about this what he thinks.
2001: He speaks about what he thinks.
Warum hast Du nicht auf das gehört, was ich Dir gesagt habe?
1998: Why haven't you listened to this what I have told you?
2001: Why haven't you listened to what I have told you?
quality of machine translation :
incorrect machine translations (enhancements 2001)
Nach Erklärungen des Bevollmächtigten der argentinischen Regierung
kommen als Einwanderer neben Italienern, Spaniern und Franzosen auch
Deutsche in Frage.
1998: After explanations of the assignee of the Argentine government
Germans are also considered as immigrant next to Italians, Spaniards and
Frenchmen.
2001: According to explanations of the assignee of the Argentine
government Germans are also considered as immigrants next to Italians,
Spaniards and Frenchmen.
quality of machine translation :
incorrect machine translations (enhancements 2001)
use of the article
Sie ist Mutter und Hausfrau.
1998: She is mother and housewife.
2001: She is a mother and a housewife.
Er ist als Techniker beschäftigt.
1998: He is employed as technician.
2001: He works as a technician.
quality of machine translation :
incorrect machine translations (enhancements 2001)
use of the article
Wichtigste Regel bei Winterglätte ist, abrupte Brems- und
Lenkmanöver zu vermeiden.
1998: It is most essential rule on icy roads to avoid abrupt braking
and steering actions.
2001: It is the most important rule on icy roads to avoid abrupt
braking and steering actions.
quality of machine translation :
incorrect machine translations (enhancements 2001)
word order
Ich esse immer mageres Fleisch.
1998: I eat always lean meat.
2001: I always eat lean meat.
progressive
Er lernt jetzt Altgriechisch.
1998: He studies ancient Greekly now.
2001: He is learning classical Greek now.
quality of machine translation :
incorrect machine translations (enhancements 2001)
Ich ziehe mich gerade an.
1998: I just get dressed.
2001: I am just getting dressed.
present -> present perfect
Seine Krankheit ärgert ihn seit Jahren.
1998: His illness annoys him for years.
2001: His illness has bothered him for years.
quality of machine translation :
incorrect machine translations (enhancements 2001)
translations of idiomatic expressions
Das ist nur ein aus der Luft gegriffenes Gerücht.
1998: This is only a rumor gripped from the air.
2001: This is only a rumor unfounded.
Sie mussten für immer Abschied nehmen.
1998: You had to take discharge for always.
2001: They had to say goodbye for ever.
quality of machine translation :
incorrect machine translations (enhancements 2001)
time
Es ist halb sechs.
1998: It is half six.
2001: It is half past five.
lexical ambiguities, e.g. drehen: to turn, to revolve (around), to concern
Die Erde dreht sich um die Sonne.
1998: The earth concerns the sun.
2001: The earth revolves around the sun.
Die behenden Kunstreiter drehten sich in der Arena.
1998: The swift trick riders revolved in the arena.
2001: The swift trick riders turned round in the arena.
quality of machine translation :
criteria in detail…lexical information
What are the challenges?
algorithms
lexical information
quality of the source text
post editing
specialist / technical dictionaries!
problem: time => money
quality of machine translation :
criteria in detail…lexical information (compounds, idioms, phrases,…)
examples:
Helmut Kohl -> Helmut cabbage (new: Helmut Kohl)
Pfannenstielsche Inzision -> Pan handle shear Inzision (new: Pfannenstielsche Inzision)
Taschendolmetscher -> Bag interpreter
Ich glaube mein Schwein pfeift. -> I believe my pig whistles.
Können Sie mir bitte die Tür aufhalten? Ja, natürlich. -> Please, can you hold back the door for me? Yes, natural.
new: Please can you hold back me the door? Yes, of course.
(Would you mind opening the door for me? No, not at all.)
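The "new" outputs above (Helmut Kohl instead of Helmut cabbage) suggest that multiword and proper-name entries are consulted before the word-by-word dictionary. A minimal sketch of such a greedy longest-match lookup, assuming already tokenised input; the dictionaries, their contents and the function `translate` are hypothetical and only cover the examples on this slide.

```python
# Hypothetical general dictionary, looked up word by word (keys in lower case).
GENERAL = {"helmut": "Helmut", "kohl": "cabbage", "taschendolmetscher": "bag interpreter"}

# Hypothetical multiword / proper-name entries, tried before single words.
PHRASES = {("helmut", "kohl"): "Helmut Kohl"}


def translate(tokens, phrases=PHRASES, general=GENERAL):
    """Greedy longest-match lookup: multiword entries first, then single words."""
    longest = max((len(key) for key in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible multiword entry first, down to two words.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            key = tuple(tok.lower() for tok in tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:
            # No multiword match: translate this token alone, or keep it unchanged.
            out.append(general.get(tokens[i].lower(), tokens[i]))
            i += 1
    return " ".join(out)


print(translate(["Helmut", "Kohl"]))
print(translate(["Helmut", "Kohl"], phrases={}))
```

Passing `phrases={}` reproduces the old word-by-word behaviour ("Helmut cabbage"); with the phrase entry the name is kept intact.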
quality of machine translation :
criteria in detail…lexical information (compounds, idioms, phrases,…)
Unistar
Baumaterialien aus Metall; transportable Bauten aus Metall;
Schienenbaumaterial aus Metall; Kabel und Drähte aus Metall (nicht
für elektrische Zwecke); Schlosserwaren und Kleineisenwaren;
Metallrohre; Waren aus Metall (soweit in Klassen 6 enthalten);
Laufschienen und Kurven für Einschienenhängebahnen; Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke,
Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges
Verbindungs- und Befestigungsmaterial;
Ortsbrustsicherungen; Werkzeugmaschinen; Maschinen; Motoren
sowie Kupplungen und Vorrichtungen zur Kraftübertragung (soweit in
Klasse 07 enthalten), einschließlich Antriebseinheiten für
Einschienenhängebahnzüge;
quality of machine translation :
criteria in detail…lexical information (compounds, idioms, phrases,…)
Unistar -> University starling
Baumaterialien aus Metall; -> building materials of metal;
transportable Bauten aus Metall; -> Transportable buildings of metal;
Schienenbaumaterial aus Metall; -> splinting building material of metal (new: rail building material made of metal);
Kabel und Drähte aus Metall (nicht für elektrische Zwecke); -> Cable and wires from metal (not for electrical purposes);
Schlosserwaren und Kleineisenwaren; -> fitter goods and little girl iron goods (new: Locksmith goods and small iron goods);
Laufschienen und Kurven für Einschienenhängebahnen; -> guide rails and curves for one splinting hanging trains;
Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke, Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges Verbindungs- und Befestigungsmaterial; -> Eisenbahn-Upper building material, particularly sides intersections, bufferses splinting statement devices, Turntables as well as accompanying connection and fastening material;
quality of machine translation :
criteria in detail…lexical information (compounds, idioms, phrases,…)
after adding a special glossary (domain specific dictionary):
Unistar -> Unistar
Baumaterialien aus Metall; -> metal building materials;
transportable Bauten aus Metall; -> transportable buildings of metal;
Schienenbaumaterial aus Metall; -> material of metal for railway tracks;
Kabel und Drähte aus Metall (nicht für elektrische Zwecke); -> non-electric cables and wires of common metal;
Schlosserwaren und Kleineisenwaren; -> ironmongery and small items of metal hardware;
Laufschienen und Kurven für Einschienenhängebahnen; -> Running rails and curves for suspended monorail systems;
Eisenbahn-Oberbaumaterial, insbesondere Weichen, Kreuzungen, Prellböcke, Schienenauszugsvorrichtungen, Drehscheiben sowie zugehöriges Verbindungs- und Befestigungsmaterial; -> Railway superstructure material, particularly points crossings, buffers Schienenauszugsvorrichtungen, Turntables as well as accompanying connecting and fixing elements;
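The effect of the special glossary can be pictured as a second lexicon layered on top of the general one: domain entries win, everything else falls through to the general dictionary, and terms found in neither are passed through untranslated (as happened with Schienenauszugsvorrichtungen above). A minimal sketch using Python's `collections.ChainMap`; the entries are taken from the outputs above, but splitting them per term and the helper `lookup` are illustrative assumptions.

```python
from collections import ChainMap

# Illustrative general-dictionary entries (from the output without glossary).
general_dictionary = {
    "Schienenbaumaterial": "splinting building material",
    "Laufschienen": "guide rails",
}

# Illustrative domain glossary entries (from the output with glossary).
domain_glossary = {
    "Schienenbaumaterial": "material of metal for railway tracks",
    "Laufschienen": "running rails",
}

# ChainMap searches its mappings left to right, so glossary entries override general ones.
lexicon = ChainMap(domain_glossary, general_dictionary)


def lookup(term: str) -> str:
    """Translate a term; unknown terms are passed through untranslated."""
    return lexicon.get(term, term)


print(lookup("Laufschienen"))
print(lookup("Schienenauszugsvorrichtungen"))
```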
quality of machine translation :
quality enhancements
Quality can be improved by:
• constantly updating the glossaries / dictionaries / rules
• adding special / technical glossaries / dictionaries
• adding special phrases / fixed expressions / idiomatic expressions
-> adding „typical phrases“ / expressions to the translation memory (machine learning of frequently used phrases)
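A translation memory stores previously translated segment pairs and, for a new segment, retrieves the most similar stored one (a "fuzzy match"). A minimal sketch using `difflib.SequenceMatcher` from the Python standard library as the similarity measure; the memory contents, the 0.75 threshold and the helpers `tm_lookup` / `tm_add` are assumptions for illustration, not how any particular product scores matches.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> approved translation.
MEMORY = {
    "Schalte den Computer ein und befolge die Anweisungen.":
        "Turn on the computer and follow the instructions.",
    "Es ist halb sechs.": "It is half past five.",
}


def tm_lookup(segment, memory=MEMORY, threshold=0.75):
    """Return (stored source, translation, similarity) for the best fuzzy match, or None."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


def tm_add(segment, translation, memory=MEMORY):
    """Store a (post-edited) translation so that it can be reused later."""
    memory[segment] = translation


print(tm_lookup("Es ist halb sieben."))
```

A lookup for "Es ist halb sieben." returns the stored pair for "Es ist halb sechs." with a similarity below 1.0, signalling that the suggestion still needs post-editing; `tm_add` is how the corrected result would be fed back.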
quality of machine translation :
source text
algorithms
lexical information
quality of the source
post editing
general rule: garbage in -> garbage out
Whatever a human being is not capable of writing, a machine cannot translate!
examples: e-mails, text messages, …
Quality can be improved by:
• enhancing / correcting the input texts (spell checker, grammar checker)
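Pre-editing the source text can be partly automated. A minimal sketch of a spell-checking pass that replaces unknown words by the closest entry of a vocabulary, using `difflib.get_close_matches` from the Python standard library; the vocabulary, the 0.8 cutoff, the whitespace tokenisation and the function `correct` are simplifying assumptions (a real checker would use a full lexicon plus grammar rules).

```python
from difflib import get_close_matches

# Hypothetical vocabulary of correctly spelled source-language words.
VOCABULARY = ["Schalte", "den", "Computer", "ein", "und", "befolge", "die", "Anweisungen"]


def correct(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace unknown words by their closest vocabulary entry, if it is similar enough."""
    corrected = []
    for word in text.split():
        if word in vocabulary:
            corrected.append(word)
        else:
            matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
            corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(correct("Schalte den Computr ein"))
```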
quality of machine translation :
post editing
algorithms
lexical information
quality of the source
post editing
(Yet another) challenge for the (human) translator!
As of today, a machine cannot perform efficient post-editing!
Quality can be improved by:
• expanding / updating the translation memory
SMT (statistical machine translation)
A relatively new field. (Statistical machine translation was re-introduced in 1991 by
researchers at IBM's Thomas J. Watson Research Center.)
An „expansion“ of rule-based MT.
In SMT, translations are generated on the basis of statistical models
whose parameters are derived from the analysis of bilingual text corpora.
In short, SMT is the translation of text from one human language to another by a computer that has learned how to translate from vast amounts of translated texts.
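How parameters can be "derived from the analysis of bilingual text corpora" is illustrated here with IBM Model 1, the simplest of the IBM word-alignment models; it is named as one concrete example, not necessarily what this slide has in mind. The sketch estimates word translation probabilities t(f | e) by expectation maximisation on a tiny, hypothetical sentence-aligned corpus, and omits the NULL word of the full model for brevity.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1, no NULL word)."""
    f_vocab = {f for f_sent, _ in corpus for f in f_sent}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for f_sent, e_sent in corpus:
            for f in f_sent:
                # E-step: share one occurrence of f among all words e of the paired sentence.
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: re-estimate t(f | e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy German-English corpus (hypothetical): pairs of (source words, target words).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = ibm_model1(corpus)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

Although no sentence pair says which word translates which, the co-occurrence statistics across pairs push t(Haus | house) above t(das | house), because "das" is already explained by "the" elsewhere in the corpus.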
SMT (statistical machine translation)
The idea behind statistical machine translation comes from information
theory. A document is translated according to the probability distribution
p(e|f) that a string e in the target language is the translation of a string
f in the source language.
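Written out with Bayes' rule, this gives the noisy-channel formulation used in the IBM approach: because p(f) is the same for every candidate e, the best translation ê maximises the product of a translation model p(f | e) and a target-language model p(e).

```latex
\hat{e} = \arg\max_{e} p(e \mid f)
        = \arg\max_{e} \frac{p(f \mid e)\, p(e)}{p(f)}
        = \arg\max_{e} p(f \mid e)\, p(e)
```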
Future: The next step will be to exploit non-parallel corpora, i.e. use
frequencies on the web.
summary:
What can we expect a machine to do and how should a
human translator deal with the technology?
The professional use of MT/MAT only works when:
(view of the MT-technology developers)
• a clear specification of the content / special vocabulary can be
determined
• special/technical dictionaries are being developed and added to the
basic vocabulary
• tools (e.g. for testing) are constantly in use
• all components are permanently quality-controlled
• the glossaries, dictionaries, and translation memory, as well as the algorithms, are continuously expanded and improved
• the human translator accepts the electronic medium!
Last but not least...
Language is a barrier as well as a challenge –
not only for us human beings but also for the machine