ŠIAULIAI UNIVERSITY
FACULTY OF HUMANITIES
DEPARTMENT OF ENGLISH PHILOLOGY
PROBLEMS AND ISSUES IN MACHINE TRANSLATION: THE CASE
OF TRANSLATION FROM ENGLISH TO LITHUANIAN
BACHELOR THESIS
Research adviser: Assist. Lolita Petrulionė
Student: Viktorija Stalmačenkaitė
Šiauliai, 2013
CONTENTS
INTRODUCTION
1. AN OVERVIEW ON MACHINE TRANSLATION
1.1. General definitions
1.2. Machine translation process
2. PROBLEMS OCCURRING IN MT
2.1. Linguistic mistakes
2.1.1. Grammatical mistakes
2.1.2. Lexical mistakes
2.2. Systemic mistakes
3. TRANSLATING TEXTS OF DIFFERENT REGISTERS
4. THE ANALYSIS OF TEXTS OF DIFFERENT REGISTERS
4.1. The methodology of the research
4.2. Analysis of technical set of instructions
4.3. Analysis of popular non-fiction text
4.4. Analysis of belles-lettres style
4.5. Analysis of newspaper article
4.6. Analysis of the official document
4.7. Statistical analysis of data
CONCLUSIONS
REFERENCES
WEBSITES
DICTIONARIES
SOURCES
ANNEX 1
INTRODUCTION
The need to translate between languages arose together with the first civilizations.
According to the Book of Genesis, “the whole earth had one language and the same words”
(Genesis 11:1). Everything changed when people decided to build an enormous tower that
would reach heaven. Deffinbaugh (2004) asserts that God then realized that, if humans
succeeded, it would lead to “arrogant self-confidence and independence of God,” so he
decided to destroy people’s plans: “Come, let us go down, and confuse their language there,
so that they will not understand one another’s speech” (Genesis 11:7). From this point on,
people were bewildered by the variety of languages. In the course of time humans perceived
that the ability to understand and use other languages is crucial in society, which led to
the emergence of translation.
Shapa (2009) distinguishes four formal types of translations:
1. Oral translation (interpreting).
2. Written translation.
3. Computer-assisted translation.
4. Machine translation.
Each type of translation requires solid knowledge from the person who is rendering the text.
To save the time and energy of human translators, the idea of machines that could translate
large amounts of text has been implemented. However, the products of a human translator and
of a machine can differ greatly. For this reason, machine translation has been investigated
more and more thoroughly in translation studies.
The aim of the current paper is to discuss problems and issues of machine translation in
texts of various genres. The following objectives have been set to reach the aim:
1. To analyze scholarly literature on the notion, process, problems and issues of
machine translation.
2. To present briefly the notion of different text genres in the English language.
3. To perform machine translation and compare the output with the texts rendered by a
human translator.
4. To highlight the most crucial mistakes in each text and evaluate the quality of the texts.
Methods employed in this study are as follows:
1. The method of meta-analysis helped to review the conclusions made by other
authors about the problems and issues of machine translation and the theory related to
different genres in the English language.
2. The sampling method was used to select and classify the examples of mistakes
found in the output of machine translation.
3. The contrastive method made it possible to investigate texts in different languages in
order to highlight the differences and issues of machine translation compared with
human translation.
4. The statistical method helped to systematize and generalize the empirical data of the
present research.
The scope of the paper is 49 examples selected from texts of different genres. Each
example consists of 3 segments: the source text segment and two target segments. The total
number of words under analysis is 1,577. Most sources were taken from the Internet; one
work of fiction and the instruction manual of an alarm clock were also used. A more detailed
description of all sources and of the process of analysis is given in chapter 4.1.
As regards the practical value of the research paper, very little research has been
done on machine translation from English to Lithuanian. Therefore, other students carrying
out research on this phenomenon will be able to use the data collected in this paper.
The current bachelor thesis consists of the following parts: introduction, theoretical part,
practical part (which includes the methodology of the research and statistical analysis),
conclusions, and a list of references and sources. The introduction presents the aim of the
study as well as the objectives, the scope, the materials, the methods used in this work, and
the practical value of the present paper. The theoretical part covers the theory of machine
translation, i.e. the general definitions, the process, and the findings on machine
translation already discussed by other researchers. It also covers the theory related to
different genres of the English language. The practical part consists of the collected
examples and their explanations, as well as the methodology of the study and statistical
analysis.
1. AN OVERVIEW ON MACHINE TRANSLATION
This chapter briefly discusses the concept of machine translation, its process and
approaches. This section also presents findings concerning problems and issues of machine
translation made by other researchers.
1.1. General definitions
Daudaravičius (2006:7) suggests that the terms automatic, computer and machine
translation are often used without drawing any distinction between them and that the choice
of term is usually determined by the context. However, Hutchins and Somers (1992:3)
state that “Machine Translation is traditional and standard name for computerised systems
responsible for the production of translations from one natural language into another, with or
without human assistance.” According to the authors, such terms as automatic translation
and mechanical translation are extremely rare in English. What is more, all authors agree that
the definitions mentioned above do not include computer-based translation tools, i.e. tools
that support translators by giving them access to various online dictionaries or terminology
databases (Hutchins and Somers 1992:3, Daudaravičius 2006:7). In MT proper, the target text
(TT) is rendered by employing pre-established algorithms and logical rules (Daudaravičius
2006:7). Thus we can conclude that it would be incorrect to state that machine translation is
a fully automatic process, because human intervention is necessary to some extent, e.g. to
install the rules followed by the machine. One more fact supporting this idea is that it is
still practically impossible to obtain a high-quality machine translation product. Therefore,
the output of almost every machine system is post-edited by humans (Hutchins and Somers
1992:3). Nevertheless, the idea behind machine translation is that of a fully automatic
translation process. Therefore, it should not be confused with Machine-Aided Human
Translation (MAHT) and Human-Aided Machine Translation (HAMT); the boundaries between
these two are often uncertain, and the term Computer-Aided Translation (CAT) can cover both
(Ibid.). Throughout the present paper the term machine translation (MT) refers to the
process of applying a machine system to render the selected texts.
As mentioned previously, MT denotes any computerized process of translating
texts, but the systems themselves can be of different kinds. Hutchins and Somers (1992:4)
point out two types of MT systems: 1) bilingual – a system designed for only one particular
pair of languages and 2) multilingual – a system designed for more than two languages. The
latter may be either uni-directional or bi-directional (Robin 2009). However, in most cases they
tend to be bi-directional, which means they can translate texts from any provided language
into any other given language and vice versa (Ibid.).
The definitions given in this chapter describe MT systems in general, but it is important to
distinguish which forms are most often used in practice. This aspect has been discussed by
Manion (2009:6), who distinguishes the following types: 1) MT for dissemination – this form
is usually employed by corporations which publish much of their documentation, and the TT
must be of publishable quality. The texts are edited by human translators afterwards, which
saves them time; 2) MT for assimilation – most often used on the Internet, for it
provides information in real time, but the TT will not always be intelligible. The aim of this
form is to give a general meaning of the source text (ST); 3) MT for communication – this
form is very similar to the previous one, because it also has to perform in real time
and is also found on the Internet. However, the difference is that MT for communication deals
with the language of conversation, so it is more appropriate for emails
and chat rooms (Ibid.). The system employed in this research is of the second type, for it
provides information in real time, but its output is not always comprehensible. The other
types are not suitable here: the system cannot cope with the large amounts of text that
corporations usually translate, and the many colloquial words, abbreviations, etc. typical
of chat rooms are also a big obstacle for the current MT system.
1.2. Machine translation process
Even though the translation process consists of many tasks, two main actions can be
pointed out, i.e. analysing the meaning of the ST and transferring the encoded meaning into
the TT. However, the MT process is somewhat more complex and mostly hinges on the approach
used in a particular system.
The classical MT structure is presented by Jurafsky and Martin (2006:10). The authors
present 3 approaches usually applied in MT and briefly describe them. First they discuss
direct translation and say that in this approach “we proceed word-by-word through the source
language text, translating each word as we go” (Ibid.). An extensive bilingual dictionary is
employed in this process, where each entry is like a small programme responsible
for translating one word (Ibid.). Next, the transfer approach is described. Jurafsky and
Martin (2006:10) suggest that in this case we first perform a grammatical analysis of the ST
and then transform the result into the target language (TL) parse structure by applying
various rules. The next step, according to the authors, is building the TL sentence from this
grammatical structure. Finally, Jurafsky and Martin (2006:10) point out the so-called
interlingua approach. The essence of this approach is that we analyze the ST and put it into
an abstract meaning representation called an interlingua. The following stage is to create
the TT “from this interlingual representation” (Ibid.).
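The direct approach can be illustrated with a minimal sketch. The lexicon below is a toy invented for this illustration (it is not taken from Jurafsky and Martin or from any real MT system), and the function name direct_translate is likewise hypothetical. Each entry maps one English word to one Lithuanian dictionary form, and nothing else is analysed.

```python
# Toy English-to-Lithuanian lexicon; the entries are illustrative assumptions,
# not data from a real MT system.
TOY_LEXICON = {
    "tom": "Tomas",
    "reads": "skaito",
    "a": "",          # Lithuanian has no articles, so they map to nothing
    "the": "",
    "book": "knyga",  # nominative form only: the lexicon knows no cases
}


def direct_translate(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate word by word, leaving unknown words unchanged."""
    output = []
    for word in sentence.split():
        target = lexicon.get(word.lower(), word)
        if target:  # skip words that map to the empty string
            output.append(target)
    return " ".join(output)


print(direct_translate("Tom reads a book", TOY_LEXICON))
```

Under these assumptions the sketch prints "Tomas skaito knyga", whereas a human translator would write "Tomas skaito knygą": the object stays in the nominative because the direct approach performs no analysis of sentence structure.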
These approaches and processes are illustrated in the well-known Vauquois triangle
presented in Figure 1 below.
Figure 1. The Vauquois triangle adopted from Jurafsky and Martin (2006:11)
The triangle illustrates the knowledge required in the different types of analysis. It is
evident that the direct approach utilizes the least translation knowledge, because
words in this approach are simply rendered one by one as we go along. As we
move up the triangle, the process of translation gets more and more complex, because a deeper
analysis of the ST and greater effort to express the meaning in the TT are required. The ideal
of the rendering process is an interlingua, i.e. “a scheme capable of representing all meaning
expressible in any language in language-independent form” (Gerber 2009).
A slightly different and less complex scheme of the MT process is presented by Robin (2010)
(see Figure 2).
Figure 2. A typical MT process adopted from Robin (2010)
Compared with Figure 1, Figure 2 can be said to show the interlingua approach, for
it involves various types of analysis which help to transfer the
meaning of the ST into the TT as closely as possible. To achieve a decent product,
deformatting and reformatting are utilized. Deformatting identifies portions of the ST which
do not require translation, while reformatting puts those non-translated portions back
into the TT (Robin 2010). Pre-editing means segmenting long sentences into shorter ones,
fixing punctuation and separating untranslatable portions. Meanwhile, post-editing brings
the TT up to the required standard (Ibid.).
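The deformatting and reformatting steps described above can be sketched as follows. This is only an illustration under stated assumptions: the choice of URLs and numbers as the portions that need no translation, the placeholder syntax, and the function names deformat and reformat are all invented here and are not taken from Robin (2010) or from any particular MT system.

```python
import re

# Portions assumed not to need translation: URLs and numbers (an assumption
# made for this sketch; a real system would have its own rules).
NON_TRANSLATABLE = re.compile(r"https?://\S+|\d+(?:[.,]\d+)*")


def deformat(text: str) -> tuple[str, list[str]]:
    """Replace non-translatable portions with numbered placeholders."""
    kept: list[str] = []

    def mask(match: re.Match) -> str:
        kept.append(match.group(0))
        return f"<<{len(kept) - 1}>>"

    return NON_TRANSLATABLE.sub(mask, text), kept


def reformat(text: str, kept: list[str]) -> str:
    """Put the stored portions back in place of their placeholders."""
    return re.sub(r"<<(\d+)>>", lambda m: kept[int(m.group(1))], text)


source = "Set the alarm to 7.30 and see http://example.com/help"
masked, kept = deformat(source)
translated = masked.upper()  # stand-in for the actual translation step
print(reformat(translated, kept))
```

Upper-casing stands in for the translation engine here, which makes it easy to see that the time and the address pass through untouched while the rest of the segment is processed.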
Both figures convey the idea that MT is an elaborate process if one seeks to produce
a satisfactory translation product. The machine has to perform a number of tasks to analyse
the ST properly and to generate a TT whose meaning is as close as possible
to that of the ST.
2. PROBLEMS OCCURRING IN MT
Many authors have discussed and classified the problems one faces while using MT
systems. For instance, Hutchins and Somers (1992:81-96) distinguish 5 categories:
1) morphology problems; 2) lexical ambiguity; 3) structural ambiguity; 4) anaphora¹
resolution and 5) quantifier scope ambiguity.
However, Arnold et al. (1994:105) discuss only 3 reasons why MT is an elaborate process:
1) ambiguity; 2) structural and lexical
differences between languages, and 3) multiword units².
Riedel and Schwarze (2001), cited by Petkevičiūtė and Tamulynas (2011:38), provide
one more classification of translation issues. The authors divide rendering problems into
8 groups: 1) polysemy³; 2) homonymy⁴; 3) syntactic ambiguity; 4) referential ambiguity; 5)
indefinite errors⁵; 6) synonyms; 7) metaphors and symbols, and 8) neologisms.
It is evident that all authors highlight more or less the same problems; only the depth of
the analysis of these problems differs. Petkevičiūtė and Tamulynas (2011:39) conclude that
the most adequate classification is given by Hutchins and Somers (1992:81-96), for it is
thorough and covers all spheres of translation issues.
¹ Anaphora – use of a grammatical substitute (as a pronoun or a pro-verb) to refer to the denotation of a preceding word or group of words. Merriam-Webster Online. [Online] Available from: http://www.merriam-webster.com/dictionary/anaphora [Accessed on 10th October 2012].
² Authors mean idioms and collocations.
³ Polysemy – having multiple meanings. Merriam-Webster Online. [Online] Available from: http://www.merriam-webster.com/dictionary/polysemous [Accessed on 10th October 2012].
⁴ Monosemy – the property of having only one meaning. Oxford Dictionaries [Online] Available from: http://oxforddictionaries.com/definition/english/monosemy [Accessed on 10th October 2012].
⁵ By indefinite errors the authors mean terms, sayings, and unclear words.
The present research is based on the classification proposed by Petkevičiūtė and
Tamulynas (2011:39). They distinguish 2 types of possible MT mistakes: linguistic and
systemic. Linguistic mistakes are further subdivided into morphological (grammatical) and
lexical issues, whereas systemic problems do not have any subgroups (Ibid.). Morphological,
lexical, and systemic problems are discussed in more detail in the
following chapters.
2.1. Linguistic mistakes
This chapter presents the classification proposed by two Lithuanian
authors, Petkevičiūtė and Tamulynas (2011:39). This classification includes the most common
mistakes in the MT process when translating English texts into Lithuanian. This section
discusses the grammatical and lexical mistakes singled out by these authors.
2.1.1. Grammatical mistakes
Petkevičiūtė and Tamulynas (2011:39-40) point out 7 main grammatical mistakes
evident in the process of MT, which are as follows:
• Grammatical case;
• Verb forms;
• Number;
• Verb conjugation;
• Gender;
• Parts of speech;
• Negative verbs.
According to the authors, mistakes related to grammatical case are among the most
common. The MT system usually translates words in the nominative case. This issue is
especially noticeable in complex sentences where the system has to translate words which are
linked together but separated by other words. In such cases the MT system is usually
able to identify the case of the first word correctly, but the second word is rendered in the
nominative or some other random case (Petkevičiūtė and Tamulynas 2011:39). This problem
occurs because case works differently in Lithuanian and in English. In the Lithuanian
language cases are marked by adding different inflections to the word, whereas in the English
language word order is the main tool for determining the grammatical case (Valeika and
Buitkienė 2003:49). If the MT system does not take these peculiarities of both languages into
account, it will most likely have problems in translating the ST.
Another common mistake in MT concerns verb forms. Usually the MT
system uses the infinitive or the third-person form of the present tense. These forms are
selected because the system cannot use any other text comprehension knowledge, i.e. semantics
and pragmatics, and this makes it difficult to assess which form is more appropriate
(Petkevičiūtė and Tamulynas 2011:39). What is more, in some cases the preceding words and
the grammatical structures do not indicate which forms should be employed (Ibid.). The
authors state that this problem is quite often met when the MT system has to translate
derivative verb forms.
Further, Petkevičiūtė and Tamulynas (2011:39-40) discuss issues related to
grammatical number. The authors indicate that the incorrect usage of number is usually
determined by the preceding pronoun, e.g. the English pronoun you can mean either tu or jūs
in Lithuanian. Considering the fact mentioned in the preceding paragraph, that MT systems are
unable to use text comprehension knowledge, it is hard for them to identify which translation
is more appropriate. Consequently, if the pronoun is rendered incorrectly, the words following
the pronoun will be translated incorrectly as well. The authors also note that in some cases
the wrong grammatical number is used even though the pronoun is translated properly
(Petkevičiūtė and Tamulynas 2011:40). However, this mistake could be attributed to the group
of systemic mistakes, because it is hard to explain why this happens.
The fourth mistake presented by Petkevičiūtė and Tamulynas (2011:40) is verb
conjugation. This mistake occurs for reasons similar to those mentioned in the preceding
paragraph. The authors point out that conjugations also depend on the pronouns which precede
them. Consequently, the incorrect translation of the pronoun leads to an incorrect
verb conjugation (Ibid.).
The next important issue is the usage of gender. Petkevičiūtė and Tamulynas (2011:40)
point out that “if there are no clear attributes of gender (like pronouns she, he) MT system
renders word in masculine gender.” This is because the masculine gender is the unmarked
member in the English language. The same can be said about the Lithuanian language. However,
the difference is that in the English language only words denoting persons (and some animals)
are marked as masculine or feminine, and words denoting non-persons have neuter gender. The
Lithuanian language, on the other hand, marks both groups of words, persons and objects, as
masculine or feminine (Valeika and Buitkienė 2003:54). These differences can cause
mismatches in the TT if the MT system is not programmed to identify the gender of such
non-person words and make the rest of the text agree with it.
One more common and grave mistake is the usage of the incorrect part of speech. It is a
severe error, because the wrong part of speech can completely change the meaning of the
ST sentence. We encounter this problem because some words in the English language can be
a noun, an adjective, an adverb, or a verb; for this reason the semantic information of the
sentence is fundamental here (Petkevičiūtė and Tamulynas 2011:40). Usually, if the word is
preceded by the particle to, it is taken to be a verb; if the particle is absent, we assume
the word to be a noun or an adjective. However, the MT system does not always follow these
rules (Ibid.).
The last grammatical mistake presented by Petkevičiūtė and Tamulynas (2011:40) is the
translation of negative verbs, i.e. sentences where a verb is preceded by a negative adverb
(mostly never) and both words must be translated in a negative form in the TT, e.g. “Tom never
asks Ann’s permission” must be translated as “Tomas niekada neprašo Anos leidimo”. Even though
this mistake is quite rare in the MT process, the authors highlight that it may occur when
one or more extra words intervene between the negative adverb and the verb. In that case the
two words are not linked together and are rendered separately (Ibid.).
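The linking of the negative adverb and the verb can be sketched as a small rule. The sketch below is a hypothetical illustration, not the method of any existing system: it assumes that the sentence has already been translated word by word, that each word carries a part-of-speech tag from some earlier analysis step, and that the verb arrives in its affirmative form. The function name and the tag labels are invented for this example.

```python
NEGATIVE_ADVERBS = {"niekada"}  # Lithuanian "never"; a deliberately tiny set


def apply_double_negation(tokens: list[tuple[str, str]]) -> list[str]:
    """Prefix the first verb after a negative adverb with "ne".

    `tokens` holds (word, part_of_speech) pairs; the verb is assumed to be
    in its affirmative form, so the prefix is always added.
    """
    result = []
    pending_negation = False
    for word, pos in tokens:
        if word.lower() in NEGATIVE_ADVERBS:
            pending_negation = True
        elif pos == "VERB" and pending_negation:
            word = "ne" + word
            pending_negation = False
        result.append(word)
    return result


tokens = [("Tomas", "NOUN"), ("niekada", "ADV"), ("prašo", "VERB"),
          ("Anos", "NOUN"), ("leidimo", "NOUN")]
print(" ".join(apply_double_negation(tokens)))
```

Because the rule remembers the pending negation until the next verb, an extra word between the adverb and the verb (for instance the adverb tikrai, inserted purely for illustration) does not break the link.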
Summing up the topic of grammatical mistakes in the MT process, it is evident that
grammar is essential to MT. Grammatical rules embedded in the MT system help to achieve a
high-quality TT. Systems should be programmed to deal with each grammatical aspect
separately in order to render the text as precisely as possible. However, owing to certain
shortcomings, such as the inability to use text comprehension knowledge, some rules are
skipped, which eventually lowers the quality of the TT.
2.1.2. Lexical mistakes
Lexis is as important in the MT process as grammar, which is why it is necessary to discuss
the lexical mistakes occurring in the translation process. Petkevičiūtė and Tamulynas
(2011:40) classify lexical mistakes in the following way:
• Sayings;
• Polysemy;
• Words which were not translated;
• Abbreviations;
• Pronouns;
• Phraseological units/collocations;
• Hyphenated words;
• Proper names.
Firstly, one of the most severe and common mistakes is related to polysemous words.
The Lithuanian language, just like any other language in the world, has many such words.
Polysemy is especially evident in the English language, which has many
polysemous words that can only be deciphered from the co-text, e.g. fast may be an
adjective, a verb, an adverb, or a noun. As a result, the MT system does not always
choose the right meaning for the context, which causes a change or a total loss
of the meaning of the sentence. MT systems usually use large corpora to check how often a
word is used and in which contexts it usually appears (Petkevičiūtė and
Tamulynas 2011:40). However, we may assume that if a word is placed in an unusual
text, the quality of translation will decrease. Cvilikaitė (2008:30) notes that this
problem may also occur if the system was programmed to translate only general texts and
no specialised corpora were included in it. The author also indicates that many mistakes
occur with polysemous words which end in –ed and –ing (Ibid.). Cvilikaitė
(2008:31) points out that a human translator usually changes the part of speech to avoid
word-by-word translation. An MT system, however, is not able to perform such a transformation
and translates the word by its primary meaning (Cvilikaitė 2008:32). The issues
discussed above confirm that it is important to install high-quality dictionaries and corpora
in MT systems in order to achieve the best results in the translation process.
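The corpus-based choice between meanings can be sketched as follows. The counts below are invented for illustration and do not come from any real corpus; the three Lithuanian renderings of fast (greitas, greitai, pasninkas) and the function name choose_sense are likewise assumptions of this sketch, which only shows the general idea of scoring each rendering by the contexts in which it has been seen.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each Lithuanian rendering of
# "fast" was seen near a given English context word in an imagined corpus.
SENSE_COUNTS = {
    "greitas": Counter({"car": 40, "train": 25, "runner": 10}),    # adjective
    "greitai": Counter({"run": 30, "drive": 20, "learn": 15}),     # adverb
    "pasninkas": Counter({"religious": 12, "break": 8, "day": 5}),  # noun
}


def choose_sense(context: list[str], counts: dict[str, Counter]) -> str:
    """Pick the rendering whose corpus contexts best match the sentence."""
    def score(sense: str) -> int:
        return sum(counts[sense][word] for word in context)

    best = max(counts, key=score)
    if score(best) == 0:
        # No context word was ever seen with any rendering: fall back to the
        # rendering that is most frequent overall.
        best = max(counts, key=lambda sense: sum(counts[sense].values()))
    return best


print(choose_sense(["the", "car", "is"], SENSE_COUNTS))
print(choose_sense(["monks", "keep", "a"], SENSE_COUNTS))
```

In the second call none of the context words occurs in the toy counts, so the sketch falls back to the most frequent rendering, greitas, even though the noun would be needed. This mirrors the assumption made above that translation quality drops when a word appears in an unusual text.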
Further, we face the problem of translating pronouns from English into
Lithuanian. As mentioned in the previous chapter, the biggest mistake occurs when the
system has to translate the pronoun you, because it can be translated either as jūs or as tu.
Petkevičiūtė and Tamulynas (2011:40) notice that the machine almost always translates
this pronoun as jūs, even though the second meaning, tu, is often preferred. This mistake
appears for the reason mentioned before, i.e. MT systems are not able to use text
comprehension knowledge to choose the right form of the word. The authors also note that in
many cases an incorrectly translated pronoun has the wrong grammatical case as well (Ibid.).
One more big issue is the translation of proper names. This problem is discussed by
Cvilikaitė (2008:31-32) and by Petkevičiūtė and Tamulynas (2011:41). All the authors agree
that systems usually translate proper names if they coincide with common nouns
included in the systems’ dictionaries. We can state that it is important to create a rule
which recognizes proper nouns and leaves them in their original form.
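One very simple form such a rule could take is sketched below. It is a hypothetical heuristic, not the rule of any existing system: a capitalised word that is not the first word of the sentence is treated as a proper noun and left untouched. The one-entry lexicon and the function name are invented for this illustration.

```python
# Toy lexicon with a common word that also occurs as a surname.
TOY_LEXICON = {"brown": "rudas"}


def translate_keeping_proper_nouns(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate word by word but keep capitalised mid-sentence words."""
    output = []
    for position, word in enumerate(sentence.split()):
        looks_proper = position > 0 and word[:1].isupper()
        if looks_proper:
            output.append(word)  # leave the word in its original form
        else:
            output.append(lexicon.get(word.lower(), word))
    return " ".join(output)


print(translate_keeping_proper_nouns("Yesterday Brown wore brown", TOY_LEXICON))
```

The surname Brown survives while the colour term is replaced. The heuristic is deliberately crude: it cannot protect a proper noun in sentence-initial position, which is one reason why real systems would need richer rules.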
Another grave mistake that affects translation quality is the rendering of cultural
realia⁶. Nida (1964:30) suggests that poor knowledge of the source culture can cause more
problems than differences in language structure. Petkevičiūtė and Tamulynas (2011)
do not address this problem, but many authors have discussed it
thoroughly. Karamanian (2002) notes that “translators must be both bilingual and bicultural,
if not indeed multicultural”; therefore, the idea of Robinson (2003:186) that words so
familiar and usual in one language can be completely untranslatable in another language
sounds indisputable. The idea of Karamanian (2002) is supported by Petrulionė (2012:44), who
also states that cultural realia “require from translators both linguistic and cultural
competence” so that the TT does not lose its value. Here we face a major problem, because it
would be very hard to install all the peculiarities of all cultures and languages into the
machine, so rendering realia is a huge challenge for MT systems. Cvilikaitė (2008:34) writes
that a system translates realia only if the concept is included in its dictionary; if it is
not, the MT system can take one of the following steps:
1) Leave the word in its original form;
2) Omit the word;
3) Perform word-by-word translation;
4) Use the explanation of the concept given in the dictionary.
In rare cases some of these options can meet the requirements of the TT, but often they
cause problems. A human translator who leaves a word in its original form usually adds a
footnote to explain the concept, which an MT system is unable to do (Cvilikaitė 2008:34).
Omission and word-by-word translation can cause a gap in information or great confusion,
which prevents the reader from grasping the idea of the sentence or the text. Finally, a wrong
explanation may be included in the dictionary, and using it will also make the TT confusing
(Ibid.).
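The four options listed above can be laid side by side in a short sketch. Everything in it is hypothetical: the example term bank holiday, the toy lexicon and gloss entries, and the names Strategy and render_unknown are chosen only for illustration and do not describe how any particular MT system behaves.

```python
from enum import Enum


class Strategy(Enum):
    KEEP_ORIGINAL = "leave the word in its original form"
    OMIT = "omit the word"
    WORD_BY_WORD = "perform word-by-word translation"
    DICTIONARY_GLOSS = "use the explanation given in the dictionary"


def render_unknown(term: str, strategy: Strategy,
                   lexicon: dict[str, str], glosses: dict[str, str]) -> str:
    """Render a culture-specific term according to the chosen strategy."""
    if strategy is Strategy.KEEP_ORIGINAL:
        return term
    if strategy is Strategy.OMIT:
        return ""
    if strategy is Strategy.WORD_BY_WORD:
        return " ".join(lexicon.get(w.lower(), w) for w in term.split())
    return glosses.get(term.lower(), term)


# Toy data invented for this example.
lexicon = {"bank": "bankas", "holiday": "šventė"}
glosses = {"bank holiday": "valstybinė šventė"}

for strategy in Strategy:
    print(strategy.name, "->", repr(render_unknown("bank holiday", strategy, lexicon, glosses)))
```

The word-by-word option joins two dictionary forms that do not express the concept, while omission leaves a gap, which illustrates why these options so often harm the TT.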
Finally, it must be noted that Petkevičiūtė and Tamulynas (2011:40) point out that a
number of lexical mistakes occur for the same reason, i.e. certain words or
collocations are absent from the system’s dictionary. This problem is most evident when
translating sayings, abbreviations, phraseological units/collocations, and hyphenated words
(Ibid.). Therefore, we could state that the improvement and constant updating of the
dictionaries installed in MT systems would considerably improve the output of MT.
Concluding this topic, we can note that it is fundamental to update the dictionaries and
corpora used in MT systems, for low-quality databases reduce the quality of the TT.
⁶ Cultural realia (sometimes referred to as lacuna or non-equivalence) – indicates the absence of a word in one language from the point of view of another language, when in the reality portrayed by those two languages this particular token or phenomenon exists (Gudavičius 2007:93).
2.2. Systemic mistakes
Systemic mistakes usually occur due to inaccuracies in the system, i.e. issues in the
algorithms of the dictionaries or of the programme. Most often these mistakes do not have any
logical explanation. Petkevičiūtė and Tamulynas (2011:41) provide the following classification
of systemic mistakes:
• Omission of verb;
• Spelling of capital and lower-case letters;
• Omission of word;
• Word translated into another language;
• Disregard of diacritics;
• Word translated using a concept absent from the dictionary;
• Extra word.
The further description of these mistakes is quite brief because, as the authors state, no
other researchers have distinguished this type of MT problem (Ibid.).
Firstly, Petkevičiūtė and Tamulynas (2011:41) present the omission of the verb. They state
that the system usually omits the predicate, e.g. if the predicate is “is” (or its forms
“was”, “were”), it may be omitted. However, this mistake does not appear in all sentences: the
predicate may be omitted in one sentence but translated in the following one (Ibid.). Here we
can add the issue concerning the omission of words, for these two problems are similar.
The researchers point out that in most cases the system omits dependent parts of speech
(conjunctions, prepositions, particles) (Ibid.). However, the consequences of these mistakes
differ. The absence of the predicate can cause a misunderstanding of the whole meaning of
the sentence, whereas the absence of dependent parts of speech does not bear such significance
for grasping the meaning of the sentence (Ibid.).
The discussion of the remaining problems is quite superficial, for the authors do not explain
why these mistakes occur. As mentioned above, it is hard to explain why systemic
problems occur. We may state that an understanding of why a particular systemic
mistake appears can only be achieved through the analysis of the translation system itself,
i.e. the rules and algorithms which are installed in it.
3. TRANSLATING TEXTS OF DIFFERENT REGISTERS
DiMarco and Hirst (1990), Calude (2004), and Proshina (2008) highlight the
importance of text genre in the process of translation. DiMarco and Hirst (1990: 65) state that
“a significant part of the meaning of any text lies in the author’s style [...] which must be
carried through in any translation if it is to be considered faithful.” The idea that the TT
must reproduce, as closely as possible, the writing style of the ST is supported by
Proshina (2008: 196). She also adds that at the same time the translator must mind the
stylistic peculiarities of the SL (Ibid.), which is a hard task for current MT systems,
according to DiMarco and Hirst (1990: 65).
The most common classification of functional styles of the English language is that of
Galperin (1981: 33). The author distinguishes five major genres, which are as follows:
1) The language of belles-lettres;
2) The language of publicistic literature;
3) The language of newspapers;
4) The language of scientific prose;
5) The language of official documents.
Proshina (2008: 196), however, adds to this list such functional styles as the colloquial style
and the advertising style, whereas Galperin (1981) does not list the colloquial style as a separate
functional style but points out that it is used in the belles-lettres style and the newspaper
style. The advertising style is listed as a subgroup of the newspaper style (Ibid.). A quite
different approach is proposed by DiMarco and Hirst (1990:66). The authors base their
classification on the MT point of view. They state that, rather than looking at separate
styles, the machine is more concerned with the analysis of group style⁷. They further divide
group style into two large subgroups: literary and utilitarian styles (Ibid.). It is further
explained that utilitarian style is a more general name for the texts ‘‘which have a particular
purpose, such as medical text books or newspaper articles’’ (DiMarco and Hirst 1990:66). Each
style has its own goals: to inform, to instruct, to suggest a possible way, to convince the
reader, etc. As mentioned previously, the TT must preserve the same goal, and the functions of
particular functional styles cannot be mixed (Galperin 1981, DiMarco and Hirst 1990, Proshina
2008).
7 Group style – the authors refer ‘‘to a characteristic of text that, although possibly produced by one individual,
shares the stylistic standards of a body of writers’’ (DiMarco and Hirst 1990: 66).
It is also important to understand the properties of different functional styles. The
properties of a text, in this case, mean the pragmatics⁸ of texts of different genres. Calude
(2004:8) presents a table with the general attributes of four styles.
Text genres | Sentence types | Pragmatic information | Domain/scope
Technical set of instructions (scientific prose*) | Short | Little | Very limited
Popular magazine extract (publicistic style*) | Combination of long and short | Neutral | Neutral
Newspaper article extract (language of newspapers*) | Many short and effective | Lots | Very broad
Short story extract (belles-lettres style*) | Long, elaborate | In abundance | Very broad
Table 1. Attributes of the different text genres, adapted from Calude (2004: 8)
Table 1 shows the properties of four functional styles. As is evident, each of these styles
bears a different amount of pragmatic information. The belles-lettres style has an abundance of
contextual meaning, whereas scientific prose has very little of it. The magazine extract has
neutral pragmatic information, which means that contextual meaning is not always found in these
kinds of articles. Finally, we can see that newspaper articles have lots of contextual meaning.
Calude (2004:8) explains that this genre usually favours context-bound information, i.e. it is
assumed that the reader is already aware of events that have happened in the world.
However, since Calude (2004) does not describe the style of official documents, the theory
presented by Galperin (1981) must be consulted. He claims that ‘‘there is no room for contextual
meaning or any kind of simultaneous realization of two meanings’’ (Galperin 1981:314).
What is more, the author states that each substyle of official documents has its own peculiar
compositional patterns (Ibid.).
In the practical part of this paper we will follow the classification suggested by Galperin
(1981).
8 Pragmatics – the study of how words and phrases are used with special meanings in particular situations
(Longman Dictionary of Contemporary English 2005).
* Author’s remarks.
4. THE ANALYSIS OF TEXTS OF DIFFERENT REGISTERS
4.1. The methodology of the research
Before proceeding to the empirical part of the study, the methods and the process of the
analysis employed in this paper will be briefly discussed.
The main purpose of the present paper is to analyse MT mistakes in texts of various
genres translated from English into Lithuanian. To fulfil this goal, several
methods were applied.
First, the sampling method was used to collect a number of examples from various
literature and Internet sources. In order to obtain clearer results on how the machine
deals with texts which have an abundance of contextual meaning and with those which
have very little of it, our sources range from the most to the least context-bound
texts. Some examples for the present paper were taken from the Internet. These
include a popular magazine article found on the website http://www.popsci.com and a newspaper
article taken from the website http://www.guardian.co.uk/ (both texts were translated by
the author of this research, because translations made by professional translators could not be
found).
A text of official documents was also taken from the website
http://www.constitution.org/constit_.htm. Its Lithuanian translation was found in the library
of Šiauliai University. Additionally, the instruction manual received with the purchase
of an alarm clock radio and its Lithuanian version attached to the clock were also taken as
sources for the present research, whereas examples of the belles-lettres style were taken from
Oscar Wilde’s novel ‘‘The Picture of Dorian Gray’’ (2003) and its Lithuanian version
‘‘Doriano Grėjaus Portretas’’ (2001) translated by Lilija Vanagienė. It is also important to
note that all the examples were selected randomly rather than by analysing the whole or a part
of the selected texts.
After performing MT with the online machine translation programme created by Vytautas
Magnus University (available at http://vertimas.vdu.lt/twsas/), all examples were grouped into
those containing grammatical, lexical and systemic misalignments. This was done by using the
contrastive method, i.e. by comparing the ST with the TT and the MT output. The descriptive
method made it possible to describe our findings and provide a short analysis.
Finally, by utilizing the statistical method, the results were presented graphically and
the percentages of the previously mentioned mistake groups, i.e. grammatical, lexical and
systemic, were calculated using the MS Excel programme. The percentage was calculated
according to the mathematical formula X = N : Z * 100%, where X is the percentage of the number
N, N is the number whose percentage needs to be found, and Z is the number which denotes
100%.
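The percentage formula can be illustrated with a short Python sketch. This is only an illustration: the calculations in this research were made in MS Excel, and the function name percentage and the counts used below are hypothetical values chosen for the example, not results of the study.

```python
def percentage(n: int, z: int) -> float:
    """Return X = N : Z * 100, i.e. the share of n in the total z, in percent."""
    if z == 0:
        raise ValueError("the total Z must be non-zero")
    return n * 100 / z


# Hypothetical counts of mistakes per group (illustration only, not study data).
counts = {"grammatical": 12, "lexical": 6, "systemic": 2}

# Z is the total number of mistakes, which denotes 100%.
total = sum(counts.values())

# X for every mistake group, rounded to one decimal place.
shares = {group: round(percentage(n, total), 1) for group, n in counts.items()}
print(shares)
```

With these assumed counts the three shares add up to 100%, which is a simple check that Z was chosen as the total of all mistake groups.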
It should also be mentioned that this research followed the black box approach, i.e. only the
SL texts and the rendered texts were analysed, and no analysis was made of how the programme
itself works.
4.2. Analysis of technical set of instructions
First, the instruction manual of an alarm clock radio (see Annex 1)
produced by ‘‘SOUNDMAX’’ was analysed.
The first three examples illustrate the grammatical mistakes detected in the selected text.
(1) ST: ‘‘Connect a 9 volt battery (not included) to the terminals inside the battery
compartment.’’
TT: ‘‘Prijunkite 9 voltų bateriją (nepridedama) prie baterijos dėkle esančių
įvadų.’’
MT: ‘‘Prijunkite 9 voltų bateriją (neįtrauktas) į terminalus baterijų skyriuje.’’
In the example (1) we see the incorrect usage of the grammatical gender in the
translation of the words in brackets. This mistake occurs because the word battery in the English
language has a neuter gender, but in Lithuanian it is of the feminine gender. Because there are
no specific indications of this mismatch, the system follows the rule that the masculine
gender is the unmarked member in the English language and translates the words in the masculine
gender. The second mistake in the example (1) is a misalignment in the grammatical number.
We see that the second occurrence of the word battery is in the singular form, but the MT system
translates it in the plural form. This mistake is quite surprising, because there are no pronouns
which could mislead the system. The machine probably assumes that there are mutual syntactic
relations between the words terminals and battery and that these words must therefore share the
same grammatical number.
(2) ST: ‘‘<...>but there is now the advantage that if there is a mains current failure
your clock will continue to work.’’
TT: ‘‘<...>tačiau privalumas tas, kad nutrūkus elektros tiekimui, jūsų laikrodis ir
toliau veiks.’’
MT: ‘‘<...>bet yra dabar pranašumo kad, jei yra maitinimo tinklo srovės nesėkmė,
jūsų laikrodis tęs dirbti.’’
The example (2) shows two more grammatical mistakes discussed in the theoretical part
of this paper, i.e. the wrong usage of the grammatical case and the incorrect translation of the
verb. It is quite hard to explain why the machine uses the improper case while translating the
word advantage, because no other word in this sentence requires the genitive case. The
mistake is probably caused by some systemic failure. The second mistake is made while
translating the infinitive to work. As seen in the TT, the word is translated in the future
simple tense, but the machine leaves it in the infinitive form. As is evident from the MT output,
the infinitive to work in this case should be translated as a noun, because the preceding word
tęs requires the following word to be in the accusative case. The MT system is probably unable to
identify this and translates the word in the infinitive form.
(3) ST: ‘‘The clock display will not light up, as the clock time will be held in the clock
memory by the battery back-up system.’’
TT: ‘‘Laikrodžio ekranėlis nenušvis, nes laikrodžio laiką baterijos rezervinė sistema
laikys laikrodžio atmintyje.’’
MT: ‘‘Laikrodžio parodymas neapšvies, kadangi laikrodžio laikas bus laikytas
laikrodžio atmintyje baterijos atsarginės sistemos.’’
The example (3) contains a grammatical mistake concerning the wrong usage of the
part of speech. Instead of using a verb in the future simple tense, the MT system uses the
past participle. What is more, the word laikytas is preceded by the word bus,
which indicates the future. A word combination like this indicates some future result, but the ST
has no such indication; it simply explains what happens at the moment. In addition, the
words bus and laikytas cannot be used together in Lithuanian, because they contradict each
other. That is why the usage of the past participle is incorrect in this case.
Further examples illustrate the lexical mistakes found in the analysis of the selected
text.
(4) ST: ‘‘The battery back-up system is only meant to be used from short
temporary power failures.’’
TT: ‘‘Baterijos rezervinė sistema skirta naudoti tik trumpam nutrūkus elektros
tiekimui.’’
MT: ‘‘Baterijos atsarginė sistema yra tiktai reikšta, kad būtų panaudota nuo trumpų
laikinų valdžios nesėkmių.’’
The example (4) contains a lexical mistake which is crucial for the correct
understanding of the sentence. It is quite unclear why the machine chooses to translate the
word power as valdžios, because in the dictionaries⁹ that were consulted
this is only the third definition provided, and MT systems usually use the first given meaning.
We could say that this mistake occurs because the machine does not recognize the context of
the text and is unable to choose the right meaning of the word power.
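The difference between taking the first listed meaning and choosing a meaning by context can be shown with a toy Python sketch. It is not a description of the analysed MT system, whose inner workings were not examined. The lexicon, the order of its senses, the cue words and the function names first_sense and sense_by_context are all assumptions made for illustration.

```python
import re

# Toy lexicon: senses are stored in dictionary order, each with optional cue words.
# All entries, their order and the cue words are assumptions made for illustration.
LEXICON = {
    "power": [
        ("galia", set()),
        ("elektros tiekimas", {"battery", "mains", "current", "clock"}),
        ("valdžia", {"government", "political", "state"}),
    ],
}


def first_sense(word):
    """Return the first listed sense, ignoring the context."""
    return LEXICON[word][0][0]


def sense_by_context(word, sentence):
    """Return the sense whose cue words overlap most with the sentence.

    max() keeps the earliest sense on ties, so when no cue word matches
    the result is the same as first_sense().
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    best = max(LEXICON[word], key=lambda sense: len(sense[1] & tokens))
    return best[0]


st = ("The battery back-up system is only meant to be used "
      "from short temporary power failures.")
print(first_sense("power"))
print(sense_by_context("power", st))
```

In this sketch the word battery in the ST is enough to select the electricity-related sense, whereas a lookup without context always returns the first entry, whatever the text is about.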
(5) ST: ‘‘The clock display will not light up, as the clock time will be held in the clock
memory by the battery back-up system.’’
TT: ‘‘Laikrodžio ekranėlis nenušvis, nes laikrodžio laiką baterijos rezervinė sistema
laikys laikrodžio atmintyje.’’
MT: ‘‘Laikrodžio parodymas neapšvies, kadangi laikrodžio laikas bus laikytas
laikrodžio atmintyje baterijos atsarginės sistemos.’’
The example (5) shows the incorrect translation of the word display; however, this
mistake is not as severe as the one discussed in the example (4). The misalignment in the example
(5) has no impact on the meaning of the sentence. It can only confuse the reader, who
may think that the sentence refers to the numbers rather than to the whole clock screen. This
mistake occurs for the same reason as the error in the example (4), i.e. the system is not able to
recognize the context and chooses the improper meaning of the specific word.
As is evident from the provided examples, the vast majority of mistakes occurring in
the translation of the technical set of instructions are grammatical mistakes. However, they are
not very serious and would not cause misinterpretation of the text. The only crucial
mistake in this text is the lexical one provided in the example (4). No systemic mistakes occur
in the selected text. Few crucial mistakes can be found in the text under analysis because such
texts usually use clear, definite statements and standardized words, i.e. words which have only
one meaning. All in all, we can conclude that the translation of the technical text is quite
adequate. Despite two lexical mistakes, the text is understandable and there are no other severe
mistakes which could prevent the reader from understanding the text.
9 Mokomasis Anglų Kalbos Žodynas (2007). Vilnius: Alma Littera.
Baravykas, V. (1961) Anglų-Lietuvių Kalbų Žodynas. 2nd edition. Vilnius: Valstybinė Politinės Ir Mokslinės Literatūros Leidykla.
Piesarskas, B. (2004) Dvitomis Anglų-Lietuvių Kalbų Žodynas. Vilnius: Alma Littera.
4.3. Analysis of popular non-fiction text
The second text was taken from the online popular non-fiction magazine Popular
Science (see: www.popsci.com). The chosen article, entitled Warp Factor, was written by
Konstantin Kakaes, who discusses NASA’s intention to create a spaceship which could
travel faster than light and therefore make travel beyond our solar system possible.
First, we will present the grammatical mistakes found in the process of translation.
(6) ST: ‘‘<...>engineers and space enthusiasts gathered at the Hyatt Hotel in
downtown Houston<...>’’ (K. Kakaes 2013)
TT: ‘‘<...>inžinierių ir kosmoso entuziastų rinkosi Hyatt viešbutyje, Hiustono
centre<...>’’ (Author)
MT: ‘‘<...>inžinierių ir kosminių entuziastų rinkosi Hyatt Viešbutyje Hiustono
miesto centre<...>’’
(7) ST: ‘‘The first mainstream use of the expression ‘‘warp drive’’ dates to
1966...’’ (K. Kakaes 2013)
TT: ‘‘Pirmą kartą sąvoka ‚‚deformacijos variklis‘‘ imta plačiai vartoti dar 1966...’’
(Author)
MT: ‘‘Pirmas vyraujantis posakio naudojimas ‚‚deformuoja variklį‘‘ datos iki
1966...’’
The examples (6) and (7) illustrate the grammatical mistake of translating the wrong part of
speech. The MT rendering of space enthusiasts as kosminių entuziastų in the
example (6) can be treated as misleading, because it implies that the enthusiasts are from space:
kosminių here is an adjective which describes the following word. However, the original
text refers to a field in which the people are interested; therefore, the word space must be
treated as a noun and rendered as kosmoso. In the example (7) we observe that the machine is
not able to identify that the word dates is a verb and translates it as a noun in the plural form,
which not only causes misinterpretation of the text but also makes the sentence hardly
comprehensible.
(8) ST: ‘‘<...>and then takes me down the hall to Eagleworks.’’ (K. Kakaes 2013)
TT: ‘‘<...>po to nuveda mane koridoriumi žemyn į ‘‘Erelio dirbtuves’’ (Author)
MT: ‘‘<...>ir paskui ima mane žemyn salė į Eagleworks.’’
The example (8) presents another type of grammatical mistake, which is the wrong
usage of the grammatical case. This mistake occurs because the MT system is not
able to link the words take me down and the hall. As is evident from the TT, the word hall must
be in the instrumental case, not in the nominative case as rendered by the machine.
A great number of lexical mistakes are found in the popular non-fiction text. These
misalignments are presented below.
(9) ST: ‘‘<...>something he calls a quantum vacuum plasma thrusters (QVPT).’’
(K. Kakaes 2013)
TT: ‘‘<...>kažką ką jis vadina plazmine važiuokle (variklio tipas, kuris geba iš
vakuumo išgauti energiją) (PV).’’ (Author)
MT: ‘‘<...>kažkas, kurį jis kviečia, kvantas siurbia plazminį stūmiką (QVPT).’’
The example (9) contains one of the most severe mistakes within the text under
analysis. We can see that the machine is unable to translate the specific term and performs
word-by-word translation, which is utterly incomprehensible. This error appears because it is
quite difficult to translate such terms, which are used in a particular field of study. Even a
human translator finds it difficult to render such word clusters and has to use
explanatory notes, as the author did while translating the example (9). The translator
must look through various materials to find out what the phrase means and express it in the text
concisely, yet we cannot expect the machine to perform such an operation.
(10) ST: ‘‘White shows me into the facility<...>’’ (K. Kakaes 2013)
TT: ‘‘Vaitas palydi mane į patalpą, kurioje stovi įrenginys<...>’’
(Author)
MT: ‘‘Baltas rodo man į priemonę<...>’’
Another crucial mistake is evident in the example (10). This example illustrates the kind
of mistake when the SL word requires a whole phrase in the TL, as opposed to a single word.
The improper translation of such words usually severely affects the meaning of the whole
sentence. The MT system uses the meaning of the word facility which can be found in the
dictionary, but from the ST we understand that by facility the author means the room where the
device is located, and this must be made clear in the TT. In addition, the machine does not
understand that the words show into must be translated together. This happens because the pronoun
me stands between show and into, which becomes an obstacle for the MT system. Finally, this
example contains the wrong translation of a proper noun, in this case a surname. This mistake
occurs because no distinction is made between common words and proper names when
the latter have lexical meaning. The machine cannot distinguish between them if they are not
marked in the deformatting stage. However, it can be noted that when the name is in the
possessive case, e.g. White’s device, this error does not occur.
(11) ST: ‘‘<...>and that he was commencing physical tests in his NASA lab, which he
calls Eagleworks.’’ (K. Kakaes 2013)
TT: ‘‘<...>ir kad jis jau pradėjo fizinius testus savo NASA laboratorijoje, kurią jis
vadina ‚‚Erelio dirbtuvėmis‘‘.’’ (Author)
MT: ‘‘<...>ir kad jis pradėjo fizinius testus savo NASA laboratorijoje, kurią jis
kviečia Eagleworks.’’
The example (11) depicts another lexical mistake found in the selected text, which is the
wrong choice of a word meaning. A check of a number of dictionaries (see footnote 9,
p. 20) shows that kviesti is only the second meaning of the word call, and, as
mentioned earlier in this paper, MT systems usually employ the first meaning of a word. That is
why this mistake is quite unusual. However, this mistake is not crucial, for it does not have an
impact on the meaning of the sentence.
(12) ST: ‘‘Put plainly, warp drive would permit faster-than-light travel.’’ (K. Kakaes
2013)
TT: ‘‘Aiškiai kalbant, deformacijos diskas leistų keliauti greičiau už šviesą.’’
(Author)
MT: ‘‘Padėtas aiškiai, deformacijos variklis leistų greitesnę negu šviesa kelionę.’’
The example (12) deals with the translation of a collocation. As is evident, the
collocation put plainly is absent from the system’s dictionary; therefore, the machine performs
word-by-word translation.
The selected text also contains several systemic mistakes. They are presented in the
examples below.
(13) ST: ‘‘As we walk, he tells me about his quest to open the lab.’’ (K. Kakaes 2013)
TT: ‘‘Mums beeinant jis man pasakoja apie savo siekius atidaryti laboratoriją.’’
(Author)
MT: ‘‘Kadangi mes einame, jis sako, kad aš apie jo ieškojimą atidaryčiau
laboratoriją.’’
In the example (13) we notice that the MT system inserts the conjunction kad in its output
even though it is absent in the ST. As mentioned in the theoretical part of this paper, it
is hard to explain why such systemic mistakes occur. To understand them, we would have to
analyse the machine itself.
(14) ST: ‘‘Johnson Space Center sprawls beside lagoons<...>’’ (K. Kakaes 2013)
TT: ‘‘Džonsono kosminis centras driekiasi palei lagūnas<...>’’ (Author)
MT: ‘‘Johnson Kosminis centras išsidriekia šalia lagūnų<…>’’
The example (14) presents a misalignment in the spelling of capital and minuscule letters.
It is known that in English the full names of companies or institutions are considered to be
proper nouns and the main words of the whole name are capitalised, e.g. Chelsea Hotel, Houston
Space Center, The Homeless Center for Strafford County, etc. (Marshall 2012). However, this is
not the case in Lithuanian, where such words as center, hotel, etc. are usually written in
minuscule letters (Lingytė 2002). This rule is probably not installed in the MT system, and the
machine renders the name improperly.
(15) ST: ‘‘<...>warp drive<...>’’ (K. Kakaes 2013)
TT: ‘‘<...>deformacijos diskas<...>’’ (Author)
MT: ‘‘<...>deformacijos variklis<...>’’
The example (15) shows the error when the machine translates the word using a
concept absent in the dictionary. According to a number of dictionaries (see footnote 9,
p. 20), the word drive does not have the meaning variklis. However, this mistake could be
explained by assuming that the persons who installed the dictionaries in the MT system were
aware of this phrase but had a slightly different understanding of it and did not check how it is
rendered in other scientific sources. What is more, this mistake does not cause
misinterpretation of the text, because the reader can still understand its meaning.
Summing up, it could be said that, compared to the text of technical instructions analysed in
the previous chapter, this text has more lexical mistakes than grammatical ones.
In addition, there are a few systemic mistakes in this text, which are absent in the text
analysed in chapter 4.2. What is more, the translation of the popular non-fiction text contains
more severe errors. The examples (7) and (9) illustrate mistakes which make the sentences hard to
comprehend, and the example (10) is rendered completely incorrectly. Furthermore, the MT
system has difficulty translating specific terms used in a certain field of interest, as seen in
the examples (9) and (15). To avoid these kinds of mistakes, the dictionaries installed in the
system should be constantly updated, and the persons responsible for updating
them should look closely for the right terms.
4.4. Analysis of belles-lettres style
For the belles-lettres style the novel The Picture of Dorian Gray by Oscar Wilde was
selected. The story is about a young man, Dorian Gray, and his mysterious portrait, which
grows older and uglier as its master lives a wild and sinful life. The novel was translated
into Lithuanian by Lilija Vanagienė.
The examples (16) – (21) illustrate the grammatical mistakes found in the text under
analysis.
(16) ST: ‘‘<...>there came through the open door the heavy scent<...>’’ (Wilde 2003:4)
TT: ‘‘<...>pro atviras duris svaigiai padvelkdavo<...>’’ (Wilde 2001:8)
MT: ‘‘<...>ten prasiskverbė atviros duris sunkus aromatas<...>’’
The example (16) presents a misalignment in the grammatical case. We see that,
regardless of the indication that the accusative case should be used, which is expressed by the
word through, the machine still renders the word open in the wrong case.
(17) ST: ‘‘<...>a smile of pleasure passed across his face, and seemed about to linger
there.’’ (Wilde 2003:5)
TT: ‘‘<...>ir pasitenkinimo šypsena neblėso jam iš veido.’’
MT: ‘‘<...>malonumo šypsena praėjo per jo veidą, ir atrodo, ketino užtrukti ten.’’
(18) ST: ‘‘<...>a long thin dragon-fly floated past<...>’’ (Wilde 2003:8)
TT: ‘‘<...>ir ilgas plonas laumžirgis<...>praplaukė pro šalį<...>’’ (Wilde 2001:12)
MT: ‘‘<...>ilgas plonas laumžirgis paskleista praeitis<...>’’
The words seemed and floated in the examples (17) and (18) have a definite indicator of the
past tense, i.e. the ending –ed, but the word seemed is still translated in the present tense and
the word floated is rendered as a noun.
(19) ST: ‘‘<...>my acquaintances for their good characters<...>’’ (Wilde 2003:10)
TT: ‘‘<...>pažįstamus dėl gero būdo<...>’’ (Wilde 2001:14)
MT: ‘‘<...>savo pažįstamą jų geriems charakteriams<...>’’
In the example (19), where the noun has a clear sign of being in the plural form, the ending
–s, the word is still translated in the singular form. This mistake is all the more conspicuous
because the following words are rendered correctly, i.e. in the plural form. In view of
this, the misalignment could even be classified as a systemic mistake.
(20) ST: ‘‘<...>talking [Dorian Gray] to the pretty Duchess of Monmouth<...>’’ (Wilde
2003:170)
TT: ‘‘<...>šnekučiavosi [Dorianas Grėjus] su dailiąją Monmeto hercogiene<...>’’
(Wilde 2001:174)
MT: ‘‘<...>kalbėdama [Dorianas Grėjus] su gražia Monmouth Kunigaikštiene<...>’’
In the example (20) the word talking is translated in the feminine gender, even though
it is indicated that the person who is talking is male. This error could also be classified as a
systemic mistake, because it is hard to explain why the machine makes such mistakes despite
clear indications.
(21) ST: ‘‘The girl never really lived<...>’’ (Wilde 2003:93)
TT: ‘‘Mergaitė niekada tikrai negyveno<...>’’ (Wilde 2001:97)
MT: ‘‘Mergaitė niekada iš tikrųjų gyveno<...>’’
Finally, the example (21) is a typical example of the translation of negative forms. The
machine renders the segment incorrectly, because the two words forming the negation are
separated by another word. However, when translating such a simple sentence as He never
lived on his own [MT: jis niekada negyveno savarankiškai], we observe that the machine is
capable of recognizing and translating negative forms correctly.
Further examples illustrate the lexical mistakes found in the text of the belles-lettres style.
(22) ST: ‘‘There was something in his face that made one trust him at once.’’ (Wilde
2003:17)
TT: ‘‘Kažkodėl jo veidas skatino juo pasitikėti.’’ (Wilde 2001:21)
MT: ‘‘Buvo kažkas jo veide, kuris privertė vieną patikėti juo tuojau pat.’’
(23) ST: ‘‘Was he always to be burdened by his past?’’ (Wilde 2003:196)
TT: ‘‘Negi visada jį slėgs praeitis?’’ (Wilde 2001:200)
MT: ‘‘Jis turėjo visada būti kraunamas jo praeities?’’
The selected text contains a number of mistakes concerning the wrong translation of
pronouns. Some of these mistakes are presented in the examples (22) and (23). In the example
(22) the wrong pronoun is used. Instead of kuris the system should use the pronoun
kas, because the word kažkas is of neuter gender and so must be the pronoun linked with that
word. The example (23), in contrast, deals with personal pronouns. In this example
the primary pronoun jo is used instead of the possessive one. This mistake is quite crucial, for
it has an impact on the meaning of the sentence. When a primary pronoun is used, we get the
idea that the past of another person weighs on someone, but when a possessive pronoun
is used, the reader can clearly understand that the person is suffering because of his own deeds
in the past.
(24) ST: ‘‘<...>they got on the roof<...>’’ (Wilde 2003:197)
TT: ‘‘<...>jie užlipo ant stogo<...>’’ (Wilde 2001:200)
MT: ‘‘<...>jie sėdo ant stogo<...>’’
(25) ST: ‘‘There was only one bit of evidence left against him.’’ (Wilde 2003:196)
TT: ‘‘Prieš jį tėra tik vienas įrodymas.’’ (Wilde 2001:200)
MT: ‘‘Buvo tik vienas bitas įrodymo, kurį paliekama prieš jį.’’
Errors concerning polysemy occur as well and are presented in the examples (24)
and (25). The translation in the example (24) is incorrect, because when to get on is translated
as sėsti, it indicates that someone is moving inside. However, in this case the people are moving
outside. In the following example the word bit is misinterpreted and confused with the term used
to define computer capacity. This mistake is also quite conspicuous, because there are no
previous references to computers or any electronic devices which could confuse the
system. The machine is probably misled by the word one and assumes that the two words
denote the capacity of some kind of device.
Systemic mistakes are also present in the belles-lettres text. They are represented in the
examples below.
(26) ST: ‘‘The studio was filled with the rich odour of roses<...>’’ (Wilde 2003:4)
TT: ‘‘Dailininko dirbtuvėje tvyrojo saldus rožių aromatas<...>’’ (Wilde 2001:8)
MT: ‘‘Studija buvo pripildyta turtingo roses aromato<...>’’
(27) ST: ‘‘Dorian Gray stepped up on the dais with the air of a young Greek
martyr<...>’’ (Wilde 2003:18)
TT: ‘‘Dorijanas Grėjus užlipo ant pakylos jauno graikų kankinio veidu<...>’’ (Wilde
2001:22)
MT: ‘‘Dorėnų Pilkuma žengė į priekį ant pakylos [su] išraiška jauno Graikijos
kankinio<...>’’
(28) ST: ‘‘<...>the sharp snaps of the guns that followed<...>’’ (Wilde 2003:177)
TT: ‘‘<...>ir įkandin jį sekąs šaižūs šautuvų pyškėjimai<...>’’ (Wilde 2001:181)
MT: ‘‘<...>ir aštrus šnapso ginklų, kurie sekė<...> ’’
The examples (26) and (28) could be explained by a major error in the system’s dictionary:
the word rose is absent, whereas the word snaps is
confused with the word schnapps. However, we cannot explain why the machine omits the word su
in the example (27) even though the ST contains the preposition with. We see that
in the TT this preposition is absent as well, but there the grammatical case is adjusted to
maintain the meaning, which is not done in the MT output.
What is more, some portions of the selected text are of such low quality that it is almost
impossible to understand them. Such sentences could be called miscellaneous, for they contain a
number of various mistakes, grammatical and lexical, which together result in such
unsatisfactory quality. These portions are presented below.
(29) ST: ‘‘From the corner of the divan of Persian saddle-bag on which he was lying,
smoking, as was his custom, innumerable cigarette, Lord Henry Wotton could just
catch the gleam of the honey-sweet and honey-coloured blossoms of a laburnum,
whose tremulous branches seemed hardly able to bear the burden of a beauty so
flamelike as theirs<...>’’ (Wilde 2003:4)
TT: ‘‘Gulėdamas ant persiškom gūnion apklotos kanapos ir kaip visada
rūkydamas nežinia kelintą iš eilės cigaretę, lordas Henris Votonas iš savo kampo
dar matė geltonus ir saldžius it medus akacijos žiedus ir virpančias šakas, lūžte
lūžtančias nuo grožybių naštos, taip panašios į liepsną<...>’’ (Wilde 2001:8)
MT: ‘‘Nuo kampo sofos persų persisveriamų krepšių, ant kurių jis gulėjo,
rūkymas, kaip buvo jo padaryta pagal užsakymą, nesuskaičiuojamos cigaretės,
Lord Henry Wotton galėjo tik sugauti saldaus medumi ir sužydėjimo laburnum
spalvos medaus šviesaus, kurios drebančios šakos atrodė vos tik gabios turėti naštą
grožio taip panašaus į liepsną kaip jų<...>’’
(30) ST: ‘‘<...>but now and then a thrill of terror ran through him when he
remembered that, pressed against the window of the conservatory, like a white
handkerchief, he had seen the face of James Vane watching him.’’ (Wilde 2003:175)
TT: ‘‘<...>tačiau kartkartėmis jį persmelkdavo siaubas, kai tik prisimindavo
Džeimso Veino veidą, kuris jį stebėjo, prisispaudęs prie oranžerijos stiklo lyg balta
nosinė.’’ (Wilde 2001:179)
MT: ‘‘<...>bet dabar ir paskui teroro jaudulys pakartojo jį, kai jis atsiminė, kad,
spaustas prieš langą konservatorijoje, kaip balta nosinė, jis pamatė veidą James
Vane, stebinčio jį.’’
The examples above contain almost every mistake possible in MT: the wrong part of
speech, e.g. rūkymas instead of rūkyti (example (29)) and spaustas instead of prisispaudęs
(example (30)); misalignment in verb forms, e.g. pakartojo instead of persmelkdavo (example
(30)); the improper grammatical case, e.g. medaus šviesaus instead of it medus (example
(29)); polysemy, e.g. padarytas pagal užsakymą instead of kaip visada (example (29)); an
untranslated word, e.g. laburnum [akacija] (example (29)); and the wrong translation of a
culture-specific item, e.g. persų persisveriami krepšiai instead of persiškom gūniom apklota
kanapa (example (29)). Together these mistakes greatly lower the quality of the
sentences, because one error inevitably leads to another. The examples are made even
harder to understand by the word order, which is very primitive.
To sum up, it is evident that the translation of the belles-lettres text is completely
unsatisfactory. The output contains almost every mistake the machine can make while
translating a text. What is more, even though some sentences are understandable, a
number of portions are hardly comprehensible, if at all. In addition, faulty word order is a
common mistake which contributes greatly to the misinterpretation of the selected text.
4.5. Analysis of newspaper article
The article was taken from the website of The Guardian, one of the biggest newspapers in
Britain (see: http://www.guardian.co.uk). The article is entitled Kim Jong-un Has Made A
Decent Fist of Rattling The US and discusses the motives behind the threats of nuclear war
that North Korea had recently made against the USA. The author of the article is Justin McCurry.
Grammatical mistakes are abundant in the selected text and are shown in the
examples below.
(31) ST: ‘‘<...>or an attack on islands near the disputed North-South maritime
border.’’ (McCurry 2013)
TT: ‘‘<...>arba salų esančių šalia ginčytinos Šiaurės-Pietų jūrų sienos puolimas.’’
(Author)
MT: ‘‘<...>ar atakos ant salų šalia ginčytino Šiaurės-pietų jūrinė siena.’’
Example (31) illustrates the improper usage of the grammatical case. As in the other
texts, the machine is unable to link certain words, here ginčytina with
jūrų sienos, and this eventually causes the wrong choice of grammatical case. This mistake
can be explained through another mistake evident in this example, namely misalignment in
gender. Apparently, the machine is not able to relate the word ginčytina to any other word in
the sentence and renders it in the masculine gender, even though every other word is in the
feminine gender. This misalignment contributes to the improper usage of the grammatical
case, because the word siena [border] in Lithuanian has only the feminine gender and cannot
be combined with a word in the masculine gender. Because the machine cannot relate those two
words, it translates the rest of the sentence as a separate unit and uses the nominative case.
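The agreement failure described for example (31) can be made concrete with a small check. The following is only an illustrative sketch and not a description of the analyzed MT system: the Token type, the agreement_errors function and the gender and case labels are hand-assigned assumptions for the word forms quoted in the example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """A word form with hand-assigned (hypothetical) grammatical labels."""
    form: str
    gender: str  # "masc" or "fem"
    case: str    # "nom" or "gen"


def agreement_errors(adjective: Token, noun: Token) -> list:
    """List the features in which an adjective fails to agree with its noun."""
    errors = []
    if adjective.gender != noun.gender:
        errors.append("gender")
    if adjective.case != noun.case:
        errors.append("case")
    return errors


# Forms taken from the MT output of example (31); labels assigned by hand.
mt_adjective = Token("ginčytino", "masc", "gen")
mt_noun = Token("siena", "fem", "nom")

# Forms taken from the human translation of example (31).
tt_adjective = Token("ginčytinos", "fem", "gen")
tt_noun = Token("sienos", "fem", "gen")

print(agreement_errors(mt_adjective, mt_noun))
print(agreement_errors(tt_adjective, tt_noun))
```

The MT pair fails on both features, which matches the observation that the gender mismatch and the wrong case appear together, whereas the pair from the human translation agrees in both.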
(32) ST: ‘‘Jang, who says he talks ‘two or three times a day’<...>’’ (McCurry 2013)
TT: ‘‘Jang, kuris teigia, kad kalbasi ‘du ar tris kartus per dieną’<...>’’ (Author)
MT: ‘‘Jang, kas sako, kad jis kalba ‘du ar trys kartai dieną’<...>’’
Example (32) also presents an error concerning the grammatical case. In this instance
the machine is unable to link the words three times with a day. However, the origin of this mistake
is quite different from the one discussed in example (31). This time the mistake occurs due
to another error evident in this example: the omission of the article a. When this article is omitted,
the word day loses its function as an adverbial modifier of time and becomes a simple noun. As a
result, the system is unable to link those words and renders them both in the nominative case.
(33) ST: ‘‘The coming weeks could see more attempts to unsettle the region.’’
(McCurry 2013)
TT: ‘‘Daugiau pasikėsinimų sutrikdyti sritį gali būti įvykdyta per ateinančias
savaites.’’ (Author)
MT: ‘‘Besiartinančios savaitės galėjo pamatyti daugiau pastangų sutrikdyti sritį.’’
Example (33) depicts a mistake concerning the wrong usage of the verb form.
Regardless of the indication of future reference, could see, the machine translates the verb in the
past tense. These kinds of mistakes are hard to explain: when there is a clear
indication of a certain tense but the machine chooses a completely different one, we can
assume that the fault lies in the system itself.
(34) ST: ‘‘Among the options<...>’’ (McCurry 2013)
TT: ‘‘Tarp galimų pasirinkimų<...>’’ (Author)
MT: ‘‘Tarp pasirinkimo<...>’’
Misalignment in the grammatical number is also present in the selected text. As in the
texts analyzed earlier, example (34) shows that despite the distinct
indication of the plural form, the ending -s, the system translates the word in the singular.
This can also be regarded as a systemic mistake, because there is no clear explanation of why
it happens.
(35) ST: ‘‘<...>and repairing the damage UN sanctions have inflicted<...>’’ (McCurry
2013)
TT: ‘‘<...>ir atitaisyti žalą sukeltą JT apropojimų<...>’’ (Author)
MT: ‘‘<...>ir pakenkimo JT sankcijų remontas sukėlė<...>’’
The mistake in example (35) concerns the misinterpretation of the part of speech, which
is also evident in the text under analysis. As is evident from the MT output, this mistake
occurs because the word repairing is related to the words sanctions have inflicted, and
the system assumes it has to be rendered as a noun rather than a verb.
The lexical mistakes found while analyzing the selected text are presented and
discussed below.
(36) ST: ‘‘<...>that his rule would mark<...>’’ (McCurry 2013)
TT: ‘‘<...>kad jo valdymas žymės<...>’’ (Author)
MT: ‘‘<...>kad jo taisyklė pažymės<...>’’
The machine faces problems translating polysemous words, as can be seen in
example (36). In this particular example the error appears because the machine is not able to
draw on any contextual knowledge; consequently, it is unable to detect that the word rule
must be rendered by its other meaning, i.e. valdymas instead of taisyklė.
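The behaviour described for example (36) resembles a lookup that always returns the first listed sense of a word. The sketch below is a toy illustration under that assumption, not the internals of the analyzed system: TOY_DICTIONARY, HINTS and both functions are hypothetical, and the cue that links his to the sense valdymas is invented purely for demonstration.

```python
# Toy bilingual dictionary: senses are listed with the "first" meaning first.
# The entries are illustrative only.
TOY_DICTIONARY = {
    "rule": ["taisyklė", "valdymas"],
}

# Hypothetical context cues: if the cue word occurs nearby, prefer that sense.
HINTS = {
    "rule": {"his": "valdymas"},
}


def translate_first_sense(word):
    """Return the first listed sense and ignore the context entirely."""
    senses = TOY_DICTIONARY.get(word.lower())
    return senses[0] if senses else word


def translate_with_hint(word, context_words):
    """Prefer a sense suggested by a context cue, else use the first sense."""
    for cue, sense in HINTS.get(word.lower(), {}).items():
        if cue in context_words:
            return sense
    return translate_first_sense(word)


context = {"that", "his", "would", "mark"}
print(translate_first_sense("rule"))
print(translate_with_hint("rule", context))
```

Without any context the lookup yields taisyklė, as in the MT output, while even a single hand-written cue is enough to select valdymas, as in the human translation.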
(37) ST: ‘‘<...>now a senior fellow at the Institute for National Security Strategy<...>’’
(McCurry 2013)
TT: ‘‘<...>vyriausiasis Instituto, atsakingo už Nacionalinio saugumo strategiją,
narys<...>’’ (Author)
MT: ‘‘<...>dabar vyresnis bičiulis Institute Nacionalinio saugumo Strategijos<...>’’
A misinterpreted collocation is also present in the article under analysis and
is shown in example (37). A phraseological unit such as senior fellow is probably not
very common in everyday English and is not included in the dictionary the system
uses. That is why this collocation is translated word for word and loses its meaning, which is
‘‘the most experienced, or most successful of an elite group of people who work together as
peers in an academic setting or institution’’ (Jones 2013).
(38) ST: ‘‘The defector, who arrived in South Korea with his wife and children<...>’’
(McCurry 2013)
TT: ‘‘Pabėgėlis, kuris atvyko į Pietų Korėją su savo žmona ir vaikais<...>’’
(Author)
MT: ‘‘Dezertyras, kuris atvyko į Pietų Korėją su jo žmona ir vaikais<...>’’
Example (38) shows the improper translation of pronouns. The machine once again
uses a primary pronoun instead of a possessive. This is a crucial mistake, because it causes
misinterpretation of the text. In this particular case the idea of the ST is that someone came to
South Korea together with his own family, whereas the MT output conveys the idea that the
person came to the country with someone else’s family.
(39) ST: ‘‘<...>a departure from bellicosity<...>’’ (McCurry 2013)
TT: ‘‘<...>karo veiksmų nutraukimas<...>’’ (Author)
MT: ‘‘<...>išvykimą nuo bellicosity<...>’’
Example (39) shows that the dictionary of the system is quite limited: it does not
contain the noun bellicosity, although a check of related words showed that they are
included. The MT system correctly renders words such as bellicose [karingas], belligerence
[karingumas], and belligerent [karingas].
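The gap described for example (39) can be pictured as a lookup that passes unknown words through unchanged. This is a minimal sketch under that assumption: TOY_LEXICON contains only the three word pairs reported above, and translate_tokens is a hypothetical helper rather than part of any real MT system.

```python
# Toy lexicon holding only the related words that were reported as translated.
TOY_LEXICON = {
    "bellicose": "karingas",
    "belligerence": "karingumas",
    "belligerent": "karingas",
}


def translate_tokens(tokens, lexicon):
    """Translate known tokens; copy unknown ones through and record them."""
    output, unknown = [], []
    for token in tokens:
        translation = lexicon.get(token.lower())
        if translation is None:
            output.append(token)   # the source word is left untranslated
            unknown.append(token)
        else:
            output.append(translation)
    return output, unknown


output, unknown = translate_tokens(["bellicosity", "bellicose"], TOY_LEXICON)
print(output)
print(unknown)
```

Collecting the unknown tokens separately shows how such gaps could be listed for a later dictionary update.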
Some systemic mistakes are also found in the text. They are illustrated in the examples
below.
(40) ST: ‘‘Kim Jong-un’s aim is to unite the North Korean military and people around
his regime<...>’’ (McCurry 2013)
TT: ‘‘Kim Jong-uno tikslas yra, kad Šiaurės Korėjos karinės pajėgos ir liaudis
prisijungtu prie jo režimo<...>’’ (Author)
MT: ‘‘Kim Jong-un tikslas suvienija Šiaurės Korėjos kariuomenę ir žmones aplink
jo režimą<...>’’
Example (40) shows the error of omitting a verb. Once again, no clear
explanation can be given for why this mistake is present. However, it is
quite crucial, for it slightly changes the meaning of the sentence. The ST means that people
are against Kim Jong-un’s regime and he is taking action to change this, whereas the MT
output conveys the idea that people already accept the leader’s regime.
One portion of the selected text is translated incomprehensibly. It is
presented in example (41).
(41) ST: ‘‘<...>but is not willing to give anything up to get it.’’ (McCurry 2013)
TT: ‘‘<...>tačiau nesiruošią ką nors aukoti, kad tai gautų.’’ (Author)
MT: ‘‘<...>bet nenori duoti kažkam iki, gauna tai.’’
As is evident, the most crucial mistake here is the wrong translation of the particle to.
In this case the particle is attached to the first part of the sentence, i.e. but is not willing
to give anything up, whereas it has to be attributed to the remaining part of the sentence.
This one mistake leads to another crucial mistake: the wrong usage of the verb form (cf. kad gautų
tai → gauna tai). It is important to note that such portions are not as common in the
newspaper article as in the belles-lettres text, but they are still present.
Consequently, the quality of the text under analysis decreases greatly.
Finally, it can be concluded that the newspaper article has the best error score so far. Five
of the seven types of grammatical mistakes are present in this article, namely misalignment in the
grammatical case, number, gender, verb forms, and the part of speech. Such common
linguistic mistakes as polysemy, untranslated words, the wrong translation of pronouns, and
collocations are also detected in the selected text. What is more, a systemic mistake, the
omission of a verb, is found, and some portions are completely misinterpreted. All in
all, we can state that the MT output is below average and that serious editing is required if a
newspaper article is rendered by the machine.
4.6. Analysis of the official document
The Constitution for the United States of America was chosen to see what mistakes occur
when the MT system translates an official document. The text was taken from the
online version of the Constitution (see: http://constitution.org/c5/index.php). The findings of the
analysis are presented and discussed below.
Firstly, we will discuss the grammatical mistakes.
(42) ST: ‘‘<...>two Senators from each State, chosen by the Legislature thereof, for six
Years<...>’’ (Constitution for the United States of America (henceforth U.S.
Constitution), Article I, Section 3)
TT: ‘‘<...>du senatorius, renkamus atitinkamų valstijų įstatymų leidžiamųjų
susirinkimų šešeriems metams<...>’’ (Jungtinių Amerikos Valstijų Konstitucija
(nuo čia JAV Konstitucija), I Straipsnis, 3 Skyrius)
MT: ‘‘<...>dviejų Senatorių nuo kiekvienos valstybės, pasirinktos Įstatymų
leidžiamojo organo jo šešerius Metus<...>’’
(43) ST: ‘‘<...>shall be vested in a Congress of the United States, which shall consist of
a Senate and House of Representatives.’’ (U.S. Constitution, Article I, Section 1)
TT: ‘‘<...>suteikiami Jungtinų Amerikos Valstijų Kongresui, susidedančiam iš
Senato ir Atstovų rūmų.’’ (JAV Konstitucija I Straipsnis, 1 Skyrius)
MT: ‘‘<...>turi būti suteiktos Kongrese Jungtinių Valstijų, kurios turi susidėti iš
Senato it Atstovų Rūmų.’’
The most common grammatical mistakes in the official document are
misalignment in the grammatical gender and number, as can be seen in examples (42) and
(43). In both cases the machine relates the wrong words. In example (42) chosen is related
to States, and in example (43) the word which is related to the same word, which
causes chosen and which to be translated in the feminine gender, even though they
have to be rendered in the masculine gender. Errors in connecting the right words
consequently lead to the improper usage of the grammatical number. Because the
word States is in the plural form, the words related to it are also translated in the plural, even
though they have to be in the singular. Another common mistake in the text under analysis is
the misinterpretation of the grammatical case, shown in example (42). This mistake is
present because the machine misses the preposition for, which shows that the dative case must
be used.
(44) ST: ‘‘When the president of the United States is tried<...>’’ (U.S. Constitution,
Article I, Section 3)
TT: ‘‘Kai teisiamas Jungtinių Valstijų prezidentas<...>’’ (JAV Konstitucija, I
Straipsnis, 3 Skyrius)
MT: ‘‘Kai Jungtinių Valstijų prezidentas bus teistas<...>’’
Example (44) presents misalignments in verb form and the part of speech. As can
be seen, the machine renders the verb is in the future tense, even though it is used in the
present tense in the ST. This mistake becomes more conspicuous due to the wrong usage of
the part of speech: the machine translates the verb tried as a participle. What is more, the
participle is used in the past tense, which is also a crucial mistake, for such word
combinations are not employed in Lithuanian because the two tenses contradict each other.
(45) ST: ‘‘<...>and shall have the sole Power of Impeachment.’’ (U.S. Constitution,
Article I, Section 2)
TT: ‘‘<...>tiktai jie turi išimtinę teisę pradėti apkaltos procesą.’’ (JAV Konstitucija,
I Straipsnis, 2 Skyrius)
MT: ‘‘<...>ir turėsiu vienintelę Apkaltos Valdžią.’’
Misalignment in verb conjugation is illustrated in example (45). This mistake
appears because there is no pronoun to which the machine could relate the word have; in
addition, the system is not able to link this clause with the previous one and determine that both
have the same subject, which is The House of Representatives. Therefore, the
machine infers that someone is speaking in the first person and renders the word improperly.
The examples below depict the lexical mistakes found in the text under analysis.
(46) ST: ‘‘The House of Representatives shall be composed of Members chosen every
second Year by the People of the several States<...>’’ (U.S. Constitution, Article I,
Section 2)
TT: ‘‘Atstovų rūmus sudaro nariai, renkami kas dveji metai valstijų
gyventojų<...>’’ (JAV Konstitucija, I Straipsnis, 2 Skyrius)
MT: ‘‘Atstovų Rūmai turi būti sudaryti iš Narių, pasirinktų kiekvieni antri Metai
Žmonių kelių valstybių<...>’’
The most common mistake in the text is related to polysemy and is presented in
example (46). This mistake usually occurs while translating the word States, which is
translated as valstybė. The machine is probably programmed to translate the word States as
Valstija only in collocations such as the United States of America, and to use the first
meaning when the word occurs alone. This assumption is made because the same mistake is
present when we try to translate a word cluster in which the word should be translated as
Valstija (cf. The State of Alabama → Alabamos valstybė).
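The assumption made about example (46) can be sketched as a phrase table that is consulted before the single-word dictionary. The code below only illustrates that hypothesis: PHRASES, WORDS and translate_longest_match are invented names, the entries are limited to the items discussed above, and nothing is known about how the analyzed system actually stores its collocations.

```python
# Hypothetical phrase table: a multi-word unit translated as a whole.
PHRASES = {
    ("united", "states", "of", "america"): "Jungtinės Amerikos Valstijos",
}

# Hypothetical single-word dictionary holding only the first meaning.
WORDS = {
    "state": "valstybė",
    "states": "valstybės",
}


def translate_longest_match(tokens):
    """Prefer the longest phrase-table match, else translate word by word."""
    output = []
    longest = max(len(key) for key in PHRASES)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in PHRASES:
                output.append(PHRASES[key])
                i += n
                matched = True
                break
        if not matched:
            # Unknown words are copied through unchanged.
            output.append(WORDS.get(tokens[i].lower(), tokens[i]))
            i += 1
    return output


print(translate_longest_match(["United", "States", "of", "America"]))
print(translate_longest_match(["State", "of", "Alabama"]))
```

Under this assumption State of Alabama falls outside the phrase table, so the word-level entry valstybė is chosen, which is consistent with the output observed above.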
(47) ST: ‘‘The Times, Places and Manner of holding Elections for Senators and
Representatives, shall be prescribed in each State by the Legislature thereof<...>’’
(U.S. Constitution, Article I, Section 4)
TT: ‘‘Senatorių ir atstovų rinkimų laiką, vietą ir tvarką kiekvienoje valstijoje nustato
jos įstatymų leidžiamasis susirinkimas<...>’’ (JAV Konstitucija, I Straipsnis, 4
Skyrius)
MT: ‘‘The Times, Vietos ir Būdas surengti Rinkimus Senatoriams ir Atstovams,
turi būti nurodytas kiekvienoje valstybėje Įstatymų leidžiamojo organo jo<...>’’
A quite conspicuous mistake concerning untranslated words can be seen in example
(47). This mistake is interesting because we cannot explain why the machine leaves the words
The Times in English. The word times is included in the system’s dictionary, for it is
translated when entered alone. What is more, the word combination the Times is also
rendered correctly, but in the word combination presented in the above
example the machine does not translate it. Thus, we can assume there are some problems in
the way the machine works, and this mistake could even be categorized as a systemic
mistake.
(48) ST: ‘‘The House of Representatives shall chuse their Speaker and other Officers;
and shall have the sole Power of Impeachment.’’ (U.S. Constitution, Article I,
Section 2)
TT: ‘‘Atstovų rūmai renka savo spikerį ir kitus pareigūnus; tiktai jie turi išimtinę
teisę pradėti apkaltos procesą.’’ (JAV Konstitucija, I Straipsnis, 2 Skyrius)
MT: ‘‘Atstovų Rūmai turi būti chuse jų Kalbėtoją ir kitus Pareigūnus; ir turėsiu
vienintelę Apkaltos Valdžią.’’
The usage of archaic [10] words, such as chuse in example (48), also causes
misalignment in the MT output. Words of this kind are not incorporated in present-day
dictionaries; therefore, the machine does not recognize them and leaves them in their original
form. The misinterpretation of collocations, also shown in the example above, occurs while
translating the words Speaker and sole Power. Each of these is a specific term used in a
certain context, which is why their meanings should be checked more closely. However, the
machine is not able to perform such a task, so it renders the word Speaker by its first meaning,
kalbėtojas, and translates the collocation sole Power word for word
as vienintelė valdžia.
[10] Having the characteristics of the language of the past and surviving chiefly in specialized
uses. Merriam-Webster Online [Online] Available from:
http://www.merriam-webster.com/dictionary/archaic [Accessed on 10th April 2013].
The following example illustrates a part of the text which is translated
incomprehensibly.
(49) ST: ‘‘<...>but the Party convicted shall nevertheless be liable and subjected to
Indictment, Trial, Judgement and Punishment, according to Law.’’ (U.S.
Constitution, Article I, Section 3)
TT: ‘‘Tačiau šitaip nuteistas asmuo taip pat gali būti pagal įstatymą traukiamas
baudžiamojon atsakomybėn, teisiamas ir baudžiamas teismo nuosprendžiu.’’
(JAV Konstitucija, I Straipsnis, 3 Skyrius)
MT: ‘‘<...>bet Partija kaltino, vis dėlto būsiu atsakingas ir paveiksiu prie
Kaltinamojo akto, Teismo, Nuosprendžio ir Bausmės, pagal Įstatymą.’’
As in the belles-lettres and popular non-fiction texts, such a portion appears
because a number of grammatical and lexical mistakes occur at the same time. As can
be seen in the provided example, a short sentence contains the polysemous word Party, which is
misinterpreted; this leads to the wrong use of the part of speech and, furthermore, to
misalignment in verb form (cf. asmuo gali būti traukiamas → partija kaltino). What is more,
the same example contains the wrong verb conjugation, which makes the sentence completely
unclear.
What is more, one mistake present throughout all the examples taken from the
official document is the use of capital letters. All words written with capital letters
in English are transferred into Lithuanian with their capitals, even though the human
translation does not retain them. The translator probably thought these words did not have the same
significance in Lithuanian as they do in English and therefore wrote them with minuscule
letters. However, the machine is not able to make such assumptions; consequently, it transfers
the words as they are written in the ST. Nevertheless, we cannot state that this is a crucial mistake,
because for some people these words can have great significance, and writing them with
capital letters seems completely natural.
Summing up, we could state that the quality of the translation of the official document is below
average. Due to the numerous specific terms and archaic words used in the text, the machine is
incapable of performing a satisfactory translation. In addition, grammatical mistakes are also
evident in the selected text. The most common ones are misalignment in the grammatical
gender, number, case, verb forms, and verb conjugation. What is more, it should be noted that
the official document is the only text which contains errors concerning the
improper translation of verb conjugation.
It is also important to note that other crucial mistakes can be detected in the examples
provided in the chapters above. These errors are not discussed in detail due to the limited scope of
the present paper and because they are not mentioned in the theoretical review presented in
the current research. However, these mistakes are as important as the ones discussed here. They
include: 1. the translation of prepositions, which makes up a great percentage of all
mistakes found in all styles and greatly affects the quality of the MT output, and 2. word order in
the translated text, which also contributes greatly to the comprehensibility of the translation.
The mistake found in example (47), i.e. the case when the word cluster The Times is
left untranslated, also deserves more attention, because it is very interesting why a capital
letter has such a big influence on the machine’s work.
4.7. Statistical analysis of data
After translating 5 different texts from English into Lithuanian and analyzing them
thoroughly, 63 [11] instances of various mistakes were found. The statistical data on those
mistakes are presented below.
Firstly, the pie chart showing the percentage distribution (rounded to units) of all
mistakes found in all texts is presented.
Figure 3. The percentage distribution of all mistakes found in the selected texts.
[11] The number of total mistakes differs from the scope of the paper, because some sentences
contain more than one mistake.
Figure 3 shows that the most dominant mistakes throughout all the texts are the
grammatical errors, which account for 43% of all mistakes. The second most common
mistakes are the lexical ones, making up 40% of all misalignments. The systemic mistakes
make up 7% of all mistakes found in the selected texts. The miscellaneous mistakes make up
only 6%. From the data presented, we can assume that particular attention should be paid to
the algorithms dealing with Lithuanian grammar.
The next pie chart illustrates the percentage distribution (rounded to units) of all
grammatical mistakes found in the 5 different texts.
Figure 4. The percentage distribution of the grammatical mistakes found in the selected texts
Figure 4 reveals that the most common grammatical mistake is part of speech
misalignment, which makes up 26% of all grammatical errors. Mistakes in the
grammatical case and verb forms are the second most common type. These
mistakes account for 19% of all grammatical errors. Misalignment in the grammatical gender
and number is distributed evenly, at 15% each. The lowest percentage, only 4%, is attributed to
the mistakes dealing with verb conjugation and negative verbs.
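Because the shares are rounded to units, they need not add up to exactly 100. The short sketch below checks this for the grammatical categories. It rests on an assumption that is not stated explicitly in the paragraph above: that 19%, 15% and 4% each apply to every category in their pair. The dictionary GRAMMATICAL_SHARES and the function rounded_total_is_plausible are illustrative names only.

```python
# Reported shares in percent; applying the paired values to each category
# separately is an assumption made for this sketch.
GRAMMATICAL_SHARES = {
    "part of speech": 26,
    "grammatical case": 19,
    "verb forms": 19,
    "gender": 15,
    "number": 15,
    "verb conjugation": 4,
    "negative verbs": 4,
}


def rounded_total_is_plausible(shares):
    """Return the total and whether rounding alone can explain its gap to 100.

    Each value rounded to a whole percent is off by at most 0.5, so the
    total can differ from 100 by at most half the number of categories.
    """
    total = sum(shares.values())
    tolerance = len(shares) / 2
    return total, abs(total - 100) <= tolerance


print(rounded_total_is_plausible(GRAMMATICAL_SHARES))
```

Under this reading the seven shares add up to 102, which lies within the margin that rounding to units allows for seven categories.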
The statistical analysis of the collected data also yields the percentage distribution
(rounded to units) of the lexical mistakes. This distribution is illustrated in the pie chart
below.
Figure 5. The percentage distribution of the lexical mistakes found in the selected texts.
From Figure 5 it is evident that all texts contain a number of polysemous words, which
are a big obstacle for the machine. These kinds of mistakes account for 39% of all the
lexical mistakes. The texts are also rich in collocations, which the system is likewise unable to
translate. The translation of collocations makes up 17% of all lexical mistakes. However, not
many proper names and cultural realia are found in the selected texts, and these kinds of
mistakes make up only 4% each. No hyphenated words or abbreviations are found in the selected
texts. From the data presented we may assume that the dictionaries must be updated
constantly and new concepts should be included in order to reduce the percentage of lexical
mistakes.
Finally, a pie chart was drawn up to present the percentage distribution of the systemic
mistakes. The percentages are also rounded to units.
Figure 6. The percentage distribution of the systemic mistakes found in the selected texts.
Figure 6 reveals that the systemic mistakes are spread quite evenly. The biggest share,
32%, is attributed to words translated with a concept absent in the dictionary.
The other mistakes, i.e. the omission of a verb, the usage of capital and minuscule letters,
the omission of a word, and an extra word, make up 17% each. What is more, not
all systemic mistakes mentioned in the theoretical part were found in the selected texts.
The mistakes absent in the texts are words translated into another language and the ignoring of
diacritics. Therefore, we can conclude that the machine itself works quite well and makes
few systemic mistakes. However, an improvement is needed to avoid rendering words with
concepts which are not suitable for them.
CONCLUSIONS
The aim of this paper was to discuss the problems and issues the machine encounters
while translating texts of various genres from English into Lithuanian. After gathering a great
amount of theoretical material and analyzing 5 different texts, the following conclusions
have been drawn:
1. The most general term used in Modern English for the process of
translation done by the machine is machine translation (MT). Nevertheless,
despite the idea of fully automatic translation, in almost every case the output of MT
is edited by a human. What is more, there are different types of MT
systems: those designed for only one particular pair of languages, i.e. bilingual, and
those designed for a variety of language pairs, i.e. multilingual. Moreover,
systems can also differ in their approaches. Three main approaches are usually
distinguished: direct, transfer and interlingua.
2. After analyzing several classifications of the problems and issues of MT, it was found
that the most common mistakes are grammatical, lexical and systemic. Linguists
point out that grammatical mistakes are the chief mistakes in the output of MT.
3. Several different approaches towards text genres in the English language were
reviewed. Mostly 5 different genres are distinguished: 1) the
language of belles-lettres; 2) the language of publicistic style; 3) the language of
newspapers; 4) the language of scientific prose and 5) the language of official
documents. Each of these styles bears a different amount of pragmatic information,
ranging from an abundance of pragmatics to the least pragmatic texts respectively.
4. After analysing 5 different English texts, 63 mistakes were found in 49 instances.
When all errors were assembled into charts, it turned out that the biggest group of
mistakes was that of the grammatical misalignments, which made up 43% of all
errors. The second most common type of mistakes was the lexical one. Such
mistakes accounted for 40% of all misalignments. Systemic and miscellaneous
errors made up 7% and 6% of all mistakes respectively.
5. Statistical analysis revealed that the wrong usage of the part of speech, grammatical
case and verb forms were the most common mistakes in the category of
grammatical errors. Such mistakes made up 26% and 19% of all misalignments
respectively. The incorrect usage of grammatical number and gender was also
common in the selected texts. Each of these errors made up 15% of all mistakes,
whereas such misalignments as verb conjugation and negative verbs accounted
for only 4% of all errors each. After analyzing the lexical mistakes, it was found
that polysemy, collocations and untranslated words were the most common errors in
this group. These mistakes accounted for 39%, 17% and 16% of all misalignments
respectively. The other lexical mistakes were distributed in the following way: pronouns 12%,
sayings 8%, proper names 4% and cultural realia 4%. Among the systemic mistakes,
the most common errors were those involving words translated using
concepts absent in the dictionary. Such misalignments made up 32% of all
mistakes. The remaining errors, i.e. the omission of a verb, the spelling of capital and
minuscule letters, the omission of a word, and an extra word, contributed 17% each.
6. What is more, a great number of other mistakes, such as the translation of prepositions
and word order, were evident in the examples under analysis and greatly affected
the quality of the MT output. These mistakes were not analyzed in detail in the
current paper due to its limited scope and because they were excluded from the
classification of MT problems on which this research was based.
All in all, it was clear that the best quality text was that of the technical instructions, whereas
the worst was the belles-lettres text. Consequently, it can be stated that further improvement
is necessary if we desire to have a machine capable of producing high-quality texts of all
genres. This improvement should consist not only of updating the dictionaries installed in the
system; further research should also be conducted to obtain a better understanding of
how the machine itself works.
REFERENCES
1. (1960) The Book of Genesis. New York: Paulist Press.
2. Arnold, D., Balkan, L., Meijer, S., Humphreys, R. L., Sadler, L. (1994) Machine
Translation: An Introductory Guide. London: Blackwells-NCC.
3. Calude, A. S. (2004) Machine Translation of Various Text Genres. [Online] Available
from: http://www.calude.net/andreea/MT.pdf [Accessed on 11th December 2012].
4. Cvilikaitė, J. (2008) Leksinės Mašininio Vertimo Klaidos: Beekvivalenčių Žodžių
Vertimas. Filologija, (13), 27-38.
5. Daudaravičius, V. (2006) Pradžia į begalybę. Mašininis vertimas ir lietuvių
kalba. Darbai ir dienos, (45), 9-18.
6. DiMarco, Ch., Hirst, G. (1990) Accounting for Style in Machine Translation. [Online]
Available from: http://mt-archive.info/TMI-1990-DiMarco.pdf [Accessed on 11th
December 2012].
7. Galperin, I. R. (1981) Stylistics. 3rd edition. Moscow: Higher School.
8. Gudavičius, A. (2007) Gretinamoji Semantika. Šiauliai: Šiaulių Universiteto leidykla.
9. Hutchins, J. W., Somers, H. L. (1992) An Introduction to Machine Translation. [e-book]
London: Academic Press. Available from: http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
[Accessed on 5th October 2012].
10. Jurafsky, D., Martin, J. H. (2006) Speech and Language Processing: An Introduction
to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Pearson Prentice Hall.
11. Manion, S. L. (2009) Fluency Enhancement. Applications to Machine Translation.
MA thesis. Massey University.
12. Nida, E. A. (1964) Toward a Science of Translating: With Special Reference to
Principles and Procedures Involved in Bible Translating. The Netherlands: Brill
Archive.
13. Petkevičiūtė, I., Tamulynas, B. (2011) Computational Linguistics. Studies About
Languages, (18), 38-45.
14. Petrulionė, L. (2012) Translation of Culture-Specific Items from English into
Lithuanian: the Case of Joanne Harris’s Novels. Studies about Languages, (21), 43-49.
15. Proshina, Z. (2008) Theory of Translation (English and Russian). 3rd edition.
Vladivostok: Far Eastern University Press.
16. Riedel, M., Schwarze, T. (2001) Machine Translation: History, Theory, Problems and
Usage. In: Petkevičiūtė, I., Tamulynas, B. (2011) Computational Linguistics. Studies
About Languages, (18), 38-45.
17. Robinson, D. (2003) Becoming a Translator: An Introduction to the Theory and
Practice of Translation. London and New York: Routledge.
18. Valeika, L., Buitkienė, J. (2003) An Introductory Course in Theoretical English
Grammar. Vilnius: Vilnius Pedagogical University.
WEBSITES
1. Deffinbaugh, B. (2004) The Unity of Unbelief (Genesis 11: 1-9). [Online] Available
from: http://bible.org/seriespage/unity-unbelief-genesis-111-9 [Accessed on 20th
October 2012].
2. Gerber, L. (2009) Machine Translation: Ingredients for Productive and Stable MT
Deployments – Part 2. [Online] Available from:
http://www.translationdirectory.com/articles/article1945.php [Accessed on 26th
October 2012].
3. Jones, P. S. (2013) What is a Senior Fellow? [Online] Available from:
http://www.wisegeek.com/what-is-a-senior-fellow.htm [Accessed on 7th April 2013].
4. Karamanian, A. P. (2002) Translation and Culture. Translation Journal, [Online] 6
(1), Available from: http://www.bokorlang.com/journal/19culture2.htm [Accessed on
1st December 2012].
5. Lingytė, J. (2002) Tikriniai Sudėtiniai Įstaigų, Įmonių ir Organizacijų Pavadinimai.
Lietuvių Kalbos Taisyklių Sąvadas [Online] Available from:
http://siauliai.mok.lt/daukantas/darbai/Rasyba/Tikriniai_istaigu_pavadinimai.htm
[Accessed on 25th April 2013].
6. Marshall, P. (2012) Proper Nouns. K12 Reader. [Online] Available from:
http://www.k12reader.com/proper-nouns/ [Accessed on 25th April 2013].
7. Robin, A. (2009) Machine Translation – Overview. [Online] Available from:
http://language.worldofcomputing.net/machine-translation/machine-translationoverview.html [Accessed on 3rd November 2012].
8. Robin, A. (2010) Machine Translation Process. [Online] Available from:
http://language.worldofcomputing.net/machine-translation/machine-translationprocess.html [Accessed on 2nd November 2012].
9. Shapa, E. (2009) Translation Types. [Online] Available from:
http://www.slideshare.net/elenashapa/translation-types [Accessed on 23rd October 2012].
DICTIONARIES
1. (2005) Longman Dictionary of Contemporary English. Harlow: Pearson Education.
2. (2007) Mokomasis Anglų Kalbos Žodynas. Vilnius: Alma Littera.
3. Baravykas, V. (1961) Anglų-Lietuvių Kalbų Žodynas. 2nd edition. Vilnius: Valstybinė
Politinės ir Mokslinės Literatūros Leidykla.
4. Merriam-Webster Online [Online] Available from: http://www.merriam-webster.com/
[Accessed on 10th October 2012].
5. Oxford Dictionaries [Online] Available from: http://oxforddictionaries.com/
[Accessed on 10th October 2012].
6. Piesarskas, B. (2004) Dvitomis Anglų-Lietuvių Kalbų Žodynas. Vilnius: Alma Littera.
SOURCES
1. A System of Machine Translation from English to Lithuanian. (2008) [Online]
Available from: http://vertimas.vdu.lt/twsas/
2. (2009) Jungtinių Amerikos Valstijų konstitucija. Viena: Usia Regional Program
Office.
3. Constitution for the United States of America. [Online] Available from:
http://www.constitution.org/constit_.htm [Accessed on 9th April 2013].
4. Kakaes, K. (Monday 1st April 2013) Warp Factor. Popular Science. [Online]
Available from:
http://www.popsci.com/technology/article/2013-03/warp-factor?single-page-view=true
[Accessed on 5th April 2013].
5. McCurry, J. (Friday 5th April 2013) Kim Jong-un Has Made A Decent Fist Of
Rattling The US. The Guardian. [Online] Available from:
http://www.guardian.co.uk/world/2013/apr/05/kim-jong-un-rattles-us?INTCMP=SRCH
[Accessed on 6th April 2013].
6. Soundmax. Radijas Su Žadintuvu: Naudojimo instrukcija.
7. Soundmax. Alarm Clock Radio: Instruction Manual.
8. Wilde, O. (2001) Doriano Grėjaus Portretas. Vilnius: Alma Littera.
9. Wilde, O. (2003) The Picture Of Dorian Gray. London: Collectors Library.
ANNEX 1