Does Regularity Make Reading a Foreign Language Easier?
University of Groningen

Studying the power of entropy measures when predicting written mutual intelligibility among five Germanic languages

Anne Kingma
S2405792
March 6, 2015
Program: Language & Cognition
1st supervisor: Charlotte Gooskens
2nd supervisor: Wilbert Heeringa

Acknowledgments

First of all, I would like to thank my first supervisor, Charlotte Gooskens, for her help and guidance and for always being willing to discuss things. She carefully read everything I wrote, gave me both positive and negative feedback, discussed the results with me and suggested new directions to explore and literature to read. Secondly, I am grateful to my second supervisor, Wilbert Heeringa, who allowed me (and taught me how) to use his scripts for calculating the linguistic measures in this thesis. He always answered my questions quickly and helped me improve the technical and statistical parts of the thesis. I would like to thank Femke Swarte for her help and feedback and for allowing me to use some of the intelligibility data she collected for her PhD thesis, even though it has not been published yet. Without the results of her intelligibility experiments, this thesis would not have been as interesting. Finally, I would like to thank Mark Härtl for his help with the layout, the images and the graphs. I could never have made this thesis look so pretty myself.

Table of Contents

Acknowledgments
1. Introduction
2. Background
2.1 Intelligibility
2.2 Previous research on intelligibility
2.3 Measuring linguistic distance
2.4 The MICReLa project
3. Research questions
4. Languages
4.1 Overview
4.2 Germanic languages
4.2.1 North Germanic: Swedish and Danish
4.2.2 West Germanic: German and Dutch
4.2.3 West Germanic: English
4.3 Germanic orthography
4.3.1 English
4.3.2 Dutch
4.3.3 German
4.3.4 Danish
4.3.5 Swedish
4.3.6 Summary
4.3.7 The North Wind and the Sun
5. Data
6. Methods
6.1 Methods for measuring linguistic distance
6.1.1 Lexical distance
6.1.2 Orthographic Levenshtein distance
6.1.3 Measuring conditional entropy
6.2 Measuring intelligibility
6.2.1 Participants
6.2.1 Cloze test
6.2.2 Word translation task
7. Results
7.1 Linguistic measures
7.1.1 Lexical distance
7.1.2 Orthographic Levenshtein distance
7.1.3 Entropy measures
7.1.4 Correlations of the different linguistic measures
7.2 Using linguistic measures to predict intelligibility
7.2.1 Cloze test
7.2.2 Word translation task
8. Discussion
8.1 Linguistic measures
8.2 Research questions
8.3 Future research
9. Conclusion
10. References
Etymological dictionaries
Appendix
Appendix A: Excluded Words
Appendix B: Word List

1. Introduction

The Scandinavian languages Swedish, Danish and Norwegian belong to the North Germanic branch of the Indo-European language family. They are so similar to each other that their speakers can (and do), to some extent, communicate with each other while each using their own language: receptive multilingualism. The success of this type of communication depends on the level of mutual intelligibility between the languages, that is, on how well each speaker can understand the other's language. Three factors are thought to determine this (Gooskens 2007a:446):

1. The listener’s attitude towards the speaker’s language
2. The listener’s contact with the speaker’s language and other language experience
3. Linguistic distance of the speaker’s language to the listener’s language

In this thesis, I focus on the third of these factors: linguistic distance. The linguistic relations among five Germanic languages (English, Dutch, German, Danish and Swedish) are calculated at the orthographic level in three ways: lexical distance, Levenshtein distance (Heeringa 2004) and conditional entropy (Moberg et al. 2007). These results are correlated with the results of two written intelligibility tasks, a cloze test and a word translation task, carried out by Femke Swarte as part of the MICReLa project of the University of Groningen (see e.g. Heeringa et al. 2013).

The main purpose of this study is to determine the added value of entropy calculations over lexical and Levenshtein distance. Unlike those two distances, conditional entropy between two languages is inherently asymmetrical. It could therefore be a useful way to capture the asymmetry found in mutual intelligibility (Moberg et al. 2007), for example the asymmetry between Swedish and Danish (Schüppert 2011), which is discussed further in Chapter 2.

Chapter 2 provides the theoretical background of the topic, leading to the research questions in Chapter 3. Chapter 4 gives some background on the languages included in this research and the relations among them. Chapter 5 describes how the word lists used for the linguistic measures were built, and Chapter 6 explains the procedures for these measures and for the intelligibility experiments. Chapter 7 presents the results, Chapter 8 discusses them and Chapter 9 concludes the thesis.

2. Background

The first part of this chapter (sections 2.1 and 2.2) outlines the background and history of intelligibility research.
The second part (section 2.3) elaborates on the issue of measuring linguistic distance. Finally, section 2.4 describes a recent project on linguistic factors influencing intelligibility: the MICReLa project.

2.1 Intelligibility

When two people with different native languages want to communicate, there are several ways in which they can go about this. Firstly, one of the speakers can learn to speak the other’s native language. This commonly happens when one language is clearly dominant over the other. In individual cases, this could be an immigrant learning the language of his or her new country of residence. The many advantages that learning this new language brings are easy to see, and they make it worth the effort it requires. This strategy is also used more systematically, however, in the case of minority languages. There are many such situations all over the world: Gaelic languages in Ireland and Great Britain, Frisian in the Netherlands, Swedish in Finland, Catalan in Spain. The native speakers of the dominant language do not bother to learn the minority language, forcing the speakers of the minority language to learn the dominant language in order to advance in life. This situation is somewhat unequal, as only one of the speakers has to invest time and effort in learning another language, whereas the other can enjoy the ease of expressing their thoughts in their native language. Moreover, it can contribute to the endangerment of the minority language. The ultimate goal of successful communication is reached, however.

A second possibility is for both speakers to learn a third language, native to neither of them. This frequently happens with lingua francas, like English, Latin, Hindi and Modern Standard Arabic.[1] It could be argued that certain dialect situations fall into this category as well, depending on whether learning the standard language is considered learning a new language.
The advantage of this strategy is that both speakers are equal: they both have to learn a new language and they both have to struggle with using a language that is not native to them. At the same time, of course, this is also its major disadvantage. More people having to learn a new language costs more time, effort and money. More languages run the risk of being endangered, as in the previous situation. Also, the risk of miscommunication is higher when neither speaker is able to use their native language. Moreover, even in this situation some speakers might have an advantage over others, for example if their native language is close to the lingua franca or if they have an aptitude for language learning.

[1] Of course, in many situations involving a lingua franca, one of the speakers is in fact a native speaker of the language in question. These situations belong to the first category.

There is a third option, however: receptive multilingualism. In this case, each speaker simply speaks his or her own language, so that both languages are used simultaneously in the conversation. The great advantage of this strategy is that both speakers have the comfort and ease of being able to express themselves in their native language. Neither of them has to invest the time and effort required to learn to speak another language: they only need to learn to understand it. If the languages are closely related, even this might be barely necessary or not necessary at all; in that case, the languages are inherently mutually intelligible.

There are several known situations in which this strategy is applied. Serbian and Croatian, for example, are very similar to each other and for the most part mutually intelligible; the distinction between them is more political than linguistic. Another example can be found in Scandinavia. The three main languages spoken there, Norwegian, Swedish and Danish, are so similar to each other that speakers can understand the other languages quite well without any previous experience or training. When they need to communicate with a speaker of one of the other languages, rather than resorting to a third language as many other Europeans in that situation would, they often use the strategy of receptive multilingualism.

What factors contribute to making receptive multilingualism possible? Gooskens (2007a:446) mentions three factors that could contribute to the level of mutual intelligibility:

1. The listener’s attitude towards the speaker’s language
2. The listener’s contact with the speaker’s language and other language experience
3. Linguistic distance of the speaker’s language to the listener’s language

The first factor, attitude, refers to a listener’s opinion of or feeling towards a certain language or language variety. If a listener dislikes a language and/or its speakers, he will probably be less willing to put effort into trying to understand it, which might result in less successful communication. If the listener likes the language variety he is listening to, however, he might understand more simply by trying harder. Schüppert, Hilton and Gooskens (accepted), for example, find a low but significant positive correlation (r = .19) between attitude and word intelligibility for Danish and Swedish.

The second factor, contact, concerns previous experience the listener has with the speaker’s language and with other, possibly related, languages. Having learned to speak a language naturally improves a person’s ability to understand it. But even if there has only been passive contact, for example through listening to radio programmes or hearing tourists speak, the listener might start to recognize certain sound correspondences. In addition, knowing other languages can aid understanding, for example by providing vocabulary.
If a native speaker of Dutch encounters Danish for the first time, experience with a related language like German will improve his understanding of this new language (Swarte, Schüppert and Gooskens, accepted). The Danish word kartoffel ‘potato’, for instance, does not have a cognate in Dutch (the Dutch translation is aardappel). A Dutch reader who is familiar with German, however, will immediately recognize the German word (Kartoffel) and translate it into Dutch correctly.

The third factor, linguistic distance, is independent of the specific speaker and listener involved; it refers only to how distant the two languages are from each other. What exactly this means and how it can be measured has a history of its own, which is outlined in section 2.3. First, I will discuss the history of intelligibility research in general.

2.2 Previous research on intelligibility

Research on intelligibility amongst the Scandinavian languages has a long history, which Schüppert (2011) summarizes in her introduction. One of the first studies on this topic was carried out by Haugen and published in 1953 (in Norwegian; published in English as Haugen 1966). He sent out a questionnaire in Norway, Sweden and Denmark asking people first and foremost about their personal experiences with receptive multilingualism: had they ever communicated with people speaking one of the other two languages, how well had they understood each other, and which problems had occurred? These were very important questions, as no research into this had been done for the Scandinavian languages up to that point. Furthermore, the questionnaire asked about the informants’ opinions of the other languages (i.e. attitude) and their amount of contact with them: “He was further asked to indicate the approximate amount of instruction he had received in the languages, how much he read in each, whether he enjoyed inter-Scandinavian radio programs, and whether he listened to broadcasts from the neighbouring countries.” (Haugen 1966:283) He found that of these three languages, Norwegian and Swedish seemed to be the most intelligible to each other, and that the combination of Swedish and Danish was the most problematic.

Haugen’s (1966) study was based on a questionnaire, so all of the data were indirect: they consisted of the respondents’ reports of their personal experiences. That is, he used the method of ‘asking the informant’ (Voegelin and Harris, 1951): asking speakers how well they think they can understand a certain language variety, or how well they think they can be understood by speakers of a certain language variety. A more extensive version of essentially the same method is to present informants with a sample of the particular language variety, instead of naming it. This helps them focus on the actual linguistic information rather than on the nonlinguistic connotations they may have with the variety. Tang and Van Heuven (2009) observe that "[l]isteners appear to have reliable (i.e. reproducible) ideas about how much language B differs from their own, even if they know the stimulus language from past exposure, and even if the recording quality of the speech samples may differ substantially" (p. 710). However, these judgments do not necessarily match actual intelligibility.

A more direct way of determining intelligibility between two language varieties is ‘testing the informant’ (Voegelin and Harris, 1951): having informants listen to a certain language variety and determining how much they actually understand, for example by asking questions about the text or asking them to translate parts of it.
This gives more objective results than asking informants for their perception of a variety, but it is more complicated to carry out: instead of asking questions, an experiment needs to be designed. In addition, as Tang and Van Heuven (2009:711) point out, the number of speaker-listener combinations to be tested grows rapidly with the number of language varieties included: n varieties yield n(n − 1) combinations. We can ask one informant the same questions about different varieties, but we cannot ask him to translate the same text in different varieties, because of obvious priming effects. Tang and Van Heuven (2009) correlate their ‘test the informant’ results with the ‘ask the informant’ results from their previous study (Tang and Van Heuven 2007), and although they find reasonable correlations (between .74 and .82), they conclude that objective intelligibility testing cannot be completely replaced by asking for people’s opinions.

Maurud (1976) carried out such an experimental study of the same three Scandinavian languages: Danish, Swedish and Norwegian. He had informants who lived in the capital cities (Copenhagen, Stockholm and Oslo, respectively) translate texts from both other languages into their own language. His results agreed with Haugen’s (1966) in that the combination of Swedish and Norwegian was the most successful and the combination of Swedish and Danish the most problematic, but unlike in Haugen’s study, the scores were not symmetrical. This was especially true for the spoken texts: Swedes understood much less of Danish (about 23%) than Danes did of Swedish (43%; Schüppert, 2011).
Maurud himself seems to attribute this result to non-linguistic factors: “Swedes’ low understanding of the neighbour languages is a sign that the habit of hearing them and the attitude towards the need for understanding them are of major importance for the Scandinavians’ ability to communicate with each other in their respective languages.” (Maurud 1976:71, translated in Schüppert 2011:5)

One problem with Maurud’s (1976) study is that all his informants lived in the capital cities of each country (Schüppert 2011). The capital of Denmark, Copenhagen, is very close to Sweden. There is likely to be some contact between Danes and Swedes there, and the people living in that region have access to TV and radio programmes in Swedish. Sweden’s capital Stockholm, on the other hand, is quite far away from both Denmark and Norway. The advantage that Danes seem to have over Swedes, then, could simply be due to the geographical location of the participants in this study and the amount of contact with the other languages that this implies. Bø (1978) addressed this issue by testing two groups of informants from each country: a group of people who lived in the border region, and a group of people who lived further inland. The people living in the border region did indeed perform better in the intelligibility tasks, indicating that Maurud’s (1976) results should be interpreted with care. Nevertheless, the asymmetry between Danish and Swedish persisted.

Although this research established that both previous contact with and the listener’s attitude to the speaker’s language influence the level of intelligibility, it was still unclear to what extent intelligibility is determined by purely linguistic factors. An attempt to fill this gap was made by Gooskens (2007a). She used the results from an extensive set of intelligibility experiments carried out some years earlier (Delsing and Lundin Åkesson 2005).
That study included only background questions on attitude and contact (factors 1 and 2 of the list in section 2.1); no attempt was made to study the influence of linguistic factors (factor 3). Gooskens correlated its results with objective measures of linguistic distance, both lexical and phonetic, and found that phonetic distance was indeed the best predictor of intelligibility between Swedish, Danish and Norwegian (r = -.80; the negative sign means that intelligibility decreases as phonetic distance increases).

Probably part of the reason why linguistic factors have been neglected in intelligibility research is that objectively measuring the distance between two languages, as Gooskens (2007a) did, is not a straightforward matter. The next section elaborates on this issue.

2.3 Measuring linguistic distance

The history of measuring linguistic distance is closely tied to the history of dialectology. This is not surprising: a core issue in dialectology is determining how different dialects are related and should be grouped together, and at which point two varieties should no longer be considered dialects of the same language, but two different languages altogether. To do that, the researcher needs to decide which criteria distinguish a language from a dialect. An initial approach, taken for example by Haugen (1966) as discussed in the previous section, was to simply ask speakers how different they thought a certain variety was from their own. This merely shifts the criteria problem, however: there is no way to know what the informants base their answers on, for example to what extent non-linguistic factors, such as a general dislike of speakers of a certain dialect, influence their answers. Haugen’s questionnaire measured overall intelligibility, not specific linguistic distance. A more objective way of determining distance between dialects was needed.
An early criterion for determining language distance was in fact intelligibility itself: if two people can understand each other, they must be speaking the same language (see e.g. Voegelin and Harris, 1951). Dialect continua pose a problem for this strategy. Going from the west of the Netherlands to the east of Germany, for example, every dialect is mutually intelligible with its neighbouring dialects. Does this mean that Dutch and German are one language, even though the dialects at the two ends of the continuum are completely unintelligible to one another? Going from north to south, the situation seems similar: a speaker of a West Flemish dialect (spoken in Belgium, in the south of the Dutch language area) will have a hard time communicating with a speaker of a Groningen dialect (spoken in the north) without switching to a standard language. Yet both varieties are considered dialects of the same language: Dutch. As Wolff (1959) pointed out, intelligibility is not a reliable measure of linguistic distance, because too many other factors play a role. Some of these are attitude and previous contact, as also mentioned by Gooskens (2007a; see section 2.1). Languages are not isolated things; they are used in the context of a certain culture. An objective, computational method would be a more reliable way to measure purely linguistic distances. The problem with this is formulated by Tang and Van Heuven (2009) as follows: “In spite of its apparent success and conceptual simplicity, the notion of linguistic distance, i.e. the inverse of similarity shared between languages, has persistently eluded quantification. The problem is that languages do not differ along just one dimension. Languages may differ formally in their lexicon, phonetics and phonology, morphology, and in their syntax.
And again, at each of these linguistic levels, the ways in which languages may vary are further subdivided along many different parameters.” (Tang and Van Heuven 2009:710)

A group of dialects (a dialect continuum) is characterized by many small changes from one dialect to the next (Heeringa 2004). When looking at the dialects up close, it is often not very clear which differences should be given the most weight when classifying them. A commonly used method is to draw lines on a map representing the border between two particular realizations of a linguistic item: isoglosses. A group of isoglosses together forms a bundle and signals a possible dialect border (after all, the varieties on either side of the bundle differ on several points). A problem with this is that isoglosses do not always group together nicely into bundles. And even if they do, it is not always clear when a bundle should be considered a border between dialects. Chambers and Trudgill (1980) put the problem as follows: “It is undeniable that some isoglosses are of greater significance than others (…). It is equally obvious that some bundles are more significant than others (…). Yet, in the entire history of dialectology, no one has succeeded in successfully devising a satisfactory procedure or a set of principles to determine which isoglosses or which bundles would outrank some others. The lack of a theory or even a heuristic that would make this possible constitutes a notable weakness in dialect geography.” (Chambers and Trudgill 1980:112, quoted in Tang and Van Heuven 2009:710)

Two languages being similar in one respect does not entail their being close to each other at the other levels. Moreover, both the way of measuring closeness and the criterion for it differ from level to level. We need to determine, then, at which level(s) the distance should be measured, and how to measure it.
A basic way to measure distance is at the lexical level: counting the number of cognates two languages share (Séguy 1973). This approach was often used to determine how languages are related to each other, and it is the methodology behind the Swadesh list (Swadesh, 1971): a list of 100 basic words that are unlikely to be borrowed from other languages. When measuring objective distance, the words must be chosen randomly, but the principle is the same. This approach has been used in intelligibility research, for example by Gooskens (2007a). Lexical distance has been shown to correlate with intelligibility, but it is not a very reliable predictor. When trying to understand a different language, it matters not only how many words differ from those in your own language, but also which words differ. If a few keywords of a text are unintelligible to the listener, he will not be able to understand the text as a whole, even if most of the function words are clear to him (Gooskens, Heeringa and Beijering 2008).

Heeringa (2004) presents a history of computational methods used in dialectology. According to him, the first to use a computational strategy to determine dialect distance were Séguy and his associates, in creating their Atlas linguistique de la Gascogne (published in six volumes between 1954 and 1973). They mapped many different features of the dialects spoken in Gascony and calculated distances by counting the number of items on which two neighbouring dialects disagreed. These items were taken from all linguistic levels: lexicon, pronunciation, phonology, morphology and syntax. The higher the percentage of differing items, the more distant the two dialects are. When these distances are visualized on a map, separate dialect areas can be distinguished. Goebl (1982, 1993) took an approach similar to Séguy’s (although developed independently; Heeringa 2004), comparing individual items across dialects.
He did not count the items that differed, however, but the items that were the same. His scores therefore do not reflect dialect distance but its opposite, dialect similarity. Hoppenbrouwers and Hoppenbrouwers (1988, 2001) developed the corpus frequency method in order to calculate dialect distances (Heeringa 2002, 2004). In essence, this method compares two languages on the basis of text corpora. It started with the letter frequency method, in which the frequencies of individual letters in the corpora are compared. An issue with this method is that different languages' orthographies do not represent those languages in the same way: the same sound can be spelled in different ways, or the same spelling used for different sounds. A more accurate comparison, then, would be on the phone level: the phone frequency method. This is essentially the same as the letter frequency method, but instead of text corpora, phonetic transcriptions of texts are compared. This method still has a disadvantage, however: it gives every difference the same weight. Yet some sounds are obviously closer to each other than others: the difference between [e] and [ɪ] is much smaller than that between [e] and [u], for example. The phone frequency method does not take this into account (Heeringa 2004). A more refined version of this method, then, is the feature frequency method. It breaks the individual phones down into phonological features (front/back, rounded/unrounded, plosive/fricative, et cetera). Calculating the frequencies of these features in texts in the different languages results in a more reliable measure of dialect distance. Using this method, Hoppenbrouwers and Hoppenbrouwers (2001) mapped and classified 156 varieties of Dutch as spoken in the Netherlands and Belgium. A disadvantage of the feature frequency method is that it does not take the order of speech segments into account (Heeringa 2004).
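The core idea of the letter frequency method can be sketched in a few lines: each text is reduced to a profile of relative letter frequencies, and two profiles are compared. This is a minimal sketch only. The city-block (sum of absolute differences) comparison is an assumption chosen for illustration, and the actual distance function used by Hoppenbrouwers and Hoppenbrouwers may differ. The function names are hypothetical. The same code would serve the phone or feature frequency methods if the strings were replaced by sequences of phones or features. It also makes the order problem visible: English wart and Dutch wrat get identical profiles.

```python
from collections import Counter


def letter_profile(text: str) -> dict[str, float]:
    """Relative frequency of each letter in `text` (case-insensitive)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def profile_distance(text_a: str, text_b: str) -> float:
    """City-block distance between two letter-frequency profiles (assumed metric).

    0.0 means identical profiles; for two non-empty texts, 2.0 means
    they share no letters at all.
    """
    pa, pb = letter_profile(text_a), letter_profile(text_b)
    letters = set(pa) | set(pb)
    return sum(abs(pa.get(ch, 0.0) - pb.get(ch, 0.0)) for ch in letters)


# English "wart" vs Dutch "wrat": same letters, different order.
print(profile_distance("wart", "wrat"))  # 0.0
```

Because only frequencies are compared, any reordering of the same letters (or features) yields a distance of zero, which is exactly the weakness noted above.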
If two corresponding words in two languages contain exactly the same features but in a different order, the feature frequency method will not register this difference. A simplified example, using letters instead of features, is English wart and its Dutch translation wrat, or Dutch drie ‘three’ and its German equivalent drei. Kessler (1995) introduced a more accurate method to measure dialect distances: the Levenshtein distance. Heeringa (2004) refined Kessler's method and applied it to Norwegian and Dutch dialects. It works by aligning corresponding words of both languages and counting how many individual elements (e.g. phonemes or graphemes) need to be substituted, deleted or inserted to get from the word in one language to the word in the other. The method is described in more detail in Chapter 6. The Levenshtein algorithm has been used to measure distance in intelligibility research (among others by Gooskens (2007a), described above) and has been shown to be an accurate measure of linguistic distance. In many cases, it predicted intelligibility better than lexical distance did (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van Bezooijen 2008). The Levenshtein algorithm thus seems to be the best method of measuring language distance available so far. It has one disadvantage, however: it is symmetrical. It does not take into account which of the two languages is the speaker's language and which is the listener's; it simply measures the objective distance between two languages. Asymmetry between languages has, however, clearly been established in past research. Spoken Danish, for example, is harder for Swedes to understand than spoken Swedish is for Danes (Maurud 1976; Bø 1978; Börestam 1987; Delsing and Lundin Åkesson 2005; Gooskens et al. 2010; Schüppert 2011; Gooskens and Van Bezooijen 2013).
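As a minimal sketch, the plain Levenshtein distance with unit costs on characters can be computed with the standard dynamic-programming recurrence. Heeringa's refined method (Chapter 6) may differ in its costs, alignment and normalisation. Dividing by the length of the longer word, as in the hypothetical `normalized_levenshtein` helper below, is an assumption made only for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each costing 1) that turn `source` into `target`."""
    # prev[j] = distance between the source prefix processed so far and target[:j];
    # before any source character is read, that is j insertions.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_char != t_char)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def normalized_levenshtein(source: str, target: str) -> float:
    """Distance divided by the length of the longer word (assumed normalisation)."""
    longest = max(len(source), len(target))
    return levenshtein(source, target) / longest if longest else 0.0


print(levenshtein("wart", "wrat"))                             # 2
print(levenshtein("dun", "dood"), levenshtein("dood", "dun"))  # 3 3
```

The two calls illustrate the points made above. Reordered letters do count as differences (wart/wrat needs two substitutions), whereas a frequency-based comparison sees no difference between these words. The distance is also identical in both directions, so on its own it cannot reflect asymmetric intelligibility.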
Gooskens, Van Bezooijen and Van Heuven (accepted) show a similar asymmetry between German and Dutch: Dutch is harder for Germans to understand than German is for Dutch listeners (while controlling for non-linguistic factors such as previous contact, which would otherwise be the more likely cause of asymmetry). The existence of asymmetry, even when all non-linguistic factors have been accounted for, indicates that the Levenshtein distance cannot be the only factor explaining the level of intelligibility. There has to be something that explains the difference, something that takes into account the direction of the communication. Moberg et al. (2007) attempted to explain asymmetrical intelligibility by measuring the amount of entropy in each language combination. As the languages involved in these intelligibility studies are related and share a history, the differences between them are not completely random; there is a certain regularity to them. Because sound changes tend to be regular, a certain sound in one language can systematically correspond to a certain different sound in another language. This systematicity can help the listener understand the language. The entropy calculations are a way to measure this regularity: given a certain sound (or character) in language A, how predictable is the corresponding sound (or character) in language B? The more predictable this sound is, the lower the entropy. Higher predictability aids intelligibility; therefore the hypothesis is that a low entropy measure corresponds to a high intelligibility score. Moberg et al. (2007) calculated phonetic entropies between Danish, Swedish and Norwegian and generally found relatively low entropy for combinations where previous research had found high intelligibility, and vice versa, supporting the hypothesis. Moreover, they found asymmetric entropy between Danish and Swedish, where asymmetric intelligibility is well established.
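The regularity described above is usually formalised as Shannon's conditional entropy. The block below is a hedged sketch, assuming Moberg et al. use the standard definition over aligned correspondences (their alignment procedure is not reproduced here). S stands for a character in the language being read and R for the corresponding character in the reader's own language; both symbols are introduced here for illustration.

```latex
\begin{aligned}
H(R \mid S) &= -\sum_{s}\sum_{r} p(s, r)\,\log_2 p(r \mid s)
             = \sum_{s} p(s)\, H(R \mid S = s) \\
H(R \mid S) - H(S \mid R) &= H(R) - H(S)
\end{aligned}
```

The first line is zero exactly when every character s has a single correspondent r. The second line follows from H(R | S) = H(R, S) - H(S). It shows why the measure can be asymmetric: the two directions differ whenever the character distributions of the two languages have different entropies.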
Their study did not include enough languages to calculate correlations, however. Tang and Van Heuven (2009) used a calculation similar to conditional entropy (calling it a phonological correspondence index; Cheng 1997) and found that it correlated well with the results from their intelligibility experiments on 15 Chinese dialects (r = .772 and r = .769). One of the strengths of the entropy measure is that it is naturally asymmetrical. I will demonstrate this using the correspondences between German and Dutch in Table 1. In this particular set, there is no entropy for the vowels. In other words, German <ü> always corresponds to Dutch <u>, German <o> always corresponds to Dutch <oo> and German <u> always corresponds to Dutch <oe>, and vice versa. A speaker of either language reading the words in the other language would in theory have no doubt about which sound to look for in his or her own vocabulary.2 Looking at the initial consonants, however, a different story unfolds. German <d> always corresponds to Dutch <d>, German <t> always corresponds to Dutch <d>, and German <z> always corresponds to Dutch <t>. So far so good: Dutch people reading German can predict the sounds in their own language with 100% certainty. The other way around, however, this is not the case. Dutch <d> corresponds to German <d> in 50% of the cases and to German <t> in the other 50%. A German reader encountering a word containing <d> in a Dutch text cannot be sure which character to map it onto in his or her own language. There is thus some entropy in the direction from Dutch to German, but none from German to Dutch.

Table 1: A mini corpus consisting of three words in three languages.
German    dünn   tot    zu
Dutch     dun    dood   toe
English   thin   dead   to

Entropy is thus naturally an asymmetrical measure. This is an advantage compared to the Levenshtein distance, which is completely symmetrical.
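The Table 1 example can be checked numerically. The sketch below averages, over the characters of the language being read (weighted by their frequency), the uncertainty about the corresponding character in the reader's own language. It assumes the correspondences have already been aligned character by character; here these are only the three initial consonants from Table 1. Moberg et al. work with full aligned corpora, and their alignment and any weighting are not reproduced here. The function and variable names are illustrative.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs: list[tuple[str, str]]) -> float:
    """H(reader's character | character read), in bits.

    Each pair is (character in the language being read,
    corresponding character in the reader's own language).
    """
    total = len(pairs)
    targets_by_source = defaultdict(Counter)
    for source, target in pairs:
        targets_by_source[source][target] += 1
    entropy = 0.0
    for targets in targets_by_source.values():
        n_source = sum(targets.values())
        for count in targets.values():
            p_joint = count / total    # p(source, target)
            p_cond = count / n_source  # p(target | source)
            entropy -= p_joint * log2(p_cond)
    return entropy


# Initial consonants of the Table 1 words, as (German, Dutch) pairs.
german_dutch = [("d", "d"), ("t", "d"), ("z", "t")]
dutch_german = [(nl, de) for de, nl in german_dutch]

print(conditional_entropy(german_dutch))  # Dutch reader of German: 0.0
print(conditional_entropy(dutch_german))  # German reader of Dutch: about 0.667
```

The result is 0 bits in one direction and 2/3 bit in the other: Dutch <d> is ambiguous (1 bit) and occurs in two of the three pairs. This mirrors the asymmetry described above.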
As explained above, Levenshtein distance has already been shown to be a good predictor of intelligibility. However, because the entropy measure, like intelligibility, is asymmetrical, it might add predictive power beyond that of the Levenshtein distance. The main purpose of this thesis is to find evidence for this.

2 In order to be aware of this, the reader needs to have some prior experience with the other language: if he has never encountered it before, he cannot know that, for example, <u> corresponds to <oe>.

2.4 The MICReLa project
MICReLa stands for Mutual intelligibility of closely related languages. It is an extensive project at the Center for Language and Cognition Groningen at the University of Groningen, funded by the Netherlands Organization for Scientific Research (NWO). This thesis originated in this project and draws on its materials and preliminary results. In this section, the project is described in order to show how this thesis fits into the bigger picture. For more information, see the Micrela project description3 and Heeringa et al. (2013). The project started in 2011 and is scheduled to last five years, until 2016. The project leader is Charlotte Gooskens. The project originated from the intelligibility research described in section 2.2 (such as Gooskens 2007a). This research focused mostly on Scandinavian languages and showed promising results for these languages. In the Micrela project, the research is extended to the three major groups of closely related languages in Europe: Germanic, Romance and Slavic languages. The main aim is to “develop a model of intelligibility of closely related languages” (Micrela project description, p. 6). This thesis research takes place exclusively within the Germanic languages group. One of the things this project focuses on is how to explain the asymmetrical mutual intelligibility found by previous research.
One of the research questions is: “What explanations can be found for asymmetric intelligibility?” (Micrela project description, p. 7). This thesis hopes to contribute to answering this question by determining the effect of entropy. The Germanic part of the project includes five languages, divided over the two main sub-branches within the Germanic family: English, Dutch and German as West Germanic languages, and Danish and Swedish as North Germanic languages. Intelligibility is tested by means of three experiments: a word translation task, a cloze test and a picture task. An effort is made to obtain data for every language combination in both directions. The methodology of the experiments included in this thesis, the word translation task and the cloze test for the Germanic languages, can be found in Chapter 6.

3 http://www.let.rug.nl/gooskens/project/pdf/Gooskens_Vrije_Competitie.pdf

3. Research questions
The history described in Chapter 2 has led to the first research question:
Are orthographic entropy measures a useful predictor of written intelligibility in addition to Levenshtein distance?
As is clear from this question, this thesis is concerned with written intelligibility only, and correspondingly with the orthographic distances between the languages (as opposed to distances based on phonetic transcriptions of the words). Five Germanic languages are included: English, Dutch, German, Danish and Swedish. Levenshtein distance has in previous research been shown to be a reliable predictor of intelligibility (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Gooskens, Heeringa and Beijering 2008; Kürschner, Gooskens and Van Bezooijen 2008). However, it does not automatically capture the asymmetry present in mutual intelligibility situations (Maurud 1976; Bø 1978; Börestam 1987; Delsing and Lundin Åkesson 2005; Gooskens et al. 2010; Gooskens and Van Bezooijen 2013). Entropy calculations (Moberg et al. 2007), however, are asymmetrical by default. Therefore, they are expected to contribute to predicting intelligibility in combination with the Levenshtein distances. This correlation should be negative: the lower the entropy for a certain combination, the higher the intelligibility score. If the entropy in a certain language combination is high, the orthographic correspondences between the two languages are irregular and unpredictable. This is likely to make it harder for the reader to decipher the language, as he cannot rely on regular correspondences. Therefore, intelligibility is expected to be lower when entropy is high. The results from Moberg et al. (2007) suggest that this hypothesis is true, but as their study included only three languages, no correlation between entropy and intelligibility could be calculated. In this study, five languages are included: not only the Scandinavian languages Danish and Swedish, but also the West Germanic languages German, Dutch and English.
Can lexical distance accurately predict written intelligibility?
This second question focuses only on the relationship between lexical distance and intelligibility. When the lexical distance between two language varieties is high, they share relatively few cognates. Non-cognates are incomprehensible to a reader who has not learned the language in question, so a high number of non-cognates means low intelligibility. A negative correlation between lexical distance and intelligibility is therefore expected. Previous research has often found this negative correlation: the higher the lexical distance between two language varieties, the lower the intelligibility. Tang and Van Heuven (2009), for example, found correlations of .78 and .75 for 15 Chinese dialects, and Gooskens, Heeringa and Beijering (2008) found a correlation of -.64 for 18 Scandinavian language varieties.
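The research questions predict negative correlations between each distance measure and intelligibility. As a hedged illustration, the sketch below assumes these are ordinary Pearson coefficients computed over language combinations and calculates r from paired values. The numbers are invented and are not results from this or any other study, and `pearson_r` is a hypothetical helper. From Python 3.10 on, `statistics.correlation` returns the same coefficient.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented numbers, one value per language combination (NOT data from this study):
lexical_distance = [5.0, 10.0, 20.0, 30.0]  # % non-cognates
intelligibility = [90.0, 80.0, 60.0, 40.0]  # % correctly translated words
print(pearson_r(lexical_distance, intelligibility))  # -1.0: the points lie on a falling line
```

A negative r means that higher distance goes with lower intelligibility, which is the direction hypothesised for lexical distance, Levenshtein distance and entropy alike.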
In Gooskens (2007a), investigating six Germanic languages, the correlation with lexical distance was not significant (p = .11), but the tendency was in the same direction. The results of the present study are expected to be in line with previous research and to show a negative correlation between lexical distance and intelligibility.
Can orthographic Levenshtein distance accurately predict written intelligibility?
Levenshtein distance is a way to measure the amount of difference between two languages. As with lexical distance, a negative correlation is expected: a greater orthographic distance between two languages should hamper written intelligibility. This is in fact what has been found in previous research, and the correlation of Levenshtein distance with intelligibility has often been stronger than that of lexical distance with intelligibility (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Gooskens, Heeringa and Beijering 2008; Kürschner, Gooskens and Van Bezooijen 2008). In Gooskens, Heeringa and Beijering (2008), for example, the correlation between intelligibility and Levenshtein distance was -.86, whereas the correlation between intelligibility and lexical distance was -.64. Gooskens (2007a) found no significant correlation between lexical distance and intelligibility, but she did find a correlation of -.64 between Levenshtein distance and intelligibility. The results of the present study are expected to be in line with previous research and to show a negative correlation between orthographic Levenshtein distance and intelligibility. In addition, this correlation should be stronger than that of lexical distance with intelligibility.

4. Languages
4.1 Overview
The languages included in this thesis are five Germanic languages spoken in the northern and western parts of Europe. The map in Figure 1 shows where these languages are spoken.4 These languages represent the two branches of the Germanic languages that still exist today: West Germanic and North Germanic (see below).

Figure 1: Map of Northwestern Europe. The standard languages of the five marked countries are included in this study. Starting from the left, counterclockwise: the United Kingdom (English), the Netherlands (Dutch), Germany (German), Denmark (Danish) and Sweden (Swedish).

First, a short characterization of each language will be given, based on information from the Ethnologue (Gordon 2005). Following this, a brief history of these languages will be given, showing how they are related to each other and how they were influenced by each other and by languages outside the Germanic group.

4 That is, the countries with which the standard varieties included in this study are associated. All of these languages are spoken in more than one country.

English
English is spoken all over the world by some 335 million people as a native language, and by many more as a second language. In this project, standard British English is used, as spoken in the United Kingdom.
Dutch
Dutch has some 20 million speakers, most of whom (about 16 million) live in the Netherlands. In this thesis, standard Netherlandic Dutch is used.
German
The German language has almost 80 million speakers, the majority of whom (70 million) live in Germany. In this thesis, standard High German is used.
Danish
Danish is spoken by over 5.5 million people, almost all of whom live in Denmark. In this thesis, standard Danish as spoken in Denmark is used.
Swedish
Swedish is spoken by over 9 million people, 8.8 million of whom live in Sweden. In this project, standard Swedish as spoken in Sweden is used.

4.2 Germanic languages
The Germanic languages are part of the Indo-European language family, to which most European languages belong. They descended from one common ancestor language, Proto-Germanic. This language split into two branches: East Germanic and Northwest Germanic (Harbert 2007).
The East Germanic branch has gone extinct and is not relevant to the current study. The Northwest Germanic group further split into two branches: North Germanic and West Germanic. West Germanic is the ancestor of, for example, English, German, Dutch and Frisian. The North Germanic branch consists of the Scandinavian languages Danish, Swedish and Norwegian, as well as Icelandic and Faroese. The split between West and North is thus the first and biggest division within the group of languages included in this study, grouping Danish and Swedish together on one side, and German, Dutch and English on the other. In the following, these two branches will be discussed separately, concluding with a discussion of English on its own, because its development has differed considerably from that of the other West Germanic languages.

Figure 2: The Germanic family tree (adapted from Harbert 2007:8).

4.2.1 North Germanic: Swedish and Danish
Around 500 AD, North Germanic in turn split into two varieties: east and west (Vikør 2001). Norwegian, Icelandic and Faroese developed from the western variety, whereas the languages concerned in this study, Danish and Swedish, are both descendants of the eastern branch. The distinction is not as clear-cut as that between North and West Germanic, however; it is more of a continuum with two extremes. Icelandic and Faroese have separated from the others through their conservatism, quite possibly because of their location on islands, but the mainland Scandinavian languages are still very close to each other: “Rather than viewing Norwegian, Swedish and Danish as units, we should think of these names as loose designations for groups of dialects, arbitrarily distinguished on the basis of linguistic characteristics selected by modern language historians.” (Vikør 2001:34) In the Middle Ages, however, a new split occurred, this time between the north and the south (Vikør 2001).
This essentially separates Danish from the two other (standard) languages (some dialects in the south of Norway and Sweden show characteristics of the southern group). The main changes separating Danish from the other languages are phonological. First of all, vowels in unstressed inflectional endings merged into a schwa, just as in the West Germanic languages. Thus Swedish timmar ‘hours’ corresponds to Danish timer (the <e> in this case is pronounced as a schwa), and Swedish stjärnor ‘stars’ corresponds to Danish stjerner. Secondly, unvoiced plosives following long vowels were weakened, leading to correspondences like Swedish gripa ‘to seize’ and bita ‘to bite’ with Danish gribe and bide. Finally, Danish developed stød, a kind of creaky voice present in some words. There are many minimal pairs differing only in this respect, but stød is not represented in the spelling. These sound changes can of course be expected to cause problems for mutual intelligibility at the spoken level and, to the extent that they are represented in the spelling, at the orthographic level too.

4.2.2 West Germanic: German and Dutch
West Germanic split into several different varieties as well, but as these were spoken in one area with many opportunities for contact between groups of people, they kept influencing each other continuously (Harbert 2007). This has resulted in a dialect continuum covering a large area (stretching from the Alps in Austria and Switzerland to the North Sea coast), in which classification into groups can be hard. Newer contact-induced changes have blurred the earlier distinctions caused by dialect splits. In classifications of language varieties in this area, the terms ‘High’ and ‘Low’ occur frequently. These refer to geographical locations: ‘High’ varieties originated in the relatively mountainous south of the area, whereas the ‘Low’ varieties originated in the flat, low-lying north (Harbert 2007).
In the Middle Ages, one of the Low varieties (Middle Low German; Harbert 2007) became the lingua franca of the Hanseatic League, heavily influencing the mainland Scandinavian languages. Nowadays, the descendants of this variety have been reduced to the status of dialects of the standard language of the country in which they are spoken (either Dutch or German), despite their separate origin (Harbert 2007; see also Figure 2). Currently, two national standard languages are dominant in this area: Dutch and German.5 They can be considered part of one dialect continuum, together with all the other dialects that are still in use. Standard German, the official language of Germany today, is based mostly on the higher and middle varieties. In many parts of the country, however, dialects are still in common use, and their speakers can be considered bilingual, even if their native language is generally considered a mere dialect. Standard Dutch, on the other hand, developed from Low Franconian varieties, which were spoken along the western coast of Belgium and the Netherlands.

5 Both of these languages have more than one local standard. Belgian Dutch and Netherlandic Dutch, for example, are considered different standards of the same language. In this study, Netherlandic Dutch and German as spoken in Germany are used.

Although the standard languages of Germany and the Netherlands thus developed from very different varieties, several varieties are still present in both countries, where they are considered dialects or regional languages. One of the most salient differences between German and Dutch results from the High German Consonant Shift (Figure 3). It occurred around 500 AD (Van Gelderen 2006) and involved the transformation of the voiceless plosives [p, t, k] into an affricate or a fricative, depending on position (see Table 2). The consonant shift is absent in the Low varieties, including Dutch, and complete in the southernmost (i.e. ‘highest’) varieties of German.
Several varieties in between have partially completed the shift (see Figure 3). One of these is standard German, which includes all changes except the shift of [k] to [kχ] (hence the unexpected unaffricated [k] in Kopf ‘head’ and backen ‘to bake’; see the third column of Table 2). This consonant shift may cause confusion when a speaker of either language encounters the other for the first time, as it affects many words and the correspondence is not immediately obvious. It is a very regular correspondence, however, and the words involved have not changed beyond recognition.

Figure 3: The Rhenish Fan, showing the partial completion of the High German Consonant Shift in the southwest of Germany (Van Gelderen 2006: 39).

Table 2: Some cognates between Dutch (left) and German (right) that demonstrate the effects of the High German Consonant Shift.
p > pf/f                    t > z/s (<z> is pronounced [ts])    k > ch (<ch> is pronounced [χ])
peper – Pfeffer 'pepper'    tien – zehn 'ten'                   maken – machen 'to make'
dapper – tapfer 'brave'     tuin 'garden' – Zaun 'fence'        boek – Buch 'book'
kop – Kopf 'head'           zitten – sitzen 'to sit'            zoeken – suchen 'to search, seek'
schaap – Schaf 'sheep'      laten – lassen 'to let, leave'      kop – Kopf 'head'
                            heet – heiß 'hot'                   bakken – backen 'to bake'

4.2.3 West Germanic: English
English originates from one particular sub-group of West Germanic: North Sea Germanic (Harbert 2007; see Figure 2). These varieties were spoken along the North Sea shore. This group still has descendants on the mainland in the north of the Netherlands and Germany, but because contact with the other varieties spoken there has influenced them so strongly, they are generally considered mere dialects of the standard language of the country in which they are spoken. The exception is the Frisian languages, but even these have been heavily influenced by Dutch and German.
Some groups of speakers of a North Sea Germanic language, however, crossed the North Sea and landed in England around 450 AD (Van Gelderen 2006). Over the following centuries they expanded, and their languages gradually replaced the Celtic languages spoken on the British Isles before that time. Some of these Celtic languages are still very much alive (such as Irish, Welsh and Scottish Gaelic), but English is the dominant language almost everywhere in the area. As it was relatively cut off from the other West Germanic languages, English has undergone its own independent developments. First of all, being so close to the area where Celtic languages were spoken, English has been influenced more by Celtic than the other West Germanic languages have. This influence shows mainly in loan words and names, although it has been argued that the syntax was influenced as well (Van Gelderen 2006). With the spread of Christianity came some Latin words, as in all of the other Germanic languages, but this was nothing compared to the later influence of Latin during the Renaissance. In the 8th century, speakers of Old Norse (‘the Vikings’) came from Norway and Denmark to the north of Britain and settled there. Their language has had a considerable influence on English (Van Gelderen 2006). For one, English borrowed many words. This often resulted in Scandinavian words replacing their English cognates: Old Norse egg, for example, replaced its Middle English cognate ey. In some cases, both words coexist with slightly different meanings, such as shirt (West Germanic) and skirt (North Germanic). In addition to words, Scandinavian influenced English grammar as well. In this period, a simplification of word endings spread from the north to the rest of the island. This was probably caused or reinforced by contact with Scandinavian (Van Gelderen 2006). Even today, English morphology is less extensive than that of the other West Germanic languages.
In 1066, the Normans, speaking a variety of French, defeated the English king. The English nobility was replaced by Normans and French became the dominant language, although English remained the language of the masses. Because this situation lasted for a few hundred years, French had an extensive influence on English, mainly on its vocabulary: possibly up to 10 000 words were borrowed (Van Gelderen 2006). Unlike in the case of the Scandinavian influence, native English words were replaced not by Germanic words but by Romance words, setting English apart from the other Germanic languages. Some of the many words borrowed in this period are royal, tax, judge, grammar, art, poet, dinner, confess, mercy, age and damage. In addition to whole words, affixes were borrowed as well. Most of these attach to words of Romance origin (disinterest, solemnity), but there are some hybrids of Germanic words with Romance affixes (disbelief, oddity) and of Romance words with Germanic affixes (useless, apprenticeship). In the Renaissance, English further borrowed many words directly from Latin, as did the other Germanic languages. The same is true for the new words needed for technological advances in the 19th and 20th centuries (Van Gelderen 2006). As these words were borrowed relatively recently, they are still quite similar in all these languages, especially in their written forms. One of the biggest developments setting English apart from the other Germanic languages, however, did not come from outside but from within the English language. It is a sound change known as the Great Vowel Shift: a chain shift in which the long vowels were raised, and the ones that could not be raised any further, /i/ and /u/, were diphthongized. Some examples are lane /leɪn/ (Dutch laan /la:n/), wine /waɪn/ (Swedish vin /vi:n/), mouse /maʊs/ (Danish mus /mu:s/) and sea /si:/ (German See /se:/ ‘lake’).
4.3 Germanic orthography
As this study concerns only the written versions of these languages, it is important to know the background of their orthographies. All Germanic languages are written using the Roman alphabet. This alphabet was originally developed for Latin, the language of writing in the (early) Middle Ages (Molewijk 1992; Scheuringer and Stang 2004). When in the later Middle Ages it became more customary to write in the vernacular languages instead of, or in addition to, Latin, there was no universal spelling standard writers could adhere to. They had to invent their own ways to write these languages, using the alphabet they already knew. The Latin alphabet, however, is not perfectly suited to Germanic languages, which contain sounds that are not present in Latin. For example, there were no letters for the sounds /j/ and /w/: the current letters developed from Latin I and V (= /u/) respectively (Scheuringer and Stang 2004), which is why the letter ‘w’ is still called ‘double u’ in English. For other sounds, digraphs were established (such as <ng> for /ŋ/) or letters from other alphabets were introduced (such as <þ> (thorn) from the Runic alphabet). Also, there was no universal way to distinguish between short and long vowels, a very important distinction in these languages (Scheuringer and Stang 2004). Every writer had to come up with his own solution to the problems this caused. This, together with the fact that every writer based his spelling on his own dialect, as there were no standard languages yet, resulted in a wide range of variation. Some of the spellings of book recorded in the Oxford English Dictionary (OED), for example, are: boocke, bouke, boock, beuk, buik, bewk, bouck, bouk, bowyk, buike, buk, buyk, bvik, bwck, bwik, bwike, bwk, booke, buick, book, buik, buke, beuk, beuck.
When the ability to read and write became more widespread, and the introduction of the printing press in England in 1476 made it easier to produce and copy books for a larger audience (Van Gelderen 2006), standards started to develop. This was true for the languages as a whole, but especially for their writing systems. Standardization initially happened mostly unofficially (Van Gelderen 2006): spellings emerged by convention. This means that the spellings used in the most prestigious regions of the time had the most influence on the standardized versions. These regions produced the most books and other writings, so their spelling conventions became the most widespread. This is similar to how the dialect of the most prestigious region ends up being the basis of the standard language. Only after an initial standard had been established did people start to influence it consciously. When and how this happened, and what the current attitude towards spelling is, differs for each of the languages in this study. I will therefore discuss their recent history and current situation one by one below.

4.3.1 English
The spelling of English is notoriously irregular (Van Gelderen 2006). Although the development of its writing system started out similarly to those of the other Germanic languages, several circumstances have made it irregular today. Its standardization started quite early, in the early 15th century (Van Gelderen 2006). Although many attempts have been made, no real spelling reform has taken place since this standard was established. The spelling therefore essentially reflects the pronunciation of the language in the 15th and 16th centuries. The Great Vowel Shift (described in section 4.2.3), which changed almost all long vowels in the language, happened after this time.
Because of this, the pronunciation of many letters in English no longer matches the way the same letters are pronounced in the other languages (see Table 3).

Table 3: Words showing how some vowel graphemes are pronounced differently in English and in other Germanic languages.

English          Dutch            Swedish
state /stejt/    staat /sta:t/    stat /stɑ:t/
cook /kʊk/       kook /ko:k/      –
week /wi:k/      week /we:k/      –
wine /wain/      –                vin /vi:n/

Other contributing factors are etymological respelling and the borrowing of words from other languages without changing their spelling (Van Gelderen 2006). This happens to some extent in all languages in this study, but it is more widespread in English.

Etymological respelling happens when the spelling of a word is changed, not according to its pronunciation, but according to its (supposed) origin. The word debt, for instance, was borrowed from French without the b (as French had already lost it at that point). Learned writers, however, recognizing its connection with the Latin words it derived from, added the b to the written form of the word to show this connection more clearly. Doing so, however, moves the spelling away from the pronunciation. In addition, it did not happen consistently: for some words the respelling became standard, and for others it did not. The word receipt, for example, has a silent p, but conceit does not. In some cases, these respellings were based on a mistaken etymology. The s in island, for example, originates from the word's supposed connection to the French loan word isle, when in fact the first part of the word is a Germanic root that never contained an s (Old English íg, íeg (OED)). Loan words whose pronunciation has been adapted to English but whose original spelling has been retained (Van Gelderen 2006) cause further irregularities, because their spelling does not match their pronunciation in English. Examples are suite, glacier and phoenix.
For words like these, spelling and pronunciation need to be learned separately. The other Germanic languages have the same problem, but to a lesser extent: they adapt the spelling of loan words to their own spelling systems more readily. Dutch, for example, has words like foto ‘photo’, orthografie ‘orthography’ and kwarts ‘quartz’, and Swedish has byrå ‘bureau’ and buljong ‘bouillon’. Another issue, specific to English, is related to the Norman Conquest. For almost 500 years, the administrative language in England was French, and many of the people who knew how to write had first learned this skill in French. Naturally, they applied the conventions of French spelling to English (Van Gelderen 2006). This explains the many cases where <qu> is used for the sound sequence /kw/, even in Germanic words (such as queen), and the spelling <ou> for what was at that time /u/ (such as mouse).

4.3.2 Dutch

The first official Bible translation into Dutch, the Statenvertaling (‘States Translation’, so called because it was funded by the state), was published in 1637. As this translation was meant to be used throughout the Dutch language area, an effort was made to use a somewhat ‘neutral’ Dutch, with elements from different dialects (Molewijk 1992). Because of the wide influence of this Bible translation, its language became the basis of modern Dutch. The spelling, like the spoken language, consists mainly of characteristics of the (south)western varieties of Dutch, because that region was the economic centre. The last big spelling reform took place in 1946–1947, uniting Netherlandic Dutch and Belgian Dutch spelling (Molewijk 1992). Spelling changes made after this are minor, concerning mostly the spelling of foreign loan words and of compound words. Proposals for more phonetic spellings have been made, but they met with so much public opposition that they were never carried through.
The spelling, then, is not completely regular and phonetic: words of foreign origin in particular are exceptions. In loans from French (crèche ‘day care’, garage ‘garage’, comité ‘committee’), German (überhaupt ‘at all, anyway’, sowieso ‘anyway’, föhn ‘hair dryer’) and, more recently, English (computer, race, cake, poster), the original spelling is retained, even when it does not match the pronunciation in Dutch. Dutch keeps diacritics when they are present in loan words (unlike English, where the diacritic is usually dropped), but it has not made any additions to the basic 26-letter alphabet for the spelling of native words.

4.3.3 German

The German spelling standard was established relatively late: at the end of the 19th century, during German unification under Prussia (Scheuringer and Stang 2004). Before that, there were some regional standards, but nothing covering the whole area of present-day Germany. Scheuringer and Stang mention this as a reason for the relatively close one-to-one correspondence between graphemes and pronunciation in German. As in Dutch, some minor changes were made in the 20th century, but proposals for bigger changes meet with heavy resistance (Scheuringer and Stang 2004). At the end of the 20th century, proposals were made that would in effect have made the spelling more regular, such as spelling all <ai> as <ei> (they are pronounced the same). This would have affected a great many words, and the general public was so strongly opposed to it that it was never carried through. A proposal in the 1990s to write all common nouns in lower case (instead of the current practice of capitalizing them) was resisted as well. Eventually, only minor changes came into effect, concerning punctuation, word separation and the spelling of foreign loan words. German spelling, as mentioned, is characterized by the practice of capitalizing all nouns. In addition, it has four extra letters compared to the English 26-letter alphabet: ä, ö, ü and ß (pronounced /s/).
4.3.4 Danish

After World War II, an idealistic movement stressing the need for Scandinavian unity grew strong in the Scandinavian countries (Vikør 2001). The movement was strongest in Denmark, Germany's close neighbour, so shortly after a war in which Germany had been ‘the enemy’. There it showed itself in a move away from German. A spelling reform was adopted in 1948 (Vikør 2001). Among other things, it involved the decapitalization of nouns (which up to that point had been capitalized, as they still are in German) and the spelling of <aa> as <å>, in line with Swedish and Norwegian. The reform initially met with opposition, but it was eventually accepted everywhere. Since then, however, no serious reforms have been made or even attempted, apart from some notes on how to handle foreign loan words. The movement towards Scandinavian unity sparked a spelling reform, but that same movement makes further reforms unfavourable (Vikør 2001). The reason lies in the sound changes Danish has undergone, which have separated it from the other Scandinavian languages (see section 4.2.1). Like English spelling, Danish spelling essentially reflects an older version of the language rather than accurately representing the current pronunciation. This older version, however, is closer to the other Scandinavian languages than the spoken version is. Moreover, as in English, a more phonological orthography would involve so many changes that it is hardly feasible: “[A] completely phonological orthography would have to be so totally different from the present one that it would be unreadable for the entire Danish population – to learn it would be almost like learning a new language.
[…] By such drastic reform, the Danes would exclude themselves from their own literary heritage as well as from inter-Nordic written communication.” (Vikør 2001:190)

Apart from the <å>, which it shares with Swedish (and Norwegian), Danish has two more letters that are not present in the other languages in this study: <æ> and <ø>.

4.3.5 Swedish

Like Danish, Swedish strives towards Scandinavian unity. The last big spelling reform for Swedish took place in 1906, over a hundred years ago (Vikør 2001). This reform mostly made the spelling more phonetic, that is, more representative of how words are actually pronounced. For example, haf /ha:v/ ‘ocean’ became hav, and rödt /røt/ ‘red (adv.)’ became rött. Some of these changes made the spelling more similar to the other Scandinavian languages, and some made it more distinct. Later attempts to make the spelling even more phonetic have not been adopted into the spelling standard. Swedish is, however, somewhat more inclined than the other languages to adapt foreign loan words to its own spelling system: English hike became hajk, and French directeur became direktör (Vikør 2001). Another factor that makes Swedish spelling relatively phonetic is that the standard language was spread in schools mostly in written form (Vikør 2001). Neither the teachers nor the students were personally familiar with the spoken version of the standard. In an attempt to speak correctly, they used spelling pronunciations that differed from how the words had been pronounced up to that point. These spelling pronunciations eventually became the standard language. Thus, for example, drottning ‘queen’ went from /drɔniŋ/ to /drɔtniŋ/, and till ‘to’ went from /te/ to /til/. The Swedish alphabet has the additional letters å, ä and ö, where <ä> corresponds to Danish <æ> and <ö> corresponds to Danish <ø>.
4.3.6 Summary

All of these languages, then, developed a standard spelling that solved the issues arising from the use of the Latin alphabet. Some issues are solved in different ways in the different languages, however. In English, for example, a long vowel is often signalled by a silent ‘e’ following the consonant: cape, make, cake, duke, grape, rope, etc. In Dutch, however, the vowel is doubled: kaap ‘cape’, maak ‘make’, leek ‘layman’, meen ‘mean’, vuur ‘fire’, rood ‘red’. In German, a common strategy is to add an <h> after the vowel: Zahn ‘tooth’, zehn ‘ten’, Lohn ‘wage’ (in Dutch: loon), mehr ‘more’ (in Dutch: meer), lehren ‘teach’. In some cases, these differences might obstruct intelligibility between languages, because they make words look more different than they are. In other cases, however, the orthography can aid understanding. Some sound changes have made the spoken versions of the languages more different from each other but are not reflected in the spelling. As a result, the written versions of the languages are more similar to each other than the spoken versions are. This is especially true for English and Danish, where the written language reflects an older version of the spoken language.

4.3.7 The North Wind and the Sun

Below, the short fable The North Wind and the Sun is printed in all five languages, to give an impression of their orthographies.

English
The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveler take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew the more closely did the traveler fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shone out warmly, and immediately the traveler took off his cloak.
And so the North Wind was obliged to confess that the Sun was the stronger of the two. (Ladefoged 1999)

Dutch
De noordenwind en de zon hadden een discussie over de vraag wie van hun tweeën de sterkste was, toen er juist iemand voorbij kwam die een dikke, warme jas aanhad. Ze spraken af dat wie de voorbijganger ertoe zou krijgen zijn jas uit te trekken de sterkste zou zijn. De noordenwind begon uit alle macht te blazen, maar hoe harder hij blies, des te dichter de voorbijganger zijn jas om zich heen trok. Tenslotte gaf de noordenwind het maar op. Vervolgens begon de zon krachtig te stralen, en onmiddellijk daarop trok de voorbijganger zijn jas uit. De noordenwind kon toen slechts beamen dat de zon de sterkste was. (Gussenhoven 1999)

German
Einst stritten sich Nordwind und Sonne, wer von ihnen beiden wohl der Stärkere wäre, als ein Wanderer, der in einen warmen Mantel gehüllt war, des Weges daherkam. Sie wurden einig, daß derjenige für den Stärkeren gelten sollte, der den Wanderer zwingen würde, seinen Mantel abzunehmen. Der Nordwind blies mit aller Macht, aber je mehr er blies, desto fester hüllte sich der Wanderer in seinen Mantel ein. Endlich gab der Nordwind den Kampf auf. Nun erwärmte die Sonne die Luft mit ihren freundlichen Strahlen, und schon nach wenigen Augenblicken zog der Wanderer seinen Mantel aus. Da mußte der Nordwind zugeben, daß die Sonne von ihnen beiden der Stärkere war. (Kohler 1999)

Danish
Nordenvinden og solen kom engang i strid om, hvem af dem der var den stærkeste. Da så de en vandringsmand, der kom gående, svøbt i en varm kappe. Og de enedes om, at den der først kunne få kappen af ham skulle anses for den stærkeste. Først tog nordenvinden fat, og han blæste og blæste, men jo mere han blæste, des tættere holdt manden kappen sammen om sig. Til sidst måtte nordenvinden give fortabt. Så tog solen fat. Og han skinnede og skinnede, og til sidst fik manden det for varmt og måtte tage kappen af.
Da måtte nordenvinden indrømme, at solen var den stærkeste af de to. (Grønnum 1998)

Swedish
Nordanvinden och solen tvistade en gång om vem av dom som var starkast. Just då kom en vandrare vägen fram, insvept i en varm kappa. Dom kom då överens om, att den som först kunde få vandraren att ta av sig kappan, han skulle anses vara starkare än den andra. Då blåste nordanvinden så hårt han nånsin kunde, men ju hårdare han blåste desto tätare svepte vandraren kappan om sig, och till sist gav nordanvinden upp försöket. Då lät solen sina strålar skina helt varmt och genast tog vandraren av sig kappan, och så var nordanvinden tvungen att erkänna att solen var den starkaste av dom två. (Engstrand 1999)

5. Data

Calculating the linguistic distances requires a corpus of parallel word lists in the five languages included in the study. In the MICReLa project, a list of a hundred nouns is used to collect data on the mutual intelligibility of the languages in the project. These nouns are taken from a list of all the words contained in the British National Corpus6 (BNC), ordered by frequency. Roughly speaking, they are simply the 100 most frequent nouns in the corpus. These words were translated into the other languages in the project, creating parallel word lists for all languages (see e.g. Heeringa et al. (2013) for more details on the creation of these lists). The project's publications use these lists to calculate lexical and Levenshtein distances (Heeringa et al. 2013). Many other publications involving lexical and Levenshtein distance have likewise calculated these distances with relatively short word lists. Gooskens, Heeringa and Beijering (2008), for example, used the words from the text The North Wind and the Sun that they used in their experiment: about 100 words, depending on the language variety.
Gooskens (2007a) likewise used the words from the text in her experiment: a news item of about 250–290 words, depending on the language. These lists are too short to calculate entropy measures reliably, however (Moberg et al. 2007; see also Chapter 6). Therefore, I created new word lists of 1500 words and used them to calculate not only the entropy, but also the lexical and Levenshtein distances. The size of the word list should not make a significant difference to these distance calculations. To make certain of this, however, I will correlate these results with the lexical and Levenshtein distances that Heeringa et al. (2013) calculated on the much smaller set of 100 words. This new word list was, again, based on the British National Corpus. This time, however, words from all parts of speech were included, not only nouns. This makes for a better representation of the languages, as nouns might behave differently from other word classes when it comes to linguistic similarity; loan words, for example, are very often nouns. The words were translated into Dutch, German, Danish and Swedish with help from internet sources, dictionaries and native speakers of Dutch and German. During this process, some words were removed from the list because they proved too hard to translate reliably. These were usually words from the original English list which simply do not exist, or at least not in the same form, in one or more of the other languages. A word like ‘whatever’, for example, does not have a clear translation in any of the other languages, and where it does have one, it can only be translated by a multi-word expression. In Dutch, for example, the ‘translation’ consists of three words that are separated by other words in the sentence (see the example sentence below).

6 http://www.natcorp.ox.ac.uk/
Another example of a problematic word is the verb ‘to face’, which has no clear translation covering its meaning in the other four languages. It can be translated by many different verbs and expressions, depending on subtle differences in context.

(1) EN  Paint your house in whatever colour you like
    DU  Verf  je   huis  in wat  voor kleur  je  maar wilt
        paint your house in what for  colour you just want

Another translation issue is caused by certain function words that might not even exist in the other languages. The translation of English modals (such as ‘could’, ‘might’, ‘should’), for example, depends highly on the context, and translating them as separate words, as this list requires, is difficult. Therefore, these were removed as well. In total, 51 words were removed from the list. To replace them, new words were added at the end of the list (simply the next words in the BNC frequency list). A margin was kept in case further words had to be excluded at a later stage, so the final list ended up containing 1510 words.

In many cases, the original English word is ambiguous, and its meanings are covered by several different words in one or more of the other languages. In such cases, one of these meanings was chosen and used consistently for the other languages. The noun practice, for example, can mean (amongst other things) the following:

1. The carrying out or exercise of a profession, esp. that of medicine or law
2. The actual application or use of an idea, belief, or method, as opposed to the theory or principles of it
3. The habitual doing or carrying on of something
4. Repeated exercise in or performance of an activity so as to acquire, improve, or maintain proficiency in it

(Oxford English Dictionary online edition,7 entry for ‘practice’, meanings 1-4)

In the other languages, different words are used to express these meanings.
In Dutch, for example, meanings 1 and 2 are covered by praktijk, meaning 3 translates as gewoonte, and meaning 4 is expressed by oefening. In this case, the fourth meaning was chosen for all languages. The choice of meaning was not made systematically: in many cases it was simply the meaning that first came to the translator’s mind, or the first translation given by the dictionary used. Care was taken, however, always to choose one of the most common meanings of the word, and not one of the more obscure ones. Once one of the meanings of the English word had been decided upon, it was translated by the most common word in the target language that accurately represents this meaning. Sometimes there were two or more alternatives that were both common (i.e. not considered jargon). If one of these was a cognate of the word in English or in one or more of the other languages, that word was chosen. Because the word list was based on an English-language corpus, the final list is inevitably somewhat centred on English. A list based on a Swedish corpus, for example, would have contained different words. In all languages other than English, some words are included more than once because they correspond to multiple lemmas in English. The Dutch word leren, for example, means both ‘to learn’ and ‘to teach’ and is therefore included in the list twice. Other frequent words, on the other hand, might not be included at all. This happens when their English equivalent has more than one possible translation and another one was chosen for the word list, as is the case with ‘practice’ above. This should not be considered a problem, however. The goal of this part of the research is to create a list of words with corresponding meanings in the five languages that are included.

7 http://www.oed.com
Exactly which words these are is not important, as long as they are randomly chosen and are good representations of the languages. I believe that in this case these conditions have been met. The words that were excluded are listed in appendix A, and the full word list in the five languages is given in appendix B.

6. Methods

As discussed in Chapter 2, there are several ways to measure the linguistic similarity between two language varieties. Three methods were used in this study: lexical distance, Levenshtein distance and conditional entropy. As this thesis focuses on written language, these methods were applied only at the orthographic level, to the data described in Chapter 5. In the first part of this chapter, I will describe these methods in detail. The second part describes the methods used to measure intelligibility. The experiments mentioned there were carried out as part of the MICReLa project described in section 2.4.

6.1 Methods for measuring linguistic distance

6.1.1 Lexical distance

A computationally simple way to measure linguistic distance is to measure lexical distance. This has been done many times in the past. A well-known example is the Swadesh list (Swadesh 1971): a list of 100 words representing very basic concepts, constructed for the purpose of comparing languages to establish their relationships to each other. Lexical distance, as its name suggests, measures distance at the lexical level. When two language varieties are related to each other, they usually share many cognates, but part of the lexicon will also consist of non-cognates. This happens, for example, when one language has borrowed a word from a third language, whereas the other language has retained the inherited word. English has many examples of this phenomenon, in which mainly Latin and French words have replaced the Germanic ones.
Compare, for example, to contribute with its translations in the four other languages in this study: bijdragen (Dutch), beitragen (German), bidrage (Danish) and bidra (Swedish). Non-cognates can also result from semantic shift, however. In that case the cognate word is still present in both languages, but it no longer has the same meaning. This results in false friends, such as English queen and Swedish kvinna ‘woman’, or English town, Dutch tuin ‘garden’ and German Zaun ‘fence’. The idea behind measuring lexical distance is this: the more cognates two language varieties share, the closer they are to each other.8 Lexical distance is then simply the percentage of non-cognates in a given language pair. This was measured for the 1500-word samples from each of the languages in this study.

8 This is valid only when the cognates still have the same meaning: as many language learners know, false friends tend to impair intelligibility more than help it.

To measure this, it has to be determined whether two corresponding words are cognates. The traditional definition of cognate words stresses the shared origin of the words in an older form of the languages, as in this definition from the Oxford English Dictionary: “Coming naturally from the same root, or representing the same original word, with differences due to subsequent separate phonetic development”. For this research, however, a broader definition was used. A speaker of one language who is trying to understand the words of another language does not see the etymological history of a word. The only thing that matters to the reader is whether there is some kind of similarity to the corresponding word in their own language. Therefore, any two words whose stems are related were considered cognates. This of course includes cognates in the traditional sense. It also includes loan words sharing a common source, such as German Party and the English word from which it is derived, and words such as information, which occurs in all five languages and has a common source outside the Germanic family. Words that share a base form but have different affixes were considered cognates as well, such as Dutch betalen (‘pay’, be- + talen) and German zahlen (‘pay’, lacking the be- prefix). When a word consists of multiple lexical items, however, and one of them is not related, the words were not considered cognates. Take, for example, the compounds buitenlands (Dutch) and udenlandsk (Danish), both meaning ‘foreign’ (literally, roughly, ‘out-landish’). The second parts of these words, lands and landsk, are cognates, but the first parts derive from different root words. The word pair as a whole is therefore not considered a cognate pair. When there was doubt about whether two words shared the same origin, etymological dictionaries were consulted. In addition, because the data consisted of parallel word lists, only word pairs with the same meaning in both languages were considered: false friends play no part in this study.

6.1.2 Orthographic Levenshtein distance

Lexical distance expresses the percentage of non-cognate words in a language pair, but it says nothing about how similar the cognates are to each other. When two language varieties began to diverge a long time ago, sound changes may have altered both words in a pair beyond recognition, even if they stem from the same root. Levenshtein distance (Heeringa 2004) is a computational way of measuring the distance between two cognates at a phonetic or orthographic level.9 In this study, the distance was calculated on the orthography only, focusing on written intelligibility.

9 Although it is technically possible to calculate the distance between two non-cognates, it does not make a lot of sense. Two things need to have some common ground before we can sensibly consider how much they differ. What, for example, is the distance between the colour red and a tree? In the same way, it makes no sense to consider the distance between two non-cognate words, even though the algorithm can be applied to any word pair. When using a computer, one should never forget to use common sense, because that is something a computer does not have.

The Levenshtein algorithm (Heeringa 2004) calculates the distance between two strings (orthographic or phonetic transcriptions of words). It does so by counting the minimal number of characters or phonetic segments that need to be changed in order to get from one string to the other. These changes can be insertions (adding a character), deletions (removing a character) or substitutions (changing one character into another). The number of changes is then divided by the length of the alignment, in order to normalize the calculation over words of different lengths. Without this normalization, longer words would contribute more to the total average than shorter words, as longer words consist of more items. Example (2) shows such a calculation for the English word long and its Dutch counterpart lang. There is one substitution (a for o) in a total length of four characters, resulting in a Levenshtein distance of .25 for this word pair.

(2) EN  l o n g
    DU  l a n g
        0 1 0 0   1/4 = .25

As can be seen here, the two words are aligned so that each character in one word corresponds to one character in the other. For this pair the alignment is straightforward, but this is not the case for all words. In many cases, for example, the two corresponding words differ in length, so some characters have to be aligned with an empty position. Putting these empty positions at the end of the shorter word, thus aligning from left to right, does not always give the desired results.
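The counting and normalization steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation by Heeringa (2004) used in this study. It assumes a cost of 1 for every insertion, deletion and substitution, and it does not model the vowel/consonant alignment constraint mentioned below. Where several alignments share the minimal cost, it assumes that the longest of them is used for normalization. Because it searches for the cheapest alignment rather than aligning from left to right, it avoids the problem just mentioned; the hypothetical helper naive_left_to_right is included only for comparison. The same toy word list also yields the lexical distance of section 6.1.1 (the percentage of non-cognates), and the Levenshtein distances are averaged over the cognate pairs only. All function names and the toy list are illustrative.

```python
# Illustrative sketch (hypothetical names): unit-cost Levenshtein distance,
# normalized by alignment length, plus lexical distance over a parallel list.
# The vowel/consonant alignment constraint described in the text is NOT modelled.


def alignment_cost_and_length(a: str, b: str) -> tuple[int, int]:
    """Minimal number of changes from a to b, and the length of the longest
    alignment achieving that cost (tie-breaking rule assumed for this sketch)."""
    n, m = len(a), len(b)
    # Each cell holds (cost, -length): min() prefers the lower cost first and,
    # among equal costs, the longer alignment.
    dp = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, -i)  # i deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, -j)  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            c_d, l_d = dp[i - 1][j - 1]
            c_u, l_u = dp[i - 1][j]
            c_l, l_l = dp[i][j - 1]
            dp[i][j] = min(
                (c_d + sub, l_d - 1),  # match (0) or substitution (1)
                (c_u + 1, l_u - 1),    # deletion of a[i - 1]
                (c_l + 1, l_l - 1),    # insertion of b[j - 1]
            )
    cost, neg_length = dp[n][m]
    return cost, -neg_length


def normalized_levenshtein(a: str, b: str) -> float:
    """Number of changes divided by alignment length (0.0 for two empty strings)."""
    cost, length = alignment_cost_and_length(a, b)
    return cost / length if length else 0.0


def naive_left_to_right(a: str, b: str) -> float:
    """For comparison only: pad the shorter word at the end, compare position by position."""
    length = max(len(a), len(b))
    a, b = a.ljust(length), b.ljust(length)
    changes = sum(1 for x, y in zip(a, b) if x != y)
    return changes / length if length else 0.0


def lexical_distance(pairs: list[tuple[str, str, bool]]) -> float:
    """Percentage of non-cognates; each pair is (word_a, word_b, is_cognate)."""
    non_cognates = sum(1 for _, _, cognate in pairs if not cognate)
    return 100 * non_cognates / len(pairs)


def mean_levenshtein(pairs: list[tuple[str, str, bool]]) -> float:
    """Average normalized Levenshtein distance over the cognate pairs only."""
    scores = [normalized_levenshtein(a, b) for a, b, cognate in pairs if cognate]
    return sum(scores) / len(scores)


# Hypothetical English-Dutch toy list: (English, Dutch, is_cognate).
toy_pairs = [
    ("long", "lang", True),
    ("word", "woord", True),
    ("contribute", "bijdragen", False),
]

print(round(lexical_distance(toy_pairs), 1))   # 33.3
print(round(mean_levenshtein(toy_pairs), 3))   # 0.225
print(naive_left_to_right("word", "woord"))    # 0.6
```

With this toy list, one non-cognate out of three gives a lexical distance of 33.3%. The mean Levenshtein distance is .225, the average of .25 (long/lang) and .20 (word/woord). The naive left-to-right comparison of word and woord gives .60 instead of .20.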
Take, for example, the English word word and its Dutch counterpart woord in (3):

(3) EN  w o r d
    DU  w o o r d
        0 0 1 1 1   3/5 = .60

To a naïve reader, it is obvious that the r in the English word should correspond to the r in the Dutch word, and likewise for the d’s. In the alignment shown here, however, this is not the case, so three changes (two substitutions and an insertion) are needed to change word into woord. With a more fortunate and sensible alignment, this can be reduced to a single change (see (4a) below). The algorithm takes this into account. It aligns the words so that letters representing similar sounds are mapped onto each other (e.g. consonants to consonants and vowels to vowels) and so that the fewest possible changes are needed to get from one word to the other. Some examples of this process are shown in (4), where - marks an empty position.

(4) (a) EN  w o - r d
        DU  w o o r d
            0 0 1 0 0   1/5 = .20

    (b) DU  w o o r d
        SW  - o - r d
            1 0 1 0 0   2/5 = .40

    (c) EN  w o r d
        GE  w o r t
            0 0 0 1     1/4 = .25

    (d) SW  - o r d
        GE  w o r t
            1 0 0 1     2/4 = .50

The Levenshtein distance is calculated for each cognate word pair in a language combination, and these distances are then averaged to obtain a distance for the language pair as a whole. With five languages, this results in ten distances, as the Levenshtein distance is a symmetrical distance measure.10

10 The Levenshtein distance can be made asymmetrical. This can be done, for example, by giving different weights to different replacements instead of the binary 1 and 0, or by taking into account translations of a word other than the pair included in the word list, as in Heeringa et al. (2013). In its basic form as used here, however, it is symmetrical.

6.1.3 Measuring conditional entropy

Moberg et al. (2007) explored the possibility of using conditional entropy as a way to measure language similarity. Entropy measures the regularity or predictability of the correspondences between two language varieties. As such, it is not a measure of distance per se. It does not measure how similar the two parts of a correspondence are, but simply how predictable the correspondence is in a given language pair. Like Levenshtein distance, it is calculated at the phonetic or orthographic level of a language (in this study, only the orthographic level). The major advantage of measuring entropy between two languages is that, unlike lexical and Levenshtein distance, it is an inherently asymmetrical measure.11

(5)  $H(X \mid Y) = -\sum_{x \in X,\, y \in Y} p(x, y) \log_2 p(x \mid y)$   (Moberg et al. (2007), p. 4)

The formula used to calculate entropy is shown above. H(X|Y) is the entropy of X given Y, that is, the amount of uncertainty about the value of X when the value of Y is known. In the case of languages, Y is the stimulus language and X is the reader’s native language. The value of Y is known: it is the text the participant is reading. The value of X is unknown: the reader is trying to guess which units in his native language correspond to what he is reading in language Y. p(x,y) is the probability that a particular combination of x and y occurs, and p(x|y) is the probability of x given y. The units x and y can be anything, but in this study they represent letters or combinations of letters.12 The use of this formula is illustrated below with the data set in Table 4, which consists of three word pairs. English is added merely for reference. The illustration focuses on Dutch and German only, and specifically on the initial consonant of each word.

Table 4: A (tiny) corpus of three words in three languages.

German   Dutch   English
tot      dood    dead
dünn     dun     thin
zu       toe     to

11 The Levenshtein distance can be asymmetrical (see footnote 10), but unlike conditional entropy, it is not necessarily so. The implementation of the Levenshtein distance used in this study is symmetrical.

12 In phonetic entropies, as used by Moberg et al. (2007), they represent phonemes.
Consider first a native speaker of Dutch reading these German words. He encounters three different initial consonants: t, d and z (see Table 5). Each t he encounters corresponds to a d in his own language, each d corresponds to a d and each z corresponds to a t. All correspondences are thus absolute and completely predictable. In terms of the formula above: p(d|t) is 1 (each t corresponds to a d), so its log2 is 0 and the whole product is 0; the same holds for the other two correspondences. The total entropy for a Dutch reader of German is thus 0 (that is, H(Dutch|German) = 0).

The other direction, however, paints a different picture. A German reader of Dutch sees only two different letters: d and t (see Table 6). The d occurs twice: once it corresponds to a t in the reader’s native language, and once to a d. The t corresponds in each case to a z. p(t|d) is then .5 (when a d occurs, there is a 50% chance that it corresponds to a t), the log2 of which is -1. p(t,d) is .33 (the correspondence is present in one out of three word pairs, i.e. 33% of the data), so the entropy for this correspondence is -(.33 × -1) = .33. The entropy of the correspondence of d and d is calculated in the same way, while the correspondence of t and z contributes 0 because it is fully predictable. This results in a total entropy for a German reader of Dutch of 0.67 (H(German|Dutch) = 0.67).

Table 5: The entropy of German for a native speaker of Dutch.
German   Dutch   Correspondence
t        d       1:1
d        d       1:1
z        t       1:1
Entropy German → Dutch: 0

Table 6: The entropy of Dutch for a native speaker of German.
Dutch   German   Correspondence
d       t, d     1:2
t       z        1:1
Entropy Dutch → German: 0.67

We can see here, then, that H(Dutch|German) is not the same as H(German|Dutch): this is the asymmetry we were looking for. Note that for the calculation of entropy, it does not matter whether the two sounds or graphemes of a correspondence are the same or in any way similar. The only thing that matters is the regularity of the correspondence.
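The calculation above can be reproduced with a short Python sketch of formula (5). It is a minimal illustration, not the script used in this study: it assumes that the letter correspondences have already been extracted from aligned cognates as (stimulus unit, reader unit) pairs, and the function name conditional_entropy is hypothetical. Applied to the initial consonants of Table 4, it gives the values in Tables 5 and 6.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(X|Y) = -sum p(x, y) * log2 p(x|y), as in formula (5).

    `pairs` is a list of (y, x) tuples: y is a unit in the stimulus
    language, x the corresponding unit in the reader's native language.
    """
    n = len(pairs)
    joint = Counter(pairs)                   # frequency of each (y, x) pair
    stimulus = Counter(y for y, _ in pairs)  # frequency of each y
    h = 0.0
    for (y, x), count in joint.items():
        p_xy = count / n                     # p(x, y)
        p_x_given_y = count / stimulus[y]    # p(x | y)
        h -= p_xy * math.log2(p_x_given_y)
    return h


# Initial consonants of the word pairs in Table 4, as (German, Dutch).
german_dutch = [("t", "d"), ("d", "d"), ("z", "t")]

# Dutch reader of German: German is the stimulus (y), Dutch the reader (x).
h_dutch_given_german = conditional_entropy(german_dutch)
# German reader of Dutch: the same correspondences with the roles swapped.
h_german_given_dutch = conditional_entropy([(du, ge) for ge, du in german_dutch])

print(f"H(Dutch|German) = {h_dutch_given_german:.2f}")  # H(Dutch|German) = 0.00
print(f"H(German|Dutch) = {h_german_given_dutch:.2f}")  # H(German|Dutch) = 0.67
```

Because the function only counts how often units co-occur, consistently renaming the letters on either side (for example writing every German z as some other symbol) leaves the result unchanged. This reflects the point made above: only the regularity of a correspondence matters, not the similarity of its parts.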
The entropy was calculated in this way for each language pair in both directions, which for five languages results in 20 measures. According to Moberg et al. (2007), at least 800 words are needed to reach stable entropy measures, but calculations based on fewer words already show the relative differences among language pairs accurately. This is illustrated in Figure 4: the entropy (vertical axis) stabilizes when calculated for around 800 words (horizontal axis), but even before that point, the difference between the two entropies remains constant. Although the word lists used in this study consisted of over 1500 words, this number includes the non-cognates. Grapheme correspondences, however, were calculated only for cognate words, as was the case with the Levenshtein distance. The number of cognate words in each language combination is shown in Table 7. Some of these numbers are below 800; the combinations involving English are especially problematic. According to Figure 4, asymmetry can be found reliably even with these numbers. Some caution should be taken, however, when interpreting the results for the language pairs involving English.

Figure 4: Entropy calculations for Danish and Swedish (vertical axis) using various word list sizes (horizontal axis) (Moberg et al. 2007: 7).

Table 7: The number of cognates for each language combination in the word list of 1510 words.
Language pair          Number of cognates (abs.)
Danish and Dutch       806
Danish and English     632
Danish and German      824
Danish and Swedish     1175
Dutch and English      687
Dutch and German       961
Dutch and Swedish      784
English and German     601
English and Swedish    632
German and Swedish     829

Entropy can be calculated for correspondences of one character each, as in the example above, but it is also possible to use larger units of language. All languages in this study have certain combinations of characters in their orthography that occur often and may correspond to a specific combination in the other languages (bigrams or trigrams).
Examples of this for English are sh, ng and th. In this study, the entropy for each language pair was calculated three times, taking one letter, two letters and three letters as a unit. When the units consist of two letters (bigrams) or three letters (trigrams), the algorithm can take correspondences of unequal lengths into account (such as Dutch oe with German u, or English th with Dutch and German d in Table 4), for example by forming a bigram consisting of one letter and ‘nothing’. In addition to analyzing each of these three measures individually, the sum of the three results for each language pair was included as well, combining the results of the unigram, bigram and trigram entropy. All four of these entropy results are presented in Chapter 7.

6.2 Measuring intelligibility
In order to measure intelligibility, preliminary results from a written cloze test and a written word translation task carried out by Femke Swarte in the Micrela project (see section 2.4) were used. The tests are available online at www.micrela.nl. They included all language combinations in both directions: with five languages (English, Dutch, German, Danish and Swedish), there were twenty combinations.

6.2.1 Participants
The experiments were carried out online and presented as a game; the participants were therefore not paid for their participation. Only people who spoke one of the five languages in the project natively (Danish, Dutch, English, German or Swedish) could participate. Each participant was randomly assigned one of three tasks (a picture task, which is not included in this thesis, a cloze test or a word translation task), in either the written or the spoken form (only the written tasks are included in this thesis), in one of the four languages that were not their native language. Of the 18,108 people who participated in the experiments, a final group of 2,976 participants was selected using the criteria described below.
All participants had only one native language, and no languages other than this language were spoken in their homes when they were growing up. They grew up in one of the countries included in the project (see Chapter 4), namely the one corresponding to their native language. They had not previously learned the language they were tested in, with the exception of German and English, which are widely taught in schools throughout the Germanic language area. For German, participants who had learned German for more than seven years (meaning they must have spent time learning German outside of school) were excluded; for English, no participants were excluded on this basis. For more information on the selection of the participants and the procedure of the experiments, see Swarte (in preparation).

In this thesis, the results from two of the six tasks are included: the written cloze test and the written word translation task. The final number of participants was 528 for the written cloze test and 495 for the written word translation task. A breakdown of the number of participants per language combination can be found in Table 8 for the cloze test and in Table 9 for the word translation task.

Table 8: Number of participants for each language combination of the written cloze test.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish   Total
Danish              0        39      25        27       15        106
Dutch               22       0       27        26       15        90
English             30       36      0         41       15        122
German              34       32      18        0        15        99
Swedish             30       36      25        20       0         111
Total               116      143     95        114      60        528

Table 9: Number of participants for each language combination of the written word translation task.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish   Total
Danish              0        36      16        21       15        88
Dutch               25       0       27        19       15        86
English             31       38      0         35       15        119
German              26       29      25        0        16        96
Swedish             15       31      30        30       0         106
Total               97       134     98        105      61        495

6.2.2 Cloze test
In the written cloze test, the participants read a text that contained twelve gaps. They were given a list of the twelve words belonging in these gaps, in the target language with a translation into their own language: four adjectives, four nouns and four verbs. Their task was to fill the gaps with the words given. To do this, they needed to understand the text well enough to know which word belonged in which gap. The texts were taken from the Cambridge English Preliminary English Test (PET) and translated into the other languages by three translators (one translating, two checking). They are about 200 words and 16-17 sentences long. The topics are everyday matters: catching a cold, riding a bike, driving in winter and child athletes.

6.2.3 Word translation task
For the word translation task, participants were presented with single, isolated nouns and were asked to translate them. They were encouraged to provide an answer even if they had no idea what the word could mean. The words used in this task were taken from the word list used in the Micrela project (Heeringa et al. 2013; see also section 2.4). This list consists of the 100 most frequent nouns in the British National Corpus (BNC) and their translations into the four other languages, again by three translators (one translating and two checking). Each participant was randomly assigned 50 words from this list. In the word translation task, the participants cannot use context to derive the meaning of a word: they only have that single word. Therefore, the influence of linguistic factors should be more clearly visible.

7. Results
In this chapter, the results and statistical analyses of the data are presented. In the first section, the results of the three linguistic measures (lexical distance, Levenshtein distance and entropy) are described. In the second section, their contribution to intelligibility, measured by two different tests (a cloze test and a word translation task), is investigated.
All correlation coefficients are Pearson’s R, and all significance values were calculated using the Mantel test (Mantel 1967).

7.1 Linguistic measures
7.1.1 Lexical distance
Table 10 shows the results of the lexical distance measurements. The numbers represent the percentage of non-cognates in each language pair. As mentioned before, lexical distance in this study is a symmetrical measure, so the distance within a language pair is the same in both directions. A few things stand out in this table. Firstly, the distance between Swedish and Danish is by far the smallest, at 22%. The distance between Dutch and German is quite small as well, at 36%. English is the most distant from all the other languages, with percentages of over 50. This is not surprising, as English has borrowed many words from Romance languages over the course of its history (Van Gelderen 2007; see Chapter 4 of this thesis). In Figure 5, a graphical representation of the distances is shown. The darker a line is, the closer together the two languages it connects are. The separation of English from the other languages is clearly visible, as are the two clusters formed by Danish/Swedish and Dutch/German. These groupings correspond to what would be expected from the history of these languages as described in Chapter 4.

Table 10: Lexical distances between the language pairs. Each number corresponds to the percentage of non-cognates between the two languages; thus the higher the number, the greater the distance. As this distance is symmetrical, half of the table is greyed out (marked X).
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        X       X         X        X
Dutch               47       X       X         X        X
English             58       55      X         X        X
German              45       36      60        X        X
Swedish             22       48      58        45       X

Figure 5: Graphical representation of the lexical distances in Table 10. The darker the line between two languages, the lower the lexical distance is between them.
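Since the lexical distance is the percentage of non-cognates, Table 10 can be cross-checked against the cognate counts in Table 7. The sketch below assumes, as the match suggests, that every percentage is computed over the full list of 1510 words; the function name lexical_distance is hypothetical. Rounded to whole percentages, it reproduces all ten values in Table 10 (for example 22 for Danish and Swedish and 36 for Dutch and German).

```python
# Number of cognates per language pair, copied from Table 7.
WORD_LIST_SIZE = 1510
COGNATES = {
    ("Danish", "Dutch"): 806,
    ("Danish", "English"): 632,
    ("Danish", "German"): 824,
    ("Danish", "Swedish"): 1175,
    ("Dutch", "English"): 687,
    ("Dutch", "German"): 961,
    ("Dutch", "Swedish"): 784,
    ("English", "German"): 601,
    ("English", "Swedish"): 632,
    ("German", "Swedish"): 829,
}


def lexical_distance(n_cognates: int, list_size: int = WORD_LIST_SIZE) -> float:
    """Percentage of non-cognates in a word list of `list_size` words."""
    return 100 * (1 - n_cognates / list_size)


# Round to whole percentages, as in Table 10.
distances = {pair: round(lexical_distance(n)) for pair, n in COGNATES.items()}

# List the pairs from closest to most distant.
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a}-{b}: {d}%")
```

Because each pair has a single count, the measure is symmetrical by construction, which is why only half of Table 10 is filled in.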
7.1.2 Orthographic Levenshtein distance
The orthographic Levenshtein distances were calculated for every cognate word pair in each language combination. Table 7 shows the number of cognates in each language pair, and thus how many word pairs each distance calculation was based on. Note that these counts are essentially the reverse of the lexical distance, which is the percentage of non-cognates. The Levenshtein distances are shown in Table 11 and a graphical representation can be found in Figure 6. Like the lexical distance, the Levenshtein distance is symmetrical, so every language combination is represented only once. In Figure 6, the same clusters show up as with the lexical distance: Swedish and Danish are closest together, followed by a cluster of Dutch and German. English is again the most distant from the others, although its separation is less pronounced than with the lexical distance: the distances between the Dutch/German cluster and the Swedish/Danish cluster (34 to 36) are similar to those between English and the other languages (34 to 36).

Table 11: Orthographic Levenshtein distances between the language pairs. The higher the number, the greater the distance. For an explanation of how these distances were calculated, see Chapter 6. As these distances are symmetrical, half of the table is greyed out (marked X).
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        X       X         X        X
Dutch               35       X       X         X        X
English             34       35      X         X        X
German              34       31      36        X        X
Swedish             23       36      35        36       X

Figure 6: Graphical representation of the orthographic Levenshtein distances in Table 11. The darker the line between two languages, the lower the Levenshtein distance is between them.

7.1.3 Entropy measures
Table 12 shows the entropies based on unigrams. Unlike the lexical and Levenshtein distances, entropy is inherently asymmetrical; this table therefore shows 20 different values. A higher entropy value means that there is more irregularity in the orthographic correspondences in that language pair.
Figure 7 is a graphical representation of the unigram entropies. For each language pair, the entropy in one direction is plotted on the horizontal axis and the entropy in the other direction on the vertical axis. The line marks where the entropies in both directions are the same, i.e. where there is no asymmetry. The further a language pair is located from the line, the greater the asymmetry in that pair. Table 13 and Figure 8 show the bigram entropies, and Table 14 and Figure 9 show the trigram entropies.

The entropy data clearly show the close relationship between Swedish and Danish. In every case, the entropies between Swedish and Danish in both directions are the two lowest entropies, often with a clear gap between these two and the next lowest value. Only for the trigrams does another value come close: H(English|Swedish) is 0.34, against 0.32 for H(Swedish|Danish). That pair is very asymmetrical, however: the entropy in the other direction is much higher (H(Swedish|English) = 0.42). In all cases, the entropy of Swedish given Danish (that is, with Danish as stimulus language) is higher than the entropy of Danish given Swedish (with Swedish as stimulus language). This difference is small relative to the asymmetries among the other language pairs, however. For the bigrams, the asymmetry is only 0.01, the lowest value of all pairs (shared with Danish-German and German-Swedish). For unigrams, the largest asymmetries are in Danish-German (0.10) and Danish-Dutch (0.09). For bigrams, the largest asymmetries are English-Swedish (0.19) and English-Danish (0.17). For trigrams, the largest asymmetries are English-Swedish (0.08) and English-German (0.06).

Figure 7: Unigram entropies for each language pair in both directions plotted against each other. The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.

Table 12: Entropy for each language pair based on unigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a higher entropy, i.e. less regular correspondences.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        1.09    1.06      1.22     0.89
Dutch               1.00     X       0.99      0.99     0.99
English             1.02     1.06    X         1.14     1.05
German              1.12     1.03    1.16      X        1.13
Swedish             0.86     0.99    1.12      1.15     X

Figure 8: Bigram entropies for each language pair in both directions plotted against each other. The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.

Table 13: Entropy for each language pair based on bigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a higher entropy, i.e. less regular correspondences.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0.74    0.74      0.83     0.64
Dutch               0.81     X       0.82      0.82     0.79
English             0.91     0.89    X         0.92     0.94
German              0.82     0.80    0.80      X        0.86
Swedish             0.63     0.72    0.75      0.85     X

Figure 9: Trigram entropies for each language pair in both directions plotted against each other. The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.

Table 14: Entropy for each language pair based on trigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a higher entropy, i.e. less regular correspondences.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0.37    0.35      0.37     0.32
Dutch               0.41     X       0.40      0.41     0.40
English             0.40     0.42    X         0.43     0.42
German              0.39     0.39    0.37      X        0.43
Swedish             0.28     0.36    0.34      0.38     X

Table 15 shows the correlations between the results for unigrams, bigrams and trigrams. All three measures correlate significantly with each of the other two. The correlation between the bigrams and trigrams is very high (r = .87), suggesting a large overlap between these two measures. The correlations of the unigrams with each of the others are lower (r = .54 with bigrams and r = .41 with trigrams). It can be concluded that the unigrams are the most distinct of the three measures.

Table 15: Correlations of the different entropies.
Measures                           r     p        n
Entropy 1-gram – entropy 2-gram    .54   < .01    20
Entropy 1-gram – entropy 3-gram    .41   < .05    20
Entropy 2-gram – entropy 3-gram    .87   < .001   20

Table 16 shows the sum of the unigram, bigram and trigram entropies for each language pair. As expected from the distributions of the separate entropy measures, the entropies for Danish and Swedish in both directions are clearly the lowest. The entropies with English as the stimulus language are consistently high. Figure 10 represents the summed entropies in the same way as the lexical and Levenshtein distances were shown in the previous sections (see Figure 5 and Figure 6). As asymmetry cannot be shown in this type of figure, it is based on the averages of the two directions for each language combination. For example, the value for English and Dutch was calculated as the average of H(Dutch|English) = 2.36 and H(English|Dutch) = 2.20, i.e. 2.28.[13] As with the previous two linguistic measures, this graph again shows a close relationship between Danish and Swedish: the orthographic correspondences between these languages are relatively regular. The other patterns from the previous two measures are not present, however. The closest combination after Danish and Swedish is Dutch and Swedish, a pairing that did not show up in the lexical distance and Levenshtein distance results. In addition, German appears to be the most separate from the other languages, whereas with the lexical and Levenshtein distances this was English.

[13] Remember also that what entropy measures (and thus what this graph shows) is not actual distance, but rather predictability or regularity.

Table 16: Sum of the entropies calculated using unigrams (Table 12), bigrams (Table 13) and trigrams (Table 14). These measures are not symmetrical. A higher number (marked by lighter colours) means a higher entropy, i.e. less regular correspondences.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        2.21    2.15      2.42     1.85
Dutch               2.21     X       2.20      2.22     2.18
English             2.33     2.36    X         2.49     2.41
German              2.33     2.22    2.34      X        2.42
Swedish             1.78     2.07    2.21      2.37     X

Figure 10: Graphical representation of the summed unigram, bigram and trigram entropies within each language pair. The numbers in Table 16 for both directions of each pair were averaged. A darker line means lower entropy within that pair.

Table 17 shows the asymmetry within each language pair for the summed entropies. The higher the number, the greater the asymmetry. A negative number means that the entropy for that direction of the language combination is lower than the entropy in the other direction. There is a small asymmetry between Danish and Swedish (0.07, where the entropy for a Danish person reading Swedish is lower than for a Swedish person reading Danish), but it is one of the lowest asymmetries in the table. (Note, however, that two pairs show no asymmetry at all: Dutch-German and Dutch-Danish.) The largest asymmetries are found in the language combinations involving English, where the entropies with English as the stimulus language are always higher than those with English as the reader’s native language. This would suggest that written English is intrinsically harder to understand for speakers of the other four languages than vice versa.

Table 17: The asymmetry in each language pair, based on the sum of the entropies as presented in Table 16. The number represents the difference between the entropies of both directions. For example, H(Danish|English) = 2.33 and H(English|Danish) = 2.15, so the difference is 0.18 in one direction and -0.18 in the other. A negative number in the table means that the entropy in that direction is lower than the entropy in the other direction.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0       -0.18     0.09     0.07
Dutch               0        X       -0.16     0        0.11
English             0.18     0.16    X         0.15     0.20
German              -0.09    0       -0.15     X        0.05
Swedish             -0.07    -0.11   -0.20     -0.05    X

7.1.4 Correlations of the different linguistic measures
For the linguistic measures presented above, correlations were calculated; the results can be seen in Table 18. The lexical distances (section 7.1.1) and the Levenshtein distances (section 7.1.2) correlate very highly with each other (Pearson’s R = .85, p < .01). This can also be seen in Figure 11: the higher the lexical distance, the higher the Levenshtein distance. Note that, although both are measures of distance, they need not correlate, as they measure two different things: lexical distance reflects the proportion of non-cognates in a language pair, whereas Levenshtein distance measures how similar these cognates are. It is possible for two languages to have many but very different cognates, or few cognates that are very similar. That is not the case for these data, however: language pairs that share few cognates (i.e. have a high lexical distance) differ greatly in these cognates (i.e. have a high Levenshtein distance).

Table 18: Correlations among the three linguistic measures: lexical distance with orthographic Levenshtein distance, lexical distance with the entropy measures, and Levenshtein distance with the entropy measures. For the lexical-Levenshtein comparison, half the matrices are used, as these are symmetrical. For comparisons with entropy, the full matrices are used (each language pair in both directions), in which the lexical and Levenshtein distances have the same value in both directions.
Measures                        r     p        n
Lexical – Levenshtein           .85   < .01    10
Lexical – entropy 1-gram        .59   < .01    20
Lexical – entropy 2-gram        .65   < .01    20
Lexical – entropy 3-gram        .54   < .05    20
Levenshtein – entropy 1-gram    .71   < .001   20
Levenshtein – entropy 2-gram    .67   < .01    20
Levenshtein – entropy 3-gram    .69   < .01    20

The correlations of both lexical distance and Levenshtein distance with the different entropy measures are lower (ranging from r = .54 to r = .71), though all of them are still significant. Entropy, again, measures something different from the other two: it does not measure how similar or different two languages are, but how regular these similarities and differences are. A language pair can have a low lexical and Levenshtein distance yet a high entropy, or vice versa. These data show that both distance measures correlate significantly with entropy, but that they do not completely overlap with it. Entropy is a distinct measure and could therefore add to explaining intelligibility.

Figure 11: Scatter plot of the correlation between lexical distance and orthographic Levenshtein distance.

For the Micrela project (see section 2.4), Heeringa et al. (2013) calculated the lexical distance and Levenshtein distance similarly to this study, but using a list of 100 nouns instead of the roughly 1500 words of all word classes used here. Table 19 shows the correlations of these distances with the same distances calculated using the data for this study, and the correlations of Heeringa et al.’s distances with the entropy measures from this study. The correlations of the lexical and Levenshtein distances are again very high (r = .90) and clearly significant, suggesting that calculating these distances based on 100 or 1500 words does not make a big difference. The correlations of Heeringa et al.’s distances with the entropy from this thesis are much lower, though all but one (lexical distance with trigram entropy) are still significant.
This again shows that although entropy correlates with the other linguistic measures, it is still clearly distinct, and could contribute to explaining intelligibility in addition to lexical and Levenshtein distance.

Table 19: Correlations between the three linguistic measures calculated in this study, based on a list of about 1500 words, and the lexical and Levenshtein distances calculated by Heeringa et al. (2013), based on a list of 100 words.
Measures                                     r     p        n
Lexical (1500 vs. 100 words)                 .90   < .001   20
Levenshtein (1500 vs. 100 words)             .90   < .001   20
Lexical (100) – entropy 1-gram (1500)        .42   < .05    20
Lexical (100) – entropy 2-gram (1500)        .42   < .05    20
Lexical (100) – entropy 3-gram (1500)        .28   .12      20
Levenshtein (100) – entropy 1-gram (1500)    .66   < .01    20
Levenshtein (100) – entropy 2-gram (1500)    .59   < .01    20
Levenshtein (100) – entropy 3-gram (1500)    .68   < .01    20

7.2 Using linguistic measures to predict intelligibility
Two tests were used to measure intelligibility: a cloze test and a word translation task, carried out by Femke Swarte as part of the Micrela project. These experiments were described in section 6.2.

Figure 12: Correlation of the lexical distance with the written cloze test with all languages included. The cases with English as a stimulus language (marked in red) are outliers.

For the following analyses, all cases involving English were removed from the data. The reason for this is that when English was the stimulus language, participants performed at ceiling level (see Figure 12). The cause of this is unlikely to lie in linguistic factors; it more likely lies in the current position of English as a lingua franca in Europe. The participants have had so much exposure to and experience with English that they can understand it nearly perfectly, which cancels out any possible effect of linguistic factors. There is no correlation between lexical distance and written intelligibility in the data shown in Figure 12.
Because the Mantel test, which is used to calculate the significance values, compares matrices rather than individual numbers, we cannot remove only these four data points: English needs to be excluded completely. Figure 13 shows the data from Figure 12 excluding English. The correlation is highly significant (Pearson’s R = -.87, p < .001). When English is excluded from the analysis, four languages are left (Danish, Dutch, German and Swedish). This means there are 12 data points to be correlated (each language combined with the other three languages, in both directions).

Figure 13: Correlation of the lexical distance with the written cloze test excluding the cases with English as the stimulus language or as the reader’s language.

7.2.1 Cloze test
Table 20 shows the correlations of the three linguistic measures with the written cloze test. With the English cases excluded, 12 cases are left. The correlations based on these 12 cases are very high, especially for lexical distance (-.87) and orthographic Levenshtein distance (-.85). The correlations with the different entropy measures are lower (-.61, -.59 and -.56), but still considerable and significant. A multiple regression analysis including the three entropy measures yields R = .67 (R² = .45), but the model is not significant (F(3,8) = 2.2, p = .17). The contribution of unigrams is the largest in this model, followed by trigrams and bigrams (see Table 21), but none of the predictors are significant. All correlations are negative, which means that the greater the distance between the languages according to the linguistic measure in question, the lower the intelligibility score for that language pair. The scatter plots in Figure 14, Figure 15 and Figure 16 show these correlations graphically.

Table 20: Correlations of the three linguistic measures with the results of the written cloze test.
The n derives from the full matrix (each language combination in both directions) for the languages Danish, Dutch, German and Swedish.
Measure                              r      p        n
Lexical distance                     -.87   < .001   12
Orthographic Levenshtein distance    -.85   < .001   12
Orthographic entropy 1-gram          -.61   < .05    12
Orthographic entropy 2-gram          -.59   < .05    12
Orthographic entropy 3-gram          -.56   < .05    12

Table 21: Properties of the three entropy measures as predictors in a multiple regression analysis with the written cloze test as the dependent variable. Model statistics: R² = .45, F(3,8) = 2.21, p = .165.
Predictor   β       t(8)    p
Unigrams    -.586   -1.22   .259
Bigrams     .339    0.41    .695
Trigrams    -.527   -0.86   .417

Figure 14: Correlation of the lexical distance with the written cloze test excluding English.

Figure 15: Scatter plot of the Levenshtein distance with the results of the written cloze test.

Figure 16: Scatter plot of the entropy measures with the results of the written cloze test.

Multiple linear regression analysis was used to find the best possible combination of predictors. The model that best predicted the results of the cloze test had only one predictor: lexical distance (β = -.87, p < .001; R² = .76, p < .001). This model is only slightly better than the model with only Levenshtein distance as a predictor (β = -.85, p < .001; R² = .72, p < .001). In every model containing more than one predictor, none of the predictors were significant. The entropy measures do not add significant predictive power in combination with the lexical or Levenshtein distances.

7.2.2 Word translation task
Table 22 shows the correlations between the results of the word translation task and the linguistic measures described in section 6.1.
The pattern is very similar to that of the correlations with the cloze test (see section 7.2.1): lexical distance has the highest correlation (-.85, p < .01), followed by the Levenshtein distance (-.82, p < .01), and the correlations with entropy are the lowest and least significant (-.62, -.56 and -.56, p < .05 in all cases). A multiple regression analysis including the three entropy measures yields R = .71 (R² = .50), but the model is not significant (F(3,8) = 2.7, p = .12). The contribution of trigrams is the largest in this model, followed by unigrams and bigrams (see Table 23), but none of the predictors are significant. All correlations are negative, which means that the greater the distance between the languages according to the linguistic measure in question, the lower the intelligibility score for that language pair. The scatter plots showing the correlations graphically can be found in Figure 17, Figure 18 and Figure 19. These graphs, too, make clear that the correlation with lexical distance is the strongest (Figure 17) and that with entropy the weakest (Figure 19). A multiple regression analysis showed that the model with only lexical distance as predictor variable was the best model to predict the results of the word translation task (β = -.91, p < .001, R² = .83). As was the case with the results of the cloze test, the entropy measures do not add significant predictive power in combination with the lexical or Levenshtein distances.

Table 22: Correlations of the three linguistic measures with the written word translation task. The n derives from the full matrix (each language combination in both directions) for the languages Danish, Dutch, German and Swedish.
Measure | r | p | n
Lexical distance | -.85 | < .01 | 12
Orthographic Levenshtein distance | -.82 | < .01 | 12
Orthographic entropy 1-gram | -.62 | < .05 | 12
Orthographic entropy 2-gram | -.56 | < .05 | 12
Orthographic entropy 3-gram | -.56 | < .05 | 12

Table 23: Properties of the three entropy measures as predictors in a multiple regression analysis with the written word translation task as the dependent variable. Model statistics: R² = .50, F(3,8) = 2.71, p = .115.

Predictor | β | t(8) | p
Unigrams | -.760 | -1.66 | .137
Bigrams | .723 | 0.91 | .389
Trigrams | -.774 | -1.32 | .223

Figure 17: Scatter plot of the lexical distance with the results of the written intelligibility task.

Figure 18: Scatter plot of the Levenshtein distance with the results of the written intelligibility task.

Figure 19: Scatter plot of the orthographic entropy measures with the results of the written intelligibility task.

8. Discussion

8.1 Linguistic measures

The results of the lexical distance and Levenshtein distance are largely as expected. Both show that Danish and Swedish are closest together, followed by German and Dutch. The lexical distance results also clearly single out English as the most distant from all the others, as is expected from the history of this language (see Chapter 4). The results of the entropy calculations, too, clearly put Swedish and Danish closer together than all other language combinations. The other relations expected from the history of the languages, however, do not emerge. A reason for this could be that the entropy calculated here is based on the orthographic forms of the words. As described in Chapter 4, the standard orthographies for these languages were established only in the last few centuries, when the languages were already clearly separated from each other. These standards developed largely independently of those of the other languages. Differences and similarities in orthography, then, need not reflect differences and similarities in the languages themselves.
For example, German and Dutch share many cognates that both have the sound /u/ (Buch - boek ‘book’, suchen - zoeken ‘to search’, tun - doen ‘to do’). Each language spells this sound differently, however: German uses <u>, whereas Dutch uses the digraph <oe>. This suggests a difference that is not in fact there. In the orthographies of Swedish and Danish, a conscious effort was made to keep them similar to each other, in order to preserve Scandinavian unity (Vikør 2001). It is not surprising, then, that the entropy between them is the lowest, in both directions. The highest average entropies are found between German and the other languages. This suggests that the correspondences between written German and the other languages are the most irregular.14

14 This is not due to the German practice of capitalizing nouns, as capitalization was ignored in the entropy calculations.

A clear asymmetry between Danish and Swedish, which is often reported in intelligibility research (Delsing and Lundin Åkesson 2005; Gooskens et al. 2010; Schüppert 2011; Gooskens and Van Bezooijen 2013), was not found by these entropy measurements. This contrasts with the findings of Moberg et al. (2007). This may be because the entropy calculations in the present study were performed only at the orthographic level. The asymmetry in intelligibility has been established more firmly for spoken language than for written language. As mentioned in Chapter 4, the orthography of Danish differs greatly from its present-day pronunciation, which is not the case for Swedish. The entropy between the spoken versions of these languages, which will be calculated in future research, may show the expected asymmetry.
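Because conditional entropy is computed separately for each reading direction, it can in principle register asymmetry. As a minimal sketch of that property, the Python code below computes the entropy of the reader's native-language letter given the stimulus-language letter over already-aligned letter pairs. It rests on stated assumptions: only unigrams are used, the alignment step is taken as given, and the letter pairs and the function names (`conditional_entropy`, `reverse`) are hypothetical illustrations, not the thesis data or the scripts used to compute the measures.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(native | stimulus) in bits over aligned (stimulus, native) letter pairs.

    A low value means the native-language letter is predictable once the
    reader has seen the stimulus-language letter.
    """
    joint = Counter(pairs)                 # counts of (stimulus, native) pairs
    total = sum(joint.values())
    stimulus = Counter()                   # marginal counts of stimulus letters
    for (s, _), count in joint.items():
        stimulus[s] += count
    h = 0.0
    for (s, _), count in joint.items():
        p_joint = count / total            # p(s, n)
        p_cond = count / stimulus[s]       # p(n | s)
        h -= p_joint * log2(p_cond)
    return h


def reverse(pairs):
    """Swap the roles of stimulus language and native language."""
    return [(n, s) for s, n in pairs]


# Hypothetical aligned pairs (language A, language B), invented for illustration:
# A's <k> corresponds to B's <k> or <c>, while both B letters map back to A's <k>.
pairs_a_b = [("k", "k"), ("k", "c"), ("k", "k"), ("k", "c"), ("a", "a"), ("a", "a")]

h_read_a = conditional_entropy(pairs_a_b)           # B-speaker reading A
h_read_b = conditional_entropy(reverse(pairs_a_b))  # A-speaker reading B
print(f"B-speaker reading A: {h_read_a:.3f} bits")
print(f"A-speaker reading B: {h_read_b:.3f} bits")
```

With these invented pairs, a reader with native language B faces 2/3 of a bit of uncertainty per letter when reading A, whereas a reader with native language A faces none when reading B, because both <k> and <c> in B point back to <k> in A. The sketch does not reproduce the n-gram handling or normalization of the actual measures.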
The greatest asymmetry was found in the combinations involving English: the asymmetry in the sum of the entropies for combinations with English ranged from .15 to .20, whereas the highest asymmetry in the other combinations was .11. In all of these cases, the entropies for pairs with English as the stimulus language were higher than those with English as the reader's native language. This means that someone reading English encounters more unpredictability than an English-speaking person reading one of the other languages. This could be due to the relative irregularity of English spelling, which was described in Chapter 4. It should also be repeated here that the number of cognates between English and the other languages was relatively low (see Table 7), in all cases below the minimum of 800 words that Moberg et al. (2007) established for reliable entropy measurements. The calculations should be redone with an extended version of the word list that contains enough cognates for the combinations with English as well.

8.2 Research questions

The first research question of this thesis is: Are orthographic entropy measures a useful predictor of written intelligibility in addition to Levenshtein distance? Significant correlations were found for all entropy measures with the results of the two intelligibility experiments. The correlations with the cloze test ranged from r = -.56 to r = -.61 and those with the word translation task from r = -.56 to r = -.62. This shows that the amount of entropy (irregularity) between languages can predict the level of intelligibility to some extent: when the entropy from one language to another is high, the intelligibility between the two languages will be low. A regression analysis combining all three entropy measures as predictors did show this relation, but the model was not significant for either intelligibility test.
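The non-significance of the three-predictor entropy models can be checked from the reported R² values alone, using the standard relation between R² and the overall F statistic of an ordinary least squares regression, F = (R²/k) / ((1 - R²)/(n - k - 1)), here with k = 3 predictors and n = 12 directional language pairs. The short Python sketch below (the function name `overall_f` is hypothetical) does this arithmetic and also shows that, for a one-predictor model, R² is simply the square of Pearson's r.

```python
def overall_f(r2, n_predictors, n_obs):
    """Overall F statistic of an OLS regression, computed from its R².

    F = (R² / k) / ((1 - R²) / (n - k - 1)), with k and n - k - 1 degrees of freedom.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f = (r2 / df_model) / ((1 - r2) / df_resid)
    return f, df_model, df_resid


# R² of the three-entropy models as reported in Tables 21 and 23
# (12 directional language pairs, 3 predictors).
for label, r2 in [("cloze test", 0.45), ("word translation", 0.50)]:
    f, d1, d2 = overall_f(r2, n_predictors=3, n_obs=12)
    print(f"{label}: F({d1},{d2}) = {f:.2f}")

# With a single predictor, the standardized beta equals Pearson's r and R² = r².
for label, r in [("lexical distance, cloze", -0.87), ("Levenshtein distance, cloze", -0.85)]:
    print(f"{label}: R² = {r ** 2:.2f}")
```

Back-computing from the rounded R² values gives F ≈ 2.18 and F ≈ 2.67, close to the reported F(3,8) = 2.21 and 2.71; the small gap comes from rounding R² to two decimals. The corresponding p-value can be read from the F distribution's survival function, for example `scipy.stats.f.sf(f, 3, 8)` if SciPy is available. Squaring -.87 and -.85 reproduces the reported R² = .76 and .72.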
Unigrams (in the cloze test) and trigrams (in the word translation task) contributed the most to the prediction. This might be because unigrams and trigrams have the lowest correlation with each other, whereas bigrams overlap with both of the others. Although all correlations with the entropy measures were significant at the .05 level, the correlations with lexical distance and Levenshtein distance were much higher (between -.82 and -.87). In a linear regression analysis, the best fitting model included lexical distance as the only predictor for both the cloze test and the word translation task. This suggests that, for these intelligibility experiments, the entropy measures do not add explanatory power to the lexical and Levenshtein measures of linguistic distance. There are several possible causes for this. In theory, the main contribution of entropy to predicting intelligibility is its inherent ability to register asymmetry between the two directions of a language pair. Asymmetry has been shown repeatedly, especially for Danish and Swedish, and Gooskens, Van Bezooijen and Van Heuven (accepted) showed such an asymmetry for Dutch and German too. The only asymmetry found in these data, however, was that between English and the other languages. As mentioned above, a probable cause for this is that these calculations were based on the written versions of the languages, whereas the asymmetry usually surfaces only in spoken intelligibility (see e.g. Schüppert 2011 for Danish and Swedish). The asymmetry that did surface in the entropy measurements, namely that involving English, could not be correlated with the intelligibility data, because the participants performed at ceiling level for English. Apparently, northern and western Europeans are exposed to English so much, in school and in their daily lives, that they understand the language well enough to perform these tasks nearly perfectly.
This makes it impossible to show whether there is any asymmetry in the level of intelligibility that is due to linguistic factors, such as irregular orthographic correspondences. There is another issue in using entropy to predict the intelligibility scores in this study. Unlike lexical distance and Levenshtein distance, entropy does not measure actual similarity; it measures the regularity of the correspondences between two languages, regardless of how similar the corresponding units are. For this regularity to help intelligibility, the reader or listener needs some experience with the stimulus language. The experiments this study draws from, however, were very short: the texts in the cloze test consist of about 200 words, and in the word translation task each participant had to translate 50 words. If the participant has no previous experience with the stimulus language, these experiments do not give him enough input to benefit from a low entropy. The second and third research questions are related to each other. The answer to the second research question, Can lexical distance accurately predict written intelligibility?, is clearly ‘yes’. The lexical distance had a very high correlation with both intelligibility experiments (r = -.87 and r = -.85): a high lexical distance between two languages means low intelligibility. In the linear regression analyses, the best model for both intelligibility tasks included lexical distance as the only predictor, with R² = .76 for the cloze test and R² = .83 for the word translation task. The third research question, Can orthographic Levenshtein distance accurately predict written intelligibility?, can be answered affirmatively as well. The correlations of the intelligibility data with the Levenshtein distance were very high: r = -.85 and r = -.82.
However, contrary to expectations and to results from previous research (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van Bezooijen 2008), they were slightly lower than the correlations with lexical distance. There are a few differences between these studies and the present one. Firstly, the lexical distance in this study was calculated from a list of 1500 words instead of 100-300 words as in the previous research, but the high correlation between the lexical distance from this study and that from Heeringa et al. (2013), reported in section 7.1.4, suggests that this should not make a big difference. Moreover, the Levenshtein distance was calculated using the same extensive word list. A more likely explanation is that the studies cited above involved only the Scandinavian languages Danish, Swedish and Norwegian, or dialects of these languages. The lexical variation among these language varieties is not very high (Gooskens 2007a), as also exemplified by the very low lexical distance between Danish and Swedish in this study (see Table 10 or Table 7). In such a group of languages, lexical distance might not be a very good predictor, as the lexical distances between all varieties are simply very low. The group of languages in this study, however, includes West Germanic languages as well as the North Germanic languages of Scandinavia. More lexical variation is present in this group, which is likely to make lexical distance a better predictor of intelligibility.

8.3 Future research

First of all, this research should be extended to spoken intelligibility, with linguistic measures calculated from phonological transcriptions of the word lists. The results for spoken intelligibility have proven to be different from those for written intelligibility (see e.g.
Schüppert 2011): most notably, spoken intelligibility between Danish and Swedish is asymmetrical, whereas written intelligibility is not. The linguistic measures calculated from phonological transcriptions are likely to differ from those based on orthography as well, as different languages represent pronunciation in their orthography in different ways. In addition, the orthography is not always a faithful representation of the pronunciation of a language, especially in the cases of English and Danish (see Chapter 4). The relations between these languages based on pronunciation could be very different from the orthography-based relations shown in this study. This is already being worked on in the Micrela project, using the same word list as this thesis. Secondly, the word list needs to be expanded in order to obtain sufficient cognates for the combinations involving English as well. In the current study, these number between 600 and 700 words, whereas a minimum of 800 is required for reliable entropy calculations (Moberg et al. 2007). In addition to lexical distance, Levenshtein distance and conditional entropy, there are more ways to measure linguistic relations between languages. Spruit (2006) and Nerbonne and Wiersma (2006), for example, developed methods to computationally measure syntactic and morphological distances between languages. Future research should investigate how well linguistic factors on these levels (i.e. other than the word level) can predict intelligibility between languages. A disadvantage of using entropy to predict intelligibility is that readers who have never before encountered the language they see are not helped by a low entropy. The method of conditional entropy measures whether the correspondences between two languages are regular. It does not measure, however, how transparent these correspondences are to the naive reader.
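The difference between regularity and transparency can be made concrete with a toy comparison. The Python sketch below is illustrative only: it assumes an invented stimulus language A and two invented native languages, position-wise alignment of equal-length words, unigram entropy, and a Levenshtein distance normalized by the length of the longer word (a simplification, not necessarily the normalization used for the measures in this thesis). All function names are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(native | stimulus) in bits over aligned (stimulus, native) letter pairs."""
    joint = Counter(pairs)
    total = sum(joint.values())
    stimulus = Counter()
    for (s, _), count in joint.items():
        stimulus[s] += count
    h = 0.0
    for (s, _), count in joint.items():
        h -= (count / total) * log2(count / stimulus[s])
    return h


def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def letter_pairs(words_a, words_b):
    """Position-wise letter alignment; only valid for equal-length word pairs."""
    pairs = []
    for wa, wb in zip(words_a, words_b):
        assert len(wa) == len(wb), "toy alignment needs equal-length words"
        pairs.extend(zip(wa, wb))
    return pairs


def mean_normalized_distance(words_a, words_b):
    """Mean Levenshtein distance, each divided by the length of the longer word."""
    return sum(levenshtein(a, b) / max(len(a), len(b))
               for a, b in zip(words_a, words_b)) / len(words_a)


# Hypothetical stimulus language A and two hypothetical native languages:
# B1 spells every word identically, B2 replaces every <k> with <s>.
lang_a = ["kan", "ko", "ek"]
lang_b1 = ["kan", "ko", "ek"]
lang_b2 = ["san", "so", "es"]

for name, native in [("B1 (identity)", lang_b1), ("B2 (k -> s)", lang_b2)]:
    h = conditional_entropy(letter_pairs(lang_a, native))
    d = mean_normalized_distance(lang_a, native)
    print(f"{name}: entropy = {h:.2f} bits, normalized Levenshtein = {d:.2f}")
```

Both native languages receive zero entropy, because every letter of A corresponds to exactly one letter in each of them, but only the language that swaps <k> for <s> has a non-zero mean normalized Levenshtein distance (4/9, about 0.44). Entropy alone therefore cannot distinguish the transparent mapping from the opaque one.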
If the letter <k> in language A, for example, corresponds in 100% of the cases to the letter <k> in language B, this is calculated as zero entropy. If that letter <k>, however, corresponds in 100% of the cases to the letter <s> in language B, this, too, is calculated as zero entropy: there is, after all, no irregularity. To a reader with native language B who has never before encountered language A, however, the first case will pose no problems for intelligibility, whereas the second case most likely will. How should he guess that he needs to replace every <k> with an <s>? After familiarizing himself with language A, however, he might eventually be able to derive the rule. In this case, the lower the entropy (i.e. the lower the irregularity), the easier it will be for the reader to derive the correspondences between a foreign language and his native language. Future research should study how the amount of entropy influences the speed with which a participant learns to understand a new language.

9. Conclusion

The main research question of this thesis is: Are orthographic entropy measures a useful predictor of written intelligibility in addition to Levenshtein distance? The entropy measures correlated significantly with the results of both intelligibility experiments, but the correlations of both the lexical distance and the Levenshtein distance with the intelligibility data were much higher. In a regression analysis, both Levenshtein distance and entropy were excluded, leaving lexical distance as the strongest predictor. This is probably caused by the absence of asymmetry in the entropy calculations: registering asymmetry is the main strength of the conditional entropy method, but asymmetry was present only in the language combinations involving English. In the intelligibility data, however, the items with English had to be excluded due to ceiling effects.
The second and third research questions are connected: Can lexical distance accurately predict written intelligibility? and Can orthographic Levenshtein distance accurately predict written intelligibility? The answer to both questions is ‘yes’: the correlations of both Levenshtein distance and lexical distance with the intelligibility data were very high (-.82 at the lowest). However, contrary to previous research, lexical distance proved the better predictor of the two in a regression analysis. In previous studies (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van Bezooijen 2008), the Levenshtein distance tended to be the best predictor.

10. References

Beijering, Karin, Charlotte Gooskens and Wilbert Heeringa (2008). Predicting intelligibility and perceived linguistic distances by means of the Levenshtein algorithm. Linguistics in the Netherlands, 13-24.
Bø, Inge (1978). Ungdom og naboland. Stavanger: Rogalandsforskning (rapport 4).
Börestam, Ulla (1987). Dansk-svensk språkgemenskap på undantag. Uppsala: Uppsala Universitet.
Chambers, Jack and Peter Trudgill (1980). Dialectology. Cambridge: Cambridge University Press.
Cheng, Chin-Chuan (1997). Measuring relationship among dialects: DOC and related sources. Computational Linguistics & Chinese Language Processing, 2(1), 41-72.
Delsing, Lars-Olof and Katarina Lundin Åkesson (2005). Håller språket ihop i Norden? En forskningsrapport om ungdomars förståelse av danska, svenska och norska. Copenhagen: Nordiska ministerådet.
Engstrand, Olle (1999). Swedish. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press, 140-142.
Van Gelderen, Elly (2006). A History of the English Language. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Goebl, Hans (1982). Dialektometrie: Prinzipien und Methoden des Einsatzes der numerischen Taxonomie im Bereich der Dialektgeographie. Philosophisch-Historische Klasse Denkschriften, 157. Vienna: Verlag der Österreichischen Akademie der Wissenschaften. With assistance of W.-D. Rase and H. Pudlatz.
Goebl, Hans (1993). Probleme und Methoden der Dialektometrie: Geolinguistik in globaler Perspektive. In: W. Viereck (ed.), Proceedings of the International Congress of Dialectologists, 1. Stuttgart: Franz Steiner Verlag, 37-81.
Gooskens, Charlotte (2007a). The contribution of linguistic factors to the intelligibility of closely related languages. Journal of Multilingual and Multicultural Development, 28(6), 445-467.
Gooskens, Charlotte (2007b). Contact, attitude and phonetic distance as predictors of inter-Scandinavian communication. In: J.-M. Eloy and T. Ó hIfearnáin (eds.), Near languages – Collateral languages. Actes du colloque international réuni à Limerick, du 16 au 18 juin 2005, 99-109.
Gooskens, Charlotte, Wilbert Heeringa and Karin Beijering (2008). Phonetic and lexical predictors of intelligibility. International Journal of Humanities and Arts Computing, 2(1-2), 63-81.
Gooskens, Charlotte and Renée van Bezooijen (2013). Explaining Danish-Swedish asymmetric word intelligibility – An error analysis. In: C. Gooskens & R. van Bezooijen (eds.), Phonetics in Europe: Perception and Production. Frankfurt a.M.: Peter Lang, 59-82.
Gooskens, Charlotte, Renée van Bezooijen and Vincent van Heuven (accepted). Mutual intelligibility of Dutch-German cognates by children: The devil is in the detail. Linguistics, 53(2).
Gooskens, Charlotte, Vincent van Heuven, Renée van Bezooijen and Jos Pacilly (2010). Is spoken Danish less intelligible than Swedish? Speech Communication, 52, 1022-1037.
Gordon, Raymond G., Jr., ed. (2005). Ethnologue: Languages of the World, 15th edn. Dallas: SIL International. Online version: http://www.ethnologue.com/.
Grønnum, Nina (1998). Danish. Journal of the International Phonetic Association, 28, 99-105.
Gussenhoven, Carlos (1999). Dutch. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press, 74-77.
Harbert, Wayne (2007). The Germanic languages. Cambridge: Cambridge University Press.
Haugen, Einar (1953). Nordiske språkproblemer – en opinionsundersøkelse. Nordisk Tidskrift, 29, 225-249.
Haugen, Einar (1966). Semicommunication: The language gap in Scandinavia. Sociological Inquiry, 36, 280-297.
Heeringa, Wilbert (2002). Over de indeling van de Nederlandse streektalen. Een nieuwe methode getoetst. Driemaandelijkse bladen voor taal en volksleven in het oosten van Nederland, 54(1-4), 111-148.
Heeringa, Wilbert (2004). Measuring dialect pronunciation differences using Levenshtein distance. PhD thesis. Groningen: Grodil, 46.
Heeringa, Wilbert, Jelena Golubovic, Charlotte Gooskens, Anja Schüppert, Femke Swarte and Stefanie Voigt (2013). Lexical and orthographic distances between Germanic, Romance and Slavic languages and their relationship to geographic distance. In: C. Gooskens & R. van Bezooijen (eds.), Phonetics in Europe: Perception and Production. Frankfurt a.M.: Peter Lang, 99-137.
Hoppenbrouwers, Cor and Geer Hoppenbrouwers (1988). De featurefrequentiemethode en de classificatie van Nederlandse dialecten. TABU: Bulletin voor taalwetenschap, 18(2), 51-92.
Hoppenbrouwers, Cor and Geer Hoppenbrouwers (2001). De indeling van de Nederlandse streektalen. Dialecten van 156 steden en dorpen geklasseerd volgens de FFM. Assen: Koninklijke Van Gorcum B.V.
Kessler, Brett (1995). Computational dialectology in Irish Gaelic. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics. Dublin: EACL, 60-67.
Kohler, Klaus (1999). German. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press, 86-89.
Kürschner, Sebastian, Charlotte Gooskens and Renée van Bezooijen (2008). Linguistic determinants of the intelligibility of Swedish words among Danes. International Journal of Humanities and Arts Computing, 2(1-2), 83-100.
Ladefoged, Peter (1999). English, American. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press, 41-44.
Mantel, Nathan (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27(2), 209-220.
Maurud, Øivind (1976). Nabospråkforståelse i Skandinavia: en undersøkelse om gjensidig forståelse av tale- og skriftspråk i Danmark, Norge og Sverige. Stockholm: Nordiska rådet.
Moberg, Jens, Charlotte Gooskens, John Nerbonne and Nathan Vaillette (2007). Conditional entropy measures intelligibility among related languages. In: P. Dirix, I. Schuurman, V. Vandeghinste and F. Van Eynde (eds.), Computational Linguistics in the Netherlands 2006: Selected papers from the 17th CLIN Meeting. Utrecht: LOT, 51-66.
Molewijk, G.C. (1992). Spellingverandering van zin naar onzin (1200–heden). The Hague: Sdu Uitgeverij Koninginnegracht.
Nerbonne, John and Wybo Wiersma (2006). A Measure of Aggregate Syntactic Distance. In: J. Nerbonne and E. Hinrichs (eds.), Proceedings of the Workshop on Linguistic Distances, 82-90.
Scheuringer, Hermann and Christian Stang (2004). Die deutsche Rechtschreibung. Vienna: Edition Praesens.
Schüppert, Anja (2011). Origin of asymmetry. Mutual intelligibility of spoken Danish and Swedish. PhD thesis. Groningen: Grodil, 94.
Schüppert, Anja, Nanna Haug Hilton and Charlotte Gooskens (accepted). Swedish is beautiful, Danish is ugly? Investigating the link between language attitudes and intelligibility. Linguistics, 53(2).
Séguy, Jean (1973). La dialectométrie dans l’Atlas linguistique de la Gascogne. In: Revue de linguistique Romane, 37, 1-24.
Spruit, Marco R. (2006). Measuring Syntactic Variation in Dutch Dialects. Literary and Linguistic Computing, 21(4), 493-506.
Swadesh, Morris (1971). The origin and diversification of language. Chicago: Aldine. Edited post mortem by Joel Sherzer.
Swarte, Femke (in preparation). Mutual intelligibility in the Germanic Language Area. PhD thesis. Groningen: Grodil.
Swarte, Femke, Anja Schüppert and Charlotte Gooskens (accepted). Does German help speakers of Dutch to understand written and spoken Danish words? - The role of second language knowledge in decoding an unknown but related language. In: G. De Angelis, U. Jessner and M. Kresic (eds.), Crosslinguistic Influence and Multilingualism.
Tang, Chaoju and Vincent J. van Heuven (2007). Mutual intelligibility and similarity of Chinese dialects. In: B. Los and M. van Koppen (eds.), Linguistics in the Netherlands 2007. Amsterdam: John Benjamins, 223-234.
Tang, Chaoju and Vincent J. van Heuven (2009). Mutual intelligibility of Chinese dialects experimentally tested. Lingua, 119(5), 709-732.
Vikør, Lars S. (2001). The Nordic Languages: Their Status and Interrelations. Oslo: Novus forlag (Novus Press).
Voegelin, C.F. and Zellig S. Harris (1951). Methods for determining intelligibility among dialects of natural languages. Proceedings of the American Philosophical Society, 95(3), 322-329.
Wolff, Hans (1959). Intelligibility and Inter-Ethnic Attitudes. Anthropological Linguistics, 1(3), Urbanization and standard language: A symposium presented at the 1958 meetings of the American Anthropological Association, 34-41.

Etymological dictionaries

Duden (2001). Herkunftswörterbuch: Etymologie der deutschen Sprache (3rd edition). Mannheim/Leipzig/Wien/Zürich: Dudenverlag.
Katler, Jan (2000). Politikens Etymologisk Ordbog. Aalborg: Politikens Forlag.
Norstedts etymologiska ordbok (2008). Norstedts Akademiska Förlag.
Oxford English Dictionary (OED), online edition. Online at: www.oed.com (retrieved April-September 2014).
Philippa, Marlies, Frans Debrabandere and Arend Quak (2005). Etymologisch Woordenboek van het Nederlands, F-Ka. Amsterdam: Amsterdam University Press.
Philippa, Marlies, Frans Debrabandere, Arend Quak, Tanneke Schoonheim and Nicoline van der Sijs. Etymologisch Woordenboek van het Nederlands, web edition: www.etymologie.nl.
Amsterdam: Amsterdam University Press (retrieved April-June 2014).

Appendix

Appendix A: Excluded Words

No. | Part of Speech | Word
1 | a | chief
2 | a | due
3 | a | key
4 | a | labour
5 | a | major
6 | a | prime
7 | adv | all
8 | adv | either
9 | adv | off
10 | adv | to
11 | conj | cos
12 | det | another
13 | det | either
14 | det | whatever
15 | interjection | no
16 | interjection | well
17 | interjection | yeah
18 | interjection | yes
19 | modal | can
20 | modal | could
21 | modal | may
22 | modal | might
23 | modal | must
24 | modal | ought
25 | modal | shall
26 | modal | should
27 | modal | used
28 | modal | will
29 | modal | would
30 | n | claim
31 | n | good
32 | n | item
33 | n | provision
34 | n | rate
35 | prep | down
36 | prep | into
37 | pron | anything
38 | pron | herself
39 | pron | himself
40 | pron | itself
41 | pron | myself
42 | pron | themselves
43 | pron | whom
44 | pron | yourself
45 | v | face
46 | v | manage
47 | v | market
48 | v | mind
49 | v | propos
50 | v | result

Appendix B: Word List

No. | Part of Speech | English | Dutch | German | Danish | Swedish
1 | det | the | de | der | den | den
2 | v | be | zijn | sein | være | vara
3 | prep | of | van | von | af | av
4 | conj | and | en | und | og | och
5 | det | a | een | ein | en | en
6 | prep | in | in | in | i | i
7 | infinitive-marker | to | te | zu | at | att
8 | v | have | hebben | haben | have | ha
9 | pron | it | het | das | det | det
10 | prep | to | naar | zu | til | till
11 | prep | for | voor | für | til | till
12 | pron | i | ik | ich | jeg | jag
13 | conj | that | dat | dass | at | att
14 | pron | you | jij | du | du | du
15 | pron | he | hij | er | han | han
16 | prep | on | aan | an | på | på
17 | prep | with | met | mit | med | med
18 | v | do | doen | tun | gøre | göra
19 | prep | at | te | bei | ved | vid
20 | prep | by | bij | bei | fra | av
21 | adv | not | niet | nicht | ikke | inte
22 | det | this | dit | dies | dette | detta
23 | conj | but | maar | aber | men | men
24 | prep | from | van | von | fra | från
25 | pron | they | zij | sie | de | de
26 | det | his | zijn | sein | hans | hans
27 | det | that | dat | das | den | den
28 | pron | she | zij | sie | hun | hon
29 | conj | or | of | oder | eller | eller
30 | det | which | welk | welch | hvilke | vilken
31 | conj | as | als | als | som | som
32 | pron | we | wij | wir | vi | vi
33 | det | an | een | ein | en | en
34 | v | say | zeggen | sagen | sige | säga
35 | conj | if | of | wenn | hvis | om
36 | det | their | hun | ihr | deres | deras
37 | v | go | gaan | gehen | gå | gå
38 | det | what | wat | was | hvad | vad
39 | pron | there | daar | da | der | där
40 | det | all | al | alle | alle | alla
41 | v | get | krijgen | kriegen | få | få
42 | det | her | haar | ihr | hendes | hennes
43 | v | make | maken | machen | lave | göra
44 | pron | who | wie | wer | hvem | vem
45 | prep | as | als | wie | som | såsom
46 | adv | out | uit | aus | ud | ut
47 | adv | up | op | oben | op | upp
48 | v | see | zien | sehen | se | se
49 | v | know | weten | wissen | vide | veta
50 | n | time | tijd | zeit | tid | tid
51 | v | take | nemen | nimmen | tage | ta
52 | pron | them | hen | ihr | dem | dem
53 | det | some | enkele | einige | nogen | någon
54 | adv | so | zo | so | så | så
55 | pron | him | hem | ihm | ham | honom
56 | n | year | jaar | jahr | år | år
57 | det | its | zijn | sein | dens | dess
58 | adv | then | dan | dann | så | då
59 | v | think | denken | denken | tænke | tänka
60 | det | my | mijn | mein | min | min
61 | v | come | komen | kommen | komme | komma
62 | conj | than | dan | dann | end | än
63 | adv | more | meer | mehr | mere | mera
64 | prep | about | over | über | om | om
65 | adv | now | nu | jetzt | nu | nu
66 | a | last | laatst | letzt | sidst | sista
67 | det | your | jouw | dein | din | din
68 | pron | me | mij | mir | mig | mig
69 | det | no | geen | kein | ingen | ingen
70 | a | other | ander | ander | anden | annan
71 | v | give | geven | geben | give | ge
72 | adv | just | slechts | nur | bare | bara
73 | det | these | deze | dies | disse | dessa
74 | n | people | mensen | leute | mennesker | människor
75 | adv | also | ook | auch | også | också
76 | adv | well | goed | gut | godt | bra
77 | det | any | enig | jeder | nogen | någon
78 | adv | only | slechts | nur | kun | endast
79 | a | new | nieuw | neu | ny | ny
80 | adv | very | zeer | sehr | meget | mycket
81 | conj | when | als | wenn | når | när
82 | n | way | weg | weg | vej | väg
83 | v | look | kijken | schauen | se | titta
84 | prep | like | zoals | wie | ligesom | ligesom
85 | v | use | gebruiken | nutzen | bruge | använda
86 | pron | her | haar | ihr | hende | henne
87 | det | such | zulk | solch | sådan | sådan
88 | adv | how | hoe | wie | hvordan | hur
89 | conj | because | omdat | weil | fordi | därför att
90 | adv | when | wanneer | wann | hvornår | när
91 | adv | as | als | als | som | som
92 | a | good | goed | gut | god | bra
93 | v | find | vinden | finden | finde | finna
94 | n | man | man | mann | mand | man
95 | det | our | ons | unser | vores | vår
96 | v | want | willen | willen | ville | vilja
97 | n | day | dag | tag | dag | dag
98 | prep | between | tussen | zwischen | mellem | mellan
99 | adv | even | zelfs | sogar | selv | även
100 | adv | there | daar | da | der | där
101 | det | many | veel | viel | mange | många
102 | det | those | die | diejenigen | dem | dem
103 | pron | one | men | man | man | man
104 | prep | after | na | nach | efter | efter
105 | adv | down | neer | hinunter | ned | ned
106 | conj | so | dus | also | så | så
107 | n | thing | ding | ding | ting | ting
108 | v | tell | vertellen | erzählen | fortælle | berätta
109 | prep | through | door | durch | gennem | genom
110 | adv | back | terug | zurück | tilbage | tillbaka
111 | adv | still | steeds | noch | stadig | fortfarande
112 | n | child | kind | kind | barn | barn
113 | adv | here | hier | hier | her | här
114 | prep | over | over | über | over | över
115 | adv | too | ook | auch | også | också
116 | v | put | zetten | stellen | sætte | sätta
117 | det | own | eigen | eigen | egen | egen
118 | adv | on | aan | zu | på | på
119 | v | work | werken | arbeiten | arbejde | arbeta
120 | v | become | worden | werden | blive | bli
121 | det | more | meer | mehr | mere | mera
122 | a | old | oud | alt | gammel | gammal
123 | n | government | regering | regierung | regering | regering
124 | v | mean | menen | meinen | betyde | betyda
125 | n | part | deel | teil | del | del
126 | v | leave | verlaten | verlassen | forlade | lämna
127 | n | life | leven | leben | liv | liv
128 | a | great | groot | groß | stor | stor
129 | adv | where | waar | wo | hvor | var
130 | n | case | geval | fall | tilfælde | fall
131 | n | woman | vrouw | frau | kvinde | kvinna
132 | adv | over | over | über | over | över
133 | v | seem | lijken | scheinen | synes | tyckas
134 | det | same | zelfde | gleich | samme | samma
135 | pron | us | ons | uns | os | oss
136 | n | work | werk | arbeit | arbejde | arbete
137 | v | need | nodig hebben | brauchen | have brug for | behöver
138 | v | feel | voelen | fühlen | føle | känna
139 | n | system | systeem | system | system | system
140 | det | each | elk | jeder | hver | varje
141 | v | may | mogen | dürfen | måtte | få
142 | adv | much | veel | viel | meget | mycken
143 | v | ask | vragen | fragen | spørge | fråga
144 | n | group | groep | gruppe | gruppe | grupp
145 | n | number | nummer | nummer | nummer | nummer
146 | adv | however | daarentegen | jedoch | dog | dock
147 | adv | again | weer | wieder | igen | igen
148 | n | world | wereld | welt | verden | värld
149 | n | area | gebied | gebiet | område | område
150 | v | show | laten zien | zeigen | vise | visa
151 | n | course | cursus | kurs | kursus | kurs
152 | n | company | bedrijf | firma | entreprise | företag
153 | prep | under | onder | unter | under | under
154 | n | problem | probleem | problem | problem | problem
155 | prep | against | tegen | gegen | mod | mot
156 | adv | never | nooit | nie | aldrig | aldrig
157 | adv | most | meest | meist | mest | mest
158 | n | service | dienst | dienst | tjeneste | tjänst
159 | v | try | proberen | versuchen | forsøge | försöka
160 | v | call | bellen | anrufen | ringe op | ringa
161 | n | hand | hand | hand | hånd | hand
162 | n | party | feest | party | fest | fest
163 | a | high | hoog | hoch | høj | hög
164 | adv | about | ongeveer | ungefähr | omkring | omkring
165 | pron | something | iets | etwas | noget | något
166 | n | school | school | schule | skole | skola
167 | adv | in | in | in | i | i
168 | a | small | klein | klein | lille | små
169 | n | place | plaats | platz | plads | plats
170 | prep | before | voor | für | foran | före
171 | adv | why | waarom | warum | hvorfor | varför
172 | conj | while | terwijl | während | mens | medan
173 | adv | away | weg | weg | væk | borta
174 | v | keep | houden | halten | holde | hålla
175 | n | point | punt | punkt | punkt | punkt
176 | n | house | huis | haus | hus | hus
177 | a | different | anders | anders | anderledes | annorlunda
178 | n | country | land | land | land | land
179 | adv | really | echt | wirklich | virkelig | verkligen
180 | v | provide | voorzien | bieten | forsyne | förse
181 | n | week | week | woche | uge | vecka
182 | v | hold | houden | halten | holde | hålla
183 | a | large | groot | groß | stor | stor
184 | n | member | lid | mitglied | medlem | medlem
185 | adv | always | altijd | immer | altid | alltid
186 | det | next | volgend | nächst | næste | nästa
187 | v | follow | volgen | folgen | følge | följa
188 | prep | without | zonder | ohne | uden | utan
189 | v | turn | draaien | drehen | dreje | vända
190 | n | end | einde | ende | ende | ände
191 | prep | within | in | in | inden | inom
192 | a | local | lokaal | lokal | lokal | lokal
193 | conj | where | waar | wo | hvor | där
194 | prep | during | gedurende | während | under | under
195 | v | bring | brengen | bringen | bringe | bringa
196 | det | most | meest | meist | mest | mest
197 | n | word | woord | wort | ord | ord
198 | v | begin | beginnen | anfangen | begynde | börja
199 | conj | although | hoewel | obwohl | skønt | fastän
200 | n | example | voorbeeld | beispiel | eksempel | exempel
201 | adv | next | volgend | nächst | næste | nästa
202 | n | family | familie | familie | familie | familj
203 | adv | rather | liever | eher | snarere | snarare
204 | n | fact | feit | tatsache | faktum | faktum
205 | v | like | leuk vinden | mögen | synes om | tycka om
206 | a | social | sociaal | sozial | social | social
207 | v | write | schrijven | schreiben | skrive | skriva
208 | n | state | staat | staat | stat | stat
209 | n | percent | procent | prozent | procent | procent
210 | adv | quite | behoorlijk | ziemlich | helt | ganska
211 | det | both | beide | beide | begge | båda
212 | v | start | starten | starten | starte | starta
213 | v | run | rennen | laufen | løbe | löpa
214 | a | long | lang | lang | lang | lång
215 | adv | right | goed | richtig | rigtig | rätt
216 | v | set | zetten | setzen | sætte | ställa
217 | v | help | helpen | helfen | hjælpe | hjälpa
218 | det | every | elk | jeder | hver | varje
219 | n | home | thuis | zuhause | hjem | hem
220 | n | month | maand | monat | måned | månad
221 | n | side | zijde | seite | side | sida
222 | n | night | nacht | nacht | nat | natt
223 | a | important | belangrijk | wichtig | vigtig | viktig
224 | n | eye | oog | auge | øje | öga
225 | n | head | hoofd | haupt | hoved | huvud
226 | n | information | informatie | information | information | information
227 | n | question | vraag | frage | spørgsmål | fråga
228 | n | business | zaak | unternehmen | forretning | företag
229 | v | play | spelen | spielen | spille | spela
230 | n | power | macht | macht | magt | makt
231 | n | money | geld | geld | penge | pengar
232 | n | change | verandering | veränderung | forandring | förändring
233 | v | move | verhuizen | umziehen | flytte | flytta
234 | n | interest | interesse | interesse | interesse | intresse
235 | n | order | bestelling | bestellung | bestilling | beställning
236 | n | book | boek | buch | bog | bok
237 | adv | often | vaak | oft | ofte | ofta
238 | n | development | ontwikkeling | entwicklung | udvikling | utveckling
239 | a | young | jong | jung | ung | ung
240 | a | national | nationaal | national | national | nationell
241 | v | pay | betalen | Zahlen | betale | betala
242 | v | hear | horen | hören | høre | höra
243 | n | room | kamer | zimmer | værelse | rum
244 | conj | whether | of | ob | hvorvidt | huruvida
245 | n | water | water | wasser | vand | vatten
246 | n | form | vorm | form | form | form
247 | n | car | auto | auto | bil | bil
248 | n | other | ander | ander | anden | annan
249 | adv | yet | nog | noch | endnu | ännu
250 | adv | perhaps | misschien | vielleicht | måske | kanske
251 | v | meet | ontmoeten | treffen | møde | träffa
252 | n | level | niveau | niveau | niveau | nivå
253 | conj | until | tot | bis | til | till
254 | conj | though | hoewel | obwohl | selvom | fast
255 | n | policy | beleid | politik | politik | politik
256 | v | include | bevatten | einbeziehen | indeholde | inkluderar
257 | v | believe | geloven | glauben | tro | tro
258 | n | council | raad | rat | råd | råd
259 | adv | already | al | schon | allerede | redan
260 | a | possible | mogelijk | möglich | mulig | möjlig
261 | pron | nothing | niets | nichts | intet | ingenting
262 | n | line | lijn | linie | line | linje
263 | v | allow | toestaan | erlauben | tillade | tillåta
264 | n | need | behoefte | bedarf | behov | behov
265 | n | effect | effect | effekt | effekt | effekt
266 | a | big | groot | groß | stor | stor
267 | n | use | gebruik | gebrauch | brug | användning
268 | v | lead | leiden | leiten | føre | leda
269 | v | stand | staan | stehen | stå | stå
270 | n | idea | idee | idee | idé | idé
271 | n | study | studie | studie | studium | studie
272 | n | lot | lot | los | lod | lott
273 | v | live | leven | leben | leve | leva
274 | n | job | baan | job | job | jobb
275 | conj | since | sinds | seit | siden | sedan
276 | n | name | naam | name | navn | namn
277 | n | result | resultaat | ergebnis | resultat | resultat
278 | n | body | lichaam | körper | krop | kropp
279 | v | happen | gebeuren | passieren | ske | ske
280 | n | friend | vriend | freund | ven | vän
281 | n | right | recht | recht | rettighed | rättighet
282 | adv | least | minst | mindest | mindst | minst
283 | a | right | rechts | rechts | højre | höger
284 | adv | almost | bijna | fast | næsten | nästan
285 | det | much | veel | viel | megen | mycken
286 | v | carry | dragen | tragen | bære | bära
287 | n | authority | autoriteit | autorität | autoritet | auktoritet
288 | adv | long | lang | lang | langt | långt
289 | a | early | vroeg | früh | tidlig | tidig
290 | n | view | uitzicht | blick | udsigt | utsikt
291 | a | public | publiek | publik | offentlig | offentlig
292 | adv | together | samen | zusammen | sammen | tillsammans
293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 v n conj a conj n n v n v v a a adv n n n n n v n v n adv n n n talk report after only before bit face sit market appear continue able political later hour law door court office let war produce reason less minister subject person praten verslag na enig voor beetje gezicht zitten markt verschijnen doorgaan capabel politiek later uur wet deur hof kantoor laten oorlog produceren reden minder minister onderwerp persoon reden report nach einzig bevor bisschen gesicht sitzen markt scheinen fortsetzen fähig
politisch später stunde gesetz tür gericht büro lassen krieg produzieren grund weniger minister thema person 107 snakke rapport efter eneste før stykke ansigt sidde marked synes fortsætte dygtig politisk senere time lov dør domstol kontor lade krig producere grund mindre minister emne person prata rapport efter enda före bitars ansikte sitta marknad synas fortsätta duktig politisk senare timme lag dörr domstol kontor låta krig producera skäl mindre minister ämne person 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 n a a v n v v a prep n v v v n n n v a n adv adv n n v v adv v term particular full involve sort require suggest far towards period consider read change society process mother offer late voice both once police kind lose add probably expect periode bijzonder vol betrekken soort vereisen suggereren ver richting periode beschouwen lezen veranderen maatschappij proces moeder aanbieden laat stem beide eens politie soort verliezen toevoegen waarschijnlijk verwachten laufzeit besondere voll beteiligen sorte erfordern vorschlagen weit zu periode betrachten lesen ändern gesellschaft prozess mutter anbieten spät stimme beide einmal polizei art verlieren hinzufügen warscheinlich erwarten 108 periode særlig fuld involvere sort kræve foreslå fjern mod periode betragte læse forandre samfund proces mor tilbyde sen stemme begge engang politi slags tabe tilføje sandsynligvis forvente term särskild full involvera sort kräva föreslå fjärran mot period överväga läsa ändra samhälle process mor erbjuda sen röst båda en gång polis slag förlora tillägga troligen förvänta 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 adv a adv n a n n adv v n a n det n n v n n n adv n v v n adv adv v ever available no price little action issue far remember position low cost little matter community remain figure type research actually education fall speak few today enough 
open ooit beschikbaar niet prijs klein actie kwestie ver herinneren positie laag kosten weinig zaak gemeenschap blijven figuur type onderzoek eigenlijk educatie vallen spreken paar vandaag genoeg openen je verfügbar nicht preis klein aktion frage weit erinnern position niedrig kosten wenig sache gemeinschaft bleiben figur typ forschung eigentlich bildung fallen sprechen paar heute genug öffnen 109 nogensinde tilgængelig nej pris lille handling emne fjernt huske position lav omkostninger lidt sag fællesskab forblive figur type forskning faktisk uddannelse falde tale få i dag nok åbne någonsin tillgänglig nej pris liten handling fråga långt minnas position låg kostnad litet sak gemenskap förbli figur typ forskning faktiskt utbildning falla tala få i dag nog öppna 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 a v n n n n n n v n n v n v a v v n v n n v n det v v v bad buy programme minute moment girl age centre stop control value send health decide main win understand decision develop class industry receive back several return build spend slecht kopen programma minuut moment meisje leeftijd centrum stoppen controle waarde zenden gezondheid besluiten hoofdwinnen begrijpen beslissing ontwikkelen klasse industrie ontvangen rug verschillende terugkeren bouwen besteden schlecht kaufen programm minut moment mädchen alter zentrum stoppen kontrolle wert schicken gesundheit entscheiden hauptgewinnen verstehen entscheidung entwickeln klasse industrie empfangen rücken verschiedene zurückkehren bauen ausgeben 110 dårlig købe program minut moment pige alder centrum stoppe kontrol værdi sende sundhed beslutte hovedvinde forstå beslutning udvikle klasse industri modtage ryg adskillige vende tilbage bygge bruge dålig köpa program minut moment flicka ålder centrum stoppa kontroll värde sända hälsa besluta huvudvinna förstå beslut utveckla klass industri mottaga rygg flera återvända bygga spendera 401 402 403 404 405 406 407 
408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 n n n prep v v a v prep v a n adv n n n adv n n n prep n v a v n a force condition paper off describe agree economic increase upon learn general century therefore father section patient around activity road table including church reach real lie mind likely kracht conditie papier vanaf beschrijven eens zijn economisch toenemen op leren algemeen eeuw daarom vader sectie patiënt rond activiteit weg tafel inclusief kerk bereiken echt liggen geest waarschijnlijk kraft kondition papier ab beschreiben zustimmen wirtschaftlich zunehmen auf lernen allgemein jahrhundert deswegen vater abschnitt patient rund aktivität weg tisch einschließlich kirche erreichen echt liegen geist warscheinlich 111 kraft kondition papir fra beskrive forliges økonomisk øge på lære generel århundrede derfor far sektion patient omkring aktivitet vej bord inklusive kirke nå ægte ligge sind sandsynlig kraft kondition papper av beskriva enas ekonomisk öka på lära allmän århundrade därför far sektion patient runt aktivitet väg bord inklusive kyrka nå äkta ligga sinne trolig 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 prep n n n adv n n n a n det prep n v v a a a adv n n n v v prep n n among team experience death soon act sense staff certain student half around language walk die special difficult international particularly department management morning draw hope across plan product onder team ervaring dood binnenkort daad zin personeel zeker student half rond taal lopen sterven speciaal moeilijk internationaal bijzonder afdeling beheer ochtend tekenen hopen over plan product unter team erfahrung tod bald handlung sinn personal sicher student halb rund sprache laufen sterben speziell schwierig international besonders abteilung management morgen zeichnen hoffen über plan produkt 112 blandt hold erfaring død snart handling mening personale vis studerende halv rundt 
omkring sprog gå dø speciel svær international især afdeling ledelse morgen tegne håbe over plan produkt bland team erfarenhet död snart handling mening personal viss student halv runt språk gå dö speciell svår internationell särskilt avdelning ledning morgon rita hoppas över plan produkt 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 n adv n n n v n n a n n n n n n adv n v n n n prep a v v a n city early committee ground letter create evidence foot clear boy game food role practice bank else support sell event building range behind sure report pass black stage stad vroeg comité grond letter creëren bewijs voet duidelijk jongen spel voedsel rol oefening bank anders ondersteuning verkopen gebeurtenis gebouw gebied achter zeker verslag leggen passeren zwart fase stadt früh ausschuss boden buchstabe schaffen beweis fuß klar junge spiel nahrung rolle übung bank sonst unterstützung verkaufen veranstaltung gebäude reichweite hinter sicher berichten passieren schwarz phase 113 by tidligt komite jord bogstav skabe bevis fod klar dreng spil mad rolle øvelse bank andet støtte sælge begivenhed bygning rækkevide bag sikker berette passere sort stadie stad tidigt utskott jord bokstav skapa bevis fot klar pojke spel mat roll övning bank annars stöd sälja händelse byggnad räckvidd bakom säker rapportera passera svart stadium 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 n adv adv v n n adv n v n n n n n v a n det adv n n n n a a a n meeting sometimes thus accept town art further club cause arm history parent land trade watch white situation whose ago teacher record manager relation common strong whole field vergadering soms dus accepteren stad kunst verder club veroorzaken arm geschiedenis ouder land handel kijken wit situatie wiens geleden leraar opname manager relatie veelvoorkomend sterk geheel veld treffen manchmal so akzeptieren stadt kunst weiter klub 
verursachen arm vergangenheit elternteil land handel ansehen weiß situation wessen her lehrer rekord manager beziehung häufig stark ganz feld 114 møde sommetider således acceptere by kunst videre klub forårsage arm historie forælder land handel iagttage hvid situation hvis siden lærer rekord leder forhold almindelig stærk hel felt möte ibland således acceptera stad konst vidare klubb orsaka arm historia förälder land handel titta vit situation vems sedan lärare rekord chef förhållande vanlig stark hel fält 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 a v adv v n n v v det v adv n n n n v v n v n n adv n v v v n free break yesterday support window account explain stay few wait usually difference material air wife cover apply project raise sale relationship indeed light claim form base care vrij breken gisteren ondersteunen raam rekening uitleggen blijven weinig wachten gewoonlijk verschil materiaal lucht vrouw bedekken toepassen project verhogen verkoop relatie inderdaad licht beweren vormen baseren verzorging frei brechen gestern unterstützen fenster konto erklären bleiben wenig warten gewöhnlich unterschied material luft frau abdecken anwenden projekt erhöhen verkauf beziehung tatsächlich licht behaupten bilden basieren pflege 115 fri brække i går støtte vindue konto forklare blive få vente normalt forskel materiale luft kone dække anvende projekt hæve salg forhold virkelig lys hævde danne basere pleje fri bryta i går stödja fönster konto förklara stanna få vänta vanligtvis skillnad material luft fru täcka tillämpa projekt höja försäljning förhållande verkligen ljus hävda bilda basera omsorg 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 pron pron adv n adv v v a n n n n n n n a n n n a n n n n a adv v someone everything certainly rule home cut grow similar story quality tax worker nature structure data necessary pound method unit central bed 
union movement board true simply contain iemand alles zeker regel thuis snijden groeien soortgelijk verhaal kwaliteit belasting werker natuur structuur data nodig pond methode eenheid centraal bed unie beweging bestuur waar simpel bevatten jemand alles sicherlich regel zuhause schneiden wachsen ähnlich geschichte qualität steuer arbeiter natur struktur daten notwendig pfund methode einheit zentral bett union bewegung vorstand wahr einfach enthalten 116 nogen alt sikkert regel hjem skære vokse lignende fortælling kvalitet skat arbejder natur struktur data nødvendig pund metode enhed central seng union bevægelse bestyrelse sand simpelthen indeholde någon allt säkert regel hem skära växa liknande historia kvalitet skatt arbetare natur struktur data nödvändig pund metod enhet central säng union rörelse styrelse sann enkelt innehålla 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 adv a a a n n v a v v v n a a n det n n n n v v n v a a n especially open short personal detail model bear single join reduce establish wall easy private computer former hospital chapter scheme theory choose wish property achieve financial poor officer speciaal open kort persoonlijk detail model dragen enkel samenvoegen reduceren oprichten muur makkelijk privé computer vorig ziekenhuis hoofdstuk schema theorie kiezen wensen eigendom bereiken financieel arm officier insbesondere offen kurz persönlich detail modell tragen einzig verbinden reduzieren gründen mauer leicht privat computer vorherig krankenhaus kapitel schema theorie wählen wünschen eigentum erreichen finanziell arm offizier 117 især åben kort personlig detalje model bære enkelt sammenføje reducere oprette væg let privat computer tidligere hospital kapitel ordning teori vælge ønske ejendom opnå finansiel fattig officer speciellt öppen kort personlig detalj modell bära enkel förbinda reducera upprätta vägg lätt privat dator tidigare sjukhus kapitel schema teori välja önska 
egendom uppnå finansiell fattig officer 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 prep n n v v v n n n v a prep n n n n v a n a n n n n a n n up charge director drive deal place approach chance application seek foreign along top amount son operation fail human opportunity simple leader look share production recent firm picture op kosten directeur rijden uitdelen plaatsen benadering kans toepassing zoeken buitenlands langs top hoeveelheid zoon operatie falen menselijk kans simpel leider blik deel productie recent firma afbeelding auf gebühr direktor fahren austeilen platzieren ansatz chance anwendung suchen ausländisch entlang top menge sohn operation scheitern menschlich chance einfach führer blick teil produktion kürzlich firma bild 118 op omkostning direktør køre dele ud placere tilgang chance anvendelse søge udenlandsk langs top mængde søn operation fejle menneskelig lejlighed simpel leder blik del produktion nylig firma billede upp kostnad direktör köra utdela placera närmande chans tillämpning söka utländsk längs topp mängd son operation misslyckas mänsklig tillfälle enkel ledare blick del produktion färsk firma bild 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 n n v prep v n a v n adv v v n n v a prep n v n v v v n adv n n source security serve according end contract wide occur agreement better kill act site labour plan various since test eat loss close represent love colour clearly shop benefit bron zekerheid dienen volgens beëindigen contract wijd voorkomen overeenkomst beter doden handelen plaats arbeid plannen verschillend sinds test eten verlies sluiten vertegenwoordigen houden van kleur duidelijk winkel voordeel 119 quelle sicherheit dienen nach beenden vertrag weit vorkommen vereinbarung besser töten handeln stelle arbeit planen verschiedene seit test essen verlust schließen darstellen lieben farbe deutlich laden vorteil 
kilde sikkerhed tjene ifølge slutte kontrakt vid forekomme aftale bedre dræbe handle sted arbejde planlægge forskellige siden test spise tab lukke repræsentere elske farve tydeligt butik fordel källa säkerhet tjäna enligt sluta kontrakt vid förekomma avtal bättre döda handla plats arbete planera diverse sedan test äta förlust stänga representera älska färg tydligt butik fördel 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 n n n n n n v n a n n v n pron pron n n v v n n a n n a n n animal heart election purpose standard secretary rise date hard music hair prepare factor other anyone pattern piece discuss prove front evening royal tree population fine plant pressure dier hart verkiezing doel standaard secretaris stijgen datum hard muziek haar voorbereiden factor ander iemand patroon stuk bespreken bewijzen voorkant avond koninklijk boom populatie fijn plant druk tier herz wahl zweck standard sekretär steigen datum hart musik haar vorbereiten faktor andere jemand muster stück besprechen beweisen vorderseite abend königlich baum population fein pflanze druck 120 dyr hjerte valg formål standard sekretær stige dato hård musik hår forberede faktor anden nogen mønster stykke drøfte bevise forside aften kongelig træ befolkning fin plante tryk djur hjärta val ändamål standard sekreterare stiga datum hård musik hår förbereda faktor annen någon mönster stycke dryfta bevisa framsida kväll kunglig träd befolkning fin planta tryck 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 n v n v n n prep n n v n v n adv v n n n prep n n n a a prep adv n response catch street pick performance knowledge despite design page enjoy individual suppose rest instead wear basis size environment per fire series success natural wrong near round thought reactie vangen straat plukken prestatie kennis ondanks ontwerp pagina genieten individu veronderstellen rust in plaats van dragen 
basis maat milieu per vuur serie succes natuurlijk verkeerd dichtbij rond gedachte reaktion fangen straße pflücken leistung kenntnis trotz entwurf seite genießen einzelne annehmen ruhe stattdessen tragen basis größe umwelt pro feuer serie erfolg natürlich falsch nah rund gedanke 121 reaktion fange gade plukke ydeevne viden trods design side nyde individ antage hvile i stedet have på basis størrelse miljø per brand serie succes naturlig forkert nær rundt tanke respons fånga gata plocka prestation kunskap trots design sida njuta individ anta vila i stället ha på basis storlek miljö per brand serie framgång naturlig fel nära runt tanke 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 n v a n v n v n v v n n n n n v n n n v n n n a v n n list argue final future introduce analysis enter space arrive ensure demand statement attention love principle pull set doctor choice refer feature couple step following thank machine income lijst ruzie maken definitief toekomst introduceren analyse binnengaan ruimte aankomen verzekeren eis uitspraak aandacht liefde principe trekken stel dokter keuze verwijzen kenmerk koppel stap volgend danken machine inkomen liste streiten endgültig zukunft einführen analyse hereinkommen raum ankommen gewährleisten bedarf aussage achtung liebe prinzip ziehen set doktor wahl verweisen merkmal paar schritt folgende danken maschine einkommen 122 liste drøfte endelig fremtid introducere analyse kommer i plads ankomme sikre krav udtalelse opmærksomhed kærlighed princip trække sæt læge valg henvise særpræg par skridt følgende takke maskine indkomst lista gräla slutlig framtid introducera analys komma in rymd anlända försäkra krav uttalande uppmärksamhet kärlek princip dra set läkare val hänvisa särdrag par steg följande tacka maskin inkomst 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 n v n n n n n pron a n n v n n a n a n a a a n adv n 
n n n training present association film region effort player everyone present award village control organisation news nice difficulty modern cell close current legal energy finally degree mile means growth training presenteren associatie film regio moeite speler iedereen tegenwoordig onderscheiding dorp beheren organisatie nieuws leuk moeilijkheid modern cel dicht huidig legaal energie uiteindelijk graad mijl middelen groei ausbildung präsentieren assoziation film region mühe spieler jeder vorhanden auszeichnung dorf kontrollieren organisation nachrichten nett schwierigkeit modern zelle nah gegenwärtig legal energie schließlich grad meile mittel wachstum 123 uddannelse præsentere forening film region indsats spiller alle nuværende pris landsby kontrollere organisation nyheder dejlig vanskelighed moderne celle tæt aktuel legal energi endelig grad mil midler vækst träning presentera förening film region ansträngning spelare alla nuvarande pris by kontrollera organisation nyheter trevlig svårighet modern cell nära aktuell laglig energi slutligen grad mil medel växt 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 n n prep n v adv a a n a v n v n n n n n n n n v n n v n v treatment sound above task affect please red happy behaviour concerned point function identify resource defence garden floor technology style feeling science relate doubt horse force answer compare behandeling geluid boven taak beïnvloeden alstublieft rood gelukkig gedrag bezorgd wijzen functie identificeren bron verdediging tuin vloer technologie stijl gevoel wetenschap betreffen twijfel paard dwingen antwoord vergelijken behandlung klang oben aufgabe beeinflussen bitte rot glücklich verhalten besorgt weisen funktion identifizieren ressource abwehr garten boden technologie stil gefühl wissenschaft betreffen zweifel pferd zwingen antwort vergleichen 124 behandling lyd over opgave påvirke vær så venlig rød lykkelig adfærd bekymret pege funktion 
identificere ressource forsvar have gulv teknologi stil følelse videnskab relatere tvivl hest tvinge svar sammenligne behandling ljud ovan uppgift påverka snälla röd lycklig beteende bekymrad peka funktion identifiera resurs försvar trädgård golv teknologi stil känsla vetenskap relatera tvivel häst tvinga svar jämföra 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 v a adv v n n n n a conj n v adv n v v n n n a n adv v a adv v n suffer individual forward announce user fund character risk normal nor dog obtain quickly army indicate forget station glass cup previous husband recently publish serious anyway visit capital lijden individueel vooruit aankondigen gebruiker fonds karakter risico normaal noch hond verkrijgen snel leger aangeven vergeten station glas kop vorig echtgenoot recent publiceren serieus hoe dan ook bezoeken kapitaal leiden individuell vorwärts ankündigen benutzer fonds charakter risiko normal noch hund erhalten schnell heer angeben vergessen bahnhof glas tasse früher ehemann kürzlich veröffentlichen ernst irgendwie besuchen kapital 125 lide individuel fremad annoncere bruger fond karakter risiko normal heller ikke hund opnå hurtigt hær angive glemme station glas kop tidligere mand nylig udgive seriøs i hvert falt besøge kapital lida individuell framåt tillkännage användare fond karaktär risk normal inte heller hund erhålla snabbt armé ange glömma station glas kopp tidigare man nyligen publicera seriös i alla fall besöka kapital 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 n n n v n n a n n n v n v v v n n n v prep v adv n v v v v note season argument listen show responsibility significant deal economy element finish duty fight train maintain attempt leg investment save throughout design suddenly brother improve avoid wonder tend aantekening seizoen argument luisteren show verantwoordelijkheid significant overeenkomst economie 
element beëindigen plicht vechten trainen onderhouden poging been investering redden gedurende ontwerpen plotseling broer verbeteren ontwijken afvragen neigen 126 anmerkung jahreszeit argument hören show verantwortlichkeit signifikant deal wirtschaft element beenden pflicht kämpfen trainieren pflegen versuch bein investition retten während entwerfen plötzlich bruder verbessern vermeiden sich fragen neigen note årstid argument lytte show ansvar signifikant deal økonomi element slutte pligt kæmpe træne vedligeholde forsøg ben investering redde hele vejen designe pludseligt bror forbedre undgå undre sig have tendens anteckning årstid argument lyssna show ansvar signifikant överenskommelse ekonomi element sluta plikt kämpa träna underhålla försök ben investering rädda över hela designa plötsligt bror förbättra undvika undra tendera 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 n n n n n a v n v adv n v v n adv v a n n v v n n a a n a title hotel aspect increase help industrial express summer determine generally daughter exist share baby nearly smile sorry sea skill treat remove concern university left dead discussion specific titel hotel aspect toename hulp industrieel uiten zomer vaststellen in het algemeen dochter bestaan delen baby bijna glimlachen armzalig zee vaardigheid behandelen verwijderen zorg universiteit links dood discussie specifiek titel hotel aspekt zunahme hilfe industriell äußern sommer bestimmen im allgemeinen tochter existieren teilen baby fast lächeln traurig meer fähigkeit behandeln entfernen sorge universität links tot diskussion spezifisch 127 titel hotel aspekt stigning hjælp industriel udtrykke sommer bestemme generelt datter findes dele baby næsten smile bedrøvelig hav færdighed behandle fjerne bekymring universitet venstre død diskussion specifik titel hotell aspekt ökning hjälp industriell uttrycka sommar fastställa generellt dotter finnas dela baby nästan le ledsen hav färdighet 
behandla förflytta oro universitet vänster död diskussion specifik 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 n n prep v n n a n n v n v n v n n adv conj a n a n v n n n v customer box outside state conference whole total profit division throw procedure fill king assume image oil obviously unless appropriate circumstance military proposal mention client sector direction admit klant doos buiten mededelen conferentie geheel totaal profijt verdeling gooien procedure vullen koning aannemen beeld olie duidelijk tenzij gepast omstandigheid militair voorstel noemen cliënt sector richting toegeven kunde karton außerhalb erklären konferenz ganze total gewinn teilung werfen verfahren füllen könig annehmen bild öl offenbar wenn nicht geeignet umstand militärisch vorschlag nennen klient sektor richtung zugeben 128 kunde kasse udenfor meddele konference hele total profit inddeling smide procedure fylde konge antage billede olie åbenbart medmindre passende omstændighed militær forslag nævne klient sektor retning indrømme kund låda utanför uppge konferens helhet total vinst uppdelning kasta procedur fylla kung anta bild olja uppenbarligen om inte lämplig omständighet militärisk förslag nämna klient sektor riktning erkänna 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 adv v a adv n n a a adv v a n n n adv adv n v prep n n v n n prep v a though replace basic hard instance sign original successful okay reflect aware measure attitude disease exactly above commission intend beyond seat president encourage addition goal round miss popular doch vervangen basis moeilijk voorbeeld teken origineel succesvol oké reflecteren bewust maat houding ziekte precies boven provisie bedoelen voorbij zitplaats president bemoedigen toevoeging doel rond missen populair doch ersetzen grundlegend schwer fall zeichen original erfolgreich ok reflektieren bewusst maß haltung 
krankheit genau oben kommission beabsichtigen über sitz präsident ermutigen zusatz ziel rund vermissen populär 129 dog erstatte grundlæggende hårdt eksempel tegn original succesfuld ok reflektere bevidst mål holdning sygdom præcis over kommission agte over sæde præsident tilskynde tilføjelse mål rund savne populær fast ersätta grundläggande hårt exempel tecken originell framgångsrik okej reflektera medveten mått hållning sjukdom precis ovan kommission ämna bortom sittplats president uppmuntra tillägg mål runt sakna populär 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 n n n v a det conj v v n adv n v n a n n a v n n n v v v n a affair technique respect drop professional less once fly reveal version maybe ability operate campaign heavy advice institution top discover surface library pupil record refuse prevent advantage dark affaire techniek respect laten vallen professioneel minder eens vliegen onthullen versie misschien vermogen bedienen campagne zwaar advies instituut top ontdekken oppervlak bibliotheek pupil opnemen weigeren voorkomen voordeel donker affäre technik respekt fallen lassen professionell weniger einmal fliegen enthüllen version vielleicht vermögen betreiben kampagne schwer rat institution ober entdecken oberfläche bibliothek schüler aufnehmen verweigern verhindern vorteil dunkel 130 affære teknik respekt lade falde professionel mindre engang flyve afsløre version måske evne betjene kampagne tung råd institution top opdage overflade bibliotek elev optage nægte forhindre fordel mørk affär teknik respekt tappa professionell mindre en gång flyga avslöja version kanske förmåga fungera kampanj tung råd institution övre upptäcka yta bibliotek elev registrera vägra förhindra fördel mörk 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 v n n n v n v n n v n n a n n v n a n n v adv n n n n n teach memory culture blood cost majority answer 
variety press depend bill competition ready general access hit stone useful extent employment regard apart present appeal text parliament cause leren geheugen cultuur bloed kosten meerderheid antwoorden variëteit pers afhangen rekening competitie klaar generaal toegang raken steen nuttig omvang dienst beschouwen uit elkaar cadeau appel tekst parlement oorzaak lehren speicher kultur blut kosten mehrheit antworten varietät presse abhängen rechnung wettbewerb fertig general zugreifen treffen stein nützlich umfang beschäftigung betrachten auseinander geschenk beschwerde text parlament ursache 131 lære hukommelse kultur blod koste flertal svare sort presse afhænge regning konkurrence klar general adgang træffe sten nyttig omfang beskæftigelse betragte hinanden gave appel tekst parlament årsag lära minne kultur blod kosta majoritet svara variant press bero räkning konkurrens klar general tillgång träffa sten nyttig omfattning sysselsättning betrakta isär gåva vädja text parlament orsak 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 n n n a n n a n adv adv v v n a n adv n n n n v n n n v n n terms bar attack effective mouth fish future visit little easily attempt enable trouble traditional payment best post county lady holiday realise importance chair facility complete article object voorwaarden bar aanval effectief mond vis toekomstig bezoek weinig gemakkelijk proberen in staat stellen problemen traditioneel betaling best post graafschap dame vakantie realiseren belang stoel faciliteit voltooien artikel object bedingungen bar angriff effektiv mund fisch zukünftig besuch wenig leicht versuchen ermöglichen schwierigkeiten traditionell zahlung best post grafschaft dame urlaub realisieren wichtigkeit stuhl einrichtung vervollständigen artikel objekt 132 vilkår bar angreb effektiv mund fisk fremtidig besøg lidt nemt forsøge muliggøre problemer traditionel betaling bedst post grevskab dame ferie realisere betydning 
stol facilitet fuldende artikel objekt villkor bar angrepp effektiv mun fisk framtida besök litet lätt försöka möjliggöra svårigheter traditionell betalning bäst post grevskap dam semester realisera betydelse stol facilitet fullborda artikel objekt 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 n n v a n a adv n n n a a n v n n a conj v n n n n n n n n context survey notice complete turn direct immediately collection reference card interesting considerable television extend communication agency physical except check sun species possibility official chairman speaker second career context enquête opmerken compleet draai direct onmiddellijk collectie referentie kaart interessant aanzienlijk televisie uitbreiden communicatie agentschap fysiek behalve controleren zon soort mogelijkheid ambtenaar voorzitter spreker seconde carrière kontext umfrage bemerken komplett wende direkt sofort sammlung referenz karte interessant erheblich fernseher erweitern kommunikation agentur physisch außer prüfen sonne art möglichkeit beamte vorsitzende sprecher sekunde karriere kontekst undersøgelse bemærke komplet drej direkte straks samling reference kort interessant betydelig fjernsyn udvide kommunikation agentur fysisk undtagen kontrollere sol art mulighed officiel formand taler sekund karriere kontext undersökning märka komplett vändning direkt genast samling referens kort intressant betydlig television utvidga kommunikation agentur fysisk utom kontrollera sol art möjlighet tjänsteman ordförande talare sekund karriär 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 v n v a n n n n a a v n n n v n a n v adv n a n n n n n n laugh weight sound responsible base document solution return medical hot recognise talk budget river fit organization existing start push tomorrow requirement cold edge opposition opinion drug 
quarter option lachen gewicht klinken verantwoordelijk basis document oplossing terugkeer medisch heet herkennen lezing budget rivier passen organisatie bestaand start duwen morgen vereiste koud rand oppositie mening medicijn kwart optie lachen gewicht klingen verantwortlich basis dokument lösung rückkehr medizinisch heiß erkennen vortrag budget fluss passen organisation vorhanden start drücken morgen voraussetzung kalt rand opposition meinung droge viertel option le vægt lyde ansvarlig basis dokument løsning tilbagevenden medicinsk hed genkende foredrag budget flod passe organisation eksisterende start skubbe i morgen krav kold kant opposition mening medicin kvart option skratta vikt ljuda ansvarig bas dokument lösning återkomst medicinsk het erkänna föreläsning budget flod passa organisation existerande start trycka i morgon krav kall kant opposition mening läkemedel kvart option 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 v prep n v n n n adv n adv n n v v n a n n pron n v n a n n n n a sign worth call define stock influence occasion eventually software highly exchange lack shake study concept blue star radio no-one arrangement examine bird green band sex finger past independent gebaren waard oproep definiëren voorraad invloed gelegenheid uiteindelijk software zeer uitwisseling tekort schudden studeren concept blauw ster radio niemand regeling onderzoeken vogel groen band seks vinger verleden onafhankelijk gebärden wert anruf definieren vorrat einfluss gelegenheit schliesslich software sehr austausch mangel schütteln studieren konzept blau stern radio niemand anordnung untersuchen vogel grün band sex finger vergangenheit unabhängig gøre tegn værd opkald definere lager indflydelse lejlighed til sidst software meget udveksling mangel ryste studere koncept blå stjerne radio ingen arrangement undersøge fugl grøn band køn finger fortid uafhængig vinka värd samtal 
definiera lager inflytande tillfälle slutligen programvara mycket utbyte brist skaka studera koncept blå stjärna radio ingen arrangemang undersöka fågel grön band kön finger förflutna oberoende 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 n n n n n n v adv n v n a n adv n n v n n n n n adv n adv v n adv equipment north move message fear afternoon drink fully race gain strategy extra scene slightly kitchen speech arise network tea peace failure employee ahead scale hardly attend shoulder otherwise uitrusting noord verhuizing bericht angst middag drinken volledig ras verkrijgen strategie extra scène enigszins keuken toespraak opkomen netwerk thee vrede mislukking werknemer vooruit schaal nauwelijks bijwonen schouder anders ausrüstung Norden umzug nachricht angst nachmittag trinken völlig rasse gewinnen strategie extra szene ein wenig küche rede entstehen netzwerk tee frieden ausfall arbeitnehmer voraus umfang kaum besuchen schulter sonst udstyr nord flytning besked angst eftermiddag drikke fuldt race vinde strategi ekstra scene lidt køkken tale opstå netværk te fred fiasko medarbejder forude skala næppe deltage skulder ellers utrustning nord flyttning meddelande rädsla eftermiddag dricka fullt ras vinna strategi extra scen lätt kök tal uppstå nätverk te fred misslyckande arbetstagare framåt skala knappt delta skuldra annars 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 n adv n n n v n a n n n a n n v a v n n n v n n v n n adv n railway directly supply expression owner associate corner past match sport status beautiful offer marriage hang civil perform sentence crime ball marry wind truth protect safety partner completely copy spoorwegen direct voorziening uitdrukking eigenaar associëren hoek vroeger wedstrijd sport status mooi aanbod huwelijk hangen burgerlijk uitvoeren zin 
misdaad bal trouwen wind waarheid beschermen veiligheid partner volledig kopie eisenbahn direkt belieferung ausdruck besitzer assoziieren ecke vergangen match sport status schön angebot ehe hängen bürgerlich ausführen satz verbrechen ball heiraten wind wahrheit schützen sicherheit partner völlig kopie jernbane direkte forsyning udtryk ejer associere hjørne tidligere match sport status smuk tilbud ægteskab hænge civil udføre sætning forbrydelse bold gifte sig vind sandhed beskytte sikkerhed partner helt kopi järnväg direkt tillförsel uttryck ägare associera hörn förgången match sport status vacker anbud äktenskap hänga civil utföra mening brott boll gifta sig vind sanning skydda säkerhet partner helt kopia 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 n n n adv n n n v n n a a a n v n n n a pron n n a n n a n n balance sister reader below trial rock damage adopt newspaper meaning light essential obvious nation confirm south length branch deep none planning trust working pain studio positive spirit college balans zus lezer onder proces rots schade adopteren krant betekenis licht essentieel duidelijk natie bevestigen zuid lengte tak diep geen planning vertrouwen werkend pijn studio positief geest college balance schwester leser unten prozess fels schaden adoptieren zeitung bedeutung licht wesentlich offensichtlich nation bestätigen süden länge ast tief kein planung vertrauen arbeitend schmerz studio positiv geist college balance søster læser nedenunder proces klippe skade adoptere avis betydning lys afgørende åbenbar nation bekræfte syd længde gren dyb ingen planlægning tillid arbejdende smerte studie positiv ånd kollegium balans syster läsare nedan process klippa skada adoptera tidning betydelse ljus väsentlig uppenbar nation bekräfta syd längd gren djup ingen planering tillit arbetande smärta studio positiv ande college 1162 1163 1164 1165 1166 1167 1168 1169 1170 
1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 n n v n n v v adv n adv n n n v n a n a n v n a v v n n n adv accident hope mark works league clear imagine through cash normally play strength train travel target very pair male gas issue contribution complex supply beat artist agent presence along ongeluk hoop markeren werken competitie ontruimen voorstellen door kleingeld normaal gesproken toneelstuk kracht trein reizen doel zeer paar mannelijk gas uitgeven contributie complex voorzien slaan artiest agent aanwezigheid langs unfall hoffnung markieren werke liga löschen vorstellen durch bargeld normalerweise theaterstück kraft zug reisen ziel sehr paar männlich gas ausgeben beitrag komplex beliefern schlagen artist agent anwesenheit entlang ulykke håb mærke værker liga rense forestille gennem kontanter normalt spil styrke tog rejse mål meget par mandlig gas udstede bidrag kompleks forsyne slå kunstner agent tilstedeværelse langs olycka hopp markera verk liga rensa föreställa igenom kontanter normalt pjäs styrka tåg resa mål mycket par manlig gas ge ut bidrag komplex förse slå artist agent närvaro längs 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 a v n n n v n a v v n adv adv n a adv n n v v n n n v n a n n environmental strike contact protection beginning demand media relevant employ shoot executive slowly relatively aid huge late speed review test order route consequence telephone release proportion primary consideration reform betreffende het milieu staken contact bescherming begin eisen media relevant in dienst hebben schieten bestuur langzaam relatief hulp enorm laat snelheid recensie testen bestellen route gevolg telefoon loslaten proportie primair overweging hervorming umwelt streiken kontakt schutz anfang fordern medien relevant beschäftigen schiessen exekutive langsam relativ hilfe riesig spät geschwindigkeit 
rezension testen bestellen route folge telefon loslassen proportion primär überlegung reform miljømæssige strejke kontakt beskyttelse begyndelse kræve medier relevant ansætte skyde udøvende langsomt relativt bistand kæmpe sent hastighed bedømmelse teste bestille rute konsekvens telefon løslade andel primær overvejelse reform miljöbetingad strejka kontakt skydd början kräva media relevant anställa skjuta utövande långsamt relativt bistånd enorm sent hastighet recension testa beställa rutt konsekvens telefon släppa proportion primär övervägande reform 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 n a a det a a a v adv v n n adv n n a v n n a v n n a a n n adv driver annual nuclear latter practical commercial rich emerge apparently ring distance exercise close skin island separate aim danger credit usual link candidate track safe interested assessment path merely bestuurder jaarlijks nucleair laatste praktisch commercieel rijk tevoorschijn komen blijkbaar rinkelen afstand oefening dichtbij huid eiland verschillend richten gevaar krediet gewoonlijk koppelen kandidaat spoor veilig geïnteresseerd beoordeling pad slechts fahrer jährlich nuklear letzter praktisch kommerziell reich entstehen offenbar klingeln abstand übung nah haut insel getrennt zielen gefahr kredit gewöhnlich verbinden kandidat spur sicher interessiert beurteilung pfad nur chauffør årlig nukleare sidstnævnte praktisk kommerciel rig opstå tilsyneladende ringe afstand øvelse tæt hud ø særskilt sigte fare kredit sædvanlig forbinde kandidat spor sikker interesserede vurdering sti blot förare årlig nukleär senare praktisk kommersiell rik uppstå tydligen ringa avstånd övning tätt hud ö särskild sikta fara kredit vanlig länka kandidat spår säker intresserad bedömning stig endast 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 
prep n a n n v n v n n n n n n n v v v n a n n v n n n a v plus district regular reaction impact collect debate lay rise belief conclusion shape vote aim politics reply press approach file western earth public survive estate boat prison additional settle plus district regulier reactie impact verzamelen debat leggen stijging geloof conclusie vorm stem doel politiek beantwoorden drukken benaderen bestand westers aarde publiek overleven landgoed boot gevangenis extra vestigen plus bezirk regulär reaktion auswirkung sammlen debatte legen anstieg glaube schlussfolgerung form stimme ziel politik antworten drücken angehen datei westlich erde publik überleben landgut boot gefängnis zusätzlich siedeln plus distrikt regelmæssig reaktion indvirkning samle debat lægge stigning tro konklusion form stemme mål politik besvare trykke nærme sig fil vestlig jord publikum overleve ejendom båd fængsel ekstra bosætte sig plus distrikt regelbunden reaktion inverkan samla debatt lägga stigande tro slutsats form röst mål politik svara trycka närma sig fil västlig jord publik överleva egendom båt fängelse ytterligare bosätta sig 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 adv n v v v conj adv pron n n n a n n n n adv a n v a adv v n v n n v largely wine observe limit deny for straight somebody writer weekend clothes active sight video reality hall nevertheless regional vehicle worry powerful possibly cross colleague charge lead farm respond grotendeels wijn observeren beperken ontkennen want recht iemand schrijver weekend kleren actief zicht video realiteit hal desondanks regionaal vervoermiddel zorgen maken krachtig eventueel kruisen collega opladen leiding boerderij reageren grossenteils wein beobachten begrenzen leugnen denn gerade jemand autor wochenende kleider aktiv sicht video realität halle trotzdem regional fahrzeug sich sorgen mächtig möglicherweise überqueren kollege aufladen 
führung bauernhof reagieren i vid udstrækning vin observere begrænse benægte for lige nogen forfatter weekend tøj aktiv syn video realitet hal alligevel regional køretøj bekymre kraftfuld muligvis krydse kollega oplade ledelse gård reagere till stor del vin observera begränsa förneka för rakt någon författere helg kläder aktiv syn video realitet hall icke destro mindre regional fordon bekymra kraftfull möjligtvis korsa kollega ladda ledning gård reagera 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 n adv n n n v v v n n n n n n n n a n n v v a adv n n adv n prep employer carefully understanding connection comment grant concentrate ignore phone hole insurance content confidence sample transport objective alone flower injury lift stick front mainly battle generation currently winter inside werkgever voorzichtig begrip verbinding commentaar toewijzen concentreren negeren telefoon gat verzekering inhoud vertrouwen monster transport doelstelling alleen bloem blessure optillen steken voor hoofdzakelijk gevecht generatie huidig winter binnen arbeitgeber vorsichtig verständnis verbindung kommentar gewähren konzentrieren ignorieren telefon loch versicherung inhalt vertrauen probe transport zielsetzung allein blume verletzung heben stecken vordere hauptsächlich schlacht generation momentan winter innerhalb arbejdsgiver omhyggeligt forståelse forbindelse kommentar skænke koncentrere ignorere telefon hul forsikring indhold tillid prøve transport hensigt alene blomst kvæstelse løfte stikke for hovedsagelig kamp generation øjeblikket vinter indenfor arbetsgivare försiktigt förståelse förbindelse kommentar tillmötesgå koncentrera ignorera telefon hål försäkring innehåll tillit prov transport mål allena blomma skada lyfta sticka främre huvudsakligen strid generation för närvarande vinter inuti 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 
1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 a adv v n v n n n n n det n n a n n a v n a n n n v v a a n impossible somewhere arrange will sleep progress volume ship legislation commitment enough conflict bag fresh entry smile fair promise introduction senior manner background key touch vary sexual ordinary cabinet onmogelijk ergens regelen testament slapen voortgang volume schip wetgeving verplichting genoeg conflict tas vers binnenkomst glimlach eerlijk beloven introductie senior manier achtergrond sleutel aanraken variëren seksueel gewoon kabinet unmöglich irgendwo anordnen testament schlafen fortschritt volumen schiff gesetzgebung verpflichtung genug konflikt tasche frisch eintritt lächeln fair versprechen einleitung älter weise hintergrund schlüssel berühren variieren sexuell gewöhnlich kabinett umulig et eller andet sted arrangere testamente sove fremskridt volumen skib lovgivning forpligtelse nok konflikt taske frisk indgang smil retfærdig love introduktion senior måde baggrund nøgle berøre variere seksuel almindelig kabinet omöjlig någonstans ordna testamente sova framsteg volym skepp lagstiftning förpliktelse nog konflikt påse färsk inträde leende rättvis lova introduktion senior sätt bakgrund nyckel beröra variera sexuell vanlig kabinett 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 n adv n adv n adv n v n a n n n n n v n n adv n a v n n a n n n painting entirely engine previously administration tonight adult prefer author actual song investigation debt visitor forest repeat wood contrast extremely wage domestic commit threat bus warm sir regulation drink schilderij geheel motor vorig administratie vanavond volwassene de voorkeur geven auteur eigenlijk lied onderzoek schuld bezoeker bos herhalen hout contrast extreem loon huiselijk begaan bedreiging bus warm mijnheer regulatie drank malerei vollständig motor vorher administration heute abend 
erwachsene bevorzugen autor tatsächlich lied untersuchung schuld besucher wald wiederholen holz kontrast äusserst lohn häuslich begehen bedrohung bus warm herr regulierung getränk maleri helt motor tidligere administration i aften voksen foretrække forfatter faktisk sang undersøgelse gæld besøgende skov gentage træ kontrast ekstremt løn huslig begå trussel bus varm herre regulering drik målning helt motor förut administration i kväll vuxen föredra författere faktisk sång undersökning skuld besökare skog upprepa trä kontrast extremt lön huslig begå hot buss varm min herre reglering dryck 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 n a a a n adv a n v v pron n prep n n n n v a n n adv n a n v n n relief internal strange excellent run fairly technical tradition measure insist his farmer until traffic dinner consumer meal warn living package half increasingly description soft stuff award existence improvement opluchting intern vreemd uitmuntend loop aardig technisch traditie meten staan op zijn boer tot verkeer diner consument maaltijd waarschuwen levend pakket helft in toenemende mate beschrijving zacht spullen toekennen bestaan verbetering erleichterung intern seltsam ausgezeichnet lauf ziemlich technisch tradition messen bestehen sein bauer bis verkehr abendessen verbraucher mahlzeit warnen lebendig paket hälfte zunehmend beschreibung weich sachen vergeben existenz verbesserung lettelse intern mærkelig fremragende løbe temmelig teknisk tradition måle insistere hans landmand indtil trafik middag forbruger måltid advare levende pakke halvdel i stigende grad beskrivelse blød ting tilkende eksistens forbedring lättnad intern konstig utmärkt lopp ganska teknisk tradition mäta insistera hans bonde till trafik middag konsument måltid varna levande paket halva alltmer beskrivning mjuk saker tilldela existens förbättring 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 
1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 n n a v n n n adv n a n v n n v n v a n n a n n a a v a a coffee appearance standard attack sheet category distribution equally session cultural loan bind museum conversation threaten link launch proper victim audience famous master lip religious joint cry potential broad koffie voorkomen standaard aanvallen vel categorie verdeling gelijk sessie cultureel lening binden museum conversatie bedreigen link lanceren eigenlijk slachtoffer publiek beroemd meester lip religieus gezamenlijk huilen potentieel breed kaffee aussehen üblich angreifen blatt kategorie verteilung gleichermassen sitzung kulturell darlehen binden museum gespräch bedrohen link lancieren richtig opfer publikum berühmt meister lippe religiös gemeinsam weinen potenziell breit kaffe udseende standard angribe ark kategori fordeling ligelig session kulturel lån binde museum samtale true link lancere korrekt offer publikum berømt mester læbe religiøs fælles græde potentiel bred kaffe utseende standard anfalla ark kategori fördelning lika session kulturell lån binda museum konversation hota länk lansera rätt offer publik berömd mästare läpp religiös gemensam gråta potential bred 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 n v n a n prep v n n v n v v n a a v v n n n a v adv n n v v exhibition experience judge formal housing past concern freedom gentleman attract explanation appoint note total lovely official date demonstrate construction middle yard unable acquire surely crisis west impose care tentoonstelling ervaren rechter formeel huisvesting voorbij betreffen vrijheid heer aantrekken uitleg benoemen opmerken totaal lieflijk officieel daten demonstreren constructie midden tuin niet in staat verkrijgen vast crisis westen opleggen zorgen ausstellung erfahren richter formal gehäuse vorüber betreffen freiheit herr anziehen 
erklärung ernennen beachten summe lieblich offiziell ausgehen mit demonstrieren konstruktion mitte hof unfähig erwerben sicherlich krise westen verhängen sich kümmern udstilling opleve dommer formel boliger forbi vedrøre frihed herre tiltrække forklaring udnævne bemærke total dejlig officiel date demonstrere konstruktion midte gård ude af stand erhverve sikkert krise vest pålægge pleje utställning uppleva domare formell bostäder förbi angå frihet herre attrahera förklaring utnämna märka summa härlig officiell sällskapa demonstrera konstruktion mitt gård oförmögen förvärva säkert kris väst ålägga bry sig om 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 n n adv v a n n n adv n n n a adv v conj v v n v pron a v n n a prep v god favour before name equal capacity flat selection alone football victory factory rural twice sing whereas own head examination deliver nobody substantial invite intention egg reasonable onto retain god gunst voor noemen gelijk capaciteit flat selectie alleen voetbal overwinning fabriek landelijk twee keer zingen terwijl bezitten leiden examinatie bezorgen niemand substantieel uitnodigen intentie ei redelijk op behouden gott gunst zuvor nennen gleich kapazität wohnung auswahl allein fussball sieg fabrik ländlich zweimal singen während besitzen leiten prüfung liefern niemand wesentlich einladen intention ei angemessen auf behalten gud gunst førend nævne lig kapacitet lejlighed udvælgelse alene fodbold sejr fabrik landlig to gange synge hvorimod eje lede gennemgang levere ingen væsentlig indbyde hensigt æg rimelig på beholde gud gunst före nämna lika kapacitet lägenhet urval endast fotboll seger fabrik lantlig två gånger sjunga då däremot äga leda granskning leverera ingen väsentlig inbjuda avsikt ägg rimlig på behålla 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 n n a a a v n n n n n v n aircraft decade cheap quiet bright 
contribute row search limit definition unemployment spread mark vliegtuig decennium goedkoop stil helder bijdragen rij zoektocht limiet definitie werkloosheid spreiden merkteken flugzeug jahrzehnt billig still hell beitragen reihe suche limit definition arbeitslosigkeit verbreiten marke fly årti billig stille lys bidrage række eftersøgning grænse definition arbejdsløshed sprede mærke flygplan decennium billig stilla ljus bidra rad sökande gräns definition arbetslöshet sprida märke