Building a corpus for learning how to produce sequence Ciprian-Virgil Gerstenberger
Building a corpus for learning how to produce atonal pronouns in the Romanian clitic sequence
Ciprian-Virgil Gerstenberger
Universitetet i Tromsø, Norway
Learner Language, Learner Corpora Conference (LLLC 2012)
06.10.2012, Oulu, Finland

Outline
– Atonal pronouns: Why a special corpus?
– Language knowledge: How to build it?
– Language production: What are the benefits?

General question
How to deal with soft constraints in language production?
– free word order (e.g., in Finnish): information structure, style?
– in-situ vs. extraposed relative clauses (e.g., in German): clause weight, register?
– optional sandhi phenomena (e.g., in Romanian): genre, register, dialect, sociolect, idiolect?

Specific question
What triggers optional realizations of Romanian atonal pronouns?
(1) a. Te rog să îl faci! [Please, do it!]
    b. Te rog să-l faci!
(2) a. Știu că îi scrii emailuri. [I know that you write him/her emails.]
    b. Știu că-i scrii emailuri.
(3) a. Hai să ne apucăm de treabă! [Let's start working!]
    b. Hai să ne-apucăm de treabă!

(External) Sandhi
Joining
– Epenthesis in English: a car vs. an old car
– Elision in French: la fille [the girl] vs. l'église [the church]
– Elision in Romanian: Tu îl vezi. vs. Tu-l vezi. [You see him/it.]
⇒ Sandhi can be marked graphically, but it doesn't have to be.
⇒ Elision in Romanian is always graphically marked!

Sandhi in Romanian
General rule: avoid hiatus (CV VC)
⇒ C-VC: Mă apuc de treabă. → M-apuc de treabă. [I start working.]
⇒ CV-C: Tu îl vezi. → Tu-l vezi. [You see him/it.]
⇒ CV̯-VC: Te apuci de treabă. → Te-apuci de treabă. [You start working.]
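The three hiatus-avoiding patterns above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the function name `join_variants` and the vowel inventory are hypothetical, and the heuristic is tuned to the examples on the slides. It does not decide whether joining is actually licensed for a given pair of words, which depends on the word classes involved.

```python
# Hypothetical sketch of the three joining patterns (C-VC, CV-C, CV-VC).
# It covers the slide examples only and is not a full model of Romanian sandhi.
VOWELS = set("aăâeiîou")


def join_variants(left, right):
    """Return the non-joined string plus a joined variant if a hiatus exists."""
    variants = [f"{left} {right}"]
    has_hiatus = left[-1].lower() in VOWELS and right[0].lower() in VOWELS
    if not has_hiatus:
        return variants  # nothing to join
    if right[0].lower() == "î":
        # CV + îC  =>  CV-C   (Tu îl => Tu-l): the initial î is elided
        variants.append(f"{left}-{right[1:]}")
    elif left[-1].lower() == "ă":
        # Că + VC  =>  C-VC   (Mă apuc => M-apuc): the final ă is elided
        variants.append(f"{left[:-1]}-{right}")
    else:
        # CV + VC  =>  CV-VC  (Te apuci => Te-apuci): both vowels are kept,
        # the first one is realized as non-syllabic
        variants.append(f"{left}-{right}")
    return variants


for pair in [("Mă", "apuc"), ("Tu", "îl"), ("Te", "apuci")]:
    print(join_variants(*pair))
```

For a pair without hiatus, such as ne + dai, the function returns only the non-joined string.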
Romanian atonal pronouns: Accusative
Number  Person  Type       Gender  Syllabic  Non-syllabic (onset)  Non-syllabic (coda)
Sg      1.      pers/refl  m/f     [mə] mă   [m] m-                –
Sg      2.      pers/refl  m/f     [te] te   [te̯] te               –
Sg      3.      pers       m       –         [l] l-                [l] -l/îl
Sg      3.      pers       f       [o] o     [o̯] o                 –
Sg      3.      refl       m/f     [se] se   [s] s-, [se̯] se       –
Pl      1.      pers/refl  m/f     [ne] ne   [ne̯] ne               –
Pl      2.      pers/refl  m/f     [və] vă   [v] v-                –
Pl      3.      pers       m       –         [i̯] i                 [j] -i/îi
Pl      3.      pers       f       [le] le   [le̯] le               –
Pl      3.      refl       m/f     [se] se   [s] s-, [se̯] se       –

Romanian atonal pronouns: Dative
Number  Person  Type       Syllabic          Non-syllabic (onset)  Non-syllabic (coda)
Sg      1.      pers/refl  [mi] mi           [mi̯] mi               [mʲ] -mi/îmi
Sg      2.      pers/refl  [tsi] ți          [tsi̯] ți              [tsʲ] -ți/îți
Sg      3.      pers       [i] i             [i̯] i                 [j] -i/îi
Sg      3.      refl       [ʃi] și           [ʃi̯] și               [ʃʲ] -și/își
Pl      1.      pers/refl  [ni] ni, [ne] ne  [ne̯] ne               –
Pl      2.      pers/refl  [vi] vi, [və] vă  [v] v-                –
Pl      3.      pers       [li] li, [le] le  [le̯] le               –
Pl      3.      refl       [ʃi] și           [ʃi̯] și               [ʃʲ] -și/își

Problems from a learner's perspective
Obligatory sandhi
– atonal pronouns:
  M-am apucat de treabă.
  *Mă am apucat de treabă.
  [I've started to work.]
– elsewhere:
  într-un vis de vară
  *între un vis de vară
  [in a summer dream]
Problems from a learner's perspective
Optional sandhi
– atonal pronouns:
  M-apuc de treabă.
  Mă apuc de treabă.
  [I start to work.]
– elsewhere:
  O s-aduc cartea.
  O să aduc cartea.
  [I'll bring the book.]

Problems from a learner's perspective
Hyphenated non-reduced (= syllabic) forms
– as phonological hosts:
  Ți–l cumperi. [You buy it (for yourself).]
  Să nu mi–ți pierzi timpul cu așa ceva! [Don't lose your time with such things.]
– in postverbal position:
  Du–te acasă! [Go home!]
– as phonological hosts in postverbal position:
  Cumpără–ți–l! [Buy it!]

Problems from a learner's perspective
Understanding: What kind of hyphen is it?
The hyphen is an unreliable indicator for reduced forms:
  Ți–ai cumpărat cartea. [You've bought the book!]
  Ți–l cumperi. [You buy it.]
  Ți–o cumperi. [You buy it.]
  Să–ți cumperi cartea! [Buy the book!]
  Du–te acasă! [Go home!]
  Du–te–acasă! [Go home!]
  Cumpără–ți–l! [Buy it (for yourself)!]
  Cumpără--l! [Buy it!]
  Cumpără--ți cartea! [Buy the book!]
Legend: gray = syllabic atonal pronoun; black = reduced atonal pronoun. Hyphen marks: "–" = non-syllabic or post-verbal; "--" = non-syllabic AND post-verbal.
Problems from a learner's perspective
Understanding: Which phonological form is it?
Grapheme-phoneme ambiguity:
  [tsi]   Cumpără-ți-l! [Buy it!] / Ți-l cumperi. [You buy it!]
  [tsʲ]   Cumpără-ți cartea! [Buy the book!] / Îți cumperi cartea. [You buy the book.]
  [tsi̯]   Ți-ai cumpărat cartea. [You've bought the book.]

Problems from a learner's perspective
Production: To hyphenate or not to hyphenate?
⇒ obligatory or optional hyphenation?
⇒ if optional, reduced or non-reduced form?

The choice issue
Well-balanced mixture of joined vs. non-joined forms
– how to define well-balancedness?
– domain of well-balancedness: clause, sentence, paragraph, text?
– count only optional instances, or both obligatory and optional ones?
– alignment, parallelism?
  Trebuie s-o faci și s-o dregi!
  Trebuie să o faci și să o dregi!
  [You have to do it and to mend it!]
  Trebuie să o faci și s-o dregi!
  Trebuie s-o faci și să o dregi!
⇒ Different rhythm! A matter of style?
The choice issue
Speech rate
Alexandra Popescu (2003) Morphophonologische Phänomene des Rumänischen, PhD thesis, University of Düsseldorf, 2003
Optimality-Theoretic model:
– reduced forms always win at a faster speech rate
– non-reduced forms always win at a normal speech rate
Popescu (2003), Ex. (21), p. 160

The choice issue
Speech rate (cont.)
Popescu (2003):
⇒ speech rate is relative: no experimental setup
⇒ speech rate vs. number of syllables per time unit?
⇒ what about rhythm?
  "Emil Boc, du-te-acasă / Și apucă-te de coasă!"
  "Emil Boc, go home / And start scything!"

The choice issue
Speech rate (cont.)
Popescu (2003), p. 179:
⇒ the OT model fails to account for all presented data
  "Es ist allerdings unklar, warum der Kandidat mit dem Vollvokal [ɨ] neben dem Kandidaten c. mit dem Vollvokal [i] beim Normalsprechen gewinnen kann, obwohl er nach dem bisherigen Ranking schlechter ist als der Kandidat mit dem Vollvokal."
  "It is unclear, however, why the candidate with the full vowel [ɨ] can win alongside candidate c. with the full vowel [i] in normal speech, although it is worse than the candidate with the full vowel under the ranking established so far."
The choice issue
Mode, register, style
Maria Iliescu (1975) Pentru o sistematizare a predării pronumelui personal neaccentuat românesc (la studenții străini), in Limba Română 24, 1975
  "În limba literară îngrijită se preferă pronume nelegate"
  "in well-groomed literary style, non-bound pronouns are preferred"
  "În stilul beletristic formele enclitice apar mai des"
  "in beletristic style, enclitic forms occur more often"
⇒ fuzzy formulations: "are preferred", "occur more often"
⇒ how to define well-groomedness?
⇒ how many styles to define?

Usage-based approach
Corpus-driven solution
Observation: Some optional reduced atonal pronouns occur far more often than their non-reduced counterparts.
Jürgen Bredemeier (1976) Strukturbeschränkungen im Rumänischen. Studien zur Syntax der prä- und postverbalen Pronomina, TBL Verlag Gunter Narr, 1976
Why? How often?
⇒ Look into relevant data!

Web as Corpus?
Search example: "Du-te-acasă!"
⇒ No fine-tuning possible!
Web as Corpus
– offers a wide range of usage-based instances of everything
– improvements (e.g., semantic web) are not (yet) useful for the current research issue
– even simple but relevant distinctions are not possible without a massive data cleanup (diacritics, hyphens, misspellings, sloppy formulations, etc.)
⇒ Far too expensive at the moment!

Use existing corpora
Odense Grammatically Annotated Corpus of Romanian Business
  Revista pe care ați realizat-o mi-a atras atenția [The magazine you produced caught my attention]
– annotation and preprocessing changed the original string
– lacking atonal pronouns and auxiliaries, dangling hyphens
⇒ Not of much use!

What to do?
⇒ Build a special corpus!

General ideas
– account for specific phenomena (encountered instances plus all optional variants)
– provide additional necessary linguistic annotation (part-of-speech)
– add accessible, relevant information (spoken, written, genre, etc.)
– enable unification of specific annotated data with other layers (syntax, semantics, information structure)
– keep the original string in place
– use copyright-free data as much as possible
Experimental data set
Europarl Corpus
– Romanian part of the Europarl Corpus
– parallel corpus extracted from the proceedings of the European Parliament
– original purpose: Statistical Machine Translation (SMT)
– freely available
– much cleaner than Google data
– yet, still a huge amount of cleanup work

Data evaluation
– size after the first cleanup and sentence splitting using the default tools: 224417 (inc_europarl_ro.sent.txt)
– size after removing foreign sentences and correcting diacritics: 223622 (europarl_ro.sent.xml)
– pseudo-sentences, formulaic sentences (parliament meetings)

Usable data for the research question
– search for lines with at least one hyphen: 56155
– unique instances: 53897
⇒ Filter irrelevant hyphen occurrences!
⇒ Search for the non-reduced pronominal forms!

Language knowledge
The small universe of atonal pronouns in Romanian
– local phenomenon
– relatively small number of forms
– modelling any possible combination (even non-grammatical ones, aka mal-rules in error modelling)
⇒ exhaustive modelling
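The hyphen search and the filtering step from the "Usable data" slide could look roughly like the sketch below. The names `CLITIC_PARTS`, `TOKEN_RE` and `hyphen_candidates` are hypothetical, and the form list is a small, non-exhaustive sample taken from the pronoun tables. The slides do not show the actual filter, so this is an assumption about one possible approach, not the procedure used for the corpus.

```python
import re

# Assumption: a small sample of hyphen-attached atonal pronoun forms,
# written without their hyphen. A real filter would need the full paradigm.
CLITIC_PARTS = {"m", "te", "l", "o", "s", "se", "ne", "v", "i", "le",
                "mi", "ți", "și"}

# A hyphenated token: two or more word-character runs joined by hyphens.
TOKEN_RE = re.compile(r"\w+(?:-\w+)+")


def hyphen_candidates(sentence):
    """Return hyphenated tokens in which at least one part is a clitic form."""
    hits = []
    for match in TOKEN_RE.finditer(sentence):
        parts = match.group(0).lower().split("-")
        if any(part in CLITIC_PARTS for part in parts):
            hits.append(match.group(0))
    return hits


print(hyphen_candidates("Revista pe care ați realizat-o mi-a atras atenția"))
print(hyphen_candidates("Noul prim-ministru a vorbit."))
```

Hyphenated compounds such as prim-ministru are dropped because none of their parts is in the form list. Ambiguous cases still need the context-testing functions and the part-of-speech layer described on the following slides.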
Language knowledge
Example: 1pers, Sg, Acc

Annotation run
Current state
pattern + context-testing functions → current annotation state
⇒ add all other optional forms licensed by the given context

Annotation run
Intended state
⇒ Part-of-speech information needed!

Part-of-Speech annotation
Current state
– whole corpus POS-tagged using http://www.racai.ro/webservices/TextProcessing.aspx

Towards the final format
Steps to do
– transform the MULTEXT POS annotation into an XML format
– unify the annotation of optional sandhi with the POS annotation
⇒ ... and then?
... starts the real linguistic fun!
Using the whole potential of the linguistic annotation
– Is there a significant difference between the occurrences of să îmi vs. să-mi and, e.g., că îți vs. că-ți?
– Take more context into account (item before the subjunction + item after the atonal pronoun) and count the syllables of the extended context?
– What about the rhythm changes in the context (cf. the huge amount and variation of reduced forms in Romanian poetry)?
– include stylometric measurements
⇒ What triggers the choice of a specific surface form?

Extending the linguistic playground
Annotating more (copyright-free) data
– Romanian part of the JRC-Acquis Multilingual Parallel Corpus
– DEX: Dicționarul Explicativ al Limbii Române
– Romanian Wikipedia
  – articles (elaborated, well-formulated text)
  – comments (informal, more personal)
⇒ Copyright-free data is shareable data!

Natural Language Generation vs. Language Learning
– both share the need to produce well-formed, situationally adequate natural language utterances
– Why not share the knowledge as well?
– Why not the resources, too?
⇒ Sharing data is not like sharing a slice of bread, but rather like Jesus' bread and fish miracle!
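The counting question raised above (să îmi vs. să-mi, că îți vs. că-ți) can be approached with a simple frequency count as a first step. The sketch below is hypothetical: `count_variants` and its parameters are illustrative names, the sample sentences are invented, and it works on raw strings rather than on the annotated corpus. Whether an observed difference is significant would need a statistical test on top of these counts.

```python
import re
from collections import Counter


def count_variants(sentences, subjunction, full, reduced):
    """Count non-reduced (e.g. 'să îmi') vs. reduced (e.g. 'să-mi') sequences."""
    sub = re.escape(subjunction)
    full_re = re.compile(rf"\b{sub} {re.escape(full)}\b", re.IGNORECASE)
    reduced_re = re.compile(rf"\b{sub}-{re.escape(reduced)}\b", re.IGNORECASE)
    counts = Counter()
    for sentence in sentences:
        counts["non-reduced"] += len(full_re.findall(sentence))
        counts["reduced"] += len(reduced_re.findall(sentence))
    return counts


# Invented sample sentences, for illustration only.
sample = ["Te rog să îmi scrii.", "Vreau să-mi spui.", "Să-mi scrii!"]
print(count_variants(sample, "să", "îmi", "mi"))
```

The same function can be called with ("că", "îți", "ți") for the second pair. Extending the patterns with the items before the subjunction and after the pronoun would give the wider context mentioned on the slide.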
Machine vs. human
Transferability of constraint formulation from NLG to LL
Is the constraint formalization from NLG transferable to the LL domain? Yes!
Linearization and surface realization have to be applied to perceivable entities.
⇒ no room to generate partially empty strings
⇒ no room to linearize traces or empty categories

Example of constraint formulation in NLG
Obligatory sandhi in the sequence of atonal pronouns
Rule: The rightmost item in the atonal pronoun sequence cannot be an open syllable with nucleus [i].
Assuming the base form [ni] ni:
Is it the rightmost atonal pronoun in the sequence?
1. yes ⇒ change from [ni] ni to [ne] ne
   Is there on the left an item to obligatorily attach to?
   1.1 yes (e.g., [ne] ne [a] a dat) ⇒ attach: [ne̯a] ne-a dat
   1.2 no (e.g., [ne] ne [dai̯] dai) ⇒ done: [ne dai̯] ne dai
2. no (e.g., [ni] ni [le] le [dai̯] dai) ⇒ done: [ni le dai̯] ni le dai

Optional sandhi phenomena
Exploiting the specific language model
– analyse the context
– consult the specific language model
– give hints to students wrt. the most appropriate form to choose
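The decision procedure for the base form ni from the constraint-formulation example can be written down directly. The sketch below is a hypothetical rendering: the function `realize_ni` and its parameters are illustrative names, and whether the neighbouring item requires attachment is passed in as a flag, since deciding it needs part-of-speech information that is not modelled here.

```python
def realize_ni(rest_of_sequence, host, host_attaches):
    """Realize base form 'ni' in a clitic sequence.

    rest_of_sequence: atonal pronouns following 'ni' (empty if 'ni' is rightmost)
    host:             the adjacent non-pronominal material, e.g. 'a dat' or 'dai'
    host_attaches:    True if 'ni' must obligatorily attach to the host
    """
    if rest_of_sequence:
        # 2. 'ni' is not the rightmost atonal pronoun: keep [ni]
        return " ".join(["ni", *rest_of_sequence, host])
    # 1. 'ni' is rightmost: an open syllable with nucleus [i] is not
    #    allowed there, so change [ni] to [ne]
    form = "ne"
    if host_attaches:
        # 1.1 obligatory attachment, marked with a hyphen
        return f"{form}-{host}"
    # 1.2 nothing to attach to
    return f"{form} {host}"


print(realize_ni([], "a dat", True))    # ne-a dat
print(realize_ni([], "dai", False))     # ne dai
print(realize_ni(["le"], "dai", False)) # ni le dai
```

The three calls reproduce the three outcomes listed in the example (1.1, 1.2 and 2). For language learning, the same branches could be used to explain to a student why a produced form was rejected.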
Further possible applications
Exploiting specific language resources
– design and implementation of different types of language learning exercises for training atonal pronouns
– specific feedback on production error types, thanks to the mal-rule-like coding of non-licensed forms
– enriching existing analysis tools (parsers) with specific information

Human vs. machine
NLG: too much of a technique, too little of a science
– Using NLG techniques for LL: rara avis
  Karin Harbusch et al. (2009) Computing Accurate Grammatical Feedback in a Virtual Writing Conference for German-Speaking Elementary-School Children: An Approach Based on Natural Language Generation, CALICO Journal, 26(3), 2009
– Using LL research insights for NLG
  NLG is too much of a Fiat! domain: from the very beginning, NLG has paid very little attention to surface phenomena such as language variation or even orthography
⇒ modelling human language production: a real plus for NLG

Conclusions
– motivating the need for special corpora for learning how to make decisions in cases of optional surface realization
– reporting on the cumbersome process of building resources for special phenomena
– stressing the need for sharing resources and insights between fields with similar goals
– underlining the benefits of sharing resources between NLG and LL wrt. the realization of atonal pronouns in Romanian
⇒ Share resources!