Stylebook for Tübingen Treebank - Seminar für Sprachwissenschaft
Transcription
Stylebook for Tübingen Treebank - Seminar für Sprachwissenschaft
Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z) Heike Telljohann, Erhard W. Hinrichs, Sandra Kübler, Heike Zinsmeister, Kathrin Beck Universität Tübingen Seminar für Sprachwissenschaft Wilhelmstr. 19 D-72074 Tübingen {telljohann,eh,kuebler,zinsmeis,kbeck}@sfs.uni-tuebingen.de January 2012 Abstract This stylebook is an updated version of Telljohann et al. (2009). It describes the design principles and the annotation scheme for the German treebank TüBa-D/Z developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper ’die tageszeitung’ (taz). The unannotated taz newspaper material was taken from the Science CD (Wissenschafts-CD) of ’die tageszeitung’ (taz) that can be licensed from contrapress media GmbH (http://shop.taz.de/index.php?cat=c18_taz-Archiv.html). At present, the treebank comprises 65,524 sentences. The newspaper material is taken from the taz editions from 1989 41 articles 1992 July 10, 11, 13, 14 1995 October 14, 16, 17 1997 237 articles 1999 April 30, May 3 – 7 The average sentence length is 17.8 words and the total number of tokens currently amounts to 1,164,766. The TüBa-D/Z treebank is still under development. Thus, the number of annotated sentences will increase over time. Periodic data updates and accompanying updates of this stylebook will be made available at: http://www.sfs.uni-tuebingen.de/en/tuebadz.shtml Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank. The annotation scheme for the TüBa-D/Z treebank is derived from the verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Computational Linguistics of the SfS (Hinrichs et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts. In order to ensure the reusability of the data, a surface-oriented annotation scheme has been adopted that is inspired by the notion of topological fields and is enriched by a level of predicate-argument structure. The linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theoryneutrality. i Acknowledgements Funding for the TüBa-D/Z has come from a variety of sources: • the Competence Center for Text- and Information Technology (Kompetenzzentrum für Text- und Informationstechnologie – KIT)) grant by the Ministry of Science, Research and the Arts Baden-Württemberg (funding since 2000); • the collaborative research center (Sonderforschungsbereich) grant SFB 441 – Linguistic Data Structures, project A1 – Representation and Automatic Acquisition of Linguistic Data funded by the German Research Council (Deutsche Forschungsgemeinschaft – DFG); • the collaborative research center (Sonderforschungsbereich) grant SFB 833 – The construction of meaning - the dynamics and adaptivity of linguistic structures, project A3 – Disambiguating Discourse Connectives using Corpus-induced Semantic Relations funded by the German Research Council (Deutsche Forschungsgemeinschaft – DFG); • the ESFRI research infrastructure project grants D-SPIN and CLARIND funded by the Federal Ministry of Education and Research (BMBF) (funding since 2008). A project of this scale would not be possible without the generous support from many contributors: Our special thanks go to ’die tageszeitung’ (taz) who kindly granted permission to process the newspaper data and to release the treebank. We would like to acknowledge Rosmary Stegmann for her many contributions to the treebank of spoken German in verbmobil. Her research laid the foundations for the annotation scheme of that treebank, which has been summarized in the ’Stylebook for the German Treebank in verbmobil’ (Stegmann et al. 2000). We would like to thank Manfred Sailer and Frank Richter for their helpful comments and support in form of encouragement and critical discussions from which we could strongly benefit for the challenging task of developing a dataoriented syntactic annotation scheme for spoken as well as for written German. Furthermore, we are indebted to Tylman Ule for his assistance with part-ofspeech tagging of the data and with data conversion. We would also like to acknowledge the support of Martina Liepert and Jorn Veenstra, who initiated and developed the integration of named entities into the annotation scheme. Moreover, we would like to thank Julia Trushkina (Trushkina 2004) and Yannick Versley (Versley et al. 2010) who provided the tools for morphological preprocessing. ii Furthermore, Yannick Versley (Versley et al. 2010) supported the project by developing a tool for lemma disambiguation and for the automatic integration of semantic classes of named entities. The quality of the treebank has been considerably improved by feature oriented consistency checks developed by Ventsislav Zhechev. Further consistency tests were contributed by Tylman Ule and Frank H. Müller in the course of their research work in the SFB 441. They deserve special mention for their support. We like to thank Vera Möller and Karin Naumann (2007) for annotating anaphora and coreference relations and also for doing an excellent job in documenting the concepts. Yannick Versley and Holger Wunsch supported the project in various aspects. In the course of their Ph.D. projects in the SFB 441 they enhanced the conceptual aspects of the anaphora resolution as annotated in the treebank. They also wrote mapping and conversion tools for integrating the anaphora annotion in the Export-XML format. For their diligence and dedication to the arduous task of linguistic annotation and of post-editing we thank our research assistants Janne Berlacher, Anne Brock, Armin Buch, Nadine Cetin, Marisa Delz, Silke Dutz, Katrin Eichler, Emilia Ellsiepen, Steffen Froemel, Holger Gauza, Simone Hartung, Daniel Hüttl, Heike Johannsen, Miriam Käshammer, Laura Kassner, Sarah Klug, Janina Kopp, Anuschka Kranz, Christian Kreß, Rebecca Kreß, Michael Kossack, Anne Lohse, Wolfgang Maier, Nicole Maruschka, Kai Metzger, Vera Möller, Simone Müller, Maja Pietsch, Brigitta Rist, Andreas Rudin, Maria Schmidt, Marie Schreier, Insa Starr, Melanie Störzer, Isabel Trott, and Dominikus Wetzel. They also improved the linguistic quality of the annotation by dedicated discussions on problematic and interesting examples. iii The development of the TüBa-D/Z treebank was notably facilitated by a number of former verbmobil partners whose contributions went well beyond the call of duty. Hans Uszkoreit and his colleagues at the Saarland University kindly provided us with the graphical annotation tool Annotate (Plaehn 1998) which was developed as part of the research project (Teilprojekt C3; Principal investigators: Uszkoreit/Smolka) Nebenläufige grammatische Verarbeitung (NEGRA) in the collaborative research center (Sonderforschungsbereich) 378. The Annotate tool provides human annotators with a graphical, user-friendly interface for annotating and editing trees and also offers database support for maintaining large treebanks. We would like to express our special gratitude to Thorsten Brants, who has kindly and generously provided us with software support and user assistance for the Annotate tool from the very beginning of the Tübingen treebank project. iv Contents Abstract i Acknowledgements iv List of Tables viii 1 Introduction 1 2 Major Challenges and Design Decisions 3 3 The Theoretical Basis of the Annotation Scheme 3.1 Topological Fields . . . . . . . . . . . . . . . . . . 3.1.1 The Concept of Topological Fields . . . . . 3.2 Constituent Analysis and Topological Fields . . . . 3.3 General Annotation Principles . . . . . . . . . . . 3.3.1 Flat Clustering Principle . . . . . . . . . . 3.3.2 Longest Match Principle . . . . . . . . . . . 3.3.3 High Attachment Principle . . . . . . . . . 3.4 The Structure of an Annotated Tree . . . . . . . . 3.4.1 The Levels of Annotation . . . . . . . . . . 3.4.2 The Inventory of Labels . . . . . . . . . . . 3.4.3 What Is a Syntactic Unit? . . . . . . . . . 3.4.4 Printing and Spelling Errors . . . . . . . . . 3.4.5 Isolated Phrases . . . . . . . . . . . . . . . 3.4.6 Long-Distance Dependencies . . . . . . . . 3.4.7 Empty Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 The Annotation of the Internal Structure of Phrases 4.1 Premodification and Postmodification in Phrases . . . 4.2 Noun Phrases . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Noun Phrases without Modifiers . . . . . . . . 4.2.2 Prenominal Modification . . . . . . . . . . . . 4.2.3 Postnominal Modification . . . . . . . . . . . . 4.2.4 Appositional Constructions . . . . . . . . . . . 4.2.5 Foreign Language Material . . . . . . . . . . . 4.2.6 Named Entity Annotation . . . . . . . . . . . . 4.2.7 Ordinal Numbers . . . . . . . . . . . . . . . . . v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 6 9 10 10 10 10 11 11 11 14 20 21 23 24 . . . . . . . . . 25 25 25 25 26 31 34 38 41 48 4.3 4.4 4.5 4.6 4.7 4.2.8 Cardinal Numbers . . . . . . . . . . . . . . . . . . . . 4.2.9 Letters and Non-Words . . . . . . . . . . . . . . . . . 4.2.10 Expletive and Other Uses of es . . . . . . . . . . . . . Determiner Phrases . . . . . . . . . . . . . . . . . . . . . . . Prepositional Phrases . . . . . . . . . . . . . . . . . . . . . . 4.4.1 Prepositions . . . . . . . . . . . . . . . . . . . . . . . 4.4.2 Circumpositions and Postpositions . . . . . . . . . . . Adjectival Phrases . . . . . . . . . . . . . . . . . . . . . . . . Adverbial Phrases . . . . . . . . . . . . . . . . . . . . . . . . Verb Phrases . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.1 Head of a Sentence and Verb Complex . . . . . . . . . 4.7.2 Verb Complexes in Verb-second and Verb-final Clauses 4.7.3 Ersatzinfinitiv Constructions . . . . . . . . . . . . . . 4.7.4 Infinitives with zu . . . . . . . . . . . . . . . . . . . . 4.7.5 Coherency and Incoherency of Verbal Constructions . 4.7.6 AcI Constructions . . . . . . . . . . . . . . . . . . . . 4.7.7 Imperatives . . . . . . . . . . . . . . . . . . . . . . . . 4.7.8 Particle Verbs . . . . . . . . . . . . . . . . . . . . . . 4.7.9 Verbs with Predicate . . . . . . . . . . . . . . . . . . . 4.7.10 Modal Verbs . . . . . . . . . . . . . . . . . . . . . . . 5 Attachment Principles for Phrases 5.1 Attachment to Fields . . . . . . . . . . . . . . 5.2 Attachment of Ambiguous Complements . . . . 5.3 Modifier Attachment . . . . . . . . . . . . . . . 5.3.1 Modifier Attachment in the Initial Field 5.3.2 Attachment across Punctuation Marks . 5.3.3 Ambiguous Modifiers in Isolated Phrases . . . . . . . . . . . 6 The Annotation of Sentences 6.1 Sentence Initial Fields . . . . . . . . . . . . . . . . 6.1.1 The C-Field in Verb-Final Clauses . . . . . 6.1.2 The C-Field in Verb-Second Clauses . . . . 6.1.3 The KOORD-Field in all Clause Types . . 6.1.4 The PARORD-Field in Verb-Second Clauses 6.1.5 Resumptive Constructions: The LV-Field . 6.2 Questions . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 W-Questions . . . . . . . . . . . . . . . . . 6.2.2 Yes - No Questions . . . . . . . . . . . . . . 6.3 Relative Clauses . . . . . . . . . . . . . . . . . . . 6.3.1 Event-modifying Relative Clauses . . . . . . 6.3.2 Independent Relative Clauses . . . . . . . . 6.4 Coordination . . . . . . . . . . . . . . . . . . . . . 6.4.1 Coordination of Phrases . . . . . . . . . . . 6.4.2 Asymmetric Coordination . . . . . . . . . . 6.4.3 Coordinations with Complex Conjunctions vi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 50 51 54 55 55 58 58 63 65 65 65 67 69 71 72 73 74 75 78 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 80 80 81 83 83 84 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 86 86 88 88 89 90 91 91 91 92 94 94 95 96 97 98 . . . . . . . . . . . 6.5 6.4.4 Coordinations with Truncated Words . . . . . . . . . 6.4.5 Attachment Principles of Coordination within Phrases 6.4.6 Coordination of Topological Fields . . . . . . . . . . . 6.4.7 Attachment of Ambiguous Modifiers in Coordination . 6.4.8 Coordination of Sentences . . . . . . . . . . . . . . . . 6.4.9 Paratactic Constructions with denn and weil . . . . . 6.4.10 Conjunctions Occurring with Isolated Phrases . . . . . 6.4.11 Split Coordinations . . . . . . . . . . . . . . . . . . . Elliptical Constructions . . . . . . . . . . . . . . . . . . . . . 7 The Annotation of Specific Syntactic Phenomena 7.1 Superlative and Comparative Forms . . . . . . . . 7.1.1 Superlative Forms . . . . . . . . . . . . . . 7.1.2 The Comparative Particles wie and als . . . 7.2 Verbal and Adjectival Use of Participles . . . . . . 7.3 Topicalization . . . . . . . . . . . . . . . . . . . . 7.4 Headlines . . . . . . . . . . . . . . . . . . . . . . . 7.5 Discourse Markers . . . . . . . . . . . . . . . . . . 7.6 Parentheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Criteria for the Distinction of Grammatical Functions 8.1 Subcategorization of Verbs . . . . . . . . . . . . . . . . 8.2 Subcategorization of PREDs . . . . . . . . . . . . . . . . 8.3 Distinction of FOPP, OPP, and V-MOD . . . . . . . . . 8.4 Distinction of MOD, MOD-MOD, and V-MOD . . . . . 8.5 Distinction of ON, PRED, ON-MOD, and PRED-MOD . 9 The 9.1 9.2 9.3 9.4 9.5 TüBa-D/Z Data Formats The NEGRA Export Format The Penn Treebank Format The Export-XML Format . The TIGER-XML Format . The CoNLL Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 101 102 103 105 107 107 108 109 . . . . . . . . 112 112 112 112 115 116 117 119 121 . . . . . 123 123 123 124 125 125 . . . . . 128 128 132 135 137 139 References 140 Index 143 vii List of Tables 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 Three clause types according to Höhle (1986) . . . . . . Topological fields . . . . . . . . . . . . . . . . . . . . . Levels of annotation . . . . . . . . . . . . . . . . . . . The STTS tag set . . . . . . . . . . . . . . . . . . . . . Morphological feature combinations for lexical elements Values of morphological features . . . . . . . . . . . . . Node labels . . . . . . . . . . . . . . . . . . . . . . . . Edge labels . . . . . . . . . . . . . . . . . . . . . . . . Syntactic-Semantic Node Labels for Named Entities . . . . . . . . . . . 7 8 11 13 15 16 17 18 19 4.1 4.2 Semantic Classes and Subclasses for Named Entities . . . . . . . . . . . . Types of es . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 53 viii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 1 Introduction The purpose of this report is to describe the design principles and annotation scheme for the TüBa-D/Z treebank of German. It is intended as a guide for the treebank annotators in Tübingen and for theoretical and computational linguists who want to use annotated treebank data for their own research. In addition, we hope that this report may be of some use for researchers who want to construct their own treebank for German or for some other language. We would like to emphasize that the annotation scheme is language-specific, and we advise against adopting this scheme without modification for some other language. However, we do believe that the type of design decisions that are reported here for German will arise for other languages as well. And it is in this sense that the current report could provide an useful point of reference. The TüBa-D/Z treebank was developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper ’die tageszeitung’ (taz). The unannotated taz newspaper material was taken from the Science CD (Wissenschafts-CD) of ’die tageszeitung’ (taz) that can be licensed from contrapress media GmbH (http://shop.taz.de/index.php?cat=c18_taz-Archiv.html). At present, the treebank comprises 65,524 sentences. The newspaper material is taken from the taz editions from 1989 41 articles 1992 July 10, 11, 13, 14 1995 October 14, 16, 17 1997 237 articles 1999 April 30, May 3 – 7. The average sentence length is 17.8 words and the total number of tokens currently amounts to 1,164,766. The TüBa-D/Z treebank is still under development. Thus, the number of annotated sentences will increase over time. Periodic data updates and accompanying updates of this stylebook will be made available at: http://www.sfs.uni-tuebingen.de/en/tuebadz.shtml Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank. The annotation scheme for the TüBa-D/Z treebank is derived from the verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Compu1 tational Linguistics of the SfS (Hinrichs et al. 2000). The annotation scheme for the verbmobil treebank has been summarized in the ’Stylebook for the German Treebank in verbmobil’ (Stegmann et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts. In order to ensure the reusability of the data, the linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theory-neutrality. The TüBa-D/Z treebank is released in five different data formats : the Negra Export format, the Export-XML format, the TIGER-XML format, the Penn treebank format, and the CoNLL format. More information about each data format is given in chapter 9. To the best of our knowledge, the verbmobil treebank for spoken German is still the only treebank based on non-genre-specific German speech data. It is released as TüBa-D/S treebank (http://www.sfs.uni-tuebingen.de/en/tuebads.shtml). For written texts, TüBa-D/Z is not the only treebank available for German. Two other (semi-)manually annotated treebanks are currently available, each with their own annotation scheme: the Negra treebank (http://www.coli.uni-saarland.de/projects/ sfb378/negra-corpus/) and the TIGER treebank (http://www.ims.uni-stuttgart. de/projekte/TIGER/). The Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z; http://www. sfs.uni-tuebingen.de/en/tuepp.shtml) is a project closely related to the TüBa-D/Z treebank. It consists of 200 million word tokens of the Science CD (Wissenschafts-CD) of ’die tageszeitung’ (taz), including the sentences which are annotated in the TüBa-D/Z treebank. The texts were automatically annotated with clause structure, topological fields, and chunks, in addition to more low level annotation including parts of speech and morphological ambiguity classes. The first release of TüBa-D/Z (12/2003) functioned as training corpus. 2 Chapter 2 Major Challenges and Design Decisions Most syntactic theories consider individual sentences as the primary domain of linguistic theorizing and of syntactic annotation. For written language, the segmentation into sentences is largely unproblematic and coincides with the domain of syntactic analysis. However, newspaper texts exhibit a number of phenomena that do not lend themselves easily to a purely sentence-based annotation. These phenomena include: headlines, titles, parentheses, discourse markers, and sentence conjunction by a colon. These cases are described in more detail in sections 3.4.3 to 3.4.5 of this stylebook. The second main question, which needed to be addressed at the outset of the project was the inventory of syntactic categories and grammatical functions to be used for syntactic annotation and specification of predicate-argument structure. Here our choices were guided by two main considerations: 1. Linguistic adequacy and theory-neutrality: For the purposes of reusability of the treebank data, the annotation scheme should not reflect a commitment to a particular syntactic theory. Rather, the inventory of categories should be a reflection of common assumptions that syntacticians share across different frameworks concerning questions of constituenthood, phrase attachment, and grammatical functions. On this note, the annotations should be theory-neutral and minimal. This desideratum is of utmost importance so as to ensure the reusability of the annotated data. At the same time, the annotation scheme should reflect as much as possible those empirical generalizations that syntacticians, especially from a descriptive perspective, have identified as characteristic of the language in question. 2. Balancing the needs of potential users: Since the construction of a treebank is a labor-intensive and costly enterprise, ideally the TüBa-D/Z treebank should appeal to as many potential users as possible. Moreover, the treebank should be of interest to researchers of a wide range of different fields. Considering the renewed interest in the use of corpora for both theoretical and computational linguistics, choicepoints in the annotation scheme should be resolved in such a way that the needs of potential users are balanced as much as possible. 3 To support the use of the TüBa-D/Z treebank in computational linguistics, the annotation scheme should be sensitive to processing considerations, as long as linguistic adequacy of the choice of annotations is not compromised. Ceteris paribus, processing considerations favor annotation schemes that pay close attention to properties of syntactic surface structure, particularly to word order regularities and distributional properties of words and phrases. At the same time, the use of empty categories and data structures with crossing dependencies among phrases are to be avoided if the annotations are to be used for parsers that rely on the context-freeness of the underlying grammar. In order to satisfy the above aims, the annotation scheme is surface-oriented and context-free. The theoretical assumptions underlying the levels of annotation and the choice of labels themselves are as much as possible based on a rich tradition of theoretical and empirical research on German syntax. For the treatment of word regularities of German, which is a language with relatively free word order, an inventory of topological fields is incorporated into the annotation scheme. Topological fields in the sense of Herling (1821), Erdmann (1886), Drach (1937), and Höhle (1986) are widely used in descriptive studies of German syntax. Such fields constitute an intermediate layer of analysis above the level of individual phrases and below the clause level. The concept of topological fields favors tree-based annotations, i.e. bracketings that do not rely on crossing or discontinuous dependencies. Instead, such nonlinear dependencies are to be expressed at the level of predicate-argument structure which constitutes a second level of annotation with its own descriptive inventory of grammatical functions. The framework of topological fields is widely used in empirical and theoretical accounts of German syntax. Thus, it is in the linguistics literature. This greatly facilitates thorough training of human annotators, since they can rely on the pre-existing body of literature. One purpose of this stylebook is to add to these reference materials. Currently, a total of 25 syntactic node labels for the encoding of constituent structures are being used. These include labels for topological fields as well as labels for phrases and their constituent parts. In order to capture grammatical functions of individual phrases and syntactic dependencies between phrases, constituent structure trees are enriched by a set of edge labels between constituent structure nodes. The current inventory of edge labels comprises 42 distinct categories. In addition to these primary edge labels, four secondary edge labels are used. These labels indicate phrase-internal government of elements in the verb complex, express phrase-internal modification of noun phrases, resolve long-distance dependencies among modifiers, or relate the phrasal complements of so-called third-construction control verbs. For certain computational applications, robust identification of named entities, e.g. person names, names of companies and institutions, names of geographical locations, is a major concern. Therefore, such named entities are identified by a special node label, and their internal structure is sometimes identified by an additional secondary edge label that is used exclusively for named entities. At the word level, part-of-speech labels are assigned according to the Stuttgart-Tübingen tag set, which is widely accepted for part-of-speech tagging for German and which provides an inventory of 54 distinct part-of-speech labels. In addition, information on inflectional morphology is given. 4 Detailed information about the complete inventory of node labels, edge labels, partof-speech labels and inflectional feature clusters is given in section 3.4.2 of this stylebook. The remainder of this stylebook is organized as follows: chapter 3 offers an overview of the theoretical foundations of the annotation scheme, focusing on the concept of topological fields (3.1) and its relation to constituent structure (3.2), on general annotation principles (3.3), as well as an overview of the annotation levels and of the inventory of the annotation labels for each level (3.4). Chapter 4 concerns the annotation of the internal structure of phrases, broken down into major word classes and their phrasal projections. Chapter 5 addresses the principles for relating individual phrases to each other, particularly for modifier and complement attachment. Chapter 6 discusses the annotation of entire sentences, focusing on the relationship between sentence types and topological fields, coordination (including phrasal conjunction) and elliptical constructions. Chapter 7 is devoted to the annotation of miscellaneous syntactic constructions such as comparatives, verbal and adjectival participles, topicalization, newspaper headlines, discourse markers, and parentheses, which each pose special challenges for the annotation tasks. Chapter 8 describes the criteria used for distinguishing different grammatical functions. Chapter 9 describes the five different data formats in which the TüBa-D/Z treebank is distributed. The stylebook concludes with a bibliography and a subject index. We do not consider the annotation level of anaphora and coreference relations in this stylebook. Please consult (Naumann and Möller 2007) for a detailed description of these phenomena. 5 Chapter 3 The Theoretical Basis of the Annotation Scheme 3.1 Topological Fields The annotation scheme for the TüBa-D/Z treebank has been developed with special regard to the characteristics of the German language: the interaction of configurational and non-configurational syntactic properties, which arise from the partially free word order. On the one hand, there exist three different clause types with respect to the fixed position of the finite verb (verb-second (V-2), verb-initial (V-1), and verb-final (V-end)). On the other hand, there is a high degree of variability of complements and adjuncts. In order to treat the relatively high degree of word order freedom in German, the treebank adopts the notion of topological fields as the primary clustering principle of a sentence. The basic characteristics of the model of topological sequences within a German sentence were originally formulated by Herling (1821) and Erdmann (1886). Herling (1821) developed an adequate topological theory for complex sentences in which clauses form a topological unit carrying a syntactic function and he mentioned the special position of the finite verb in verb-second und verb-final clauses. Erdmann (1886) established the basics of a theory of topological fields and pointed out that the first position in a clause is not necessarily the subject position. The so called Herling/Erdmann scheme already covers a set of word order regularities which apply for all three clause types of German. Later Drach (1937) introduced the notion of field. Finally, Höhle (1986) developed topological schemes for the three clause types. 3.1.1 The Concept of Topological Fields In a German clause, the finite verb can appear in three different positions: verb-second, verb-initial, and verb-final. Only in verb-final clauses the verb complex consisting of the finite verb and non-finite verbal elements forms a unit. The discontinuous positioning of the verbal elements in verb-first and verb-second clauses is the traditional reason for structuring German clauses into fields. The positions of the verbal elements form the Satzklammer (sentence bracket) which divides the sentence into a Vorfeld (initial field), a Mittelfeld (middle field), and a Nachfeld (final field). The Vorfeld and the Mittelfeld 6 are divided by the linke Satzklammer (left sentence bracket), which is the finite verb, the rechte Satzklammer (right sentence bracket) is the verb complex between the Mittelfeld and the Nachfeld. Thus, the theory of topological fields states the fundamental regularities of German word order. It is an important basis for the topological analysis of any German sentence, since subclauses and embedded clauses are treated within the bounds of fields. Identical word order regularities within a specific field can be realized in all three clause types. But the fields themselves differ in their possible elements and grammatical rules. Therefore, the theory is a descriptive rather than explanatory theory for a specific language. Höhle (1986) denotes the three clause types as E-Sätze (verb-final clauses), F1-Sätze (verb-initial clauses), and F2-Sätze (verb-second clauses). The topological schemes of these types are listed in Table 3.1. Table 3.1: Three clause types according to Höhle (1986) E-Sätze (KOORD) - (C) - X - VK - Y F1-Sätze (KOORD - (KL) - FINIT - X - VK - Y F2-Sätze (KOORD or PARORD) - (KL) - K - FINIT - X - VK - Y Abbreviations and explanations used in Table 3.1: VK: verb complex FINIT: element denoting categories of finiteness KOORD: coordinating particles (e.g. und, oder) PARORD: non-coordinating particles (e.g. denn, weil) X, Y: sequence of any number of constituents C: complementizer K: one constituent KL: nominativus pendens, resumptive construction (Linksversetzung) These schemes topologically analyse not only atomic sentences but also complex sentence constructions which contain embedded clauses. Such embedded clauses can occur in a Linksversetzung (resumptive construction), Vorfeld, Mittelfeld, or Nachfeld. Herling’s theory of the coordination and embedding of sentences covers these phenomena in detail (Herling 1821). According to Höhle (1986), we assume the existence of the following topological fields (cf. Table 3.2): The following description of the topological fields does not claim completeness regarding all descriptive details but rather mentions their main characteristics.1 1 In the following, the abbreviations for the fields listed in Table 3.2 are used. 7 Table 3.2: Topological fields Field VF LK MF VC NF LV C KOORD PARORD Description Vorfeld (initial field) Linke (Satz-)Klammer (left sentence bracket) Mittelfeld (middle field) Verbkomplex (verb complex) Nachfeld (final field) Linksversetzungsfeld (field for resumptive constructions) C-Feld (field for complementizers, left from MF) Koordinationsfeld (field for coordinating particles) left-most element, optionally in all clause types, (e.g. und, oder) Koordinationsfeld (field for non-coordinating particles) left-most element, optionally only in verb-second (e.g. denn, weil) VF: The Vorfeld consists of only one constituent. Usually it is the subject2 . But because of the high degree of non-configurationality in German, the subject can also occur in the Mittelfeld, thus allowing almost every other constituent to occupy the Vorfeld. LK: The Linke Klammer is the position of the finite verb in verb-second and verb-first clauses or a conjunction in verb-final clauses. It consists of exactly one element. MF: Apart from those units which are optionally located in other fields, any nonverbal constituent may occur in the Mittelfeld. It consists of a sequence of any number of constituents. The linear order of the constituents depends on the specific word order principles for German and their interaction. VC: The Verbkomplex is a sequence of verb forms. In verb-second and verb-first clauses it consists of one or more non-finite elements or - depending on the verb - of a separable prefix. In verb-final clauses it also contains the finite verb. The rule for the linear order in general is: right determines left. If there is a finite verb in the verb complex, it is usually the right-most element (exception: Ersatzinfinitiv constructions (daß er sich ein neues Konzept wird überlegen müssen) (cf. 4.7.3). NF: For some clause types (e.g. so daß-Sätze), the Nachfeld is the obligatory position. Embedded complement clauses, relative clauses, and single constituents can optionally occur in the Nachfeld. In contrast to the Vorfeld it may be occupied by any number of constituents. LV: The Linksversetzungsfeld is a field for the left-dislocated phrase of resumptive constructions. A Linksversetzung is a pendent constituent. It can be regarded as a 2 In the fifth release, 52.5% of all Vorfeld fields host the subject. 8 syntactic anticipation of a part of a sentence (cf. 6.1.5). There are many restrictions which apply for this position. C: The C-Feld only occurs in verb-final clauses (exception: the conjunction als in subordinated sentences of comparison als wäre es nie geschehen.). It is obligatorily occupied in finite verb-final clauses if there is no conjunction in the Linke Klammer. In non-finite verb-final clauses the C-position may be empty. This field can be occupied by conjunctions of sentential objects (e.g. daß, ob) or sentence initial conjunctions like um, obwohl, wenn and also by complex interrogative or relative phrases, e.g. ..., ’um wieviel Geld’ geht es dabei? / ..., ’an der’ Max Daniel Professor für Klavier ist. (cf. 6.1.1). KOORD: The KOORD-field is the field for coordinating particles. In contrast to the PARORD-field, it can optionally occur as the left-most element of all clause types (cf. 6.1.3). PARORD: The PARORD-field is the field for non-coordinating particles which optionally occur as the left-most element of a verb-second clause (cf. 6.1.4). Concerning the distribution of constituents to topological fields see also the chapter Deskriptive Generalisierungen in Grewendorf (1991). The combination of these fields in order to constitute verb-first, verb-second, or verbfinal clauses is described in Höhle (1986). The topological model, which is the basis of most traditional German grammars, only provides descriptive parameters concerning the sentence structure without making any statement about the regularities within the fields and the hierarchical constituent structure of the sentence. For more complicated phenomena, it offers only a catalogue of detailed descriptions. 3.2 Constituent Analysis and Topological Fields The main weakness of the concept of topological fields is the above-mentioned fact that the hierarchical constituent structure of a sentence cannot be described. The aim is to find a form of representation which combines the topological model with a constituent analysis in order to describe the hierarchy of the linguistic units within the fields. In our annotation scheme, the integration of a constituent analysis was achieved by a second level of annotation strictly within the bounds of topological fields: a predicate-argument structure with its own descriptive inventory of syntactic categories and grammatical functions. The constituent structure is represented by phrase structure trees (phrase markers) whose node and edge labels carry this information. In order to analyse syntactic constructions, it is necessary to define the number and types of constituents within the fields. 9 1. Number of constituents within the fields: In general, C, LK, KOORD, PARORD, and VF contain only one constituent. More than one constituent is allowed within MF and NF. 2. Types of constituents within the fields: Phrasal constituents occur in VF, MF, NF and C (interrogative or relative phrases). Embedded clauses either belong to NF, VF, LV, or in some cases to MF. Usually, outside the spoken language context, verb-final clauses do not occur isolated. They need to be attached if possible. 3.3 General Annotation Principles Our annotation scheme tries to find a trade-off between pragmatic requirements on the one hand and linguistic reality on the other hand. The following three common annotation principles are adopted to group the constituents within a syntactic tree: the flat clustering principle, the longest match principle, and the high attachment principle. 3.3.1 Flat Clustering Principle The flat clustering principle keeps the number of hierarchy levels in a syntactic structure as small as possible. As a consequence, any degree of branching is allowed. Constituents which cannot be assigned a grammatical function within a syntactic construction are structured as much as possible, but are not typically connected to surrounding constituents as a whole. 3.3.2 Longest Match Principle The longest match principle demands that as many daughter nodes as possible are combined into a single mother node, provided that the resulting construction is syntactically as well as semantically well-formed. 3.3.3 High Attachment Principle The high attachment principle prescribes that syntactically and semantically ambiguous modifiers are attached to the highest possible level in a tree structure. Premodifiers and postmodifiers are treated in a different way. First, both kinds of modifiers are projected to their phrase level. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they are modifying. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision was taken to avoid the problematic distinction whether a postmodifier is a free adjunct or a complement of the modified phrase. 10 3.4 3.4.1 The Structure of an Annotated Tree The Levels of Annotation A syntactic tree consists of nodes and edges. Nodes represent constituents on different levels of annotation. Edges always link daughter nodes to a mother node. The root node of a tree is assumed as the sentence node of a construction. One level below the sentence node, the nodes of the topological fields are located. This is the reason why topological fields can be regarded as the top-level ordering principle for sentences in the treebank. The sequence of the fields in the three clause types never violates the topological schemes given by Höhle (1986). Within each sentence structure, in general at least two topological fields are occupied (exception: infinitive constructions, (cf. 4.7.4). Others may be left empty (elliptical constructions, cf. 6.5). Table 3.3 lists the four levels of annotation which we distinguish within the structure of an annotated syntactic tree3 : Table 3.3: Levels of annotation Level clause level field level phrase level lexical level Inventory root node labels for different types of clauses node labels for topological fields (including labels for conjuncts of fields) node labels for syntactic categories (including syntactic-semantic node labels for named entities) and edge labels for grammatical functions lexical entries tagged with the part-of-speech (POS-)tags taken from the STTS tag set (Schiller et al. 1995) and with morphological features (Trushkina 2004, Versley et al. 2010) and lemmas (Versley et al. 2010) Node labels denote the syntactic category of a phrase or sentence, a topological field, or a grammatical property. Edge labels denote the grammatical function of lexical entries, phrases, topological fields, and clauses. 3.4.2 The Inventory of Labels The part-of-speech tags used for the annotation are taken from the Stuttgart-Tübingen tag set (STTS) (Schiller et al. 1995).4 The STTS is a guideline for the annotation of German text corpora on the lexical level. Every single part-of-speech of a text is assigned one specific tag. The tag set consists of the tags listed in Table 3.4 (cf. (Schiller et al. 1995)). The tagging of the data was performed by the tnt tagger (Brants 1998) and manually corrected with the Annotate tool (Plaehn 1998). 3 We do not consider the suprasentential annotation level of anaphora and coreference relations in this stylebook. Please consult (Naumann and Möller 2007) for a detailed description of these phenomena. 4 PAV was changed into a new tag called PROP (pronominal form of a prepositional phrase) in order to justify PX as the syntactic category of its mother. 11 The morphological tags give information about inflectional morphology and include features such as case, number, person, etc. A specific combination of feature-value pairs is defined for each relevant part-of-speech category, see Table 3.5 for the list of part-of-speech categories that are annotated with morphological features and the corresponding feature combinations. The values are represented in a cluster by single character abbreviations, see Table 3.6 for the set of features and their values. Features can uniquely be identified by their position in the cluster. Node labels indicate the syntactic category of a phrase or sentence, but they are also used to label topological fields and sequences of topological fields within coordinations or to indicate specific grammatical properties of constituents. Table 3.7 lists all node labels which are used in the treebank. (An additional node is introduced for named entities, see Table 3.9) Edge labels indicate the grammatical function of lexical entries, phrases, topological fields, and clauses. Since case information is given and a distinction of different modifiers is made by these labels, the syntactic tree structures also contain semantic roles. The specific set of edge labels for the German treebank is listed in Table 3.8, including secondary edge labels. The latter ones are used to resolve ambiguities on a different level of description. Two specific edge labels denote whether a constituent has the function of a head (HD), e.g. a phrase (NX, PX, ADJX, ADVX, VXFIN, VXINF), or a non-head (-), e.g. a determiner or a modifier attached to a phrase. On any annotation level, there is at most one head. Within phrases, these two labels indicate the internal dependency structure of the phrase. The head of a sentence structure (e.g. SIMPX) is always the finite verb. In coordinations, each conjunct depends on the head of the whole construction and is denoted with a specific edge label (KONJ) in order to distinguish them from conjunctions and modifying elements within a coordination (see 6.4.1 and 6.4.3). Edge labels below all root node labels carry only non-head labels (cf. (Kübler and Telljohann 2002)). In an enhanced version of the TüBa-D/Z treebank, each named entity is assigned one of the following semantic classes: person (PER), organisation (ORG), location (LOC), geopolitical entity (GPE), or other (OTH). The semantic class OTH comprises all remaining named entities not fitting into PER, ORG, LOC, or GPE (cf. 4.2.6). In order to annotate these semantic classes, syntactic-semantic node labels of the pattern syntactic category = semantic class are defined as the mother node of named entities (see Table 3.9). These syntactic-semantic nodes indicate that the structure below represents a (complex) named entity of a certain syntactic category belonging to one of the five semantic classes (e.g. Ute Wedemeier (NX=PER), The Jim Wane Swingtett (NX=ORG), Sögestraße (NX=LOC), Auf die stürmische Art (PX=OTH) (cf. 4.2.6). The former node label ’EN-ADD’ and the secondary edge label ’EN’ are deleted. The internal syntactic structure of named entities is governed by the general annotation rules. All parts below a syntactic-semantic node that do not belong to the named entity itself are marked as ’-NE’, e.g. [[die (-NE)] AWO] (NX=ORG), [[Der (-NE)] zweite Weltkrieg] (NX=OTH). 12 Table 3.4: The STTS tag set POS = ADJA ADJD ADV APPR APPRART APPO APZR ART CARD FM description attributive adjective adverbial or predicative adjective adverb preposition; left circumposition preposition + article postposition right circumposition definite or indefinite article cardinal number foreign language material ITJ KOUI interjection subordinating conjunction with zu + infinitive subordinating conjunction with clause coordinative conjunction particle of comparison, no clause noun proper noun substituting demonstrative pronoun attributive demonstrative pronoun substituting indefinite pronoun attributive indefinite pronoun without determiner attributive indefinite pronoun with determiner irreflexive personal pronoun substituting possessive pronoun attributive possessive pronoun substituting relative pronoun attributive relative pronoun reflexive personal pronoun substituting interrogative pronoun attributive interrogative pronoun adverbial interrogative or relative pronoun pronominal adverb KOUS KON KOKOM NN NE PDS PDAT PIS PIAT PIDAT PPER PPOSS PPOSAT PRELS PRELAT PRF PWS PWAT PWAV PROP 13 examples [das] große [Haus] [er fährt] schnell, [er ist] schnell schon, bald, doch in [der Stadt], ohne [mich] im [Haus], zur [Sache] [ihm] zufolge, [der Sache] wegen [von jetzt] an der, die, das, ein, eine zwei [Männer], [im Jahre] 1994 [Er hat das mit “] A big fish [” übersetzt] mhm, ach, tja um [zu leben], anstatt [zu fragen] weil, daß, damit, wenn, ob und, oder, aber als, wie Tisch, Herr, [das] Reisen Hans, Hamburg, HSV dieser, jener jener [Mensch] keiner, viele, man, niemand kein [Mensch], irgendein [Glas] [ein] wenig [Wasser], [die] beiden [Brüder] ich, er, ihm, mich, dir meins, deiner mein [Buch], deine [Mutter] [der Hund,] der [der Mann ,] dessen [Hund] sich, einander, dich, mir wer, was welche [Farbe], wessen [Hut] warum, wo, wann, worüber, wobei dafür, dabei, deswegen, trotzdem POS = description examples PTKZU PTKNEG PTKVZ PTKANT PTKA TRUNC VVFIN VVIMP VVINF VVIZU VVPP VAFIN VAIMP VAINF VAPP VMFIN VMINF VMPP XY zu + infinitive negation particle separated verb particle answer particle particle with adjective or adverb truncated word - first part finite main verb imperative, main verb infinitive, main infinitive + zu, main past participle, main finite verb, aux imperative, aux infinitive, aux past participle, aux finite verb, modal infinitive, modal past participle, modal non-word containing special characters comma sentence-final punctuation other sentence internal punctuation zu [gehen] nicht [er kommt] an, [er fährt] rad ja, nein, danke, bitte am [schönsten], zu [schnell] An– [und Abreise] [du] gehst, [wir] kommen [an] komm [!] gehen, ankommen anzukommen, loszulassen gegangen, angekommen [du] bist, [wir] werden sei [ruhig !] werden, sein gewesen dürfen wollen [er hat] gekonnt D2XW3, letters $, $. $( , . ? ! ;: -[]() The following POS categories do not contain any morphological information and are assigned the morphological label ”- -”: ADJD, ADV, APZR, CARD, FM, ITJ, KOUI, KOUS, KON, KOKOM, PWAV, PROP, PTKZU, PTKNEG, PTKVZ, PTKANT, PTKA, TRUNC, VVIZU, VVPP, VAPP, VMPP, XY, $, , $. , $( . 3.4.3 What Is a Syntactic Unit? The newspaper articles of the taz have been defined as the primary segmentation domain of the data. They are preprocessed into syntactic units delimited by punctuation marks (. ? ! ; - ... /) for which specific rules demand or forbid segmentation. Each syntactic unit is assigned a specific code which identifies its origin in the newspaper data, e.g. T990507.123 (T (taz) 99 (year) 05 (month) 07 (day) 123 (article)). A syntactic unit usually consists of one complete sentence structure with a root node (SIMPX, R-SIMPX, P-SIMPX). But it may also consist of one or more sentences and/or phrases, e.g. headlines, titles, sentences with parentheses, sentences with discourse markers, or sentence conjunction by a colon. An annotated tree is a complete syntactically and semantically well-formed construction according to the longest match principle. The model of topological fields does not prescribe that all fields have to be occupied. The fact that fields can be left empty, also helps us to cope with elliptical constructions (cf. 6.5). 14 Table 3.5: Morphological feature combinations for lexical elements POS ADJA feature combination case number gender APPR case APPRART APPO ART NN case number gender case case number gender case number gender NE PDS PDAT PIS case case case case PIAT case number gender PIDAT case number gender PPER PWS case number gender person case number gender case number gender case number gender case number gender case number gender person case number gender PWAT case number gender PPOSS PPOSAT PRELS PRELAT PRF number number number number gender gender gender gender comments underspecified for gender if plural noun is underspecified, e.g. die/np* nordhessischen/np* Grünen/np* invariant local description e.g. Berliner/*** cardinal numbers as abbreviation: full morphology e.g. im 4./dsn Jahrhundert/dsn without case if a preposition takes another PP as complement, e.g. bis/ zu/d einer/dsf Woche/dsf and in the construction was für ein(er/e/...) underspecified for gender, e.g. Abgeordnete (in plural), Leute underspecified: man/ns* nichts/*** (cf. nix, sowas) PIS or PIAT: allerhand/*** (cf. allerlei, allzuviel, dergleichen, derlei, etwas, genausoviel, genug, genügend, keinerlei, mehr, reichlich, soviel, viel, wenig, weniger, zuviel, zuwenig) PIDAT or PIS: sowas/*** (cf. paar, bißchen) plural is underspecified for gender, e.g. lauter/***, see also ’PIS or PIAT’ below solch/*** (cf. manch, welch, all), see also ’PIS or PIDAT’ below plural is underspecified for gender sich: underspecified for gender underspecified for gender: plural forms and wer, wem, wen wessen/*** 15 POS VAFIN VAIMP VMFIN VVFIN VVIMP feature combination person number mood tense number person number mood tense person number mood tense number comments German has only second person imperative forms Table 3.6: Values of morphological features Feature case gender number mood person tense Values n (nominative), g (genitive), d (dative), a (accusative), * (underspecified) m (masculine), f (feminine), n (neuter), * (underspecified) s (singular), p (plural), * (underspecified) i (indicative), k (subjunctive; German ’Konjunktiv’) 1 (first), 2 (second), 3 (third), * (underspecified) s (present), t (past) Punctuation is not annotated, i.e., all punctuation marks are not attached to the tree structure. Exceptions are punctuation marks which carry a semantic meaning within a sentence, e.g. - (bis, und) in expressions like 15.30 - 17.30 Uhr. They are tagged according to the part of speech that they represent in the text (cf. 4.4.1). Constituents are not attached to a tree if they are not assigned a grammatical function within the specific syntactic construction. The following tree diagram shows two annotated trees in one syntactic unit:5 5 These tree diagrams and all following tree diagrams in this report were generated with the aid of the Negra Annotate tool. 16 Table 3.7: Node labels Node Labels ADJX ADVX DP FX NX PX VXFIN VXINF LV C FKOORD KOORD LK MF MFE NF PARORD VC VCE VF FKONJ DM P-SIMPX R-SIMPX SIMPX Description Phrase Node Labels adjectival phrase adverbial phrase determiner phrase (e.g. gar keine) foreign language phrase noun phrase prepositional phrase finite verb phrase non-finite verb phrase Topological Field Node Labels resumptive construction (Linksversetzung) complementizer field (C-Feld) coordination consisting of conjuncts of fields field for coordinating particles left sentence bracket (Linke (Satz-)Klammer) middle field (Mittelfeld) middle field between VCE and VC final field (Nachfeld) field for non-coordinating particles verb complex (Verbkomplex) verb complex with the split finite verb of Ersatzinfinitiv constructions initial field (Vorfeld) conjunct consisting of more than one field Root Node Labels discourse marker paratactic construction of simplex clauses relative clause simplex clause 17 Table 3.8: Edge labels Edge Labels Description Edge Labels denoting Heads and Conjuncts HD head non-head KONJ conjunct Complement Edge Labels ON nominative object (i.e. subject; also clausal subjects) OD dative object OA accusative object OG genitive object OS sentential object OPP prepositional object OADVP adverbial object OADJP adjectival object PRED predicate OV verbal object FOPP facultative (i.e. optional) prepositional object, passivized subject (von-phrase) VPT separable verb prefix APP apposition Modifier Edge Labels MOD ambiguous modifier ON-MOD, OA-MOD, OD-MOD, modifiers modifying complements or modifiers, OG-MOD, OS-MOD, OPP-MOD, e.g. V-MOD = modifier of the verb FOPP-MOD, PRED-MOD, OADJP-MO, OADVP-MO, V-MOD, MOD-MOD Edge Labels in Split Coordinations ONK, OAK, ODK, OGK, second conjunct (K) in OPPK, FOPPK, PREDK, split coordinations OSK, OADVPK, OA-MODK, e.g. ONK = second conjunct MODK, V-MODK of a nominative object Edge Label denoting Structural Expletive ES Vorfeld-es Secondary Edge Labels dependency relation between: refvc two verbal objects in VC refmod two ambiguous modifiers refint a phrase internal part and its modifier refcontr control verb and its complement across clause boundaries 18 Table 3.9: Syntactic-Semantic Node Labels for Named Entities Labels Description Syntactic-Semantic Node Labels ADJX=ORG adjectival phrase, named entity of the semantic class “organisation” ADJX=OTH adjectival phrase, named entity of the semantic class “other” ADVX=ORG adverbial phrase, named entity of the semantic class “organisation” ADVX=OTH adverbial phrase, named entity of the semantic class “other” DM=OTH discourse marker, named entity of the semantic class “other” FX=LOC foreign language phrase, named entity of the semantic class “location” FX=ORG foreign language phrase, named entity of the semantic class “organisation” FX=OTH foreign language phrase, named entity of the semantic class “other” FX=PER foreign language phrase, named entity of the semantic class “person” NX=GPE noun phrase, named entity of the semantic class “geopolitical entity” NX=LOC noun phrase, named entity of the semantic class “location” NX=ORG noun phrase, named entity of the semantic class “organisation” NX=OTH noun phrase, named entity of the semantic class “other” NX=PER noun phrase, named entity of the semantic class “person” PX=GPE prepositional phrase, named entity of the semantic class “geopolitical entity” PX=LOC prepositional phrase, named entity of the semantic class “location” PX=ORG prepositional phrase, named entity of the semantic class “organisation” PX=OTH prepositional phrase, named entity of the semantic class “other” PX=PER prepositional phrase, named entity of the semantic class “person” SIMPX=ORG simplex clause, named entity of the semantic class “organisation” SIMPX=OTH simplex clause, named entity of the semantic class “other” VXINF=ORG non-finite verb phrase, named entity of the semantic class “organisation” VXINF=OTH non-finite verb phrase, named entity of the semantic class “other” Edge Label -NE non-head, the part below is not part of the named entity 19 SIMPX 511 − − − − VF 510 V−MOD PX 506 − LK 507 HD HD NX 500 − An MF 508 VXFIN 501 HD HD MOD NX 502 HD ADVX 503 HD VXINF 504 HD verwundet NX 505 − HD der Oder er dann , ein Wadendurchschuß APPR ART NE VAFIN PPER ADV VVPP $, ART NN $. d dsf dsf 3sit nsm3 −− −− −− nsm nsm −− 0 1 2 wurde VC 509 OV ON 3 4 5 6 7 8 9 . 10 The leaves of the trees consist of pairs of non-terminal symbols and part-of-speech tags. Non-terminal symbols are represented by spherical nodes, whereas edge labels are depicted by rectangular nodes. The tree diagram consists of two trees, a SIMPX and an isolated phrase. In accordance with the four annotation levels shown in Table 3.3, the sentence is annotated top-down by the root node (SIMPX), the field nodes (VF, LK, MF, and VC), the phrase nodes (PX, VXFIN, NX, ADVX, and VXINF), and finally the tagged lexical entries. The edge labels between the field level and the phrase level indicate that the syntactic structure contains one unambiguous modifier (V-MOD), a subject (ON), one ambiguous modifier (MOD), a verbal object (OV), and the finite verb, which itself is the head (HD) of the entire syntactic construction. The noun phrase (ein Wadendurchschuß) is not attached to the sentence structure because otherwise the wellformedness of the construction would be violated. Thus, it has to be annotated as an isolated phrase lacking a verbal constituent. 3.4.4 Printing and Spelling Errors In contrast to spoken language data like in the Verbmobil (cf. (Stegmann et al. 2000)) which exhibit fragmentary utterances, false starts, repetitions, interruptions, and hesitation noises as its characteristic properties, data taken from newspaper corpora does not include unintentionally formed syntactic constructions. Deviations from syntactic wellformedness are either intended by the author or are caused by printing errors. While incorrect writing of words is neglected in the syntactic analysis (the respective lexical entry is marked with the correct writing of the word in a comment line below), lexical elements which do not belong to the syntactic construction (intentional or unintentional) are structured as much as possible, but are not attached to the surrounding constituents: 20 − SIMPX 511 − − − MF 510 ON VF 506 MOD LK 507 HD ADVX 500 HD VXFIN 501 HD Jetz wollen MOD OA NX 508 − − NX 502 HD ADVX 503 HD VC 509 OV HD ADJX 504 HD VXINF 505 HD Sie wieder ein solches ADV VMFIN PPER ADV ART PIDAT NN VVINF $. −− 3pis np*3 −− asn asn asn −− −− − − 0 1 2 3 4 5 System 6 aufbauen . 7 8 Jetzt SIMPX 517 SIMPX 518 − − − − − VF 515 V−MOD NF 516 FOPP PX 508 − NX 500 HD Am LK 509 HD VF 510 ON LK 511 HD MF 512 V−MOD VXFIN 501 HD NX 502 HD VXFIN 503 HD PX 504 HD HD 0 Abend 1 erklärten 2 , sie 3 4 seien 5 dabei VC 513 6 PX 514 OV HD VXINF 505 HD VXINF 506 HD geschlagen 7 worden − HD NX 507 − 8 − von 9 10 HD der Polizei 11 APPRART NN VVFIN $, PPER VAFIN PROP VVPP VAPP $( APPR ART NN dsm dsm 3pit −− np*3 3pks −− −− −− −− d dsf dsf 3.4.5 Isolated Phrases There are textual fragments in newspaper data which cannot be analysed as a SIMPX or as a constituent of a SIMPX because they are lacking a verbal constituent or they are not assigned a specific grammatical function within a well-formed sentence. These fragments are annotated as isolated phrases. The isolated elements are structured as much as possible (mostly up to the level of phrasal categories), but they are not typically connected to surrounding constituents as a whole, so that a conflict with the topological field analysis is avoided. Their root node carries a phrasal category of their lexical head (NX, PX, ADVX, etc.): ADVX 503 − − PX 500 HD Warum ADVX 501 HD 0 auch 1 HD ADVX 502 HD nicht 2 ? 3 PWAV ADV PTKNEG $. −− −− −− −− 21 12 13 PX 503 − HD PX 502 − HD ADVX 500 HD NX 501 HD Hoffentlich 0 ohne 1 Nebenwirkungen 2 . 3 ADV APPR NN $. −− a apf −− In accordance with the longest match principle, as many parts of the fragment as possible are projected to the phrase level and are included into a tree structure. It has to be decided which part of the whole construction is the head and which parts depend on this head. Phrases within a syntactic unit are not attached on a higher level if they do not show dependency relation. This is often the case with syntactic elements which are separated by a colon or a dash (cf. 5.3.2): SIMPX NX 508 − 509 − VF − HD − LK 505 NX 506 ON HD NX VXFIN 500 507 − VC 501 HD ASB 504 HD ein 1 : 2 HD ADJX 503 VPT lädt 0 NX 502 HD − HD Tag 3 der 4 offenen 5 Tür 6 7 NN VVFIN PTKVZ $. NN ART ADJA NN nsm 3sis −− −− nsm gsf gsf gsf NX 508 KONJ NX NX 500 − 501 − Arlington NX HD Road 0 502 , 3 R NX 504 HD 1999 2 NX 503 HD USA 1 NX − : 505 − Mark , 8 D 10 − : Jeff 11 507 − − Bridges 12 , 13 − Tim 14 Robbins 4 5 6 NE NE NE CARD $, NN $. NE NE $, NN $. NE NE $, NE NE nsn nsf npm −− −− nsf −− nsm nsm −− npm −− nsm nsm −− nsm nsm 22 9 NX 506 HD Pellington 7 KONJ NX 15 16 SIMPX 512 − − − VF 510 V−MOD MF 511 ON ADVX 507 NX 500 HD Berlin NX 501 HD HD − ADVX 502 HD ADVX 503 HD PX 509 − VXFIN 504 HD NX 505 HD ( taz ) − So NE $( $( ADV ADV VAFIN PIS APPRART NN $. nsn −− nsf −− −− −− −− 3sis ns* dsm dsm −− 3.4.6 2 3 4 5 6 man NX 506 HD $( 1 wird HD NE 0 also PRED LK 508 HD 7 zum 8 9 Problemfall 10 . 11 Long-Distance Dependencies Our annotation scheme facilitates a surface-oriented representation of long-distance dependencies without crossing branches and traces. If a modifying constituent is not adjacent to the modified constituent, their dependency relation, which can even go beyond the border of topological fields, is encoded by special naming conventions for edge labels. We use edge labels such as OA-MOD (referring to OA) or PRED-MOD (referring to PRED) etc. expressing the non-ambiguity of the modifier. Beyond this, we make use of secondary edge labels for ambiguity resolution. These labels just serve as additional information to the grammatical functions encoded in the edge labels. These secondary edge labels indicate underspecified long distance dependencies in the following cases: 1. If the above mentioned edge labels need further disambiguation, e.g. if there are two OAs or V-MODs below one SIMPX node (refmod). 2. If the dependency relation exists between two nodes of which at least one is phrase internal and therefore carries only head or non-head information (refint). 3. If there is a dependency relation outside of SIMPX in control verb constructions (refcontr). SIMPX 512 − − − VF 507 ON LK 508 HD V−MOD MF 509 MOD NX 500 HD VXFIN 501 HD ADVX 502 HD ADJX 503 HD Die ADJX refmod 504 HD − je . VVINF KOKOM ADV $. np* 3pis −− −− −− −− −− −− −− 5 denn HD ADJD 4 schlummern ADVX 506 ADJD 3 seliger VXINF 505 HD ADV 2 künftig V−MOD 506 VAFIN 1 dort − NF 511 MOD−MOD PDS 0 werden − VC 510 OV 23 6 7 8 − SIMPX 515 − − − MF 513 OA NF 514 MOD NX 511 SIMPX 512 HD VF 506 ON LK 507 HD NX 500 HD VXFIN 501 HD Dieser hat − − PX 508 refint − HD 512 NX 502 HD NX 503 − NX 504 HD VXINF 505 HD die Bereitschaft , Therapieangebote VAFIN NN APPR ART NN $, NN VVIZU $. nsm 3sis apf a asf asf −− apn −− −− 1 − auf VC 510 HD PDS 0 Auswirkungen HD − MF 509 OA 2 3 4 SIMPX 512 − − 5 6 7 anzunehmen 8 . 9 − NF 511 OS SIMPX 510 − VF 505 OA LK 506 HD refcontr NX 500 − MF 507 ON VXFIN 501 HD HD NX 503 − − das zu schicken VVFIN PIS ART NN PTKZU VVINF $. *** asn 3sks ns* dp* dp* −− −− −− 3.4.7 3 4 Angehörigen HD PDS 2 den VXINF 504 HD All 1 man VC 509 HD 500 PIDAT 0 versuche NX 502 HD − MF 508 OD 5 6 7 . 8 Empty Categories In general, an empty category analysis, e.g. for phrases without heads, is being avoided in the TüBa-D/Z treebank. Empty Edge Labels Specifiers, prepositions,6 complementizers, discourse markers, KOORD and PARORD constituents, conjunctions, and unambiguous modifiers (that are attached to phrases immediately rather than to topological fields ) are not labelled with grammatical functions. Furthermore, the edges below the SIMPX node are empty. They are not labelled in order to speed up annotation where the information is unnecessary or self-evident. Furthermore, empty edge labels are used in elliptical phrases, e.g. noun phrases only consisting of an article and an attributive adjective (cf. 6.5). 6 In order to facilitate the identification of dependencies between verbs and their nominal complements and adjuncts and in keeping with basic assumptions in Dependency Grammar, the annotated head of a prepositional phrase is the NX (or complement) rather than the preposition itself. Therefore, prepositions carry no edge label. 24 Chapter 4 The Annotation of the Internal Structure of Phrases 4.1 Premodification and Postmodification in Phrases The annotation of phrases is also carried out following the flat clustering principle in order to keep the number of hierarchy levels in a syntactic structure as small as possible. As will be shown in the following sections, phrases may include adjectival or nominal premodifiers and/or postmodifiers of any syntactic category. Both kinds of modifiers are in principle projected to their phrase levels. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they modify. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision, referred to in 3.3 as the high attachment principle, was made to avoid the problematic distinction whether a postmodifier is a free adjunct or a complement of the modified phrase. The attachment strategy for premodifiers and postmodifiers is applied for all categories of phrases. 4.2 Noun Phrases A simple noun phrase (NX) consists of a head noun (noun, proper noun, or a pronoun), (optionally) a determiner and (optionally) an adjectival or a nominal premodifier of any complexity preceding the head noun. A complex noun phrase is a simple noun phrase with a postmodifier of any syntactic category and complexity. 4.2.1 Noun Phrases without Modifiers Simple noun phrases without modifiers are single nouns, proper nouns, pronouns or proper nouns consisting of more than one NE. All of them are directly projected to their phrase level. While single nouns, proper nouns and pronouns carry the edge label HD, the NEtagged tokens of a complex proper noun are attached on the same level without head information: 25 NX 500 HD Spendengeld 0 NN asf NX 500 HD Hamburg 0 NE dsn NX 500 − − Ute Wedemeier 0 1 NE NE nsf nsf If proper nouns include other parts of speech than NEs, these parts are tagged according to their distribution. Therefore, proper nouns with a preposition include a prepositional phrase. NX 502 − − PX 501 − HD NX 500 HD Ole 0 von 1 Beust NE APPR NE nsm d dsm 4.2.2 2 Prenominal Modification In a simple noun phrase, both the determiner and the head noun are directly attached on the same level to NX so that the label of the head noun carries the edge label HD and the edge label of the determiner is empty. 26 NX 500 − HD die Auseinandersetzung 0 1 ART NN nsf nsf NX 500 − HD jede Spur 0 1 PIDAT NN nsf nsf Since prenominal modifiers are directly attached to the head noun on the same level, their edge labels are empty (whereas the edge labels of modifiers that are attached to topological fields are non-empty (cf. 8.4)). Prenominal modifiers are either attributive adjectives or preceding genitive phrases: NX 501 − − HD ADJX 500 HD ein externer 0 Wirtschaftsprüfer 1 2 ART ADJA NN nsm nsm nsm NX 501 − − HD ADJX 500 HD − die zu verhandelnden ART PTKZU ADJA NN npf −− npf npf 0 1 2 Taten 3 27 NX 501 − HD NX 500 HD Bremens Gesundheitssenatorin 0 1 NE NN gsn nsf If there is a PIDAT preceding the article it is directly attached to the noun phrase. NX 501 − − − HD ADJX 500 HD all die historischen PIDAT ART ADJA NN *** apm apm apm 0 1 2 Fehler 3 If a PIDAT is following the article in adjective position it is projected to its phrase level (ADJX) with possible premodifiers and then directly attached like an attributive adjective to the noun phrase. NX 501 − − HD ADJX 500 HD Die meisten 0 Benutzer 1 2 ART PIDAT NN npm npm npm 28 NX 504 − − HD ADJX 503 − − HD PX 502 − HD NX 500 ADVX 501 HD die in 0 HD Deutschland 1 ohnehin 2 wenigen 3 Gen−Food−Produzenten 4 5 ART APPR NE ADV PIDAT NN npm d dsn −− npm npm If there is more than one prenominal modifier, the one on the left hand side of the noun is modifying the following noun, the one on the left hand side of the modifier is modifying both, the modifier and the noun, and so on. All of these modifiers are attached to the head noun on the same level which yields a rather flat noun phrase structure. This strategy is justified by the fact that these modifiers have a scope of modification beyond the adjectival phrase, e.g. as in coordinated noun phrases like insgesamt 12.000 Studienplätze und 15.000 Lehrstellen, the adverb insgesamt modifies 12.000 Studienplätze as well as 15.000 Lehrstellen. NX 502 − − ADJX 500 HD ADJX 501 HD HD lieber knieartiger 0 Leser 1 2 ADJA ADJA NN nsm nsm nsm In case of complex head nouns, e.g. complex (proper) nouns consisting of two nominal parts or coordinated head nouns (cf. 6.4.5), first the complex noun respectively the coordination (cf. 6.4) is annotated with its own internal dependency structure. Afterwards, the determiner and possible premodifying adjectival phrases are attached on a higher level. 29 NX 502 − HD NX 501 − HD NX 500 − Die " 0 − debis 1 Systemhaus 2 GmbH 3 " 4 5 ART $( NE NE NN $( nsf −− nsf nsn nsf −− NX 502 − HD NX 501 − HD NX 500 HD der Heinrich 0 Böll−Stiftung 1 2 ART NE NN gsf gsm gsf NX 504 − HD NX 503 HD − PX 502 − HD NX 500 HD NX 501 HD " Solidarität $( NN APPR −− nsf d 0 1 mit 2 Miloevic " − Parolen NE $( $( NN dsm −− −− dpf 3 4 5 Milosevic 30 6 NX 503 − HD NX 502 KONJ ihren − KONJ NX 500 NX 501 HD HD Sänger 0 und 1 Gründer 2 3 PPOSAT NN KON NN asm asm −− asm 4.2.3 Postnominal Modification Whereas prenominal modifiers are always directly attached to the head noun on the same level, postnominal modifiers are attached to the head noun on a higher level. Postnominal modifiers are also always first projected to the phrase level before they are attached to the head noun on a higher level. Phrase internal postmodifiers can be of any phrasal category. The following tree structures show a prepositional phrase (PX) and a genitive phrase (NX) as postmodifiers. See section 6.3, page 92 for the analysis of relative clauses. NX 503 HD − PX 502 − HD NX 500 NX 501 HD HD Glück im 0 Netz 1 2 NN APPRART NN nsn dsn dsn NX 503 HD − NX 502 − − NX 500 − ADJX 501 HD Die HD HD Mitteilung 0 des 1 Bremer 2 Senats 3 4 ART NN ART ADJA NN asf asf gsm *** gsm In case a noun has more than one postmodifier, these modifiers usually show a hierarchical structure, for example, the first modifier modifies the head noun, the second 31 modifier modifies the complete preceding noun phrase structure, and so on. NX 506 HD − NX 505 HD − NX 503 − PX 504 − HD − ADJX 500 HD die guten 0 Beziehungen NX 501 NX 502 HD HD Bonns 1 HD 2 zu 3 Moskau 4 5 ART ADJA NN NE APPR NE apf apf apf gsn d dsn Attributes of degree and quantity nouns are also defined as postnominal modifiers: NX 502 HD − NX 500 NX 501 − HD eine HD Kiste 0 Sprengstoff 1 2 ART NN NN asf asf asm Cardinal numbers either appear as quantity nouns or premodifying adjectival attributes, e.g. the cardinal number 1,000,000 can also be expressed by the quantity noun eine Million. Therefore, we have to distinguish the following two ways of annotation: SIMPX 510 − − VF 509 ON NX 508 HD − PX 507 − HD NX 506 HD − NX 504 − NX 500 − Der ADJX 501 HD HD 0 Etat LK 505 HD HD 1 von 2 3,5 3 NX 502 HD Millionen 4 Mark 5 VXFIN 503 HD steht 6 . 7 ART NN APPR CARD NN NN VVFIN $. nsm nsm d −− dpf dpf 3sis −− 32 SIMPX 517 − − − VF 516 OS SIMPX 515 − − − VF 513 ON MF 514 MOD NX 508 HD − NX 500 − NX 501 HD HD NX 510 − VXFIN 502 HD NX 506 HD NX 507 HD Das 5 Mark " , empört NN NN VVFIN ADV CARD NN $( $, VVFIN PPER PRF $. −− nsn nsn nsn 3sit −− −− apf −− −− 3sis nsf3 as*3 −− 3 zuletzt VXFIN 505 HD OA ART 2 kostete ADJX 504 HD MF 512 ON " 1 Weißbrot ADVX 503 HD LK 511 HD HD $( 0 Kilo OA LK 509 HD 4 5 6 7 8 9 10 sie 11 sich 12 . 13 For nominal postmodifiers apart from genitive phrases the same attachment rule is applied. This kind of postmodifiers which may also appear in brackets, e.g. Heinz Schleußer (SPD), is semantically closely related to the preceding head noun phrase. die Arbeiterwohlfahrt Bremen, for instance, means die Arbeiterwohlfahrt which is located in Bremen, but does not mean die Arbeiterwohlfahrt which is called Bremen. Hence, these postmodifiers have to be distinguished from appositions (cf. 4.2.4) and complex named entities (cf. 4.2.6). NX 502 HD − NX NX 500 501 − − Heinz HD Schleußer 0 ( 1 SPD 2 ) 3 4 NE NE $( NE $( nsm nsm −− nsf −− NX 502 HD − NX NX 500 − 501 HD die HD Arbeiterwohlfahrt 0 Bremen 1 2 ART NN NE nsf nsf nsn 33 NX 502 HD − NX 500 HD NX 501 HD Zentralkrankenhaus 0 Ost NN NN dsn dsn 1 NX 502 HD − NX 500 HD Kapitel NX 501 HD 0 VII 1 NN CARD dsn −− NX 502 HD − NX 500 − des NX 501 HD HD 0 ICE 1 884 2 ART NN CARD gsm gsm −− 4.2.4 Appositional Constructions An apposition is a specific kind of attribute to a noun, which normally agrees in case with this noun and does not change its overall meaning. There is no consensus among grammarians of what is exactly meant by the notion apposition (cf. (Eisenberg 1999 2001)). Eisenberg (1999 2001), for instance, claims that, e.g. Ute Wedemeier die Landesvorsitzende and die Landesvorsitzende Ute Wedemeier are both appositions but it is not clear which part is the apposition and which part is the head noun. The Duden Grammar (1995) distinguishes between loosely constructed appositions (lockere Apposition) (e.g. Ute Wedemeier, die Landesvorsitzende,), which follow the head noun separated by a comma, and tightly constructed appositions (enge Apposition) (e.g. (die) Landesvorsitzende Ute Wedemeier), which precede the head noun (cf. (Drosdowski 1995)). According to Helbig/Buscha (1998) there is case agreement between loosely constructed appositions and head nouns which are separated by a punctuation mark. By contrast, Engel (1996) thinks that only loosely constructed appositions can be regarded as appositions. He treats tightly constructed appositions as nomen varians or nomen invarians. Because of these different definitions of the notion of apposition, we do not decide on what is the head noun and what is the apposition. We assume referential identity between the two parts. Loosely constructed appositions as well as tightly constructed appositions are treated as appositional constructions, i.e., the head noun and its apposition 34 form a complex structure which does not give any information about head assignment. Therefore, both parts are first projected to their phrase level and then coordinated on a higher level, each of them labelled as apposition (APP), i.e. as a part of an appositional structure. What is important is the referential identity in meaning. Thus, Nummer 1 is an appositional construction, whereas Seite 1 is a noun phrase with the postmodifier 1. Forms of address for persons and titles, e.g. Herr, Frau, Doktor (Dr.), Professor (Prof.), Bundeskanzler, are also treated as appositional constructions. Here are some examples: NX 505 APP APP NX 503 NX 504 HD − NX 500 − − ADVX 501 HD ADJX 502 HD Donnerstag HD HD morgen 0 , 1 den 2 13. 3 Mai 4 5 NN ADV $, ART ADJA NN asm −− −− asm asm asm NX 502 APP APP NX 500 NX 501 HD HD Herr Taake 0 1 NN NE nsm nsm NX 502 APP APP NX NX 500 501 HD − Landesvorsitzende − Ute 0 Wedemeier 1 2 NN NE NE nsf nsf nsf 35 NX 505 APP APP NX 504 HD − NX 503 − NX HD ADJX 500 − − Volker NX 501 502 HD Tegeler 0 , 1 − stellvertretender 2 Geschäftsführer 3 HD des 4 Landesverbandes 5 6 NE NE $, ADJA NN ART NN nsm nsm −− nsm nsm gsm gsm NX 502 APP APP NX 500 − NX 501 HD die HD Stadt 0 Frankfurt 1 2 ART NN NE nsf nsf nsn NX 504 APP APP NX 503 APP APP NX 500 NX 501 NX 502 HD HD HD Vorwurf Nummer 0 1 1 2 NN NN CARD nsm nsf −− NX 502 APP APP NX 500 NX 501 HD HD Telefon 472711 0 1 NN CARD dsn −− In case of a form of address combined with one or more titles preceding a name, we annotate an embedded appositional construction: 36 NX 505 APP APP NX NX 503 − 504 − HD ADJX APP APP NX NX 500 501 HD Die 502 HD Dortmunder Psychologin 0 − Prof. 1 2 − Alexa 3 Franke 4 5 ART ADJA NN NN NE NE nsf *** nsf nsf nsf nsf The same way, we treat proper nouns which are identical to a preceding proper noun, for example, an actor’s name and role: NX 502 APP APP NX NX 500 − 501 − Andrea − Spatzek / 0 1 − Gabi 2 Zenker 3 4 NE NE $( NE NE dsf dsf −− dsf dsf Premodification of the whole appositional construction is attached to an additional NX level. NX 504 − HD NX 503 ADVX 500 APP APP NX NX 501 HD 502 HD Auch − Bundesumweltminister 0 − Jürgen 1 Trittin 2 3 ADV NN NE NE −− nsm nsm nsm There are some examples in which the appositional construction does not agree in case. These are postnominal titles of books, films, etc. and translations interspersed in the sentence. In the latter type, we extend the appositional construction also to nonnominal phrases. 37 PX 503 − HD NX 502 APP APP NX NX 500 501 − in HD dem 0 − Film 1 " 2 HD Das 3 Verhör 4 " 5 6 APPR ART NN $( ART NN $( d dsm dsm −− nsn nsn −− SIMPX 510 − − − MF 509 OA OPP PX 508 APP NCX 505 − − C 500 − um APP PX 506 HD − VC 507 HD HD ADJX 501 HD NCX 502 HD − die wildernden ( " outlaw " ) zu stellen ADJA NN APPR NN $( $( FM $( $( PTKZU VVINF −− apm apm apm g gsn −− −− −− −− −− −− −− 4.2.5 2 3 4 Gesetzes HD ART 1 außer VXINF 504 KOUI 0 Hunde FX 503 HD 5 6 7 8 9 10 11 12 Foreign Language Material Words or parts of a text written in a foreign language except foreign language proper nouns are tagged as foreign language material (FM), e.g. hello (FM), no (FM) longer (FM) amused (FM). All parts of foreign language proper nouns are tagged as NE (e.g. Mary (NE) , New (NE) York (NE), University (NE) of (NE) Illinois (NE)). Single foreign words are projected to a syntactic level assigned the node label FX, which is an universal label for any syntactic category (phrasal and sentential) in the respective foreign language. More complex parts of a text tagged as FM are attached on the same level without any internal syntactic structure and head assignment. Their mother node is also assigned the label FM, e.g. no longer amused. For foreign language constructions containing a foreign language proper noun, the annotation strategy is the following: in a first step, all NEs are projected to the phrase level (NX), in a second step, these phrase node labels together with all FMs are projected to the next higher level with the node label FX. Again, there is no head assignment directly below the FX node, e.g. Mister Gere himself. 38 DM 500 − hello 0 FM −− NX 501 − − FX 500 − − das no 0 HD longer 1 − 2 amused 3 Kollegium ART FM FM FM NN nsn −− −− −− nsn 4 NX 500 − − University − of 0 Illinois 1 2 NE NE NE dsf dsf dsn NX 502 − HD FX 501 − − − NX 500 HD wie Mr. 0 Gere 1 himself 2 3 KOKOM FM NE FM −− −− nsm −− Often, foreign language material is a part of a German syntactic construction and plays the role of a grammatical function. Therefore, the FX node is attached as a constituent to the tree structure. If it is directly attached to a field or a sentence bracket, the edge label above the FX node denotes its grammatical function within the clause, e.g. Kafka goes Kleinkunst (head of the clause). 39 SIMPX 506 − − − VF 503 LK 504 ON HD NX 500 FX 501 NX 502 HD HD HD Kafka MF 505 V−MOD goes Kleinkunst 0 1 2 NE FM NN nsm −− asf If a single FM is head of a phrase which can be identified as a German phrase, e.g. by an article and/or an adjective (noun phrase), it is projected to the specific phrasal category, e.g. NX instead of FX in a construction like in der Creme de la Kunst, die nordamerikanischen Brothers. PX 503 − HD NX 502 − HD FX 501 − − − − NX 500 HD in der 0 Creme 1 de 2 la 3 Kunst 4 5 APPR ART FM FM FM NN d dsf −− −− −− dsf NX 501 − − HD ADJX 500 HD ihrer nordamerikanischen 0 Brothers 1 2 PPOSAT ADJA FM gp* gp* −− If FX is modified by a postmodifier the mother node of the complex phrase is also FX, which again may be preceded by another phrase, e.g. Unter der Überschrift ’user als looser’.. 40 PX 505 − HD NX 504 APP APP FX 503 HD NX 500 − Unter − FX 501 HD HD NX 502 − der Überschrift ’ user APPR ART NN $( FM KOKOM FM $( d dsf dsf −− −− −− −− −− 0 4.2.6 1 2 3 4 als HD 5 looser 6 ’ 7 Named Entity Annotation Proper nouns denote individual living beings, objects, titles, etc. which exist only once as entities with their own specific properties. The distinction between proper nouns and nouns is not always clear-cut. On the one hand, proper nouns can also become nouns, e.g. Opel as the company is a proper noun POS-tagged as Opel (NE), on the other hand, Opel as the car is a noun POS-tagged as Opel (NN). In addition to the categories of proper nouns listed in the STTS guidelines (e.g. first and last names of persons, names of companies, and geographical names), we also define names of streets and places (e.g. Feldstraße), individual names of institutions (e.g. MaxPlanck-Institut, Pergamonmuseum), events (e.g. Märzrevolution), and titles of books, movies, etc. as specific categories of proper nouns . In contrast to the categories defined in the STTS, our additional categories are not POS-tagged as ‘NE’, e.g. Alexanderplatz (NN), heute (ADV). Complex proper nouns forming a syntagma as well as titles, names of historical events, institutions, and so on, are POS-tagged according to their distribution (e.g. der (ART) Potsdamer (ADJA) Platz (NN), Auf (APPR) die (ART) stürmische (ADJA) Art (NN), Schlaflos (ADJD) in (APPR) Seattle (NE)). On the syntactic level, we define all kinds of proper nouns as named entities. In our enhanced version of the TüBa-DZ treebank named entity information is defined by semantic classes: each named entity is assigned one of the following five semantic classes (cf. 3.4.2): 1. person (PER) 2. organisation (ORG) 3. location (LOC) 4. geopolitical entity (GPE) 5. other (OTH) 41 Table 4.1 lists the commonly occurring semantic subclasses for named entities in the TüBa-D/Z with examples: Table 4.1: Semantic Classes and Subclasses for Named Entities Semantic Common Semantic Subclasses Classes PER persons surnames names of animals (personified) ORG organisations companies institutes museums newspapers, journals clubs theaters, cinemas universities TV and radio stations restaurants, hotels forces fashion labels sporting events bands LOC districts sights, churches planets geographical areas streets, places mountains, lakes continents GPE countries, states (incl. historical) cities (incl. historical) OTH operating systems titles of books, movies, etc. mottoes, slogans wars Examples Hans Winkler (Familie) Feuerstein (Schweinchen) Babe Nato, EU Microsoft, Bertelsmann, Institut für chinesische Medizin Pergamonmuseum Süddeutsche Zeitung, Der Spiegel VfB Stuttgart Metropol-Theater, CinemaxX Freie Universität Arte, Radio Bremen Sassella, Adlon Blauhelme Chanel Olympische Spiele, Wimbledon Beatles, Die Fantastischen Vier Schöneberg Brandenburger Tor, Johanniskirche Mars Königsheide Sögestraße, Alexanderplatz Alpen, Viktoriasee Europa, Asien Frankreich, Hessen, Assyrien Berlin, Babylon DOS Faust, Schlaflos in Seattle Zwischen Himmel und Erde Zweiter Weltkrieg In order to annotate the semantic classes, syntactic-semantic node labels of the pattern syntactic category = semantic class are defined for the mother node of named entities (see Table 3.9). The syntactic-semantic nodes indicate that the structure below represents a (complex) named entity of a certain syntactic category belonging to one of the five semantic classes (cf. 3.4.2). The former node label ’EN-ADD’ and the secondary edge label ’EN’ are deleted. 42 Our annotation strategy for named entities is shown in the following tree examples: Named entities may consist of one or more lexical elements tagged as NE. In case of a single NE, this NE is projected to its phrase level, carrying the respective syntacticsemantic node label (Gütersloh (NX=GPE)). Named entities consisting of two or more NEs are attached on the same level. None of them carries a head label in order to indicate that there is no obvious dependency relation between them (St. Pauli (NX=LOC)). There can occur postmodifiers within a named entity which are a named entity themselves like St. Pauli (NX=LOC) in [[FC (NX)] [St. Pauli (NX=LOC)]] (NX=ORG). Parts which do not belong to a named entity are marked with the edge label -NE as in [[den (-NE) FC (NX)] [Güthersloh (NX=GPE)]] (NX=ORG). Named entities which are not tagged as NE, e.g. Millerntor (NX=LOC) are also assigned a syntactic-semantic node label. SIMPX 517 − − − MF 515 V−MOD V−MOD VF PX 514 512 ON − NX=ORG − NX NX=LOC 500 VXFIN − St. NX 502 HD Pauli 1 503 −NE 1:0 3 gegen 4 − NX NX=GPE − HD NX=LOC 505 HD den 5 513 HD 504 HD siegt 2 PX 509 HD 501 − 0 NX=ORG 508 HD FC HD LK 511 HD V−MOD FC 6 506 HD HD Gütersloh 7 am 8 Millerntor 9 . 10 11 NN NE NE VVFIN CARD APPR ART NN NE APPRART NN $. nsm nsm nsm 3sis −− a asm asm asn dsn dsn −− As mentioned above, all elements of German named entities which consist of a complex syntactic structure, e.g. a phrase or a sentence, are always tagged according to their distribution and annotated with their internal syntactic structure as noun phrases, prepositional phrases, adjectival phrases, clauses, etc., e.g. the movie title (Schlaflos (ADJD) in (APPR) Seattle (NE)) (ADJX=OTH) in the following tree example. If two named entity nodes are coordinated like Tom Hanks (NX=PER) and Meg Ryan (NX=PER) their mother node is NX which represents the nominal status of the named entity. 43 SIMPX 515 − − − VF 514 V−MOD PX 513 − HD ADJX=OTH MF 511 512 HD − ON PX LK 507 508 − ADJX " KONJ PX=GPE VXFIN NX=PER NX=PER 503 504 501 502 HD Schlaflos in 3 510 HD HD Seit NX 509 HD 500 " PRED NX HD Seattle 4 " 5 − gelten 6 KONJ − Tom 7 − − Hanks 8 und 9 HD − NX NX 505 − Meg 10 − 506 HD Ryan 11 als 12 − Dream−Team 13 HD des 14 Biedersinns 15 . 0 1 2 $( APPR $( ADJD APPR NE $( VVFIN NE NE KON NE NE KOKOM NN ART NN 16 $. −− d −− −− d dsn −− 3pis nsm nsm −− nsf nsf −− nsn gsm gsm −− 17 If the original form of a named entity (e.g. Zweiter Weltkrieg) is inflected and/or premodified by an article and/or attributive adjective like in the following example tree (dem Zweiten Weltkrieg (NX=OTH)), the mother node of the named entity carries the semantic class information and all parts that do not belong to the named entity are assigned the edge label -NE. SIMPX 513 − − − − MF 512 MOD V−MOD PX 511 − DM VF 506 LK 507 − VXFIN 500 HD NX VXFIN HD Stimmt , 0 1 NX=OTH 508 ON 501 HD HD −NE OV VXINF 504 HD 505 HD erst 3 HD ADJX 503 wurde 2 510 − ADVX 502 HD sie VC 509 nach 4 dem 5 HD Zweiten 6 Weltkrieg 7 gebaut 8 . 9 10 VVFIN $, PPER VAFIN ADV APPR ART ADJA NN VVPP $. 3sis −− nsf3 3sit −− d dsm dsm dsm −− −− If a named entity of a syntactic category other than NX has a premodifier, or a postmodifier, (both can be a named entity itself) the mother node of the whole constituent is always NX which represents the nominal status of the named entity: 44 SIMPX 512 − − − VF MF 514 511 ON V−MOD NX PX 513 − 510 − HD − PX=OTH − ADJX 500 501 − − Oliver Bukowskis 0 derbes " 2 Bis 3 508 HD HD − NX=GPE VXFIN ADJX 503 HD 1 NX 507 502 HD HD LK 506 NX=PER OA " 5 NX 504 HD Denver 4 HD 505 HD feierte 6 im 7 HD Altonaer 8 Theater 9 Premiere 10 11 NE NE ADJA $( APPR NE $( VVFIN APPRART ADJA NN NN gsm gsm nsn −− a asn −− 3sit dsn *** dsn asf NX 508 HD − SIMPX=OT 507 − − LK MF 504 506 HD ON PRED VXFIN NX NX=PER 500 HD " PX 505 501 HD Sind − HD NX=PER 502 503 HD Sie − Luigi ? 3 " von − Stephan Brüggenthies 0 1 2 4 5 6 $( VAFIN PPER NE $. $( APPR NE 7 NE 8 −− 3pis np*3 nsm −− −− d dsm dsm If a named entity is a syntagma with its own internal syntactic structure, i.e. it does not agree with the inflection of another constituent of the sentence, all premodifiers are attached on a higher level: 45 SIMPX 511 − − − MF 510 ON VF NX 507 509 OPP − PX HD LK 504 NX=OTH 505 − HD HD NX VXFIN 500 NX=PER 501 HD Im 506 − Radio 1 503 HD läuft 0 ADJX 502 HD HD Vivaldis 2 HD " 3 Vier Jahreszeiten 4 5 " 6 . 7 8 APPRART NN VVFIN NE $( CARD NN $( $. dsn dsn 3sis gsm −− −− npf −− −− SIMPX 512 − − − MF 511 ON NX 510 HD − PX 509 − HD NX 508 − VF 505 507 HD NX − VXFIN 500 NX 501 HD Den PX=OTH 506 OA − HD LK Auftakt 0 − macht 1 NX=PER 502 HD NX 503 HD eine 2 HD − Aufführung 3 von 4 Ernst 5 504 − − Jandls 6 Aus 7 HD der 8 Fremde 9 . 10 11 ART NN VVFIN ART NN APPR NE NE APPR ART NN $. asm asm 3sis nsf nsf d gsm gsm d dsf dsf −− If a named entity has a postmodifier that is part of the named entity itself, all premodifiers are also attached on a higher level: 46 NX 507 HD − NX 506 − − HD NX NX=OTH 504 − 505 − HD ADJX ADJX 500 501 HD Das 167seitige Aktionsprogramm 1 der 2 NX NX 503 HD Bremer 3 − 502 HD 0 HD HD Agenda 4 21 5 6 ART ADJA NN ART ADJA NN CARD nsn nsn nsn gsf *** gsf −− Foreign Language Named Entities The syntactic annotation of foreign language named entities differs from the annotation of German named entities in the following aspects. According to the STTS guidelines, foreign language proper nouns are tagged as NE, while all other lexical elements of a foreign language are tagged as foreign language material (FM). A foreign language named entity which consists of a proper noun, e.g. the title of a movie is assigned a syntacticsemantic node label of the category NX (Forrest Gump (NX=OTH)). NX=OTH 500 − − Forrest Gump 0 NE NE nsm nsm 1 If a foreign language named entity consists of only FM tagged tokens, these tokens are directly attached on the same level without internal syntactic structure. The mother node of the phrase is marked as a syntactic-semantic node label of the category FX, e.g. Knockin’ on Heaven’s Door (FX=OTH). FX=OTH 500 − Knockin’ − 0 on − 1 Heaven’s − 2 Door FM FM FM FM −− −− −− −− 3 If a foreign language named entity consists of NE as well as FM tagged tokens, e.g. Shakespeare (NE) in (FM) Love (FM), NE is projected to NX=PER. The NX=PER node and all FM tagged tokens are attached directly on the same level. 47 FX=OTH 501 − − − NX=PER 500 HD Shakespeare in Love NE FM FM nsm −− −− 4.2.7 0 1 2 Ordinal Numbers According to their distribution, ordinal numbers occur either as a premodifying attributive adjective (e.g. die dritte (ADJA) Partie) or as a head noun (e.g. er ist der sechste (NN)). In the first case, the premodifier is projected to an adjectival phrase, in the latter case it is projected to a noun phrase. NX 501 − − HD ADJX 500 HD die dritte 0 Partie 1 2 ART ADJA NN nsf nsf nsf SIMPX 510 − − KOORD 500 − VF 507 LK 508 ON HD MOD MOD MOD VXFIN 502 ADVX 503 ADVX 504 ADVX 505 HD HD HD HD ja auch NX 501 − HD Aber − das 0 ist 1 2 MF 509 3 PRED NX 506 − schon 4 HD die 5 Vierte 6 . 7 8 KON PDS VAFIN ADV ADV ADV ART NN $. −− nsn 3sis −− −− −− nsf nsf −− 4.2.8 Cardinal Numbers According to their syntactic function (nominal or adjectival), cardinal numbers (CARD), are either projected to NX or ADJX. If their numerals are written separately or in groups, e.g. numbers of bank accounts, they are attached on the same level like proper names without internal head assignment. 48 NX 502 APP APP NX 500 NX 501 HD HD Jahr 2000 0 1 NN CARD dsn −− PX 502 − HD NX 501 − − HD ADJX 500 HD in allen 0 23 1 Bezirken 2 3 APPR PIDAT CARD NN d dpm −− dpm NX 502 APP APP NX 500 NX 501 HD − BLZ − 500 0 − 901 00 1 2 NN CARD CARD CARD 3 nsf −− −− −− A premodifying cardinal number is nominal if it does not express a quantity like in the example above, but a characteristic of the following noun, e.g. the number of a zip code: NX 501 − HD NX 500 HD 13187 Berlin 0 1 CARD NE −− nsn Complex time expressions or results of competitions are also treated as cardinal numbers: 49 NX 501 − HD ADJX 500 HD 20.15 Uhr 0 1 CARD NN −− nsf PX 501 − HD NX 500 HD mit 3:0 0 1 APPR CARD d −− 4.2.9 Letters and Non-Words Letters and non-words are tagged as XY. They are projected to their phrase level and assigned the syntactic category to which they belong in the construction. Signs which represent a lexical element, e.g. the sign for paragraph, are tagged with the respective part-of-speech tag: NX 501 − HD NX 500 HD D−76351 Linkenheim 0 1 XY NE −− dsn NX 506 HD − NX 505 HD − NX 504 HD − NX 500 NX 501 NX 502 NX 503 HD HD HD − § 220 a des HD Strafgesetzbuches 0 1 2 3 NN CARD XY ART NN 4 nsm −− −− gsn gsn 50 4.2.10 Expletive and Other Uses of es The pronominal form es functions as expletive element in German. Three different expletive usages are traditionally distinguished: formal subject or object, correlate of an extraposed clausal argument, and Vorfeld-es (cf. (Eisenberg 1999 2001), (Pütz 1986)). For sake of completeness, the following list begins with an example of es as a referential personal pronoun. Personal Pronoun The pronoun functions as an argument of the verb and refers to some person, object, or event that is salient in the context. It can be tested, whether es is used as a pronoun by replacing it by another noun or pronoun (such as das or er/ihn). In the example tree es refers to the neuter noun Gästehaus in the preceding sentence: Die italienische Regierung hat die Familie im staatlichen Gästehaus Casino dell’Algardi untergebracht. SIMPX 509 − − − − MF 508 FOPP VF 504 LK 505 ON HD NX 500 HD PX 506 − HD VXFIN 501 VXINF 503 HD wird 0 OV NX 502 HD Es VC 507 von HD Scharfschützen bewacht 3 . 1 2 4 PPER VAFIN APPR NN VVPP $. 5 nsn3 3sis d dpm −− −− Formal Subject or Object The formal subject obligatorily occurs with weather verbs, e.g. Es regnet and impersonal or agentless constructions such as Es gibt so eine Buchung or Es geht um populäre Unterhaltung. Some verbs optionally permit an expletive subject but also occur with referential subjects such as Max/Es klopft an der Tür. A formal object is found in constructions like jmd. legt es an auf etw. or jmd. verdirbt es mit jmdm. In all examples mentioned, es functions as a grammatical argument without semantic contribution, i.e. it does not refer to a person, object, or event. In TüBa-D/Z formal subjects and objects are treated like referential pronouns and are labelled alike, e.g. with edge labels ON or OA. Formal arguments are obligatory and may occur in the Mittelfeld. In case of doubt, it is a good test to paraphrase the sentence such that another element occupies the Vorfeld, e.g. Natürlich gibt es so eine Buchung versus *Natürlich gibt so eine Buchung. 51 SIMPX 507 − − − MF 506 OA VF 503 LK 504 ON HD − VXFIN 501 ADVX 502 HD HD NX 500 HD " Es 0 gibt 1 NX 505 − so HD eine Buchung 4 . 5 " 2 3 6 7 $( PPER VVFIN ADV ART NN $. $( −− nsn3 3sis −− asf asf −− −− Correlate of an Extraposed Clausal Argument If a clausal argument is extraposed in the Nachfeld, it is optionally doubled by an expletive es in the Vorfeld or Mittelfeld. The expletive is labelled ON-MOD or OS-MOD depending on the function of the clausal argument. − − SIMPX 521 − − − NF 520 ON SIMPX 519 − − NF 518 OS SIMPX 517 − VF 510 ON−MOD KOORD 500 − Aber NX 501 HD − LK 511 HD MF 512 PRED VC 513 HD VF 514 V−MOD LK 515 HD VXFIN 502 HD ADJX 503 HD VXINF 504 PX 505 HD VXFIN 506 HD HD − ON NX 507 − ADVX 508 HD HD NX 509 − es ist übertrieben zu sagen , damit die FU VAFIN ADJD PTKZU VVINF $, PROP VVFIN ART NE ADV ART NN $. −− nsn3 3sis −− −− −− −− −− 3skt nsf nsf −− asf asf −− 2 3 4 5 6 7 8 9 10 11 eine HD PPER 1 erst OA KON 0 bekäme − MF 516 MOD 12 Identität 13 Vorfeld-es The last type is a purely structural dummy element. It occurs in Vorfeld position only and is not correlated with any argument of the clause. It does not agree with the verb which becomes evident if there is a plural subject in the Mittelfeld, which is illustrated in the example tree below. It is ungrammatical in the Mittelfeld, e.g. *. . . dass es ihn die Völker zahlen. Vorfeld-es is labelled ES to indicate its purely structural function. In the first release of TüBa-D/Z, 12/2003, Vorfeld-es was integrated by means of ON-MOD. 52 . 14 SIMPX 516 − − − − NF 515 ON−MOD R−SIMPX 514 − VF 508 ES LK 509 MF 510 HD NX 500 HD VXFIN 501 HD es ON ON MOD NX 502 NX 503 NX 504 ADJX 505 zahlen − ihn 1 HD die 2 − Völker 3 , 4 HD deren − MF 512 OA HD 0 − C 511 VC 513 OV HD VXINF 506 VXFIN 507 HD HD HD Menschenrechte angeblich 7 verteidigt 8 werden 5 6 PPER VVFIN PPER ART NN $, PRELAT NN ADJD VVPP 9 VAFIN 10 **** 3pis asm3 npn npn −− gp* npn −− −− 3pis 11 Table 4.2 summarizes tests and labels for the different uses of es. Table 4.2: Types of es type test substitutable by other pronouns optional referential pronoun yes formal argument no correlate no Vorfeld-es no no no yes no correlates with clausal argument ungrammatical in Mittelfeld edge label no no yes no no no no yes ON, OA, etc. ON, OA ON-MOD, OS-MOD ES Es sei denn The lexicalized phrase es sei denn, meaning außer, is analysed as a copula construction. 53 12 SIMPX 523 KONJ KONJ SIMPX 522 − − − − NF 521 PRED SIMPX 519 − − VF 510 − LK 511 ES NX 500 HD " SIMPX 520 Es HD V−MOD VXFIN 501 ADVX 502 HD HD geschieht 0 1 − MF 512 hier ON ON NX 503 NX 504 HD HD nichts 3 , 4 es 5 LK 514 − VF 516 HD MOD ON VXFIN 505 ADVX 506 NX 507 HD HD sei 6 MF 515 HD denn 7 , 8 − LK 517 MF 518 HD OA VXFIN 508 HD ich NX 509 HD tu es 11 . 12 13 " 9 10 $( PPER VVFIN ADV PIS $, PPER VAFIN ADV $, PPER VVFIN PPER $. $( −− **** 3sis −− *** −− nsn3 3sks −− −− ns*1 1sis asn3 −− −− 4.3 2 VF 513 Determiner Phrases Certain pronouns serving as determiners in noun phrases may be premodified, for instance, by degree adverbs such as in so viele Ältere, gar kein Schutz, etc. In the case of so viele Ältere, the premodifying adverb so is attached to the indefinite pronoun viele. Together, they form a determiner phrase (DP), which is attached to the head noun Ältere on the same level: NX 502 − HD DP 501 − HD ADVX 500 HD so viele Ältere 0 1 ADV PIDAT NN 2 −− ap* ap* NX 502 − HD DP 501 − HD ADVX 500 HD gar kein 0 Schutz 1 2 ADV PIAT NN −− nsm nsm 54 14 4.4 Prepositional Phrases 4.4.1 Prepositions Considering prepositional phrases, it turns out to be appropriate not to annotate the preposition as the head of the phrase. It is rather reasonable to annotate the complement within the prepositional phrase as the head. This decision facilitates the identification of dependencies between verbs and their nominal complements and adjuncts. Moreover, it is in accordance with basic assumptions in Dependency Grammar . PX 501 − HD NX 500 HD in Südpolen APPR NN d dsn 0 1 If the preposition is realized as a non-alphabetic sign, e.g. - (bis, gegen), this sign is tagged as APPR and annotated like a preposition: NX 504 HD − PX 503 − HD NX 502 − NX ADJX 500 − 501 − HSV HD HD BU 0 − 1 Bramfelder 2 SV 3 4 NE NE APPR ADJA NN nsm nsn a *** asm Since pronominal adverbs (PROP) are pronominal forms of a prepositional phrase, they are directly projected to PX: 55 SIMPX 510 − − − − VF 506 ON LK 507 HD V−MOD MF 508 OA NX 500 HD VXFIN 501 HD ADVX 502 HD NX 503 HD Freuden−thal 0 wollte 1 gestern 2 nichts VC 509 OV FOPP 3 PX 504 HD VXINF 505 HD dazu sagen 4 NE VMFIN ADV PIS PROP VVINF nsf 3sit −− *** −− −− 5 Freudenthal In German, there are so-called Verschmelzungsformen, i.e. merged forms of a preposition and a determiner, e.g. in dem Januar amalgamates to im Januar. The merged form is assigned the part-of-speech tag APPRART (including richer morphological annotation). In terms of syntax, it is annotated like a preposition: PX 501 − HD NX 500 HD Im Januar 0 1 APPRART NN dsm dsm Prepositional phrases expressing intervals, e.g. with von/bis, von/bis zu or zwischen, are annotated in the same way as coordinate structures (cf. 6.4.1), i.e. without head assignment on the level of coordination, since the two phrases are assumed to be conjuncts. If two prepositions follow each other (e.g. bis zum), the result is an embedded structure of a prepositional phrase taking another preposition. The first preposition does hereby not receive a morphological case feature. PX 506 KONJ KONJ PX 504 − PX 505 HD − HD NX 502 NX 503 − − ADJX 500 ADJX 501 HD vom HD 23. 0 HD bis 1 25. 2 Juli 3 4 APPRART ADJA APPR ADJA NN dsm dsm a asm asm 56 PX 505 − HD NX 504 KONJ − KONJ NX 502 NX 503 − − ADJX 500 ADJX 501 HD zwischen HD 15.000 0 und 1 22.000 2 3 APPR CARD KON CARD d −− −− −− PX 504 − HD PX 503 − HD NX 502 APP bis zum 0 APP NX 500 NX 501 HD HD Jahr 1 2000 2 3 APPR APPRART NN CARD −− dsn dsn −− As opposed to the case with two prepositions, intervals like dritter bis fünfter November are annotated as a coordinate attributive adjective phrase within a simple noun phrase (cf. 6.4.1). Premodification of non-isolated prepositional phrases follows the general principle of low attachment. PX 504 − − HD NX 503 HD ADVX 500 HD Irgendwo − NX 501 − NX 502 HD HD in den ADV APPR ART NN NE −− d dpm dpm gsn 0 1 2 Wäldern 3 Schaumburgs 4 There is one exception to the low attachment principle: isolated phrases in which a preceding adverb does not semantically modify the prepositional phrase. In this case the adverbial phrase is high attached to an additional level of PX. 57 PX 503 − HD PX 502 − HD ADVX 500 HD Nun NX 501 HD zum 0 Wetter 1 ADV APPRART NN −− dsn dsn 4.4.2 2 Circumpositions and Postpositions Circumpositions are treated as ternary branching prepositional phrases. The circumposition on the left hand side is tagged as APPR and the circumposition on the right hand side as APZR: PX 501 − HD − NX 500 HD von sich 0 aus 1 2 APPR PRF APZR d ds*3 −− Postpositions are tagged as APPO. The complement of the postposition occurs on the left side and constitutes the head of the prepositional phrase: PX 501 HD − NX 500 − HD Dem Vernehmen 0 nach 1 2 ART NN APPO dsn dsn d 4.5 Adjectival Phrases We distinguish between attributive adjectives on the one hand and adverbial or predicative adjectives respectively on the other hand. Attributive adjectives are tagged as 58 ADJA (die traditionellen Elemente) or CARD (20.15 Uhr), whereas adverbial or predicative adjectives are tagged as ADJD (das Gewicht ist gut; den betriebswirtschaftlich günstigeren Standort) or PWAV (wie wirke ich). The annotation of superlative and comparative forms is explained in section 7.1 on page 112. In general, German adjectives are inflected when they are an attribute of a noun. They are not inflected either when they function as a predicative adjective or a premodifier of an adjective or an adverb or when they belong to a small class of noninflected adjectives, e.g. some ancient form such as gut Wetter or lieb Mütterlein or some adjectives denoting a colour (mit einer rosa Karte). All adjectives have to be projected to their phrase level before they are attached to another phrase or to a field. NX 501 − − HD ADJX 500 HD Die traditionellen Elemente 0 1 2 ART ADJA NN npn npn npn PX 502 − HD NX 501 − − HD ADJX 500 HD mit einer 0 rosa 1 Karte 2 3 APPR ART ADJA NN d dsf dsf dsf SIMPX 506 − − VF 503 LK 504 ON HD NX 500 − HD Gewicht 0 ADJX 502 HD ist 1 MF 505 PRED VXFIN 501 HD Das − gut 2 3 ART NN VAFIN ADJD nsn nsn 3sis −− 59 NX 502 − − HD ADJX 501 − HD ADJX 500 HD den betriebswirtschaftlich 0 1 günstigeren 2 Standort ART ADJD ADJA NN asm −− asm asm SIMPX 509 − − − 3 − VF 508 ON NX 504 − − HD ADJX 500 HD Der männliche 0 1 Trinker 2 LK 505 HD MF 506 V−MOD VC 507 OV VXFIN 501 HD ADJX 502 HD VXINF 503 HD sei 3 gut 4 erforscht 5 , 6 ART ADJA NN VAFIN ADJD VVPP $, nsm nsm nsm 3sks −− −− −− A nominalized adjective like Fassbares might be premodified by an adverbial adjective (ADJD) instead of an attributive adjective (ADJA). The former ones do never inflect. NX 501 − HD ADJX 500 HD physisch Faßbares 0 1 ADJD NN −− asn Whenever an adjective is modified by another modifier, the same annotation strategy as for noun phrases is applied, i.e., the modifier is directly attached to the adjectival phrase. The adjectival phrase as a whole is the premodifier of the noun phrase. For instance: 60 NX 502 − − HD ADJX 501 − HD ADVX 500 HD eine sehr 0 gute 1 Quelle 2 3 ART ADV ADJA NN nsf −− nsf nsf SIMPX 508 − − − − MF 507 PRED KOORD 500 VF 504 LK 505 ON HD − VXFIN 502 ADVX 503 HD HD NX 501 − − aber HD der 0 Text 1 ADJX 506 ist 2 HD sehr 3 abstrakt 4 5 KON ART NN VAFIN ADV ADJD −− nsm nsm 3sis −− −− The same holds if an adjective selects an argument. Für die Weltgesellschaft is the facultative argument of wesentlich. It is directly attached to the adjectival phrase. NX 503 − − HD ADJX 502 − HD PX 501 − HD NX 500 − Die für 0 HD die 1 Weltgesellschaft 2 wesentliche 3 Unterscheidung 4 5 ART APPR ART NN ADJA NN nsf a asf asf nsf nsf Premodifying adjectives may occur in a linear order and/or as a coordination (cf. 6.4.1) of attributive adjectives: 61 NX 503 − HD ADJX 502 KONJ − KONJ ADJX 500 ADJX 501 HD HD 28. und 0 29. 1 Mai 2 3 ADJA KON ADJA NN nsm −− nsm nsm NX 504 − − − HD ADJX 503 KONJ KONJ ADJX 500 ADJX 501 HD Die ADJX 502 HD großen 0 , 1 HD bekannten serbischen 2 3 Oppositionsparteien 4 5 ART ADJA $, ADJA ADJA NN npf npf −− npf npf npf NX 504 − − − HD ADJX 503 KONJ ADJX 500 ADJX 502 HD eigene 0 KONJ ADJX 501 HD ihre − HD demokratische 1 und 2 freiheitliche 3 Tradition 4 5 PPOSAT ADJA ADJA KON ADJA NN asf asf asf −− asf asf If the premodifying adjective is deverbal, the adjectival phrase can be of any complexity. In this case, the adjectival phrase has its own internal dependency structure. All elements which depend on the adjective are annotated as its premodifiers. Deverbal adjectives are either attributive or adverbial and predicative respectively, and occur as the present participle or past participle form of a verb. 62 NX 502 − − HD ADJX 501 − HD ADJX 500 HD das aktuell 0 diskutierte Thema 1 2 3 ART ADJD ADJA NN asn −− asn asn NX 504 − − HD ADJX 503 − − HD PX 502 − HD ADVX 500 HD Die 0 teilweise NX 501 − 1 HD in die Erde 2 3 4 gebaute 5 Sporthalle ART ADV APPR ART NN ADJA NN nsf −− a asf asf nsf nsf 6 In the following example, postmodification of an adjectival phrase is shown: ADJX 502 HD − ADJX 500 ADJX 501 HD − besser HD als 0 gut 1 2 ADJD KOKOM ADJD −− −− −− 4.6 Adverbial Phrases Besides adverbials also negation particles (PTKNEG) project to an adverbial phrase. They either occur as premodifiers1 or postmodifiers or they are directly attached to a field. 1 bis zu, über are considered to be ADV rather than APPR because of their semantic meaning. 63 SIMPX 515 − − − − − MF 514 OD VF LK 509 KOORD ADJX APP ADVX VXFIN NX NX 502 HD 503 HD heute 0 will − Ina 1 2 − NX 504 − − Terre 3 ( 4 VC 512 APP − V−MOD 511 HD 501 MOD NX 510 V−MOD 500 Doch ON − Hannelore 5 ADVX 505 HD Droege 6 ) 7 8 VXINF 507 508 HD nicht 9 OV ADVX 506 HD es 513 HD HD so 10 recht 11 munden 12 13 KON ADV VMFIN NE NE $( NE NE $( PPER PTKNEG ADV ADJD VVINF −− −− 3sis dsf dsf −− dsf dsf −− nsn3 −− −− −− −− NX 503 − − ADVX 500 ADVX 501 HD HD bis − HD ADJX 502 HD zu 300.000 Leute 0 1 2 ADV ADV CARD NN 3 −− −− −− np* NX 502 − − ADVX 500 HD ADJX 501 HD HD über 350.000 0 Auskünfte 1 2 ADV CARD NN −− −− apf ADVX 502 HD − ADVX 500 ADVX 501 HD HD heute abend 0 1 ADV ADV −− −− 64 SIMPX 510 − − − − MF 509 V−MOD VF 505 LK 506 ON HD HD − OV VXFIN 501 ADVX 502 ADVX 503 VXINF 504 HD HD HD HD NX 500 − HD Der Fahrer 0 ADVX 507 konnte 1 nicht 2 VC 508 mehr 3 bremsen 4 . 5 6 ART NN VMFIN PTKNEG ADV VVINF $. nsm nsm 3sit −− −− −− −− 4.7 Verb Phrases Whereas finite verb phrases are labelled VXFIN, non-finite verb phrases are labelled VXINF. Since infinitives and past participles share certain properties (e.g. exchangeability in Man hat nur noch das eigene Herz schlagen hören/gehört.), they are assumed to carry the same phrase label (VXINF). The finite verb in LK as well as the non-finite verbs in VC are always projected to their phrase level. All verb phrases of the verb complex are attached on the same level to form the verb complex. In order to follow the flat clustering principle, no internal hierarchy of the verb complex is annotated. 4.7.1 Head of a Sentence and Verb Complex The finite verb which can either appear in LK (verb-first clauses and verb-second clauses) or in VC (verb-final clauses), is always the head of the entire sentence. Non-finite verbal elements belong to VC. If the finite verb is located in LK and if there is more than one non-finite element in VC, the non-finite element which is selected by the finite verb is denoted as the head of VC. All other elements of VC are verbal objects. The head of VC selects the verbal object OV. This verbal object may select another verbal object OV, and so on. In order to denote the dependency relations between verbal objects within the verb complex, we attach a secondary edge label refvc between their phrase nodes. 4.7.2 Verb Complexes in Verb-second and Verb-final Clauses The following example shows a verb-second clause with the head of the sentence in LK and a verb complex consisting of a single non-finite element. 65 SIMPX 513 − − − − MF 512 OA FOPP VF PX 510 511 ON − NX LK 506 − NX 507 − HD NX 501 überehrgeizige Bürgermeister 1 HD die 3 − Bergidylle 4 in 5 ein VXINF 504 HD 6 OV NX 503 − will 2 − NX 502 HD 0 509 HD VXFIN 500 HD VC 508 HD ADJX Der HD − Mekka 7 des 8 505 HD HD Massentourismus 9 verwandeln 10 11 ART ADJA NN VMFIN ART NN APPR ART NE ART NN VVINF nsm nsm nsm 3sis asf asf a asn asn gsm gsm −− If the verb complex comprises more than one immediate daughter, the one that is selected by the finite verb is the head of VC. SIMPX 509 − − − VF 505 LK 506 ON HD NX 500 NX 502 HD Es VC 508 PRED VXFIN 501 HD − MF 507 − müsse 0 HD ein 1 Buchungsfehler 2 OV HD VXINF 503 VXINF 504 HD HD gewesen 3 sein 4 . 5 6 PPER VMFIN ART NN VAPP VAINF $. nsn3 3sks nsm nsm −− −− −− The following trees demonstrate verb complexes with two or more verbal objects. The secondary edge label refvc is pointing from the selecting OV to the depending OV. SIMPX 508 − − − MF 506 MOD VC 507 ON OV refvc C 500 ADVX 501 − HD Wenn HD da 0 NX 502 was 1 OV 503 HD VXINF 503 VXINF 504 VXFIN 505 HD HD HD gebucht 2 worden 3 ist 4 5 KOUS ADV PIS VVPP VAPP VAFIN −− −− *** −− −− 3sis 66 SIMPX 513 − − − MF 512 MOD ON V−MOD PX 511 − HD NX 509 HD VC 510 − OV refvc C 500 ADVX 501 − NX 502 HD daß − auch 0 HD sein 1 Vater 2 auf 3 NX 503 NX 504 HD HD Anregung 4 Andreottis 5 OV 505 refvc OV 506 HD VXINF 505 VXINF 506 VXINF 507 VXFIN 508 HD HD HD HD umgebracht 6 worden 7 sein 8 könnte 9 10 KOUS ADV PPOSAT NN APPR NN NE VVPP VAPP VAINF VMFIN −− −− nsm nsm a asf gsm −− −− −− 3skt If there is no finite verb at all, the rightmost element of the verb complex (if there is more than one element) is annotated as the head of the sentence. This often occurs in headlines (cf. 5.2 and 7.4). SIMPX 504 − − MF 502 VC 503 OA HD NX 500 VXINF 501 HD HD Prachtwicken gucken 0 . 1 2 NN VVINF $. apf −− −− 4.7.3 Ersatzinfinitiv Constructions In order to indicate Ersatzinfinitiv constructions, two specific field node labels are introduced. VCE is the node label for the part of the verb complex consisting of the finite verb which subcategorizes for the Ersatzinfinitiv. MFE is the node label for the second part of MF between VCE and the second part of the verb complex VC (e.g. [C die] [MF uns] [VCE hätten] [MFE mißtrauisch] [VC machen müssen]). 67 SIMPX 511 − − − − MF 510 ON OPP NX 507 KONJ VCE 508 − KONJ C 500 NX 501 NX 502 PX 503 − HD HD HD daß Fischer 0 und 1 ich 2 dazu 3 VC 509 HD OV HD VXFIN 504 VXINF 505 VXINF 506 HD HD HD haben 4 beitragen können 5 6 7 KOUS NE KON PPER PROP VAFIN VVINF VMINF −− nsm −− ns*1 −− 1pis −− −− R−SIMPX 511 − − − − − C 506 ON MF 507 OA VCE 508 HD MFE 509 PRED OV HD NX 500 HD NX 501 HD VXFIN 502 HD ADJX 503 HD VXINF 504 HD VXINF 505 HD die uns PRELS PPER VAFIN ADJD VVINF VMINF np* ap*1 3pkt −− −− −− 0 hätten VC 510 1 mißtrauisch 2 3 machen 4 müssen 5 In the example below, the finite verb precedes the non-finite verbs although müssen is no Ersatzinfinitiv. Since its position corresponds to the position of the finite verb in real Ersatzinfinitiv constructions and here also a second middle field is possible, we follow the same annotation strategy. SIMPX 514 − − − − MF 513 ON OD MOD MOD OA NX 512 − − HD ADJX 509 − C 500 − NX 501 − daß NX 502 HD die HD Nato sich 2 VCE 510 HD OV HD ADVX 503 ADVX 504 ADVX 505 VXFIN 506 VXINF 507 VXINF 508 HD HD HD HD HD HD doch 3 noch 4 ein 5 HD VC 511 ganz 6 neues 8 wird 9 überlegen 10 müssen 1 KOUS ART NE PRF ADV ADV ART ADV ADJA NN VAFIN VVINF VMINF −− nsf nsf ds*3 −− −− asn −− asn asn 3sis −− −− 68 7 Konzept 0 11 12 4.7.4 Infinitives with zu Regarding infinitives with zu, zu determines the non-finiteness of the verb on its right hand side. This is the reason why zu is considered the head of the VXINF whereas the infinitive is assumed to be the complement. Like other infinitives, they occur in the verb complex: SIMPX 512 − − − − VF 511 ON NX 510 HD − NX 506 − HD NX LK 501 KONJ NX − NX 500 513 HD der 0 Friedens− und 2 509 HD OD PRED OV NX VXFIN NX ADJX VXINF 502 HD 1 VC 508 KONJ 514 − Erkenntnisse MF 507 503 HD Konfliktforschung 3 HD scheinen 4 505 HD ihm 5 504 HD fremd 6 − zu 7 sein 8 . 9 10 NN ART TRUNC KON NN VVFIN PPER ADJD PTKZU VAINF $. npf gsf −− −− gsf 3pis dsm3 −− −− −− −− SIMPX 510 − − − − VF 509 OPP PX 505 − LK 506 HD NX 500 HD Über Details 0 MF 507 VC 508 HD MOD OV HD VXFIN 501 ADVX 502 VXINF 503 VXINF 504 HD HD werde 1 HD noch 2 − zu 3 HD verhandeln 4 sein 5 . 6 7 APPR NN VAFIN ADV PTKZU VVINF VAINF $. a apn 3sks −− −− −− −− −− The infinitive with zu can also be realized as an infix of the verb. In this case, the verb is tagged as VVIZU. Moreover, it is projected to VXINF with the grammatical function HD: 69 SIMPX 509 − − − MF 508 MOD MOD OA PX 507 − HD NX 505 VC 506 − C 500 HD HD ADJX 501 − ADVX 502 HD um neben 0 NX 503 HD neuen Baugesetzen 1 − auch 2 3 VXINF 504 HD mehr 4 HD Mitspracherechte einzufordern 5 6 . 7 8 KOUI APPR ADJA NN ADV PIAT NN VVIZU $. −− d dpn dpn −− *** apn −− −− Beside the examples above, the infinitive with zu occurs in optional (in most cases with um zu) and obligatory infinitive clauses. SIMPX 515 − − − − NF 514 OS MF 512 ON OS−MOD SIMPX 513 OD OPP − PX 508 − HD C 500 NX 501 NX 502 NX 503 NX 504 − HD HD HD HD Wenn Angehörige 0 es 1 sich 2 zur 3 MF 510 HD OA VXFIN 505 Lebensaufgabe HD VXINF 507 − machen 5 VC 511 NX 506 HD 4 − VC 509 , 6 HD den 7 HD Kranken 8 − zu 9 kontrollieren 10 11 KOUS NN PPER PRF APPRART NN VVFIN $, ART NN PTKZU VVINF −− np* asn3 dp*3 dsf dsf 3pis −− asm asm −− −− SIMPX 513 − − − − NF 512 OS SIMPX 511 − MF 507 VC 508 OD HD C 500 NX 501 − HD um Freunden 0 C 503 − zu 1 − MF 509 VXINF 502 HD − − sagen 2 , 3 − daß 5 ON OA NX 504 NX 505 HD ihr 4 VC 510 VXFIN 506 HD Zug 6 HD HD Verspätung 7 hat 8 9 KOUI NN PTKZU VVINF $, KOUS PPOSAT NN NN VAFIN −− dpm −− −− −− −− nsm nsm asf 3sis 70 Infinitive clauses can consist of only one verb complex: SIMPX 518 − − FKOORD 517 KONJ − KONJ FKONJ 516 − − − FKONJ 514 NF 515 − − OS MF 512 SIMPX 513 OA VF 507 LK 508 ON HD NX 500 HD − PX 509 − VXFIN 501 HD NX 502 HD Er V−MOD − wendet 0 den 1 NX 503 HD − Blick 2 von 3 VC 511 HD VXFIN 504 HD der 4 LK 510 HD Wand 5 VC 505 HD und 6 VPT fängt 7 an 8 VXINF 506 HD − zu 9 erzählen 10 . 11 12 PPER VVFIN ART NN APPR ART NN KON VVFIN PTKVZ PTKZU VVINF $. nsm3 3sis asm asm d dsf dsf −− 3sis −− −− −− −− 4.7.5 Coherency and Incoherency of Verbal Constructions The notion of coherency attributed to Bech (1955 57) covers the relation of dependency between adjacent verbal elements, i.e. the relation of subcategorization between a verb and a non-finite verbal complement. Kiss (1995) calls this relation infinitive Komplementation (non-finite complementation). Bech (1955 57) distinguishes between three different modi of obligatory and optional coherency: 1. verbs constructing coherently and incoherently, e.g. versprechen, versuchen coherent, extraposition possible: a. [wie er mit kritischen politischen Gegenpositionen umzugehen versteht] incoherent, extraposition: b. [wie er versteht,][mit kritischen politischen Gegenpositionen umzugehen] 2. verbs constructing only coherently, e.g. wollen, möchten coherent, no extrapostion possible: a. [wie er mit kritischen politischen Gegenpositionen umgehen will] b.*[wie er will mit kritischen politischen Gegenpositionen umgehen] 3. verbs constructing only incoherently, e.g. überreden, überzeugen incoherent, extraposition obligatory: a. [wie er sie überredet,][mit kritischen politischen Gegenpositionen umzugehen] b.*[wie er sie [mit kritischen politischen Gegenpositionen umzugehen] überredet] 71 Coherent and incoherent constructions of verbs are annotated differently. In case of coherency, the verbal complement is part of the verb complex. In the clause wie er mit kritischen politischen Gegenpositionen umzugehen versteht, for instance, the infinitive with zu is the verbal object of the finite verb. While in case of incoherency, the verbal complement is annotated as a sentential complement, i.e., mit kritischen politischen Gegenpositionen umzugehen in the clause wie er sie überredet, mit kritischen politischen Gegenpositionen umzugehen is a sentential object in NF. We define that a construction is incoherent, if extraposition in NF is possible. That is, whenever it is possible to shift the infinitival complement together with a constituent of MF, which it subcategorizes for, into NF, these elements are annotated as sentential objects. Therefore, the coherent example above (wie er mit kritischen politischen Gegenpositionen umzugehen versteht) is annotated with a sentential object in MF since extraposition is possible (cf. the incoherent example 1.b.). SIMPX 513 − − − MF 512 ON OS SIMPX 511 − − MF 510 OPP PX 509 − HD NX 506 − C 500 NX 501 − HD wie ADJX 502 mit 1 HD kritischen 2 HD ADJX 503 HD er 0 − politischen 3 Gegenpositionen 4 VC 507 VC 508 HD HD VXINF 504 VXFIN 505 HD HD umzugehen 5 versteht 6 7 KOUS PPER APPR ADJA ADJA NN VVIZU VVFIN −− nsm3 d dpf dpf dpf −− 3sis If a complement of the verb within the sentential object is located out of the sentence boundaries, e.g. in the C-field, the secondary edge label refcontr gives additional information about the dependency relation (cf. 3.4.6). 4.7.6 AcI Constructions AcI (accusativus cum infinitivo) verbs are a small group of verba sentiendi (e.g. sehen, hören, fühlen, spüren) which subcategorize for an accusative and an infinitive. The verbs lassen, machen, heißen have a modal verb like reading in which they also select an accusative and an infinitive. The infinitive itself subcategorizes for complements with respect to its valency but its subject is realized by an accusative which is the direct object of the AcI verb. Since AcI constructions are coherent infinitive constructions in which extraposition is not possible (cf. (Eisenberg 1999 2001), p.355), the AcI is not annotated as a sentential object (* wenn man nur noch hört das eigene Herz schlagen). The infinitive as the verbal object of the AcI verb is located in the verb complex and the accusative is realized as OA in MF. 72 SIMPX 510 − − − MF 509 ON MOD MOD OA NX 507 − C 500 NX 501 − HD Wenn ADVX 502 ADVX 503 HD HD man 0 nur 1 VC 508 − HD ADJX 504 HD noch 2 das 3 eigene 4 Herz 5 OV HD VXINF 505 VXFIN 506 HD HD schlagen 6 hört 7 8 KOUS PIS ADV ADV ART ADJA NN VVINF VVFIN −− ns* −− −− asn asn asn −− 3sis As a consequence of this analysis, we annotate two accusative objects (OA) if the AcI construction comprises a transitive infinitive verb such as beenden in the following example. Uns functions as its subject and die Diskussion as its direct object. Both are in accusative case and both are labelled OA. SIMPX 509 − − LK 506 HD VXFIN 500 HD Lassen − MF 507 OA ON OA NX 501 HD NX 502 HD NX 503 − VC 508 OV MOD HD ADVX 504 HD VXINF 505 HD jetzt beenden Sie uns die Diskussion VVFIN PPER PPER ART NN ADV VVINF 3pis np*3 ap*1 asf asf −− −− 0 4.7.7 1 2 3 4 5 6 Imperatives Imperative verbs have only one singular and one plural form and are not inflected concerning the grammatical category person. Their form corresponds to second person singular and plural verbs which are tagged as VVIMP or VAIMP. Warte mal! instead of Wartest du mal? (warte/VVIMP:s) (wartest/VVFIN:2sis) It is important to keep apart imperative sentences from imperative verbs. An imperative sentence does not need to comprise an imperative verb form as is shown in the following examples Warten Sie mal bitte! Bitte warten! (warten/VVFIN:3pis) (warten/VVINF:–) 73 SIMPX 506 − − MF 505 OA LK 503 HD NX 504 HD VXFIN 500 HD − NX 501 HD ( vgl. $( VVIMP NN CARD $( −− s asf −− −− 0 Seite NX 502 HD 1 32 2 ) 3 4 Normally imperative verbs are lacking the subject, but the addressed person can also be mentioned to stress the utterance: SIMPX 504 − DM 502 − LK 503 HD NX 500 HD VXFIN 501 HD Maikäfer flieg 0 ... 1 2 NN VVIMP $( nsm s −− 4.7.8 Particle Verbs Separable verb particles are tagged as PTKVZ and annotated with the edge label VPT: SIMPX 511 − − MF 510 ON OD LK 507 HD ADVX 500 − NX 501 HD − Auch − Vertreter 1 − − VXFIN 503 HD der 2 NX 508 HD NX 502 HD die 0 − VF 509 NX 506 − − ADJX 504 HD AfB 3 HD VC 505 HD stimmten 4 den 5 VPT 86 6 Millionen 7 zu 8 . 9 10 ADV ART NN ART NE VVFIN ART CARD NN PTKVZ $. −− npm npm gsf gsf 3pit dpf −− dpf −− −− 74 In verb-final clauses, the particle verb occurs unseparated within the verb complex: SIMPX 510 − − − − VF 506 LK 507 ON HD MOD OD MOD OV VXFIN 501 ADVX 502 NX 503 ADVX 504 VXINF 505 HD HD HD HD NX 500 HD Rußland MF 508 wollte 0 VC 509 − bislang 1 HD einer 2 UN−Resolution 3 nur 4 zustimmen 5 6 NE VMFIN ADV ART NN ADV VVINF nsn 3sit −− dsf dsf −− −− 4.7.9 Verbs with Predicate Typically, the complement type PRED (predicate) occurs with verbs like sein, haben, scheinen, aussehen, sich anhören, klingen, etc. PRED is annotated, if the following conditions apply: • if it is not possible to determine the case of the constituent in question properly (e.g. gut in Das ist gut.) • if the constituent in question actually predicates the subject, i.e. the subject is characterized as having the property expressed by PRED (e.g. in Die Ursache war unklar. Die Ursache is characterized by the property of being unclear) • many PRED verbs are raising-verbs (subject without theta-role) • if als-phrases are selected by the verb they are labelled as PRED (e.g. Unter dem Motto Kino-Extrem agiert der Regisseur als Filmjockey.) SIMPX 510 − − − VF 509 V−MOD PX 508 − HD NX LK 505 Unter APP HD ON PRED NX NX VXFIN NX NX 501 HD dem 0 507 APP 500 − MF 506 HD Motto 1 502 " 2 HD Kino−Extrem 3 503 " 4 − agiert 5 der 6 504 HD − Regisseur 7 HD als 8 Filmjockey 9 10 APPR ART NN $( NN $( VVFIN ART NN KOKOM NN d dsn dsn −− dsn −− 3sis nsm nsm −− nsm 75 Some examples for verbs that take predicates: recht sein, recht haben, leid tun, frei sein, fertig sein, sich gut/schlecht treffen, gut/schlecht finden, etc. PRED verbs have to be distinguished carefully from verbs occurring with ordinary modifiers (V-MOD) such as gut passen. With respect to topological fields, note that PRED usually marks the border between MF and NF, i.e., whatever constituent occurs on the right hand side of PRED belongs to NF. In general, this constituent is an adjunct which PRED does not subcategorize for: SIMPX 508 − − − VF 504 LK 505 ON HD NX 500 HD VXFIN 501 NF 507 V−MOD NX 502 ADVX 503 HD ist 0 MF 506 PRED HD Das − HD Politik 1 hier 2 3 PDS VAFIN NN ADV nsn 3sis nsf −− SIMPX 509 − − − − NF 508 V−MOD VF 504 LK 505 ON HD NX 500 HD VXFIN 501 HD es PX 507 − HD ADJX 502 NX 503 HD ist 0 MF 506 PRED − kalt 1 an 2 HD diesem 3 Tag 4 5 PPER VAFIN ADJD APPR PDAT NN nsn3 3sis −− d dsm dsm But there are exceptions in which PRED does not necessarily constitute the border between MF and NF: • Another constituent may occur between PRED and VC, for instance, if an ambiguous modifier follows PRED. 76 R−SIMPX 511 − − − MF 510 V−MOD C 505 PRED PX 506 ON NX 507 − HD NX 500 − an 0 HD diesem 1 HD − Abend der 3 HD NX 503 HD 2 VC 509 HD ADJX 502 − das PX 508 − NX 501 HD MOD HD schönste Platz 4 5 im 6 HD All 7 VXFIN 504 war 8 . 9 10 PRELS APPR PDAT NN ART ADJA NN APPRART NN VAFIN $. nsn d dsm dsm nsm nsm nsm dsn dsn 3sit −− • PRED subcategorizes for the constituent that follows it. Complements of PREDs are always attached to a field since they are assigned a grammatical function within the sentence structure (cf. 8.1): SIMPX 510 − − − VF 508 MF 509 ON PRED NX 505 LK 506 HD − NX 500 HD meiner 0 − VXFIN 502 − Einer PX 507 HD NX 501 HD FOPP ADJX 503 HD Freunde 1 NX 504 HD wurde 2 HD HD süchtig 3 nach 4 Nachrichten 5 . 6 7 PIS PPOSAT NN VAFIN ADJD APPR NN $. nsm gpm gpm 3sit −− d dpf −− SIMPX 507 − − − VF 504 LK 505 ON HD NX 500 HD PRED VXFIN 501 HD ich FOPP ADJX 502 HD bin 0 MF 506 froh 1 PX 503 HD darum 2 3 PPER VAFIN ADJD PROP ns*1 1sis −− −− • Because of the word order rule that pronouns in MF have to precede other constituents, PRED might not be the last element in MF if it is a pronoun: 77 SIMPX 510 − − VF − LK 506 − MF 507 VC 508 509 ON HD PRED MOD OV HD NX VXFIN NX ADVX VXINF VXINF 500 501 HD 502 HD Bravo 503 HD kann 0 HD es 1 504 505 HD nicht HD gewesen 2 3 sein 4 . 5 6 NN VMFIN PPER PTKNEG VAPP VAINF $. nsf 3sis nsn3 −− −− −− −− SIMPX 507 − − − VF 504 LK 505 ON HD NX 500 MF 506 PRED VXFIN 501 HD NX 502 HD er ADVX 503 HD war 0 MOD HD es 1 nicht 2 3 PPER VAFIN PPER PTKNEG nsm3 3sit nsn3 −− 4.7.10 Modal Verbs Modal verbs are always tagged as VMFIN or VMINF regardless of their use as an auxiliary or a main verb. If a modal verb functions as an auxiliary verb, it is projected like any other auxiliary verb. If a modal verb is the main verb of a sentences, verbal modifiers refer to the modal verb in the same way as they refer to other main verbs: SIMPX 508 − − − VF 504 LK 505 MF 506 VC 507 ON HD OA OV NX 500 HD " − NX 502 HD Die 0 VXFIN 501 − wollten 1 HD die 2 VXINF 503 HD BLG 3 schonen 4 . 5 " 6 7 $( PDS VMFIN ART NE VVINF $. $( −− np* 3pit asf asf −− −− −− 78 SIMPX 512 − − LK 509 − MF 510 HD VXFIN 500 HD Hätte VC 511 ON OD OA MOD NX 501 NX 502 NX 503 ADVX 504 HD HD HD sie 0 sich 1 HD das 2 OA−MOD V−MOD OV HD NX 505 ADVX 506 VXINF 507 VXINF 508 HD HD HD HD nicht 3 alles 4 vorher 5 überlegen 6 können 7 ? 8 9 VAFIN PPER PRF PDS PTKNEG PIS ADV VVINF VMINF $. 3skt nsf3 ds*3 asn −− asn −− −− −− −− SIMPX 508 − − − MF 507 ON OPP C 504 V−MOD PX 505 − PX 500 HD Warum HD NX 501 HD 0 Daewoo VC 506 HD NX 502 HD 1 nach 2 Bremen VXFIN 503 HD 3 mußte PWAV NE APPR NE VMFIN −− ns* d dsn 3sit 4 79 Chapter 5 Attachment Principles for Phrases 5.1 Attachment to Fields Phrases are attached to the topological field in which they occur. Their edge labels denote their grammatical function within the sentence structure. In LK and VC there can only occur verb forms, separable verbal prefixes, or infinitive particles. LK and VC mark the beginning and the end of MF (cf. 3.2). 5.2 Attachment of Ambiguous Complements The partially free word order and the morphological properties of German can cause ambiguity concerning the grammatical function of a constituent. In the following example, the syntactic structure does not give any information about case assignment. Both noun phrases can be identified as ON or OA: SIMPX 509 − − − VF 508 OA NX 507 HD − PX 504 − HD NX 500 − NX 501 HD Ein − Bad 0 in 1 MF 506 HD ON VXFIN 502 HD der 2 LK 505 HD Menge 3 NX 503 − verhindert 4 HD das 5 Sicherheitsgitter 6 . 7 8 ART NN APPR ART NN VVFIN ART NN $. asn asn d dsf dsf 3sis nsn nsn −− Headlines like the following are lacking the finite verb. Therefore, in the first example it cannot be decided if it is an active or a passive construction, i.e., if the noun phrase is ON or OA. The second example is an active construction, but again the noun phrase can be both, ON or OA: 80 SIMPX 504 − − MF 502 VC 503 ON HD NX 500 VXINF 501 HD HD Kriegsverbrecher verurteilt 0 1 NN VVPP npm −− SIMPX 504 − − MF 502 VC 503 OA HD NX 500 VXINF 501 HD HD Prachtwicken gucken 0 . 1 2 NN VVINF $. apf −− −− Since we do not assign specific edge labels for ambiguous complements, we formulate the following preference principle for case assignment: Preference principle for case assignment: If case assignment is ambiguous, we decide on the more plausible grammatical function and on the more plausible sequence of grammatical functions respectively. The main criteria for the decision are the unmarked word order and the semantic content. Therefore, in the first example above, OA appears in VF whereas ON has its position in MF. For elliptical headlines, we assume a passive construction if the verb in VC is a past participle and an active construction if the verb in VC is an infinitive (cf. 4.7.2 and 7.4). 5.3 Modifier Attachment Modifiers either modify one specific constituent or more than one constituent. The scope of modification can even range over the whole sentence structure. Therefore, they are either unambiguous or ambiguous. An unambiguous constituent that modifies just one other constituent within a tree structure is either adjacent or discontinuous. In the first case, it is immediately attached to the constituent which it modifies, concerning the attachment rules for phrases. In the second case, the dependency, which can even go beyond the border of topological fields, is indicated by X-MOD edge labels, which express the non-ambiguity of the modifier (e.g. OA-MOD is the modifier of OA). Thus, edge labels like OA-MOD, V-MOD, OPP-MOD, MOD-MOD, etc. express that the respective constituent modifies only one other constituent in the sentence (OA, V, OPP, a modifier, etc.) which is not adjacent: 81 SIMPX 511 − − − − VF 510 OA−MOD PX 506 LK 507 − HD HD NX 500 − Für VXFIN 501 HD diese 0 MF 508 HD Behauptung 1 ON MOD NX 502 ADVX 503 HD hat 2 OA OV NX 504 HD Beckmeyer 3 VC 509 − bisher keinen 4 5 VXINF 505 HD HD Nachweis 6 geliefert . 7 8 9 APPR PDAT NN VAFIN NE ADV PIAT NN VVPP $. a asf asf 3sis nsm −− asm asm −− −− If a modifying constituent is ambiguous (i.e. it modifies more than one constituent, the entire sentence, or a constituent that occurred in previous sentences), it is attached to its topological field and given the ambiguous edge label MOD to preserve ambiguity. In the following example an der Uni either modifies the accusative object den Entwicklungsprozeß or the verb fortsetzen: SIMPX 513 − − − VF 511 V−MOD ON ADJX 507 KONJ KONJ ADJX 500 MOD HD energisch 1 NX 503 HD will 2 VC 510 − VXFIN 502 HD und PX 509 HD ADJX 501 HD 0 OA LK 508 − Phantasievoll − MF 512 NX 504 − er 3 HD NX 505 HD den 4 OV − Entwicklungsprozeß 5 an 6 der 7 VXINF 506 HD HD Uni 8 fortsetzen 9 10 ADJD KON ADJD VMFIN PPER ART NN APPR ART NN VVINF −− −− −− 3sis nsm3 asm asm d dsf dsf −− We formulate the following definitions for MOD and X-MOD: Definition of MOD: A constituent is called MOD, if it cannot be assigned a more specific label, either because it is ambiguous or because there is no more specific label (e.g. for sentence modifiers or for constituents that refer to some sentence external expression). Sometimes it is difficult to determine whether a modifier is definite or not. In cases of doubt, modifiers are marked as ambiguous (MOD) rather than as definite modifiers. Definition of X-MOD: X is a variable that can be replaced by labels for syntactic categories like OA, OPP, MOD, V. X-MOD marks long-distance modification which is unambiguous, e.g. relative clauses (Aber es gäbe (intelligente Lösungen OA), (die kein Geld kosten OA-MOD)). Typical MODs and V-MODs: Generally, modifying subclauses (e.g. Katastrophenstimmung herrscht erst, [wenn 82 nichts mehr zu verheimlichen ist] (MOD).) are MOD because they modify the complete main clause. Modifying particles and adverbs like da, dann, auch, eigentlich, ja, vielleicht, auch, natürlich usually show attachment ambiguity and therefore are annotated as MOD. Only if they unambiguously express the modification of the verb (e.g. Das Buch liegt da. or Er geht auch.) they carry the edge label V-MOD. Pronominal adverbs (PROP) like dabei, dafür, trotzdem, deswegen, hierauf, etc. are either ambiguous (e.g. Dabei (MOD) erscheinen Sie in anderen Verlagen.) or unambiguous [e.g. Er achtet dabei (V-MOD) auf alles.). Non-pronominal adverbs such as vorher, später, etc. in most cases give temporal or local information. Thus, they are rather V-MOD than MOD. 5.3.1 Modifier Attachment in the Initial Field Since only one constituent is allowed in the initial field, all elements preceding and following the head are attached as premodifiers (low attachment) or postmodifiers (high attachment) according to the attachment rules explained in 4.1. SIMPX 513 − − − VF 511 MOD MF 512 ON PX 509 − PX 510 refint − HD − NX 506 HD ADVX 500 HD Auch PRED − NX 501 HD HD 509 LK 507 HD ADVX 502 HD VXFIN 503 HD NX 504 − HD ADJX 505 HD HD für Rumänien ist der Papst−Besuch ADV APPR NE ADV VAFIN ART NN APPR ADJA NN $. −− a asn −− 3sis nsm nsm d dsf dsf −− 0 1 5.3.2 2 selbst NX 508 − 3 4 5 von 6 großer 7 8 Bedeutung 9 . 10 Attachment across Punctuation Marks The punctuation marks : and - and ... separate a syntactic construction within a unit unless there is no syntactic dependency relation between the two parts (cf. 3.4.5) like in the following: SIMPX NX 508 − − VF 509 − HD − LK 505 ON HD NX VXFIN 500 HD NX 506 VC 501 HD ASB 502 503 : 2 504 HD Tag 3 HD ADJX HD ein 1 − NX VPT lädt 0 507 − der 4 offenen 5 Tür 6 7 NN VVFIN PTKVZ $. NN ART ADJA NN nsm 3sis −− −− nsm gsf gsf gsf 83 NX 502 − NX 500 − HD Sein HD ADJX 501 HD Zuhause 0 : 1 stilvolles 2 Entertainment . 3 4 5 PPOSAT NN $. ADJA NN $. nsn nsn −− nsn nsn −− Attachment is necessary if the part following the punctuation mark has a grammatical function within the sentence structure: SIMPX 512 − − − NF 511 OS SIMPX 510 − − − VF 505 LK 506 VF 507 LK 508 ON HD ON HD NX 500 VXFIN 501 HD NX 502 HD Er meinte : 1 − PRED VXFIN 503 HD 0 MF 509 NX 504 HD das − ist 4 HD meine 5 Geschichte 6 ’ 7 . " 2 3 8 9 PPER VVFIN $. $( PDS VAFIN PPOSAT NN $( $. $( 10 nsm3 3sit −− −− nsn 3sis nsf nsf −− −− −− SIMPX 516 − − − − NF 515 V−MOD PX 514 − KONJ − − KONJ PX 512 − KOORD 500 VF 508 LK 509 ON HD NX 501 − Doch Zweifel − NX 503 HD blieben 1 HD NX 511 HD HD 0 − NX 510 VXFIN 502 HD PX 513 HD − 2 sowohl 3 bei 4 Joergensen 5 HD ADVX 504 ADVX 505 HD HD selbst 6 als 7 − auch 8 − NX 506 in 9 HD der 10 NX 507 − Redaktion 11 HD der 12 taz 13 . 14 15 KON NN VVFIN $( KON APPR NE ADV KON ADV APPR ART NN ART NE $. −− npm 3pit −− −− d dsm −− −− −− d dsf dsf gsf gsf −− 5.3.3 Ambiguous Modifiers in Isolated Phrases Since isolated phrases (cf. 3.4.5) do not consist of topological fields, ambiguous modifiers (MOD) have to be attached to the phrase itself. The isolated phrase is projected one level higher and the modifier is attached on this higher level. Thus, the information about ambiguity can be preserved even without topological fields or explicit MOD labelling, just by the existence of yet another projection level of the phrase. The overall attachment strategy has been chosen in order to keep syntactic structure 84 flat and to be able to preserve attachment ambiguity where necessary. In the following examples, so may refer to something that is implicit or has been mentioned before: NX 502 − HD ADVX 500 NX 501 HD HD so Winkler 0 1 ADV NE −− nsf If there is more than one ambiguous modifier in an isolated phrase, all of them are attached on the next higher level. The mother node of this isolated phrase is marked with the node label of the modified phrase. NX 503 − − ADVX 500 ADVX 501 HD HD Zunächst HD NX 502 HD natürlich 0 Durcheinander 1 . 2 3 ADV ADV NN $. −− −− nsn −− NX 504 − − ADVX 500 ADVX 501 HD HD vielleicht HD − mal 0 ADVX 503 HD ein 1 − NX 502 HD Mini−Hit 2 da 3 4 ADV ADV ART NN ADV −− −− nsm nsm −− 85 Chapter 6 The Annotation of Sentences The approach of topological fields supports the flat clustering principle inasmuch MF and NF allow for more than one constituent being attached to the same field node. The field nodes form a level of annotation between the phrase level and the sentence level. The last step to complete a sentence structure is to attach the field nodes to the highest annotation level of the whole structure: the root node. In the following sections, the annotation of sentence structures will be demonstrated. 6.1 Sentence Initial Fields 6.1.1 The C-Field in Verb-Final Clauses The C-field (complementizer field) is the field for subordinating conjunctions KOUS (e.g. daß, wenn, da, weil, ob), KOUI (e.g. um (+zu)), relative pronouns (PRELS), interrogative (PWAV) pronouns and (complex) interrogative or relative phrases. Thus, it only occurs in verb-final clauses, except for comparison clauses with the conjunction als. In case of a conjunction, we directly project to the C-field: SIMPX 508 − − − MF 506 MOD VC 507 ON OV refvc C 500 ADVX 501 − HD Wenn da 0 NX 502 HD was 1 OV 503 HD VXINF 503 VXINF 504 VXFIN 505 HD HD HD gebucht 2 worden 3 ist 4 5 KOUS ADV PIS VVPP VAPP VAFIN −− −− *** −− −− 3sis There are conjunctions in German which consist of two elements (e.g. so daß and als ob). Both of them are also directly attached to the C-field, while none of them carries a head label. 86 SIMPX 510 − − − MF 509 ON V−MOD PRED−MOD PRED PX 508 − HD NX 506 VC 507 − C 500 − NX 501 − so ADVX 502 − daß HD der 0 HD Maschinenpark HD HD ADJX 503 ADJX 504 HD heute 3 für 4 VXFIN 505 HD lukrative 5 Sonderanfertigungen 6 HD unbrauchbar 7 ist 8 . 1 2 KOUS KOUS ART NN ADV APPR ADJA NN ADJD VAFIN 9 $. 10 −− −− nsm nsm −− a apf apf −− 3sis −− Since C generally does not contain more than one constituent, the adverb auch in the following example is not supposed to occur in the C-field together with the conjunction wenn. The wenn-clause is annotated as the modifier of the adverbial phrase auch, i.e., the adverbial phrase subcategorizes for the verb-final clause. ADVX 516 HD − SIMPX 515 − − − MF 514 ON MOD OA NX 512 − HD − NX 509 KONJ ADVX 500 HD auch , 0 − − NX 502 NX 503 NX 504 − HD HD HD wenn man 2 wie 3 ARD 4 und 5 HD DP 510 KONJ C 501 1 V−MOD NX 513 ADJX 505 HD ZDF 6 VC 511 HD relativ 7 viele 8 Teams 9 OV HD ADVX 506 VXINF 507 VXFIN 508 HD HD HD überall 10 postieren 11 kann 12 13 ADV $, KOUS PIS KOKOM NE KON NE ADJD PIDAT NN ADV VVINF VMFIN −− −− −− ns* −− nsf −− nsn −− apn apn −− −− 3sis If the constituent in the C-field is a pronoun or a complex phrase, it is first projected to the phrase level and then projected to the C-field. The edge label below the C-Field denotes the grammatical function of this constituent. SIMPX 508 − − C 505 − MF 506 ON MOD NX 500 ADVX 501 HD HD Wieviel V−MOD PRED ADJX 502 HD monatlich 1 HD ADJX 503 HD da 0 VC 507 HD fällig 2 VXFIN 504 wird 3 4 PWS ADV ADJD ADJD VAFIN nsn −− −− −− 3sis 87 R−SIMPX 510 − − C 508 MF 509 FOPP ON PX 505 MOD MOD NX 506 − HD − deren HD HD ADJX 501 HD 0 VC 507 − NX 500 − zu − HD Ablaß 1 die 2 tonale 3 Ebene 4 ADVX 502 ADVX 503 VXFIN 504 HD HD HD natürlich 5 nicht 6 ausreicht 7 8 APPR PRELAT NN ART ADJA NN ADV PTKNEG VVFIN d gp* dsm nsf nsf nsf −− −− 3sis 6.1.2 The C-Field in Verb-Second Clauses Only comparison clauses with als allow for a C-field and a left sentence bracket in the same clause: SIMPX 506 − − − LK 504 HD C 500 − ON VXFIN 501 HD als PRED NX 502 HD sei 0 MF 505 das 1 NX 503 − HD deren 2 Pflicht 3 4 KOUS VAFIN PDS PDAT NN −− 3sks nsn gsf nsf 6.1.3 The KOORD-Field in all Clause Types The KOORD-field is optionally the left-most field of all clause types (V-1, V-2, V-end). Therefore, it can only occur at the beginning of a syntactic unit (cf. 3.4.3). For verb-second clauses, it can be regarded as an alternative field to the PARORDfield. The KOORD-field contains coordinative particles like und, oder, aber, etc. (cf. Höhle (1986)). Here are two examples of different clause types: 88 − − SIMPX 513 − − − MF 512 V−MOD MOD OPP PX 511 − KOORD 500 − Und VF 507 ON LK 508 HD NX 501 HD VXFIN 502 HD ADVX 503 HD war früher Koring HD NX 509 − ADVX 504 HD VC 510 OV HD ADJX 505 HD in schiefes KON NE VAFIN ADV ADV APPR ADJA NN VVPP −− nsm 3sit −− −− a asn asn −− 0 1 2 einmal VXINF 506 HD 3 4 5 6 Licht 7 geraten 8 SIMPX 507 − − − LK 505 MF 506 HD KOORD 500 VXFIN 501 − HD Oder MOD PRED NX 502 ADVX 503 ADJX 504 HD ist 0 ON HD Bremerhaven 1 HD nicht 2 günstiger 3 ? 4 5 KON VAFIN NE PTKNEG ADJD $. −− 3sis nsn −− −− −− 6.1.4 The PARORD-Field in Verb-Second Clauses PARORD is an alternative field to KOORD for verb-second clauses only. PARORD expressions are denn, weil1 : Typical SIMPX 509 − − − − − VF 508 ON NX 505 − LK 506 HD HD PARORD 500 ADVX 501 VXFIN 502 − HD HD Denn auch 0 die 1 MF 507 OPP gehen 2 PX 503 VC 504 HD VPT davon 3 aus 4 5 KON ADV PDS VVFIN PROP PTKVZ −− −− np* 3pis −− −− 1 weil can occur in verb-second and in verb-final clauses. In the first case, it is in the PARORD-field, in the latter case, it belongs to the C-field. 89 6.1.5 Resumptive Constructions: The LV-Field Resumptive constructions are analysed as suggested by Höhle (1986) and Kathol (1995), by using the field LV (Linksversetzung) which is located to the left of VF. In general, the LV-field is not restricted to one constituent. The typical feature of a resumptive construction is that there is a (pronominal) constituent somewhere in the sentence, on the right hand side of the LV-field, which refers back to the expression within the LV-field. Therefore, we use the X-MOD label to indicate this kind of long-distance dependency. SIMPX 520 − − − − − LV 519 ON−MOD PX 518 KONJ KONJ PX 516 PX 517 − HD − HD NX 510 NX 511 − HD − ADJX 500 ADJX 501 HD Vom Einzelgänger zum 1 2 LK 513 HD MOD MOD MOD MOD OV HD VXFIN 503 ADVX 504 ADVX 505 ADVX 506 ADVX 507 VXINF 508 VXINF 509 HD HD HD HD HD HD HD ja auch HD stilbildenden 3 VF 512 ON NX 502 HD introvertierten 0 HD Popstar 4 , 5 das 6 muß 7 MF 514 8 9 VC 515 erst 10 einmal 11 verkraftet 12 werden 13 14 APPRART ADJA NN APPRART ADJA NN $, PDS VMFIN ADV ADV ADV ADV VVPP VAINF dsm dsm dsm dsm dsm dsm −− nsn 3sis −− −− −− −− −− −− SIMPX 515 − − − − − LV 514 FOPP−MOD SIMPX 513 − − − MF 508 ON KOORD 500 − Doch C 501 NX 502 − HD wie 0 es 1 VC 509 VF 510 OV HD VXINF 503 VXFIN 504 HD HD weitergehen 2 FOPP , 4 HD ON NX 507 HD darüber 5 MF 512 VXFIN 506 HD soll 3 PX 505 LK 511 − herrscht 6 HD kein 7 Konsens 8 . 9 10 KON KOUS PPER VVINF VMFIN $, PROP VVFIN PIAT NN $. −− −− nsn3 −− 3sis −− −− 3sis nsm nsm −− Grammatical functions within a LV-construction are assigned according to the following principle: • The LV-constituent is licensed by some (pronominal) constituent within the core sentence. The core sentence exceeds from VF to NF. Therefore, the licensing constituent is considered to be modified by the constituent within the LV-field. For instance, ON-MOD is licensed by ON like in the first example above, which is also in strong accordance with the assumption that the original position of the subject in verb-second clauses is VF. In constructions with wenn ... dann ..., the wenn-clause, which is semantically a precondition to the dann-clause, is in the LV-field in correlation with dann. Therefore, dann (MOD) refers back to the wenn-clause (MOD-MOD): 90 15 SIMPX 519 − − − − LV 518 MOD−MOD SIMPX 516 − MF 517 − − MF 511 MOD C 500 was 1 VF 513 refmod OV refvc 503 MOD 516 HD VXINF 504 VXFIN 505 ADVX 506 VXFIN 507 HD HD HD HD gebucht worden ist 3 4 , 5 dann 6 PRED NX 508 HD ist 7 PX 515 − HD 2 MOD LK 514 HD VXINF 503 HD da 0 OV NX 502 HD Wenn VC 512 ON ADVX 501 − ON ADVX 509 NX 510 HD das 8 HD HD nicht 9 in 10 Ordnung 11 12 KOUS ADV PIS VVPP VAPP VAFIN $, ADV VAFIN PDS PTKNEG APPR NN −− −− *** −− −− 3sis −− −− 3sis nsn −− d dsf If dann is not present in the matrix clause , the wenn-clause occurs in VF. In this case, the wenn-clause is labelled as MOD because there is no explicit correlating constituent. It rather refers to the whole matrix clause, e.g. (Wenn da was gebucht worden ist (MOD), ist das nicht in Ordnung.) 6.2 Questions 6.2.1 W-Questions In general, w-questions are verb-second clauses with interrogative pronouns in VF. The problem here is to decide on the syntactic category of the interrogative phrase. We follow the strategy to assign PX to all PWAVs, which compositionally comprise a preposition such as wobei, wofür, wogegen, woher, womit, woran, worauf, wovon, wozu and also to causal PWAVs such as warum, wieso, weshalb. The (non-compositional) PWAVs wann, wo are analysed as ADVX. SIMPX 510 − − − VF 507 V−MOD LK 508 HD PX 500 HD VXFIN 501 HD Warum 0 machen − MF 509 1 ON OA MOD MOD NX 502 HD NX 503 ADVX 504 HD ADVX 505 HD nicht einfach wir 2 − den HD 3 Computer 4 5 VC 506 VPT 6 aus 7 ? 8 PWAV VVFIN PPER ART NN PTKNEG ADV PTKVZ $. −− 1pis np*1 asm asm −− −− −− −− 6.2.2 Yes - No Questions Yes - no questions may occur in various forms, but the most typical form is the verb-first clause: 91 SIMPX 505 − − LK MF 503 504 HD ON OA VXFIN NX NX 500 501 HD 502 − Veruntreute HD die 0 HD AWO Spendengeld 1 2 ? 3 4 VVFIN ART NN NN $. 3sit nsf nsf asn −− Otherwise, a question mark at the end of a verb-second or verb-final clause indicates that it is actually meant as a question: SIMPX 508 − − − MF 507 MOD VF 504 ON PRED LK 505 ADJX 506 HD NX 500 HD Das − ADVX 502 ADVX 503 HD HD HD ist 0 HD VXFIN 501 doch 1 ganz 2 klar 3 ? 4 5 PDS VAFIN ADV ADV ADJD $. nsn 3sis −− −− −− −− SIMPX 507 − − − MF 506 ON V−MOD PX 504 − VC 505 HD HD C 500 NX 501 NX 502 − HD HD Ob Ampler 0 auf 1 HD Sieg 2 VXFIN 503 fahre 3 ? 4 5 KOUS NE APPR NN VVFIN $. −− nsm a asm 3sks −− 6.3 Relative Clauses Considering relative clauses (R-SIMPX), the relative pronoun occurs in the C-field. It is first projected to the phrase level before it is attached to the C node. The relative clause itself is located in NF like in the following example if no other constituent follows. Its 92 edge label shows to which constituent of the matrix clause it is related. OA-MOD, for example, suggests that the relative clause refers to OA: SIMPX 516 − − − − − NF 515 OA−MOD MF 513 KOORD 500 VF 507 LK 508 ON HD NX 501 − Aber C 510 HD − ON NX 504 HD gäbe 1 − NX 509 ADJX 503 HD es 0 OA − VXFIN 502 HD R−SIMPX 514 intelligente Lösungen 3 , 4 OA HD − die 5 VC 512 NX 505 HD 2 − MF 511 VXFIN 506 HD kein 6 HD Geld kosten 7 8 . 9 10 KON PPER VVFIN ADJA NN $, PRELS PIAT NN VVFIN $. −− nsn3 3skt apf apf −− np* asn asn 3pis −− If the head noun phrase of the relative clause is the noun phrase of a prepositional phrase or a postmodifier within a complex phrase, the relative clause is labelled as MOD. Additionally, there is a secondary edge label named refint (cf. 3.4.6) from the head noun NX to the relative clause: SIMPX 517 − − − − NF 516 MOD R−SIMPX 515 − MF 512 OPP VF 506 LK 507 ON HD NX 500 − Ein HD 515 Bettenrost 0 − mutiert 1 − zu 2 − NX 503 HD einem 3 NX 510 HD NX 502 HD ON PX 509 refint − HD Gefängnisgitter 4 , 5 hinter 6 − MF 514 V−MOD PX 508 VXFIN 501 HD − C 513 HD ADJX 504 VXFIN 505 HD dem 7 VC 511 HD HD freier 8 Himmel 9 lockt 10 . 11 12 ART NN VVFIN APPR ART NN $, APPR PRELS ADJA NN VVFIN $. nsm nsm 3sis d dsn dsn −− d dsn nsm nsm 3sis −− The position of the relative clause in NF is justified by the fact that it does not necessarily occur as an immediate constituent located to the right of the noun phrase to which it refers. For example, a verb complex can occur between the noun phrase and the relative clause (Der Bettenrost ist zu einem Gefängnisgitter mutiert, hinter dem freier Himmel lockt.). In sentences like this, the complexity of the noun phrase (NP + relative clause) is important. This so called heavyness follows Behaghel’s first physical law (Behaghel 1932): complex noun phrases tend to find a position at the end of the sentence even if they deviate from their basic order. If the relative clause does not follow the noun phrase immediately, its unmarked position is in NF. Unless there is strong evidence for a position in MF, the relative clause is located in NF. 93 If the relative clause and its head noun phrase are adjacent constituents in VF or MF, the relative clause modifies the noun phrase directly as a postmodifier. R−SIMPX 515 − − − MF 514 OA NX 513 HD − R−SIMPX 512 C − − − C MF VC 507 508 ON NX NX 500 − die die 0 HD HD ADVX NX NX VXFIN VXFIN 503 HD AWO , 1 2 511 PRED 502 HD VC 510 ON 501 HD 509 V−MOD wo 3 4 504 505 HD HD er Kreisvorsitzender 506 HD HD ist 5 6 , 7 prüfte 8 9 PRELS ART NN $, PWAV PPER NN VAFIN $, VVFIN nsf asf asf −− −− nsm3 nsm 3sis −− 3sit 6.3.1 Event-modifying Relative Clauses Relative clauses that modify an event which is not expressed by a nominal expression are annotated as SIMPX. SIMPX 518 − − − − NF 517 MOD SIMPX 516 − − MF 514 OA FOPP LK 508 HD V−MOD PX 509 − VXFIN 500 HD HD NX 501 − NX 502 HD HD NX 504 HD VC 513 HD HD NX 505 − , was NN VVPP $, PWS APPR PIDAT NN PIAT NN VVFIN $. −− 3pis asm asm asn asn −− −− nsn d dsf dsf *** asm 3sis −− 6.3.2 7 8 9 10 Szene 11 mehr VXFIN 507 HD HD APPRART 6 jeder − NN 5 mit NX 506 HD ART 4 übertragen − haben 3 Niederdeutsche VXINF 503 HD OA PX 512 VAFIN 2 ins C 511 ON ... 1 Text VC 510 OV −− 0 den − MF 515 12 Sinn 13 macht 14 Independent Relative Clauses Independent relative clauses (also ’nominal relative clauses’, in German ’Freie Relativsätze’) do not modify a head word but substitute an argument or adjunct in the clause. Consequently, they are labelled SIMPX on sentential level (instead of R-SIMPX) and they function as (sentential) subject (ON) or sentential object (OS). The latter is not uncontroversial since they are distributed like non-sentential, nominal arguments with respect to subcategorization restrictions. The relative pronoun used in independent relative clauses normally belongs to the w-class of relative pronouns such as wer or was and is tagged with the STTS tag PWS. 94 . 15 SIMPX 516 − − − VF 515 ON SIMPX 513 − − C 508 ON MOD NX 500 HD ADVX 501 HD Wer MF 509 V−MOD OV ADJX 502 HD VXINF 503 HD VC 510 OV 503 VXINF 504 HD HD VXFIN 506 HD NX 507 − will , kommt VAINF VMFIN $, VVFIN APPR PPOSAT NN $. ns* −− −− −− −− 3sis −− 3sis a ap* ap* −− SIMPX 517 − − 3 4 5 6 − 7 auf 8 seine HD VVPP 2 werden VXFIN 505 HD PX 512 − ADJD 1 unterhalten LK 511 HD HD ADV − gut refvc PWS 0 einfach MF 514 OPP − 9 Kosten 10 . 11 − NF 516 OS SIMPX 515 − − VF 508 MOD LK 509 HD ADVX 500 HD VXFIN 501 HD Manchmal NX 502 HD C 512 OA MF 513 ON VC 514 HD ADJX 503 HD VXINF 504 HD NX 505 HD NX 506 HD VXFIN 507 HD klar sagen , was will . VMFIN PIS ADJD VVINF $, PWS PIS VMFIN $. −− 3sis ns* −− −− −− asn ns* 3sis −− 1 man VC 511 OV ADV 0 muß ON − MF 510 V−MOD 2 3 4 5 man 6 7 8 9 Independent relative clauses introduced by wie are currently annotated in a different manner. Wie is analysed as subordinating conjunction (KOUS). This type of structure is to be revised in a subsequent release. 6.4 Coordination Coordination is a syntactic phenomenon that occurs on the following annotation levels: phrase level, field level, and sentence level. Within coordinations, the conjuncts are first projected to their phrase, field, or clause level. In a second step, they are attached to their mother node which is n-ary branching (conjunctions between the conjuncts). This scheme is the same for all syntactic categories. The edge labels between the mother node and the conjuncts of the coordination are labelled as KONJ. This edge label supports the distinction between conjuncts, modifiers, and conjunctions within complex conjunctions (cf. 6.4.3), as well as the distinction between coordinations and elliptical constructions (cf. 6.5). In contrast to coordinating conjunctions in the KOORD-field, coordinating conjunctions in coordinations (und, oder, etc.) are directly attached to the mother node of the conjuncts. The class of coordinating conjunctions consists of single, e.g. und, oder, aber, als, as well as of complex conjunctions, e.g. entweder oder, weder noch, sowohl als. Generally, coordinating conjunctions may coordinate constituents of any category. Moreover, 95 they can form asymmetric coordinations in which the conjuncts belong to different syntactic categories (cf. 6.4.2).2 In order to distinguish conjunctions from conjuncts within a coordination, their edge labels are empty. In the following, coordination on all annotation levels as well as specific cases of coordination, e.g. split coordinations, will be demonstrated. 6.4.1 Coordination of Phrases Noun Phrases NX 506 KONJ − KONJ NX 504 NX 505 HD − NX 500 HD NX 501 HD − Ende NX 502 HD der 0 − NX 503 HD Kämpfe 1 und 2 − Verurteilung HD der 3 4 Selbstmandatierung 5 6 NN ART NN KON NN ART NN asn gpm gpm −− asf gsf gsf Prepositional Phrases PX 504 KONJ − KONJ PX 502 PX 503 − HD − HD NX 500 NX 501 HD am − Arbeitsplatz 0 oder 1 in 2 HD der 3 Familie 4 5 APPRART NN KON APPR ART NN dsm dsm −− d dsf dsf Adjectival Phrases NX 503 − HD ADJX 502 KONJ − KONJ ADJX 500 ADJX 501 HD HD Heimliche und 0 illegale 1 Pioniertat 2 3 ADJA KON ADJA NN nsf −− nsf nsf 2 If bis is used as a conjunction like in 10.000 bis (KON) 20.000 koreanischen Daewoo PKW it is tagged as KON. But remember that von ... bis ... phrases are treated differently (cf. 4.4.1). 96 SIMPX 510 − − − MF 509 PRED VF 506 LK 507 ON HD NX 500 ADJX 508 VXFIN 501 HD HD Das KONJ KONJ KONJ KONJ ADJX 502 ADJX 503 ADJX 504 ADJX 505 HD klingt 0 HD anmaßend 1 , 2 HD pathosschwer 3 , 4 HD elektrisch 5 , 6 laut . 7 8 9 PDS VVFIN ADJD $, ADJD $, ADJD $, ADJD $. nsn 3sis −− −− −− −− −− −− −− −− Adverbial Phrases ADVX 502 KONJ − KONJ ADVX 500 ADVX 501 HD HD solo oder 0 zusammen 1 2 ADV KON ADV −− −− −− 6.4.2 Asymmetric Coordination Since constituents of different syntactic categories can be coordinated, it has to be decided on a label for the mother node of the coordination. In this case, the default strategy has been adopted to choose the syntactic category of the left-most conjunct as the category of the entire coordination: ADVX 504 KONJ KONJ ADVX 500 HD heute , 0 KONJ − KONJ NX 501 NX 502 NX 503 HD HD HD So. 1 , 2 Mo. 3 , 4 5 u. Di. 6 7 ADV $, NN $, NN $, KON NN −− −− nsm −− nsm −− −− nsm 97 SIMPX 519 − − − MF 518 PRED ADJX 517 KONJ KONJ KONJ PX 515 NX 516 KONJ VF 509 LK 510 ON HD NX 500 − KONJ VXFIN 501 HD Die ADJX 511 Farbpalette 0 − KONJ und 3 geschmackvoll 4 , 5 von − HD PX 513 HD HD zart 2 − ADJX 503 HD ist 1 PX 512 ADJX 502 HD KONJ − NX 514 HD KONJ NX 504 NX 505 HD HD Bordeaux bis 8 ADVX 506 HD Flieder 9 , 10 auch 11 − KONJ NX 507 NX 508 HD HD Orange 12 oder 13 Rosa 14 . 6 7 ART NN VAFIN ADJD KON ADJD $, APPR NN APPR NN $, ADV NN KON NN 15 $. nsf nsf 3sis −− −− −− −− d dsn a asn −− −− nsn −− nsn −− SIMPX 512 − − − − MF 511 PRED NX 510 − VF 505 LK 506 ON HD NX 500 HD KONJ − VXFIN 501 PX 508 HD VC 509 − HD weder 1 − ausgesprochene 2 OV NX 503 HD bin 0 KONJ ADJX 502 HD Ich − NX 507 Meat−Loaf−FanIn 3 noch 4 in 5 dem 6 VXINF 504 HD HD Konzert 7 gewesen 8 . 9 10 PPER VAFIN KON ADJA NN KON APPR ART NN VAPP $. ns*1 1sis −− nsf nsf −− d dsn dsn −− −− 6.4.3 Coordinations with Complex Conjunctions The conjuncts and conjunctions of a coordination with complex conjunctions are also attached on the same level following the above mentioned rules for coordination. Both parts of complex conjunctions like entweder oder and sowohl als are tagged as KON. The latter one usually occurs together with the adverb auch, which is tagged as ADV, projected to the phrase level, and then attached to the mother node of the coordination. The same applies for nicht in coordinations with sondern. Sondern is tagged as KON, whereas nicht is always tagged as PTKNEG: 98 16 SIMPX 516 − − − FKOORD 515 KONJ − KONJ MF 514 V−MOD PRED PX 513 − VF 509 LK 510 MOD HD ADVX 500 VXFIN 501 HD HD Immerhin MF 511 wird NX 512 ON MOD MOD MOD PRED NX 502 ADVX 503 ADVX 504 ADVX 505 ADJX 506 HD HD HD HD 0 HD es 1 nicht 2 noch 3 − ADJX 507 HD obendrein 5 ADJX 508 HD kalt 4 HD , sondern bei 8 HD 20 9 Grad 10 erträglich . 6 7 ADV VAFIN PPER PTKNEG ADV ADV ADJD $, KON APPR CARD NN 11 ADJD 12 $. 13 −− 3sis nsn3 −− −− −− −− −− −− d −− dpn −− −− SIMPX 517 − − − MF 516 V−MOD VF 514 OA ADJX 515 ON − KONJ − − KONJ NX 512 PX 513 HD − − PX 508 − HD NX 500 − Papst−Besuch in 1 HD VXFIN 502 HD 0 NX 510 HD NX 501 HD Der HD LK 509 ADJX 503 HD Bukarest 2 ADVX 504 HD spielt 3 sowohl 4 NX 505 HD außenpolitisch als 5 6 auch für 8 − − ADVX 506 HD 7 NX 511 − HD Rumänien 9 HD ADJX 507 HD selbst 10 eine 11 große 12 Rolle . 13 14 15 ART NN APPR NE VVFIN KON ADJD KON ADV APPR NE ADV ART ADJA NN $. nsm nsm d dsn 3sis −− −− −− −− a asn −− asf asf asf −− SIMPX 525 − KONJ − KONJ SIMPX 524 − − − − NF 523 OS MF 520 ON MOD LK 511 HD − NX 501 HD HD haben 0 − ADVX 512 VXFIN 500 Entweder SIMPX 521 die HD OV ADVX 502 VXINF 503 HD HD überhaupt 2 nicht 3 C 514 begriffen 4 MF 515 OPP , 5 ON PX 504 NX 505 HD HD worum 6 − − VC 516 VF 517 HD ON VXFIN 506 HD geht 8 , 9 oder 10 VXFIN 508 HD es 11 − LK 518 HD NX 507 HD es 7 SIMPX 522 − MF 519 OD PRED NX 509 ADJX 510 HD ist HD ihnen egal 14 . 12 13 KON VAFIN PDS ADV PTKNEG VVPP $, PWAV PPER VVFIN $, KON PPER VAFIN PPER ADJD $. −− 3pis np* −− −− −− −− −− nsn3 3sis −− −− nsn3 3sis dp*3 −− −− 6.4.4 1 VC 513 − 15 Coordinations with Truncated Words Truncated words are projected to the phrase level. Their edge labels are empty. The phrases of both conjuncts are coordinated. The truncated words do not receive morphological annotation. 99 16 NX 500 KONJ − KONJ NX NX 501 502 − HD Bau− und 0 Verkehrsplanungen 1 2 TRUNC KON NN −− −− dpf PX 503 − HD NX 502 KONJ − KONJ NX 500 NX 501 − bei − einer 0 − SPD− 1 oder 2 HD einer CDU−Veranstaltung 3 4 5 APPR ART TRUNC KON ART NN d dsf −− −− dsf dsf NX 502 KONJ − KONJ NX 501 − NX HD ADVX 503 500 − HD Noch−Frauen− und 0 bald 1 Fußballsender 2 3 TRUNC KON ADV NN −− −− −− nsm In the case of complex conjunctions, the conjuncts are annotated in the same way. ADJX 503 − KONJ − − ADJX ADVX 500 ADJX 501 − sowohl KONJ 502 HD kultur− 0 als 1 HD auch 2 stadtentwicklungspolitisch¦ 3 4 KON TRUNC KON ADV ADJD −− −− −− −− −− NX 503 − KONJ ADVX 500 − KONJ NX 501 HD − nicht die 0 NX 502 − − Sozial− 1 , 2 sondern 3 HD die 4 Bildungsbehörde 5 6 PTKNEG ART TRUNC $, KON ART NN −− nsf −− −− −− nsf nsf 100 Word initial TRUNCs are different from truncated words which include the second part of a word. The latter ones are treated like complete lexical heads, because they comprise the head morpheme of the complex word. NX 506 HD − PX 505 − HD NX NX 503 KONJ 504 − KONJ − NX ADJX NX 500 501 HD 502 HD Originaltitel und 0 HD HD −fassung " 2 8 MM 5 " 6 . 3 4 7 8 KON NN APPR $( CARD NN $( $. nsm −− nsf d −− −− dsm −− −− 6.4.5 1 von NN Attachment Principles of Coordination within Phrases If two or more nominal conjuncts occur together with a common determiner and/or adjectival phrase, first the conjuncts are projected to their phrase level and then the determiner or the adjectival phrase is attached to the coordination on a higher level according to the high attachment principle. Thus, the modification scope comprises the entire coordination. The coordinated part is assigned the head function. NX 503 − HD NX 502 KONJ den − KONJ NX 500 NX 501 HD HD Angestellten und 0 1 Beamten 2 3 ART NN KON NN dp* dp* −− dpm NX 504 − − HD NX 503 KONJ ADJX NX 501 HD 502 HD türkischen 0 KONJ NX 500 Die − HD Instrumente 1 und 2 Harmonien 3 4 ART ADJA NN KON NN np* np* npn −− npf 101 6.4.6 Coordination of Topological Fields The conjuncts of a coordination of topological fields are either single fields (cf. 6.4.4) or a combination of fields. Possible combinations are, for instance, (MF + VC), (LK + MF), (LK + MF + VC). The node label for these conjuncts is FKONJ (conjunct consisting of fields) and the mother node of a coordination of conjuncts of fields is FKOORD. In a coordination of conjuncts of fields, the following annotation steps are involved: 1. The constituents are attached to the fields in which they occur in (MF, VC, NF, etc.). 2. Each conjunct (concatenation of fields or single field) is labelled as FKONJ. 3. The conjuncts are attached to the general coordination field FKOORD. SIMPX 526 − − FKOORD 525 KONJ − KONJ FKONJ 523 FKONJ 524 − − − − − MF 521 NF 522 OPP OA−MOD PX 518 MF 519 − VF 510 HD LK 511 ON NX 512 HD NX 500 − HD glauben 0 HD ADJX 502 HD Wir LK 513 − VXFIN 501 HD V−MOD an 1 die 2 totale 3 Gegenwart 4 und 5 − ADVX 514 HD KONJ ADVX 504 ADVX 505 HD HD HD tun − hier 7 KONJ und 8 − C 515 VXFIN 503 6 R−SIMPX 520 OA jetzt 9 − MF 516 OA ON NX 506 NX 507 NX 508 HD HD HD alles 10 , 11 was 12 VC 517 HD VXFIN 509 HD wir können . 13 14 PPER VVFIN APPR ART ADJA NN KON VVFIN ADV KON ADV PIS $, PRELS PPER VMFIN $. np*1 1pis a asf asf asf −− 1pis −− −− −− asn −− asn np*1 1pis −− Often, the subject of the sentence occurs only in the left field conjunct: SIMPX 514 − − FKOORD 513 KONJ − KONJ FKONJ 511 − VF 506 " MF 508 ON MOD HD VXFIN 501 HD HD Immer − LK 507 ADVX 500 0 FKONJ 512 OD VXFIN 503 HD und 3 HD stiehlt 4 OA NX 504 HD einer 2 − MF 510 HD NX 502 kommt 1 − LK 509 mir 5 NX 505 − HD meine 6 Krise 7 " 8 9 . 10 $( ADV VVFIN PIS KON VVFIN PPER PPOSAT NN $( $. −− −− 3sis nsm −− 3sis ds*1 asf asf −− −− A coordination of fields may also be an embedded structure. In this case, FKOORD functions also as conjunct label: 102 15 16 SIMPX 521 − − FKOORD 520 KONJ KONJ FKONJ 519 − − FKOORD 518 KONJ FKONJ 515 − − VF 508 LK 509 ON HD NX 500 − Die Älteren 1 − VXFIN 503 HD teurer , 3 HD OA ADVX 505 HD familiäre 5 VC 514 MOD HD haben 4 − MF 513 ADJX 504 HD 2 − NX 512 HD ADJX 502 sind 0 LK 511 PRED HD KONJ FKONJ 517 OA MF 510 VXFIN 501 HD − MF 516 Verpflichtungen 6 und 7 NX 506 − oft 8 OV ein 9 VXINF 507 HD HD Haus 10 abzuzahlen 11 . 12 13 ART NN VAFIN ADJD $, VAFIN ADJA NN KON ADV ART NN VVIZU $. np* np* 3pis −− −− 3pis apf apf −− −− asn asn −− −− 6.4.7 Attachment of Ambiguous Modifiers in Coordination Within phrases, the modification scope of a premodifier can be ambiguous. Therefore, high attachment is applied to preserve ambiguity. In the following example, the adverb modifies the coordination of adjectives rather than only the first adjective: ADJX 504 − HD ADJX 503 KONJ ADVX 500 HD − KONJ ADJX 501 ADJX 502 HD Viel HD größer 0 und 1 brutaler 2 3 ADV ADJD KON ADJD −− −− −− −− Modifying constituents are attached to a conjunct rather than to a field if their modification scope is limited to the conjunct. 103 SIMPX 512 − − − MF 511 OPP PX 510 KONJ VF 506 LK 507 ON HD − VXFIN 501 ADVX 502 HD HD NX 500 HD Wir KONJ PX 508 glauben 0 − HD − NX 503 an 2 HD die 3 − HD ADVX 504 − nicht 1 PX 509 − NX 505 HD Vergangenheit 4 und 5 − nicht 6 an 7 HD die 8 Zukunft 9 . 10 11 PPER VVFIN PTKNEG APPR ART NN KON PTKNEG APPR ART NN $. np*1 1pis −− a asf asf −− −− a asf asf −− Also in coordinations with complex conjunctions, attachment on the phrase level is applied if possible. SIMPX 513 − − − − VF 512 ON NX VC 510 511 APP APP OV NX LK 506 HD NX NX 500 OA VXFIN NX 502 − Sprecher 0 , 1 − Axel 2 Wallrabenstein , 4 KONJ KONJ VXINF 504 HD die 6 − VXINF − wollte 5 509 − 503 HD 3 VXINF 508 HD 501 HD Radunskis MF 507 − 505 HD Entscheidung 7 weder 8 HD bestätigen 9 noch 10 dementieren 11 . 12 13 NE NN $, NE NE $, VMFIN ART NN KON VVINF KON VVINF $. gsm nsm −− nsm nsm −− 3sit asf asf −− −− −− −− −− SIMPX 514 − − − − MF 513 OA NX 512 − VF 508 ON Die KONJ − − KONJ LK 509 HD NX 500 − − VXFIN 501 HD HD ADVX 503 HD nicht etwa NX 504 − , sondern ADV PPOSAT NN $, KON ADV ART ADJA NN VVPP $. nsf nsf 3sit −− −− asf asf −− −− −− asn asn asn −− −− 4 5 6 7 8 gleich 9 das 10 ganze VXINF 507 HD PTKNEG 3 Lektüre ADJX 506 HD VAFIN 2 unsere ADVX 505 HD HD VC 511 OV HD NE 1 hatte ADVX 502 HD − ART 0 Lufthansa NX 510 − 11 Flugzeug 12 rationiert If there is more than one constituent within a conjunct, each with its own grammatical function, these constituents are first attached to the respective field node. Then, the fields are coordinated: 104 13 . 14 SIMPX 516 − − − FKOORD 515 KONJ − KONJ MF 514 V−MOD PRED PX 513 − VF 509 LK 510 MOD HD ADVX 500 VXFIN 501 HD HD Immerhin NX 512 ON MOD MOD MOD PRED NX 502 ADVX 503 ADVX 504 ADVX 505 ADJX 506 HD HD HD HD wird 0 HD MF 511 es nicht 2 noch 3 HD obendrein 5 ADJX 508 HD kalt 4 HD ADJX 507 , sondern bei 8 HD 20 9 Grad 10 erträglich 11 . 6 7 ADV VAFIN PPER PTKNEG ADV ADV ADJD $, KON APPR CARD NN ADJD $. −− 3sis nsn3 −− −− −− −− −− −− d −− dpn −− −− 6.4.8 1 − 12 13 Coordination of Sentences In accordance with the longest match principle, complete sentences are coordinated as paratactic constructions when they belong to the same syntactic unit (cf. 3.4.3), i.e., they are coordinated by a conjunction, a comma, or a dash: SIMPX 522 KONJ − KONJ SIMPX 521 − − − MF 520 OPP PX 519 − HD VF NX 516 LK 509 − − − − − NX VF LK MF VC 510 APP APP NX Storchbein 1 − setzt 2 auf 3 504 HD die 4 HD ADJX 503 HD 512 − NX 502 − Henning 0 − VXFIN 501 − 511 HD NX 500 Nashorn−Bürgermeister 518 HD NX HD SIMPX 517 ON Sammlung der 6 Kräfte 8 und 9 515 MOD OV ADJX VXFIN ADVX VXINF 506 HD positiven 7 514 HD 505 HD 5 513 V−MOD HD prompt 10 wird 11 507 HD da 12 508 HD gepöbelt 13 . 14 15 NN NE NE VVFIN APPR ART NN ART ADJA NN KON ADJD VAFIN ADV VVPP $. nsm nsm nsm 3sis a asf asf gpf gpf gpf −− −− 3sis −− −− −− 105 SIMPX 520 KONJ KONJ SIMPX 518 − SIMPX 519 − − VF 514 MOD MF 515 ON ADVX 508 − ADVX 500 HD ADVX 501 HD aber PX 510 NX 503 HD blieb alles HD − NX 504 HD LK 512 HD HD ADVX 505 HD NX 513 − − VXFIN 506 HD ADJX 507 HD − nur der Innensenator ist ein neuer ADV VVFIN PIS APPRART NN $( ADV ART NN VAFIN ART ADJA $. −− −− 3sit nsn dsn dsn −− −− nsm nsm 3sis nsm nsm −− 1 2 3 alten − ADV 0 beim − MF 517 PRED NX 511 − VXFIN 502 HD − VF 516 ON OPP LK 509 HD HD So − 4 5 6 7 8 9 10 11 . 12 13 A coordination may also consist of two sentences with the subject of the whole construction only occurring in the left conjunct of the coordination. SIMPX 520 KONJ − KONJ SIMPX 519 − − − VF 517 SIMPX 518 OS − − SIMPX 515 − MF 516 − − MF 509 OA C 500 NX 501 − HD Ohne LK 511 MF 512 LK 513 ON HD OV HD HD VXINF 502 VXINF 503 VXFIN 504 HD sie 0 OA VC 510 HD gesehen 1 − zu 2 NX 505 HD haben 3 ? 4 , 5 − kontert der 6 7 PX 514 − VXFIN 506 HD HD Popstar 8 und 9 OPP − wirft ADVX 508 HD einen 10 HD NX 507 11 HD Blick 12 nach 13 rechts 14 . 15 16 KOUS PPER VVPP PTKZU VAINF $. $, VVFIN ART NN KON VVFIN ART NN APPR ADV $. −− ap*3 −− −− −− −− −− 3sis nsm nsm −− 3sis asm asm d −− −− Subclauses (either in VF or in NF) with or even without a conjunction can also be coordinated. SIMPX 525 − − − MF 524 MOD MOD ON SIMPX 523 KONJ − KONJ SIMPX 521 − VF 514 − LK 515 PRED SIMPX 522 − MF 516 HD ON OV refvc ADJX 500 HD Unklar VXFIN 501 ADVX 502 ADVX 503 HD HD HD ist 0 aber 1 C 504 − noch 2 , 3 NX 505 − wie HD diese − VC 517 Leistung 6 OV 506 HD VXINF 506 VXINF 507 VXFIN 508 HD HD HD beurteilt 7 werden 8 − C 518 ON kann − 10 und 11 OPP NX 509 PX 510 HD HD wer 12 PRED ADJX 511 HD dafür 13 VC 520 zuständig 14 OV HD VXINF 512 VXFIN 513 HD HD sein 15 soll 16 . 4 5 ADJD VAFIN ADV ADV $, KOUS PDAT NN VVPP VAINF VMFIN $( KON PWS PROP ADJD VAINF VMFIN $. −− 3sis −− −− −− −− nsf nsf −− −− 3sis −− −− ns* −− −− −− 3sis −− 106 9 − MF 519 17 18 6.4.9 Paratactic Constructions with denn and weil Paratactic constructions consisting of verb-second clauses conjoined by the conjunctions denn and weil, which also occur in the PARORD-field in the beginning of a sentence, are treated as equal conjuncts (verb-second instead of verb-final in weil-clause). In order to distinguish coordination of sentences with conjunct of the PARORD field from the above mentioned coordinations of sentences, these paratactic constructions are labelled as P-SIMPX instead of SIMPX. P−SIMPX 519 KONJ − KONJ SIMPX SIMPX 517 − 518 − − − − − VF MF 515 516 ON OA NX LK 508 KONJ NX 500 HD LK 511 NX 512 HD OV ON HD NX VXFIN VXINF NX VXFIN 502 HD und VF 510 KONJ 501 0 VC 509 − Kilos − 503 HD Fitneß 1 504 HD sollen 3 , 4 denn 5 514 HD OV NX VXINF 506 HD Wesemann 6 − 505 HD stimmen 2 VC 513 − will 7 die 8 − Tour 9 de 10 507 − HD France 11 gewinnen 12 . 13 14 NN KON NN VMFIN VVINF $, KON NE VMFIN ART NE NE NE VVINF $. npn −− nsf 3pis −− −− −− nsm 3sis asf asf asf asf −− −− 6.4.10 Conjunctions Occurring with Isolated Phrases If a conjunct occurs isolated with a conjunction, high attachment is applied like in complete coordinations. But for isolated conjuncts, the conjunct is annotated as the head of the construction (HD instead of KONJ). ADVX 501 − HD ADVX 500 HD und jetzt 0 1 KON ADV −− −− ADVX 502 − Oder − HD ADVX 500 ADVX 501 HD HD eben 0 nicht 1 . 2 3 KON ADV PTKNEG $. −− −− −− −− 107 If there are modifiers which do not modify the conjunct itself because they are ambiguous or might modify something else rather than the conjunct, they are attached on the same (high) level as the conjunction: NX 505 − HD − − − PX 504 − NX 500 HD Und das 0 ADVX 501 ADVX 502 HD HD auch 1 HD NX 503 HD noch 2 ohne 3 Mehrvergütung 4 . 5 6 KON PDS ADV ADV APPR NN $. −− nsn −− −− a asf −− NX 506 − − − HD NX 505 HD − PX 504 − PX 500 ADVX 501 HD und NX 502 HD damit 0 − auch 1 HD die 2 NX 503 HD HD Nervosität 3 im 4 Nato−Hauptquartier 5 . 6 7 KON PROP ADV ART NN APPRART NN $. −− −− −− nsf nsf dsn dsn −− 6.4.11 Split Coordinations Closely related to isolated conjuncts are split coordinations. Generally, the left conjunct of a split coordination is located in MF, in rare cases in VF, and the right conjunct occurs in NF. In order to express the relation between them, the left conjunct carries the label of its grammatical function (ON, OA, OD, etc.) whereas the right conjunct carries a label that denotes that it is the conjunct of this grammatical function (e.g. ONK, OAK, ODK, etc.). In asymmetric coordination, the syntactic category of the second split conjunct determines the syntactic category one level higher up: 108 SIMPX 513 − − − − MF 511 NF 512 OA VF 507 LK 508 ON HD NX 500 − Jedes PX 509 HD Ja−Wort 0 OAK NX 510 − VXFIN 501 HD OPP zieht 1 HD KONJ KONJ KONJ NX 502 NX 503 NX 504 NX 505 NX 506 HD HD HD HD HD Applaus nach 2 3 sich , 4 5 Unterschriften 6 , 7 Küsse 8 , 9 Händeschütteln . 10 11 12 PIDAT NN VVFIN NN APPR PRF $, NN $, NN $, NN $. nsn nsn 3sis asm d ds*3 −− apf −− apm −− asn −− SIMPX 514 − − − − − MF NF 512 513 ON VF LK 507 − ADJX VXFIN ADVX 501 hat 0 nicht 1 NX 503 − HD die 4 KONJ NX 505 − Jöns 3 − VXINF 504 − Karin 2 511 OV NX HD NX 510 KONJ 502 HD Selbstverständlich VC 509 HD HD ONK NX 508 MOD 500 OA Flächen 5 506 HD − gebucht , 6 7 sondern 8 HD die 9 SPD 10 . 11 12 ADJD VAFIN PTKNEG NE NE ART NN VVPP $, KON ART NE $. −− 3sis −− nsf nsf apf apf −− −− −− nsf nsf −− SIMPX 511 − − − − LK 509 NF 510 HD VF 505 ONK VXFIN 506 KONJ NX 500 VXFIN 501 VXFIN 502 HD HD HD Lausbuben − MF 507 ON sind 0 KONJ und 1 PRED KONJ NX 503 ADJX 504 HD bleiben 2 ADJX 508 − HD sie 3 und 4 unwiderstehlich 5 . 6 7 NN VAFIN KON VVFIN PPER KON ADJD $. npm 3pis −− 3pis np*3 −− −− −− 6.5 Elliptical Constructions In elliptical constructions, syntactically necessary linguistic elements are missing which can be reconstructed from the context or the speech situation. Elliptical constructions appear on the phrase level as well as on the sentence level. The model of topological fields does not make any assumptions about dependency relations, but it allows that topological fields may be left empty. For the description of elliptical sentence constructions, the scheme of topological fields is an appropriate model because neither crossing branches nor traces have to be used to annotate the surface structure of a sentence. 109 In elliptical phrases, the head word is missing. They are annotated like phrases without a head. Therefore, the edge labels of an elliptical phrase are empty: SIMPX 510 − − − VF 508 MF 509 ON OD NX 504 − LK 505 − HD − irischen Hinweisschilder 2 den 3 HD ADVX 503 HD seien 1 − ADJX 502 HD 0 ADJX 507 − VXFIN 501 HD die NX 506 HD ADJX 500 PRED HD walisischen 4 ziemlich ähnlich 5 6 7 ART ADJA NN VAFIN ART ADJA ADV ADJD npn npn npn 3pks dpn dpn −− −− KONJ − PX 503 KONJ PX 502 − HD PX NX 500 501 − HD in und 0 um 1 Berlin 2 3 APPR KON APPR NE d −− a asn PX 505 − HD NX 504 HD − NX 503 − − ADJX 500 HD vom 15. ADJX 501 HD NX 502 HD 4. 99 APPRART ADJA ADJA CARD dsm dsm dsm −− 0 1 2 3 SIMPX 521 − − − FKOORD 520 KONJ KONJ MF 517 V−MOD VF 509 LK 510 ON HD NX 500 − Die FDP 0 in 2 , 6 in 7 − NX 507 und 10 in 11 HD Brandenburg 12 HD ADJX 508 HD 3.300 9 NX 516 HD HD Sachsen 8 − ADJX 506 HD OA PX 515 − NX 505 4.000 5 MF 519 V−MOD NX 514 HD HD knapp 4 − ADJX 504 HD Thüringen 3 KONJ OA PX 513 − ADVX 503 HD hat 1 − NX 502 HD V−MOD NX 512 HD VXFIN 501 HD OA PX 511 − − MF 518 2.000 13 Mitglieder 14 15 ART NE VAFIN APPR NE ADV CARD $, APPR NE CARD KON APPR NE CARD NN nsf nsf 3sis d dsn −− −− −− d dsn −− −− d dsn −− apn 110 SIMPX 517 − − FKOORD 516 − KONJ KONJ FKONJ 514 FKONJ 515 − ON C 500 − Ob NX 501 − − MF 510 MOD HD − VC 511 OA ADVX 502 HD − nun weniger OV NX 503 VXINF 504 HD HD V−MOD VXINF 505 HD ADJX 506 HD ADV PIAT NN VVINF VVINF KON ADJD NN VVINF VMFIN −− nsm nsm −− *** apf −− −− −− −− apf −− 3sis 5 6 7 8 Parkgebühren HD VXFIN 509 HD NN 4 flächendeckend OV VXINF 508 HD Senat 3 oder NX 507 HD der 2 lassen OA ART 1 reinigen OV refvc 504 VC 513 KOUS 0 Straßen − MF 512 9 10 kassieren 11 will 12 In elliptical sentence constructions, specific topological fields are not occupied. All constituents are attached to the appropriate field. In the first example, LK in the second conjunct is missing. In the second example, the subject is in NF and the main clause is lacking a verbal constituent: SIMPX 512 − − LK 506 ON VXFIN 501 HD HD Fall 0 − VF 508 MF 509 PRED ON PRED ADJX 502 NX 503 ADJX 504 HD ist 1 − MF 507 HD NX 500 Der KONJ SIMPX 511 − VF 505 − KONJ SIMPX 510 − brisant 2 , 3 HD die HD Mischung 4 5 explosiv 6 . 7 8 ART NN VAFIN ADJD $, ART NN ADJD $. nsm nsm 3sis −− −− nsf nsf −− −− SIMPX 513 − − NF 512 ON SIMPX 511 − − − MF 510 OA ON MF 507 PRED APP ADJX 500 HD Fein , 0 MOD NX 508 C 501 NX 502 − HD daß sich APP NX 503 − HD NX 504 HD die 3 VC 509 HD Achse 4 ADJX 505 HD Hamburg−Köln 5 VXFIN 506 HD langsam 6 festigt 2 ADJD $, KOUS PRF ART NN NE ADJD VVFIN $. −− −− −− as*3 nsf nsf nsn −− 3sis −− 111 7 . 1 8 9 Chapter 7 The Annotation of Specific Syntactic Phenomena 7.1 Superlative and Comparative Forms 7.1.1 Superlative Forms The particle am, which occurs as a particle with an adjective or an adverb in superlative constructions, is tagged as PTKA. Both, the particle and the adjective/adverb are attached on the same level forming an adverbial/adjectival phrase: SIMPX 512 − − − VF MF 510 511 OA ON NX FOPP PX 507 − HD NX 501 HD HD vorigen 0 Sonntag 1 ADJX 502 − hätte 2 509 − VXFIN 500 VC 508 HD ADJX Den MOD LK 506 − − − Frank 3 503 − Michael 4 − Nehr 5 OV NX VXINF 504 HD am 6 HD − liebsten 7 aus 8 dem 9 505 HD HD Kalender 10 gestrichen 11 . 12 13 ART ADJA NN VAFIN NE NE NE PTKA ADJD APPR ART NN VVPP $. asm asm asm 3skt nsm nsm nsm −− −− d dsm dsm −− −− 7.1.2 The Comparative Particles wie and als Comparative particles in German are als and wie, in rare cases also denn (e.g. Die werden dort seliger schlummern denn je.). These particles are tagged as KOKOM and occur with all types of syntactic phrases (NX, ADVX, PX, etc.). They are directly attached to an adjacent comparative phrase. In case of a comparative phrase with a postmodifier, they are directly attached to the highest node of the complex phrase. A comparative phrase can occur as an adjacent postmodifier of the head phrase: 112 SIMPX 516 − − − VF 515 MOD SIMPX 513 MF 514 − − ON PRED MF 511 ADJX 512 V−MOD HD ADJX 506 − VC 507 HD ADJX 500 HD HD VXINF 501 VXFIN 502 HD HD HD Rein musikalisch LK 508 gesehen 0 1 NX 510 − NX 503 − ist 2 − ADJX 509 HD das 3 HD − − − ADJX 504 ADJX 505 HD Album HD wesentlich 4 schlanker 5 6 als 7 das 8 erste 9 . 10 11 ADJD ADJD VVPP VAFIN ART NN ADJD ADJD KOKOM ART ADJA $. −− −− −− 3sis nsn nsn −− −− −− nsn nsn −− SIMPX 510 − − − MF 509 ON OA NX 508 HD − NX 505 − C 500 NX 501 − HD daß HD − als HD HD ADJX 503 HD 1 VC 507 − ADJX 502 sie 0 NX 506 − VXFIN 504 HD ehrenamtliche 2 Vorsitzende ein 3 4 HD dienstliches 5 Handy 6 hat 7 8 KOUS PPER KOKOM ADJA NN ART ADJA NN VAFIN −− nsf3 −− nsf nsf asn asn asn 3sis If there is a long-distance dependency between the comparative phrase and the head phrase, the dependency relation is denoted with the respective X-MOD label. R−SIMPX 511 − − − − MF 510 OA C 505 ON − NX 500 HD V−MOD NX 506 PX 507 HD − HD ADVX 501 mehr 1 nach 2 NX 504 HD Bremerhaven 3 OA−MOD VXFIN 503 HD fünfmal 0 NF 509 HD NX 502 HD der VC 508 − liefert 4 HD als 5 Daewoo 6 7 PRELS ADV PIS APPR NE VVFIN KOKOM NE nsm −− *** d dsn 3sis −− ns* In case of a long-distance dependency between the comparative phrase and the main verb (cf. 4.7.9), the comparative phrase is either a complement (e.g. PRED) or an ambiguous or unambiguous modifier of the main verb (MOD or V-MOD). 113 SIMPX 510 − − − VF 509 V−MOD PX 508 − HD NX LK 505 APP HD ON PRED NX NX VXFIN NX NX 501 − HD dem 502 HD Motto 0 507 APP 500 Unter MF 506 " 1 2 503 HD Kino−Extrem 3 " 4 504 − agiert 5 HD der 6 − Regisseur 7 HD als 8 Filmjockey 9 10 APPR ART NN $( NN $( VVFIN ART NN KOKOM NN d dsn dsn −− dsn −− 3sis nsm nsm −− nsm SIMPX 509 − − − VF 508 MOD PX 507 − − HD NX 504 − LK 505 − HD HD ADJX 500 VXFIN 501 HD Wie in 0 den 1 MF 506 HD meisten 2 Musicals 3 PRED NX 502 ADJX 503 − ist 4 ON HD die 5 HD Handlung 6 simpel 7 8 KOKOM APPR ART PIDAT NN VAFIN ART NN ADJD −− d dpn dpn dpn 3sis nsf nsf −− SIMPX 516 − − − − MF 515 ON MOD VF 512 V−MOD SIMPX 513 − PX 507 − LK 508 HD HD NX 500 HD NX 509 − − VXFIN 501 HD − VC 510 HD HD ADJX 502 HD C 503 − VXINF 504 HD KONJ VXINF 505 HD VXINF 506 HD Arsten die neue , wie , nachgearbeitet NE VAFIN ART ADJA NN $, KOUS VVPP $, VVPP KON VVPP $. d dsn 3sis nsf nsf nsf −− −− −− −− −− −− −− −− 2 3 4 5 6 7 geplant VXINF 511 − In 1 Strecke KONJ APPR 0 wird VC 514 OV 8 9 10 und 11 begrünt 12 . 13 The high attachment principle applies when the comparative particle has scope over a coordination of phrases (cf. 6.4.5). In this case, the two conjuncts are coordinated first. Then the particle is attached on a higher level. 114 NX 503 − HD NX 502 KONJ − KONJ NX NX 500 − wie Pete 0 501 − − Sampras 1 oder 2 − Yewgeni 3 Kafelnikov 4 5 KOKOM NE NE KON NE NE −− dsm dsm −− dsm dsm 7.2 Verbal and Adjectival Use of Participles In German, verbal participles which are passive verb forms (Der Mensch wird angesehen) can be used as adjectives: it can either function as an attribute adjective (der angesehene Mensch) or - depending on the context - also as a predicative adjective (der Mensch ist angesehen.). In contrast to the auxiliary werden in verbal passives, the auxiliary sein is used in constructions with adjectival passives. Concerning the problematic distinction between verbal and adjectival passives, we adapted the criteria in the Stuttgart-Tübingen tag set (STTS) (Schiller et al. 1995).1 1. Can the sentence be transformed into active form keeping the same semantics? If yes → VVPP 2. Is there a von-PP or an equivalent PP that gives evidence for verb semantics? If yes → VVPP 3. Is it possible to substitute the word in questions by a semantically similar adjective? If yes → ADJD The following two tree structures show the annotation of the verbal and adjectival passives of the verbal participle angesehen. In the first example, the verbal participle is analysed as a VVPP in VC. In the second example, the verbal participle has an adjectival reading and is annotated as an ADJD in MF. 1 Concerning the differences between verbal and adjectival passives in English cf. Bresnan (1995). 115 SIMPX 509 − − − MF 508 MOD ON − C 500 − daß ADVX 501 HD VC 507 HD ADJX 502 HD ADJX 503 − ein anderes ADJA NN KOKOM ADJD VVPP VAFIN −− −− nsn nsn nsn −− −− −− 3sit − − 3 4 5 gültig VXFIN 505 HD ART 2 als HD VXINF 504 HD ADV 1 Argument HD OV KOUS 0 auch PRED NX 506 − 6 angesehen wurde 7 8 SIMPX 517 − − NF 516 ON SIMPX 515 − − − MF 514 ON PRED ADJX 513 − VF 507 ON−MOD NX 500 HD LK 508 HD MF 509 PRED VXFIN 501 HD ADJX 502 HD NX 510 − C 503 − HD − ADJX 504 HD VC 512 HD HD ADVX 505 HD Es ist schade , daß so wenig VAFIN ADJD $, KOUS ADJA NN ADV ADV ADJD VAFIN nsn3 3sis −− −− −− npf npf −− −− −− 3pis 7.3 1 2 3 4 5 Leistungen VXFIN 506 HD PPER 0 akademische HD ADVX 511 6 7 8 angesehen 9 sind 10 Topicalization Topicalization is almost exclusively found in verb-second clauses. Consequently, the subject is not in the first position of the clause. Topicalized constructions bring about word order phenomena which differ from those occurring in MF, e.g., non-finite parts of VC are not allowed in MF. Our annotation principles demand to analyse the topicalized verb complex and its non-finite parts as VC in the first position of the clause. VC is then attached to VF. If a part of MF is topicalized along with VC, first MF and VC are combined to form FKONJ before they are attached to VF: 116 SIMPX 509 − − − VF 507 MF 508 − ON VC 504 LK 505 OV HD VXINF 500 VXFIN 501 HD HD Geplant V−MOD PX 506 − NX 502 NX 503 − war 0 HD HD der 1 HD Papst−Besuch seit 2 3 langem . 4 5 6 VVPP VAFIN ART NN APPR NN $. −− 3sit nsm nsm d dsn −− SIMPX 517 − − − VF 515 MF 516 − ON V−MOD FKONJ 513 PX 514 − − − HD MF 511 FOPP NX 512 PRED HD PX 507 − HD NX 500 ADJX 501 HD Auf HD Pioneer 0 aufmerksam 1 VC 508 LK 509 OV HD VXINF 502 VXFIN 503 HD HD geworden 2 NX 510 − NX 504 − war 3 − HD der 4 NX 505 HD BUND 5 durch 6 HD Informationen 7 HD ADJX 506 französischer 8 Bauern 9 . 10 11 APPR NE ADJD VAPP VAFIN ART NE APPR NN ADJA NN $. a asm −− −− 3sit nsm nsm a apf gpm gpm −− 7.4 Headlines The syntax of headlines differs from other syntactic constructions in so far as headlines2 often lack the finite verb or a verb at all. If a headline has only an infinitive, the case assignment follows the preference principle formulated in 5.2. Therefore, we assume in general the more plausible grammatical function in each case: a passive constructions with ON in MF if the verb in VC is a past participle and an active construction with OA in MF if the verb in VC is an infinitive. 2 The identifier “HEADLINE” is inserted into the comment line above the sentence for each syntactic unit which is marked as a headline in the original data. 117 SIMPX 507 − − MF 506 ON V−MOD NX 503 PX 504 − HD − VC 505 HD ADJX 500 HD NX 501 HD VXINF 502 HD 20 Dissidenten 0 in 1 HD China 2 festgenommen 3 4 CARD NN APPR NE VVPP −− npm d dsn −− SIMPX 504 − − MF 502 VC 503 OA HD NX 500 VXINF 501 HD HD WBM−Chefs ablösen 0 1 NN VVINF apm −− A headline can also consist of an elliptical sentence (cf. 6.5): SIMPX 506 − − MF 505 PRED VF ADJX 507 503 ON − NX ADJX 500 HD 501 HD HD Welthandelsorganisationen¦ vollkommen 0 kopflos 1 2 NN ADJD ADJD nsf −− −− Headlines can also consist of more than one syntactic structure, for instance, separated by a colon or a dash (cf. 4.7.2 and 5.2): 118 SIMPX 505 − ON HD NX 500 NX 501 HD : 0 7.5 LK 504 HD Rechtschreibreform − VF 503 VXFIN 502 HD Gegner 1 klagen 2 3 NN $. NN VVFIN nsf −− npm 3pis Discourse Markers Generally, discourse markers are expressions or phrases of greeting, apologizing, thanking, short emotional utterances, and interjections. Their node label is DM. The edge label of a discourse marker is empty, i.e., it does not have a head. Typical discourse markers are: ja, nein, hallo, oh, aha, pst, nunja, gewiß, toll, nun ja, etc. In most cases, discourse markers occur as isolated expressions. Interjections, tagged as ITJ, are directly projected to DM without internal structure. The same applies for answer particles (PTKANT): DM 500 − Oh 0 ITJ −− DM 500 − ja 0 PTKANT −− Phrases which function as discourse markers are first projected to their phrase level before they are assigned the node label DM. DM 501 − ADVX 500 HD gewiß 0 ADV −− 119 DM 501 − NX 500 − HD Keine Ahnung 0 1 PIAT NN asf asf DM 502 − NX 501 − HD ADJX 500 HD Liebe tazzen 0 1 ADJA NN np* np* Isolated conjunctions and foreign language discourse markers are tagged according to their part of speech (KON and FM) and are projected to DM: DM 500 − Und 0 KON −− DM 500 − pardon 0 FM −− Discourse markers may also consist of an interjection or an answer particle and a phrase: DM 501 − − ADVX 500 HD Nun ja 0 . 1 2 ADV PTKANT $. −− −− −− In some cases, discourse markers have a grammatical function within a phrase or a clause. Therefore, they are attached to the syntactic structure: 120 PX 503 − HD NX 502 APP APP NX 500 DM 501 − mit HD den − Worten 0 1 " 2 − aha 3 , 4 − aha 5 , 6 aha 7 8 APPR ART NN $( ITJ $, ITJ $, ITJ d dpn dpn −− −− −− −− −− −− NX 502 HD − NCX DM 500 501 HD − Welt ( 0 − Oh 1 , 2 no 3 ! 4 ) 5 6 NN $( ITJ $, FM $. $( dsf −− −− −− −− −− −− 7.6 Parentheses Parentheses occur as interjective utterances within a sentence. Since there is no dependency relation between the parenthesis and the rest of the construction, the parenthesis is not attached to the surrounding constituents. Often parentheses occur as SIMPX-clauses. Insertions like sagte Mehmet Scholl into direct speech are also annotated as parenthesis.3 SIMPX 515 − − − − VF 514 ON NX 513 HD − NX 508 − − NX 500 − HD HD Vielzahl 0 der 1 HD betroffenen 2 − VXFIN 502 Mieter 3 PX 503 HD daher 5 , 6 VC 512 MOD HD ADVX 504 HD sei 4 MF 511 NX 510 V−MOD HD ADJX 501 HD Eine LK 509 OV NX 505 HD so Bertermann , VXINF 507 HD HD bereits 10 ausgezogen 11 . 7 8 ART NN ART ADJA NN VAFIN PROP $, ADV NE $, ADV VVPP $. nsf nsf gpm gpm gpm 3sks −− −− −− nsm −− −− −− −− 3 9 ADVX 506 12 On the TüBa-D/Z web page (http://www.sfs.uni-tuebingen.de/en/tuebadz.shtml), the treebank is also available in the Penn Treebank format. For this format, parentheses are attached to the tree structure with the edge label PAR. For further details about the Penn Treebank format cf. 9. 121 13 − − VF 509 ON ON NCX 501 − HD HD Kuratorium 0 , 1 das 2 − − VF 510 NCX 500 Ein SIMPX 516 SIMPX 515 LK 511 MF 512 HD MOD VXFIN 502 ADVX 503 HD HD ist 3 − − PRED − der OA VXFIN 505 HD 5 MF 514 HD NCX 504 wohl 4 LK 513 NCX 506 HD Gedanke 6 , 7 MOD PRED ADVX 507 ADJX 508 HD macht 8 HD sich 9 HD immer 10 gut 11 . 12 13 ART NN $, PDS VAFIN ADV ART NN $, VVFIN PRF ADV ADJD $. nsn nsn −− nsn 3sis −− nsm nsm −− 3sis as*3 −− −− −− SIMPX 512 SIMPX 511 − − − VF LK 506 MF 507 LK 508 MF 509 510 PRED HD ON HD ON MOD ADJX VXFIN NX VXFIN NX ADVX 500 501 HD " − − HD Schön 0 502 " 1 , − sagte − Mehmet 4 503 HD Scholl 5 , 6 " 504 HD ist das nicht 10 " 11 12 . 2 3 7 8 $( ADJD $( $, VVFIN NE NE $, $( VAFIN PDS PTKNEG $( $. −− −− −− −− 3sit nsm nsm −− −− 3sis nsn −− −− −− 122 9 505 HD 13 Chapter 8 Criteria for the Distinction of Grammatical Functions 8.1 Subcategorization of Verbs The TüBa-D/Z-Verblist document1 lists all verbs occurring in the treebank with their specific subcategorization frames. This reference list guarantees the consistent annotation of grammatical functions. For a detailed description of constructing the verb list see (Hinrichs and Telljohann 2009). 8.2 Subcategorization of PREDs Since constituents which predicates subcategorize for have grammatical function within a sentence, they are neither marked as PRED-MOD nor attached to the predicate itself. These constituents are attached to a field and assigned the respective grammatical function like the constituent which is marked as FOPP in the following examples: 1 In case of interest, please refer to web page (http://www.sfs.uni-tuebingen.de/en/tuebadz. shtml) for contact information. 123 SIMPX 512 − − − VF 511 FOPP PX 509 MF 510 − HD ON NX 506 LK 507 HD − NX 500 − Für HD den 0 NX 501 − Erfolg 1 HD des 2 Volksbegehren 3 NX 508 HD − VXFIN 502 ADVX 503 HD HD sind 4 PRED − HD ADJX 504 ADJX 505 HD etwa 5 HD 243.000 6 Unterschriften 7 erforderlich . 8 9 10 APPR ART NN ART NN VAFIN ADV CARD NN ADJD $. a asm asm gsn gsn 3pis −− −− npf −− −− SIMPX 511 − − − VF 509 MF 510 ON OA NX 505 − LK 506 − HD Brüder 1 − sich 3 − NX 503 HD fühlen 2 HD NX 502 HD älteren 0 ADJX 508 − VXFIN 501 HD PRED PX 507 HD ADJX 500 Die OPP für 4 ADVX 504 HD ihre 5 HD HD Schwestern 6 sehr 7 verantwortlich 8 . 9 10 ART ADJA NN VVFIN PRF APPR PPOSAT NN ADV ADJD $. npm npm npm 3pis ap*3 a apf apf −− −− −− 8.3 Distinction of FOPP, OPP, and V-MOD One of the major problems is to distinguish, whether a given PP is an obligatory (OPP) or an optional (FOPP) complement of a specific verb in a specific reading, or whether it is a free adjunct (V-MOD) of that verb. The TüBa-D/Z-Verblist is intended as a reference for these problematic cases. In the following, we will briefly describe what criteria have been used in order to decide about the subcategorization with respect to PP complements/modifiers: 1. A PP is called OPP within a sentence if the sentence were ungrammatical without the OPP (or if there was at least a very noticeable change of meaning). For instance, Sie gehen [OPP gegen die Faschisten] vor./ Das Gesetz ist [OPP in Kraft] getreten. 2. A PP is called FOPP if it can be left out of this specific sentence without causing ungrammaticality (or a very noticeable change of meaning) and if its preposition is selected by this specific verb. For instance, Insgesamt berichtet die Polizei [FOPP von 19 Festnahmen und 98 Ingewahrsamnahmen]./Später würden wir [FOPP über 124 Auswandern] nachdenken. Here, the prepositions select these specific verbs and the PPs cannot be added to any arbitrary verb (which is possible for free adjuncts). In addition, in passive clauses, the subject of the original active clause, which has the form of a prepositional phrase, is marked as FOPP (Sie wurden [FOPP von Autonomen] umringt.). 3. A PP is called V-MOD if its preposition is not selected by this specific verb, i.e., it can be exchanged by any other modifying PP, and similarly, this PP can occur with arbitrary verbs (Nur [V-MOD im griechischen Lager] gab es Probleme). Typical V-MODs are temporal or local adjuncts specifying time and location of the action, event, or state expressed by the verb. 8.4 Distinction of MOD, MOD-MOD, and V-MOD A typical case of modification of modifiers is a temporal expression (V-MOD) that further specifies another temporal expression (MOD-MOD) in the same clause: 1. [V-MOD am Samstag] finden [MOD-MOD ab 16 Uhr] Führungen statt. [MOD-MOD Wann] finden [V-MOD am Samstag] Führungen statt? 2. [MOD da] finden [V-MOD am Samstag] Führungen statt. [V-MOD wann] finden [MOD da] Führungen statt? [MOD dann] finden [V-MOD am Samstag] Führungen statt. da, dann, etc. can be either temporal, causal, consequential, or local expressions. Thus, one cannot make sure whether the following time expression am Samstag really refers to them. The only obvious observation is that the time expression is a V-MOD in any case. For resumptive constructions (LV), there is also a clear criterion concerning the modification relations. Within a verb-second clause, a modifier occurring in VF is MOD/XMOD, whereas the modifier in LV is MOD-MOD, not vice versa, because the modifier in VF occurs within the core of the sentence, whereas the modifier in LV has to be licensed by some other constituent in the core sentence, e.g. Wenn da was gebucht worden ist, dann ist das nicht in Ordnung. (cf. 6.1.5). 8.5 Distinction of ON, PRED, ON-MOD, and PRED-MOD It is not always trivial to distinguish which constituent is ON, PRED, or ON-MOD for predicative verbs. For this reason, a few criteria and examples are listed here that can be of help. Here are some properties of ON and PRED: 1. Typically, PRED occurs in MF, whereas ON occurs in VF of verb-second clauses. This should be considered for annotation, if no other criterion (as described below) applies. 125 2. Subject-verb agreement always has to be taken into account. For instance, if the verb is in plural form, the subject has to be plural as well. 3. If there is a suitable NP that could serve as subject, then this NP is annotated as subject rather than any other constituent with a different syntactic category (PP, ADVP, etc.). For verb-second clauses, it is important to follow these two steps in exactly this order to stick to the distributional criterion that has been chosen for the PRED/ON distinction: 1. Have a look at the constituent in VF. If it is an NP which might serve as subject and if it agrees with the verb, annotate it as ON. 2. If it does not agree with the verb, annotate it as PRED (ADJP, ADVP, PP, etc.). Examples: 1. [ON neue Wortschöpfungen] sind [PRED es] nur. [PRED es] sind nur [ON neue Wortschöpfungen]. oder sind [PRED es] nur [ON neue Wortschöpfungen]. [PRED das] sind ohnehin [ON die schwächsten Partner]. [ON die schwächsten Partner] sind [PRED das] ohnehin. oder sind [PRED das] ohnehin [ON die schwächsten Partner]. Subject-verb agreement suggests that neue Wortschöpfungen und die schwächsten Partner are the subject, because of their plural form regardless in which field they occur. 2. [ON die Ursache] war [PRED unklar]. [PRED unklar] war [ON die Ursache] [ON Candan Ercettin] ist [PRED überall]. [PRED überall] ist [ON Candan Ercettin]. ADJPs and ADVPs typically have PRED function when occurring together with predicative verbs and NP subjects. 3. [PRED aus den Trauernden] wird [ON ein wütender Mop]. ein wütender Mop is considered the subject, because it is a noun phrase. Therefore, the prepositional phrase is PRED. 4. [ON [ON [ON [ON das] ist [PRED eine einmalige Chance]. eine einmalige Chance] ist [PRED das]. es] ist [PRED der erste Besuch eines Papstes]. der erste Besuch eines Papstes] ist [PRED es]. 126 [ON Hauptauftraggeber] ist [PRED die Bremer Verwaltung]. [ON die Bremer Verwaltung] ist [PRED Hauptauftraggeber]. The NP in VF position agrees with the verb and therefore has subject priority. As a consequence, the constituent in MF is PRED. 5. [PRED wer] bin [ON ich]. [PRED was] ist [ON das]. In w-questions, the interrogative pronoun is always PRED because here also the agreement rule applies. 6. [ON-MOD es] sei [PRED wichtig], [ON daß man ... ]. [ON Aufgabe des Festspielhauses] sei [PRED-MOD es], [PRED das Haus spielfertig zu halten]. If a sentential subject or a sentential predicate occurs with an expletive es, the expletive es is either ON-MOD or PRED-MOD (cf. 4.2.10). 127 Chapter 9 The TüBa-D/Z Data Formats The TüBa-D/Z treebank is released in five different data formats : 1. the NEGRA Export format 2. the Penn Treebank format 3. the Export-XML format (incl. anaphora and coreference relations) 4. the TIGER-XML format 5. the CoNLL format 9.1 The NEGRA Export Format This format is provided by the annotation tool Annotate (Brants and Skut 1998), it is created automatically from the database underlying the annotation process in Annotate. The NEGRA Export format is a line-oriented pointer-based representation of the syntactic annotation. It is also the most complete data format since it preserves all the information available during the manual annotation. A more complete description of the negra Export format can be found in (Brants 1997). There are two versions of this format; NEGRA Export format 3 contains 3 layers of token information (token, POS tags, morphology), lemmata are displayed in the ‘comment’ column; NEGRA Export format 4 contains a 4th layer of lemma annotation. An example of the NEGRA Export format 4 is given below, combined with the graphical representation of the syntactic annotation for the sentence ”Vikare müssen sich nach dem Kandidatengetz so verhalten, wie es von einem künftigen Pfarrer erwartet werden kann”. 128 Graphical representation (print out of the annotate tool): SIMPX 523 − − − − − NF 522 PRED−MOD SIMPX 521 − − − MF 520 ON FOPP MF PX 518 OA VF LK 512 PRED PX 513 ON 519 V−MOD HD VC 514 HD − NX 515 − HD VC 516 OV − 517 − HD OV OV refvc NX VXFIN 500 NX 501 HD 502 HD Vikare − sich 1 ADVX 503 HD müssen 0 NX nach 2 HD dem 3 504 HD Kandidatengetz 4 C 505 − verhalten 6 NX 506 HD so 5 VXINF , 507 508 HD wie VXINF 509 HD es von einem 11 Pfarrer 13 511 HD erwartet 14 VXFIN 510 HD künftigen 12 HD 509 VXINF HD werden 15 kann 16 . 8 9 10 NN VMFIN PRF APPR ART NN ADV VVINF $, KOUS PPER APPR ART ADJA NN VVPP VAINF VMFIN $. npm 3pis ap*3 d dsn dsn −− −− −− −− nsn3 d dsm dsm dsm −− −− 3sis −− LM=Vikar LM=müssen%aux LM=#refl LM=nach LM=verhalten LM=, LM=wie LM=es LM=von LM=ein LM=künftig LM=Pfarrer LM=erwarten LM=werden%passiv LM=können%aux LM=. LM=dasLM=Kandidatengesetz Kandidatengesetz¦ LM=so 7 ADJX 17 18 The first line of the sentence representation (marked as ’begin of sentence’ (BOS) includes the sentence id (here: 24538), the identity of the last annotator (here the one with id 2), the time of the last modification (in UNIX format, i.e. seconds since 1/1/1970) and the id of the origin of the file (1146 points to article 155 of the edition of 11/7/1992). In the right column, secondary edges (here: ’refvc’ pointing from node # 518 to node # 517, a dependency within the verbal complex) as well as corrections of misspellings (here: ’Kandidatengesetz’) are also represented. Optionally, there is a version of NEGRA Export format 4 that contains anaphoric relations (here: ’es’ is marked as expletive). 129 Export format 4: #BOS 24538 2 1134150923 1146 Vikare Vikar müssen müssen%aux sich #refl nach nach dem das Kandidatengetz Kandidatengesetz so so verhalten verhalten , , wie wie es es von von einem ein künftigen künftig Pfarrer Pfarrer erwartet erwarten werden werden%passiv kann können%aux . . #500 -#501 -#502 -#503 -#504 -#505 -#506 -#507 -#508 -#509 -#510 -#511 -#512 -#513 -#514 -#515 -#516 -#517 -#518 -#519 -#520 -#521 -#522 -#523 -#EOS 24538 NN VMFIN PRF APPR ART NN ADV VVINF $, KOUS PPER APPR ART ADJA NN VVPP VAINF VMFIN $. NX VF VXFIN LK NX NX PX ADVX MF VXINF VC C NX ADJX NX PX MF VXINF VXINF VXFIN VC SIMPX NF SIMPX npm 3pis ap*3 d dsn dsn ----nsn3 d dsm dsm dsm --3sis -------------------------- HD 500 HD 502 HD 504 506 505 HD 505 HD 507 HD 509 -0 511 HD 512 515 514 HD 513 HD 514 HD 517 HD 518 HD 519 -0 ON 501 523 HD 503 523 OA 508 HD 506 V-MOD 508 PRED 508 523 OV 510 523 521 ON 516 514 HD 515 FOPP 516 521 OV 520 OV 520 HD 520 521 PRED-MOD 523 -0 %% Kandidatengesetz %% R=expletive refvc 517 522 The only deviation from context-freeness which the annotation scheme allows concerns the annotation of parentheses. Parentheses are annotated as separate trees with no attachment to surrounding trees. The following tree gives an example for such a phenomenon (for a more complete description of the annotation cf. 7.6). 130 Graphical representation: SIMPX 517 − − VF 514 515 − NX LK MF 509 etwas 0 , 1 sagen 2 511 die 3 NX 503 HD , 5 HD hätten 6 ADVX 504 HD Abgeordneten 4 VC 512 − VXFIN 502 − MOD ADVX HD NX 501 HD MOD LK ON VXFIN 500 HD ON 510 HD ADVX So 516 − HD − MF OA 508 − − SIMPX VXINF 506 507 HD auch 8 OV ADVX 505 HD sie 7 513 HD HD noch 9 nicht 10 erlebt 11 . 12 13 ADV PIS $, VVFIN ART NN $, VAFIN PPER ADV ADV PTKNEG VVPP $. −− *** −− 3pis np* np* −− 3pkt np*3 −− −− −− −− −− LM=so LM=etwas LM=, LM=sagen LM=haben%aux LM=sie LM=auch LM=noch LM=nicht LM=erleben LM=. LM=der|die|dasLM=Abgeordneter|Abgeordnete|Abgeordnetes¦LM=, The pointer-based representation of the NEGRA Export format separates information about the linear precedence of words from attachment information so that parentheses can be represented naturally without having to resort to explicitly marking non-attached nodes. Here, the SIMPX node dominating the parenthesis (node #517) is marked as not having a mother node. Export format 3: #BOS 7307 5 1121695339 So etwas , sagen die Abgeordneten , hätten sie auch noch nicht erlebt . #500 #501 #502 #503 #504 #505 #506 #507 #508 #509 #510 #511 #512 #513 #514 #515 #516 #517 #EOS 7307 364 ADV PIS $, VVFIN ART NN $, VAFIN PPER ADV ADV PTKNEG VVPP $. ADVX NX VF VXFIN LK NX ADVX ADVX ADVX MF VXINF VC SIMPX VXFIN LK NX MF SIMPX -*** -3pis np* np* -3pkt np*3 ------------------------ HD HD -HD HD -HD HD HD HD HD HD -OA HD ON MOD MOD OV -HD ON -- 500 501 0 513 515 515 0 503 505 506 507 508 510 0 501 502 512 504 512 509 509 508 509 512 511 512 0 514 517 516 517 0 131 %% %% %% %% %% %% %% %% %% %% %% %% %% %% LM=so LM=etwas LM=, LM=sagen LM=der|die|das LM=Abgeordneter|Abgeordnete|Abgeordnetes LM=, LM=haben%aux LM=sie LM=auch LM=noch LM=nicht LM=erleben LM=. 9.2 The Penn Treebank Format This format is based on the format of the Penn Treebank (Mitchell et al. 1993). The attachment of constituents is shown via bracketing and indentation. Thus, all constituents which show the same level of indentation are attached on the same level. In the Penn Treebank format, grammatical functions, which are shown in the NEGRA Export format in the column ”edge label”, are attached to the syntactic label via a colon. Thus, the label ”NX:OA” means that the constituent is a noun phrase with the grammatical function accusative object. The Penn Treebank format is a representation that combines the linear representation of words with their attachment to higher constituents. For this reason, this format is restricted to completely context-free tree structures, i.e. it cannot adequately represent the annotation of parentheses in TüBa-D/Z. In order to capture the original syntactic annotation as well as the original word order in the sentence, it was decided to introduce a new edge label to mark such cases: PAR. Thus, the sentence ”So etwas , sagen die Abgeordneten , hätten sie auch noch nicht erlebt .”, as shown above is represented in the Penn Treebank format by the following bracketed structure: Comments are preceded by a double ’%’ sign. The comment behind the structure is intended to help the reader locate the beginning of the parenthesis and it is not part of the actual data. 132 %% sent. no. 7307 ( (SIMPX (VF (NX:OA (ADVX (ADV:HD So) ) (PIS:HD etwas) ) ) ($, ,) (SIMPX:PAR (LK (VXFIN:HD (VVFIN:HD sagen) ) ) (MF (NX:ON (ART die) (NN:HD Abgeordneten) ) ) ) ($, ,) (LK (VXFIN:HD (VAFIN:HD hätten) ) ) (MF (NX:ON (PPER:HD sie) ) (ADVX:MOD (ADV:HD auch) ) (ADVX:MOD (ADVX (ADV:HD noch) ) (PTKNEG:HD nicht) ) ) (VC (VXINF:OV (VVPP:HD erlebt) ) ) ) ($. .) ) %% here starts the parenthesis! 133 Commas, which are not attached to the tree, are indented on the highest level although they are included in the bracketing of the constituent surrounding them. In the sentence below, e.g., the first comma is grouped into the noun phrase NX via word order. The indentation, however, signals that the comma cannot necessarily be attached to this node. It is also conceivable that it may be attached to one of the lower nodes, NX or R-SIMPX. In the case of the second comma, there are even more possible attachment sites. %% fragment of sent. no. 33 ( (R-SIMPX (C (NX:ON (PRELS:HD die) ) ) (MF (NX:OA (NX=ORG:HD (ART:-NE die) (NN:HD AWO) ) ($, ,) (R-SIMPX (C (PX:V-MOD (PWAV:HD wo) ) ) (MF (NX:ON (PPER:HD er) ) (NX:PRED (NN:HD Kreisvorsitzender) ) ) (VC (VXFIN:HD (VAFIN:HD ist) ) ) ) ) ) ($, ,) (VC (VXFIN:HD (VVFIN:HD prüfte) ) ) ) ($. .)) 134 9.3 The Export-XML Format The XML format is a custom-made XML format that follows the NEGRA Export file format. It is designed to accommodate all original information provided in the Export format, including e.g. comments . Dominance relations between nodes are represented directly within the XML tree structure. Nodes without parent node (e.g. the sentence node SIMPX and punctuation marks do not have any ”parent” attribute. They have the edge label ”- -” that can be linked to an implicit root node. Thus, it is possible to represent parentheses without the use of additional labels. Anaphora is expressed by a link between two related nodes. Coreference sets therefore are represented implicitly by chains of nodes that are part of a referential relation. The following example shows the XML structure for the sentence “Schillen erklärte, sie werde als Kriegsgegnerin kandidieren”. The personal pronoun “sie” is anaphoric to the antecedent noun phrase “Schillen”. In the XML document, a <relation> tag is added below each node that is part of a referential relation. It encodes the type of referential relation and the node ID of the antecedent node. In our example, the antecedent is the node with ID s1723 500, that is the NX dominating the named entity “Schillen”. This NX in turn is in a coreferential relationship with node s1721 11 (word number 11 in sentence 1721), thus part of a coreference chain. Please note that the format of Export-XML has changed from release 6 to release 7 of the TüBa-D/Z. There are XMLPerl skripts available for the old Export-XML format and a Java library of the new Export-XML format. Graphical representation of the tree without annotation of the referential relation: SIMPX 514 − − − NF 513 OS SIMPX 512 − VF LK 506 VF 507 ON HD NX VXFIN 500 Schillen NX VXFIN HD erklärte 0 NE HD , 1 VVFIN 2 $, sie 511 OV NX 503 VXINF 504 − werde 3 VC 510 PRED HD PPER − MF 509 ON 502 HD − LK 508 501 HD − als 4 VAFIN 505 HD HD Kriegsgegnerin 5 KOKOM kandidieren 6 NN . 7 VVINF 8 $. nsf 3sit −− nsf3 3sks −− nsf −− −− LM=Schillen LM=erklären LM=, LM=sie LM=werden%aux LM=als LM=Kriegsgegnerin LM=kandidieren LM=. 135 XML format including the referential relation: <sentence xml:id="s1723"> <node xml:id="s1723_514" cat="SIMPX" func="--"> <node xml:id="s1723_501" cat="VF" func="-" parent="s1723_514"> <node xml:id="s1723_500" cat="NX" func="ON" parent="s1723_501"> <relation type="coreferential" target="s1721_11"/> <ne xml:id="ne_29507" type="PER"> <word xml:id="s1723_1" form="Schillen" pos="NE" morph="nsf" lemma="Schillen" func="HD" parent="s1723_500"/> </ne> </node> </node> <node xml:id="s1723_503" cat="LK" func="-" parent="s1723_514"> <node xml:id="s1723_502" cat="VXFIN" func="HD" parent="s1723_503"> <word xml:id="s1723_2" form="erklärte" pos="VVFIN" morph="3sit" lemma="erklären" func="HD" parent="s1723_502"/> </node> </node> <word xml:id="s1723_3" form="," pos="$," lemma="," func="--"/> <node xml:id="s1723_513" cat="NF" func="-" parent="s1723_514"> <node xml:id="s1723_512" cat="SIMPX" func="OS" parent="s1723_513"> <node xml:id="s1723_505" cat="VF" func="-" parent="s1723_512"> <node xml:id="s1723_504" cat="NX" func="ON" parent="s1723_505"> <relation type="anaphoric" target="s1723_500"/> <word xml:id="s1723_4" form="sie" pos="PPER" morph="nsf3" lemma="sie" func="HD" parent="s1723_504"/> </node> </node> <node xml:id="s1723_507" cat="LK" func="-" parent="s1723_512"> <node xml:id="s1723_506" cat="VXFIN" func="HD" parent="s1723_507"> <word xml:id="s1723_5" form="werde" pos="VAFIN" morph="3sks" lemma="werden%aux" func="HD" parent="s1723_506"/> </node> </node> <node xml:id="s1723_509" cat="MF" func="-" parent="s1723_512"> <node xml:id="s1723_508" cat="NX" func="PRED" parent="s1723_509"> <word xml:id="s1723_6" form="als" pos="KOKOM" lemma="als" func="-" parent="s1723_508"/> <word xml:id="s1723_7" form="Kriegsgegnerin" pos="NN" morph="nsf" lemma="Kriegsgegnerin" func="HD" parent="s1723_508"/> </node> </node> <node xml:id="s1723_511" cat="VC" func="-" parent="s1723_512"> <node xml:id="s1723_510" cat="VXINF" func="OV" parent="s1723_511"> <word xml:id="s1723_8" form="kandidieren" pos="VVINF" lemma="kandidieren" func="HD" parent="s1723_510"/> </node> </node> </node> </node> </node> <word xml:id="s1723_9" form="." pos="$." lemma="." func="--"/> </sentence> 136 9.4 The TIGER-XML Format The TIGER-XML format was designed by Lezius (2002) as interchange format for the TIGERSeach treebank query tool. The corpus body contains sentences ”s” containing one or several “graphs”. Graphs are annotated stand-off: A list of terminal nodes is followed by nonterminals with their edges. TIGER-XML doesn’t include the full set of annotations. POS, morphology, and lemmas are included; corrected word forms and anaphora are missing. A detailed description is provided in chapter V “The TIGER-XML treebank encoding format” of the TIGERSearch User’s Manual (König et al. 2003). 137 Sentence 1723 in TIGER-XML format: <s id="s1723"> <graph root="s1723_514"> <terminals> <t id="s1723_1" word="Schillen" lemma="Schillen" pos="NE" morph="nsf" /> <t id="s1723_2" word="erklärte" lemma="erklären" pos="VVFIN" morph="3sit" /> <t id="s1723_3" word="," lemma="," pos="$," morph="--" /> <t id="s1723_4" word="sie" lemma="sie" pos="PPER" morph="nsf3" /> <t id="s1723_5" word="werde" lemma="werden%aux" pos="VAFIN" morph="3sks" /> <t id="s1723_6" word="als" lemma="als" pos="KOKOM" morph="--" /> <t id="s1723_7" word="Kriegsgegnerin" lemma="Kriegsgegnerin" pos="NN" morph="nsf" /> <t id="s1723_8" word="kandidieren" lemma="kandidieren" pos="VVINF" morph="--" /> <t id="s1723_9" word="." lemma="." pos="$." morph="--" /> </terminals> <nonterminals> <nt id="s1723_500" cat="NX=PER"> <edge label="HD" idref="s1723_1" /> </nt> <nt id="s1723_501" cat="VF"> <edge label="ON" idref="s1723_500" /> </nt> <nt id="s1723_502" cat="VXFIN"> <edge label="HD" idref="s1723_2" /> </nt> <nt id="s1723_503" cat="LK"> <edge label="HD" idref="s1723_502" /> </nt> <nt id="s1723_504" cat="NX"> <edge label="HD" idref="s1723_4" /> </nt> <nt id="s1723_505" cat="VF"> <edge label="ON" idref="s1723_504" /> </nt> <nt id="s1723_506" cat="VXFIN"> <edge label="HD" idref="s1723_5" /> </nt> <nt id="s1723_507" cat="LK"> <edge label="HD" idref="s1723_506" /> </nt> <nt id="s1723_508" cat="NX"> <edge label="-" idref="s1723_6" /> <edge label="HD" idref="s1723_7" /> </nt> <nt id="s1723_509" cat="MF"> <edge label="PRED" idref="s1723_508" /> </nt> <nt id="s1723_510" cat="VXINF"> <edge label="HD" idref="s1723_8" /> </nt> <nt id="s1723_511" cat="VC"> <edge label="OV" idref="s1723_510" /> </nt> <nt id="s1723_512" cat="SIMPX"> <edge label="-" idref="s1723_505" /> <edge label="-" idref="s1723_507" /> <edge label="-" idref="s1723_509" /> <edge label="-" idref="s1723_511" /> </nt> <nt id="s1723_513" cat="NF"> <edge label="OS" idref="s1723_512" /> </nt> <nt id="s1723_514" cat="SIMPX"> <edge label="-" idref="s1723_501" /> <edge label="-" idref="s1723_503" /> <edge label="-" idref="s1723_513" /> </nt> </nonterminals> </graph> </s> 138 9.5 The CoNLL Format CoNLL format contains a dependency version of TüBa-D/Z in the format of the CoNLLX shared task. The conversion was done automatically, but is oriented at the annotation guidelines by Foth (2006). CoNLL format is a table format containing a series of tabular-separated lines. Each line contains the following information: 1. ID – a sequential ID for each token 2. FORM – the word form (token) 3. LEMMA – the gold standard lemma of the token 4. CPOSTAG – simplified part-of-speech tag 5. POSTAG – part-of-speech tag according to STTS tag set 6. FEATS – tag with morphological information 7. HEAD – regent of the token in the dependency analysis, or “0” for tokens without regent 8. DEPREL – dependency relation between the token and its regent, or “ROOT” for tokens without regent Sentence 1723 in CoNLL format: 1 2 3 4 5 6 7 8 9 Schillen erklärte , sie werde als Kriegsgegnerin kandidieren . Schillen erklären , sie werden%aux als Kriegsgegnerin kandidieren . N V $, PRO V KOKOM N V $. 139 NE VVFIN $, PPER VAFIN KOKOM NN VVINF $. nsf 3sit -nsf3 3sks -nsf --- 2 0 2 5 2 8 6 5 8 SUBJ ROOT -PUNCTSUBJ S KOM CJ AUX -PUNCT- References Bech, G. 1955–57. Studien über das deutsche Verbum infinitum. Kopenhagen. 2 Bände. 2. unveränderte Auflage 1983 mit einem Vorwort von Catharine Fabricius-Hansen. Tübingen: Max Niemeyer. Behaghel, O. 1932. Deutsche Syntax (Eine geschichtliche Darstellung), Band 4. Heidelberg: Carl Winter. Brants, T., and W. Skut. 1998. Automation of treebank annotation. In Proceedings of the Conference on New Methods in Language Processing (NeMLaP-3/CoNLL98), January 14-17, 1998, Sydney, Australia, 49–57. Sydney. Brants, T. 1997. The NeGra Export Format for Annotated Corpora. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany. Brants, T. 1998. TnT – A Statistical Part-of-Speech Tagger. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany. Bresnan, J. 1995. Lexicality and Argument Structure. In Invited Paper given at the Paris Syntax and Semantics Conference. Paris. October 12-14, 1995. URL: http: //www-csli.stanford.edu/bresnan/download.html. Drach, E. 1937. Grundgedanken der Deutschen Satzlehre. Frankfurt/Main. Drosdowski, G. (Ed.). 1995. Duden ”Die Grammatik der deutschen Gegenwartssprache”. Mannheim, Leipzig, Wien, Zürich: Dudenverlag. Eisenberg, P. 1999–2001. Grundriß der deutschen Grammatik, Band 2: Der Satz. Stuttgart, Weimar: J.B. Metzler. Engel, U. 1996. Deutsche Grammatik. Heidelberg: Julius Groos Verlag. Erdmann, O. 1886. Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Stuttgart: Cotta. Erste Abteilung. Foth, K. A. 2006. Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. Technical report, Fachbereich Informatik der Universität Hamburg. Grewendorf, G. 1991. Aspekte der deutschen Syntax. Vol. 33 of Studien zur deutschen Grammatik. Tübingen: Gunter Narr Verlag. Helbig, G., and J. Buscha. 1998. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: Verl. Enzyklopädie. 18 edition. Herling, S. H. A. 1821. Über die Topik der deutschen Sprache. In Abhandlungen des frankfurterischen Gelehrtenvereins für deutsche Sprache, 296–362, 394. Frankfurt/Main. Drittes Stück. 140 Hinrichs, E. W., and H. Telljohann. 2009. Constructing a Valence Lexicon for a Treebank of German. In Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories (TLT 7): January 23-24, 2009, Groningen, The Netherlands. URL: http://www.let.rug.nl/tlt/. Hinrichs, E. W., J. Bartels, Y. Kawata, V. Kordoni, and H. Telljohann. 2000. The Tübingen Treebanks for Spoken German, English, and Japanese. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer. Höhle, T. N. 1986. Der Begriff ‘Mittelfeld’. Anmerkungen über die Theorie der topologischen Felder. In A. Schöne (Ed.), Kontroversen alte und neue. Akten des 7. Internationalen Germanistenkongresses Göttingen, 329–340. Tübingen: Niemeyer. Kathol, A. 1995. Linearization-Based German Syntax. PhD thesis, Ohio State University. Kiss, T. 1995. Infinitive Komplementation. Neue Studien zum deutschen Verbum infinitum. Tübingen: Max Niemeyer. König, E., W. Lezius, and H. Voormann. 2003. TIGERSearch 2.1 – User’s Manual. URL: http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/ manual.shtml. Kübler, S., and H. Telljohann. 2002. Towards a dependency-based evaluation for partial parsing. In Beyond PARSEVAL – Towards Improved Evaluation Measures for Parsing Systems – (LREC 2002 Workshop), Las Palmas, Gran Canaria, June 2002. Lezius, W. 2002. Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. PhD thesis, IMS, University of Stuttgart. URL: http://www.ims.uni-stuttgart.de/projekte/ TIGER/TIGERSearch/manual_html.html. Mitchell, M., B. Santorini, and M. A. Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330. Naumann, K., and V. Möller. 2007. Manual for the Annotation of in-document Referential Relations. University of Tübingen, May 2007. Plaehn, O. 1998. Annotate – Bedienungsanleitung. FR 8.7 Computerlinguistik, Projekt C3 Nebenläufige Grammatische Verarbeitung, Sonderforschungsbereich 378, Ressourcenadaptive Kognitive Prozesse, 13. April 1998. URL: http://www.coli. uni-saarland.de/projects/sfb378/negra-corpus/annotate.html. Pütz, H. 1986. Über die Syntax der Pronominalform ’es’ im modernen Deutsch. Tübingen: Stauffenburg. 2 edition. Schiller, A., S. Teufel, and C. Thielen. 1995. Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Universities of Stuttgart and Tübingen. URL: http://www.sfs.nphil.uni-tuebingen.de/ELWIS/stts/stts.html. Stegmann, R., H. Telljohann, and E. W. Hinrichs. 2000. Stylebook for the German Treebank in verbmobil. Verbmobil-report 239, University of Tübingen. 141 Telljohann, H., E. W. Hinrichs, S. Kübler, H. Zinsmeister, and K. Beck. 2009. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). University of Tübingen, November 2009. Trushkina, J. 2004. Morpho-syntactic annotation and dependency parsing of German. PhD thesis, University of Tübingen. URL: http://w210.ub.uni-tuebingen.de/ dbt/volltexte/2004/1523). Versley, Y., K. Beck, E. W. Hinrichs, and H. Telljohann. 2010. A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBaD/Z Treebank. In Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT 9): December 3-4, 2010, Tartu, Estonia. URL: http: //www.math.ut.ee/tlt9/. 142 Index accusative object, double, 73 AcI, 72 adverbial adjective, 13, 58, 60 adverbial phrase, 17, 63 ambiguity, 10, 23, 25, 80, 82–85, 103 apposition, 18, 34 attributive adjective, 13, 24, 27, 28, 58, 61 isolated phrase, 20, 21, 84, 85, 107 C-field, 8, 9, 17, 86, 89, 92 cardinal numbers, 13, 32, 48, 49 circumposition, 13, 58 coherency, 71, 72 comparatives, 5, 112 CoNLL format, 128 context-freeness, 4 coordination, 5, 7, 12, 17, 29, 56, 61, 95–99, 101–108, 114 modal verbs, 14, 78 KOORD-field, 8, 9, 17, 88, 95 lassen, 72 levels of annotation, 11 long-distance dependency, 4, 23, 90, 113 longest match principle, 10, 14, 22, 105 Dependency Grammar, 24, 55 determiner phrase, 17, 54 discourse marker, 3, 5, 14, 17, 24, 119, 120 named entities, 4, 12, 19, 33, 41–44, 47 Negra export format, 128 Negra treebank, 2 node labels, 4, 11, 12, 17, 19, 38, 67, 85, 102, 119 nominalized adjective, 60 non-ambiguity, 23, 81 non-words, 14, 50 ordinal numbers, 48 paratactic construction, 17, 105, 107 parenthesis, 3, 5, 14, 121 PARORD-field, 8, 9, 17, 88, 89, 107 part-of-speech tags, 4, 11, 20, 50, 56 particle verb, 75 Penn Treebank format, 128 postmodifier, 10, 25, 31, 33, 35, 40, 44, 46, 63, 83, 93, 94, 112 postnominal modifier, 31, 32 flat clustering principle, 10, 25, 65, 86 postposition, 13, 58 foreign language material, 13, 38, 47 predicate, 75, 76, 123, 127 predicate-argument structure, 3, 9 headline, 3, 5, 14, 67, 80, 81, 117 high attachment principle, 10, 25, 101, 114 predicative adjective, 13, 58, 59, 115 preference principle, 81, 117 premodifier, 25, 28, 44–46, 48, 59, 60, 62, imperative, 14, 73 63, 83, 103 incoherency, 72 prenominal modifier, 27, 29, 31 infinitives with zu, 69, 70, 72 preposition, 13, 24, 26, 55, 56, 124 initial field, 6, 8, 17, 83 edge labels, 4, 9, 11, 12, 18–20, 23, 24, 81, 95 elliptical construction, 5, 11, 14, 95, 109, 111, 118 Ersatzinfinitiv, 8, 17, 67, 68 expletive, 18, 51, 52, 127 export-XML format, 128 143 proper noun, 13, 19, 25, 26, 29, 37, 38, 41, 47 punctuation marks, 14, 16, 34, 83, 84 relative clause, 8, 17, 82, 92–94 relative clause, event-modifying, 94 relative clause, independent, 94 resumptive construction, 7, 8, 17, 90, 125 reusability, 2, 3 secondary edge label, 4, 12, 18, 23, 65, 66, 72, 93 split coordination, 18, 96, 108 superlative forms, 112 syntactic dependencies, 4 syntactic-semantic node labels, 12, 42, 43 TüBa-D/S treebank, 2 TüBa-D/Z data formats, 2, 128 theory-neutrality, 2, 3 TIGER treebank, 2 TigerXML format, 128 topicalization, 5, 116 topological fields, 4–9, 11, 12, 23, 24, 27, 76, 80, 86, 102, 109, 111 truncated word, 14, 99 verb complex, 4, 6–8, 17, 65–67, 69, 71, 72, 75, 93, 116 verb particle, 14, 74 VERBMOBIL treebank, 1, 2, 20 144