GermaNet: Applications and Access Structures

Verena Henrich and Erhard Hinrichs
Eberhard Karls Universität Tübingen
Seminar für Sprachwissenschaft
Berlin, 5 December 2011

Outline of the Talk
• GermaNet: a brief introduction
• Compounds in GermaNet
• Sense alignment of GermaNet and Wiktionary
• Demos: applications and access structures

Introduction to GermaNet

GermaNet: A German Wordnet
• GermaNet is a lexical semantic network covering the German base vocabulary
• Belongs to the family of wordnets modeled after the Princeton WordNet for English
• GermaNet is divided into 3 word categories:
  - Adjectives
  - Nouns
  - Verbs
• Words are ordered according to their meaning

Word Meanings, Lexical Units, and Synsets
• Word meanings are represented by lexical units
• Lexical units are grouped into semantic concepts (synsets) according to their meaning
• A synset is a set of (near-)synonymous words
[Figure: the lexical units gelbe Rübe, Mohrrübe, Karotte, and Möhre linked by synonymy]

Lexical Relations
• Lexical relations hold between two lexical units
  - Synonymy
  - Antonymy
  - Pertainymy
[Figure: examples of antonymy and of pertainymy (Blume, geblümt)]

Conceptual Relations
• Conceptual relations hold between synsets
  - Hypernymy and hyponymy
  - Part-whole relations
  - Entailment
  - Causation
  - Association
[Figure: Ball with Fußball, Tennisball, Volleyball (hypernymy/hyponymy); Fußball and Fußballtor (association)]

Part-Whole Relations
• 4 kinds of part-whole relations
  - Component meronymy
  - Portion meronymy (example: 1g, 1kg)
  - Substance meronymy (example: Schnee, Schneemann)
  - Member meronymy (example: Schiff, Flotte)

Size of GermaNet
Release 6 (April 2011)
• Number of lexical units: 93,407
  - Adjectives: 8,582 lexical units
  - Nouns: 71,844 lexical units
  - Verbs: 12,981 lexical units
• Number of synsets: 69,594
  - Adjectives: 5,991 synsets
  - Nouns: 53,753 synsets
  - Verbs: 9,850 synsets
• Literals: 85,214
• Lexical relations: 3,562
• Conceptual relations: 81,852
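The notions introduced so far (lexical units, synsets, lexical relations between lexical units, conceptual relations between synsets) could be modeled along the following lines. This is a minimal sketch and not the GermaNet API: the class names `LexicalUnit` and `Synset`, their fields, and the method `all_hypernyms` are assumptions made for illustration. Only the example words come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class LexicalUnit:
    """One word meaning. Lexical relations connect lexical units."""
    orth_form: str
    # Relation name -> related lexical units (hypothetical layout).
    lexical_relations: dict[str, list[LexicalUnit]] = field(default_factory=dict)


@dataclass(eq=False)
class Synset:
    """A set of (near-)synonymous lexical units. Conceptual relations connect synsets."""
    lex_units: list[LexicalUnit]
    hypernyms: list[Synset] = field(default_factory=list)

    def all_hypernyms(self) -> list[Synset]:
        """Direct and indirect hypernyms, nearest first (breadth-first)."""
        seen: list[Synset] = []
        frontier = list(self.hypernyms)
        while frontier:
            current = frontier.pop(0)
            if current not in seen:
                seen.append(current)
                frontier.extend(current.hypernyms)
        return seen


# Synonymy: four lexical units share one synset.
carrot = Synset([LexicalUnit(w) for w in ("gelbe Rübe", "Mohrrübe", "Karotte", "Möhre")])

# Conceptual relation (hypernymy) between synsets.
ball = Synset([LexicalUnit("Ball")])
football = Synset([LexicalUnit("Fußball")], hypernyms=[ball])

# Lexical relation (pertainymy) between two lexical units.
blume = LexicalUnit("Blume")
gebluemt = LexicalUnit("geblümt")
gebluemt.lexical_relations.setdefault("pertainymy", []).append(blume)
```

Keeping lexical relations on the lexical unit and conceptual relations on the synset mirrors the distinction drawn on the slides. The direction chosen for the pertainymy link is an assumption.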
[Figure: compound examples: apple + tree = apple tree; rain + bow = rainbow; sun + flower = sunflower; foot + ball = football]

Compounds in GermaNet

Determining Immediate Constituents of Compounds in GermaNet

Introduction: Modeling Compounds in GermaNet
• Goal: systematically link compounds in GermaNet to their constituent parts
• Condition: compound splitting needs to be applied recursively

Kraftfahrzeugsteuer ‘motor vehicle tax’
  c_modifier: Kraftfahrzeug ‘motor vehicle’
    c_modifier: Kraft ‘power’
    c_head: Fahrzeug ‘vehicle’
      c_modifier: fahren ‘to drive’
      c_head: Zeug ‘stuff’
  c_head: Steuer ‘tax’

Compounding in German is Challenging
• Intervening linking elements (n, e, s, ns, ens, es, en):
  Blume ‘flower’ + n + Vase ‘vase’ = Blumenvase ‘flower vase’
  Sieg ‘win’ + es + Wille ‘will’ = Siegeswille ‘will to win’
• Elision of word-final characters:
  Hüfte ‘hip’ – e + Schwung ‘swing’ = Hüftschwung ‘hip swing’

SMOR and ASV Toolbox Compound Splitter
• SMOR is a morphological analyzer for German
• ASV Toolbox Baseforms is a German compound splitter
→ Neither groups immediate constituents
• Modified versions of the SMOR and ASV Toolbox compound splitters
  group immediate constituents

Compound Splitter Incorporating GermaNet (GN-CS)
• Pattern matching for gathering all potential modifiers and heads
• In case more than one potential modifier-head composition is possible: heuristics

Example 1: Kohleprodukt ‘coal product’
• Correct: Kohle ‘coal’ + Produkt ‘product’
• False: Kohl ‘cabbage’ + e (linking element) + Produkt ‘product’
• Matches without linking elements are preferred

Example 2: Flughafengelände ‘airport area’
• Correct: Flughafen ‘airport’ + Gelände ‘area’
• False: Flug ‘flight’ + Hafengelände ‘harbor area’
• No linking elements
• All existing words in GermaNet
• Matches with connected constituents are preferred
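The pattern matching step and the first heuristic above can be sketched as follows. The sketch assumes a tiny hand-made word set in place of the GermaNet lexicon and two hypothetical functions, `candidate_splits` and `prefer_no_linking_element`; the linking elements are the ones shown on the slide about German compounding. It is not the GN-CS implementation, and it does not cover elision (as in Hüftschwung) or the other heuristics.

```python
# Linking elements from the slides; the empty string means "no linking element".
LINKING_ELEMENTS = ("", "n", "e", "s", "ns", "ens", "es", "en")

# Toy stand-in for the GermaNet lexicon (assumption for illustration).
LEXICON = {"Kohle", "Kohl", "Produkt", "Blume", "Vase", "Sieg", "Wille"}


def candidate_splits(compound, lexicon=LEXICON):
    """Collect all (modifier, linking element, head) triples found in the lexicon."""
    candidates = []
    for i in range(1, len(compound)):
        modifier = compound[:i]
        if modifier not in lexicon:
            continue
        rest = compound[i:]
        for link in LINKING_ELEMENTS:
            if not rest.startswith(link):
                continue
            # Noun heads are capitalized in the lexicon.
            head = rest[len(link):].capitalize()
            if head in lexicon:
                candidates.append((modifier, link, head))
    return candidates


def prefer_no_linking_element(candidates):
    """Heuristic from Example 1: matches without linking elements are preferred."""
    without_link = [c for c in candidates if c[1] == ""]
    return without_link or candidates


print(candidate_splits("Kohleprodukt"))
# [('Kohl', 'e', 'Produkt'), ('Kohle', '', 'Produkt')]
print(prefer_no_linking_element(candidate_splits("Kohleprodukt")))
# [('Kohle', '', 'Produkt')]
```

When only analyses with a linking element exist, as for Blumenvase, the heuristic leaves the candidate list unchanged.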
Example 3: Nachttischlampe ‘bedside lamp’
• Correct: Nachttisch ‘bed table’ + Lampe ‘lamp’
• False: Nacht ‘night’ + Tischlampe ‘table lamp’
• No linking elements
• All existing words in GermaNet
• Both heads are direct/indirect hypernyms of the compound
• Longer hypernym distance is preferred
[Figure: hypernym links between Nachttischlampe and the two candidate heads, labeled ‘has hypernym’ and ‘has (indirect) hypernym’]

Majority Voting
Example: Segelflugzeug ‘glider’
• SMOR: Segel + Flugzeug ‘sail + plane’ (1 vote)
• GN-CS: Segel + Flugzeug ‘sail + plane’ (1 vote)
• ASV: Segelflug + Zeug ‘gliding + stuff’ (1 vote)
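The vote above can be tallied with a few lines of code. This is a sketch only: the function name `majority_vote` and the dictionary layout are assumptions, and the three analyses are copied from the slide rather than produced by SMOR, GN-CS, or the ASV Toolbox.

```python
from collections import Counter


def majority_vote(analyses):
    """Return the split proposed by most splitters, together with the full tally."""
    tally = Counter(analyses.values())
    winner, _ = tally.most_common(1)[0]
    return winner, tally


# Splitter name -> proposed (modifier, head) split for Segelflugzeug.
votes = {
    "SMOR": ("Segel", "Flugzeug"),
    "GN-CS": ("Segel", "Flugzeug"),
    "ASV": ("Segelflug", "Zeug"),
}

winner, tally = majority_vote(votes)
print(winner)  # ('Segel', 'Flugzeug')
```

Ties are not treated specially in this sketch; the slides do not say how a tie would be resolved.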
Vote totals: Segel + Flugzeug ‘sail + plane’ 2, Segelflug + Zeug ‘gliding + stuff’ 1

Combined Hybrid Compound Splitter
Example: Gesetzmäßigkeit ‘legality’
• SMOR: Gesetz + Mäßigkeit ‘law + moderateness’ (1 vote)
• ASV: Gesetz + Mäßigkeit ‘law + moderateness’ (1 vote)
• GN-CS: Gesetz + Mäßigkeit ‘law + moderateness’ (1 vote)
Vote totals: Gesetz + Mäßigkeit ‘law + moderateness’ 3, no compound predicted 0
Heuristic: adjective gesetzmäßig ‘legal’ + derivation suffix –keit → no composition

Combined Hybrid Compound Splitter – Heuristics
• Suffixes –heit, –keit, –ität, –ung, –tum, etc.
  indicate derivation
  Allgemeinheit ‘generality’
  - False: All ‘universe’ + Gemeinheit ‘villainy’
  - Correct: allgemein ‘general’ (adjective) + –heit (derivation suffix): no compound predicted
• Lowercase heads are most probably incorrect
  Teppichleger ‘carpet layer’
  - False: Teppich ‘carpet’ + leger ‘informal’ (lowercase head)
  - Correct: Teppich ‘carpet’ + Leger ‘layer’ (capitalized head)
• Head should be an ending substring of the compound
  Weltreise ‘world trip’
  - False: Welt ‘world’ + Reis ‘rice’ (no ending substring)
  - Correct: Welt ‘world’ + Reise ‘trip’ (ending substring)

Sense Alignment of GermaNet and Wiktionary

Introduction: The Necessity of Sense Descriptions
• Descriptions illustrate individual word senses in dictionaries
• For example: Princeton WordNet
  contains 3 senses for nail
• Without definitions it is not easy to distinguish senses

Extend GermaNet with Sense Definitions
GermaNet senses of Anzeige with the corresponding Wiktionary definitions:
• ‘advertisement’: Kurze Mitteilungen in den Medien, die der Bekanntmachung oder Werbung dienen
  ‘Short notices in the media for making announcements’
• ‘complaint’: Recht: Bekanntgabe einer Straftat bei einer Behörde
  ‘Law: report of a crime at an authority’
• ‘display’: Technik: eine Vorrichtung zur Signalisierung von Zuständen und Werten
  ‘Technical device for signaling visual information’

GermaNet-Wiktionary Mapping
[Figure: the Wiktionary senses alongside the GermaNet senses ‘advertisement’, ‘complaint’, ‘display’]

Bag of Words from GermaNet Sense
Anzeige ‘advert.’
Anzeige, Annonce, Inserat, Ausschreibung, Versandanzeige, Kaufgesuch, Verkaufsangebot, Familienanzeige, Partnergesuch, Kontaktanzeige, Stellenanzeige, Stellenangebot, Stellenannonce, Stellengesuch, Kleinanzeige, Großanzeige, Zeitungsanzeige, Zeitung, Blatt, Gazette

Bag of Words from Wiktionary Sense
Anzeige ‘advert.’
Anzeige, Mitteilung, Medien, Bekanntmachung, Werbung, Annonce, Inserat, Familienanzeige, Geburtstaganzeige, Heiratsanzeige, Hochzeitsanzeige, Kontaktanzeige, Todesanzeige, Traueranzeige, Verlobungsanzeige, Werbeanzeige, Kleinanzeige

Word Overlap
Example: Anzeige ‘advertisement’
Word overlap of the two bags of words: Anzeige, Annonce, Inserat, Familienanzeige, Kontaktanzeige, Kleinanzeige
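The word overlap above amounts to intersecting the two bags of words. A minimal sketch, assuming the bags are already available as plain word lists (copied here from the slides); the function name `word_overlap` is hypothetical, and how the bags are collected from GermaNet and Wiktionary is not shown.

```python
GERMANET_BAG = [
    "Anzeige", "Annonce", "Inserat", "Ausschreibung", "Versandanzeige",
    "Kaufgesuch", "Verkaufsangebot", "Familienanzeige", "Partnergesuch",
    "Kontaktanzeige", "Stellenanzeige", "Stellenangebot", "Stellenannonce",
    "Stellengesuch", "Kleinanzeige", "Großanzeige", "Zeitungsanzeige",
    "Zeitung", "Blatt", "Gazette",
]

WIKTIONARY_BAG = [
    "Anzeige", "Mitteilung", "Medien", "Bekanntmachung", "Werbung",
    "Annonce", "Inserat", "Familienanzeige", "Geburtstaganzeige",
    "Heiratsanzeige", "Hochzeitsanzeige", "Kontaktanzeige", "Todesanzeige",
    "Traueranzeige", "Verlobungsanzeige", "Werbeanzeige", "Kleinanzeige",
]


def word_overlap(bag_a, bag_b):
    """Words that occur in both bags, in the order of the first bag."""
    other = set(bag_b)
    return [word for word in bag_a if word in other]


overlap = word_overlap(WIKTIONARY_BAG, GERMANET_BAG)
print(overlap)
# ['Anzeige', 'Annonce', 'Inserat', 'Familienanzeige', 'Kontaktanzeige', 'Kleinanzeige']
```

A mapping step would presumably compare such overlaps across all sense pairs of a word; the slides do not spell out the scoring, so it is left out here.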
Coordinated Relations
Example: Anzeige ‘advert.’
• Synonyms in common
• Hyponyms in common
[Figure: Wiktionary senses and the GermaNet senses ‘advertisement’, ‘complaint’, ‘display’]

Different Sense Granularities
Wiktionary senses of Archiv:
• Sammlung historisch oder aus anderen Gründen bedeutsamer Dokumente
  ‘Collection of documents that are historically or for other reasons important’
• Einrichtung, Institution zur Aufbewahrung und Pflege historisch oder aus anderen Gründen bedeutsamer Dokumente
  ‘Institution for storing and maintenance of historically or for other reasons important documents’
• Gebäude oder Gebäudeteil, der eine Institution zur Aufbewahrung von Dokumenten enthält
  ‘Building or part of the building containing an institution for storing documents’
GermaNet senses of Archiv:
• Archiv ‘data repository’
• Archiv ‘archive’
• Archiv ‘archived file’

Sense Mapping Editor Using Anzeige

Applications and Tools for GermaNet

Tools for GermaNet
• Application Programming Interfaces
  - Java API
  - Perl API
• Web Application:
• Web service: as part of WebLicht
• GermaNet-Explorer: visualisation tool (developed at the University of Dortmund)
• GernEdiT: GermaNet editing tool

GernEdiT – The GermaNet Editing Tool
Demo

GermaNet-Explorer (University of Dortmund)
Demo

Web Application
Demo

Thank you.

Verena Henrich & Erhard Hinrichs
Department of Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany
[email protected]
[email protected]

Links & References
• GermaNet homepage:
• GermaNet web application:
• Verena Henrich and Erhard Hinrichs: Determining Immediate Constituents of Compounds in GermaNet. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2011), Hissar, Bulgaria, 2011.
• Verena Henrich, Erhard Hinrichs, and Tatiana Vodolazova: Semi-Automatic Extension of GermaNet with Sense Definitions from Wiktionary. In Proceedings of the 5th Language & Technology Conference (LTC 2011), Poznań, Poland, 2011.
• Verena Henrich and Erhard Hinrichs: GernEdiT - The GermaNet Editing Tool. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), Valletta, Malta, 2010.
• Fellbaum, C. (ed.): WordNet – An Electronic Lexical Database. The MIT Press, 1998.
• Princeton WordNet web application:
