Categories and Cliques
Transcription
Categories and Cliques
Categories, Cliques and Analogies in Creative Information/Knowledge Management Tony Veale (+ students) School of Computer Science and informatics University College Dublin [email protected] http://Afflatus.UCD.ie 1 Cliques, Categories, Analogies, Metaphors and Ontologies Ontologies impose structure on a domain by grouping “like with like” Good categories have high intra-category similarity, low inter-category similarity The members of coherent (tight-knit) categories resemble social “cliques” Members closely connected: new members must bond with all existing members Analogy is a social connective: clique members have complementary roles Like socializes with Like: Consider bands, teams, summits, parties, matches, etc. Like socializes with Like à Like mingles with Like à Like collocates with Like ∴ Text Collocation à Category Co-Membership à Cliques as Categories 2 Categories are tight-knit clusters of instances in a Feature Space Furniture Creature Body_part Time Building E.g., Almuhareb & Poesio (2004/2005), Veale and Hao (2007/2008) 3 Manipulating Categories: The View from Cognitive Science / Ling. 4 Modelling Metaphor: The State of the Art in Computer Science Who wants a car that drinks gasoline? Fragile Systems / Tiny Coverage / Deep Knowledge Bottleneck / No Scalability 5 More Credible (& Useful) Goals: Lower Level, Broader Coverage A Modest Proposal Creative Information Mgmt. (texts, corpora, ngrams) Creative Knowledge Mgmt. (categories, ontologies) Creative information retrieval Metaphor finder / thesaurus Lexicalized Idea Exploration Robust Systems / Broad Coverage / Plentiful Shallow Knowledge / Scalability 6 The Supermarket Principle: Group by Theme and Kind Ontological Categories should reflect both semantic relatedness and similarity 7 Cliques: A Social-Model of Category Structure Brownie George Laura Scooter 3-clique Dick Karl Rummy 4-clique Judith Wolfie Condi Seymour A Clique is a complete sub-graph. Colin A clique is Maximal iff not part of a larger clique. 8 5-clique Linguistic Cliques: Different Levels of Conceptual Grouping Cliques of Proper-Name instances Identified by capitalization patterns, Named-Entity Recognition Instance Cliques E.g., Tiger Woods, Roger Federer and Thierry Henry Cliques of Generic Category References Identified by patterns of coordination among bare plurals Generic Cliques E.g., astronauts, cowboys and pirates churches and priests Cliques of Complex Category Names Higher-level cliques formed from generic and instance cliques. Category Cliques E.g., Catholic church 9 and Catholic priest Analogy as a Social Connective: Cliques imply Cliques Tennis Player Roger Federer Golf Player Tiger Woods Thierry Henry Soccer Player Golf Tennis Soccer Text Collocation à Instance cliques à Category cliques à Domain cliques 10 Mining Collocations from Corpora: Instances, Categories & Domains 5-grams 4-grams 3-grams Roger Federer , Tiger Woods Rafael Nadal and Roger Federer Roger Federer, Andy Roddick Thierry Henry , Roger Federer Tiger Woods , Roger Federer David Beckham, Thierry Henry Tom Cruise & David Beckham Tom Cruise and Katie Holmes Steven Spielberg, Tom Cruise Tom Hanks / Steven Spielberg Dan Brown and Tom Hanks Tiger Woods vs. Thierry Henry : : : : : tennis and golf players tennis / squash players soccer and hockey moms polo and tennis teams squash and tennis courts soccer and rugby fields tennis and soccer fans soccer and tennis players polo and lacrosse teams soccer vs. golf players TV and movie stars radio and TV stars : : : : tennis and golf polo and tennis hockey and boxing boxing and karate players and fans coaches and players golf vs. soccer soccer and rugby soccer versus tennis Hollywood / Bollywood radio and TV actors and directors : : : E.g., we use the Google N-Grams 1T Web Corpus (N ≤ 5) 11 Generic Clique Distribution in Corpus Data (coordinated plurals) bats and balls sticks and stones boxers and wrestlers cowboys and Indians players and fans coaches and players toys and games books and pens judges and lawyers cats and dogs cops and robbers actors and directors : : : stick game ball 12 Instance Clique Distribution in Corpus Data (proper individuals) 5-grams James Gosling Linus Torvalds Larry Wall 13 Roger Federer , Tiger Woods Rafael Nadal / Roger Federer Roger Federer, Andy Roddick Thierry Henry , Roger Federer Tiger Woods , Roger Federer David Beckham, Thierry Henry Tom Cruise & David Beckham Tom Cruise and Katie Holmes Steven Spielberg, Tom Cruise Tom Hanks / Steven Spielberg Dan Brown and Tom Hanks : : : : : ( ~ 30,000 named entities) Learning Inter-Category Connections from Corpus Cliques {LEADER} isa {RELIGIOUS_LEADER} isa isa isa {IMAM} {RABBI} isa religious and tribal leaders (753) religious and secular leaders (745) religious and union leaders (53) religious and political leaders (8557) religious and scientific leaders (115) : : {POPE} {TRIBAL_LEADER} isa {CHIEF, CHIEFTAIN} Assume No Zeugma in compressed coordinations à metaphoric pathways 14 Systematic Mapping between Cliques: Cliques on top of Cliques Scientist 3-grams Experiment Artist laboratory Exhibition scientists and artists experiments & scientists studios and artists scientists and laboratories laboratories / experiments artists and exhibitions laboratories / studios studios and exhibitions exhibitions , experiments scientists and experts portfolios and artists chefs and kitchens : : : Studio Text Collocation à 2-clique counterparts & 3-clique core representations 15 Analogy as a Social Connective (II): Mapping Like-to-Like {founder, publisher} Rolling Stone Playboy Playboy and Rolling Stone Hugh Hefner and Jann Wenner Hugh Hefner Jann Wenner Textual Collocation of Instances à Analogy between collocated Categories 16 Analogy as a Social Connective (III): Analogical Bootstrapping Holocaust hero Raoul Wallenberg (?) Holocaust_hero? Holocaust_hero ISA ISA (?) Oskar Schindler and Raoul Wallenberg Raoul Wallenberg Oskar Schindler Collocation of Instances à Shared Category (?) à Bootstrapping Hypothesis 17 Cliques As Categories-of-Categories for Additional Structure { PLAYER} isa {TENNIS_PLAYER, SOCCER PLAYER, HOCKEY_PLAYER} isa isa {TENNIS_PLAYER} instance {ROGER_FEDERER} {SOCCER_PLAYER} instance {DAVID_BECKHAM} isa {HOCKEY_PLAYER} instance {WAYNE_GRETZKY} Insert Cliques of Categories as New Upper-Ontology Categories in WordNet 18 Clique Distribution at Category Level (analogical mappings) Perl inventor Java inventor Tennis Player Golf Player Linux inventor Soccer Player 19 Comparing Clique Distributions (instance- and analogical- cliques) Linus Torvalds James Gosling Larry Wall Java inventor Perl inventor Linux inventor 20 Afflatus.UCD.ie/dorian 21 Salient Properties: Clique Members often share defining properties Some members of a clique have strong stereotypical associations These properties are then sensibly projected onto co-members Stereotypical Properties E.g., Cowboys, Astronauts and Pirates reckless brave bold Some members belong to different, but parallel, domains The domains are then sensibly projected onto co-members E.g., Astronauts and space Domain Associations scientists E.g., Astronauts and NASA officials 22 Category Properties also socialize in Clique Structures Adjacency matrix of mutually-reinforcing properties acquired from WWW: hot spicy humid fiery dry sultry … hot spicy humid fiery dry sultry … --75 18 6 6 11 … 35 --0 0 0 1 … 39 0 --0 0 0 … 6 15 0 --0 2 … 34 1 1 0 --0 … 11 1 0 0 0 --… … … … … … … … Use the Google query “as * and * as” 23 to acquire associations Creative Information Retrieval: Cliques as Categories View the Google 1T corpus of ngrams as a Lexicalized Idea Space to Explore 24 25 26 27 28 29 30 Feature Space and Categorization Via Clustering: Revisited 13-way clustering of 214 nouns, compared to WordNet Creature Almuhareb & Poesio (2004) Weak text-derived features ~ 60,000 features for 214 nouns Result: 0.855 cluster purity Veale & Hao (2007 / 2008) Strong simile-derived features ~ 7,300 features for 214 nouns Result: 0.902 cluster purity Time Veale & Li (2009) Generic Clique-derived features ~ 8,300 features for 214 nouns Result: 0.934 cluster purity 31 32 33 34 35 Conclusions: Cliques all the way down … • Cliques can be calculated in any adjacency graph (but: NP-hard!) Analysis of large text corpora yields different kinds of adjacency matrix • Cliques at different ontological levels can be mapped together We can find cliques amongst instances, amongst categories, and amongst domains • Analogy within an Ontology can be viewed as a clique-alignment problem Mapping instance cliques à category cliques gives implicit (unlabelled) categories • Analogical Cliques enrich Ontological Structure Afflatus.ucd.ie/dorian E.g., supports intelligent (guided) exploration of large idea spaces (creative IR) 36
Similar documents
ostadt urg , 2015
medium whatsoever, including on the internet. The scanning and posting of articles published in Dance Europe and Danza Europa is strictly forbidden and constitutes a breach of copyright. © Dance Eu...
More information