Categories and Cliques

Transcription

Categories and Cliques
Categories, Cliques and Analogies
in Creative Information/Knowledge Management
Tony Veale (+ students)
School of Computer Science
and informatics
University College Dublin
[email protected]
http://Afflatus.UCD.ie
1
Cliques, Categories, Analogies, Metaphors and Ontologies
Ontologies impose structure on a domain by grouping “like with like”
Good categories have high intra-category similarity, low inter-category similarity
The members of coherent (tight-knit) categories resemble social “cliques”
Members closely connected: new members must bond with all existing members
Analogy is a social connective: clique members have complementary roles
Like socializes with Like: Consider bands, teams, summits, parties, matches, etc.
Like socializes with Like à Like mingles with Like à Like collocates with Like
∴
Text Collocation à Category Co-Membership à Cliques as Categories
2
Categories are tight-knit clusters of instances in a Feature Space
Furniture
Creature
Body_part
Time
Building
E.g., Almuhareb & Poesio (2004/2005), Veale and Hao (2007/2008)
3
Manipulating Categories: The View from Cognitive Science / Ling.
4
Modelling Metaphor: The State of the Art in Computer Science
Who wants
a
car that
drinks gasoline?
Fragile Systems / Tiny Coverage / Deep Knowledge Bottleneck / No Scalability
5
More Credible (& Useful) Goals: Lower Level, Broader Coverage
A Modest Proposal
Creative Information Mgmt.
(texts, corpora, ngrams)
Creative Knowledge Mgmt.
(categories, ontologies)
Creative information retrieval
Metaphor finder / thesaurus
Lexicalized Idea Exploration
Robust Systems / Broad Coverage / Plentiful Shallow Knowledge / Scalability
6
The Supermarket Principle: Group by Theme and Kind
Ontological Categories should reflect both semantic relatedness and similarity
7
Cliques:
A Social-Model of
Category Structure
Brownie
George
Laura
Scooter
3-clique
Dick
Karl
Rummy
4-clique
Judith
Wolfie
Condi
Seymour
A Clique is a complete
sub-graph.
Colin
A clique is Maximal iff not
part of a larger clique.
8
5-clique
Linguistic Cliques:
Different Levels of Conceptual Grouping
Cliques of Proper-Name instances
Identified by capitalization patterns, Named-Entity Recognition
Instance Cliques
E.g., Tiger Woods, Roger Federer and Thierry Henry
Cliques of Generic Category References
Identified by patterns of coordination among bare plurals
Generic Cliques
E.g., astronauts, cowboys and pirates
churches and priests
Cliques of Complex Category Names
Higher-level cliques formed from generic and instance cliques.
Category Cliques
E.g., Catholic church
9
and
Catholic priest
Analogy as a Social Connective: Cliques imply Cliques
Tennis
Player
Roger
Federer
Golf
Player
Tiger
Woods
Thierry
Henry
Soccer
Player
Golf
Tennis
Soccer
Text Collocation à Instance cliques à Category cliques à Domain cliques
10
Mining Collocations from Corpora: Instances, Categories & Domains
5-grams
4-grams
3-grams
Roger Federer , Tiger Woods
Rafael Nadal and Roger Federer
Roger Federer, Andy Roddick
Thierry Henry , Roger Federer
Tiger Woods , Roger Federer
David Beckham, Thierry Henry
Tom Cruise & David Beckham
Tom Cruise and Katie Holmes
Steven Spielberg, Tom Cruise
Tom Hanks / Steven Spielberg
Dan Brown and Tom Hanks
Tiger Woods vs. Thierry Henry
:
:
:
:
:
tennis and golf players
tennis / squash players
soccer and hockey moms
polo and tennis teams
squash and tennis courts
soccer and rugby fields
tennis and soccer fans
soccer and tennis players
polo and lacrosse teams
soccer vs. golf players
TV and movie stars
radio and TV stars
:
:
:
:
tennis and golf
polo and tennis
hockey and boxing
boxing and karate
players and fans
coaches and players
golf vs. soccer
soccer and rugby
soccer versus tennis
Hollywood / Bollywood
radio and TV
actors and directors
:
:
:
E.g., we use the Google N-Grams 1T Web Corpus (N ≤ 5)
11
Generic Clique Distribution in Corpus Data (coordinated plurals)
bats and balls
sticks and stones
boxers and wrestlers
cowboys and Indians
players and fans
coaches and players
toys and games
books and pens
judges and lawyers
cats and dogs
cops and robbers
actors and directors
:
:
:
stick
game
ball
12
Instance Clique Distribution in Corpus Data (proper individuals)
5-grams
James
Gosling
Linus
Torvalds
Larry
Wall
13
Roger Federer , Tiger Woods
Rafael Nadal / Roger Federer
Roger Federer, Andy Roddick
Thierry Henry , Roger Federer
Tiger Woods , Roger Federer
David Beckham, Thierry Henry
Tom Cruise & David Beckham
Tom Cruise and Katie Holmes
Steven Spielberg, Tom Cruise
Tom Hanks / Steven Spielberg
Dan Brown and Tom Hanks
:
:
:
:
:
( ~ 30,000 named entities)
Learning Inter-Category Connections from Corpus Cliques
{LEADER}
isa
{RELIGIOUS_LEADER}
isa
isa
isa
{IMAM}
{RABBI}
isa
religious and tribal leaders
(753)
religious and secular leaders (745)
religious and union leaders
(53)
religious and political leaders (8557)
religious and scientific leaders (115)
:
:
{POPE}
{TRIBAL_LEADER}
isa
{CHIEF, CHIEFTAIN}
Assume No Zeugma in compressed coordinations à metaphoric pathways
14
Systematic Mapping between Cliques: Cliques on top of Cliques
Scientist
3-grams
Experiment
Artist
laboratory
Exhibition
scientists and artists
experiments & scientists
studios and artists
scientists and laboratories
laboratories / experiments
artists and exhibitions
laboratories / studios
studios and exhibitions
exhibitions , experiments
scientists and experts
portfolios and artists
chefs and kitchens
:
:
:
Studio
Text Collocation à 2-clique counterparts & 3-clique core representations
15
Analogy as a Social Connective (II): Mapping Like-to-Like
{founder, publisher}
Rolling Stone
Playboy
Playboy and Rolling Stone
Hugh Hefner and Jann Wenner
Hugh Hefner
Jann Wenner
Textual Collocation of Instances à Analogy between collocated Categories
16
Analogy as a Social Connective (III): Analogical Bootstrapping
Holocaust hero Raoul Wallenberg (?)
Holocaust_hero?
Holocaust_hero
ISA
ISA (?)
Oskar Schindler and Raoul Wallenberg
Raoul Wallenberg
Oskar Schindler
Collocation of Instances à Shared Category (?) à Bootstrapping Hypothesis
17
Cliques As Categories-of-Categories for Additional Structure
{ PLAYER}
isa
{TENNIS_PLAYER, SOCCER PLAYER, HOCKEY_PLAYER}
isa
isa
{TENNIS_PLAYER}
instance
{ROGER_FEDERER}
{SOCCER_PLAYER}
instance
{DAVID_BECKHAM}
isa
{HOCKEY_PLAYER}
instance
{WAYNE_GRETZKY}
Insert Cliques of Categories as New Upper-Ontology Categories in WordNet
18
Clique Distribution at Category Level (analogical mappings)
Perl
inventor
Java
inventor
Tennis
Player
Golf
Player
Linux
inventor
Soccer
Player
19
Comparing Clique Distributions (instance- and analogical- cliques)
Linus
Torvalds
James
Gosling
Larry Wall
Java
inventor
Perl
inventor
Linux
inventor
20
Afflatus.UCD.ie/dorian
21
Salient Properties: Clique Members often share defining properties
Some members of a clique have strong stereotypical associations
These properties are then sensibly projected onto co-members
Stereotypical
Properties
E.g., Cowboys, Astronauts and Pirates
reckless
brave
bold
Some members belong to different, but parallel, domains
The domains are then sensibly projected onto co-members
E.g., Astronauts and space
Domain
Associations
scientists
E.g., Astronauts and NASA officials
22
Category Properties also socialize in Clique Structures
Adjacency matrix of mutually-reinforcing properties acquired from WWW:
hot
spicy
humid
fiery
dry
sultry
…
hot
spicy
humid
fiery
dry
sultry
…
--75
18
6
6
11
…
35
--0
0
0
1
…
39
0
--0
0
0
…
6
15
0
--0
2
…
34
1
1
0
--0
…
11
1
0
0
0
--…
…
…
…
…
…
…
…
Use the Google query
“as * and * as”
23
to acquire associations
Creative Information Retrieval: Cliques as Categories
View the Google 1T corpus of ngrams as a Lexicalized Idea Space to Explore
24
25
26
27
28
29
30
Feature Space and Categorization Via Clustering: Revisited
13-way clustering of 214 nouns, compared to WordNet
Creature
Almuhareb & Poesio (2004)
Weak text-derived features
~ 60,000 features for 214 nouns
Result: 0.855 cluster purity
Veale & Hao (2007 / 2008)
Strong simile-derived features
~ 7,300 features for 214 nouns
Result: 0.902 cluster purity
Time
Veale & Li (2009)
Generic Clique-derived features
~ 8,300 features for 214 nouns
Result: 0.934 cluster purity
31
32
33
34
35
Conclusions: Cliques all the way down …
• Cliques can be calculated in any adjacency graph (but: NP-hard!)
Analysis of large text corpora yields different kinds of adjacency matrix
• Cliques at different ontological levels can be mapped together
We can find cliques amongst instances, amongst categories, and amongst domains
• Analogy within an Ontology can be viewed as a clique-alignment problem
Mapping instance cliques à category cliques gives implicit (unlabelled) categories
• Analogical Cliques enrich Ontological Structure
Afflatus.ucd.ie/dorian
E.g., supports intelligent (guided) exploration of large idea spaces (creative IR)
36

Similar documents

ostadt urg , 2015

ostadt urg , 2015 medium whatsoever, including on the internet. The scanning and posting of articles published in Dance Europe and Danza Europa is strictly forbidden and constitutes a breach of copyright. © Dance Eu...

More information