Slides - Tim Althoff

Transcription

Slides - Tim Althoff
TimeMachine
T I M E L I N E G E N E R AT I O N F O R
K N O W L E D G E -B A S E E N T I T I E S
Tim Althoff (Stanford CS),
Xin Luna Dong, Kevin Murphy, Safa Alai,
Van Dang, Wei Zhang (Google)
Learning about new topics is hard
2
3
Web Search Results
4
Structured Information
5
Our System: TimeMachine
bert Downey Jr. (1965—)
5
Deborah
Falconer
Ben Stiller
Robert
Downey, Sr.
Chaplin
1990
Fiona Apple
Paramount
Pictures
1995
The Party's
Over
Ally McBeal
2000
Susan Downey
Gothika
Iron Man 2
Iron Man
2005
Iron Man 3
The Avengers
2010
2
6
Timeline Generation
Problem Definition
1-hop
event
Robert Downey Jr. (1965—)
April 4, 1965
Deborah
Falconer
Do
B
Ben Stiller
Robert
Downey, Sr.
1985
Chaplin
1990
Fiona Apple
Paramount
Pictures
1995
The Party's
Over
Ally McBeal
2000
Susan Downey
Gothika
Iron Man 2
Iron Man
2005
Iron Man 3
The Avengers
2010
2015
May 4, 2012
Timeline
rIn
rel
Da
sta
te
Robert
Downey Jr.
Entity
2-hop
event
tar
In
•  Timeline needs to be rendered on device
The Avengers
•  Generation must be fast (zoom + interaction)
Subject
7
Timeline Generation Approach
1-hop
event
Robert Downey Jr. (1965—)
April 4, 1965
Deborah
Falconer
B
Ben Stiller
Do
Robert
Downey, Sr.
bert Downey Jr.
List of events
rIn
rel
Da
sta
te
Entity
1985
Chaplin
1990
The Party's
Over
Fiona Apple
Paramount
Pictures
Ally McBeal
1995
2000
Susan Downey
Gothika
Iron Man 2
Iron Man
2005
Iron Man 3
The Avengers
2010
May 4, 2012
Timeline
2-hop
event
tar
In
The Avengers
Subject
8
2
List of Events
Subject Related Entity
Time Description
R. D. Jr Robert D. Sr
1988 In movie
directed by
2004 TV show app.
with
2005 Got married
Jon Bon Jovi
Susan Downey
Iron Man
Avengers
2008 Award for
movie
2010 Acted in movie
…
…
…
9
1. Event Generation
10
Candidate Event Generation
1-hop
event
Do
B
April 4, 1965
May 4, 2012
rIn
rel
Da
sta
te
Robert Downey Jr.
2-hop
event
The Avengers
sta
rIn
Subject
Samuel L Jackson
related through
2-hop event
Related Entity
Timestamp
11
Resulting Candidate Events
Subject Related Entity
Time Description
R. D. Jr -
1965 Was born
The Avengers
2012 Acted in movie
Samuel L Jackson
2012 Co-starred in
The Avengers
…
…
…
12
•  Some non-informative events:
?
any American
à nationality: USA
à founded 1776
•  Frequency Filter: events commonly
associated with a large number of subjects
are unlikely to be interesting (like IDF)
(no “nationality à founded”)
•  Existence Filter: filter out events before
entity begins to exist (no “parent à DOB”)
wallpaperbase.org / en.wikipedia.org
Event Filtering
13
Event Filtering Evaluation
•  87% precision (two indep. raters)
•  Generate many candidate events (Freebase)
14
2. Event Selection
15
Timeline Quality Criteria
1.  Correctness
X
2.  Relevance
X
16
Timeline Quality Criteria (cont.)
3.  Content Diversity
Encourage selection of different entities
US Release
Award
EU Release
vs
Award
Award
US Release
17
Timeline Quality Criteria (cont.)
4.  Temporal Diversity / Layout
?
18
Optimization Problem
Candidate Set:
Objective:
Constraint:
19
(2) Relevance Signals
•  Baseline: global relevance signal
•  Based on # search queries for entity
•  Biased towards popular but unspecific events
20
How to improve relevance signal
•  Use co-occurrences on web scale
marvel.com
•  “Robert Downey Jr” and “May 4, 2012”
occurs 173 times on 71 different webpages
•  US Release date of The Avengers
21
Assigning scores to entities/dates
•  Run NLP tools across
large web corpus of
10B documents
(NER + CoRef)
•  Extract entity-entity and
entity-date co-occurrences
within small windows
•  Normalize counts using Normalized
Pointwise Mutual Information (NPMI)
§  accounts for popular entities/dates
22
Improvements from New Signal
•  Large improvement over global relevance
23
(3) Content Diversity
24
(4) Temporal Diversity
Fill
•  Constraint in optimization problem:
“Event boxes cannot overlap.”
•  Enforce balanced layout during optimization
25
Submodular Optimization
•  Relevance(T) is submodular
§  “Diminishing returns”
§  Less reward for adding to “bigger set”
26
Algorithm & Theoretical Results
•  Fast approximation through lazy-greedy
§  Allows for zoom and interaction
•  Provable approximation guarantee
§  Worst-case: 33% of optimal solution!
§  Still holds with complex constraint!
§  We prove: constraint structure induces
independence family that is a p-system
(Calinescu et al. 2011)
27
3. Evaluation
28
Experimental Evaluation
•  User study on Amazon Mechanical Turk
§  Rating of timelines challenging, possibly
subjective, and no ground truth available
•  Pairwise comparisons using 250 entities
§  Relative judgements
§  Explanations
•  Large-scale
VS
§  >1200 raters
§  >6000 tasks
29
Results
In what fraction of cases do raters prefer our full
method over the ablated baselines?
Global relevance signal
vs Base
Baseline
●
vs Full−E2D
No Date
Rel
●
vs Full−E2E
No Entity
Rel
Removing Date/Entity Cooc
●
No content or
temporal
diversity
vs Full−TD
No Temp.
Div
No Cont.
Div
vs Full−CD
0.5
0.6
0.7
●
●
0.8
0.9
Fraction preferring Full (RPref)
1.0
Full = “everything” = Global + Web Cooc + TD + CD
30
Related Work
•  Document summarization: (Allan et al. 2001)
•  Submodular optimization: (Krause & Golovin 2014),
(Calinescu et al. 2011), (Nemhauser et al. 1978)
•  Maps of information:
(Shahaf et al. 2013)
•  Timelines based on knowledge bases:
(Mazeika et al. 2011), (Tuan et al. 2011), (Wang et al. 2010)
31
Conclusions
•  TimeMachine: Automatic timeline generation
for knowledge base entities
•  Submodular optimization framework
§  Jointly optimizes for relevance, content
diversity, and temporal diversity
§  Proved near-optimal performance guarantees
•  User studies show
§  Web-based co-occurrence signals improve
over baseline model (global importance)
§  Temporal and content diversity are crucial
32
Thanks!
Contact
@timalthoff [email protected]
Check out the demo!
cs.stanford.edu/~althoff/timemachine
Paper / Proofs / Slides
cs.stanford.edu/~althoff
Acknowledgements
Evgeniy Gabrilovich, Arun Chaganty, Stefanie Jegelka,
Karthik Raman, Sujith Ravi, Ravi Kumar, Jeff Tamer,
Patri Friedman, Danila Sinopalnikov, Alexander
Lyashuk, Jure Leskovec, David Hallac, Caroline Suen
33