Transcription

Identifying Relations for Open Information Extraction
Anthony Fader, Stephen Soderland, and Oren Etzioni (2011)

Timeline of Open IE systems: TextRunner (2007), TR2 (2008), TR3 (2009), WOE (2010), ReVerb (2011), OLLIE (2012)
Agenda
• Solution overview
• Solution details
• Evaluation
Why? WOE and TextRunner extractions are incoherent and uninformative.
What? Improve the quality of extractions.
How? Add constraints on relation phrases, both syntactic and lexical.
Why? WOE and TextRunner extractions are incoherent.
Samples: an incoherent extraction is a relation phrase with no meaningful interpretation.
{ The extractor makes a decision about each word separately. }
Up to 13-30% of the output is incoherent.
How? Syntactic constraint: a multiword relation phrase
1. begins with a verb;
2. ends with a preposition;
3. is a contiguous sequence of words in the sentence.
{ … made a deal with … }
Why? WOE and TextRunner extractions are uninformative.
Samples: an uninformative extraction omits critical information.
{ The extractor handles verb-noun relation phrases (light verb constructions, LVC) improperly. }
4-7% of the output is uninformative.
Ex. "Faust made a deal with the devil."
Uninformative: (Faust, made, a deal); informative: (Faust, made a deal with, the devil).
How? The same syntactic constraint: the relation phrase { … made a deal with … } begins with a verb, ends with a preposition, and is contiguous in the sentence.
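The syntactic constraint can be viewed as a regular expression over coarse POS classes (V | V P | V W* P). Below is a minimal sketch of that check on the Faust example; the tag grouping is a simplification I have assumed, not the paper's exact pattern definition.

```python
import re

def satisfies_syntactic_constraint(pos_tags):
    """Check whether a candidate relation phrase's POS sequence matches V | V P | V W* P."""
    classes = "".join(
        "V" if t.startswith("VB") else                      # verbs
        "P" if t in {"IN", "TO", "RP"} else                 # prepositions / particles
        "W" if t.startswith(("NN", "JJ", "RB", "PRP", "DT")) else "O"
        for t in pos_tags)
    return re.fullmatch(r"V+(W*P+)?", classes) is not None

print(satisfies_syntactic_constraint(["VBD"]))                    # "made" -> True
print(satisfies_syntactic_constraint(["VBD", "DT", "NN", "IN"]))  # "made a deal with" -> True
print(satisfies_syntactic_constraint(["DT", "NN"]))               # "a deal" -> False
```

Both "made" and "made a deal with" satisfy the constraint; the extractor prefers the longer, informative phrase when it is available.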
Demo time
Are we perfect now?
No: overly specific relation phrases remain.
Example:
[The Obama administration] {is offering only modest greenhouse gas reduction targets at} [the conference]
How: what is inside?
Constraints on the relation phrase of [ arg1 - rel - arg2 ], the common relation form for Open IE:
• Syntactic (as above)
• Lexical: a valid relation phrase should take many distinct arguments in a large corpus.
Example: "Extendicare agreed to buy Arbor Health Care for about US $432 million in cash and assumed debt."
TextRunner output: (Arbor Health Care, for assumed, debt).
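A minimal sketch of how the lexical constraint could be applied, assuming the corpus has already been reduced to normalized (arg1, relation phrase, arg2) tuples; the threshold of 20 distinct argument pairs is a tunable choice here, not necessarily the authors' setting.

```python
from collections import defaultdict

def relations_passing_lexical_constraint(tuples, min_distinct_pairs=20):
    """Return relation phrases that take at least `min_distinct_pairs`
    distinct (arg1, arg2) pairs across the corpus of extracted tuples."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in tuples:
        pairs[rel].add((arg1, arg2))
    return {rel for rel, args in pairs.items() if len(args) >= min_distinct_pairs}
```

An overly specific phrase such as "is offering only modest greenhouse gas reduction targets at" occurs with very few distinct argument pairs and gets filtered out, while a general phrase like "made a deal with" survives.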
First evaluation: how much recall do we lose?
First test set:
- Random web pages
- 300 sentences
- 327 verb relation phrases
(Dependency parsers are still too slow at web scale.)
ReVerb
Input: a POS-tagged and NP-chunked sentence.
1. Relation extraction (find a verb, then expand it while the constraints are satisfied).
2. Argument extraction (find the nearest noun phrase to the left of the relation phrase and the nearest noun phrase to the right).
Output: (X1; R1; Y1), (X2; R2; Y2), …, (Xn; Rn; Yn)

Example:
{We} [talk about] {Open Information Extraction}
{PRP} [VBP IN] {NNP NNP NNP}
{B-NP} [B-VP B-PP] {B-NP I-NP I-NP}
(‘we’; ’talk about’; ‘open information extraction’)
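To make the two steps concrete, here is a toy sketch of the pipeline over pre-tagged, pre-chunked input. The tag sets, the NP handling, and the absence of the lexical constraint and confidence scoring are all simplifying assumptions, not the authors' implementation.

```python
import re

VERB = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
PREP = {"IN", "TO", "RP"}
WORD = {"NN", "NNS", "NNP", "NNPS", "PRP", "JJ", "RB", "DT"}

def pos_class(tag):
    return "V" if tag in VERB else "P" if tag in PREP else "W" if tag in WORD else "O"

def nearest_np(chunks, indices):
    """Token indices of the nearest NP chunk found while walking `indices`."""
    for i in indices:
        if chunks[i].endswith("NP"):
            span = [i]
            while span[-1] + 1 < len(chunks) and chunks[span[-1] + 1] == "I-NP":
                span.append(span[-1] + 1)          # grow right to the chunk end
            while span[0] - 1 >= 0 and chunks[span[0]] == "I-NP":
                span.insert(0, span[0] - 1)        # grow left to the chunk start
            return span
    return []

def extract(tokens, tags, chunks):
    classes = "".join(pos_class(t) for t in tags)
    triples = []
    for m in re.finditer(r"V(?:W*P)?", classes):                  # 1. relation extraction
        left = nearest_np(chunks, range(m.start() - 1, -1, -1))   # 2. argument extraction
        right = nearest_np(chunks, range(m.end(), len(tokens)))
        if left and right:
            triples.append((" ".join(tokens[i] for i in left),
                            " ".join(tokens[m.start():m.end()]),
                            " ".join(tokens[i] for i in right)))
    return triples

tokens = ["We", "talk", "about", "Open", "Information", "Extraction"]
tags   = ["PRP", "VBP", "IN", "NNP", "NNP", "NNP"]
chunks = ["B-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP"]
print(extract(tokens, tags, chunks))
# -> [('We', 'talk about', 'Open Information Extraction')]
```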
But! Precision is still low, even though recall is comparatively high.
Confidence function: a confidence function is trained on all extractions from 1,000 sentences drawn from the Web and Wikipedia.
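A minimal sketch of what such a confidence function can look like, assuming a logistic-regression classifier over simple hand-crafted extraction features; the features and the toy labels below are illustrative placeholders, not the paper's feature set or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(tokens, rel_span, arg1, arg2):
    """Illustrative features of one extraction (not the paper's set)."""
    start, end = rel_span
    return [
        end - start,                          # relation phrase length
        len(tokens),                          # sentence length
        1.0 if end == len(tokens) else 0.0,   # sentence ends with the relation phrase
        1.0 if arg1[:1].isupper() else 0.0,   # arg1 looks like a proper noun
        1.0 if arg2[:1].isupper() else 0.0,
    ]

# Toy labeled data: 1 = correct extraction, 0 = incorrect.
X = np.array([[2, 6, 1, 0, 1], [1, 25, 0, 0, 0], [3, 9, 1, 1, 1], [1, 30, 0, 1, 0]])
y = np.array([1, 0, 1, 0])
clf = LogisticRegression().fit(X, y)

x_new = features("We talk about Open Information Extraction".split(),
                 (1, 3), "We", "Open Information Extraction")
print(clf.predict_proba([x_new])[0, 1])   # confidence that the extraction is correct
```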
Evaluation results
Test set: 500 sentences from the Web, labeled by 2 judges (agreement 0.68).
ReVerb shows a boost over the "lex" variant (ReVerb without the lexical constraint).
Flow comparison
TextRunner & WOE:
1. Automatically label the sentences.
2. Learn an extractor using a sequence-labeling graphical model.
3. Extraction: find the arguments, then label the words between them as part of the relation phrase (or not).
ReVerb:
1. Relation extraction.
2. Argument extraction.
Evaluation results
Achievements
• Elimination of incoherent extractions is much better.
• Outperforms the earlier systems on precision-recall.
• Faster.
ReVerb error analysis and possible improvements: precision.
ReVerb error analysis and possible improvements: recall.
Thank you
Open Language Learning for Information Extraction
Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni (2012)
Agenda
• Introduction
• Relation Extraction
• Context Analysis
• Evaluation
Why?
ReVerb and WOE: relation phrases match V | V P | V W*P, i.e. only verb-mediated relations, and no context is taken into account.
OLLIE: covers relations mediated by nouns, adjectives, and more, and includes contextual information.
(Some example extractions shown on the slide were not factual; the latest version, from Nov 2015, no longer gives these results.)
Introduction (+ confidence function)
Bootstrapping set
Goal: create a large training set.
Hypothesis: every relation can be expressed in ReVerb style, and a sentence that contains all the content words of a tuple expresses the original tuple.
Start from 110,000 seed tuples extracted by ReVerb from ClueWeb, and retrieve all sentences with the same content words.
Example: for the seed (Students, build, bootstrap set), matching sentences include "Bootstrap set is built by students", "While working on OIE, students built the set", …
Bootstrapping error reduction: spurious matches such as "Students worked on a set of tasks; workers built a new cafe on the campus" are filtered out by requiring a linear dependency path of size < 5 between the matched words.
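A minimal sketch of the sentence-matching step, under stated assumptions: lemmatization is faked with a tiny dictionary, arguments are matched by a crude last-token "head word" heuristic, and the dependency-path-length filter is omitted because it needs a parser. OLLIE's actual matching may differ in the details.

```python
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "by", "of",
             "to", "in", "on", "for", "with", "while"}
LEMMAS = {"built": "build", "builds": "build", "students": "student"}  # toy lemmatizer

def norm(word):
    w = word.lower().strip(".,;")
    return LEMMAS.get(w, w)

def words(text):
    return {norm(w) for w in text.split()} - STOPWORDS

def seed_keywords(arg1, rel, arg2):
    # head word of each argument (crudely, the last token) + content words of the relation
    return {norm(arg1.split()[-1]), norm(arg2.split()[-1])} | words(rel)

def bootstrap(seed, corpus):
    needed = seed_keywords(*seed)
    return [s for s in corpus if needed <= words(s)]

corpus = [
    "Bootstrap set is built by students.",
    "While working on OIE, students built the set.",
    "Students worked on a set of tasks; workers built a new cafe on the campus.",
]
for s in bootstrap(("Students", "build", "bootstrap set"), corpus):
    print(s)
# The third sentence is the kind of spurious match that the
# dependency-path-length filter is meant to remove.
```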
Open Pattern Learning
Goal: learn general patterns that encode different types of relations.
Open pattern templates map a dependency path to an open extraction.
Sample for template 2: "We do interesting things."
But what are syntactic and semantic patterns?
Open Pattern Learning
Purely syntactic patterns:
• There are no slot nodes in the path.
• The relation node lies between arg1 and arg2.
• The prep edge in the pattern matches the preposition in the relation.
• The path has no nn or amod edges.
Semantic/lexical patterns: the remaining patterns, kept together with the words/types they are used with (words are generalized into types where possible).
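A minimal sketch of the purely-syntactic check, under an assumed toy representation of a pattern as a list of edges, each with a dependency label, the kind of node it reaches, and an optional preposition; this is illustrative, not OLLIE's data structures.

```python
def is_purely_syntactic(edges, relation_prep):
    """edges: dicts like {"label": "prep_by", "node": "arg2", "prep": "by"},
    where "node" is one of "arg1", "arg2", "rel", "slot", or "other"."""
    labels = [e["label"] for e in edges]
    nodes = [e["node"] for e in edges]
    if "slot" in nodes:                                  # no slot nodes in the path
        return False
    if any(l in ("nn", "amod") for l in labels):         # no nn or amod edges
        return False
    i_rel = nodes.index("rel")                           # relation node between the args
    i_a1, i_a2 = nodes.index("arg1"), nodes.index("arg2")
    if not (min(i_a1, i_a2) < i_rel < max(i_a1, i_a2)):
        return False
    preps = [e["prep"] for e in edges if e["label"].startswith("prep")]
    return all(p == relation_prep for p in preps)        # prep edge matches the relation's prep

pattern = [{"label": "nsubjpass", "node": "arg1"},
           {"label": "rcmod", "node": "rel"},
           {"label": "prep_by", "node": "arg2", "prep": "by"}]
print(is_purely_syntactic(pattern, relation_prep="by"))  # True
```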
Semantic/lexical patterns: example.
Context analysis
• Conditional truth: a ClausalModifier field is added when there is an advcl dependency edge, but only if one of {if, then, although} is present.
• Attribution: an AttributeTo field is added when there is a ccomp dependency edge and the context verb matches a list of communication/cognition verbs from VerbNet.
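A minimal sketch of these two checks, assuming a toy edge list of (label, head, dependent) triples with lemmatized heads; the marker set and the verb list are small illustrative samples rather than the full VerbNet-derived lists.

```python
CLAUSAL_MARKERS = {"if", "then", "although"}
COMM_COGN_VERBS = {"say", "believe", "think", "claim", "report"}   # sample only

def analyze_context(edges, sentence_words):
    """Return extra context fields for extractions from this sentence.
    edges: (dependency label, head lemma, dependent word) triples."""
    fields = {}
    for label, head, dep in edges:
        if label == "advcl" and CLAUSAL_MARKERS & set(sentence_words):
            fields["ClausalModifier"] = dep    # conditional truth (clause head as a stand-in)
        if label == "ccomp" and head in COMM_COGN_VERBS:
            fields["AttributeTo"] = head       # attribution
    return fields

# "Early astronomers believed that the earth is the center of the universe."
edges = [("nsubj", "believe", "astronomers"), ("ccomp", "believe", "is")]
print(analyze_context(edges, "early astronomers believed that the earth is ...".split()))
# -> {'AttributeTo': 'believe'}
```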
Comparison
Thank you