Outline
1. The gap: Natural language and Semantic Web data
2. The bridge: Ontology lexica
3. Building bridges
Hand-crafting ontology lexica
Learning ontology lexica
Limitations
1. The gap: Natural language and Semantic Web data
The Semantic Web
From the human-readable web of documents
to the machine-readable web of data
Example graph:
• Alan Turing birthPlace London
• Alan Turing birthDate 1912-06-23
• England capital London
• London urbanPopulation 9 787 426
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?x
WHERE {
  dbpedia:Hallasan dbpedia-owl:elevation ?x .
}
Hallasan elevation 1,950 m
• Hallasan is 1,950 m high.
• Hallasan rises to 1,950 m.
• The altitude of Hallasan is 1,950 m.
Interstate 10 routeStart Santa Monica
Interstate 10 routeEnd Jacksonville
• Interstate 10 links Santa Monica to Jacksonville.
• Interstate 10 connects Santa Monica with Jacksonville.
Bruce Lee child Shannon Lee
Shannon Lee child Wren Keasler
Wren Keasler gender Female
• Wren Keasler is the granddaughter of Bruce Lee.
Bruce Lee child Shannon Lee
Shannon Lee child Wren Keasler
Bruce Lee gender Male
• Bruce Lee is the grandfather of Wren Keasler.
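The granddaughter example can be made concrete with a small sketch. It assumes a toy in-memory set of triples copied from the slide; the names TRIPLES, objects and granddaughters_of are illustrative and not part of DBpedia or any library. It shows that the single word "granddaughter" corresponds to a path of two child edges plus a gender check.

```python
# Toy triples mirroring the slide; this is not real DBpedia data access.
TRIPLES = {
    ("Bruce Lee", "child", "Shannon Lee"),
    ("Shannon Lee", "child", "Wren Keasler"),
    ("Wren Keasler", "gender", "Female"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in TRIPLES."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def granddaughters_of(person):
    """'granddaughter of X' = a child of a child of X whose gender is Female."""
    result = set()
    for child in objects(person, "child"):
        for grandchild in objects(child, "child"):
            if "Female" in objects(grandchild, "gender"):
                result.add(grandchild)
    return result


print(granddaughters_of("Bruce Lee"))
```

No single property in the data is called "granddaughter"; the word only makes sense as a combination of three triples.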
2. The bridge: Ontology lexica
dbpedia-owl:elevation rdfs:label "elevation"@en ,
"Höhe"@de ,
"hoogte"@nl .
Ontology lexica capture rich linguistic information about how ontology elements are verbalized in a particular language:
• word forms
• part of speech
• subcategorization
• meaning
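A small sketch of why plain labels fall short of this. It assumes the three rdfs:label values shown for dbpedia-owl:elevation and the three Hallasan sentences from the earlier slide; the fourth sentence and the helper properties_mentioned are added here purely for illustration.

```python
import re

# Labels as given on the slide; everything else in this block is a toy setup.
LABELS = {
    "dbpedia-owl:elevation": {"en": "elevation", "de": "Höhe", "nl": "hoogte"},
}

SENTENCES = [
    "Hallasan is 1,950 m high.",
    "Hallasan rises to 1,950 m.",
    "The altitude of Hallasan is 1,950 m.",
    "The elevation of Hallasan is 1,950 m.",  # added for contrast, not from the slides
]


def properties_mentioned(sentence, lang="en"):
    """Return the properties whose label occurs as a whole word in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return [p for p, labels in LABELS.items() if labels[lang].lower() in words]


for sentence in SENTENCES:
    print(sentence, "->", properties_mentioned(sentence))
```

Only the sentence that happens to reuse the label is matched; "high", "rises to" and "altitude" need lexical knowledge that a label does not provide.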
lemon (Lexicon Model for Ontologies)
http://lemon-model.net
lemon provides a meta-model for describing
ontology lexica with RDF.
Semantics by reference
The meaning of lexical entries is specified by pointing to elements in an ontology.
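The lemon model (core)
• Lexicon (language:String) has entry LexicalEntry
• LexicalEntry has subclasses Word, Phrase, Part
• LexicalEntry has form LexicalForm (writtenRep:String), with subproperties canonicalForm, otherForm, abstractForm
• LexicalEntry has sense LexicalSense (inverse: isSenseOf)
• LexicalSense has reference to an Ontology element (inverse: isReferenceOf, with subproperties prefRef, altRef, hiddenRef)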
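The lemon model (argument mapping)
• LexicalEntry has sense LexicalSense (inverse: isSenseOf) and synBehavior Frame
• Frame has synArg Argument (a syntactic role, e.g. subject or prepositionalObject)
• Argument has marker Syntactic Role Marker
• LexicalSense has semArg Argument, with subproperties subjOfProp, objOfProp, isA
• LexicalSense has reference to an Ontology element (inverse: isReferenceOf)
• LexicalSense may have subsense LexicalSense, propertyDomain, propertyRange
• LexicalSense may have context:Resource, condition:Resource, definition:Resource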
Example
• play : LexicalEntry, partOfSpeech=verb
• canonical form : Form, writtenRep="play"@en
• sense : LexicalSense, reference <http://dbpedia.org/ontology/team>
• synBehavior : IntransitivePPFrame, with subject x : Argument and prepositionalObject y : Argument
• marker of y : Word, with canonicalForm : Form, writtenRep="for"@en
• argument mapping of the sense: subjOfProp x, objOfProp y
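A minimal sketch of how such an entry could be used in both directions, from a sentence frame to a triple and back. The dataclass and the functions interpret and verbalize are hypothetical simplifications made for this example and are not part of lemon; "Player A" and "Team B" are placeholder names, and the third-person "-s" inflection is deliberately naive.

```python
from dataclasses import dataclass


@dataclass
class LexicalEntry:
    written_rep: str     # canonical form, e.g. "play"
    part_of_speech: str  # e.g. "verb"
    frame: str           # syntactic behaviour, e.g. "IntransitivePPFrame"
    marker: str          # preposition marking the prepositional object
    reference: str       # ontology property the sense points to


PLAY = LexicalEntry(
    written_rep="play",
    part_of_speech="verb",
    frame="IntransitivePPFrame",
    marker="for",
    reference="http://dbpedia.org/ontology/team",
)


def interpret(entry, subject, prepositional_object):
    """Syntactic subject fills subjOfProp, prepositional object fills objOfProp."""
    return (subject, entry.reference, prepositional_object)


def verbalize(entry, triple):
    """Inverse direction: render a triple as a sentence (naive '-s' inflection)."""
    subject, _, obj = triple
    return f"{subject} {entry.written_rep}s {entry.marker} {obj}."


triple = interpret(PLAY, "Player A", "Team B")
print(triple)
print(verbalize(PLAY, triple))
```

The same declarative entry serves both interpretation and generation, which is the point of keeping the lexicon separate from the ontology.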
Example
• connect : LexicalEntry, partOfSpeech=verb
• canonical form : Form, writtenRep="connect"@en
• synBehavior : TransitivePPFrame, with arguments x, y, z : Argument in the roles subject, directObject, prepositionalObject; z has a marker (…)
• sense : LexicalSense, with two subsenses
  • subsense : LexicalSense, reference dbpedia:routeStart, subjOfProp the subject, objOfProp the directObject
  • subsense : LexicalSense, reference dbpedia:routeEnd, subjOfProp the subject, objOfProp the prepositionalObject
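The compound sense can be sketched as follows: one filled syntactic frame yields one triple per subsense. The dictionary encoding and the function interpret are hypothetical simplifications for illustration, and the assignment of routeStart to the direct object and routeEnd to the prepositional object is an assumption based on the example sentence.

```python
# Hypothetical, simplified encoding of the compound sense of "connect".
# Each subsense names a property and the syntactic roles that fill its
# subject (subjOfProp) and object (objOfProp).
CONNECT_SUBSENSES = [
    {"reference": "dbpedia:routeStart",
     "subjOfProp": "subject", "objOfProp": "directObject"},
    {"reference": "dbpedia:routeEnd",
     "subjOfProp": "subject", "objOfProp": "prepositionalObject"},
]


def interpret(subsenses, arguments):
    """Turn one filled syntactic frame into one triple per subsense."""
    return [
        (arguments[s["subjOfProp"]], s["reference"], arguments[s["objOfProp"]])
        for s in subsenses
    ]


# "Interstate 10 connects Santa Monica with Jacksonville."
frame = {
    "subject": "Interstate 10",
    "directObject": "Santa Monica",
    "prepositionalObject": "Jacksonville",
}

for triple in interpret(CONNECT_SUBSENSES, frame):
    print(triple)
```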
3. Building bridges
Hand-crafting ontology lexica
• lemon lexicon for DBpedia 3.9
• 354 classes (98 %)
• 300 properties (17 %, all those with 10 000 or more occurrences)
• English, Spanish, German
https://github.com/cunger/lemon.dbpedia
Lemon design pattern library
https://github.com/jmccrae/lemon.patterns
StateVerb("play", dbpedia-owl:team,
propSubj = Subject,
propObj = prepositionalObject("for"))
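To illustrate what such a compact pattern saves, here is a hypothetical Python analogue that expands one call into the statements of the "play" example. It is not the API or the output of the lemon design pattern library; the function state_verb and the node names (:entry, :form, :sense, :frame, :x, :y) are placeholders invented for this sketch.

```python
def state_verb(lemma, prop, prop_subj, prop_obj_marker):
    """Expand a compact pattern into simplified, illustrative statements.

    Assumes the property's subject is realized in the role `prop_subj` and
    its object as a prepositional object marked by `prop_obj_marker`.
    """
    return [
        (":entry", "partOfSpeech", "verb"),
        (":entry", "canonicalForm", ":form"),
        (":form", "writtenRep", lemma),
        (":entry", "sense", ":sense"),
        (":sense", "reference", prop),
        (":entry", "synBehavior", ":frame"),
        (":frame", prop_subj, ":x"),
        (":frame", "prepositionalObject", ":y"),
        (":y", "marker", prop_obj_marker),
        (":sense", "subjOfProp", ":x"),
        (":sense", "objOfProp", ":y"),
    ]


statements = state_verb("play", "dbpedia-owl:team", "subject", "for")
print(len(statements), "statements from one pattern call")
```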
Lemon Assistant
Learning ontology lexica
Idea
1. For each predicate P in a data repository (e.g. DBpedia), collect the set of entity pairs (S, O) connected through P.
Example: spouse
• Audrey Hepburn, Mel Ferrer
• Albert Einstein, Mileva Maric
• …
Idea
2. Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O.
• Mileva Maric, the future wife of Albert Einstein, was the only woman among the six students in the mathematics and physics section.
• Einstein was married to Maric for 16 years.
Idea
3. For all retrieved sentences, the natural language pattern connecting both entities is a potential lexicalization of P.
• S, the future wife of O
• S, wife of O
• Dependency path: appos(S, wife), prep(wife, of), pobj(of, O)
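The three steps can be sketched in a few lines. This is a simplified illustration, not the M-ATOLL or BOA implementation: it matches exact surface strings instead of dependency paths, and the toy corpus spells out full entity labels (the slide's second sentence uses short names) because the sketch has no label matching beyond exact search. PAIRS, CORPUS and patterns_for are names made up for this example.

```python
import re
from collections import Counter

# Step 1: entity pairs (S, O) connected through the predicate (toy data).
PAIRS = {
    "spouse": [("Albert Einstein", "Mileva Maric"), ("Audrey Hepburn", "Mel Ferrer")],
}

# Step 2: a stand-in corpus; a real system would search e.g. Wikipedia.
CORPUS = [
    "Mileva Maric, the future wife of Albert Einstein, was the only woman "
    "among the six students in the mathematics and physics section.",
    "Albert Einstein was married to Mileva Maric for 16 years.",
]


def patterns_for(predicate):
    """Step 3: collect the text between both labels as a candidate pattern."""
    counts = Counter()
    for s_label, o_label in PAIRS[predicate]:
        for sentence in CORPUS:
            # Try both orders, since either entity may come first in the text.
            for first, second, a, b in ((s_label, o_label, "S", "O"),
                                        (o_label, s_label, "O", "S")):
                match = re.search(
                    re.escape(first) + r"(.*?)" + re.escape(second), sentence
                )
                if match:
                    counts[f"{a}{match.group(1)}{b}"] += 1
    return counts


for pattern, count in patterns_for("spouse").items():
    print(count, pattern)
```

Counting patterns over many pairs and sentences is what lets frequent lexicalizations stand out from accidental co-occurrences.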
M-ATOLL
https://github.com/ag-sc/matoll/
BOA
http://aksw.org/Projects/BOA.html
Limitations
Graph patterns involving more than one property:
• routeStart, routeEnd
• child, child, gender Female
Learning adjective lexicalizations
• gender Female
• nationality "Australian"
• religion "Buddhist"
• lemon
http://lemon-model.net
• W3C community group Ontology Lexica
https://www.w3.org/community/ontolex/
• DBpedia lexicon
https://github.com/cunger/lemon.dbpedia
• Lemon design pattern library
https://github.com/jmccrae/lemon.patterns
• M-ATOLL
https://github.com/ag-sc/matoll/
• BOA
http://aksw.org/Projects/BOA.html
