Information Extraction and the Semantic Web

Lecture 3: Knowledge representation (continued)
There is optional supplementary material available
here
Thanks for your feedback!
The feedback was overwhelmingly
positive:
Results
Proposals for improvement:
• get bigger room
• fix blackboard
• speak more slowly: will try
• make it a 6 CP class: yes, next time
• move exam:
Participate in Doodle!
”What is important?”
The important things are the Def’s
and Tasks. Here is a summary so far:
Relation loves (the columns are classes, the rows are tuples/facts):
Person     Person     Intensity
Irma       Mr.Bean    1.0
Mr.Bean    Irma       0.227
(Irma and Mr.Bean are entities; 1.0 and 0.227 are literals.)
”It’s confusing”
Philosophy is always confusing :-(
It is much more honest
to have the question
without knowing the answer,
than to have the answer
without knowing the question.
If confused, stick to the Def’s and Tasks!
Ignore the Digressions.
More Feedback
• Make a break: will disrupt class :-(
• Air conditioning: agreed
• Is there additional material?
Not for the exam. But see references
at the end of each lecture.
• Do I have to get all the Def’s right?
Not verbatim, but you should be able
to explain the concepts with your words.
• Can you show applications? See Lecture 1
”Show practical examples”
Everything we have seen so far is
implemented 1:1 in real KBs:
Mr. Bean in YAGO
Atkinson in YAGO
Types of Atkinson
Subclasses
”Make a quiz”
Sure! We will make the ”Tasks” as quizzes!
Please bring paper and a pen (+ 1 brain)!
Overview
• Names/Labels
• Meta Entities
• Reification
• Types of Knowledge
• Canonicity
• WordNet
Reminder: Relations
Binary relations can be represented as
• relation
  drives ⊆ person × car
  drives(MrBean, Mini1000)
• table
  drives
  Person    Car
  MrBean    Mini1000
• graph
  MrBean --drives--> Mini1000
• triple store
  Subject   Relation   Object
  MrBean    drives     Mini1000
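The four representations carry the same information. A minimal Python sketch of this, using only the single drives fact from the slide (the variable names are illustrative, not part of the lecture):

```python
# The relation "drives" as a set of (person, car) pairs.
drives = {("MrBean", "Mini1000")}

# Table view: one row per pair, with named columns.
table = [{"Person": p, "Car": c} for (p, c) in sorted(drives)]

# Graph view: adjacency list, each edge labeled with the relation name.
graph = {}
for p, c in drives:
    graph.setdefault(p, []).append(("drives", c))

# Triple-store view: (subject, relation, object) triples.
triples = {(p, "drives", c) for (p, c) in drives}

# Going back from the triples to the relation loses nothing.
recovered = {(s, o) for (s, r, o) in triples if r == "drives"}
print(recovered == drives)  # True
```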
Task: Functions
Which of the following relations
are functional?
(Figure: a graph whose edges are labeled r1, r2, and r3.)
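A functionality check can be sketched in Python, assuming the usual definition that a relation is functional if every first argument has at most one second argument. The relations below are made up for illustration and are not the ones of the task:

```python
def is_functional(pairs):
    """Return True if no subject is paired with two different objects."""
    seen = {}
    for subject, obj in pairs:
        if subject in seen and seen[subject] != obj:
            return False
        seen[subject] = obj
    return True

# Hypothetical relations, not the r1, r2, r3 of the task.
r_a = {("a", "x"), ("b", "x")}  # two subjects share one object
r_b = {("a", "x"), ("a", "y")}  # one subject has two objects

print(is_functional(r_a))  # True
print(is_functional(r_b))  # False
```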
Def:Name
A name (also: label) of an entity is
a human-readable string attached
to that entity.
The entity is called the
meaning of the name.
(Figure: an entity and its name, ”Mr. Bean”.)
Label
”Label” is a binary relation that holds
between an entity and its name.
(Figure: an entity with a label edge to ”Mr. Bean”.)
Def:Synonymy
If an entity has multiple names,
the names are called synonymous.
(The adjective for the names is ”synonymous”, each name is a ”synonym”, the phenomenon is called ”synonymy”)
(Figure: one entity with label edges to ”The King” and ”Elvis”.)
Def:Ambiguity
If a name is attached to multiple entities,
the name is called ambiguous.
(The adjective for the names is ”ambiguous”, the phenomenon is called ”ambiguity”)
(Figure: two entities, each with a label edge to ”The King”.)
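Synonymy and ambiguity can both be read off the label relation. A small Python sketch, in which the entity identifiers ElvisPresley and SomeMonarch are hypothetical placeholders:

```python
from collections import defaultdict

# label facts as (entity, name) pairs; the entity identifiers are hypothetical.
label = {
    ("ElvisPresley", "Elvis"),
    ("ElvisPresley", "The King"),
    ("SomeMonarch", "The King"),
}

names_of = defaultdict(set)     # entity -> its names
meanings_of = defaultdict(set)  # name -> the entities it is attached to
for entity, name in label:
    names_of[entity].add(name)
    meanings_of[name].add(entity)

# The names of an entity with several names are synonymous.
synonyms = {e: n for e, n in names_of.items() if len(n) > 1}
# A name attached to several entities is ambiguous.
ambiguous = {n for n, e in meanings_of.items() if len(e) > 1}
```

Here synonyms maps ElvisPresley to its two names, and ambiguous contains only ”The King”.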
Task:Names
List some entities with their names,
some ambiguous names,
and some synonyms.
Tuples and classes
A relation tuple contains entities.
Can it also contain classes?
sings(Alizee, PopSongs)
Alizee is an entity; PopSongs is a class (?)
Def:Class Entities
A class entity is an entity that
represents the class.
The class entity ”PopSongs” represents
popSongs = {MoiLolita, IslaBonita, ...}
”PopSongs” is an entity and
”popSongs” is a set of entities.
RDFS Vocabulary
Example: Class Entity
The class entity ”Marianne” represents
the class ”French People”:
frenchPeople = {Hollande, Piaf, Alizee, ...}
The Class Class
”Class” is the class of all class entities.
class = {Cars, Cities, Rivers, ...}
This class can appear in relations
likes ⊆ person × class
likes = {<Alizee, Tattoos>,
<Monroe, Shoes>, ...}
SubClassOf
SubClassOf is a binary relation on
class entities, which contains <x,y>
if the class represented by x is a
subclass of the class represented
by y.
subClassOf ⊆ class × class
subClassOf(Singers, Persons)
subClassOf(Cars, Vehicles)
Type
Type is a binary relation, which
contains <x,y> if x is an instance
of the class represented by y.
type ⊆ entity × class
type(Alizee, Singer)
type(Elvis, livingPeople)
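Once classes are entities, subClassOf and type are ordinary binary relations. The Python sketch below assumes, as in RDFS, that an instance of a class is also an instance of its superclasses; the slides do not state this inference. It also uses the single identifier Singers for the class written as Singer and Singers above.

```python
# Facts over class entities, taken from the examples above.
sub_class_of = {("Singers", "Persons"), ("Cars", "Vehicles")}
type_of = {("Alizee", "Singers")}

def superclasses(cls):
    """All classes reachable from cls via subClassOf, including cls itself."""
    result, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in sub_class_of:
            if sub == current and sup not in result:
                result.add(sup)
                todo.append(sup)
    return result

def is_instance(entity, cls):
    """Assumed RDFS-style reading: type facts propagate up subClassOf."""
    return any(cls in superclasses(c) for e, c in type_of if e == entity)

print(is_instance("Alizee", "Persons"))   # True
print(is_instance("Alizee", "Vehicles"))  # False
```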
Task: Class entities
Draw a knowledge graph with
the relations subClassOf and type.
Def:Relation Entity
A relation entity is an entity that
represents the relation.
The relation entity ”Likes” represents
likes = {<Alizee, Tattoos>, ...}
Now, ”Likes” is an entity (as opposed
to ”likes”, which is a set of pairs).
Def: Dom
”dom” is a relation on binary relation
entities and class entities,
which contains <x,y> if the domain
of the relation represented by x
is the class represented by y.
dom ⊆ relation × class
dom(BornInCity, Person)
Def: Ran
”ran” is defined like ”dom” and
identifies the range of a relation.
ran ⊆ relation × class
ran(BornInCity, City)
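One possible use of dom and ran is to check whether a fact is well-typed. The Python sketch below assumes type facts for Atkinson and Consett and adds a deliberately ill-typed fact; both are illustrative and not part of the slides.

```python
# dom and ran as mappings from relation entity to class entity.
dom = {"BornInCity": "Person"}
ran = {"BornInCity": "City"}

# Assumed type facts, for illustration only.
type_of = {("Atkinson", "Person"), ("Consett", "City")}

def respects_dom_and_ran(fact):
    """True if the subject is in the domain and the object is in the range."""
    subject, relation, obj = fact
    return (subject, dom[relation]) in type_of and (obj, ran[relation]) in type_of

good = ("Atkinson", "BornInCity", "Consett")
bad = ("Consett", "BornInCity", "Atkinson")  # arguments swapped on purpose

print(respects_dom_and_ran(good))  # True
print(respects_dom_and_ran(bad))   # False
```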
Task: Meta Relations
Draw a knowledge graph with the
relations dom and ran.
Can dom and ran appear as nodes?
Digression:Class&Relation
A fact can be modeled as a class
or as a relation.
As classes:    type → woman,     type → singer
As relations:  gender → female,  job → singer
n-ary facts as binary facts
Every n-ary fact can be represented
as binary facts.
n-ary fact (table):
drives
Person    Car        Destination
MrBean    Mini1000   Nice
Binary facts (graph):
MrBeanVacation --person--> MrBean
MrBeanVacation --car--> Mini1000
MrBeanVacation --destination--> Nice
Def: Event Entity
An event entity represents an n-ary fact.
(Figure: the event entity MrBeanVacation, with edges
person → MrBean, car → Mini1000, destination → Nice.)
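The conversion from an n-ary fact to binary facts via an event entity can be sketched in Python as follows. The function names are illustrative; the data is the MrBeanVacation example.

```python
def to_binary_facts(event_entity, nary_fact):
    """Turn one n-ary fact (a dict of role -> value) into binary triples."""
    return {(event_entity, role, value) for role, value in nary_fact.items()}

def to_nary_fact(event_entity, triples):
    """Collect the binary facts of one event entity back into one n-ary fact."""
    return {role: value for (s, role, value) in triples if s == event_entity}

row = {"person": "MrBean", "car": "Mini1000", "destination": "Nice"}
triples = to_binary_facts("MrBeanVacation", row)

# The round trip gives back the original n-ary fact.
print(to_nary_fact("MrBeanVacation", triples) == row)  # True
```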
Task: Event Entities
Draw a knowledge graph for the following
facts.
Irma loves Mr. Bean since 1955.
Mr. Bean drives with Irma to the cinema.
Irma and Mr. Bean watch ”Titanic”.
The movie is about the trip of the ship
”Titanic” from Europe to New York.
(There may be multiple solutions)
Binary relations are flexible
n-ary relations enforce the presence
of all arguments:
born
Person     City      Year
Atkinson   Consett   1955
Binary relations don’t:
(Figure: graphs in which the year 1955 is attached by a binary relation.)
Binary vs n-ary
Binary and n-ary relations can
represent the same facts.
binary
• more relations
• lower arity
• more flexibility
n-ary
• fewer relations
• higher arity
• more control
Task: Representation
Represent the following statement in
4 different ways:
Rowan Atkinson
is an actor who plays Mr Bean.
Def:Reified statement
A reified statement is an entity that represents
a statement. This phenomenon is called reification.
(Figure: an entity that represents the statement
”Alizee completed Dance School”; the school is shown as ”Danse Classique”.)
Reification Vocabulary
statement = set of reified statements
subject ⊆ statement × entity
predicate ⊆ statement × relation
object ⊆ statement × entity
Example:
subject(s41, Alizee)
predicate(s41, completed)
object(s41, DanceSchool)
Example: Reification
hopes(Pierre, s42)
subject(s42, Alizee)
predicate(s42, type)
object(s42, single)
Simplified notation:
hopes(Pierre, type(Alizee, single))
The represented statement itself
is not necessarily in the KB!
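The example can be written as a small triple store in Python. The helper represented_statement is an illustrative name; it rebuilds the statement that s42 stands for and shows that this statement need not be in the KB.

```python
# The KB from the example, as (subject, relation, object) triples.
kb = {
    ("Pierre", "hopes", "s42"),
    ("s42", "subject", "Alizee"),
    ("s42", "predicate", "type"),
    ("s42", "object", "single"),
}

def represented_statement(kb, stmt):
    """Rebuild the triple that the reified statement `stmt` represents."""
    parts = {r: o for (s, r, o) in kb if s == stmt}
    return (parts["subject"], parts["predicate"], parts["object"])

stmt = represented_statement(kb, "s42")
print(stmt)        # ('Alizee', 'type', 'single')
print(stmt in kb)  # False: Pierre hopes it, but the KB does not assert it
```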
Task: Reification
Write down a knowledge base
with some reified facts.
Can you reify facts that have reified arguments?
Structured Information
There are several types of
structured information:
• spatial
• factual
• taxonomic
• lexical
• multilingual
• phrasal
• meta
• common-sense
• multimodal
• epistemic
• temporal
Factual Knowledge
Factual knowledge concerns relationships
between entities.
Alizee is a French singer from Ajaccio, Corsica.
Alizee hasNationality French
Alizee isFrom Ajaccio
Ajaccio isLocatedIn Corsica
Taxonomic Knowledge
Taxonomic knowledge concerns classes.
Alizee, a pop singer from France, has sold more
than 5 million records. (AlizeeAmerica)
(Figure: Pop Singers as a subclass of Singers.)
Lexical Knowledge
Lexical knowledge concerns labels,
words, and properties of words.
Also known by her nickname ”Lili”, Alizée
started dancing early. (Wikipedia/Alizee)
(Please also look at the technical content of this slide)
nickname: ”Lili”
label: ”Alizee”
fullName: ”Alizée Jacotey”
Multilingual Knowledge
Multilingual knowledge concerns
labels in different languages.
”France”
”Fronkraisch”
”Francio”
Phrasal Knowledge
Phrasal knowledge is about small
groups of words that stand together
as a conceptual unit.
GoogleDef
PATTY
Temporal Knowledge
Temporal knowledge concerns the time
of events and facts.
Alizée started dancing early in her
life, and by age four was already
proficient. A year later, she was
enrolled in Ajaccio’s Ecole de
Danse, and trained there until she
was 15. In 1995, at the age of 11,
she won a competition of a French
airline. The aircraft was
subsequently named after her.
Event        Time
born         1984
dance        early
proficient   Age 4
enroll       Age 5
train        Age 5-15
won          Age 11 (1995)
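The ages on the timeline can be turned into approximate calendar years from the birth year. A small Python sketch, assuming that year = birth year + age is good enough (it ignores the month and day of birth):

```python
BIRTH_YEAR = 1984  # "born 1984" on the timeline

def year_at_age(age, birth_year=BIRTH_YEAR):
    """Approximate calendar year at a given age (ignores month and day)."""
    return birth_year + age

timeline = {
    "proficient": year_at_age(4),
    "enroll": year_at_age(5),
    "train": (year_at_age(5), year_at_age(15)),
    "won": year_at_age(11),
}

# The text gives both "1995" and "age of 11" for the competition.
print(timeline["won"] == 1995)  # True
```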
Spatial Knowledge
Spatial knowledge concerns the location
of events, entities, and facts.
Meta Knowledge
Meta knowledge is knowledge
about knowledge.
Structured Knowledge of
NELL about Alizee:
Meta knowledge about ”Alizee is a celebrity”
says where, when, and how this fact was found.
Common Sense Knowledge
Common sense knowledge concerns facts
of ordinary sensible understanding.
daughter(x, y) ⇒ gender(x, female)
mother(x, y) ⇒ loves(x, y)
mother(x, y) ⇒ ∃z : father(z, y)
(Sources: Myspace, Wiktionary)
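The first two rules can be applied mechanically to a set of facts. A minimal Python sketch with two hypothetical entities, Ann and Bea; the third rule is left out because its ∃z would require inventing a new entity:

```python
def apply_rules(facts):
    """Apply the daughter and mother rules once to (relation, x, y) facts."""
    derived = set(facts)
    for (rel, x, y) in facts:
        if rel == "daughter":
            derived.add(("gender", x, "female"))  # daughter(x,y) => gender(x,female)
        if rel == "mother":
            derived.add(("loves", x, y))          # mother(x,y) => loves(x,y)
    return derived

# Hypothetical facts: Bea is the daughter of Ann, Ann is the mother of Bea.
facts = {("daughter", "Bea", "Ann"), ("mother", "Ann", "Bea")}
derived = apply_rules(facts)
```

One pass is enough here, because the derived gender and loves facts do not trigger either rule again.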
Multimodal Knowledge
Multimodal knowledge concerns information
that is not written text: audio, video, haptic, etc.
(Figure: Alizee with the edges hasPicture, sings → MoiLolita.mp3, and appearsIn → a Youtube video.)
Epistemic Knowledge
Epistemic knowledge concerns beliefs.
Hellocoton
believes(Hellocoton, divorced(Alizee, Jeremy))
Def: Canonic Entities
An entity is canonic in a KB,
if there is no other entity in the KB that
represents the same real-world object.
Alizee produced Gourmandises
A. Jacotey produced Psychédélices
...
not canonical
(Here, we distinguish exceptionally between an entity in the KB, and the real world object. This distinction is correct, but rarely necessary otherwise, which is why this lecture does not make the difference)
Def: Canonic Relations
A relation is canonic in a KB,
if there is no other relation in the KB that
represents the same real-world relation.
Alizee produced Gourmandises
Alizee hasProduced Psychédélices
...
not canonical
(Here, we distinguish exceptionally between a relation in the KB, and the real world relation. This distinction is correct, but rarely necessary otherwise, which is why this lecture does not make the difference)
Use of Canonicity
Canonicity is essential for
• counting
• answering queries
• constraint satisfaction
Alizee produced Gourmandises
Alizee hasProduced Psychédélices
...
not canonical
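The effect on counting can be sketched in Python. The mapping canonical_relation is a hypothetical, hand-written table; building such a mapping automatically is the hard part and is not shown here.

```python
# The non-canonical KB: two relation names for the same real-world relation.
kb = {
    ("Alizee", "produced", "Gourmandises"),
    ("Alizee", "hasProduced", "Psychedelices"),
}

def count_objects(kb, subject, relation):
    """Count the facts with the given subject and relation."""
    return sum(1 for (s, r, o) in kb if s == subject and r == relation)

# Counting on the non-canonical KB misses one album.
print(count_objects(kb, "Alizee", "produced"))  # 1

# Hypothetical hand-written mapping to one canonical relation name.
canonical_relation = {"hasProduced": "produced"}
canonical_kb = {(s, canonical_relation.get(r, r), o) for (s, r, o) in kb}

print(count_objects(canonical_kb, "Alizee", "produced"))  # 2
```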
Canonicity and Names
A canonic entity can have multiple names.
Alizee produced Gourmandises
Alizee produced Psychédélices
Alizee label ”Alizee”
Alizee label ”A. Jacotey”
produced label ”produced”
produced label ”has produced”
...
Canonicity is not easy
Jacotey is considered one of the ”100 Se
women of the world”. The singer said i
that Alizee is married, but she lives sep
Example: Non-Canonicity
”Tell me about Alize”
near duplicates
not Alizee
not an entity
TextRunner
Example: Canonicity
YAGO
Example: Non-Canonicity
”Who built the pyramids?”
correct
not bad
less likely
duplicate
useful
TextRunner
Example: Canonicity
No answer to
”Who built
the pyramids”
YAGO
Canonicity as Trade-Off
non-canonic
• easier to extract
• less easy to use
• more noise
• more data
canonic
• difficult to extract
• easy to use
• less noise
• less data
Digression: Reality
We model reality by a representation.
person
Alizee
Monroe
female
Digression: Reality
Our identifiers are arbitrary names.
A16
A17
A14
A15
Digression: Reality
Can we reconstruct reality
from our model?
?
A16
A17
A14
?
A15
Digression: Reality
Most likely no: A Chinese dictionary
is a model of the world...
...yet, by reading it,
you cannot learn Chinese.
References
RDF Primer