AGDISTIS - VideoLectures.NET

Transcription

AGDISTIS - Graph-based Disambiguation of Named Entities
Ricardo Usbeck (1,2), Axel-Cyrille Ngonga Ngomo (1), Michael Röder (1,2), Daniel Gerber (1), Sandro Athaide Coelho (3), Sören Auer (4), Andreas Both (2)
(1) University of Leipzig, Germany
(2) R & D, Unister GmbH, Germany
(3) Federal University of Juiz de Fora, Brazil
(4) University of Bonn & Fraunhofer IAIS, Germany
Usbeck et al. (AKSW)

Motivation
1. Every minute on the Document Web ...
   278,000 tweets
   41,000 Facebook posts
   571 new websites
2. In contrast
   The Linked Data Web is mostly static
   Most data is encyclopedic
   Lack of up-to-date data
Solution
Deploy scalable knowledge extraction to bridge between unstructured and structured data

Named Entity Disambiguation
Drawbacks
1. Poor performance on Web documents
2. Current approaches rely on exhaustive data mining methods or algorithms with non-polynomial time complexity
3. Partly difficult to port to other languages
Goal
1. Design an accurate, knowledge-base-agnostic approach
2. Ensure polynomial time complexity
3. Provide easy portability to other languages

Overview

Entity Recognition
Example
Barack Obama arrived this afternoon in Washington, D.C.
By default we use FOX for named entity recognition
Figure: All token-based.
Figure: All entity-based.
Figure: All dataset.

Candidate Generation
Given: Set of entity labels
Output: Set of candidate resources for each label
Greedy Approach: Merge objects of labeling properties (e.g., rdfs:label, skos:prefLabel, ...) and surface forms (if available). Select all resources with label similarity larger than θ

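The greedy selection step can be sketched in Python. This is a minimal sketch under stated assumptions, not the AGDISTIS implementation: the slides only say that resources with label similarity larger than θ are selected, so the character-trigram Jaccard similarity, the function names, and the toy index below are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """Return the character trigrams of a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of trigram sets (an assumed similarity measure)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def generate_candidates(labels: List[str],
                        index: Dict[str, List[str]],
                        theta: float = 0.82) -> Dict[str, List[str]]:
    """For each label, keep every resource that has at least one name
    (label or surface form) with similarity larger than theta."""
    return {
        label: [
            resource
            for resource, names in index.items()
            if any(trigram_similarity(label, name) > theta for name in names)
        ]
        for label in labels
    }


# Hypothetical toy index: resource -> merged labels and surface forms.
TOY_INDEX = {
    "dbr:Barack Obama": ["Barack Obama"],
    "dbr:Barack Obama, Sr.": ["Barack Obama, Sr.", "Barack Obama"],
    "dbr:Washington, D.C.": ["Washington, D.C."],
    "dbr:Washington, D.C. (novel)": ["Washington, D.C. (novel)",
                                     "Washington, D.C."],
    "dbr:Berlin": ["Berlin"],
}

candidates = generate_candidates(["Barack Obama", "Washington, D.C."], TOY_INDEX)
for label, resources in candidates.items():
    print(label, "->", resources)
```

Merging labels and surface forms into one list per resource keeps the selection a single pass over the index, which is what makes the approach greedy: no context is consulted at this stage, so ambiguous candidates are kept and resolved later.
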
Example
Example (List of Candidates)
Barack Obama:
  dbr:Barack Obama
  dbr:Barack Obama, Sr.
Washington, D.C.:
  dbr:Washington, D.C.
  dbr:Washington, D.C. (novel)
  ...

Breadth-First Search and HITS
Given: Set of resources for each label, i.e., set of nodes
Output: Highest ranked resource for each label
Method:
1. Breadth-first search from each initial resource
2. Run HITS algorithm on this graph
3. Choose resource with highest authority for each label

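The three method steps can be sketched as follows. This is an illustrative sketch, not the authors' code: the knowledge base is assumed to be a directed adjacency list, the edges in the toy graph are invented, and the names `expand`, `hits`, and `disambiguate` are hypothetical. HITS is run in its textbook form (authority from incoming hubs, hub from outgoing authorities, L2 normalization).

```python
from collections import deque
from math import sqrt
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]
Edge = Tuple[str, str]


def expand(kb: Graph, seeds: List[str], depth: int = 2) -> Set[Edge]:
    """Breadth-first search from every seed, collecting edges up to `depth` hops."""
    edges: Set[Edge] = set()
    for seed in seeds:
        seen = {seed}
        queue = deque([(seed, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == depth:
                continue  # do not expand beyond the search depth
            for succ in kb.get(node, []):
                edges.add((node, succ))
                if succ not in seen:
                    seen.add(succ)
                    queue.append((succ, dist + 1))
    return edges


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges: Set[Edge], iterations: int = 20) -> Dict[str, float]:
    """Return the authority score of every node in the edge set."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]   # authority: sum of hubs pointing here
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]   # hub: sum of authorities pointed to
        auth, hub = normalize(auth), normalize(hub)
    return auth


def disambiguate(kb: Graph, candidates: Dict[str, List[str]],
                 depth: int = 2) -> Dict[str, str]:
    """Pick the candidate with the highest authority for each label."""
    seeds = [c for cands in candidates.values() for c in cands]
    authority = hits(expand(kb, seeds, depth))
    return {
        label: max(cands, key=lambda c: authority.get(c, 0.0))
        for label, cands in candidates.items() if cands
    }


# Hypothetical toy knowledge base (edges invented for illustration).
TOY_KB: Graph = {
    "dbr:Barack Obama": ["dbr:Washington, D.C.", "dbr:United States"],
    "dbr:Barack Obama, Sr.": ["dbr:Kenya", "dbr:Barack Obama"],
    "dbr:Washington, D.C.": ["dbr:United States", "dbr:Barack Obama"],
    "dbr:Washington, D.C. (novel)": ["dbr:Gore Vidal"],
    "dbr:United States": ["dbr:Washington, D.C.", "dbr:Barack Obama"],
}
TOY_CANDIDATES = {
    "Barack Obama": ["dbr:Barack Obama", "dbr:Barack Obama, Sr."],
    "Washington, D.C.": ["dbr:Washington, D.C.", "dbr:Washington, D.C. (novel)"],
}

result = disambiguate(TOY_KB, TOY_CANDIDATES)
print(result)
```

In this toy graph the two mutually linked resources reinforce each other's authority, while the novel has no incoming edges and therefore scores zero. Both the search and a fixed number of HITS iterations run in time polynomial in the size of the expanded graph.
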
Example II

Example III
Node                            x_a
dbr:Barack Obama                0.273
dbr:Barack Obama, Sr.           0.089
dbr:Washington, D.C.            0.093
dbr:Washington, D.C. (novel)    0.000

Experimental Setup
Goal
Measure the accuracy of AGDISTIS on different languages
Baseline: State-of-the-art frameworks (AIDA, Spotlight, TagMe 2)
Evaluation measure: Micro F-measure
Nine datasets in three languages (7x English, 1x Chinese, 1x German)
Default settings: d = 2, θ = 0.82 (d: breadth-first search depth, θ: label similarity threshold)

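For readers unfamiliar with the evaluation measure, the following is a minimal sketch of micro-averaged precision, recall, and F-measure. It assumes that annotations are compared as exact (mention, resource) pairs per document; the matching rules of the actual benchmark frameworks may differ, and the function name and sample data are hypothetical.

```python
from typing import Iterable, Set, Tuple

Annotation = Tuple[str, str]  # (mention, resource)


def micro_f_measure(
    documents: Iterable[Tuple[Set[Annotation], Set[Annotation]]]
) -> Tuple[float, float, float]:
    """Micro-average: pool true/false positives and false negatives over
    all documents first, then compute precision, recall, and F1 once."""
    tp = fp = fn = 0
    for gold, predicted in documents:
        tp += len(gold & predicted)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sample: two documents with gold and predicted annotations.
docs = [
    ({("Obama", "dbr:Barack Obama"), ("Washington", "dbr:Washington, D.C.")},
     {("Obama", "dbr:Barack Obama"), ("Washington", "dbr:Washington, D.C. (novel)")}),
    ({("Leipzig", "dbr:Leipzig")},
     {("Leipzig", "dbr:Leipzig")}),
]
precision, recall, f1 = micro_f_measure(docs)
print(precision, recall, f1)
```

Pooling the counts before dividing means that documents with many entities weigh more than documents with few, which is the difference from a macro average over per-document scores.
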
Evaluation
Best θ ∈ [0.8, 0.9] across all datasets
d = 2 best in all experiments

Evaluation
English datasets
Deployed AGDISTIS on DBpedia and YAGO2
F-measure:
Corpus        AGDISTIS    AGDISTIS    AIDA     Spotlight
K             DBpedia     YAGO2       YAGO2    DBpedia
Reuters       0.78        0.60        0.62     0.56
RSS-500       0.75        0.53        0.6      0.56
AIDA-YAGO2    0.73        0.58        0.83     0.57

Evaluation
Benchmark datasets from BAT framework (English)
Dataset             Approach             F1-measure   Precision   Recall
AIDA/CONLL-TestB    TagMe 2              0.565        0.58        0.551
                    DBpedia Spotlight    0.341        0.308       0.384
                    AGDISTIS             0.596        0.642       0.556
AQUAINT             TagMe 2              0.457        0.412       0.514
                    DBpedia Spotlight    0.26         0.178       0.48
                    AGDISTIS             0.547        0.777       0.422
IITB                TagMe 2              0.408        0.416       0.4
                    DBpedia Spotlight    0.46         0.434       0.489
                    AGDISTIS             0.31         0.646       0.204
MSNBC               TagMe 2              0.466        0.431       0.508
                    DBpedia Spotlight    0.331        0.317       0.347
                    AGDISTIS             0.761        0.796       0.729

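The F1, precision, and recall values reported for the BAT benchmark datasets are linked by the usual definition of F1 as the harmonic mean of precision P and recall R. The formula below is the standard definition rather than something stated on the slides; as a worked check it is applied to the AGDISTIS values reported for AIDA/CONLL-TestB.

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R},
\qquad
F_1^{\text{AGDISTIS, AIDA/CONLL-TestB}}
  = \frac{2 \cdot 0.642 \cdot 0.556}{0.642 + 0.556}
  = \frac{0.713904}{1.198}
  \approx 0.596
```

The result agrees with the reported F1-measure of 0.596.
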
Chinese AGDISTIS
Support a non-European language
Benchmark: QALD4 queries
200 questions in the training data
50 questions in the test data
F-measure between 65% (training data) and 70% (test data).

German AGDISTIS
news.de Dataset (N3 collection)
Collected from web news portal news.de
53 documents, 627 named entities
AGDISTIS: 0.87 F1-measure, Spotlight: 0.84 F1-measure

GERBIL
Figure: GERBIL architecture. Diagram labels: Annotators (web service calls, Natural Language Interchange Format); Datasets (NLP Interchange Format, pull via HTTP or HDD lookup); Matching; Experiment type; BAT-Framework for Entity Annotators by Cornolti et al. (WWW '13); Online User Interface; only results (measures) are transferred, not annotations; OPEN.
http://github.com/AKSW/GERBIL

Demo
http://agdistis.aksw.org/demo/

Conclusion
Presented AGDISTIS
  Polynomial time complexity
  Greedy and knowledge-base-agnostic
  Multilingual (English, German and Chinese, more to come)
  High accuracy on diverse knowledge bases and test datasets
Future Work
  Include graph summarization
  Extension to more languages
  Combination with other approaches

That’s all Folks!
Thank you!
Questions?
Axel Ngonga
AKSW Research Group
[email protected]
http://github.com/AKSW/AGDISTIS
http://agdistis.aksw.org/demo
Live Demo: Oct. 21st, Stand 79
