DSNotify - EuropeanaConnect

DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
WebS ’09 @ DEXA 2009
Linz, 02/09/2009
Bernhard Haslhofer and Niko Popitsch
Summary
<mo:MusicGroup rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
<foaf:name>Green Day</foaf:name>
<owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
<mo:image rdf:resource="/music/images/artists/7col_in/084308bd-1654-436f-ba03-df6697104e19.jpg" />
<foaf:page rdf:resource="/music/artists/084308bd-1654-436f-ba03-df6697104e19.html" />
<mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03-df6697104e19.html" />
<mo:homepage rdf:resource="http://www.greenday.com/" />
<mo:fanpage rdf:resource="http://www.greendayvideos.com/" />
<mo:fanpage rdf:resource="http://www.greenday.net" />
<mo:imdb rdf:resource="http://www.imdb.com/name/nm1554564/" />
<mo:myspace rdf:resource="http://www.myspace.com/greenday" />
...
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
<dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Green Day
is an American rock trio formed in 1987. The band has consisted of Billie Joe Armstrong
(vocals, guitar), Mike Dirnt, and Tré Cool for the majority of its existence...
</dbpprop:abstract>
</rdf:Description>
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
<dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="de">Green Day
[gɹiːn deɪ] ist eine US-amerikanische Punk-Rock-Band, mit der Anfang der 1990er das Punk-Revival begann. Die Band wurde 1987 von Billie Joe Armstrong und Mike Dirnt zusammen
mit dem Schlagzeuger John Kiffmeyer alias Al Sobrante als The Sweet Children....
</dbpprop:abstract>
</rdf:Description>
...
...but...
Some numbers...
• Events between DBpedia 3.2 (10/2008) and 3.3 (05/2009)
  • # resources created: 29449
  • # resources removed: 4789
  • # resources moved: 729
Link Integrity...
• is a qualitative property that is given when all links within and between a set of data sources are valid and deliver the result intended by the link creator.
  • cf. referential integrity in RDBMS
• demands a solution that
  • detects broken links between resources
  • provides support for fixing broken links
Types of broken links
• Removed link targets
  • e.g., resource deleted, server not available anymore, etc.
• Moved link targets
  • available at another Web location
  • e.g., reorganization of Web resources
• Modified link targets
The DSNotify Approach
• periodically monitor items (resources) in a specific Linked Data source
  • extract a descriptive feature vector for each item
  • store item + feature vector in index
• use feature vectors to detect if items have been removed or moved to another location
  • if moved, add relationship between "old" and "new" item
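The monitoring steps above can be sketched roughly as follows. This is only an illustration, not the DSNotify implementation: the names `ItemIndex`, `extract_features`, and `monitor_once` are hypothetical, the source is modelled as a plain dictionary snapshot, and the feature vector is assumed to be the set of (property, value) pairs of an item.

```python
from dataclasses import dataclass, field


@dataclass
class ItemIndex:
    """Toy in-memory index (hypothetical): URI -> feature vector."""
    features: dict = field(default_factory=dict)
    moved_to: dict = field(default_factory=dict)  # "old" URI -> "new" URI

    def store(self, uri, vector):
        self.features[uri] = vector

    def link_move(self, old_uri, new_uri):
        # Record the relationship between the "old" and the "new" item.
        self.moved_to[old_uri] = new_uri


def extract_features(description):
    """Assumed feature vector: the set of (property, value) pairs."""
    return frozenset(description.items())


def monitor_once(index, snapshot):
    """One monitoring pass over a snapshot {uri: {property: value}}.

    Returns (missing, new): URIs that are indexed but absent from the
    snapshot, and URIs seen for the first time. Vectors of missing items
    are kept in the index so a later step can compare them; deciding
    between "removed" and "moved" is not done here.
    """
    missing = sorted(set(index.features) - set(snapshot))
    new = sorted(set(snapshot) - set(index.features))
    for uri, description in snapshot.items():
        index.store(uri, extract_features(description))
    return missing, new
```

Calling `monitor_once` twice with snapshots in which an item changes its URI yields the old URI as missing and the new URI as new, which is the input a move detection step would need.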
Architecture
[Figure: DSNotify architecture. Labels recoverable from the diagram: LOD sources linked via owl:sameAs; a LOD "consuming" application; DSNotify components Monitor (feature extraction), Move Detector (heuristic), Decider (decision making, with user), LOD source updater, and Event Log; indices II, RII, AII; arrows labelled monitor, update, notifications, querying.]
Index Interaction
[Figure: timeline t1 to t4 across the Item Index (II), Archived Item Index (AII), and Removed Item Index (RII). URIs shown: http://dbpedia.org/resource/Green_Day (band), http://dbpedia.org/resource/band/Green_Day, http://dbpedia.org/resource/band/Alternative/Green_Day.]
Move Detection
• calculate similarity between items based on their feature vectors using domain-specific heuristics
• is a semi-automatic process
  • probability > given threshold: automatic decision
  • probability < given threshold: ask expert user
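The threshold rule above might look like this in code. The slides do not name the similarity measure, so Jaccard similarity over feature sets is swapped in here purely as a stand-in for the domain-specific heuristics; the function names, the default threshold of 0.8, and the "removed when there are no candidates" rule are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_move(missing_features, candidates, threshold=0.8):
    """Compare a vanished item with newly appeared items.

    candidates maps URI -> feature set. Returns (decision, uri, score):
    ("moved", uri, score) when the best score exceeds the threshold,
    ("ask_user", uri, score) otherwise, and ("removed", None, 0.0) when
    there is no candidate at all (an assumption of this sketch).
    """
    if not candidates:
        return ("removed", None, 0.0)
    best_uri, best = max(
        ((uri, jaccard(missing_features, f)) for uri, f in candidates.items()),
        key=lambda pair: pair[1],
    )
    # The slide leaves the "equal to threshold" case open; here it goes to the user.
    decision = "moved" if best > threshold else "ask_user"
    return (decision, best_uri, best)
```

Returning the score together with the decision lets an expert user see how close a pending case was to the automatic cut-off.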
DSNotify HTTP Interface
• GET http://<server>:<port>/<dsnotify>/item/<uri>
  • find out what happened with an item
• GET http://<server>:<port>/<dsnotify>/eventChoice
  • retrieve pending event choices (move / remove)
• ...
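A client for the two requests listed above could be sketched as follows. The slides give only the URL patterns, so several things here are assumptions: the helper names, the idea that the item URI is percent-encoded into the last path segment, and the treatment of the response as opaque bytes (the response format is not described).

```python
from urllib.parse import quote
from urllib.request import urlopen


def item_url(server, port, context, item_uri):
    # Assumption: the item URI is percent-encoded into the last path segment.
    return f"http://{server}:{port}/{context}/item/{quote(item_uri, safe='')}"


def event_choice_url(server, port, context):
    # Pattern taken from the slide: .../<dsnotify>/eventChoice
    return f"http://{server}:{port}/{context}/eventChoice"


def fetch(url, timeout=10):
    """Issue the GET request and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `item_url("localhost", 8080, "dsnotify", "http://dbpedia.org/resource/Green_Day")` builds the request URL for one item; the host, port, and context path are placeholders.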
Evaluation Plan
[Figure: timeline of DBpedia releases (t-n: DBpedia 2.0, ..., t-2: DBpedia 3.0, t-1: DBpedia 3.1, t0: DBpedia 3.2). A diff is computed between consecutive releases, and each diff is manually classified into move (mv) and remove (rm) events.]
Status / Future Work
• 1st prototype (infrastructure) ready
• annotated test-data set based on DBpedia available
• Currently working on:
  • system for simulating past modifications in DBpedia
  • the DSNotify evaluation
Fixing Your Web since 2009
Backup
Evaluation Plan
• Monitor simulated DBpedia evolution (t-n to t0)
• Precision / recall of automatic move detection
  • with different similarity thresholds
  • with different heuristics and feature vectors
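A minimal sketch of how precision and recall could be computed for several similarity thresholds is shown below. It assumes that detected and manually classified moves are both represented as sets of (old URI, new URI) pairs; the function names and this data layout are illustrative and not taken from the slides.

```python
def precision_recall(predicted, actual):
    """predicted/actual are sets of (old_uri, new_uri) move pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def sweep_thresholds(scored_pairs, actual, thresholds):
    """scored_pairs maps (old_uri, new_uri) -> similarity score.

    For each threshold, pairs scoring above it count as detected moves.
    Returns {threshold: (precision, recall)}.
    """
    results = {}
    for t in thresholds:
        predicted = {pair for pair, score in scored_pairs.items() if score > t}
        results[t] = precision_recall(predicted, actual)
    return results
```

Running the sweep once per heuristic or feature-vector variant would give comparable precision/recall figures for each configuration.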
Linked Data / Web of Data
• Data management paradigm on the basis of Web technologies
  • HTTP, URI, and RDF/S are the key technologies
  • Applications (not Web browsers) are data consumers
  • Links between resources play a major role