Triplestores and SPARQL 1 Content to negotiate and how to SPARQL

Triplestores and SPARQL
Notes and Exercises
Kevin Page & John Pybus
Tue 9th July 2013
1 Content to negotiate and how to SPARQL
1.1 More RDF and content negotiation with the Audio File Repository
1. Use the AskApache HTTP Headers Tool again with this URI:
http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/98933
○ but this time set the accept type to “audio/mpeg”
○ give the 303 location to your web browser (or if your browser doesn’t
have a handler for audio, download the file then play it)
2. Now use Morph to retrieve and view the RDF. Follow your nose to Jamendo.
Make a note of the Jamendo recordings the audio files are encodings of
(from the RDF).
3. Also try:
○ http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/71263
○ http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/71265
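The negotiation in step 1 can also be scripted. The following is a minimal sketch using only the Python standard library. The names NoRedirect, build_request and negotiate are illustrative, and the "application/rdf+xml" media type in the final comment is an assumption about what the server offers. The script only builds and prints the request; the network call is left commented out because the server may no longer be online.

```python
import urllib.error
import urllib.request

AUDIOFILE_URI = "http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/98933"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 303 itself is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError instead


def build_request(uri, accept):
    """Build a GET request that asks for one specific media type."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def negotiate(uri, accept):
    """Send the request; return (status code, Location header or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(uri, accept), timeout=10) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:  # raised for the unfollowed 303
        return err.code, err.headers.get("Location")


request = build_request(AUDIOFILE_URI, "audio/mpeg")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# With network access, compare the two redirect targets:
# print(negotiate(AUDIOFILE_URI, "audio/mpeg"))
# print(negotiate(AUDIOFILE_URI, "application/rdf+xml"))  # assumed RDF type
```

Changing only the Accept value while keeping the URI fixed is the point of the exercise: one identifier, several representations.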
1.2 A simple SPARQL query to the Audio File Repository
1. We’ve put a web interface for the Audio File Repository SPARQL endpoint
up at: http://jamendo.legacy.audiofiles.linkedmusic.org/snorql/
a. open it in your browser
b. note that non-human clients can connect directly to the endpoint using
the SPARQL protocol
2. When the Collection Builder “grounds” a collection it queries the Audio File
repository with a simple query to find any audio files which encode the
abstract recordings listed in the (ungrounded) collection:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?audiofile WHERE {
<http://dbtune.org/jamendo/track/98933> mo:available_as ?audiofile .
?audiofile a mo:AudioFile .
}
3. Substituting in the Jamendo recordings you noted above, try out the query
on the Audio File Repository.
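For a non-human client, "using the SPARQL protocol" amounts to sending the query text as an HTTP request parameter. Here is a hedged sketch of how such a GET request might be built in Python. The endpoint address is a placeholder (the handout gives only the Snorql form, not the endpoint URL behind it), and sparql_get_request is an illustrative name. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
import urllib.request

# ASSUMPTION: hypothetical endpoint address, not taken from the handout.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?audiofile WHERE {
  <http://dbtune.org/jamendo/track/98933> mo:available_as ?audiofile .
  ?audiofile a mo:AudioFile .
}"""


def sparql_get_request(endpoint, query):
    """Encode a query as a SPARQL protocol GET request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard XML results format for SELECT queries.
    headers = {"Accept": "application/sparql-results+xml"}
    return urllib.request.Request(url, headers=headers)


request = sparql_get_request(ENDPOINT, QUERY)
print(request.full_url)
```

To try a different recording, substitute its URI for the track URI in QUERY, as in step 3.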
1.3 SPARQLing with Jamendo at dbtune
1. The Collection Builder creates its collections using the RDF and SPARQL
endpoint available from dbtune
○ http://dbtune.org/jamendo/
2. There is a similar SPARQL endpoint web interface for the Jamendo data
○ http://dbtune.org/jamendo/store/user/query
○ be sure to select “SPARQL”
3. Enter the query on slide 9 “SPARQL Queries (3)”
○ Do you receive the expected match?
4. Change the SELECT statement to return all variables by replacing
“?artistname” with “*”, i.e.
○ SELECT * WHERE {
○ Resubmit the query
5. Try changing the graph pattern. Instead of matching against the title of the
album, find all artists who have made albums that have a 7th track. Some
hints:
○ Extend the Record pattern to match tracks
○ Match tracks that have a number 7
○ Look at the RDF you discovered earlier to find the required concepts
and properties (e.g. http://dbtune.org/jamendo/track/71263 )
6. Verify your results - list and look at the albums that match within the graph
pattern
○ Hint: add “?album” after “?artistname” to the SELECT clause
○ Are there fewer artists with 20 track records?
7. Do some artists have more than one album with 5 tracks? Does this cause
them to be matched multiple times?
○ Try changing the SELECT clause to “SELECT DISTINCT ?artistname”
2 SPARQL in detail
2.1 Why SPARQL?
SPARQL is the query language of the Semantic Web.
• SPARQL Protocol and RDF Query Language
It lets us:
• Pull values from structured and semi-structured data
• Explore data by querying unknown relationships
• Perform complex joins of disparate databases in a single, simple query
• Transform RDF data from one vocabulary to another
2.2 Structure of a SPARQL Query
A SPARQL query comprises, in order:
• Prefix declarations, for abbreviating URIs
• A result clause, identifying what information to return from the query
• Dataset definition, stating what RDF graph(s) are being queried
• The query pattern, specifying what to query for in the underlying dataset
• Query modifiers, slicing, ordering, and otherwise rearranging query results
# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
...
}
# query modifiers
ORDER BY ...
2.3 Friend of a Friend (FOAF)
• FOAF (http://www.foaf-project.org/) is a standard RDF vocabulary for
describing people and relationships
• Tim Berners-Lee's FOAF information available at
http://www.w3.org/People/Berners-Lee/card
• For our first query, let's find all the names of people mentioned in Tim's
FOAF file:
find all subjects (?person) and objects (?name) linked with the
foaf:name predicate. Then return all the values of ?name. In other
words, find all names mentioned in Tim Berners-Lee's FOAF file.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
?person foaf:name ?name .
}
• SPARQL variables start with a ? and can match any node (resource or
literal) in the RDF dataset.
• Triple patterns are just like triples, except that any of the parts of a triple
can be replaced with a variable.
• The SELECT result clause returns a table of variables and values that satisfy
the query.
2.3.1 Traversing the graph
Find me the homepage of anyone known by Tim Berners-Lee.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
SELECT *
WHERE {
card:i foaf:knows ?known .
?known foaf:homepage ?homepage .
}
• We can use multiple triple patterns to retrieve multiple properties about a
particular resource
• Shortcut: SELECT * selects all variables mentioned in the query.
• By using ?known as an object of one triple and the subject of another, we
traverse multiple links in the graph.
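To make the matching concrete, here is a toy evaluator for a list of triple patterns over an in-memory list of triples. It is a sketch of the idea only, not how a real triplestore is implemented. The graph data (ex:alice, ex:bob, ex:carol and their homepages) is invented for illustration, and prefixed names are treated as plain strings.

```python
def is_var(term):
    """SPARQL variables start with a question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def evaluate(patterns, graph):
    """Return every binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Invented example data, loosely modelled on the query above.
GRAPH = [
    ("card:i", "foaf:knows", "ex:alice"),
    ("card:i", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:homepage", "http://example.org/alice/"),
    ("ex:carol", "foaf:homepage", "http://example.org/carol/"),
]
PATTERNS = [
    ("card:i", "foaf:knows", "?known"),
    ("?known", "foaf:homepage", "?homepage"),
]
for solution in evaluate(PATTERNS, GRAPH):
    print(solution)
```

Only ex:alice appears in the output: ex:bob is known but has no homepage, and ex:carol has a homepage but is not known by card:i. Sharing ?known between the two patterns is what performs the join.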
2.3.2 Running a SPARQL query in Gruff
• Select View → Query View
• Copy the SPARQL into the query window
• Click Do Query
• The results will be displayed below
• (you can select items and add them to the graph view)
2.4 Exercise
Use the data you loaded into the Gruff triplestore in Monday's exercises.
1. Run the query from 2.3.1
2. Write a query to find the names of buildings which are within the
University Science Area
• http://oxpoints.oucs.ox.ac.uk/id/59030245 is the node for the
University Science Area
• http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within is a
property describing one place being contained within another.
2.5 DISTINCT/LIMIT/ORDER BY
Example: DBpedia
• DBpedia (http://dbpedia.org/) is an RDF version of information from
Wikipedia.
• DBpedia contains data derived from Wikipedia's infoboxes, category
hierarchy, article abstracts, and various external links.
• DBpedia contains over 100 million triples.
Find me 50 example concepts in the DBpedia dataset.
SELECT DISTINCT ?concept
WHERE {
?s a ?concept .
} LIMIT 50
• LIMIT is a solution modifier that limits the number of rows returned from a
query. SPARQL has two other solution modifiers:
◦ ORDER BY for sorting query solutions on the value of one or more variables
◦ OFFSET, used in conjunction with LIMIT and ORDER BY to take a slice of a
sorted solution set (e.g. for paging)
• The SPARQL keyword a is a shortcut for the common predicate rdf:type,
giving the class of a resource.
• The DISTINCT modifier eliminates duplicate rows from the query results.
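The way these modifiers combine for paging can be mimicked on an ordinary list of rows. This is a rough sketch with invented class names as data. It applies the steps in the order SPARQL defines them (ORDER BY, then DISTINCT, then OFFSET and LIMIT); apply_modifiers is an illustrative helper, not part of any library.

```python
def apply_modifiers(rows, order_key=None, descending=False,
                    distinct=False, offset=0, limit=None):
    """Apply ORDER BY, DISTINCT, OFFSET and LIMIT, in that order."""
    if order_key is not None:
        rows = sorted(rows, key=lambda r: r[order_key], reverse=descending)
    if distinct:
        rows = list(dict.fromkeys(rows))  # drops duplicates, keeps order
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Invented one-column result rows (tuples, so they can be de-duplicated).
ROWS = [
    ("dbo:Person",),
    ("dbo:Place",),
    ("dbo:Person",),
    ("dbo:Band",),
    ("dbo:Album",),
]

page_1 = apply_modifiers(ROWS, order_key=0, distinct=True, offset=0, limit=2)
page_2 = apply_modifiers(ROWS, order_key=0, distinct=True, offset=2, limit=2)
print(page_1)
print(page_2)
```

Without the ORDER BY step the pages would still be slices, but nothing would guarantee that two requests see the rows in the same order, which is why OFFSET is normally paired with ORDER BY.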
2.6 SPARQL filters
Find me all landlocked countries with a population greater than 15
million.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
?country a type:LandlockedCountries ;
rdfs:label ?country_name ;
prop:populationEstimate ?population .
FILTER (?population > 15000000) .
} ORDER BY DESC(?population)
• DESC(?population) sorts in descending order; use ORDER BY ?population, or
ASC(?population), for ascending order
• FILTER constraints use boolean conditions to filter out unwanted query
results.
• Shortcut: a semicolon (;) can be used to separate two triple patterns that
share the same subject. (?country is the shared subject above.)
• rdfs:label is a common predicate for giving a human-friendly label to a
resource.
2.6.1 SPARQL built-in filter functions
• Logical: !, &&, ||
• Math: +, -, *, /
• Comparison: =, !=, >, <, ...
• SPARQL tests: isURI, isBlank, isLiteral, bound
• SPARQL accessors: str, lang, datatype
• Other: sameTerm, langMatches, regex
2.7 OPTIONAL/UNION
Dataset: Jamendo
• Jamendo is a community collection of music all freely licensed under
Creative Commons licenses.
• DBTune.org hosts a queryable RDF version of information about Jamendo's
music collection.
• Hosts data on thousands of artists, tens of thousands of albums, and nearly
100,000 tracks.
2.7.1 Finding artists' info - the wrong way
Find all Jamendo artists along with their image, home page, and the
location they're near.
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?img ?hp ?loc
WHERE {
?a a mo:MusicArtist ;
foaf:name ?name ;
foaf:img ?img ;
foaf:homepage ?hp ;
foaf:based_near ?loc .
}
• Jamendo has information on about 3,500 artists.
• Trying the query, though, we only get 2,667 results. What's wrong?
2.7.2 Finding artists' info - the right way
Find all Jamendo artists along with their image, home page, and the
location they're near, if any.
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?img ?hp ?loc
WHERE {
?a a mo:MusicArtist ;
foaf:name ?name .
OPTIONAL { ?a foaf:img ?img }
OPTIONAL { ?a foaf:homepage ?hp }
OPTIONAL { ?a foaf:based_near ?loc }
}
• Not every artist has an image, homepage, or location!
• OPTIONAL tries to match a graph pattern, but doesn't fail the whole query if
the optional match fails.
• If an OPTIONAL pattern fails to match for a particular solution, any variables
in that pattern remain unbound (no value) for that solution.
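The difference between the two queries is the difference between an inner join and a left join. The sketch below shows this on three invented artists, each with at most one image and one homepage (real data can have several values per artist, which would multiply rows). The helper names required and optional are illustrative; an unbound variable is modelled as a missing dictionary key.

```python
# Invented data: every artist has a name, but not every artist has the rest.
ARTISTS = {"ex:a1": "Artist One", "ex:a2": "Artist Two", "ex:a3": "Artist Three"}
IMAGES = {"ex:a1": "http://example.org/a1.png"}
HOMEPAGES = {"ex:a1": "http://example.org/a1/", "ex:a2": "http://example.org/a2/"}


def required(solutions, values, var):
    """Inner join: keep only solutions whose artist has a value for `var`."""
    return [{**s, var: values[s["a"]]} for s in solutions if s["a"] in values]


def optional(solutions, values, var):
    """Left join: keep every solution; bind `var` only where a value exists."""
    return [
        {**s, var: values[s["a"]]} if s["a"] in values else dict(s)
        for s in solutions
    ]


base = [{"a": a, "name": n} for a, n in ARTISTS.items()]

# "The wrong way": every property is mandatory.
strict = required(required(base, IMAGES, "img"), HOMEPAGES, "hp")
# "The right way": each property is wrapped in its own OPTIONAL.
lenient = optional(optional(base, IMAGES, "img"), HOMEPAGES, "hp")

print(len(strict), "artist(s) with everything;", len(lenient), "artists in total")
for row in lenient:
    print(row)
```

Wrapping each property in a separate optional step matters: putting all of them in a single OPTIONAL block would bind either all of the variables or none of them.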
2.7.3 Querying alternatives
Part of the class hierarchy used by the CLAROS data is shown. What if we
want to find all instances of Man-Made Things? There are two alternatives: a
Man-Made Object or a Man-Made Feature.
PREFIX crm: <http://purl.org/NET/crm-owl#>
SELECT DISTINCT ?thing
WHERE {
{ ?thing a crm:E25_Man-Made_Feature . }
UNION
{ ?thing a crm:E22_Man-Made_Object . }
}
• The UNION keyword forms a disjunction of two graph patterns. Solutions to
both sides of the UNION are included in the results.
2.8 SPARQL endpoints on the web
Many SPARQL endpoints on the web provide a form that lets you enter SPARQL
directly into a web page.
Many of the datasets queried above have public endpoints:
• Oxpoints
◦ http://oxpoints.oucs.ox.ac.uk/sparql
• CLAROS
◦ http://data.clarosnet.org/sparql/
• DBpedia (more than one)
◦ http://dbpedia.org/sparql
◦ http://dbpedia.org/snorql
• DBTune/Jamendo
◦ http://dbtune.org/jamendo/store/user/query
◦ (you need to select SPARQL manually, as the form defaults to SeRQL)
Others are listed at:
• http://www.w3.org/wiki/SparqlEndpoints
◦ UK gov data: http://data.gov.uk/sparql
2.9 Exercise
1. Try some of the above queries at the relevant endpoint.
2. What query can you come up with against either the Gruff store, or one on
the web?
3 Further SPARQL
3.1 DESCRIBE
• The DESCRIBE query result clause allows the server to return
whatever RDF it wants that describes the given resource(s).
• Because the server is free to interpret DESCRIBE as it sees fit,
DESCRIBE queries are not interoperable.
Example from data.ox: http://data.ox.ac.uk/sparql/
PREFIX oxp: <http://ns.ox.ac.uk/namespace/oxpoints/2009/02/owl#>
DESCRIBE ?x
WHERE {
?x a oxp:Library .
} LIMIT 10
3.2 CONSTRUCT – creating new triples
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
?X vCard:FN ?name .
?X vCard:URL ?url .
?X vCard:TITLE ?title .
}
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
OPTIONAL { ?X foaf:name ?name . FILTER isLiteral(?name) . }
OPTIONAL { ?X foaf:homepage ?url . FILTER isURI(?url) . }
OPTIONAL { ?X foaf:title ?title . FILTER isLiteral(?title) . }
}
• CONSTRUCT is an alternative SPARQL result clause to SELECT. Instead of
returning a table of result values, CONSTRUCT returns an RDF graph.
• The result RDF graph is created by taking the results of the equivalent
SELECT query and filling in the values of variables that occur in the
CONSTRUCT template.
• Triples are not created in the result graph for template patterns that
involve an unbound variable.
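The template-filling step can be sketched in a few lines. This is a simplified illustration with invented solutions (ex:tim, ex:ann), not the behaviour of any particular SPARQL engine. The function name construct is illustrative, and an unbound variable is modelled as a key missing from the solution.

```python
# A two-line template in the style of the CONSTRUCT query above.
TEMPLATE = [
    ("?X", "vCard:FN", "?name"),
    ("?X", "vCard:URL", "?url"),
]

# Invented solutions; the second one has no ?url binding.
SOLUTIONS = [
    {"?X": "ex:tim", "?name": "Tim", "?url": "http://example.org/tim/"},
    {"?X": "ex:ann", "?name": "Ann"},
]


def construct(template, solutions):
    """Fill the template once per solution, skipping incomplete triples."""
    graph = set()  # an RDF graph is a set, so repeated triples collapse
    for solution in solutions:
        for triple in template:
            filled = tuple(
                solution.get(term) if term.startswith("?") else term
                for term in triple
            )
            if None not in filled:  # some variable was unbound
                graph.add(filled)
    return graph


for triple in sorted(construct(TEMPLATE, SOLUTIONS)):
    print(triple)
```

Three triples come out: both template lines for ex:tim, but only the vCard:FN line for ex:ann, because her ?url is unbound.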
3.3 SPARQL 1.1 – more features
• SPARQL 1.0 became a standard in January, 2008, and included:
◦ SPARQL 1.0 Query Language
◦ SPARQL 1.0 Protocol
◦ SPARQL Results XML Format
• SPARQL 1.1 became a standard in March, 2013, and includes:
◦ SPARQL 1.1 Update - for inserting, deleting, modifying RDF data
◦ SPARQL 1.1 Graph Store HTTP Protocol
◦ SPARQL 1.1 Service Descriptions - describe capabilities of SPARQL
endpoints
◦ SPARQL 1.1 Entailments - how to combine reasoning with SPARQL
◦ SPARQL 1.1 Basic Federated Query
◦ SPARQL Results CSV/TSV Formats
SPARQL 1.0 only dealt with querying data; updating the data was out of scope.
SPARQL 1.1 Update adds a language for managing and updating RDF graphs.
• INSERT DATA { triples }
• DELETE DATA { triples }
• [ DELETE { template } ] [ INSERT { template } ] WHERE { pattern }
• LOAD uri [ INTO GRAPH uri ]
• CLEAR GRAPH uri
• CREATE GRAPH uri
• DROP GRAPH uri
The SPARQL 1.1 Graph Store HTTP Protocol defines how to use RESTful HTTP
requests to affect an RDF graph store.
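An update such as INSERT DATA is usually delivered to a server as an HTTP POST. The sketch below builds such a request using the "application/sparql-update" content type from the SPARQL 1.1 Protocol. The endpoint address and the triple being inserted are placeholders invented for illustration, sparql_update_request is an illustrative name, and the request is built but never sent.

```python
import urllib.request

# ASSUMPTION: hypothetical update endpoint; real stores publish their own
# address and normally require authentication for updates.
UPDATE_ENDPOINT = "http://example.org/update"

UPDATE = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
  <http://example.org/people/alice> foaf:name "Alice" .
}"""


def sparql_update_request(endpoint, update):
    """Build (but do not send) a direct POST carrying a SPARQL update."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = sparql_update_request(UPDATE_ENDPOINT, UPDATE)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Queries and updates are kept on separate operations so that a read-only public endpoint can refuse updates outright.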
3.4 SPARQL 1.1 new query features
• Aggregate queries post-process query results by dividing the solutions into
groups, and then performing summary calculations on those groups.
• As in SQL, the GROUP BY clause specifies the key variable(s) to use to
partition the solutions into groups.
• SPARQL 1.1 defines these aggregate functions: COUNT, MIN, MAX, SUM, AVG,
GROUP_CONCAT, SAMPLE
• SPARQL 1.1 also includes a HAVING clause to filter the results of the query
after applying aggregates.
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX oxp: <http://ns.ox.ac.uk/namespace/oxpoints/2009/02/owl#>
SELECT ?division (count(?building) as ?number_of_buildings)
WHERE {
?division a oxp:Division .
?building a oxp:Building .
?building dcterms:isPartOf*/^oxp:occupies/dcterms:isPartOf* ?division .
} GROUP BY ?division
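What GROUP BY and COUNT do to the solutions can be imitated with a counter. This is a rough sketch over invented (division, building) rows in which every row has a building bound. The minimum parameter plays the role of a HAVING clause, and count_by is an illustrative name.

```python
from collections import Counter

# Invented solutions, one row per (division, building) match.
SOLUTIONS = [
    {"division": "ex:mpls", "building": "ex:b1"},
    {"division": "ex:mpls", "building": "ex:b2"},
    {"division": "ex:mpls", "building": "ex:b3"},
    {"division": "ex:humanities", "building": "ex:b4"},
    {"division": "ex:medsci", "building": "ex:b5"},
    {"division": "ex:medsci", "building": "ex:b6"},
]


def count_by(solutions, key, minimum=0):
    """GROUP BY `key` and count rows, then keep groups with count >= minimum."""
    counts = Counter(s[key] for s in solutions)
    return {group: n for group, n in counts.items() if n >= minimum}


print(count_by(SOLUTIONS, "division"))
print(count_by(SOLUTIONS, "division", minimum=2))
```

The first call gives one count per division; the second drops the division with a single building, which is what adding HAVING (count(?building) >= 2) would do in the query.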
3.5 Things to try:
1. Try the queries above at data.ox
• Can you show a name for the university divisions rather than a URI?
2. The query in section 2.6 listed landlocked countries.
• Can you count those above and below 15000000 people instead?
• What's the average population of world countries?
Part of the SPARQL demo comes from Cambridge Semantics' SPARQL by Example
http://www.cambridgesemantics.com/2008/09/sparql-by-example/
Licensed under a Creative Commons Attribution-Share Alike 3.0 License