Semantic Technologies in MarkLogic
Stephen Buxton
Micah Dubinko
John Snelson
April 9 2013
Today's Talk
• Semantics and MarkLogic – An Overview
  Stephen Buxton
• APIs and Applications – Making It All Work
  Micah Dubinko
• RDF and SPARQL in MarkLogic – The Details
  John Snelson
Slide 4
Copyright © 2013 MarkLogic® Corporation. All rights reserved.
Disclaimer – Forward-looking Statements
All statements describing future releases and capabilities, estimated
release dates, and content are plans only, and MarkLogic is under no
obligation to develop, include or make available, commercially or
otherwise, any specific feature or functionality in any MarkLogic
product.
Information is provided for general understanding and informational
purposes only, and is subject to change at the sole discretion of
MarkLogic in response to changing customer requirements, market
conditions, delivery schedules and other factors.
Information should not be distributed without written permission
from MarkLogic.
Slide 5
Rich MarkLogic Applications .. Made Richer
Name: John Smith
Affiliation: IBM
Timezone: PST
Committer: Hadoop
Slide 7
Search With Real-World Context
Slide 8
Query Facts, Documents, Values Together
Slide 9
HOW?
Slide 10
What is Semantics?
A different way of organizing and searching information
[Diagram: graph with nodes "John Smith", "London" and "England"; edges "John Smith" livesIn "London", "London" isIn "England", and a second livesIn edge from "John Smith"]
Slide 14
What is Semantics?
A different way of organizing and searching information
Data stored in triples
Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Rules tell us something about the triples
Example:
If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
Inference: "John Smith" : livesIn : "England"
Language: SPARQL is a language designed to query triples.
It looks a bit like SQL
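The rule on this slide can be tried out in a few lines. What follows is only a minimal sketch in plain Python, not how MarkLogic evaluates rules: triples are tuples, and the function name `infer_lives_in` is made up for illustration.

```python
# Facts from the slide, as (subject, predicate, object) tuples.
facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new appears."""
    known = set(triples)
    while True:
        derived = {
            (a, "livesIn", y)
            for (a, p1, x) in known if p1 == "livesIn"
            for (x2, p2, y) in known if p2 == "isIn" and x2 == x
        }
        if derived <= known:  # fixed point: the rule adds nothing more
            return known
        known |= derived

# Only the triples that were not stated explicitly.
inferred = infer_lives_in(facts) - facts
print(inferred)
```

Looping to a fixed point matters once places nest more deeply (a city in a country in a continent), because each pass can feed the next.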
Slide 15
Why do you care about Semantics?
Companies and organizations across all verticals
• Publishing: Dynamic Semantic Publishing (BBC)
  - Manage and leverage facts + documents for a rich user experience
• Pharma: facts about drugs + reports on clinical trials
  - Find new cures for diseases
  - Make decisions about what to research next
• Financial Services: reduce risk, comply with regulations
  - Report on exposure
  - Know where each piece of data came from
• Government Agencies: facts on file + intelligence reports
  - Find bad guys
• Civilian Government: Open Data
  - Open Government through Open Data
Slide 17
Dynamic Semantic Publishing
BBC Sports
The Challenge
Size and Complexity:
• # of athletes
• # of teams
• # of assets (match reports, statistics, etc.)
• # of relations (facts)
Goals
• Rich user experience
  - See information in context
  - Personalize content
  - Easy navigation
  - Intelligently serve ads (outside of UK)
• Manageable
  - Static pages? Too many, too fast-changing
• Limited number of journalists
  - Automate as much as possible
Slide 18
BBC page for "West Ham" shows:
• News story about Andy Carroll
  - Carroll playsFor West Ham
• Latest results for West Ham
  - West Ham isIn this match
• League table for Premier League
  - West Ham playsIn Premier League
• Video/audio/news related to
  - West Ham United Football Club
  - a West Ham player
  - West Ham's manager
  - West Ham's league
  - West Ham's venue
Slide 20
Dynamic Semantic Publishing: A Solution
MarkLogic
• Store, manage documents
  - Stories
  - Blogs
  - Feeds
  - Profiles
• Metadata about documents
  - Tagged by journalists
  - Added (semi-)automatically
  - Inferred
• Store, manage values
  - Statistics
• Full-text search
• Performance, scalability
• Robustness
Triple Store
• Facts reported by journalists
• Real-world facts from the Open Data Web
Slide 21
Dynamic Semantic Publishing: A Solution
MarkLogic with Triple Store
• At query time, dynamically aggregate stories, blogs, feeds, images, profiles, results, statistics, videos for a particular concept such as "West Ham".
  (See Jem Rayfield, BBC, http://bbc.in/I1NdkB)
• "we are not publishing pages, but publishing content as assets which are then organized by the metadata dynamically into pages"
  (John O'Donovan, BBC and PA)
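Query-time aggregation of this kind can be illustrated with a small in-memory model. This is a hedged sketch rather than the BBC's or MarkLogic's implementation; the asset identifiers and the "about" predicate are invented for the example, and only playsFor and playsIn come from the slides.

```python
from collections import defaultdict

# Hypothetical facts: assets are tagged "about" a concept, and concepts are
# linked to each other by real-world facts.
triples = [
    ("story:carroll-injury", "about", "Andy Carroll"),
    ("Andy Carroll", "playsFor", "West Ham"),
    ("video:match-highlights", "about", "West Ham"),
    ("West Ham", "playsIn", "Premier League"),
    ("table:premier-league", "about", "Premier League"),
]

def assets_for(concept, triples):
    """Collect assets about the concept or about anything directly linked to it."""
    related = {concept}
    for s, p, o in triples:
        if p == "about":
            continue  # tagging facts are handled in the second pass
        if s == concept:
            related.add(o)
        elif o == concept:
            related.add(s)
    page = defaultdict(list)
    for s, p, o in triples:
        if p == "about" and o in related:
            page[o].append(s)
    return dict(page)

page = assets_for("West Ham", triples)
print(page)
```

No page for "West Ham" is stored anywhere; it is assembled from the facts each time, which is the point of the O'Donovan quote.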
Slide 23
MarkLogic Semantics
New features under development
• RDF data store
• Special-purpose triples index
• MarkLogic Server includes a triple store!
• Query RDF with native SPARQL
• Query across triples, documents, values
Slide 27
Today's Talk
• Semantics and MarkLogic – An Overview
  Stephen Buxton
• APIs and Applications – Making It All Work
  Micah Dubinko
• RDF and SPARQL in MarkLogic – The Details
  John Snelson
Slide 28
XQuery and Triples
Load triples from an external source
sem:rdf-load("http://example.org/bigdata.rdf", "rdfxml")
Construct a triple in XQuery
sem:triple(
  sem:iri("http://example.org/subject"),
  sem:iri("http://example.org/predicate"),
  "object"
)
Extract triples from a document…
Image credit: dulllhunk
Slide 29
Facts
A mini case study
http://www.expatistan.com/cost-of-living/index
<tr>
  <td class="ranking">1</td>
  <td class="city-name"><a href="http://www.expatistan.com/cost-of-living/oslo">Oslo (Norway)</a></td>
  <td class="price-index">267</td>
</tr>
…
Slide 30
Extracting Facts
Case study continued
(: fetch the ranking page; arguments are elided on the slide :)
let $html := xdmp:document-get(…)
(: keep rows that have a "ranking" cell; "=" because each row carries several td/@class values :)
let $rows := ($html//html:tr)[html:td/@class = 'ranking']
let $build := sem:rdf-builder(sem:prefixes("my: http://example.org/vocab/"))
for $row in $rows
(: one blank node per city, named after its rank :)
let $node := "_:" || $row/html:td[@class eq 'ranking']
return (
  $build($node, "my:rank", xs:decimal($row/html:td[@class eq 'ranking'])),
  $build($node, "rdfs:label", xs:string($row/html:td[@class eq 'city-name'])),
  $build($node, "my:cola", xs:int($row/html:td[@class eq 'price-index']))
)
Photo credit: gbaku
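For readers without a MarkLogic server at hand, the same extraction step can be sketched in Python. This is an illustration under assumptions, not a port of the sem: API. It parses a single well-formed row wrapped in a `<table>`, reuses the slide's example vocabulary prefix, and stores the rank as an integer where the slide uses xs:decimal. The helper names are invented.

```python
import xml.etree.ElementTree as ET

# One row in the shape shown on the slide, wrapped in a table so it parses as XML.
html = """
<table>
  <tr>
    <td class="ranking">1</td>
    <td class="city-name"><a href="http://www.expatistan.com/cost-of-living/oslo">Oslo (Norway)</a></td>
    <td class="price-index">267</td>
  </tr>
</table>
"""

VOCAB = "http://example.org/vocab/"  # same example prefix as the slide's "my:"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def cell_text(row, css_class):
    """Return the text of the td with the given class, including nested link text."""
    cell = row.find(f"td[@class='{css_class}']")
    return "".join(cell.itertext()).strip()

def extract_triples(document):
    """Turn each ranking row into three (subject, predicate, object) tuples."""
    triples = []
    for row in ET.fromstring(document).iter("tr"):
        if row.find("td[@class='ranking']") is None:
            continue  # not a ranking row
        rank = cell_text(row, "ranking")
        node = "_:" + rank  # blank node named after the rank, as on the slide
        triples.append((node, VOCAB + "rank", int(rank)))
        triples.append((node, RDFS_LABEL, cell_text(row, "city-name")))
        triples.append((node, VOCAB + "cola", int(cell_text(row, "price-index"))))
    return triples

triples = extract_triples(html)
for t in triples:
    print(t)
```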
Slide 31
It’s all about the connections
Subjects/Objects can point to database URIs
@prefix db: <http://dbpedia.org/resource/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
_:Person1 foaf:name "Micah Dubinko".
_:Person1 foaf:mbox <mailto:[email protected]>.
_:Person1 foaf:depiction <http://example.org/dubinko.jpg>.
Run a SPARQL query and extract the image URL from the results:
let $img-url := (…get from SPARQL…)
return fn:doc($img-url)
Photo credit: anneh632
Slide 32
It’s all about the connections
Documents can contain triple markup
<article>
<info>
<title>News for April 9, 2013</title>
<sem:triple>
<sem:subject>http://example.org/article</sem:subject>
<sem:predicate>http://example.org/mentions</sem:predicate>
<sem:object>http://example.org/Steve_Jobs</sem:object>
</sem:triple>
…
Query documents based on contained triples
cts:triple-range-query((), (), $match, "sameTerm")
Photo credit:
hinkelstone
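The idea of selecting documents by the triples they contain can be modelled in a few lines. This is a plain-Python sketch, not the cts: API. The document URIs and the second article's triple are invented, and `None` plays the role of the empty sequence `()` used as a wildcard on the slide.

```python
# Hypothetical store: document URI -> triples embedded in that document.
docs = {
    "/news/2013-04-09.xml": [
        ("http://example.org/article", "http://example.org/mentions", "http://example.org/Steve_Jobs"),
    ],
    "/news/2013-04-10.xml": [
        ("http://example.org/article2", "http://example.org/mentions", "http://example.org/XML"),
    ],
}

def docs_with_triple(docs, subject=None, predicate=None, obj=None):
    """Return URIs of documents holding at least one triple matching the pattern."""
    wanted = (subject, predicate, obj)
    hits = []
    for uri, triples in docs.items():
        for triple in triples:
            if all(w is None or w == got for w, got in zip(wanted, triple)):
                hits.append(uri)
                break  # one matching triple is enough for this document
    return hits

matches = docs_with_triple(docs, obj="http://example.org/Steve_Jobs")
print(matches)
```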
Slide 33
Blended queries
Semantic queries with Search API
A semantic query can drive the construction of a custom query
declare function semquery:parse(…) {
  (: 1. run a SPARQL query to determine the meaning of "South Bay" :)
  (: 2. use the SPARQL results to construct a particular polygon :)
  (: 3. feed the polygon into the generated query that the Search API will use :)
}
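The three steps in that outline can be made concrete with a toy model. Everything here is an assumption for illustration: the "hasPolygon" predicate, the rectangle standing in for "South Bay" (not a real boundary), the sample documents, and the function names. The polygon test is the standard ray-casting method.

```python
# Hypothetical fact store: the semantic lookup yields an outline for a place name.
facts = {
    ("South Bay", "hasPolygon"): [(37.2, -122.2), (37.2, -121.7), (37.5, -121.7), (37.5, -122.2)],
}

# Hypothetical documents, each with text and a (lat, lon) point.
documents = [
    {"title": "Cafe review", "text": "great coffee", "point": (37.33, -121.89)},
    {"title": "Harbor news", "text": "coffee by the pier", "point": (37.80, -122.41)},
]

def point_in_polygon(point, polygon):
    """Ray casting: toggle 'inside' each time a ray from the point crosses an edge."""
    px, py = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > py) != (y2 > py):  # edge straddles the line through the point
            cross_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < cross_x:
                inside = not inside
    return inside

def blended_search(term, place):
    """Resolve the place through the facts, then combine text and geo constraints."""
    polygon = facts[(place, "hasPolygon")]
    return [
        d["title"]
        for d in documents
        if term in d["text"] and point_in_polygon(d["point"], polygon)
    ]

results = blended_search("coffee", "South Bay")
print(results)
```

Both documents mention coffee; only the one whose point falls inside the resolved region survives, which is what "blended" means here.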
Slide 34
REST API
Three approaches: graphs, queries, and things
CRUD on graphs
GET /v1/graphs?default
PUT /v1/graphs?graph=http://example.org/g
DELETE /v1/graphs?graph=http://example.org/g
Interoperable query support
POST /v1/graphs/sparql
(…SPARQL in POST body…)
Wander through your data
GET /v1/things?iri=http://dbpedia.org/resource/XML
Photo credit:
hinkelstone
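The requests listed on this slide can be assembled with the Python standard library. This sketch only builds the requests and does not send them. The host and port are assumptions, as is the `application/sparql-query` media type (the one defined by the W3C SPARQL Protocol; the slide does not say what the server expects). The paths and query parameters are taken from the slide.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8000"  # assumed host and port; not given on the slide

def graph_request(method, graph_iri=None, body=None):
    """Build a request for /v1/graphs; no graph IRI means the default graph."""
    query = "default" if graph_iri is None else urlencode({"graph": graph_iri})
    return Request(f"{BASE}/v1/graphs?{query}", data=body, method=method)

def sparql_request(sparql):
    """Build a POST to /v1/graphs/sparql with the query text as the body."""
    return Request(
        f"{BASE}/v1/graphs/sparql",
        data=sparql.encode("utf-8"),
        headers={"Content-Type": "application/sparql-query"},  # assumed media type
        method="POST",
    )

get_default = graph_request("GET")
put_graph = graph_request("PUT", "http://example.org/g", body=b"")
delete_graph = graph_request("DELETE", "http://example.org/g")
query = sparql_request("select * where { ?s ?p ?o } limit 10")

for req in (get_default, put_graph, delete_graph, query):
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would send it. That step is left out because authentication and the payload format for PUT are not covered on the slide.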
Slide 35
Demo
Slide 36
Today's Talk
• Semantics and MarkLogic – An Overview
  Stephen Buxton
• APIs and Applications – Making It All Work
  Micah Dubinko
• RDF and SPARQL in MarkLogic – The Details
  John Snelson
Slide 37
What is RDF?
[Diagram: RDF graph with nodes :person4, :person5, :person20, :place5 and the literal "John", connected by edges labelled :first-name, :birth-place, :has-child and :has-parent]
Slide 38
What is RDF?
• Schema-less
• Triple granularity
• Open world assumption
• Joins - the cost of granularity
Slide 39
Why use RDF?
• Born or extracted to RDF
• Denormalize into XML by default
• Lift data into RDF if you need to:
  - combine it with disparate data sources
  - navigate it like a graph
  - use it for relationships or taxonomy
  - expose it as RDF to end users
Slide 40
Semantics Architecture
[Diagram: architecture layers labelled RDF, GRAPH, SPARQL, XQY, XSLT, SQL and TRIPLE]
Slide 42
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 350 bytes per triple on disk
• 1 billion+ triples per host
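Why keep the same triples in several orders? A toy index makes the reason visible. This is a sketch under assumptions and says nothing about MarkLogic's on-disk format. The slide does not name the three orders, so the cyclic permutations SPO, POS and OSP are assumed here, and the sample triples are guesses based on the labels in the RDF diagram.

```python
from bisect import bisect_left

class TripleIndex:
    """Keep the same triples sorted three ways so any bound prefix is a range scan."""

    # Assumed permutations; the slide says "3 triple orders" without naming them.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        triples = list(triples)
        self.rows_by_order = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        # Pick the order with the longest run of bound leading positions.
        best_name, best_len = "spo", -1
        for name, perm in self.ORDERS.items():
            n = 0
            while n < 3 and pattern[perm[n]] is not None:
                n += 1
            if n > best_len:
                best_name, best_len = name, n
        perm = self.ORDERS[best_name]
        prefix = tuple(pattern[i] for i in perm[:best_len])
        rows = self.rows_by_order[best_name]
        out = []
        for row in rows[bisect_left(rows, prefix):]:
            if row[:best_len] != prefix:
                break  # left the contiguous range that shares the prefix
            triple = [None] * 3
            for pos, i in enumerate(perm):
                triple[i] = row[pos]  # undo the permutation
            # Safety net in case ORDERS is changed to a set that leaves gaps.
            if all(w is None or w == got for w, got in zip(pattern, triple)):
                out.append(tuple(triple))
        return out

idx = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":birth-place", ":place5"),
])
print(idx.match(p=":birth-place"))
print(idx.match(s=":person4", o="John"))
```

With these three orders, every combination of bound positions is a prefix of one of them, so each lookup is a binary search plus a contiguous scan. On sizing: at the slide's 350 bytes per triple, 1 billion triples comes to roughly 350 GB on disk.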
Slide 43
SPARQL
select * where {
  ?person :birth-place ?place ;
          :first-name "John"
}
• Executed using the triple index
• SPARQL 1.0
• Cost-based optimization
• Join ordering and algorithms
• More in the lightning talks
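The query on this slide is two triple patterns joined on ?person. A naive evaluator shows what that join involves. This is a teaching sketch that uses nested loops instead of an index, and it is not how MarkLogic executes SPARQL. The sample data, including "Jane", is invented.

```python
# Hypothetical data using the labels from the RDF diagram.
triples = [
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":birth-place", ":place5"),
    (":person5", ":first-name", "Jane"),
]

def match_pattern(pattern, triples, binding):
    """Yield extended bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant does not match
        else:
            yield new

def evaluate(patterns, triples):
    """Join the patterns left to right, carrying variable bindings along."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for sol in solutions for b in match_pattern(pattern, triples, sol)]
    return solutions

results = evaluate(
    [("?person", ":birth-place", "?place"), ("?person", ":first-name", "John")],
    triples,
)
print(results)
```

The order of the patterns changes the work done: starting with the :first-name "John" pattern would leave one intermediate solution instead of two. That is the kind of decision the join-ordering bullet refers to.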
Slide 44
Today's Talk
• Semantics and MarkLogic – An Overview
  Stephen Buxton
• APIs and Applications – Making It All Work
  Micah Dubinko
• RDF and SPARQL in MarkLogic – The Details
  John Snelson
Slide 45
MarkLogic Semantics
New features under development
• RDF data store
• Special-purpose triples index
• MarkLogic Server includes a triple store!
• Query RDF with native SPARQL
• Query across triples, documents, values
• World-class Triple Store
• Horizontally scalable
• Not restricted by physical memory limits
• Enterprise hardened
• World-beating Information Store
• Triples + Documents + Values
• All in one Enterprise NoSQL database
Slide 46
Any Questions?
Slide 47
For More Information
Stephen Buxton
[email protected]