Semantic Technologies in MarkLogic
Stephen Buxton, Micah Dubinko, John Snelson
April 9 2013

Today's Talk
  Semantics and MarkLogic – An Overview (Stephen Buxton)
  APIs and Applications – Making It All Work (Micah Dubinko)
  RDF and SPARQL in MarkLogic – The Details (John Snelson)
Slide 4
Copyright © 2013 MarkLogic® Corporation. All rights reserved.

Disclaimer – Forward-looking Statements
All statements describing future releases and capabilities, estimated release dates, and content are plans only, and MarkLogic is under no obligation to develop, include or make available, commercially or otherwise, any specific feature or functionality in any MarkLogic product. Information is provided for general understanding and informational purposes only, and is subject to change at the sole discretion of MarkLogic in response to changing customer requirements, market conditions, delivery schedules and other factors. Information should not be distributed without written permission from MarkLogic.
Slide 5

Rich MarkLogic Applications .. Made Richer
Slide 6

Rich MarkLogic Applications .. Made Richer
  Name: John Smith
  Affiliation: IBM
  Timezone: PST
  Committer: Hadoop
Slide 7

Search With Real-World Context
Slide 8

Query Facts, Documents, Values Together
Slide 9

HOW?
Slide 10

What is Semantics?
  A different way of organizing and searching information
Slide 11

What is Semantics?
  A different way of organizing and searching information
  Data stored in triples
  Expressed as Subject : Predicate : Object
  Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"
Slide 12

What is Semantics?
  A different way of organizing and searching information
  Data stored in triples
  Expressed as Subject : Predicate : Object
  Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"
  Rules tell us something about the triples
  Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
  Inference: "John Smith" : livesIn : "England"
Slide 13

What is Semantics?
  A different way of organizing and searching information
  Data stored in triples
  Expressed as Subject : Predicate : Object
  Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"
  Rules tell us something about the triples
  [Diagram: "John Smith" -livesIn-> "London" -isIn-> "England", with a second livesIn edge]
Slide 14

What is Semantics?
  A different way of organizing and searching information
  Data stored in triples
  Expressed as Subject : Predicate : Object
  Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"
  Rules tell us something about the triples
  Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
  Inference: "John Smith" : livesIn : "England"
  Language: SPARQL is a language designed to query triples. It looks a bit like SQL.
Slide 15

Why do you care about Semantics?
  Companies and organizations across all verticals
  Publishing: Dynamic Semantic Publishing (BBC)
    Manage and leverage facts + documents for a rich user experience
  Pharma: facts about drugs + reports on clinical trials
    Find new cures for diseases
    Make decisions about what to research next
  Financial Services: reduce risk, comply with regulations
    Report on exposure
    Know where each piece of data came from
  Government Agencies: facts on file + intelligence reports
    Find bad guys
  Civilian Government: Open Data
    Open Government through Open Data
Slide 17

Dynamic Semantic Publishing
BBC Sports
  The Challenge
    Size and Complexity:
      # of athletes
      # of teams
      # of assets (match reports, statistics, etc.)
      # of relations (facts)
  Goals
    Rich user experience
      See information in context
      Personalize content
      Easy navigation
      Intelligently serve ads (outside of UK)
    Manageable
      Static pages? Too many, too fast-changing
      Limited number of journalists
      Automate as much as possible
Slide 18

BBC page for "West Ham" shows:
  News story about Andy Carroll
    Carroll playsFor West Ham
  Latest results for West Ham
    West Ham isIn this match
  League table for Premier League
    West Ham playsIn Premier League
  Video/audio/news related to
    West Ham United Football Club
    a West Ham player
    West Ham's manager
    West Ham's league
    West Ham's venue
Slide 20

Dynamic Semantic Publishing: A Solution
  MarkLogic
    Store, manage documents
      Stories
      Blogs
      Feeds
      Profiles
    Store, manage values
      Statistics
    Full-Text search
    Performance, scalability
    Robustness
  Triple Store
    Metadata about documents
      Tagged by journalists
      Added (semi-)automatically
      Inferred
Slide 21
Triple Store (continued from Slide 21)
  Facts reported by journalists
  Real-world facts from the Open Data Web

Dynamic Semantic Publishing: A Solution
MarkLogic Triple Store
  At query time, dynamically aggregate stories, blogs, feeds, images, profiles, results, statistics, videos for a particular concept such as "West Ham". (See Jem Rayfield, BBC, http://bbc.in/I1NdkB)
  "We are not publishing pages, but publishing content as assets which are then organized by the metadata dynamically into pages" (John O'Donovan, BBC and PA)
Slide 22

Dynamic Semantic Publishing: A Solution
MarkLogic with Triple Store
  At query time, dynamically aggregate stories, blogs, feeds, images, profiles, results, statistics, videos for a particular concept such as "West Ham". (See Jem Rayfield, BBC, http://bbc.in/I1NdkB)
  "We are not publishing pages, but publishing content as assets which are then organized by the metadata dynamically into pages" (John O'Donovan, BBC and PA)
Slide 23

MarkLogic Semantics
New features under development
  RDF data store
  Special-purpose triples index
  MarkLogic Server includes a triple store!
  Query RDF with native SPARQL
  Query across triples, documents, values
Slide 27

Today's Talk
  Semantics and MarkLogic – An Overview (Stephen Buxton)
  APIs and Applications – Making It All Work (Micah Dubinko)
  RDF and SPARQL in MarkLogic – The Details (John Snelson)
Slide 28

XQuery and Triples
  Load triples from an external source
    sem:rdf-load("http://example.org/bigdata.rdf", "rdfxml")
  Construct a triple in XQuery
    sem:triple(
      sem:iri("http://example.org/subject"),
      sem:iri("http://example.org/predicate"),
      "object"
    )
  Extract triples from a document…
Image credit: dulllhunk
Slide 29
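
As a language-neutral illustration of what the sem:triple constructor on the slide above builds, here is a minimal Python sketch. It is not MarkLogic code: the IRI and Triple classes and the to_ntriples helper are hypothetical names introduced for this example, and the N-Triples output format is a W3C serialization that the slides do not mention.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identifier; hypothetical stand-in for sem:iri(...)."""


class Triple(NamedTuple):
    """Subject and predicate are IRIs; the object is an IRI or a plain literal."""
    subject: IRI
    predicate: IRI
    obj: Union[IRI, str]


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    def term(value):
        if isinstance(value, IRI):
            return f"<{value}>"
        # Plain strings are treated as literals: escape backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


# Same subject, predicate and object as the sem:triple call on the slide.
t = Triple(
    IRI("http://example.org/subject"),
    IRI("http://example.org/predicate"),
    "object",
)
print(to_ntriples(t))
```

The only design point the sketch tries to convey is the distinction the slide draws between IRI terms, wrapped in sem:iri, and a literal object, passed as a bare string.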
Facts
A mini case study
  http://www.expatistan.com/cost-of-living/index
  <tr>
    <td class="ranking">1</td>
    <td class="city-name"><a href="http://www.expatistan.com/cost-of-living/oslo">Oslo (Norway)</a></td>
    <td class="price-index">267</td>
  </tr>
  …
Slide 30

Extracting Facts
Case study continued
  let $html := xdmp:document-get(…)
  (: keep only the table rows that contain a ranking cell :)
  let $rows := ($html//html:tr)[html:td/@class = 'ranking']
  let $build := sem:rdf-builder(sem:prefixes("my: http://example.org/vocab/"))
  for $row in $rows
  (: one blank node per row, named after the row's rank :)
  let $node := "_:" || $row/html:td[@class eq 'ranking']
  return (
    $build($node, "my:rank", xs:decimal( $row/html:td[@class eq 'ranking'] )),
    $build($node, "rdfs:label", xs:string( $row/html:td[@class eq 'city-name'] )),
    $build($node, "my:cola", xs:int( $row/html:td[@class eq 'price-index'] ))
  )
Photo credit: gbaku
Slide 31

It’s all about the connections
Subjects/Objects can point to database URIs
  @prefix db: <http://dbpedia.org/resource/>.
  @prefix foaf: <http://xmlns.com/foaf/0.1/>.
  _:Person1 foaf:name "Micah Dubinko".
  _:Person1 foaf:mbox <mailto:[email protected]>.
  _:Person1 foaf:depiction <http://example.org/dubinko.jpg>.
Run a SPARQL query and, from the results, extract the image
  let $img-url := (…get from SPARQL…)
  return fn:doc($img-url)
Photo credit: anneh632
Slide 32

It’s all about the connections
Documents can contain triple markup
  <article>
    <info>
      <title>News for April 9, 2013</title>
      <sem:triple>
        <sem:subject>http://example.org/article</sem:subject>
        <sem:predicate>http://example.org/mentions</sem:predicate>
        <sem:object>http://example.org/Steve_Jobs</sem:object>
      </sem:triple>
      …
Query documents based on contained triples
  cts:triple-range-query((), (), $match, 'sameTerm')
Photo credit: hinkelstone
Slide 33
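
To make the "documents can contain triple markup" idea concrete outside MarkLogic, the following Python sketch pulls embedded triples out of an article and finds the documents whose triples mention a given object. It uses only the standard library. The namespace bound to the sem: prefix, the closed-off version of the slide's article, the document URIs and both function names are assumptions made for this example; the linear scan merely stands in for what cts:triple-range-query would answer from an index.

```python
import xml.etree.ElementTree as ET

# Assumption: the sem: prefix is bound to this namespace (not shown on the slide).
SEM_NS = "http://marklogic.com/semantics"

# The slide's article, closed off so that it is well-formed XML.
ARTICLE = f"""
<article xmlns:sem="{SEM_NS}">
  <info>
    <title>News for April 9, 2013</title>
    <sem:triple>
      <sem:subject>http://example.org/article</sem:subject>
      <sem:predicate>http://example.org/mentions</sem:predicate>
      <sem:object>http://example.org/Steve_Jobs</sem:object>
    </sem:triple>
  </info>
</article>
""".strip()

# Hypothetical in-memory "database": document URI -> XML text.
DOCS = {
    "/news/2013-04-09.xml": ARTICLE,
    "/news/other.xml": "<article><info><title>Other news</title></info></article>",
}


def embedded_triples(xml_text):
    """Return (subject, predicate, object) for every sem:triple in a document."""
    ns = {"sem": SEM_NS}
    root = ET.fromstring(xml_text)
    return [
        tuple(
            node.findtext(f"sem:{part}", namespaces=ns).strip()
            for part in ("subject", "predicate", "object")
        )
        for node in root.iter(f"{{{SEM_NS}}}triple")
    ]


def documents_mentioning(docs, obj):
    """URIs of documents containing at least one triple whose object equals obj."""
    return [
        uri
        for uri, xml_text in docs.items()
        if any(o == obj for _, _, o in embedded_triples(xml_text))
    ]


print(documents_mentioning(DOCS, "http://example.org/Steve_Jobs"))
```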
Blended queries
Semantic queries with Search API
A semantic query can drive the construction of a custom query
  declare function semquery:parse(…) {
    run sparql query to determine meaning of "South Bay"
    use sparql results to construct a particular polygon
    feed polygon into generated query that Search API will use
  }
Slide 34

REST API
Three approaches: graphs, queries, and things
  CRUD on graphs
    GET /v1/graphs?default
    PUT /v1/graphs?graph=http://example.org/g
    DELETE /v1/graphs?graph=http://example.org/g
  Interoperable query support
    POST /v1/graphs/sparql (…SPARQL in POST body…)
  Wander through your data
    GET /v1/things?iri=http://dbpedia.org/resource/XML
Photo credit: hinkelstone
Slide 35

Demo
Slide 36

Today's Talk
  Semantics and MarkLogic – An Overview (Stephen Buxton)
  APIs and Applications – Making It All Work (Micah Dubinko)
  RDF and SPARQL in MarkLogic – The Details (John Snelson)
Slide 37

What is RDF?
[Diagram: RDF graph with nodes :person4, :person5, :person20 and :place5, linked by :birth-place, :first-name "John", :has-child and :has-parent edges]
Slide 38

What is RDF?
• Schema-less
• Triple granularity
• Open world assumption
• Joins - the cost of granularity
Slide 39

Why use RDF?
• Born or extracted to RDF
• Denormalize into XML by default
• Lift data into RDF if you need to:
  • combine it with disparate data sources
  • navigate it like a graph
  • use it for relationships or taxonomy
  • expose it as RDF to end users
Slide 40

RDF Semantics Architecture
[Diagram labels: GRAPH, SPARQL, XQY, XSLT, SPARQL, SQL, TRIPLE]
Slide 42
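
The "Joins - the cost of granularity" bullet from the "What is RDF?" slide can be sketched in a few lines of Python. The node and predicate names come from the diagram on that slide, but which node each edge connects is not recoverable from the transcript, so the triples below are illustrative assumptions, as are the describe and join helpers.

```python
from collections import defaultdict

# Illustrative data only: names are from the slide, the wiring is assumed.
TRIPLES = [
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":birth-place", ":place5"),
    (":person4", ":has-child", ":person20"),
    (":person20", ":has-parent", ":person5"),
]


def describe(subject):
    """Reassemble a record for one subject by gathering all of its triples."""
    record = defaultdict(list)
    for s, p, o in TRIPLES:
        if s == subject:
            record[p].append(o)
    return dict(record)


def join(first_predicate, second_predicate):
    """Pairs (a, c) such that (a first_predicate b) and (b second_predicate c).

    Every extra hop through the graph costs one more join like this one.
    """
    second_hop = defaultdict(list)
    for s, p, o in TRIPLES:
        if p == second_predicate:
            second_hop[s].append(o)
    return [
        (a, c)
        for a, p, b in TRIPLES
        if p == first_predicate
        for c in second_hop[b]
    ]


print(describe(":person4"))
print(join(":has-parent", ":birth-place"))
```

Because each fact is stored at triple granularity, even rebuilding a single "record" means collecting several triples, and following a relationship two steps means a join. This is the trade-off behind the advice on the "Why use RDF?" slide to denormalize into XML by default and lift data into RDF only when graph-style access is needed.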
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 350 bytes per triple on disk
• 1 billion+ triples per host
Slide 43

SPARQL
  select * where { ?person :birth-place ?place; :first-name "John" }
• Executed using the triple index
• SPARQL 1.0
• Cost-based optimization
• Join ordering and algorithms
• More in the lightning talks
Slide 44

Today's Talk
  Semantics and MarkLogic – An Overview (Stephen Buxton)
  APIs and Applications – Making It All Work (Micah Dubinko)
  RDF and SPARQL in MarkLogic – The Details (John Snelson)
Slide 45

MarkLogic Semantics
New features under development
  RDF data store
  Special-purpose triples index
  MarkLogic Server includes a triple store!
  Query RDF with native SPARQL
  Query across triples, documents, values
World-class Triple Store
  Horizontally scalable
  Not restricted by physical memory limits
  Enterprise hardened
World-beating Information Store
  Triples + Documents + Values
  All in one Enterprise NoSQL database
Slide 46

Any Questions?
Slide 47

For More Information
Stephen Buxton [email protected]
Slide 48
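
Returning to the "Triple Index" and "SPARQL" slides: the sketch below shows, in plain Python, one way that several sorted triple orders can answer a basic pattern such as the slide's query for people named "John" and their birth places. It is a toy model, not a description of MarkLogic's implementation. The slide says only that there are three orders; the specific SPO, POS and OSP orderings, the TripleIndex class and the sample data (including "Jane") are assumptions made for this example.

```python
import bisect


class TripleIndex:
    # Assumed orderings; the slide states only that three orders exist.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        # Keep one sorted copy of the triples per ordering.
        self.rows = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None marks an unbound term."""
        pattern = (s, p, o)

        def bound_prefix(perm):
            # Number of leading positions of this ordering that are bound.
            count = 0
            for i in perm:
                if pattern[i] is None:
                    break
                count += 1
            return count

        # With these three cyclic orderings, every combination of bound terms
        # is a full prefix of at least one ordering, so a range scan suffices.
        name, perm = max(self.ORDERS.items(), key=lambda kv: bound_prefix(kv[1]))
        k = bound_prefix(perm)
        prefix = tuple(pattern[i] for i in perm[:k])
        rows = self.rows[name]
        found = []
        for row in rows[bisect.bisect_left(rows, prefix):]:
            if row[:k] != prefix:
                break
            triple = [None] * 3
            for position, i in enumerate(perm):
                triple[i] = row[position]  # back to (s, p, o) order
            found.append(tuple(triple))
        return found


index = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":first-name", "Jane"),
    (":person5", ":birth-place", ":place5"),
])

# ?person :first-name "John" ; :birth-place ?place
johns = [s for s, _, _ in index.match(p=":first-name", o="John")]
answers = [
    (person, place)
    for person in johns
    for _, _, place in index.match(s=person, p=":birth-place")
]
print(answers)
```

Evaluating the more selective pattern first (the name) and then probing for birth places is a hand-picked join order; choosing that order automatically is what the slide's "cost-based optimization" and "join ordering" bullets refer to.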