Structure and Dynamics of Emergent Semantics - LSIR
Transcription
Structure and Dynamics of Emergent Semantics - LSIR
Structure and Dynamics of Emergent Semantics Systems Karl Aberer EPFL School of Computer and Communication Science NCCR MICS, National Centre of Competence in Research on Mobile Information and Communication Systems [email protected] www.mics.org lsirwww.epfl.ch The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation Overview 1. 2. 3. 4. 5. Emergent Semantics Mapping Inference in Semantic Overlay Networks Structure of Semantic Overlay Networks Peer Data Management Systems Implementation Outlook: Sensor Internet The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 2 Semantics • Long-standing debate:”What is semantics?” • Standard response: “Mapping of a syntactic structure into a semantic domain” tiger Syntactic structure: database, knowledge base The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation Semantic domain: real-world 3 Semantic Web • Real-world is a somewhat ill-defined and hard to compute concept • Proposal: Substitute real-world by shared formal conceptualization [Gruber 93] tiger Syntactic structure: database, knowledge base owl:tiger Semantic domain: ontology The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 4 The Issue with Ontologies • What is the semantics of ontologies? After all, they are syntactic structures! • Heterogeneous and evolving models and ontologies oil:cat tiger owl:tiger wn:animal rdf:tiger The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 5 The Correspondence Continuum • Observation: Meaning is rarely a simple mapping from a syntactic structure to a semantic domain • Continuum of (semantic) correspondences from symbol to (symbol to)* object [Smith 87] • The meaning of a symbol is given by the composition of the semantic mappings that relate it to its root • Instead of focusing on ever-richer modelling languages for concepts, focus on mapping languages and mapping discovery tools [Mylopolous 06] The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 6 Semantic Grounding • Still: what should “relate ot the root mean”? • The meaning of symbols can be explained by its semantic correspondences to other symbols alone [“Understanding understanding” Rapaport 93] • Type 1 semantics: understanding in terms of something else – Problem: how to ground semantics? • Type 2 semantics: understanding something in terms of itself – “syntactic semantics”: grounding through recursive understanding The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 7 Emergent Semantics • Semantic correspondences form rich self-referential structures: recursive understanding • Recursive understanding may converge against stable structures (fixpoints): emergent semantics [Aberer 04] oil:cat owl:tiger wn:animal rdf:tiger The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 8 Peer-to-Peer Systems • Resource Sharing (e.g. images) – No centralized infrastructure – Global scale information systems – Application-specific overlay networks The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 9 Efficiently Searching Resources (Data) • Find images taken last week in Trondheim! ? The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 10 Resource Sharing • What is shared? knowledge <rdf:Description about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'> <xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate> <xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate> <xap:Creator> John Doe </xap:Creator> </rdf:Description> … content bandwidth processing storage The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 11 Beyond Keyword Search • Support of structured data at peers: schemas • Structured querying in peer-to-peer system date? <es:cDate> 05/08/2004 </es:cDate> <xap:CreateDate>2001-1219T18:49:03Z</xap:CreateDate> <xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate> <myRDF:Date> Jan 1, 2005 </myRDF:Date> The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 12 Peer Data Management Systems Q1= <GUID>$p/GUID</GUID> FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%" Photoshop (own schema) <Photoshop_Image> <GUID>178A8CD8865</GUID> <Creator>Robinson</Creator> <Subject> T12 = <Bag> <Photoshop_Image> <Item> <GUID>$fs/GUID</GUID> Tunbridge Wells </Item> <Creator> <Item>Royal Council</Item> $fs/Author/DisplayName </Bag> </Creator> </Subject> </Photoshop_Image> … FOR $fs IN /WinFSImage </Photoshop_Image> Q2= <GUID>$p/GUID</GUID> FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%" WinFS (known schema) <WinFSImage> <GUID>178A8CD8866</GUID> <Author> <DisplayName> Henry Peach Robinson <DisplayName> <Role>Photographer</Role> <Author> <Keyword> Tunbridge </Keyword> <Keyword>Council</Keyword> … </WinFSImage> ⇒ Extending data integration techniques to decentralized settings The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 13 Semantic Heterogeneity in PDMS • Pairwise mappings – Local mappings overcome global heterogeneity – Iterative query reformulation date? <es:cDate> 05/08/2004 </es:cDate> <xap:CreateDate>2001-1219T18:49:03Z</xap:CreateDate> <xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate> m Î te te Da Da :c F: es yRD m xa yR p: D F M :D od at ify e Î Da te es:cDate Î xap:CreateDate <myRDF:Date> Jan 1, 2005 </myRDF:Date> The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 14 PDMS vs. Classical Data Integration • Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources Date m(myDate) = Date myDate m(yourDate) = Date yourDate • Not applicable to large-scale, decentralized contexts – Scale: 100s vs. 10^3-10^6 – Churn: no fixed topology – Autonomy: no transactions, no integrity constraints, no global schema The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 15 Emergent Semantics in PDMS • P2P data management systems form (complex) mapping networks between models: Semantic Overlay Networks (SON) – Mappings manually or automatically generated – Mappings establish semantic correspondences – Mutually negotiated and verified (pragmatic dimension) • Practical systems with the potential to exhibit emergent semantics properties • Technical challenges? The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 16 Overview 1. 2. 3. 4. 5. Emergent Semantics Mapping Inference in Semantic Overlay Networks Structure of Semantic Overlay Networks Peer Data Management Systems Implementation Outlook: Sensor Internet The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 17 Answering Queries in PDMS • Semantic Query routing – To whom shall I forward a query posed against my local schema? • Some (most) mappings will be (partially) faulty – Different views on conceptualizations – Low expressive power of mapping languages – Automatic schema matching techniques • Alternatives – Local query resolution only: Low recall – Flooding the whole network (PDMS so far): Low precision ? The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 18 Analyzing PDMS Mapping Networks • Standard deductive inference is not sufficient – Uncertainty on mappings and conceptualizations – Precision/Recall tradeoff • Analyze Mapping Network – – – – Transitive closures of mapping operations Cycles and parallel paths Check for consistency Abductive reasoning: find best possible explanation in case of inconsistency The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 19 Example q:art/Creator? m0 m4 f0 m3 q VS m3(m4(m0(q))) art/Creator? VS art/creatDate? The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 20 Probabilistic Message Passing (Semantic Gossiping) • Deriving quality measures for the mappings using feedback – Reduces uncertainty – Used to route query / optimize mappings good maybe bad The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation bad probably good 21 Using feedback • The result of applying a composite mapping to a query should be identical to the original query for a cycle – Allows to estimate the probability of correctness of mapping m0 m1 m4 m5 f0 m3 The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation m2 22 Computing a Marginal for One Cycle unknown observed • P(m0, m3, m4, f0) = P(m0) P(m3) P(m4) P(f0 | m0, m3, m4) • Determine P(mi | f0) given P(f0 | m0, m3, m4)? • P(m0| f0)= ∑m3, m4 P(m0, m3, m4, f0) P(f0)-1 • But: feedbacks on different cycles are correlated – One wrong mapping will affect several cycles/paths – Need to express a global probabilistic model for the mapping graph The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 23 A Brief Intro to Factor-Graphs • g(x1, x2, x3, x4) = fA(x1, x2)fB(x2, x3, x4) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 24 Deriving PDMS Factor-Graphs Innate probabilities of mappings being correct The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 25 PDMS Factor-Graphs • Cyclic graph – Junction Tree? • Centralization • Computational + communicational overhead – Iterative Sum-Product • Corrrect only for tree structured networks • Approximate result • How to perform iterative sum-product by message passing on the mapping graph? – Message passing in factor graph does not correspond to connectivity of mapping graph – We want to rely on decentralized computations only The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 26 Embedded Message Passing • Decentralized computations – Computationally inexpensive – Sums and Products • Message-Passing Schedules – Lazy (piggybacking on query forwarding) • No message overhead – Periodic The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 27 Evaluation: Convergence (undirected example graph, prior 0.7 delta 0.1) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 28 Evaluation: Fault-tolerance (faulty links) (undirected example graph, prior 0.8 delta 0.1) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 29 Evaluation: Performance Detecting Errors (randomly generated networks of 50 schemas and 200 mappings, TTL = 5) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 30 Conclusion Probabilistic Message Passing • A technique to implement emergent semantic processes – Decentralized decision making – Converges to an agreement on conceptualizations – Scalable and robust method to infer correct mappings in a semantic overlay network • Questions – Do such mapping networks exist? – What is their structure? The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 31 Overview 1. 2. 3. 4. 5. Emergent Semantics Mapping Inference in Semantic Overlay Networks Structure of Semantic Overlay Networks Peer Data Management Systems Implementation Outlook: Sensor Internet The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 32 Semantic Overlay Networks 0.7 0.1 • Networks of schema mappings SA 0.8 – Directed, weighted, redundant • Semantic Interoperability Two peers are said to be semantically interoperable SD SB SF 0.1 0.9 0.2 if they can forward queries to each other in the mapping graph, SC potentially through series of translation links • Question SE 0.6 0.7 1 – Which are necessary conditions that a semantic overlay network becomes semantically interoperable in the large-scale? • Idea: use percolation theory to detect the emergence of a strongly connected component in S – Adaptation of a recent graph-theoretic framework (Newman, Strogatz, Watts 2001) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 1 33 SG The Model • Large-scale semantic overlay networks as random graphs with arbitrary degree distribution • Specificities of the model – Strong clustering (clustering coefficient cc) – Bidirectionality (bidirectionality coefficient bc) • Based on generatingfunctionology (pk probability of degree k) • Necessary condition for semantic interoperability in the large – Appearance of a giant strongly-connected component: ci > 0 The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 34 Size of Out-component in-component strongly connected component links-in links-out out-component Relative Size of Out −Component 0.8 0.6 a theoretic 0.4 0.2 b experimental # edges 5000 10000 15000 The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 20000 35 Evaluation: The Sequence Retrieval System (SRS) • Bioinformatic libraries: EMBL, SwissProt, Prosite, etc. – Commercial information indexing and retrieval system – Links from one database to others by mapping identifiers – More than 380 databanks and 500 (undirected) links – Custom crawler The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 36 Results • Connectivity indicator ci = 25.4 – Giant connected component: 187 nodes • Size of the giant component – 0.47 (derived) – 0.48 (observed) • Powerlaw Topology • Small-World Graph – Clustering coefficient = 0.32 – Diameter = 9 The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 37 Graphs with Same Power-law Degree Distribution • Varying number of edges The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 38 Analyzing Weighted Networks • Do we have a sufficient number of good mappings? • Using quality measures from the mappings derived from message passing – Uniformly distributed weights between 0 and 1 – Attribute / schema level • Semantic query forwarding – Per-hop forwarding behaviors – Only forward if wi ≥ τ • τ = 0 : flooding • τ = 1 : exact answers The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 39 Overview 1. 2. 3. 4. 5. Emergent Semantics Mapping Inference in Semantic Overlay Networks Structure of Semantic Overlay Networks Peer Data Management Systems Implementation Outlook: Sensor Internet The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 40 GridVine: Annotating Shared Resources • End-users create annotations / ”categories” / ”translation links” – Constraining the annotation mechanism: we do not expect them to write ontologies, views… The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 41 GridVine: Annotating Shared Resources • Principle of data independence – Scalable physical layer: structured overlay network (P2P network) – Semantic logical layer: Semantic Gossiping SearchFor(query) Insert(RDF schema) Insert(RDF triple) Insert(Schema translation) Return(tuples) Logical Layer (GridVine) Insert(key, value) Retrieve(key) Return(value) Physical Layer (P-Grid) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 42 Mapping annotations onto P-Grid User-defined annotations (RDF triples) P-Grid: Structured overlay network •Supports key-based search •decentralized, scalable, self-organizing access structure <rdf:Description rdf:about="urn:x-pgrid:F59F92C8BC…lucene_green_100.gif"> <Year xmlns="pgrids://CF0C052CE4…Pdf.rdfs#">2001</Year> </rdf:Description> ??? 0?? 1?? P-Grid 00? 000 01? 010 10? 100 11? 011 User-defined category translations (OWL) User-defined categories (RDFS) <Title xmlns="pgrids://CF0C052CE418FC78…PDFFIle.rdfs#"> <owl:equivalentProperty rdf:resource="pgrids://CF…Pdf.rdfs#Title" /> <rdf:Statement> <rdf:subject rdf:resource="pgrids://CF0…PDFFIle.rdfs#Title" /> <rdf:predicate rdf:resource="pgrids://owl/hasCorrectness" /> <rdf:object rdf:resource="1.0" /> </rdf:Statement> </Title> <rdfs:Class rdf:ID="Pdf" rdfs:comment="New schema class"> <rdfs:subClassOf rdf:resource="http://www.p-grid.org/pgrid.rdfs#PGridDataFile"/> </rdfs:Class> <rdf:Property rdf:ID="Year"> <rdfs:domain rdf:resource="#Pdf"/> <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/> </rdf:Property> ⇒ RDQL queries The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 43 Traversals of the Semantic Overlay Network • GridVine: structured P2P network = Distributed index – Query forwarding independent of structure of semantic overlay! • Different query forwarding paradigms – Iterative forwarding – Recursive forwarding % results received 100 80 iterative 15 p 60 recursive 15 p 40 iterative 60 p 20 recursive 60 p time[ms] 2000 4000 6000 8000 The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 10000 12000 44 Overview 1. 2. 3. 4. 5. Emergent Semantics Mapping Inference in Semantic Overlay Networks Structure of Semantic Overlay Networks Peer Data Management Systems Implementation Outlook: Sensor Internet The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 45 Outlook Information Sharing in the Sensor Internet (Source: activecampus/UCSD) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 46 Current Situation with Information Sharing ? Different WSNs Web publishing (repetitive) DB app, java app, Web interface, … The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation Discovery and correlation (difficult) Syntactic and semantic heterogeneity 47 Challenges • Provide generic and simple-to-use tools for publishing data collected from sensors over the Web – Data stream management • Provide tools for discovering published sensor data – Semi-structured metadata • Provide tools for correlating data from autonomous and heterogeneous sensor data sources – Emergent semantics The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 48 Publishing: Global Sensor Network • A simple-to-use system to publish and correlate sensor data streams – Virtual sensors are sensors, sensor networks or derived data streams – GSN nodes managing virtual sensors • Architecture – Virtual Sensors published in a P2P network using metadata annotations – GSN nodes connected in a peer-to-peer streaming network in the Web – Data processing specified in a temporal SQL extension The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 49 Development Status Available through sourceforge: http://sourceforge.net/projects/globalsn The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 50 Discovery: PicShark • Assume sensor data is being massively published – Already the case for images (photo sharing, Flickr!) • Discovery depends on the availability of annotations – Content-based search capabilities are limited (no text!) • Manual annotations are hard to obtain – Metadata scarcity • Social annotation (folksonomies) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 51 Exploiting Social Context • Assume standardized annotation scheme – Attributes A1, …, An • Information-theoretic measure of metadata scarcity of image I H ( I ) = −∑ p ( Ai ) log( p ( Ai )) ⎧⎪ 0 if attribute is present where p ( Ai ) = ⎨ 1 ⎪⎩ n otherwise (n : # attributes) • Reducing metadata scarcity by metadata propagation to similar images – Similarity derived from metadata, features, user relevance feedback The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 52 PicShark Prototype Extract existing metadata in different formats Propagate metadata Store metadata in standard formats in a peer-to-peer network The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation Extract features from image content and text annotations 53 Summary Information Sharing • Publishing and sharing of sensor data in a peerto-peer architecture: Global Sensor Network • Shared annotations of image/sensor data in a social network: PicShark • Distributed reasoning on the correctness of mappings among heterogeneous annotation schemes: emergent semantics The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 54 Smart Earth The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 55 Acknowledgements • Joint research with the following members and visitors of my group – Philippe Cudre Mauroux, Adriana Budura, Ali Salehi, Manfred Hauswirth, Andras Feher, Tim van Pelt, Julien Gaugaz • Funding provided by – Swiss National Foundation (SNF) through NCCR MICS – European Union through the Evergrow project (FET) The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 56 For more information • www.mics.ch • lsirwww.epfl.ch • www.p-grid.org • sourceforge.net/projects/globalsn Thanks for your attention! The National Centres of Competence in Research are a research instrument of the Swiss National Science Foundation 57