Structure and Dynamics of Emergent Semantics - LSIR

Structure and Dynamics of
Emergent Semantics Systems
Karl Aberer
EPFL School of Computer and Communication Science
NCCR MICS, National Centre of Competence in Research
on Mobile Information and Communication Systems
www.mics.org
lsirwww.epfl.ch
The National Centres of Competence in Research are a
research instrument of the Swiss National Science Foundation
Overview
1. Emergent Semantics
2. Mapping Inference in Semantic Overlay Networks
3. Structure of Semantic Overlay Networks
4. Peer Data Management Systems Implementation
5. Outlook: Sensor Internet
Semantics
• Long-standing debate: “What is semantics?”
• Standard response:
“Mapping of a syntactic structure into a semantic domain”
(Figure: the symbol "tiger" in a syntactic structure (database, knowledge base) mapped to a semantic domain (real world))
Semantic Web
• The real world is a somewhat ill-defined and hard-to-compute
concept
• Proposal: substitute the real world by a shared formal
conceptualization [Gruber 93]
(Figure: the symbol "tiger" in a syntactic structure (database, knowledge base) mapped to owl:tiger in a semantic domain (ontology))
The Issue with Ontologies
• What is the semantics of ontologies? After all, they are syntactic
structures!
• Heterogeneous and evolving models and ontologies
(Figure: the symbol "tiger" related to terms from several ontologies: oil:cat, owl:tiger, wn:animal, rdf:tiger)
The Correspondence Continuum
• Observation: Meaning is rarely a simple mapping from a
syntactic structure to a semantic domain
• Continuum of (semantic) correspondences from symbol to
(symbol to)* object [Smith 87]
• The meaning of a symbol is given by the composition of the
semantic mappings that relate it to its root
• Instead of focusing on ever-richer modelling languages for
concepts, focus on mapping languages and mapping discovery
tools [Mylopoulos 06]
Semantic Grounding
• Still: what should “relate to the root” mean?
• The meaning of a symbol can be explained by its semantic
correspondences to other symbols alone
[“Understanding understanding” Rapaport 93]
• Type 1 semantics: understanding in terms of something else
– Problem: how to ground semantics?
• Type 2 semantics: understanding something in terms of itself
– “syntactic semantics”: grounding through recursive understanding
Emergent Semantics
• Semantic correspondences form rich self-referential structures:
recursive understanding
• Recursive understanding may converge to stable structures
(fixpoints): emergent semantics [Aberer 04]
(Figure: self-referential correspondences among oil:cat, owl:tiger, wn:animal, rdf:tiger)
Peer-to-Peer Systems
• Resource Sharing (e.g. images)
– No centralized infrastructure
– Global scale information systems
– Application-specific overlay networks
Efficiently Searching Resources (Data)
• Find images taken last week in Trondheim!
Resource Sharing
• What is shared?
– Knowledge
<rdf:Description about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<xap:Creator> John Doe </xap:Creator>
</rdf:Description>
…
– Content
– Bandwidth
– Processing
– Storage
Beyond Keyword Search
• Support of structured data at peers: schemas
• Structured querying in peer-to-peer systems
date?
<es:cDate> 05/08/2004 </es:cDate>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
Peer Data Management Systems
Photoshop (own schema)
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject>
    <Bag>
      <Item>Tunbridge Wells</Item>
      <Item>Royal Council</Item>
    </Bag>
  </Subject>
  …
</Photoshop_Image>
WinFS (known schema)
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName>Henry Peach Robinson</DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword>Tunbridge</Keyword>
  <Keyword>Council</Keyword>
  …
</WinFSImage>
Q1 =
<GUID>$p/GUID</GUID>
FOR $p IN /Photoshop_Image
WHERE $p/Creator LIKE "%Robi%"
T12 =
<Photoshop_Image>
  <GUID>$fs/GUID</GUID>
  <Creator>$fs/Author/DisplayName</Creator>
</Photoshop_Image>
FOR $fs IN /WinFSImage
Q2 =
<GUID>$p/GUID</GUID>
FOR $p IN T12
WHERE $p/Creator LIKE "%Robi%"
⇒ Extending data integration techniques to decentralized settings
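The reformulation of Q1 into Q2 through the mapping T12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two peers' data are modelled as in-memory dictionaries, `t12` and `q` are hypothetical helper names, and the LIKE "%Robi%" predicate is approximated by a substring test.

```python
# Hypothetical in-memory stand-ins for the data held by the two peers.
photoshop_images = [
    {"GUID": "178A8CD8865", "Creator": "Robinson",
     "Subject": ["Tunbridge Wells", "Royal Council"]},
]
winfs_images = [
    {"GUID": "178A8CD8866",
     "Author": {"DisplayName": "Henry Peach Robinson", "Role": "Photographer"},
     "Keyword": ["Tunbridge", "Council"]},
]

def t12(winfs_records):
    """Mapping T12: present WinFS records in the Photoshop schema."""
    return [{"GUID": fs["GUID"], "Creator": fs["Author"]["DisplayName"]}
            for fs in winfs_records]

def q(records, pattern):
    """Q1/Q2: GUIDs of records whose Creator contains `pattern`."""
    return [p["GUID"] for p in records if pattern in p["Creator"]]

q1_result = q(photoshop_images, "Robi")    # Q1 evaluated on the local schema
q2_result = q(t12(winfs_images), "Robi")   # Q2: the same query posed against T12
```

The query text itself is unchanged between Q1 and Q2; only its input is replaced by the view defined by the mapping, which is what lets a peer reuse local queries against a neighbour's schema.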
Semantic Heterogeneity in PDMS
• Pairwise mappings
– Local mappings overcome global heterogeneity
– Iterative query reformulation
date?
<es:cDate> 05/08/2004 </es:cDate>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
(Figure: pairwise mappings between the date attributes of the three schemas)
es:cDate → myRDF:Date
xap:ModifyDate → myRDF:Date
es:cDate → xap:CreateDate
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
PDMS vs. Classical Data Integration
• Traditional database techniques (e.g., LAV/GAV) rely on
centralized schemas to integrate data sources
(Figure: local attributes myDate and yourDate mapped to the attribute Date of a central schema: m(myDate) = Date, m(yourDate) = Date)
• Not applicable to large-scale, decentralized contexts
– Scale: 100s vs. 10^3-10^6
– Churn: no fixed topology
– Autonomy: no transactions, no integrity constraints, no global
schema
Emergent Semantics in PDMS
• P2P data management systems form (complex) mapping
networks between models:
Semantic Overlay Networks (SON)
– Mappings manually or automatically generated
– Mappings establish semantic correspondences
– Mutually negotiated and verified (pragmatic dimension)
• Practical systems with the potential to exhibit emergent
semantics properties
• Technical challenges?
Overview
1. Emergent Semantics
2. Mapping Inference in Semantic Overlay Networks
3. Structure of Semantic Overlay Networks
4. Peer Data Management Systems Implementation
5. Outlook: Sensor Internet
Answering Queries in PDMS
• Semantic Query routing
– To whom shall I forward a query posed against my local schema?
• Some (most) mappings will be (partially) faulty
– Different views on conceptualizations
– Low expressive power of mapping languages
– Automatic schema matching techniques
• Alternatives
– Local query resolution only: Low recall
– Flooding the whole network (PDMS so far): Low precision
Analyzing PDMS Mapping Networks
• Standard deductive inference is not sufficient
– Uncertainty on mappings and conceptualizations
– Precision/Recall tradeoff
• Analyze Mapping Network
– Transitive closures of mapping operations
– Cycles and parallel paths
– Check for consistency
– Abductive reasoning: find best possible explanation in case of
inconsistency
Example
(Figure: query q: art/Creator? forwarded along the cycle of mappings m0, m4, m3, producing feedback f0)
q vs. m3(m4(m0(q)))
art/Creator? vs. art/creatDate?
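The comparison of q with m3(m4(m0(q))) can be sketched as follows. This is a minimal illustration, assuming attribute-level mappings represented as dictionaries; the intermediate attribute names and the helpers `compose` and `cycle_feedback` are hypothetical, and only art/Creator and art/creatDate come from the example.

```python
def compose(mappings, attribute):
    """Apply attribute-level mappings in order; None means the attribute is lost."""
    for mapping in mappings:
        if attribute is None:
            return None
        attribute = mapping.get(attribute)
    return attribute

def cycle_feedback(mappings, attribute):
    """'+' if the attribute comes back unchanged around the cycle,
    '-' if it comes back as a different attribute,
    None if it is lost on the way (no feedback)."""
    result = compose(mappings, attribute)
    if result is None:
        return None
    return "+" if result == attribute else "-"

# Hypothetical mappings along the cycle; m4 is assumed to be the faulty one.
m0 = {"art/Creator": "photo/Author"}
m4 = {"photo/Author": "img/CreationDate"}
m3 = {"img/CreationDate": "art/creatDate"}

f0 = cycle_feedback([m0, m4, m3], "art/Creator")
```

Negative feedback only says that at least one mapping on the cycle is wrong; it does not identify which one, which is why a probabilistic treatment is needed.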
Probabilistic Message Passing (Semantic Gossiping)
• Deriving quality measures for the mappings using feedback
– Reduces uncertainty
– Used to route query / optimize mappings
(Figure: mapping network with mappings labelled good, probably good, maybe bad, bad)
Using feedback
• For a cycle, the result of applying the composite mapping to a
query should be identical to the original query
– This allows estimating the probability that each mapping is correct
(Figure: mapping network with mappings m0 to m5 and cycle feedback f0)
Computing a Marginal for One Cycle
(Figure legend: unknown vs. observed variables)
• P(m0, m3, m4, f0) = P(m0) P(m3) P(m4) P(f0 | m0, m3, m4)
• Determine P(mi | f0) given P(f0 | m0, m3, m4)?
• P(m0 | f0) = P(f0)^-1 ∑_{m3, m4} P(m0, m3, m4, f0)
• But: feedbacks on different cycles are correlated
– One wrong mapping will affect several cycles/paths
– Need to express a global probabilistic model for the mapping graph
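The marginal for a single cycle can be computed by brute-force enumeration, as in the sketch below. The slides do not spell out P(f0 | m0, m3, m4); the conditional used here is an assumption: positive feedback has probability 1 if all mappings are correct, 0 if exactly one is wrong, and delta if two or more are wrong (errors may compensate each other). The function names are hypothetical, and the prior 0.7 and delta 0.1 are the values quoted for the evaluation graphs.

```python
from itertools import product

def p_feedback_positive(states, delta):
    """Assumed P(f0 = + | mapping states); True means the mapping is correct."""
    wrong = states.count(False)
    if wrong == 0:
        return 1.0
    return 0.0 if wrong == 1 else delta

def posterior_correct(priors, target, delta=0.1, positive=True):
    """P(m_target correct | f0), summing the joint over the other mappings."""
    numerator = 0.0   # joint mass with m_target correct and the observed feedback
    evidence = 0.0    # P(f0)
    for states in product([True, False], repeat=len(priors)):
        joint = 1.0
        for prior, correct in zip(priors, states):
            joint *= prior if correct else 1.0 - prior
        likelihood = p_feedback_positive(states, delta)
        joint *= likelihood if positive else 1.0 - likelihood
        evidence += joint
        if states[target]:
            numerator += joint
    return numerator / evidence

p_pos = posterior_correct([0.7, 0.7, 0.7], target=0, positive=True)
p_neg = posterior_correct([0.7, 0.7, 0.7], target=0, positive=False)
```

Under these assumptions, positive feedback raises the belief in m0 from 0.7 to about 0.96, and negative feedback lowers it to about 0.55.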
A Brief Intro to Factor-Graphs
• g(x1, x2, x3, x4) = fA(x1, x2)fB(x2, x3, x4)
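The benefit of the factorization can be seen on a small numeric sketch. The local functions `f_a` and `f_b` below are arbitrary, hypothetical choices over binary variables; the point is only that the marginal of x1 can be obtained by pushing the sums inside the product, which is the basic step of the sum-product algorithm.

```python
from itertools import product

# Hypothetical local functions over binary variables.
def f_a(x1, x2):
    return 0.9 if x1 == x2 else 0.1

def f_b(x2, x3, x4):
    return 0.5 + 0.5 * x2 * x3 * x4

def g(x1, x2, x3, x4):
    return f_a(x1, x2) * f_b(x2, x3, x4)

def marginal_x1_brute_force(x1):
    """Sum g over x2, x3, x4 directly."""
    return sum(g(x1, x2, x3, x4) for x2, x3, x4 in product((0, 1), repeat=3))

def marginal_x1_sum_product(x1):
    """Same marginal as sum_x2 fA(x1, x2) * (sum_{x3, x4} fB(x2, x3, x4))."""
    message_b_to_x2 = {
        x2: sum(f_b(x2, x3, x4) for x3, x4 in product((0, 1), repeat=2))
        for x2 in (0, 1)
    }
    return sum(f_a(x1, x2) * message_b_to_x2[x2] for x2 in (0, 1))
```

Both functions return the same values; the second one reuses the inner sum as a message from fB to x2 instead of recomputing it for every value of x1.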
Deriving PDMS Factor-Graphs
Innate probabilities
of mappings being correct
PDMS Factor-Graphs
• Cyclic graph
– Junction Tree?
  • Centralization
  • Computational + communication overhead
– Iterative Sum-Product
  • Correct only for tree-structured networks
  • Approximate result
• How to perform iterative sum-product by message passing on
the mapping graph?
– Message passing in factor graph does not correspond to
connectivity of mapping graph
– We want to rely on decentralized computations only
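For illustration, iterative sum-product over a set of feedback factors can be sketched centrally as below. This is not the embedded, decentralized scheme itself: it is a minimal single-process sketch, assuming binary mapping variables, independent priors, and the same assumed feedback factor as elsewhere (1 if all mappings on the cycle are correct, 0 if exactly one is wrong, delta otherwise). All function names and the assignment of mappings to cycles in the usage example are hypothetical.

```python
from itertools import product

def feedback_factor(states, positive, delta):
    """Assumed P(feedback | states), with 1 = correct and 0 = incorrect."""
    wrong = states.count(0)
    p_positive = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
    return p_positive if positive else 1.0 - p_positive

def iterative_sum_product(priors, feedbacks, delta=0.1, iterations=10):
    """priors: {mapping: P(correct)}; feedbacks: [(mappings_on_cycle, positive)].
    Returns {mapping: estimated P(correct | all feedback)}."""
    # Factor-to-variable messages, indexed by (factor index, mapping).
    f2v = {(f, m): (1.0, 1.0)
           for f, (members, _) in enumerate(feedbacks) for m in members}

    def variable_message(mapping, skip_factor=None):
        """Prior times all incoming factor messages except from skip_factor."""
        msg = [1.0 - priors[mapping], priors[mapping]]
        for (f, m), (w0, w1) in f2v.items():
            if m == mapping and f != skip_factor:
                msg = [msg[0] * w0, msg[1] * w1]
        return msg

    for _ in range(iterations):
        v2f = {key: variable_message(key[1], skip_factor=key[0]) for key in f2v}
        updated = {}
        for f, (members, positive) in enumerate(feedbacks):
            for position, target in enumerate(members):
                out = [0.0, 0.0]
                # Sum the factor over all states of the other mappings.
                for states in product((0, 1), repeat=len(members)):
                    weight = feedback_factor(states, positive, delta)
                    for m, s in zip(members, states):
                        if m != target:
                            weight *= v2f[(f, m)][s]
                    out[states[position]] += weight
                total = out[0] + out[1]
                updated[(f, target)] = (out[0] / total, out[1] / total)
        f2v = updated

    beliefs = {}
    for mapping in priors:
        b0, b1 = variable_message(mapping)
        beliefs[mapping] = b1 / (b0 + b1)
    return beliefs

# Hypothetical example: m0 lies on one cycle with positive feedback
# and on another cycle with negative feedback.
priors = {name: 0.7 for name in ("m0", "m1", "m2", "m3", "m4")}
beliefs = iterative_sum_product(
    priors, [(("m0", "m3", "m4"), True), (("m0", "m1", "m2"), False)])
```

Each update only uses sums and products over the mappings of one cycle, which is what makes it plausible to piggyback the messages on queries travelling along that cycle. On factor graphs with loops the resulting beliefs are approximations.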
Embedded Message Passing
• Decentralized computations
– Computationally inexpensive
– Sums and Products
• Message-Passing Schedules
– Lazy (piggybacking on query forwarding)
• No message overhead
– Periodic
Evaluation: Convergence
(undirected example graph, prior 0.7 delta 0.1)
Evaluation: Fault-tolerance (faulty links)
(undirected example graph, prior 0.8 delta 0.1)
Evaluation: Performance Detecting Errors
(randomly generated networks of 50 schemas and 200 mappings, TTL = 5)
Conclusion: Probabilistic Message Passing
• A technique to implement emergent semantic processes
– Decentralized decision making
– Converges to an agreement on conceptualizations
– Scalable and robust method to infer correct mappings in a semantic
overlay network
• Questions
– Do such mapping networks exist?
– What is their structure?
Overview
1. Emergent Semantics
2. Mapping Inference in Semantic Overlay Networks
3. Structure of Semantic Overlay Networks
4. Peer Data Management Systems Implementation
5. Outlook: Sensor Internet
Semantic Overlay Networks
• Networks of schema mappings
– Directed, weighted, redundant
• Semantic Interoperability
Two peers are said to be semantically interoperable
if they can forward queries to each other in the mapping graph,
potentially through a series of translation links
• Question
– What are the necessary conditions for a semantic overlay network
to become semantically interoperable at large scale?
• Idea: use percolation theory to detect the emergence of a
strongly connected component in S
– Adaptation of a recent graph-theoretic framework (Newman,
Strogatz, Watts 2001)
(Figure: example mapping graph S over schemas SA, SB, SC, SD, SE, SF, SG with edge weights between 0.1 and 1)
The Model
• Large-scale semantic overlay networks as random graphs with
arbitrary degree distribution
• Specificities of the model
– Strong clustering (clustering coefficient cc)
– Bidirectionality (bidirectionality coefficient bc)
• Based on generatingfunctionology
(pk probability of degree k)
• Necessary condition for semantic interoperability in the large
– Appearance of a giant strongly-connected component: ci > 0
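The appearance of a giant strongly connected component can also be observed empirically. The sketch below does not implement the generating-function model or the connectivity indicator ci; it is a minimal illustration, assuming uniformly random directed edges, that measures the largest strongly connected component with Kosaraju's algorithm. The function names are hypothetical.

```python
import random

def strongly_connected_components(n, edges):
    """Kosaraju's algorithm (iterative) on nodes 0..n-1."""
    forward = [[] for _ in range(n)]
    backward = [[] for _ in range(n)]
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: flood the reversed graph in reverse completion order.
    component = [None] * n
    components = []
    for start in reversed(order):
        if component[start] is not None:
            continue
        members, stack = [], [start]
        component[start] = len(components)
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in backward[node]:
                if component[nxt] is None:
                    component[nxt] = len(components)
                    stack.append(nxt)
        components.append(members)
    return components

def largest_scc_fraction(n, num_edges, seed=0):
    """Relative size of the largest SCC in a random directed graph."""
    rng = random.Random(seed)
    edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(num_edges)]
    return max(len(c) for c in strongly_connected_components(n, edges)) / n
```

Calling `largest_scc_fraction(1000, 500)` and `largest_scc_fraction(1000, 5000)` contrasts a sparse network, where no large component forms, with a dense one, where most schemas end up mutually reachable.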
Size of Out-component
(Figure: structure of a directed graph: in-component, strongly connected component, out-component, links-in, links-out)
(Plot: relative size of the out-component vs. number of edges, 5000 to 20000; curves: a theoretic, b experimental)
Evaluation: The Sequence Retrieval System (SRS)
• Bioinformatic libraries: EMBL, SwissProt, Prosite, etc.
– Commercial information indexing and retrieval system
– Links from one database to others by mapping identifiers
– More than 380 databanks and 500 (undirected) links
– Custom crawler
Results
• Connectivity indicator ci = 25.4
– Giant connected component: 187 nodes
• Size of the giant component
– 0.47 (derived)
– 0.48 (observed)
• Power-law Topology
• Small-World Graph
– Clustering coefficient = 0.32
– Diameter = 9
Graphs with Same Power-law Degree Distribution
• Varying number of edges
Analyzing Weighted Networks
• Do we have a sufficient number of good mappings?
• Using quality measures for the mappings, derived from
message passing
– Uniformly distributed weights between 0 and 1
– Attribute / schema level
• Semantic query forwarding
– Per-hop forwarding behaviors
– Only forward if wi ≥ τ
• τ = 0 : flooding
• τ = 1 : exact answers
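The per-hop forwarding rule can be sketched as a breadth-first traversal that only follows mappings with weight at least τ. The graph below is a hypothetical example, and `reachable_schemas` is an illustrative name, not part of any system described here.

```python
from collections import deque

def reachable_schemas(weighted_mappings, source, tau):
    """Schemas reached from `source` when a query is only forwarded over
    mappings whose quality weight is at least tau."""
    reached = {source}
    queue = deque([source])
    while queue:
        schema = queue.popleft()
        for target, weight in weighted_mappings.get(schema, []):
            if weight >= tau and target not in reached:
                reached.add(target)
                queue.append(target)
    return reached

# Hypothetical weighted mapping graph: schema -> [(target schema, weight)].
mappings = {
    "SA": [("SB", 0.8), ("SD", 0.1)],
    "SB": [("SC", 0.9), ("SF", 0.2)],
    "SC": [("SE", 0.6)],
    "SE": [("SA", 1.0)],
}
```

With τ = 0 the query floods every schema reachable from SA, with τ = 0.7 it reaches only SA, SB and SC, and with τ = 1 it stays at SA in this example, which illustrates the trade-off between recall and precision.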
Overview
1. Emergent Semantics
2. Mapping Inference in Semantic Overlay Networks
3. Structure of Semantic Overlay Networks
4. Peer Data Management Systems Implementation
5. Outlook: Sensor Internet
GridVine: Annotating Shared Resources
• End-users create annotations / “categories” / “translation links”
– Constraining the annotation mechanism: we do not expect them to
write ontologies, views…
GridVine: Annotating Shared Resources
• Principle of data independence
– Scalable physical layer: structured overlay network (P2P network)
– Semantic logical layer: Semantic Gossiping
(Figure: two-layer architecture)
Logical Layer (GridVine): SearchFor(query), Insert(RDF schema), Insert(RDF triple), Insert(Schema translation), Return(tuples)
Physical Layer (P-Grid): Insert(key, value), Retrieve(key), Return(value)
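The separation between the two layers can be sketched as follows. This is a hedged illustration, not the GridVine or P-Grid API: `KeyValueOverlay`, `LogicalLayer` and `make_key` are hypothetical names, the overlay is a local dictionary rather than a distributed trie, and indexing each triple under its subject, predicate and object is an assumption made for the example.

```python
import hashlib

class KeyValueOverlay:
    """Stand-in for the physical layer: Insert(key, value) / Retrieve(key).
    A real structured overlay would route each key to a responsible peer."""
    def __init__(self):
        self._store = {}

    def insert(self, key, value):
        self._store.setdefault(key, []).append(value)

    def retrieve(self, key):
        return list(self._store.get(key, []))

def make_key(term, bits=16):
    """Hypothetical hash of a term to a binary key string."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")[:bits]

class LogicalLayer:
    """Stand-in for the logical layer, built only on insert/retrieve."""
    def __init__(self, overlay):
        self.overlay = overlay

    def insert_triple(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        # Assumption: the triple is indexed once under each of its terms.
        for term in triple:
            self.overlay.insert(make_key(term), triple)

    def search_for(self, subject=None, predicate=None, obj=None):
        """Triples matching the pattern; at least one term must be given."""
        pattern = (subject, predicate, obj)
        bound = next(term for term in pattern if term is not None)
        candidates = self.overlay.retrieve(make_key(bound))
        # Filter locally, which also discards hash collisions.
        return sorted({t for t in candidates
                       if all(p is None or p == v for p, v in zip(pattern, t))})
```

Because the logical layer only calls `insert` and `retrieve`, the overlay underneath can be replaced without touching the semantic operations, which is the data independence principle stated above.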
Mapping annotations onto P-Grid
P-Grid: Structured overlay network
• Supports key-based search
• Decentralized, scalable, self-organizing access structure
(Figure: P-Grid binary trie over key prefixes ???, 0??, 1??, 00?, 01?, 10?, 11?, 000, 010, 011, 100)
User-defined annotations (RDF triples)
<rdf:Description rdf:about="urn:x-pgrid:F59F92C8BC…lucene_green_100.gif">
  <Year xmlns="pgrids://CF0C052CE4…Pdf.rdfs#">2001</Year>
</rdf:Description>
User-defined categories (RDFS)
<rdfs:Class rdf:ID="Pdf" rdfs:comment="New schema class">
  <rdfs:subClassOf rdf:resource="http://www.p-grid.org/pgrid.rdfs#PGridDataFile"/>
</rdfs:Class>
<rdf:Property rdf:ID="Year">
  <rdfs:domain rdf:resource="#Pdf"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</rdf:Property>
User-defined category translations (OWL)
<Title xmlns="pgrids://CF0C052CE418FC78…PDFFIle.rdfs#">
  <owl:equivalentProperty rdf:resource="pgrids://CF…Pdf.rdfs#Title" />
  <rdf:Statement>
    <rdf:subject rdf:resource="pgrids://CF0…PDFFIle.rdfs#Title" />
    <rdf:predicate rdf:resource="pgrids://owl/hasCorrectness" />
    <rdf:object rdf:resource="1.0" />
  </rdf:Statement>
</Title>
⇒ RDQL queries
Traversals of the Semantic Overlay Network
• GridVine: structured P2P network = Distributed index
– Query forwarding independent of structure of semantic overlay!
• Different query forwarding paradigms
– Iterative forwarding
– Recursive forwarding
(Plot: % results received vs. time [ms], 0 to 12000 ms; series: iterative 15 p, recursive 15 p, iterative 60 p, recursive 60 p)
Overview
1. Emergent Semantics
2. Mapping Inference in Semantic Overlay Networks
3. Structure of Semantic Overlay Networks
4. Peer Data Management Systems Implementation
5. Outlook: Sensor Internet
Outlook
Information Sharing in the Sensor Internet
(Source: activecampus/UCSD)
Current Situation with Information Sharing
• Different WSNs
• Web publishing (repetitive): DB app, Java app, Web interface, …
• Discovery and correlation (difficult)
• Syntactic and semantic heterogeneity
Challenges
• Provide generic and simple-to-use tools for publishing data
collected from sensors over the Web
– Data stream management
• Provide tools for discovering published sensor data
– Semi-structured metadata
• Provide tools for correlating data from autonomous and
heterogeneous sensor data sources
– Emergent semantics
Publishing: Global Sensor Network
• A simple-to-use system to publish and correlate sensor data streams
– Virtual sensors are sensors, sensor networks or derived data streams
– GSN nodes managing virtual sensors
• Architecture
– Virtual Sensors published in a P2P network using metadata annotations
– GSN nodes connected in a peer-to-peer streaming network in the Web
– Data processing specified in a temporal SQL extension
Development Status
Available through sourceforge: http://sourceforge.net/projects/globalsn
Discovery: PicShark
• Assume sensor data is being massively published
– Already the case for images (photo sharing, Flickr!)
• Discovery depends on the availability of annotations
– Content-based search capabilities are limited (no text!)
• Manual annotations are hard to obtain
– Metadata scarcity
• Social annotation (folksonomies)
Exploiting Social Context
• Assume standardized annotation scheme
– Attributes A1, …, An
• Information-theoretic measure of metadata scarcity of image I
$$H(I) = -\sum_{i} p(A_i) \log p(A_i)$$
$$\text{where } p(A_i) = \begin{cases} 0 & \text{if attribute } A_i \text{ is present} \\ \frac{1}{n} & \text{otherwise} \end{cases} \qquad (n: \text{number of attributes})$$
• Reducing metadata scarcity by metadata propagation to similar
images
– Similarity derived from metadata, features,
user relevance feedback
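The scarcity measure can be computed directly from the set of attributes present for an image. The sketch below assumes the natural logarithm (the base is not stated) and the usual convention that 0 · log 0 = 0; the function name and the attribute names in the usage note are hypothetical.

```python
import math

def metadata_scarcity(present, attributes):
    """H(I) = -sum_i p(A_i) log p(A_i), with p(A_i) = 0 if A_i is present
    for the image and 1/n otherwise; 0 * log 0 is taken to be 0."""
    n = len(attributes)
    if n == 0:
        return 0.0
    missing = sum(1 for a in attributes if a not in present)
    # Each missing attribute contributes -(1/n) * log(1/n) = log(n) / n.
    return missing * math.log(n) / n
```

For a scheme with the four attributes `["title", "creator", "date", "location"]`, an image with all four present has scarcity 0, an image with none has scarcity log 4, and every attribute filled in by propagation lowers the value by log(4)/4.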
PicShark Prototype
• Extract existing metadata in different formats
• Propagate metadata
• Store metadata in standard formats in a peer-to-peer network
• Extract features from image content and text annotations
Summary Information Sharing
• Publishing and sharing of sensor data in a peer-to-peer architecture: Global Sensor Network
• Shared annotations of image/sensor data in a social network: PicShark
• Distributed reasoning on the correctness of mappings among heterogeneous annotation schemes: emergent semantics
Smart Earth
Acknowledgements
• Joint research with the following members and visitors of my
group
– Philippe Cudre Mauroux, Adriana Budura, Ali Salehi, Manfred
Hauswirth, Andras Feher, Tim van Pelt, Julien Gaugaz
• Funding provided by
– Swiss National Science Foundation (SNF) through NCCR MICS
– European Union through the Evergrow project (FET)
For more information
• www.mics.ch
• lsirwww.epfl.ch
• www.p-grid.org
• sourceforge.net/projects/globalsn
Thanks for your attention!