Semantic Tags Generation and Retrieval for Online Advertising

Roberto Mirizzi (1, 2), Azzurra Ragone (2), Tommaso Di Noia (2)
(1) Yahoo!
(2) Polytechnic University of Bari, Italy
Presented by
Pranjali Rajiv Joshi
Outline
• Introduction
– Traditional ad generation process
– Problem statement
• Background Technologies
– DBpedia
– SPARQL
• Not Only Tag (NOT) System
– Architecture
– Algorithm
• Evaluation
• Conclusion
Introduction
• Display relevant and appropriate advertisements
• Semantic relations among keywords are important
– Example: a user searches for "Tiger Woods" and is shown ads about zoos
Introduction
Traditional process of ad generation
• Select a keyword that activates an ad
• Lexicographic approach, not concerned with semantics
Possible Solution
• Ontology: defines the concepts and relationships used to describe and represent an area of knowledge
Drawbacks? Must maintain the ontology and cover every possible domain
Introduction
Problem Statement
• Traditional approaches fail to display relevant ads to users
• An ontology could be used, but it is difficult to maintain
• Develop a system called Not Only Tags (NOT) that addresses these issues
Background Technologies
• DBpedia - Linked RDF Data Graph
– Extracts structured information from Wikipedia
– Links different datasets on the Web to Wikipedia data (as of 2014, 3 billion triples, 50 million links)
– Allows sophisticated queries
• SPARQL
– Performs queries on RDF data sources (DBpedia)
RDF - Resource Description Framework
Helps to add semantic information to the web
Triple: Dave (Subject) - Likes (Predicate) - Cookies (Object)
Resources in a triple are identified by URIs
RDF Linked Data – RDF triples connected with RDF links
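The triple idea can be illustrated with a minimal Python sketch. It is not taken from the slides: the `Triple` type, the `http://example.org/` namespace, the second triple and the `linked_from` helper are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject - predicate - object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; real data would use the URIs of the dataset.
EX = "http://example.org/"

# A tiny graph: triples become "linked" when one triple's object
# is another triple's subject.
triples = {
    Triple(EX + "Dave", EX + "likes", EX + "Cookies"),
    Triple(EX + "Cookies", EX + "madeOf", EX + "Flour"),  # made-up triple
}


def linked_from(resource, graph):
    """Return the objects reachable from `resource` in one step, sorted."""
    return sorted(t.obj for t in graph if t.subject == resource)


print(linked_from(EX + "Dave", triples))
```

Following `linked_from` repeatedly is, in spirit, what exploring a Linked Data graph means.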
RDF Linked Data and DBpedia
• DBpedia is one of the main clouds of the RDF Linked Data graph
• Machine-understandable equivalent of Wikipedia
Example Links
• Wikipedia : https://en.wikipedia.org/wiki/RDFa
• DBpedia : http://dbpedia.org/page/RDFa
• SPARQL – used to perform queries on DBpedia
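As a rough illustration of what a SPARQL query over DBpedia might look like, the sketch below only builds the query text in Python and does not send it anywhere. The helper name `categories_query` is hypothetical. The `skos:subject` property is the one named in these slides; newer DBpedia releases may expose category membership under a different property, so treat the property choice as an assumption.

```python
# Resource URIs live under /resource/, while /page/ is the HTML view.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def categories_query(resource_name: str) -> str:
    """Build a SPARQL query asking for the categories of one resource."""
    uri = DBPEDIA_RESOURCE + resource_name
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?category WHERE {\n"
        # Assumption: category links are stored as skos:subject,
        # as in the slides; this may differ on a current endpoint.
        f"  <{uri}> skos:subject ?category .\n"
        "}"
    )


print(categories_query("RDFa"))
```

The resulting string could be pasted into a SPARQL endpoint's query form to try it out.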
Quiz
• A statement in RDF is called _______
a) Tuple
b) Treble
c) Triple
d) Trouple
• RDF uses _______ to identify resources
a) Web Identifier
b) MAC Address
c) IP Address
d) Network Address
• RDF Linked Data Graph consists of _______
a) RDF Triples
b) SPARQL
c) Both a & d
d) RDF Links
Quiz
• Which of the following is true about DBpedia?
a) It is an RDF Linked Data Graph
b) It consists of structured information from Wikipedia
c) It is possible to ask complex queries on DBpedia
d) All of a, b, c
• SPARQL is used to query RDF data sources.
a) True
b) False
Not Only Tag (NOT) System
Demo : http://sisinflab.poliba.it/not-only-tags/
Text input area
User’s tag bag area
Tag cloud population
What is behind NOT ? (I)
Graph exploration and computation of similarity value
DBpedia page of RDFa
Semantic_Web
XML-based_standards
What is behind NOT ? (II)
[Figure: portion of the DBpedia graph around RDFa. Node labels recovered from the figure:]
Knowledge_representation, Data_management, Internet_architecture, XML, Computer_and_telecommunication_standards, Semantic_Web, XML-based_standards, RDFa, Resource Description Framework, Microformats, Triplestores, Folksonomy, Web_services, User_interface_markup_languages, Scalable_Vector_Graphics, …
Legend: nodes are Articles or Categories; links are skos:subject or skos:broader
NOT System Architecture
1. Linked Data graph exploration
2. Rank nodes exploiting external information
3. Store results as pairs of nodes together with their similarity
Data structure 'r'
uri – a DBpedia URI
hits – number of times the URI is visited during the exploration
ranked – boolean value representing whether the URI has been ranked
in_context – boolean value stating whether the URI under consideration is within the context or not
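The record 'r' maps naturally onto a small Python dataclass. This is only a sketch: the class name `RankedResource` is hypothetical, and the default values (hits 1, ranked false, in_context true) follow the values shown for a newly created 'r' in the flow chart slides.

```python
from dataclasses import dataclass


@dataclass
class RankedResource:
    """Sketch of the per-node record 'r' kept during exploration."""
    uri: str                  # a DBpedia URI
    hits: int = 1             # times the URI was visited so far
    ranked: bool = False      # whether the URI has been ranked
    in_context: bool = True   # whether the URI is within the context


r = RankedResource("http://dbpedia.org/resource/Category:Semantic_Web")
r.hits += 1  # the node was reached a second time
print(r)
```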
Flow Chart
DBpedia Ranker (example: start from RDFa, max = 2)
1. d < max ? If No, stop exploration.
2. If Yes, explore from the current node (the RDFa DBpedia page) on the basis of the skos:broader and skos:subject properties, reaching e.g. Semantic Web.
3. Exploring the node for the first time? If Yes, create data structure 'r' for Semantic Web (uri; hits: 1; ranked: false; in_context: true). If No, increment hits in 'r'.
4. In context ? (Is Semantic Web within the context of our search domain?) If No, stop exploration. If Yes, compute similarity: sim(RDFa, Semantic Web).
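The DBpedia Ranker flow can be approximated by a depth-bounded exploration over an in-memory graph. This is a sketch under stated assumptions, not the authors' implementation: the `Record` class, the `explore` function, the toy edges beyond RDFa's two categories, the `is_in_context` rule and the constant `similarity` stub are all hypothetical, and scoring each (expanded node, discovered node) pair is one possible reading of the chart.

```python
from dataclasses import dataclass


@dataclass
class Record:
    uri: str
    hits: int = 1
    ranked: bool = False
    in_context: bool = True


def explore(root, neighbours, is_in_context, similarity, max_depth=2):
    """Explore from `root`, returning per-node records and scored pairs."""
    records = {}
    pairs = {}

    def visit(uri, depth):
        if depth >= max_depth:           # "d < max ?" answered No
            return
        for nxt in neighbours.get(uri, []):
            if nxt in records:           # not the first visit
                records[nxt].hits += 1
                continue
            rec = Record(nxt, in_context=is_in_context(nxt))
            records[nxt] = rec
            if not rec.in_context:       # out of context: stop this branch
                continue
            pairs[(uri, nxt)] = similarity(uri, nxt)
            rec.ranked = True
            visit(nxt, depth + 1)

    visit(root, 0)
    return records, pairs


# Toy graph: only RDFa's two categories come from the slides,
# the remaining edges are made up for illustration.
graph = {
    "RDFa": ["Semantic_Web", "XML-based_standards"],
    "Semantic_Web": ["Knowledge_representation", "Internet_architecture"],
    "XML-based_standards": ["Computer_and_telecommunication_standards",
                            "Semantic_Web"],
}

records, pairs = explore(
    "RDFa",
    graph,
    is_in_context=lambda uri: uri != "Internet_architecture",  # stub rule
    similarity=lambda a, b: 0.0,                               # stub score
)
print(records["Semantic_Web"].hits, sorted(pairs))
```

In a real system `neighbours` would be replaced by queries against DBpedia, and the two stubs by the context analyzer and the similarity computation.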
Similarity Value (I)
1. Web pages that contain, or have been tagged with (e.g. on Delicious), the rdfs:label value associated with the URIs
Similarity Value (II)
2. If Wikipedia page W1 links to W2, DBpedia has a dbpprop:wikilink from uri 1 to uri 2 (e.g. uri 1 = RDFa, uri 2 = Semantic Web)
0 – no link
1 – unidirectional
2 – bidirectional
3. rdfs:label of uri 1 is contained in dbpprop:abstract of uri 2, and vice versa: m/n
n : number of words composing the label
m : number of words composing the label that also appear in the abstract
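The wikilink and label-in-abstract scores can be sketched in a few lines of Python. The function names, the lower-cased whitespace tokenisation and the sample abstract text are assumptions made for illustration. The slides do not say how the two directions of the m/n score are combined, so only one direction is computed here.

```python
def wikilink_score(links, uri1, uri2):
    """0 = no link, 1 = unidirectional, 2 = bidirectional."""
    forward = (uri1, uri2) in links
    backward = (uri2, uri1) in links
    return int(forward) + int(backward)


def label_in_abstract(label, abstract):
    """Return m/n: the share of label words that occur in the abstract."""
    words = label.lower().split()              # n = len(words)
    if not words:
        return 0.0
    abstract_words = set(abstract.lower().split())
    m = sum(1 for w in words if w in abstract_words)
    return m / len(words)


# Toy data: a set of (source, target) wikilinks.
links = {("RDFa", "Semantic_Web"), ("Semantic_Web", "RDFa"),
         ("RDFa", "XML")}

print(wikilink_score(links, "RDFa", "Semantic_Web"))  # 2 (both directions)
print(wikilink_score(links, "RDFa", "XML"))           # 1 (one direction)

# Made-up abstract text, not the real DBpedia abstract.
print(label_in_abstract("Semantic Web",
                        "rdfa adds attributes to web documents"))  # 0.5
```

Splitting on whitespace ignores punctuation and word forms; a real implementation would likely need a more careful tokeniser.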
Similarity Value (III)
Bidirectional link
wikilinkScore(RDFa, Semantic Web) = 2
'RDFa' not present in abstract of Semantic Web
'Semantic Web' not present in abstract of RDFa
abstractScore(RDFa, Semantic Web) = 0
Similarity Value (IV) A quick recap
Calculated on the basis of three things :
1. Number of web pages returned for uri 1 and uri 2
2. Wikilink between uri 1 and uri 2
3. Label of uri 1 present in abstract of uri 2 (& vice versa)
Context Analyzer (I)
• Example: an advertising agency focused on 'database and programming languages'
• This domain defines the context 'C'
Context Analyzer (III)
Context 'C': C = {c1, c2, …, ck}, the top K resources returned for 'database & programming'
Current node: uri (e.g. Semantic Web)
Calculate similarity value: for each c in C, s = s + s(c.uri, uri)
– s(c1, Semantic Web)
– s(c2, Semantic Web)
– …
– s(ck, Semantic Web)
s >= threshold ? (threshold value: 4)
– Yes: True – in context
– No: False – not in context
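The context check amounts to summing similarities against the context resources and comparing the total with a threshold. The sketch below assumes a hypothetical `in_context` function, hypothetical context resources and made-up similarity values; only the threshold of 4 and the summation come from the slides.

```python
def in_context(uri, context, similarity, threshold=4):
    """Return True when the summed similarity to the context reaches the threshold."""
    s = 0.0
    for c in context:
        s += similarity(c, uri)   # s = s + s(c.uri, uri)
    return s >= threshold


# Hypothetical context for 'database & programming'.
context = ["Database", "Programming_language", "SQL"]

# Made-up pairwise similarity values, for illustration only.
scores = {
    ("Database", "Semantic_Web"): 2.0,
    ("Programming_language", "Semantic_Web"): 1.5,
    ("SQL", "Semantic_Web"): 1.0,
    ("Database", "Zoo"): 0.5,
}


def toy_similarity(c, uri):
    """Look up a pair score, defaulting to 0 for unknown pairs."""
    return scores.get((c, uri), 0.0)


print(in_context("Semantic_Web", context, toy_similarity))  # True  (4.5 >= 4)
print(in_context("Zoo", context, toy_similarity))           # False (0.5 < 4)
```

Because the total grows with the number of context resources, the threshold and K would presumably need to be tuned together.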
Quick Recap
Graph exploration, similarity value, context analyzer (example: RDFa -> Semantic Web, max = 2)
1. d < max ? If Yes, explore along skos:subject and skos:broader (e.g. to Semantic Web).
2. Exploring first time? If Yes, create data structure 'r' for Semantic Web (uri, hits, ranked, in_context). If No, increment hits in 'r'.
3. In context? Sum s(context c1…ck, Semantic Web) and test s >= threshold (4). If No, stop exploration.
4. If Yes, compute similarity sim(RDFa, Semantic Web) from:
• web pages - label
• wikilinks
• label - abstract
Evaluation (I)
Comparison of 5 different algorithms
50 volunteers
Researchers in the ICT area
244 votes collected (on average 5 votes per user)
Average time to vote: 1 min 40 s
Evaluation (II)
[Chart: results for DBpedia Ranker compared with Algo 2, Algo 3, Algo 4 and Algo 5]
Conclusion
• Presented a novel system for semantic tag generation and retrieval
• System architecture – DBpedia, SPARQL
• Algorithms – graph exploration, similarity value and context analyzer
• Helps advertisers in the process of keyword selection and enhances the ad selection process
That’s all for today,
have a nice day!
Thank you