Advertising Keyword Suggestion Based on Concept Hierarchy

Yifan Chen, Guirong Xue and Yong Yu
Apex Data & Knowledge Management Lab, Shanghai Jiao Tong University
Presented by Qiang Yang, Hong Kong Univ. of Science and Technology
1
In a Search Engine Company
• Advertisers bid on keywords
• Search engine users enter queries
  - Ads with keywords that match the query are displayed
2
However…
• There is a gap between advertisers and customers:
  - Between the advertisers' vocabulary and the customers' vocabulary
  - Limited imagination vs. unbounded query possibilities
  - Thus, keyword suggestions!
[Figure: an advertiser with two keyword lists: Apple, iPod, MP3, Player, … and iTouch, Nano, Shuffle, …]
3
Based on query logs and advertisers' logs
• e.g., AdWords from Google
• Find keywords used concurrently
Find relevant keyword co-occurrences in meta tags
Based on search-engine results
• Find nearby phrases in search results
4
Based on a concept hierarchy
• Mining the semantic relationships
• Concentrating on users' real interests
[Figure: concept hierarchy with nodes digital products, digital cameras, walkman, portable players, mp3 players, cd players, ipod shuffle; relatedness labels: fair, close, tightly]
5
Concept taxonomy induces a distance
[Figure: taxonomy and example items. Tree nodes: goods, digital products, audio, mp3 player, CD player, video, digital camera. Example items: iPod nano, DVP-FX810, EOS 40D. Annotation: "closer".]
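One way to make the induced distance concrete is to count the edges between two nodes of the taxonomy. The sketch below assumes each concept is written as a "/"-separated path from the root; the function name `path_distance`, the edge-count definition, and the exact tree shape are illustrative assumptions, since the slide does not state how the distance is computed.

```python
def path_distance(a: str, b: str) -> int:
    """Edges on the tree path between two nodes given as '/'-separated paths."""
    pa, pb = a.split("/"), b.split("/")
    # The length of the shared prefix is the depth of the lowest common ancestor.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Steps up from a to the common ancestor, then down to b.
    return (len(pa) - common) + (len(pb) - common)


# Hypothetical paths modelled on the node labels in the figure.
mp3 = "goods/digital products/audio/mp3 player"
cd = "goods/digital products/audio/CD player"
camera = "goods/digital products/video/digital camera"

print(path_distance(mp3, cd))      # 2: siblings under "audio"
print(path_distance(mp3, camera))  # 4: related only through "digital products"
```

Under this definition, concepts that share a deeper ancestor come out closer, which matches the intuition that an mp3 player is nearer to a CD player than to a digital camera.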
6
Offline:
• Deriving a concept hierarchy
• Generate keywords
Online:
• Mapping keywords to generate concept candidates
• Generate new keywords
• Categorizing new keywords into concept clusters
[Figure: system overview. Labels: Search, Matrix, Web Directory, Query Term, Movie, Mathematics, Animation, Concept Hierarchy, Category, matlab, calculator, mathematica, linear algebra, …, Concept Contents, Web Contents, Suggestion]
7
• Built from Web directories (e.g., ODP)
  - High coverage and accuracy
• Categories as concepts
• Structure as relationships
Example:
  Computers: Computer Science, Software, Hardware…
  Computer Science: Artificial Intelligence, Computer Graphics, Software Engineering…
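As a rough illustration of "categories as concepts, structure as relationships", the sketch below turns directory category paths into a parent-to-children map. The helper name `build_hierarchy` and the path-based input format are assumptions made for this example; the slides do not describe how the directory is actually parsed.

```python
from collections import defaultdict


def build_hierarchy(category_paths):
    """Map each category path to the sorted list of its direct sub-category paths."""
    children = defaultdict(set)
    for path in category_paths:
        parts = path.split("/")
        # Register every parent -> child edge along the path.
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            child = "/".join(parts[: i + 1])
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Category paths in the ODP style that appears elsewhere in this deck.
paths = [
    "Top/Computers/Computer_Science/Theoretical",
    "Top/Computers/Computer_Science/People",
    "Top/Computers/Computer_Science/Academic_Departments",
]
hierarchy = build_hierarchy(paths)
for child in hierarchy["Top/Computers/Computer_Science"]:
    print(child)
```

Each key is a concept and each value lists its sub-concepts, so the directory structure itself supplies the relationships.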
8
The meaning of concepts: keywords
• Phrases gathered from Web pages under each concept node in the taxonomy
  - Keyword extraction
  - Accumulate meaning from sub-concepts
[Figure: taxonomy nodes annotated with keyword lists, e.g. Programming, Computer, System, Algorithm, Machine Learning, Neural Networks, Shareware, CPU, Project Management, …]
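The idea of accumulating meaning from sub-concepts can be sketched as a recursive merge of keyword counts. The function name `accumulate`, the count-based representation, and the toy data are assumptions for illustration only; the slides do not say how keywords are weighted when they are propagated upward.

```python
from collections import Counter


def accumulate(concept, own_counts, children):
    """Return keyword counts for `concept`, including all of its sub-concepts."""
    total = Counter(own_counts.get(concept, {}))
    for child in children.get(concept, []):
        total += accumulate(child, own_counts, children)
    return total


# Toy data (made up for illustration, not taken from ODP).
children = {"Computer Science": ["Artificial Intelligence", "Software Engineering"]}
own_counts = {
    "Computer Science": {"algorithm": 2, "computer": 3},
    "Artificial Intelligence": {"machine learning": 4, "algorithm": 1},
    "Software Engineering": {"programming": 5, "project management": 2},
}

merged = accumulate("Computer Science", own_counts, children)
print(merged.most_common(3))
```

The parent concept ends up with its own keywords plus those of every descendant, so a general node such as "Computer Science" also carries terms that were only observed under its children.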
9
What keywords are representative of a concept?
A keyword is good if
• It is commonly used within the concept (high document frequency)
• It is seldom used by other concepts (low concept frequency)
• Similar to TF-IDF, we derive a new keyword evaluation criterion:
  - The Document-Frequency, Inverse-Concept-Frequency (DF·ICF) factor
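A minimal sketch of a DF·ICF-style score follows. The slides do not give the exact formula, so the form used here, DF(t, c) · log(N / CF(t)), is an assumption chosen by analogy with TF-IDF: DF is the fraction of a concept's documents containing the keyword, CF is the number of concepts in which the keyword occurs, and N is the number of concepts. The function name `df_icf` and the toy corpus are likewise hypothetical.

```python
import math


def df_icf(concept_docs):
    """concept_docs: {concept: [set_of_keywords_per_document, ...]}.

    Returns {concept: {keyword: score}} with
    score = DF(keyword, concept) * log(N / CF(keyword)),
    an assumed TF-IDF-style form, not a formula taken from the slides.
    Every concept is expected to have at least one document.
    """
    n_concepts = len(concept_docs)
    # DF: fraction of a concept's documents that contain the keyword.
    df = {}
    for concept, docs in concept_docs.items():
        counts = {}
        for doc in docs:
            for kw in doc:
                counts[kw] = counts.get(kw, 0) + 1
        df[concept] = {kw: n / len(docs) for kw, n in counts.items()}
    # CF: number of concepts in which the keyword occurs at all.
    cf = {}
    for scores in df.values():
        for kw in scores:
            cf[kw] = cf.get(kw, 0) + 1
    return {
        concept: {kw: d * math.log(n_concepts / cf[kw]) for kw, d in scores.items()}
        for concept, scores in df.items()
    }


# Toy corpus (invented for illustration).
corpus = {
    "Theoretical": [{"complexity", "university"}, {"complexity", "quantum"}],
    "People": [{"university", "algorithms"}, {"university"}],
}
scores = df_icf(corpus)
print(scores["Theoretical"])
```

In this toy corpus "university" occurs in both concepts and therefore scores zero, while "complexity" is frequent in one concept and absent from the other, so it ranks highest there.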
10
Top keywords per concept (rank, keyword, score)

Concept #43037: Top/Computers/Computer_Science
 1. computer science                 0.12
 2. department of computer science   0.05
 3. computer                         0.04
 4. computing                        0.03
 5. university                       0.03
 6. computer science department      0.03
 7. department                       0.03
 8. research                         0.03
 9. science                          0.03
10. theoretical computer science     0.02
15. distributed systems              0.02
20. software engineering             0.01
21. computational complexity         0.01
22. complexity                       0.01
23. programming languages            0.01
24. database systems                 0.01
26. complexity theory                0.01
27. quantum                          0.01
29. algorithms                       <0.01
32. quantum information              <0.01

Concept #43056: Top/Computers/Computer_Science/People
 1. computer science                 0.08
 2. programming languages            0.04
 3. university                       0.03
 4. university of edinburgh          0.03
 5. software engineering             0.03
 6. database systems                 0.03
 7. indian institute                 0.02
 8. algorithms                       0.02
 9. computer                         0.02
10. distributed systems              0.02

Concept #84417: Top/Computers/Computer_Science/Academic_Departments
 1. computer science                 0.25
 2. department of computer science   0.14
 3. computer                         0.10
 4. department                       0.09
 5. computer science department      0.08
 6. science                          0.07
 7. computing                        0.05
 8. university                       0.04
 9. research                         0.04
10. computer science department      0.13

Concept #259003: Top/Computers/Computer_Science/Theoretical
 1. complexity                       0.09
 2. computational complexity         0.08
 3. quantum                          0.08
 4. complexity theory                0.08
 5. computer science                 0.08
 6. quantum information              0.07
 7. theoretical computer science     0.07
 8. university                       0.05
 9. quantum computing                0.04
10. theory                           0.03
11
Keyword Suggestion by Merging
Weighted union: from query q to term t
• Sim(q, t) = ∑_c Weight(q→c) · Weight(c→t)
Merged list: matlab, calculator, matrix reloaded, matrix, keanu reeves, film, toolbox, mathematica, …
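The weighted union can be sketched directly from the formula Sim(q, t) = ∑_c Weight(q→c) · Weight(c→t). In the code below the dictionary layout, the function name `suggest`, the `top_k` parameter, and all weight values are assumptions for illustration; only the summation itself comes from the slide.

```python
from collections import defaultdict


def suggest(query_to_concept, concept_to_term, top_k=5):
    """Rank terms by Sim(q, t) = sum over c of Weight(q->c) * Weight(c->t)."""
    sim = defaultdict(float)
    for concept, w_qc in query_to_concept.items():
        for term, w_ct in concept_to_term.get(concept, {}).items():
            sim[term] += w_qc * w_ct
    # Highest similarity first; ties broken alphabetically for determinism.
    ranked = sorted(sim.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Illustrative weights for the query "matrix" (invented numbers).
query_to_concept = {"Top/Arts": 0.5, "Top/Science/Math": 0.5}
concept_to_term = {
    "Top/Arts": {"matrix reloaded": 0.4, "keanu reeves": 0.3, "matrix": 0.2},
    "Top/Science/Math": {"matlab": 0.5, "calculator": 0.3, "matrix": 0.2},
}

result = suggest(query_to_concept, concept_to_term)
for term, score in result:
    print(f"{term}: {score:.2f}")
```

A term that is supported by several concepts, such as "matrix" here, collects weight from each of them, which is what makes the merged list mix movie and mathematics terms.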
12
Categorizing the Keyword List
Keyword list categorization
• Partition the keyword list using the concept hierarchy
• Keeps concepts from interfering with each other
Present the advertisers with categories
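One simple way to partition a suggestion list is to file each term under the concept that contributes most to its score. This assignment rule, the function name `categorize`, and the weights below are assumptions for illustration; the slides only state that the list is partitioned using the concept hierarchy.

```python
from collections import defaultdict


def categorize(query_to_concept, concept_to_term):
    """Group suggested terms under the concept that contributes most to them."""
    best = {}  # term -> (contribution, concept)
    for concept, w_qc in query_to_concept.items():
        for term, w_ct in concept_to_term.get(concept, {}).items():
            contribution = w_qc * w_ct
            if term not in best or contribution > best[term][0]:
                best[term] = (contribution, concept)
    groups = defaultdict(list)
    for term, (_, concept) in best.items():
        groups[concept].append(term)
    return {concept: sorted(terms) for concept, terms in groups.items()}


# Illustrative weights (invented numbers).
query_to_concept = {"Top/Arts": 0.6, "Top/Science/Math": 0.4}
concept_to_term = {
    "Top/Arts": {"keanu reeves": 0.5, "film": 0.3},
    "Top/Science/Math": {"matlab": 0.5, "linear algebra": 0.3, "film": 0.1},
}

groups = categorize(query_to_concept, concept_to_term)
for concept, terms in groups.items():
    print(concept, terms)
```

The advertiser then sees one list per category instead of a single merged list in which movie and mathematics terms are interleaved.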
13
Dataset
• 1,306,586 web pages from the 150,446 ODP categories
Experiments
• Random selection of test concepts and keywords
• Three labelers were asked to judge the relevance of the keywords
14
Accuracy of ranked keyword suggestion:
• Baseline: document frequency (DF)
• SCF: Sub-category Frequency
• LCF: Local Concept Frequency
15
Accuracy of keyword extraction:
• Randomly sampled 100 keywords
• Baseline: DF
• SCF: Sub-category Frequency
• LCF: Local Concept Frequency
16
Completeness of suggestion: are all distinct meanings found?
• Ambiguous queries
• Baseline: DF
17
Disambiguation performance
• Suggestion without categorizing
• Baseline: co-occurrence
[Figure: precision-recall curves for DF·SCF, DF, DF·LCF, and Baseline; Precision axis from 0.75 to 1.00, Recall axis from 0.0 to 1.0]
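For reference, the precision and recall values on such a plot can be computed from a ranked suggestion list and a set of keywords judged relevant. The sketch below is a generic illustration: the function name `precision_recall_points`, the sample ranking, and the relevance judgments are all hypothetical and are not the data behind the figure.

```python
def precision_recall_points(ranked_keywords, relevant):
    """Return (recall, precision) after each position of a ranked keyword list."""
    points = []
    hits = 0
    for k, keyword in enumerate(ranked_keywords, start=1):
        if keyword in relevant:
            hits += 1
        # Recall: share of relevant keywords found so far.
        # Precision: share of the first k suggestions that are relevant.
        points.append((hits / len(relevant), hits / k))
    return points


# Hypothetical ranking and labeler judgments for a mathematics concept.
ranked = ["matlab", "toyota matrix", "mathematica", "linear algebra"]
relevant = {"matlab", "mathematica", "linear algebra"}

points = precision_recall_points(ranked, relevant)
for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```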
18
Case study with query "matrix"
Ours
Top/Arts
matrix revolutions
Top/Science/Math
matlab
matrix screensaver
matrix reloaded
calculator
matrix reloaded
toolbox
mathematica
functions
software
linear algebra
scientific calculator
matrix revolutions
matrix multiplication
matrix inverse
matrix soundtrack
matrix wallpaper
rotation matrix
algebra
matrix code
matlab toolbox
biochemistry
graphing
analysis
matrix revolution
matrix inversion
determinant matrix
trinity matrix
calculators
math matrix
matrix review
keanu reeves
film
neo
matrix trilogy
movie
matrix revolutions
review
revolutions
larry wachowski
reloaded
matrix movies
agent smith
laurence fishburne
andy wachowski
morpheus
neo and trinity
matrix movie
matrix revisited
data analysis
computer algebra
system
department
linear
archaeology
molecular biology
Google
Overture
the matrix
belief bridging divine
matrix miracle space
time
toyota matrix
matrix reloaded
matrix screensaver
matrix revolution
matrix game
matrix soundtrack
dot matrix printer
symmetric matrix
matrix online
enter the matrix
matrix movie
matrix 3d
enter guide matrix
official strategy
matrix mris
matrix properties
matrix hair product
rank matrix
algebra matrix
matrix product
matrix hacking
matrix wallpaper
matrix trilogy
matrix neo path
matrix shampoo
WordTracker
matrix
the matrix
matrix reloaded
toyota matrix
matrix soundtrack
matrix revolutions
matrix theme
matrix wallpaper
matrix mp3
matrix screensaver
matrix background
the matrix reloaded
matrix ping pong
matrix code
matrix movie
19
Our work
• A novel approach to advertising keyword suggestion
• Key idea: associate concepts tightly with keywords
• Disambiguation keeps concepts from interfering with each other
Future work
• Go beyond using web pages
• Keep the concept hierarchy up to date
  - Automatic content reinforcement from web content such as query logs
20