Advertising Keyword Suggestion Based on Concept Hierarchy
Yifan Chen, Guirong Xue and Yong Yu
Apex Data & Knowledge Management Lab, Shanghai Jiao Tong University
Presented by Qiang Yang, Hong Kong University of Science and Technology

In a Search Engine Company
• Advertisers bid on keywords.
• Search engine users enter queries.
  – Ads whose keywords match the query are displayed.

However…
• There is a gap between advertisers and customers:
  – between the advertisers' vocabulary and the customers' vocabulary;
  – between the advertisers' limited imagination and the unbounded space of possible queries.
  – Hence the need for keyword suggestion.
[Figure: an advertiser's keywords (Apple, iPod, MP3 player, …) versus related customer terms (iTouch, Nano, Shuffle, …)]

Based on query logs and advertisers' logs
• e.g., Google AdWords
• Find keywords used concurrently
• Find relevant keywords by their co-occurrence in meta tags
Based on search-engine results
• Find nearby phrases in search results

Based on a concept hierarchy
• Mine semantic relationships
• Concentrate on users' real interests
[Figure: concept hierarchy under digital products (digital cameras, portable players, Walkman, MP3 players, CD players, iPod Shuffle), with relatedness levels labelled tightly, close and fair]

Concept taxonomy induces a distance
[Figure: taxonomy goods → digital products → audio (MP3 player, CD player) and video (digital camera), with the items iPod nano (MP3 player), DVP-FX810 (CD player) and EOS 40D (digital camera); iPod nano is closer to DVP-FX810, since both are audio players, than to EOS 40D, with which it shares only the digital-product ancestor]

Offline
• Derive a concept hierarchy.
• Generate keywords for each concept.
Online
• Map the query keywords to candidate concepts.
• Generate new keywords.
• Categorize the new keywords into concept clusters.
[Figure: architecture. Offline, a web directory and its web contents yield the concept hierarchy and the concept contents; online, a query term such as "matrix" is mapped to categories such as Movie, Mathematics and Animation, producing suggestions such as matlab, calculator, mathematica, linear algebra, …]

• Built from web directories (e.g., ODP)
  – High coverage and accuracy
• Categories serve as concepts
• The directory structure serves as the relationships between concepts
Example: Computers → Computer Science, Software, Hardware, …; Computer Science → Artificial Intelligence, Computer Graphics, Software Engineering, …

The meaning of concepts: keywords
• Phrases gathered from the web pages under each concept node in the taxonomy
  – Keyword extraction
  – Meaning accumulated from sub-concepts
[Figure: keyword sets attached to taxonomy nodes, e.g. {Machine Learning, Neural Networks, Algorithm, …}, {CPU, Computer, System, …}, {Programming, Project Management, Algorithm, …}, with parent nodes holding keywords such as Programming, Computer, System and Algorithm gathered from their children]

What keywords are representative for a concept?
A keyword is good if
• it is commonly used within the concept (high document frequency), and
• it is seldom used by other concepts (low concept frequency).
Analogous to TF-IDF, we derive a new keyword evaluation criterion:
  – the Document Frequency · Inverse Concept Frequency (DF·ICF) factor.

Example keyword lists for concepts under Top/Computers/Computer_Science (keyword, weight)

Concept #43037 (Top/Computers/Computer_Science)
1. computer science (0.12)
2. department of computer science (0.05)
3. computer (0.04)
4. computing (0.03)
5. university (0.03)
6. computer science department (0.03)
7. department (0.03)
8. research (0.03)
9. science (0.03)
10. theoretical computer science (0.02)
…
15. distributed systems (0.02)
20. software engineering (0.01)
21. computational complexity (0.01)
22. complexity (0.01)
23. programming languages (0.01)
24. database systems (0.01)
26. complexity theory (0.01)
27. quantum (0.01)
29. algorithms (<0.01)
32. quantum information (<0.01)

Concept #43056 (Top/Computers/Computer_Science/People)
1. computer science (0.08)
2. programming languages (0.04)
3. university (0.03)
4. university of edinburgh (0.03)
5. software engineering (0.03)
6. database systems (0.03)
7. indian institute (0.02)
8. algorithms (0.02)
9. computer (0.02)
10. distributed systems (0.02)

Concept #84417 (Top/Computers/Computer_Science/Academic_Departments)
1. computer science (0.25)
2. department of computer science (0.14)
3. computer (0.10)
4. department (0.09)
5. computer science department (0.08)
6. science (0.07)
7. computing (0.05)
8. university (0.04)
9. research (0.04)
10. computer science department (0.13)

Concept #259003 (Top/Computers/Computer_Science/Theoretical)
1. complexity (0.09)
2. computational complexity (0.08)
3. quantum (0.08)
4. complexity theory (0.08)
5. computer science (0.08)
6. quantum information (0.07)
7. theoretical computer science (0.07)
8. university (0.05)
9. quantum computing (0.04)
10. theory (0.03)

Keyword suggestion by merging
• Weighted union over concepts, from query q to term t:
  Sim(q, t) = Σ_c Weight(q→c) · Weight(c→t)
Example merged list: matlab, calculator, matrix reloaded, matrix, keanu reeves, film, toolbox, mathematica, …

Categorizing the keyword list
• Partition the keyword list using the concept hierarchy
  – to keep concepts from interfering with each other
• Present the keywords to advertisers grouped into categories
[Figure: keyword list → categorization]

Dataset
• 1,306,586 web pages from 150,446 ODP categories
Experiments
• Test concepts and keywords selected at random
• Three labelers judged the relevance of the keywords

Accuracy of ranked keyword suggestion
• Baseline: document frequency (DF)
• SCF: sub-category frequency; LCF: local concept frequency

Accuracy of keyword extraction
• 100 randomly sampled keywords
• Baseline: DF
• SCF: sub-category frequency; LCF: local concept frequency

Completeness of suggestion: are all distinct meanings found?
• Ambiguous queries
• Baseline: DF

Disambiguation performance
• Suggestion without categorization
• Baseline: co-occurrence
[Figure: precision (0.75 to 1.00) versus recall (0.0 to 1.0) for DFSCF, DF, DFLCF and the co-occurrence baseline]

Case study with the query "matrix"
Ours (Top/Arts and Top/Science/Math columns): matrix revolutions matlab matrix screensaver matrix reloaded calculator matrix reloaded toolbox mathematica functions software linear algebra scientific calculator matrix revolutions matrix multiplication matrix inverse matrix soundtrack matrix wallpaper rotation matrix algebra matrix code matlab toolbox biochemistry graphing analysis matrix revolution matrix inversion determinant matrix trinity matrix calculators math matrix matrix review keanu reeves film neo matrix trilogy movie matrix revolutions review revolutions larry wachowski reloaded matrix movies agent smith laurence fishburne andy wachowski morpheus neo and trinity matrix movie matrix revisited data analysis computer algebra system department linear archaeology molecular biology
Google and Overture: the matrix belief bridging divine matrix miracle space time toyota matrix matrix reloaded matrix screensaver matrix revolution matrix game matrix soundtrack dot matrix printer symmetric matrix matrix online enter the matrix matrix movie matrix 3d enter guide matrix official strategy matrix mris matrix properties matrix hair product rank matrix algebra matrix matrix product matrix hacking matrix wallpaper matrix trilogy matrix neo path matrix shampoo
WordTracker: matrix the matrix matrix reloaded toyota matrix matrix soundtrack matrix revolutions matrix theme matrix wallpaper matrix mp3 matrix screensaver matrix background the matrix reloaded matrix ping pong matrix code matrix movie

Our work
• A novel approach to advertising keyword suggestion
• Key idea: tightly associate concepts with keywords
• Disambiguation keeps concepts from interfering with each other
Future work
• Go beyond using web pages
• Keep the concept hierarchy up to date
  – Automatically reinforce concept content from web content such as query logs
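The "Concept taxonomy induces a distance" slide does not say which distance is used. A minimal sketch, assuming the simplest choice (the number of tree edges between two nodes through their lowest common ancestor) on a hypothetical child-to-parent map simplified from the slide's example:

```python
# Hypothetical child -> parent map, simplified from the slide's example taxonomy.
PARENT = {
    "digital products": "goods",
    "audio": "digital products",
    "video": "digital products",
    "mp3 player": "audio",
    "cd player": "audio",
    "digital camera": "video",
    "iPod nano": "mp3 player",
    "DVP-FX810": "cd player",
    "EOS 40D": "digital camera",
}


def path_to_root(node: str) -> list[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a: str, b: str) -> int:
    """Count the edges between a and b via their lowest common ancestor (assumed metric)."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # first shared node on b's path is the LCA
            return steps_from_a[n] + steps_from_b
    raise ValueError(f"{a!r} and {b!r} have no common ancestor")


print(tree_distance("iPod nano", "DVP-FX810"))  # 4: they meet at "audio"
print(tree_distance("iPod nano", "EOS 40D"))    # 6: they meet at "digital products"
```

Under this assumed metric, two audio players sit closer together than an audio player and a camera, matching the ordering the slide's figure illustrates.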
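The "meaning of concepts" slide says a concept's keywords are accumulated from its sub-concepts but gives no exact rule. A minimal sketch, assuming a concept simply pools its own pages with the pages of all its descendants; the toy hierarchy, the page keyword sets and the `accumulate` helper are hypothetical:

```python
# Hypothetical toy hierarchy (child -> parent) and pages per concept,
# each page represented by the set of keywords extracted from it.
PARENT = {
    "Computer_Science/Theoretical": "Computer_Science",
    "Computer_Science/People": "Computer_Science",
}
OWN_PAGES = {
    "Computer_Science": [{"computer science", "research"}],
    "Computer_Science/Theoretical": [{"complexity", "quantum"}],
    "Computer_Science/People": [{"algorithms", "university"}],
}


def accumulate(own_pages, parent):
    """Give every concept its own pages plus the pages of all its descendants."""
    pooled = {c: list(pages) for c, pages in own_pages.items()}  # copy; do not mutate input
    for concept, pages in own_pages.items():
        node = concept
        while node in parent:  # walk up to the root, adding pages to each ancestor
            node = parent[node]
            pooled.setdefault(node, []).extend(pages)
    return pooled


pooled = accumulate(OWN_PAGES, PARENT)
print(len(pooled["Computer_Science"]))  # 3: one own page plus one per child
```

This pooling is why, in the example lists, children's keywords such as "complexity" and "algorithms" also appear lower down in the parent concept's ranking.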
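The DF·ICF factor is introduced only by analogy with TF-IDF. A minimal sketch, assuming DF is the fraction of a concept's pages that contain the keyword and ICF is log(|C| / cf), where |C| is the number of concepts and cf is the number of concepts whose pages contain the keyword; both definitions and the toy data are assumptions, not the authors' exact formula:

```python
import math
from collections import Counter


def df_icf(concept_pages: dict[str, list[set[str]]]) -> dict[str, dict[str, float]]:
    """Score each keyword of each concept as DF * ICF (one plausible definition)."""
    n_concepts = len(concept_pages)
    # Pages are keyword sets, so each page counts a keyword at most once.
    doc_freq = {
        c: Counter(k for page in pages for k in page)
        for c, pages in concept_pages.items()
    }
    # Concept frequency: number of concepts whose pages contain the keyword.
    concept_freq = Counter(k for counts in doc_freq.values() for k in counts)
    scores = {}
    for c, counts in doc_freq.items():
        n_pages = len(concept_pages[c])
        scores[c] = {
            k: (n / n_pages) * math.log(n_concepts / concept_freq[k])
            for k, n in counts.items()
        }
    return scores


pages = {  # hypothetical toy data
    "Theoretical": [{"complexity", "computer science"}, {"complexity", "quantum"}, {"computer science"}],
    "People": [{"computer science", "algorithms"}, {"algorithms"}],
}
scores = df_icf(pages)
print(max(scores["Theoretical"], key=scores["Theoretical"].get))  # complexity
```

Because "computer science" appears in both toy concepts, its ICF is log(2/2) = 0 and it scores zero everywhere: the penalty on keywords shared by many concepts. With the full set of ODP categories such a keyword would rarely occur in every concept, so it can still rank highly, as in the example lists.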
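The weighted union on the "Keyword suggestion by merging" slide can be sketched directly from its formula, Sim(q, t) = Σ_c Weight(q→c) · Weight(c→t). The `merge_suggestions` helper and the query-to-concept and concept-to-term weights below are hypothetical numbers for the query "matrix", not values from the experiments:

```python
from collections import defaultdict


def merge_suggestions(query_to_concept, concept_to_term):
    """Weighted union: Sim(q, t) = sum over concepts c of Weight(q->c) * Weight(c->t)."""
    sim = defaultdict(float)
    for concept, w_qc in query_to_concept.items():
        for term, w_ct in concept_to_term.get(concept, {}).items():
            sim[term] += w_qc * w_ct
    # Highest-similarity terms first.
    return sorted(sim.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weights for the query "matrix".
q_to_c = {"Top/Arts": 0.6, "Top/Science/Math": 0.4}
c_to_t = {
    "Top/Arts": {"matrix reloaded": 0.5, "keanu reeves": 0.3, "matrix": 0.2},
    "Top/Science/Math": {"matlab": 0.5, "linear algebra": 0.3, "matrix": 0.1},
}
for term, score in merge_suggestions(q_to_c, c_to_t):
    print(f"{term}: {score:.2f}")
```

A term reachable through several concepts, such as "matrix" here, collects weight from each of them: 0.6 · 0.2 + 0.4 · 0.1 = 0.16.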
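The "Categorizing the keyword list" slide says the merged list is partitioned using the concept hierarchy, but not how. A minimal sketch, assuming each suggested term is assigned to the concept that contributes most to its Sim(q, t); the `categorize_terms` helper and the weights are hypothetical:

```python
from collections import defaultdict


def categorize_terms(query_to_concept, concept_to_term):
    """Assign each term to the concept contributing most to Sim(q, t); group by concept."""
    best = {}  # term -> (largest contribution, concept providing it)
    for concept, w_qc in query_to_concept.items():
        for term, w_ct in concept_to_term.get(concept, {}).items():
            contribution = w_qc * w_ct
            if term not in best or contribution > best[term][0]:
                best[term] = (contribution, concept)
    groups = defaultdict(list)
    # Within each category, list terms by decreasing contribution.
    for term, (_, concept) in sorted(best.items(), key=lambda kv: kv[1][0], reverse=True):
        groups[concept].append(term)
    return dict(groups)


# Hypothetical weights for the query "matrix".
q_to_c = {"Top/Arts": 0.6, "Top/Science/Math": 0.4}
c_to_t = {
    "Top/Arts": {"matrix reloaded": 0.5, "keanu reeves": 0.3, "matrix": 0.2},
    "Top/Science/Math": {"matlab": 0.5, "linear algebra": 0.3, "matrix": 0.1},
}
for concept, terms in categorize_terms(q_to_c, c_to_t).items():
    print(concept, terms)
```

Grouping by the dominant concept keeps movie-related and mathematics-related suggestions apart, which is the interference the slide aims to avoid; an ambiguous term such as "matrix" lands in whichever concept contributes more.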