Learning from the Users: from Search to
Overview Log Data Adaptive Search Evaluation Next Steps Conclusions

Learning from the Users: from Search to Adaptive Search
Udo Kruschwitz
School of Computer Science and Electronic Engineering
University of Essex
18th March 2011

Udo Kruschwitz - DCU - 18 March 2011

Overview
- Motivation and context
- Exploiting query logs
- Adaptive search
- AutoEval: evaluating adaptive search
- Next steps

Context
- Collection of documents, e.g. digital library, local Web site, intranet
- Not Web search in general
- Ad hoc queries

Problems
- Common problem with too many matches
  - General queries
  - Ambiguous queries
  - Short queries
- Data sparsity problem
- Typical intranet problem: recall can be important (e.g. single matching document)
- Express information need as a query
- Usable knowledge sources not available

Another Problem
(Source: http://xkcd.com/773)

Our Approach
- Search system that makes suggestions using automatically extracted domain knowledge
- But ...
  - Domain knowledge is noisy and incomplete
  - System suggestions not always useful/helpful
  - Document collection is changing
- Learn from the users’ interactions
- Improve system over time by adapting to the users’ search behaviour
- No single user profile but “community profile”

Partial Domain Knowledge (Web Site)
[Figure: partial domain model graph; node labels include registration, dates, office, undergraduate, card, regulations, essex, students, jobshop]

Partial Domain Knowledge (Digital Library)
[Figure]

Applying Domain Knowledge - General Idea
- Combine standard search system with initial domain model
- Utilize domain model to construct
  - query refinements
  - query relaxations
- Visual graph representation for navigation
- Present suggestions alongside matching documents

Log Data Example (Web site)
...
33136 1FEE0F65A1DA07ABE70F497C900D5E7E Wed Jan 02 08:36:02 GMT 2008 \
  0 0 0 posrgarduate application form \
  posrgarduate application form posrgarduate application form
33137 1FEE0F65A1DA07ABE70F497C900D5E7E Wed Jan 02 08:36:58 GMT 2008 \
  1 0 1 application application application<r>
...

Log Data Example (Digital Library)
...
903779;guest;83.33.xxx.xxx;83et8b7j010eh4vlht3ucj8dl1;en;("pomegranate fertilization");search_sim;;0;-;;;2007-10-05 13:52:30
...
1889115;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");search_url;;0;-;;;2008-06-24 22:02:52
...
1889118;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");view_full;;1;;;;2008-06-24 22:03:03
...
1889120;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;Klavierkonzerte;search_res_rec_all;;0;-;;;2008-06-24 22:03:55
1889121;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("klavierkonzerte");view_full;;1;;;;2008-06-24 22:04:10
...

Using Log Data to Acquire a Domain Model
- Queries submitted by users
- Identify sessions
- Associate related queries (many possible ways of doing so)
- Result is a query association graph (of some sort)

Using Log Data to Acquire a Domain Model - Example
...
903779;guest;83.33.xxx.xxx;83et8b7j010eh4vlht3ucj8dl1;en;("pomegranate fertilization");search_sim;;0;-;;;2007-10-05 13:52:30
...
1889115;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");search_url;;0;-;;;2008-06-24 22:02:52
...
1889118;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");view_full;;1;;;;2008-06-24 22:03:03
...
1889120;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;Klavierkonzerte;search_res_rec_all;;0;-;;;2008-06-24 22:03:55
1889121;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("klavierkonzerte");view_full;;1;;;;2008-06-24 22:04:10
...

Using Log Data to Acquire a Domain Model - Example
...
8eb3bdv3odg9jncd71u0s2aff6 xxxx 1889115 xxxx mozart xxxx 2008-06-24 22:02:52
8eb3bdv3odg9jncd71u0s2aff6 xxxx 1889120 xxxx klavierkonzerte xxxx 2008-06-24 22 ...
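
The step from raw log rows to session-grouped queries can be sketched roughly as follows. This is only an illustration, not the talk's actual pipeline: the field positions (record id, session id, query, action, timestamp) are inferred from the sample rows rather than from a documented schema, treating every action that starts with "search" as a query submission is an assumption, and the function names are hypothetical. Counting consecutive query pairs is just one of the "many possible ways" of associating queries.

```python
from collections import Counter, defaultdict

# Sample rows copied from the digital library log excerpt above.
LOG_LINES = [
    '903779;guest;83.33.xxx.xxx;83et8b7j010eh4vlht3ucj8dl1;en;("pomegranate fertilization");search_sim;;0;-;;;2007-10-05 13:52:30',
    '1889115;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");search_url;;0;-;;;2008-06-24 22:02:52',
    '1889118;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("mozart");view_full;;1;;;;2008-06-24 22:03:03',
    '1889120;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;Klavierkonzerte;search_res_rec_all;;0;-;;;2008-06-24 22:03:55',
    '1889121;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;("klavierkonzerte");view_full;;1;;;;2008-06-24 22:04:10',
]


def parse_search(line):
    """Return (session_id, record_id, query, timestamp) for search rows, else None.

    Assumed layout: field 0 = record id, 3 = session id, 5 = query,
    6 = action, 12 = timestamp. Queries containing ';' are not handled.
    """
    fields = line.split(";")
    if len(fields) < 13 or not fields[6].startswith("search"):
        return None
    # Drop the surrounding ("...") wrapper and lowercase the query.
    query = fields[5].strip().strip('()"').strip().lower()
    return fields[3], int(fields[0]), query, fields[12]


def query_pairs(lines):
    """Count consecutive, distinct query pairs within each session."""
    sessions = defaultdict(list)
    for line in lines:
        record = parse_search(line)
        if record is not None:
            session_id, record_id, query, _timestamp = record
            sessions[session_id].append((record_id, query))
    pairs = Counter()
    for records in sessions.values():
        # Assumption: record ids increase over time within a session.
        queries = [query for _, query in sorted(records)]
        for first, second in zip(queries, queries[1:]):
            if first != second:
                pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    for (first, second), count in query_pairs(LOG_LINES).items():
        print(f"{first} -> {second}: {count}")
```

The resulting pair counts can serve as weighted edges of a query association graph; sessions with a single query, such as the "pomegranate fertilization" one, contribute no edges.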

Using Log Data to Acquire a Domain Model - Example
[Figure]

Our Log Data
- We use query logs collected on different collections, e.g.
  - University of Essex intranet search engine: almost 2 million queries (since Nov 2007)
  - The European Library: 1.8 million interactions (Jan 2007 - Jun 2008)
- Query log analysis (not discussed here)
- Bootstrap (adaptive) domain models

Towards Adaptive Search
- Start by employing initially extracted domain knowledge
- Observe user interaction with the system
- Incorporate clickthrough trails
- Use this implicit relevance feedback to adjust domain knowledge accordingly
- Do this fully automatically
- Aim: evolving domain knowledge that adjusts to the users’ search behaviour
- Should learn common patterns over time, e.g. “map” → “campus map”
- Should deal with seasonal terms appropriately, e.g. “registration”
This should improve search ...

... and Navigation
[Figure]

Automatic Domain Model Adaptation
Variety of adaptation models, including:
- Exploiting Maximum Likelihood Estimates (MLE)
- Formal Concept Analysis (FCA)
- Ant Colony Optimization analogy (ACO)
- Adaptive Intranet Navigation
... let’s quickly look at the first three approaches.

MLE: Domain Model derived from Query Logs

q1                    q2                      MLE
registration          online registration     0.045
registration          registration office     0.035
registration          timetable               0.025
registration          enrol                   0.020
...                   ...                     ...
online registration   registration            0.211
...                   ...                     ...
registration office   careers centre          0.053
registration office   albert sloman library   0.053
...                   ...                     ...
enrol                 course enrolment        0.050

MLE: Domain Model derived from Query Logs II
[Figure]

MLE: Reminder - Original Domain Knowledge
[Figure: partial domain model graph; node labels include registration, dates, office, undergraduate, card, regulations, essex, students, jobshop]

MLE: Results
- Studied 18-month log file of Essex search engine (about 670,000 queries)
- Sampled frequent and less frequent queries
- User study to assess quality of derived term suggestions
- Compared these terms to alternatives, e.g. Google suggestions, snippet approach, association rules approach
- Log analysis beats alternatives
- Session-based approach very powerful, dialogue-based even better

FCA Approach to Adaptation
- Lattice structure representing terms and corresponding documents
- Concept in lattice defined by objects (URLs) and attributes (terms)

FCA Approach to Adaptation II
- Learn from past user queries (implicit relevance judgements) using relative judgements (Radlinski & Joachims, 2005)
- Train a classifier (SVM) that associates terms with documents
- Rerun lattice construction

FCA: Architecture
[Figure]
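
To make the FCA slides' notion of a concept (a set of URLs together with the terms they all share) more concrete, here is a minimal sketch that enumerates the formal concepts of a tiny URL/term table. The URLs, terms and function names are made up for illustration. The brute-force enumeration is a textbook construction chosen for brevity, not the lattice algorithm used in the talk, and the SVM-based adaptation step is not covered.

```python
from itertools import combinations

# Hypothetical toy context: each URL (object) maps to its terms (attributes).
CONTEXT = {
    "/library": {"library", "opening hours"},
    "/registry": {"registration", "opening hours"},
    "/enrol": {"registration", "enrol"},
}


def common_terms(urls, context):
    """Terms shared by every URL in `urls` (all terms if `urls` is empty)."""
    shared = set().union(*context.values())
    for url in urls:
        shared &= context[url]
    return shared


def matching_urls(terms, context):
    """URLs whose term set contains every term in `terms`."""
    return {url for url, url_terms in context.items() if terms <= url_terms}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of URLs.

    Exponential in the number of URLs, so only suitable for toy inputs.
    """
    concepts = set()
    urls = sorted(context)
    for size in range(len(urls) + 1):
        for subset in combinations(urls, size):
            intent = common_terms(subset, context)
            extent = matching_urls(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering the concepts by extent inclusion yields the lattice; in this toy table the concept for "opening hours" sits above the more specific concepts for "/library" and "/registry".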

FCA: Screenshot
[Figure]

FCA: Results
- User study similar to MLE evaluation
- Sampled frequent and less frequent queries
- User study to assess quality of derived term suggestions
- Compared these terms to alternatives, e.g. association rules approach, and unadapted FCA lattice
- FCA adaptation beats both alternatives

ACO: Adaptive Domain Model
- Simple approach using the idea of ant colony optimization:
  - Update domain model in daily batches
  - Add weight to query pairs observed that day, e.g. “libraary” → “library”
  - Normalise the weights so that all outgoing edges of a node sum up to 1
  - And so on ...
- Idea: learn associations as they become popular, allow for forgetting relations as well!

ACO: Results
- Again: user studies to assess quality of derived term suggestions
- Two studies: Essex university logs (Essex), European Library logs (TEL)
- Compared these terms to alternatives, e.g. Google suggestions, association rule approach, snippet processing
- Essex: ACO beats all alternatives and suggestions improve over time
- TEL: ACO better than association rule approach but not snippet baseline
- Suggestions derived using different methods can be complementary (TEL)
... see (Kruschwitz et al., 2011) and (Dignum et al., 2010)

AutoEval: Evaluate Adaptive Search
- Limitations of user studies
- Evaluate suggestions without recruiting subjects
- Compare different models automatically
- Idea: use log files and exploit past user interactions
[Figure labels: domain model, query recommendations, current day, actual modification; reciprocal ranks 1/1, 1/2, 1/4]
score = (1/2 + 1/4 + 1/1)/3 = 0.583
... come along to our ECIR’11 talk

AutoEval Results (Web Site)
[Figure: MRR scores (y-axis, 0 to 0.12) per week (x-axis, weeks 1 to 63, 1 Oct 2008 - 23 Dec 2009); series: Fonseca, ACO, nootropia]

AutoEval Results (Digital Library)
[Figure: MRR scores (y-axis, 0 to 0.018) per month (x-axis, months 1 to 19, Jan 2007 - Jun 2008); series: ACO, FONSECA(MinSup=2), FONSECA(MinSup=3)]

Next Steps
- Prototypes to go live at the University of Essex
- Work with industrial partners
- TREC 2011 and CLEF 2011
- Ongoing EPSRC project (Essex, Robert Gordon University Aberdeen & Open University): AutoAdapt (November 2008 - November 2011)
... any collaboration welcome!
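
Returning to the AutoEval scoring example, score = (1/2 + 1/4 + 1/1)/3 = 0.583: the arithmetic is a mean reciprocal rank (MRR) over logged query modifications. The sketch below reproduces only that arithmetic. The query strings and function names are invented for illustration, and scoring a modification that does not appear among the suggestions as 0 is an assumption, not something stated on the slide.

```python
def reciprocal_rank(suggestions, actual_modification):
    """1/rank of the user's actual follow-up query in the ranked suggestions.

    Assumption: a modification that was not suggested scores 0.
    """
    for rank, suggestion in enumerate(suggestions, start=1):
        if suggestion == actual_modification:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(cases):
    """Average the reciprocal ranks over (suggestions, actual_modification) pairs."""
    if not cases:
        return 0.0
    return sum(reciprocal_rank(s, a) for s, a in cases) / len(cases)


# Hypothetical day of logged modifications, chosen so the actual follow-up
# queries appear at ranks 2, 4 and 1, matching the slide's example.
CASES = [
    (["campus map", "map colchester", "bus map"], "map colchester"),
    (["timetable", "enrol", "registration office", "online registration"],
     "online registration"),
    (["library opening hours", "library catalogue"], "library opening hours"),
]

if __name__ == "__main__":
    print(round(mean_reciprocal_rank(CASES), 3))  # 0.583
```

Computing this score per day or per week for each model, as in the two results plots, lets different adaptation models be compared without recruiting subjects.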

Conclusions
- Adaptive search by exploiting query logs
- Adaptive domain models can be learned, experiments with different approaches demonstrate this
- Have to deal with noisy data
- Data sparsity
- Navigation support as a suitable alternative to query suggestions (Saad & Kruschwitz, 2011)

Acknowledgements
- Dyaa Albakour, Nikolaos Nanas, Jinzhong Niu, Dawei Song, Anne De Roeck, Maria Fasli, Stephen Dignum (EPSRC-funded AutoAdapt research project: http://autoadaptproject.org/)
- Deirdre Lungley, Sharhida Saad (Language and Computation Group at Essex)
- Johannes Leveling (DCU)

References
- U. Kruschwitz, M-D. Albakour, J. Niu, J. Leveling, N. Nanas, Y. Kim, D. Song, M. Fasli, and A. De Roeck. Moving Towards Adaptive Search in Digital Libraries. In Advanced Technologies for Digital Libraries. Lecture Notes in Computer Science (Hot Topics). Springer. Forthcoming.
- M-D. Albakour, U. Kruschwitz, N. Nanas, Y. Kim, D. Song, M. Fasli, and A. De Roeck. AutoEval: An Evaluation Methodology for Evaluating Query Suggestions Using Query Logs. In Proceedings of ECIR 2011. Forthcoming.
- M-D. Albakour, U. Kruschwitz, J. Niu, and M. Fasli. University of Essex at the TREC 2010 Session Track. In Proceedings of the Nineteenth Text Retrieval Conference (TREC 2010), NIST, 2011.
- S. Z. Saad and U. Kruschwitz. Applying Web Usage Mining for Adaptive Intranet Navigation. In Proceedings of the 2nd Information Retrieval Facility Conference, 2011. Forthcoming.
- D. Lungley and U. Kruschwitz. Automatically Maintained Domain Knowledge: Initial Findings. ECIR: 739-743, 2009.
- S. Dignum, U. Kruschwitz, M. Fasli, Y. Kim, D. Song, U. Cervino, and A. De Roeck. Incorporating Seasonality into Search Suggestions Derived from Intranet Query Logs. Web Intelligence (WI’10), 2010.

ECIR 2011 Industry Day in Dublin
- Co-located with last day of ECIR 2011
- Dedicated to the latest innovations in web and enterprise search
- Mix of research and application
- Excellent list of speakers
- Highly interactive and collegial, with attendance limited to 60-80 delegates
- Date: 21st April
- http://ecir2011.dcu.ie/program/industry-day/

That’s it ...