Moving Towards Adaptive Intranet Search
Transcription
Moving Towards Adaptive Intranet Search
Overview Intranet Search Log Analysis Adaptive Search Moving Towards Adaptive Intranet Search Udo Kruschwitz School of Computer Science and Electronic Engineering University of Essex [email protected] 29th July 2010 Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 1 Overview Intranet Search Log Analysis Adaptive Search Overview I Motivation & context I Prototype on the Essex intranet I Preliminary log analysis I Steps towards adaptation Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 2 Overview Intranet Search Log Analysis Adaptive Search Context I Collection of documents, e.g. local Web sites, corporate or academic intranet I Not Web search in general I Ad hoc queries Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 3 Overview Intranet Search Log Analysis Adaptive Search Problems I Common problem with too many matches I I I General queries Ambiguous queries Short queries I Data sparsity problem I Typical intranet problem: recall can be important (e.g. single matching document) I Express information need as a query I Usable knowledge sources not available Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 4 Overview Intranet Search Log Analysis Adaptive Search Our Approach I I Search system that makes suggestions using automatically extracted domain knowledge But ... I I I Domain knowledge is noisy and incomplete System suggestions not always useful/helpful Document collection is changing I Learn from the users’ interactions I Improve system over time by adapting to the users’ search behaviour Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 5 Overview Intranet Search Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Log Analysis Adaptive Search 6 Overview Intranet Search Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Log Analysis Adaptive Search 7 Overview Intranet Search Log Analysis Adaptive Search What Sort of Domain Knowledge? I Online clustering (e.g. Vivisimo) I Subsumption hierarchies (Sanderson & Croft 1999) I Lexical modification approach (Anick & Tipirneni 1999) I Formal Concept Analysis (e.g. CREDO) I Term associations using neural networks and fuzzy logic (e.g. Aquabrowser) I Flat list of terms using simple tf.idf applied to top matching documents I ... Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 8 Overview Intranet Search Log Analysis Adaptive Search Partial Domain Knowledge (Example) registration dates office undergraduate card ... regulations essex ... ... ... students jobshop ... ... Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 9 Overview Intranet Search Log Analysis Adaptive Search Applying Domain Knowledge - General Idea I I Combine standard search system with initial domain model Utilize domain model to construct I I I query refinements query relaxations Present suggestions alongside matching documents Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 10 Overview Intranet Search Log Analysis Adaptive Search Log Data Collection I Essex intranet search engine I Originally running alongside standard Essex search engine I Operating since summer 2006 I About 40,000 queries collected in 12 months I November 2007: system replaced old search engine altogether ... more that 1 million queries collected since then Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 11 Overview Intranet Search Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Log Analysis Adaptive Search 12 Overview Intranet Search Log Analysis Adaptive Search Towards Adaptive Intranet Search I Start by employing initially extracted domain knowledge I Observe user interaction with the system I Incorporate clickthrough trails I Use this implicit relevance feedback to adjust domain knowledge accordingly I Aim: evolving domain knowledge that adjusts to the users’ search behaviour ... let’s see what the log files tell us so far. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 13 Overview Intranet Search Log Analysis Adaptive Search Observations I I More than 10% of queries are modifications! ... suggests the general system setup makes sense Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 14 Overview Intranet Search Log Analysis Adaptive Search Most Frequent User Queries (since Nov 2007) 52841 28183 25423 16410 9413 8535 7366 6938 6426 6318 5853 5634 5504 4865 moodle library search timetable graduation cmr enrol accomodation accommodation ocs psychology exam timetable term dates courses Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 15 Overview Intranet Search Log Analysis Adaptive Search Observations II I Queries are domain-specific I This is different from general Web search ... suggests that domain-independent knowledge (e.g. WordNet, Google n-grams) might not be suitable. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 16 Overview Intranet Search Log Analysis Adaptive Search Frequent Queries 350 300 250 200 timetable calendar 150 100 50 0 Aug Sep Oct Nov Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Dec Jan 17 Overview Intranet Search Log Analysis Adaptive Search Query Traffic I 5000 4500 Number of Queries 4000 3500 3000 2500 2000 1500 1000 500 0 1 3 5 7 9 11 13 15 17 19 21 23 Hour of Day Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 18 Overview Intranet Search Log Analysis Adaptive Search Query Traffic II 400 350 Number of Queries 300 250 200 150 100 50 0 1.Oct 1.Nov 1.Dec 1.Jan 1.Feb 1.Mar 1.Apr Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 1.May 1.Jun 1.Jul 1.Aug 1.Sep 19 Overview Intranet Search Log Analysis Adaptive Search Observations III I All sorts of variations (seasonal, usage patterns, ...) I Again different from general Web search ... suggests that system should perhaps adapt to “context”. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 20 Overview Intranet Search Log Analysis Adaptive Search Query Statistics Number of Queries Average Query Length Length of Longest Query Queries with Spelling Errors Fraction of Query Corpus Set 1 100 1.54 3 2% 20% Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Set 2 40,006 1.98 17 ≈6% 100% 21 Overview Intranet Search Log Analysis Adaptive Search Observations IV I Queries are even shorter than on the Web! Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 22 Overview Intranet Search Log Analysis Adaptive Search User-System Interaction I More system suggestions than manual modifications I More additions of terms than replacements I Long tail of modifications only submitted once Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 23 Overview Intranet Search Log Analysis Adaptive Search Sample Interactions: Frequent Query Pairs Frequency 122 92 77 70 69 68 68 62 61 55 54 52 q1 fees accomodation student union accomodation post room my essex mondo map time table foundation su libary Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 q2 tuition fees accommodation students union accomodation office postroom myessex mondo pizza campus map timetable foundation degree student union library 24 Overview Intranet Search Log Analysis Adaptive Search Sample Interactions: Query Pairs with Highest MLE q1 moole email on we moodlr lbrary timetabke myesex moodke libraary student suport regisrty moodlew graduatin caterin q2 moodle email on web moodle library timetable myessex moodle library student support registry moodle graduation catering Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 MLE 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 25 Overview Intranet Search Log Analysis Adaptive Search Sample Interactions: Query Refinements with Highest MLE q1 film society erol mood e exam timetable 09 web email w800 the islamic society stavrakakis norvald mouffe q2 art film society enrol moodle exam dates email creative writing islamic society yannis stavrakakis norvald instefjord chantal mouffe Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 MLE 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 26 Overview Intranet Search Log Analysis Adaptive Search Observations V I Users select query modification options on a continuing basis I Lots of implicit (domain-specific) relationships Different “types” of relationships, e.g. I I I I Dialogue-based vs. session-based interactions Adding vs. replacing query terms But: data sparsity issues Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 27 Overview Intranet Search Log Analysis Adaptive Search Automatic Domain Model Adaptation I Use log analysis to improve search suggestions I Do this fully automatically I Should learn common patterns over time, e.g. “map” → “campus map” I Should deal with seasonal terms appropriately, e.g. “registration” Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 28 Overview Intranet Search Log Analysis Adaptive Search Automatic Domain Model Adaptation II Variety of adaptation models, including: I Exploiting Maximum Likelihood Estimates (MLE) I Formal Concept Analysis (FCA) I Ant Colony Optimization analogy (ACO) ... let’s quickly look at these three approaches. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 29 Overview Intranet Search Log Analysis Adaptive Search MLE: Domain Model derived from Query Logs q1 registration registration registration registration ... online registration ... registration office registration office ... q2 online registration registration office timetable enrol ... registration ... careers centre albert sloman library ... MLE 0.045 0.035 0.025 0.020 ... 0.211 ... 0.053 0.053 ... enrol course enrolment 0.050 Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 30 Overview Intranet Search Log Analysis Adaptive Search MLE: Domain Model derived from Query Logs II Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 31 Overview Intranet Search Log Analysis Adaptive Search MLE: Reminder - Original Domain Knowledge registration dates office undergraduate card ... regulations essex ... ... ... students jobshop ... ... Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 32 Overview Intranet Search Log Analysis Adaptive Search MLE: Initial Findings I Studied 18-months log file of Essex search engine (about 670,000 queries) I Sampled frequent and less frequent queries I User study to assess quality of derived term suggestions I Compared these terms to alternatives, e.g. Google suggestions, snippet approach, association rules approach (Fonseca et al., 2003) I Log analysis beats alternatives I Session-based approach very powerful, dialogue-based even better Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 33 Overview Intranet Search Log Analysis Adaptive Search FCA Approach to Adaptation I Lattice structure representing terms and corresponding documents I Concept in lattice defined by objects (URLs) and attributes (terms) Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 34 Overview Intranet Search Log Analysis Adaptive Search FCA Approach to Adaptation II I Learn from past user queries (implicit relevance judgements) using relative judgements (Radlinski & Joachims, 2005) I Train a classifier (SVM) that associates terms with documents I Rerun lattice construction I Promising evidence that lattice improves over time (Lungley & Kruschwitz, 2009) Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 35 Overview Intranet Search Log Analysis Adaptive Search FCA: Architecture ! " %( ' $ ! % & %( $ & ' " # # Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 36 Overview Intranet Search Log Analysis Adaptive Search FCA: Lattice Example Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 37 Overview Intranet Search Log Analysis Adaptive Search FCA: Screenshot Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 38 Overview Intranet Search Log Analysis Adaptive Search ACO: Adaptive Domain Model I Simple approach using the idea of ant colony optimization: I I I I I Update domain model in daily batches Add weight to query pairs observed that day, e.g. “libraary” → “library” Normalise the weights so that all outgoing graphs in a node sum up to 1 And so on ... Idea: learn associations as they become popular, allow for forgetting relations as well! Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 39 Overview Intranet Search Log Analysis Adaptive Search Accuracy Score ACO: Initial Findings 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Weeks Ant Colony Optimisation Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 Fonseca Baseline 40 Overview Intranet Search Log Analysis Adaptive Search Next Steps I Adaptive search engine on the Essex Web site I EPSRC project (Essex, Robert Gordon University Aberdeen & Open University): AutoAdapt (November 2008 - November 2011) I Want to get involved? Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 41 Overview Intranet Search Log Analysis Adaptive Search Conclusions I Query logs for adaptive intranet search I Utilize automatically acquired domain knowledge I Evolve domain model based on users’ search behaviour (domain-specific!) I Promising direction: clickthrough data, lattice structures, query logs Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 42 Overview Intranet Search Log Analysis Adaptive Search Acknowledgements I Deirdre Lungley (PhD student) I Dyaa Albakour, Dawei Song, Anne De Roeck, Maria Fasli, Stephen Dignum, Yunhyong Kim, Nikolaos Nanas (AutoAdapt research project) I Essex University Web and Learning Technology group Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 43 Overview Intranet Search Log Analysis Adaptive Search Search Solutions 2010 I Annual one-day event at BCS Headquarters (Southampton Street) I Dedicated to the latest innovations in web and enterprise search I Highly interactive and collegial, with attendance limited to 60-80 delegates I Let me know if you are interested (attending or talking)! I Date: 21st October I More info soon on http://irsg.bcs.org/ ... and ECIR 2011 Industry Day (21st April 2011 in Dublin) Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 44 Overview Intranet Search Log Analysis Adaptive Search References I M. Sanderson & B. Croft. Deriving concept hierarchies from text. SIGIR: 206–213, 1999. I P. Anick & S. Tipirneni. The paraphrase search assistant: terminological feedback for iterative information seeking. SIGIR: 153–159, 1999. I F. Radlinski & T. Joachims. Query Chains: learning to rank from implicit feedback. SIGKDD: 239–248. 2003. I B. M. Fonseca, P. B. Golgher, E. S. de Moura & N. Ziviani. Using Association Rules to Discover Search Engines Related Queries, Proceedings of the First Latin American Web Congress, 2003. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 45 Overview Intranet Search Log Analysis Adaptive Search References II I I I D. Lungley & U. Kruschwitz. Automatically Maintained Domain Knowledge: Initial Findings. ECIR: 739-743, 2009. S. Dignum, U. Kruschwitz, M. Fasli, Y. Kim, D. Song, U. Cervino & A. De Roeck. Incorporating Seasonality into Search Suggestions Derived from Intranet Query Logs. Web Intelligence (WI’10), 2010. U. Kruschwitz. Intelligent Document Retrieval. Springer, 2005. Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010 46
Similar documents
Learning from the Users: from Search to
Methodology for Evaluating Query Suggestions Using Query Logs. In Proceedings of ECIR 2011, Forthcoming.
More information