Learning from the Users: from Search to

Transcription

Learning from the Users: from Search to
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Learning from the Users:
from Search to Adaptive Search
Udo Kruschwitz
School of Computer Science and Electronic Engineering
University of Essex
[email protected]
18th March 2011
Udo Kruschwitz - DCU - 18 March 2011
1
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Overview
I
Motivation and context
I
Exploiting query logs
I
Adaptive search
I
AutoEval: evaluating adaptive search
I
Next steps
Udo Kruschwitz - DCU - 18 March 2011
2
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Context
I
Collection of documents, e.g.
digital library, local Web site, intranet
I
Not Web search in general
I
Ad hoc queries
Udo Kruschwitz - DCU - 18 March 2011
3
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Problems
I
Common problem with too many matches
I
I
I
General queries
Ambiguous queries
Short queries
I
Data sparsity problem
I
Typical intranet problem: recall can be important
(e.g. single matching document)
I
Express information need as a query
I
Usable knowledge sources not available
Udo Kruschwitz - DCU - 18 March 2011
4
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Another Problem
(Source: http://xkcd.com/773)
Udo Kruschwitz - DCU - 18 March 2011
5
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Our Approach
I
I
Search system that makes suggestions using automatically
extracted domain knowledge
But ...
I
I
I
Domain knowledge is noisy and incomplete
System suggestions not always useful/helpful
Document collection is changing
I
Learn from the users’ interactions
I
Improve system over time by adapting to the users’ search
behaviour
I
No single user profile but “community profile”
Udo Kruschwitz - DCU - 18 March 2011
6
Overview
Log Data
Udo Kruschwitz - DCU - 18 March 2011
Adaptive Search
Evaluation
Next Steps
Conclusions
7
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Partial Domain Knowledge (Web Site)
registration
dates
office
undergraduate
Udo Kruschwitz - DCU - 18 March 2011
card
...
regulations
essex
...
...
...
students
jobshop
...
...
8
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Partial Domain Knowledge (Digital Library)
Udo Kruschwitz - DCU - 18 March 2011
9
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Applying Domain Knowledge - General Idea
I
I
Combine standard search system with initial domain model
Utilize domain model to construct
I
I
query refinements
query relaxations
I
Visual graph representation for navigation
I
Present suggestions alongside matching documents
Udo Kruschwitz - DCU - 18 March 2011
10
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Log Data Example (Web site)
...
33136
33137
1FEE0F65A1DA07ABE70F497C900D5E7E Wed Jan 02 08:36:02 GMT 2008 \
0 0 0 posrgarduate application form \
posrgarduate application form
posrgarduate application form
1FEE0F65A1DA07ABE70F497C900D5E7E Wed Jan 02 08:36:58 GMT 2008 \
1 0 1 application application application<r>
...
Udo Kruschwitz - DCU - 18 March 2011
11
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Log Data Example (Digital Library)
...
903779;guest;83.33.xxx.xxx;83et8b7j010eh4vlht3ucj8dl1;en;
("pomegranate fertilization");search_sim;;0;-;;;2007-10-05 13:52:30
...
1889115;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("mozart");search_url;;0;-;;;2008-06-24 22:02:52
...
1889118;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("mozart");view_full;;1;;;;2008-06-24 22:03:03
...
1889120;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
Klavierkonzerte;search_res_rec_all;;0;-;;;2008-06-24 22:03:55
1889121;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("klavierkonzerte");view_full;;1;;;;2008-06-24 22:04:10
...
Udo Kruschwitz - DCU - 18 March 2011
12
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Using Log Data to Acquire a Domain Model
I
Queries submitted by users
I
Identify sessions
I
Associate related queries (many possible ways of doing so)
I
Result is a query association graph (of some sort)
Udo Kruschwitz - DCU - 18 March 2011
13
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Using Log Data to Acquire a Domain Model - Example
...
903779;guest;83.33.xxx.xxx;83et8b7j010eh4vlht3ucj8dl1;en;
("pomegranate fertilization");search_sim;;0;-;;;2007-10-05 13:52:30
...
1889115;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("mozart");search_url;;0;-;;;2008-06-24 22:02:52
...
1889118;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("mozart");view_full;;1;;;;2008-06-24 22:03:03
...
1889120;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
Klavierkonzerte;search_res_rec_all;;0;-;;;2008-06-24 22:03:55
1889121;guest;71.249.xxx.xxx;8eb3bdv3odg9jncd71u0s2aff6;en;
("klavierkonzerte");view_full;;1;;;;2008-06-24 22:04:10
...
Udo Kruschwitz - DCU - 18 March 2011
14
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Using Log Data to Acquire a Domain Model - Example
...
8eb3bdv3odg9jncd71u0s2aff6 xxxx 1889115 xxxx mozart xxxx 2008-06-24 22:02:52
8eb3bdv3odg9jncd71u0s2aff6 xxxx 1889120 xxxx klavierkonzerte xxxx 2008-06-24 22
...
Udo Kruschwitz - DCU - 18 March 2011
15
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Using Log Data to Acquire a Domain Model - Example
Udo Kruschwitz - DCU - 18 March 2011
16
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Our Log Data
I
We use query logs collected on different collections, e.g.
I
I
University of Essex intranet search engine:
almost 2 million queries (since Nov 2007)
The European Library:
1.8 million interactions (Jan 2007 - Jun 2008)
I
Query log analysis (not discussed here)
I
Bootstrap (adaptive) domain models
Udo Kruschwitz - DCU - 18 March 2011
17
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Towards Adaptive Search
I
I
I
I
I
I
I
I
Start by employing initially extracted domain knowledge
Observe user interaction with the system
Incorporate clickthrough trails
Use this implicit relevance feedback to adjust domain
knowledge accordingly
Do this fully automatically
Aim: evolving domain knowledge that adjusts to the users’
search behaviour
Should learn common patterns over time,
e.g. “map” → “campus map”
Should deal with seasonal terms appropriately,
e.g. “registration”
This should improve search ...
Udo Kruschwitz - DCU - 18 March 2011
18
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
... and Navigation
Udo Kruschwitz - DCU - 18 March 2011
19
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Automatic Domain Model Adaptation
Variety of adaptation models, including:
I
Exploiting Maximum Likelihood Estimates (MLE)
I
Formal Concept Analysis (FCA)
I
Ant Colony Optimization analogy (ACO)
I
Adaptive Intranet Navigation
... let’s quickly look at the first three approaches.
Udo Kruschwitz - DCU - 18 March 2011
20
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
MLE: Domain Model derived from Query Logs
q1
registration
registration
registration
registration
...
online registration
...
registration office
registration office
...
q2
online registration
registration office
timetable
enrol
...
registration
...
careers centre
albert sloman library
...
MLE
0.045
0.035
0.025
0.020
...
0.211
...
0.053
0.053
...
enrol
course enrolment
0.050
Udo Kruschwitz - DCU - 18 March 2011
21
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
MLE: Domain Model derived from Query Logs II
Udo Kruschwitz - DCU - 18 March 2011
22
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
MLE: Reminder - Original Domain Knowledge
registration
dates
office
undergraduate
Udo Kruschwitz - DCU - 18 March 2011
card
...
regulations
essex
...
...
...
students
jobshop
...
...
23
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
MLE: Results
I
Studied 18-months log file of Essex search engine
(about 670,000 queries)
I
Sampled frequent and less frequent queries
I
User study to assess quality of derived term suggestions
I
Compared these terms to alternatives, e.g. Google
suggestions, snippet approach, association rules approach
I
Log analysis beats alternatives
I
Session-based approach very powerful, dialogue-based even
better
Udo Kruschwitz - DCU - 18 March 2011
24
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
FCA Approach to Adaptation
I
Lattice structure representing terms and corresponding
documents
I
Concept in lattice defined by objects (URLs) and attributes
(terms)
Udo Kruschwitz - DCU - 18 March 2011
25
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
FCA Approach to Adaptation II
I
Learn from past user queries (implicit relevance judgements)
using relative judgements (Radlinski & Joachims, 2005)
I
Train a classifier (SVM) that associates terms with documents
I
Rerun lattice construction
Udo Kruschwitz - DCU - 18 March 2011
26
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
FCA: Architecture
!
"
%(
'
$
!
%
&
%(
$
&
'
"
#
#
Udo Kruschwitz - DCU - 18 March 2011
27
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
FCA: Screenshot
Udo Kruschwitz - DCU - 18 March 2011
28
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
FCA: Results
I
User study similar to MLE evaluation
I
Sampled frequent and less frequent queries
I
User study to assess quality of derived term suggestions
I
Compared these terms to alternatives, e.g. association rules
approach, and unadapted FCA lattice
I
FCA adaptation beats both alternatives
Udo Kruschwitz - DCU - 18 March 2011
29
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
ACO: Adaptive Domain Model
I
Simple approach using the idea of ant colony optimization:
I
I
I
I
I
Update domain model in daily batches
Add weight to query pairs observed that day, e.g.
“libraary” → “library”
Normalise the weights so that all outgoing graphs in a node
sum up to 1
And so on ...
Idea: learn associations as they become popular, allow for
forgetting relations as well!
Udo Kruschwitz - DCU - 18 March 2011
30
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
ACO: Results
I
I
I
I
I
I
Again: user studies to assess quality of derived term
suggestions
Two studies: Essex university logs (Essex), European Library
logs (TEL)
Compared these terms to alternatives, e.g. Google
suggestions, association rule approach, snippet processing
Essex: ACO beats all alternatives and suggestions improve
over time
TEL: ACO better than association rule approach but not
snippet baseline
Suggestions derived using different methods can be
complementary (TEL)
... see (Kruschwitz et al., 2011) and (Dignum et al., 2010)
Udo Kruschwitz - DCU - 18 March 2011
31
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
AutoEval: Evaluate Adaptive Search
I
Limitations of user studies
I
Evaluate suggestions without recruiting subjects
I
Compare different models automatically
I
Idea: use log files and exploit past user interactions
domain model
query
recommendations
current day
1/1
actual modification
1/2
1/4
score = (1/2 + 1/4 + 1/1)/3 = 0.583
... come along to our ECIR’11 talk
Udo Kruschwitz - DCU - 18 March 2011
32
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
AutoEval Results (Web Site)
0.12
0.1
M
R
R
0.08
s
0.06
c
o
r
e
s 0.04
Fonseca
ACO
nootropia
0.02
0
1
3
5
7
9
11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
Weeks: 1 Oct 2008 - 23 Dec 2009
Udo Kruschwitz - DCU - 18 March 2011
33
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
AutoEval Results (Digital Library)
0.018
0.016
0.014
M
R
R
0.012
0.01
s
c
o
r
e
s
ACO
FONSECA(MinSup=2)
0.008
FONSECA(MinSup=3)
0.006
0.004
0.002
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Months Jan 2007 - Jun 2008
Udo Kruschwitz - DCU - 18 March 2011
34
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Next Steps
I
Prototypes to go live at the University of Essex
I
Work with industrial partners
I
TREC 2011 and CLEF 2011
I
Ongoing EPSRC project (Essex, Robert Gordon University
Aberdeen & Open University): AutoAdapt (November 2008 November 2011)
... any collaboration welcome!
Udo Kruschwitz - DCU - 18 March 2011
35
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Conclusions
I
Adaptive search by exploiting query logs
I
Adaptive domain models can be learned, experiments with
different approaches demonstrate this
I
Have to deal with noisy data
I
Data sparsity
I
Navigation support as a suitable alternative to query
suggestions (Saad & Kruschwitz, 2011)
Udo Kruschwitz - DCU - 18 March 2011
36
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
Acknowledgements
I
Dyaa Albakour, Nikolaos Nanas, Jinzhong Niu, Dawei Song,
Anne De Roeck, Maria Fasli, Stephen Dignum (EPSRC-funded
AutoAdapt research project: http://autoadaptproject.org/
I
Deirdre Lungley, Sharhida Saad (Language and Computation
Group at Essex)
I
Johannes Leveling (DCU)
Udo Kruschwitz - DCU - 18 March 2011
37
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
References
I
U. Kruschwitz, M-D. Albakour, J. Niu, J. Leveling, N. Nanas,
Y. Kim, D. Song, M. Fasli, and A. De Roeck. Moving
Towards Adaptive Search in Digital Libraries. In Advanced
Technologies for Digital Libraries. Lecture Notes in Computer
Science (Hot Topics). Springer. Forthcoming.
I
M-D. Albakour, U. Kruschwitz, N. Nanas, Y. Kim, D. Song,
M. Fasli, and A. De Roeck. AutoEval: An Evaluation
Methodology for Evaluating Query Suggestions Using Query
Logs. In Proceedings of ECIR 2011, Forthcoming.
I
M-D. Albakour, U. Kruschwitz, J. Niu, M. Fasli. University of
Essex at the TREC 2010 Session Track. In Proceedings of the
Nineteenth Text Retrieval Conference (TREC 2010), NIST,
2011.
Udo Kruschwitz - DCU - 18 March 2011
38
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
References II
I
S. Z. Saad & U. Kruschwitz. Applying Web Usage Mining for
Adaptive Intranet Navigation. In Proceedings of the 2nd
Information Retrieval Facility Conference, 2011, Forthcoming.
I
D. Lungley & U. Kruschwitz. Automatically Maintained
Domain Knowledge: Initial Findings. ECIR: 739-743, 2009.
I
S. Dignum, U. Kruschwitz, M. Fasli, Y. Kim, D. Song, U.
Cervino & A. De Roeck. Incorporating Seasonality into Search
Suggestions Derived from Intranet Query Logs. Web
Intelligence (WI’10), 2010.
Udo Kruschwitz - DCU - 18 March 2011
39
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
ECIR 2011 Industry Day in Dublin
I
Co-located with last day of ECIR 2011
I
Dedicated to the latest innovations in web and enterprise
search
I
Mix of research and application
I
Excellent list of speakers
I
Highly interactive and collegial, with attendance limited to
60-80 delegates
I
Date: 21st April
I
http://ecir2011.dcu.ie/program/industry-day/
Udo Kruschwitz - DCU - 18 March 2011
40
Overview
Log Data
Adaptive Search
Evaluation
Next Steps
Conclusions
That’s it ...
Udo Kruschwitz - DCU - 18 March 2011
41

Similar documents

Moving Towards Adaptive Intranet Search

Moving Towards Adaptive Intranet Search Sample Interactions: Query Refinements with Highest MLE

More information