Moving Towards Adaptive Intranet Search

Transcription

Moving Towards Adaptive Intranet Search
Overview
Intranet Search
Log Analysis
Adaptive Search
Moving Towards Adaptive Intranet Search
Udo Kruschwitz
School of Computer Science and Electronic Engineering
University of Essex
[email protected]
29th July 2010
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
1
Overview
Intranet Search
Log Analysis
Adaptive Search
Overview
I
Motivation & context
I
Prototype on the Essex intranet
I
Preliminary log analysis
I
Steps towards adaptation
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
2
Overview
Intranet Search
Log Analysis
Adaptive Search
Context
I
Collection of documents, e.g. local Web sites, corporate or
academic intranet
I
Not Web search in general
I
Ad hoc queries
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
3
Overview
Intranet Search
Log Analysis
Adaptive Search
Problems
I
Common problem with too many matches
I
I
I
General queries
Ambiguous queries
Short queries
I
Data sparsity problem
I
Typical intranet problem: recall can be important
(e.g. single matching document)
I
Express information need as a query
I
Usable knowledge sources not available
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
4
Overview
Intranet Search
Log Analysis
Adaptive Search
Our Approach
I
I
Search system that makes suggestions using automatically
extracted domain knowledge
But ...
I
I
I
Domain knowledge is noisy and incomplete
System suggestions not always useful/helpful
Document collection is changing
I
Learn from the users’ interactions
I
Improve system over time by adapting to the users’ search
behaviour
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
5
Overview
Intranet Search
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Log Analysis
Adaptive Search
6
Overview
Intranet Search
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Log Analysis
Adaptive Search
7
Overview
Intranet Search
Log Analysis
Adaptive Search
What Sort of Domain Knowledge?
I
Online clustering (e.g. Vivisimo)
I
Subsumption hierarchies (Sanderson & Croft 1999)
I
Lexical modification approach (Anick & Tipirneni 1999)
I
Formal Concept Analysis (e.g. CREDO)
I
Term associations using neural networks and fuzzy logic
(e.g. Aquabrowser)
I
Flat list of terms using simple tf.idf applied to top matching
documents
I
...
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
8
Overview
Intranet Search
Log Analysis
Adaptive Search
Partial Domain Knowledge (Example)
registration
dates
office
undergraduate
card
...
regulations
essex
...
...
...
students
jobshop
...
...
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
9
Overview
Intranet Search
Log Analysis
Adaptive Search
Applying Domain Knowledge - General Idea
I
I
Combine standard search system with initial domain model
Utilize domain model to construct
I
I
I
query refinements
query relaxations
Present suggestions alongside matching documents
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
10
Overview
Intranet Search
Log Analysis
Adaptive Search
Log Data Collection
I
Essex intranet search engine
I
Originally running alongside standard Essex search engine
I
Operating since summer 2006
I
About 40,000 queries collected in 12 months
I
November 2007: system replaced old search engine altogether
... more that 1 million queries collected since then
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
11
Overview
Intranet Search
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Log Analysis
Adaptive Search
12
Overview
Intranet Search
Log Analysis
Adaptive Search
Towards Adaptive Intranet Search
I
Start by employing initially extracted domain knowledge
I
Observe user interaction with the system
I
Incorporate clickthrough trails
I
Use this implicit relevance feedback to adjust domain
knowledge accordingly
I
Aim: evolving domain knowledge that adjusts to the users’
search behaviour
... let’s see what the log files tell us so far.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
13
Overview
Intranet Search
Log Analysis
Adaptive Search
Observations I
I
More than 10% of queries are modifications!
... suggests the general system setup makes sense
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
14
Overview
Intranet Search
Log Analysis
Adaptive Search
Most Frequent User Queries (since Nov 2007)
52841
28183
25423
16410
9413
8535
7366
6938
6426
6318
5853
5634
5504
4865
moodle
library
search
timetable
graduation
cmr
enrol
accomodation
accommodation
ocs
psychology
exam timetable
term dates
courses
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
15
Overview
Intranet Search
Log Analysis
Adaptive Search
Observations II
I
Queries are domain-specific
I
This is different from general Web search
... suggests that domain-independent knowledge (e.g. WordNet,
Google n-grams) might not be suitable.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
16
Overview
Intranet Search
Log Analysis
Adaptive Search
Frequent Queries
350
300
250
200
timetable
calendar
150
100
50
0
Aug
Sep
Oct
Nov
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Dec
Jan
17
Overview
Intranet Search
Log Analysis
Adaptive Search
Query Traffic I
5000
4500
Number of Queries
4000
3500
3000
2500
2000
1500
1000
500
0
1
3
5
7
9
11
13
15
17
19
21
23
Hour of Day
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
18
Overview
Intranet Search
Log Analysis
Adaptive Search
Query Traffic II
400
350
Number of Queries
300
250
200
150
100
50
0
1.Oct
1.Nov 1.Dec
1.Jan
1.Feb 1.Mar
1.Apr
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
1.May
1.Jun
1.Jul
1.Aug
1.Sep
19
Overview
Intranet Search
Log Analysis
Adaptive Search
Observations III
I
All sorts of variations (seasonal, usage patterns, ...)
I
Again different from general Web search
... suggests that system should perhaps adapt to “context”.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
20
Overview
Intranet Search
Log Analysis
Adaptive Search
Query Statistics
Number of Queries
Average Query Length
Length of Longest Query
Queries with Spelling Errors
Fraction of Query Corpus
Set 1
100
1.54
3
2%
20%
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Set 2
40,006
1.98
17
≈6%
100%
21
Overview
Intranet Search
Log Analysis
Adaptive Search
Observations IV
I
Queries are even shorter than on the Web!
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
22
Overview
Intranet Search
Log Analysis
Adaptive Search
User-System Interaction
I
More system suggestions than manual modifications
I
More additions of terms than replacements
I
Long tail of modifications only submitted once
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
23
Overview
Intranet Search
Log Analysis
Adaptive Search
Sample Interactions: Frequent Query Pairs
Frequency
122
92
77
70
69
68
68
62
61
55
54
52
q1
fees
accomodation
student union
accomodation
post room
my essex
mondo
map
time table
foundation
su
libary
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
q2
tuition fees
accommodation
students union
accomodation office
postroom
myessex
mondo pizza
campus map
timetable
foundation degree
student union
library
24
Overview
Intranet Search
Log Analysis
Adaptive Search
Sample Interactions: Query Pairs with Highest MLE
q1
moole
email on we
moodlr
lbrary
timetabke
myesex
moodke
libraary
student suport
regisrty
moodlew
graduatin
caterin
q2
moodle
email on web
moodle
library
timetable
myessex
moodle
library
student support
registry
moodle
graduation
catering
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
MLE
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
25
Overview
Intranet Search
Log Analysis
Adaptive Search
Sample Interactions: Query Refinements with Highest MLE
q1
film society
erol
mood e
exam timetable 09
web email
w800
the islamic society
stavrakakis
norvald
mouffe
q2
art film society
enrol
moodle
exam dates
email
creative writing
islamic society
yannis stavrakakis
norvald instefjord
chantal mouffe
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
MLE
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
26
Overview
Intranet Search
Log Analysis
Adaptive Search
Observations V
I
Users select query modification options on a continuing basis
I
Lots of implicit (domain-specific) relationships
Different “types” of relationships, e.g.
I
I
I
I
Dialogue-based vs. session-based interactions
Adding vs. replacing query terms
But: data sparsity issues
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
27
Overview
Intranet Search
Log Analysis
Adaptive Search
Automatic Domain Model Adaptation
I
Use log analysis to improve search suggestions
I
Do this fully automatically
I
Should learn common patterns over time,
e.g. “map” → “campus map”
I
Should deal with seasonal terms appropriately,
e.g. “registration”
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
28
Overview
Intranet Search
Log Analysis
Adaptive Search
Automatic Domain Model Adaptation II
Variety of adaptation models, including:
I
Exploiting Maximum Likelihood Estimates (MLE)
I
Formal Concept Analysis (FCA)
I
Ant Colony Optimization analogy (ACO)
... let’s quickly look at these three approaches.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
29
Overview
Intranet Search
Log Analysis
Adaptive Search
MLE: Domain Model derived from Query Logs
q1
registration
registration
registration
registration
...
online registration
...
registration office
registration office
...
q2
online registration
registration office
timetable
enrol
...
registration
...
careers centre
albert sloman library
...
MLE
0.045
0.035
0.025
0.020
...
0.211
...
0.053
0.053
...
enrol
course enrolment
0.050
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
30
Overview
Intranet Search
Log Analysis
Adaptive Search
MLE: Domain Model derived from Query Logs II
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
31
Overview
Intranet Search
Log Analysis
Adaptive Search
MLE: Reminder - Original Domain Knowledge
registration
dates
office
undergraduate
card
...
regulations
essex
...
...
...
students
jobshop
...
...
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
32
Overview
Intranet Search
Log Analysis
Adaptive Search
MLE: Initial Findings
I
Studied 18-months log file of Essex search engine
(about 670,000 queries)
I
Sampled frequent and less frequent queries
I
User study to assess quality of derived term suggestions
I
Compared these terms to alternatives, e.g. Google
suggestions, snippet approach, association rules approach
(Fonseca et al., 2003)
I
Log analysis beats alternatives
I
Session-based approach very powerful, dialogue-based even
better
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
33
Overview
Intranet Search
Log Analysis
Adaptive Search
FCA Approach to Adaptation
I
Lattice structure representing terms and corresponding
documents
I
Concept in lattice defined by objects (URLs) and attributes
(terms)
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
34
Overview
Intranet Search
Log Analysis
Adaptive Search
FCA Approach to Adaptation II
I
Learn from past user queries (implicit relevance judgements)
using relative judgements (Radlinski & Joachims, 2005)
I
Train a classifier (SVM) that associates terms with documents
I
Rerun lattice construction
I
Promising evidence that lattice improves over time
(Lungley & Kruschwitz, 2009)
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
35
Overview
Intranet Search
Log Analysis
Adaptive Search
FCA: Architecture
!
"
%(
'
$
!
%
&
%(
$
&
'
"
#
#
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
36
Overview
Intranet Search
Log Analysis
Adaptive Search
FCA: Lattice Example
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
37
Overview
Intranet Search
Log Analysis
Adaptive Search
FCA: Screenshot
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
38
Overview
Intranet Search
Log Analysis
Adaptive Search
ACO: Adaptive Domain Model
I
Simple approach using the idea of ant colony optimization:
I
I
I
I
I
Update domain model in daily batches
Add weight to query pairs observed that day, e.g.
“libraary” → “library”
Normalise the weights so that all outgoing graphs in a node
sum up to 1
And so on ...
Idea: learn associations as they become popular, allow for
forgetting relations as well!
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
39
Overview
Intranet Search
Log Analysis
Adaptive Search
Accuracy Score
ACO: Initial Findings
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Weeks
Ant Colony Optimisation
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
Fonseca Baseline
40
Overview
Intranet Search
Log Analysis
Adaptive Search
Next Steps
I
Adaptive search engine on the Essex Web site
I
EPSRC project (Essex, Robert Gordon University Aberdeen &
Open University): AutoAdapt (November 2008 - November
2011)
I
Want to get involved?
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
41
Overview
Intranet Search
Log Analysis
Adaptive Search
Conclusions
I
Query logs for adaptive intranet search
I
Utilize automatically acquired domain knowledge
I
Evolve domain model based on users’ search behaviour
(domain-specific!)
I
Promising direction: clickthrough data, lattice structures,
query logs
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
42
Overview
Intranet Search
Log Analysis
Adaptive Search
Acknowledgements
I
Deirdre Lungley (PhD student)
I
Dyaa Albakour, Dawei Song, Anne De Roeck, Maria Fasli,
Stephen Dignum, Yunhyong Kim, Nikolaos Nanas (AutoAdapt
research project)
I
Essex University Web and Learning Technology group
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
43
Overview
Intranet Search
Log Analysis
Adaptive Search
Search Solutions 2010
I
Annual one-day event at BCS Headquarters (Southampton
Street)
I
Dedicated to the latest innovations in web and enterprise
search
I
Highly interactive and collegial, with attendance limited to
60-80 delegates
I
Let me know if you are interested (attending or talking)!
I
Date: 21st October
I
More info soon on http://irsg.bcs.org/
... and ECIR 2011 Industry Day (21st April 2011 in Dublin)
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
44
Overview
Intranet Search
Log Analysis
Adaptive Search
References
I
M. Sanderson & B. Croft. Deriving concept hierarchies from
text. SIGIR: 206–213, 1999.
I
P. Anick & S. Tipirneni. The paraphrase search assistant:
terminological feedback for iterative information seeking.
SIGIR: 153–159, 1999.
I
F. Radlinski & T. Joachims. Query Chains: learning to rank
from implicit feedback. SIGKDD: 239–248. 2003.
I
B. M. Fonseca, P. B. Golgher, E. S. de Moura & N. Ziviani.
Using Association Rules to Discover Search Engines Related
Queries, Proceedings of the First Latin American Web
Congress, 2003.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
45
Overview
Intranet Search
Log Analysis
Adaptive Search
References II
I
I
I
D. Lungley & U. Kruschwitz. Automatically Maintained
Domain Knowledge: Initial Findings. ECIR: 739-743, 2009.
S. Dignum, U. Kruschwitz, M. Fasli, Y. Kim, D. Song, U.
Cervino & A. De Roeck. Incorporating Seasonality into Search
Suggestions Derived from Intranet Query Logs. Web
Intelligence (WI’10), 2010.
U. Kruschwitz. Intelligent Document Retrieval. Springer,
2005.
Udo Kruschwitz - Enterprise Search London Meetup - 29 July 2010
46

Similar documents

Learning from the Users: from Search to

Learning from the Users: from Search to Methodology for Evaluating Query Suggestions Using Query Logs. In Proceedings of ECIR 2011, Forthcoming.

More information