Terrier MUMIA slides (Craig Macdonald)

Transcription

11/04/2014
Experiences in Developing Terrier
Craig Macdonald, Richard McCreadie, Iadh Ounis

• An open source Java information retrieval platform
• Positional & field indexer
• Monolithic & MapReduce indexing, scales to ClueWeb12
• Hundreds of weighting models (BM25 is recalled below, as an example), including:
  − BM25, PL2, Language Models
  − Arbitrary weighting models in Divergence from Randomness
  − Field-based weighting models (BM25F, PL2F etc)
  − Proximity/MRF term dependence models
  − Various DFR query expansion
• Easily extensible (we hope: discussed later)
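To make one of the listed weighting models concrete, the standard BM25 scoring function (as usually given in the literature; Terrier's exact defaults may differ) is:

```latex
\mathrm{score}(d,q) \;=\; \sum_{t \in q}
  \log\frac{N - n_t + 0.5}{n_t + 0.5}
  \cdot
  \frac{tf_{t,d}\,(k_1 + 1)}
       {tf_{t,d} + k_1\bigl(1 - b + b\,\tfrac{l_d}{\bar{l}}\bigr)}
```

where N is the number of documents, n_t the document frequency of term t, tf_{t,d} the frequency of t in d, l_d the document length, \bar{l} the average document length, and k_1, b are free parameters (commonly k_1 = 1.2, b = 0.75).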
Timeline (figure: milestones 2001-2014, with the test corpora in use at each stage)
• 2001: project funded to develop an IR system in Java; refactored indexing & retrieval followed
• 1st open source release: focus on DFR models, query expansion
• Later milestones: UTF/multi-lingual support; single-pass indexing support for larger corpora; MapReduce support for even larger corpora; terrier.org and v3.0 with improved MapReduce and retrieval from very large corpora; v3.5 with refactored tokenisers for indexing non-Latin text; v3.6 bugfixes; v4.0 with real-time indices and LTR
• Corpora over the years: Disk1-5, WT10G, .GOV, .GOV2, WebCLEF, W3C, CLEF, Blogs06, Blogs08, ClueWeb09 (EN, JA), Tweets11, ClueWeb12
Our Philosophy
"An IR system should just work… out of the box"

Dimensions (EESA):
• effectiveness
• efficiency
• scalability
• adaptability

• ✔ Index large experimental corpora / ✘ Indexing without requiring large memory
• ✔ Retrieve from uber-large indices / ✘ Don't assume enough memory is available
• ✔ Use any weighting model with any index / ✘ Indices tied to a weighting model
• ✔ Docid ordering / ✘ Score ordering
Improvements towards EESA

Indexing:
• Single-pass indexing (since v2.0) - Scalability
• MapReduce indexing (since v2.2) - Scalability
• JSON indexing for the Tweets2011 corpus (plugin for v3.5) - Adaptability
• Pluggable tokenisation (since v3.5) - Adaptability, Effectiveness
• Support for image indexing, c.f. ImageTerrier (3rd party product using v3.5) - Adaptability

Retrieval:
• Streaming posting list decompression (since v3.0) - Scalability, Efficiency
• Document-At-A-Time retrieval (since v3.5) - Scalability, Efficiency (a generic sketch follows this slide)
• Improved support for metadata, allowing query-biased summarisation (since v3.0) - Adaptability

v3.6 released 3rd April 2014; v4.0 due in the next month
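As referenced in the Document-At-A-Time bullet above, here is a minimal, generic Java sketch of DAAT scoring over docid-ordered posting lists. The PostingIterator and WeightingModel interfaces are hypothetical placeholders for illustration, not Terrier's actual classes.

```java
import java.util.*;

// Hypothetical interfaces for illustration; these are not Terrier's API.
interface PostingIterator {
    int docId();       // current docid, or Integer.MAX_VALUE if the list is exhausted
    int frequency();   // term frequency at the current posting
    void next();       // advance to the next posting
}

interface WeightingModel {
    double score(int tf, int docId);   // score contribution of one term in one document
}

final class ScoredDoc {
    final int docId; final double score;
    ScoredDoc(int docId, double score) { this.docId = docId; this.score = score; }
}

public class DaatRetrieval {

    /** Scores one document at a time by merging docid-ordered posting lists, keeping the top k. */
    public static List<ScoredDoc> retrieve(List<PostingIterator> postings,
                                           List<WeightingModel> models, int k) {
        // Min-heap holding the current top-k results, smallest score at the head.
        PriorityQueue<ScoredDoc> topK =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

        while (true) {
            // The next document to score is the smallest current docid across all lists.
            int current = Integer.MAX_VALUE;
            for (PostingIterator p : postings) current = Math.min(current, p.docId());
            if (current == Integer.MAX_VALUE) break;   // every list is exhausted

            // Sum the weighting-model contributions of all query terms present in this document.
            double score = 0.0;
            for (int i = 0; i < postings.size(); i++) {
                PostingIterator p = postings.get(i);
                if (p.docId() == current) {
                    score += models.get(i).score(p.frequency(), current);
                    p.next();                          // move past the posting just scored
                }
            }

            topK.offer(new ScoredDoc(current, score));
            if (topK.size() > k) topK.poll();          // evict the lowest-scoring document
        }

        List<ScoredDoc> results = new ArrayList<>(topK);
        results.sort((a, b) -> Double.compare(b.score, a.score));  // best first
        return results;
    }
}
```

Because postings are docid-ordered, only one cursor per query term needs to be held at any time, which is what keeps DAAT retrieval memory-friendly compared to term-at-a-time accumulators.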
ROADMAP – 4.X
Roadmap for 4.x (1/3)

Feature-based retrieval – Effectiveness, efficiency
• Ability to retrieve with multiple features and apply learning to rank techniques (a minimal re-ranking sketch follows this slide)
• I'm not a bakeoff fan, but… (e.g.) TREC Web track 2011 (topics 101-150):

                         NDCG@20   MAP
  Cat A. DPH Baseline     0.1309
  Cat A. Learned Model    0.3015
  Cat B. DPH Baseline     0.2025   0.0927
  Cat B. Learned Model    0.2278   0.1262
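As referenced on the previous slide, a minimal sketch of feature-based re-ranking: each document in an initial sample carries several feature values, and a model learned offline combines them into a new score. A simple linear model is used purely for illustration; it is not Terrier's learning-to-rank implementation.

```java
import java.util.*;

/** One document from the initial top-k sample, with its feature vector. */
final class Candidate {
    final int docId;
    final double[] features;   // e.g. [BM25 score, PL2 score, PageRank, spam score] (illustrative)
    double ltrScore;
    Candidate(int docId, double[] features) { this.docId = docId; this.features = features; }
}

public class LinearReRanker {
    private final double[] weights;   // learned offline by some LTR technique

    public LinearReRanker(double[] weights) { this.weights = weights; }

    /** Re-orders the sample by the learned linear combination of its features. */
    public List<Candidate> rerank(List<Candidate> sample) {
        for (Candidate c : sample) {
            double s = 0.0;
            for (int i = 0; i < weights.length; i++) {
                s += weights[i] * c.features[i];
            }
            c.ltrScore = s;
        }
        sample.sort((a, b) -> Double.compare(b.ltrScore, a.ltrScore));  // highest score first
        return sample;
    }
}
```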
Roadmap for 4.x (1/3, continued)

Distributed & real-time search – Scalability
• Support for distributed retrieval, to enhance efficiency on large document indices
• We are also developing indexing and retrieval strategies for real-time search, e.g. index data structures that can be constantly updated with streaming data (a minimal sketch follows this slide)
• Terrier's real-time infrastructure is being released as part of the EC-funded SMART project
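A minimal sketch of the kind of constantly-updatable in-memory structure mentioned in the real-time bullet above. This is a generic illustration only (not Terrier's real-time infrastructure); it assumes a single writer thread so that docid-ordered posting lists stay sorted as documents stream in.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy incremental in-memory inverted index: documents can be added while searches read it. */
public class IncrementalIndex {

    /** One posting: (docid, term frequency). */
    static final class Posting {
        final int docId; final int tf;
        Posting(int docId, int tf) { this.docId = docId; this.tf = tf; }
    }

    private final ConcurrentMap<String, List<Posting>> index = new ConcurrentHashMap<>();
    private final AtomicInteger nextDocId = new AtomicInteger();

    /** Adds a tokenised document and returns its docid; posting lists remain docid-ordered. */
    public int addDocument(List<String> terms) {
        int docId = nextDocId.getAndIncrement();
        Map<String, Integer> termFreqs = new HashMap<>();
        for (String t : terms) termFreqs.merge(t, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            index.computeIfAbsent(e.getKey(),
                     k -> Collections.synchronizedList(new ArrayList<>()))
                 .add(new Posting(docId, e.getValue()));
        }
        return docId;
    }

    /** Returns the current postings for a term (empty if the term has not been seen). */
    public List<Posting> postings(String term) {
        return index.getOrDefault(term, Collections.emptyList());
    }
}
```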
Roadmap for 4.x (2/3)

Non-global configuration – adaptability
• Enables multiple indexing and retrieval instances to run at once, supporting tasks like metasearch and search results diversification

Dynamic pruning – scalability, efficiency, adaptability
• Dynamic pruning strategies such as WAND allow efficient yet effective matching, but traditionally use pre-calculated term upper bounds
• We have shown how term upper bounds can be accurately approximated at query execution time, permitting efficient retrieval without specifying a weighting model a priori (a sketch of the idea follows this slide)
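A minimal sketch of the idea of deriving a term's score upper bound at query time rather than storing pre-computed, model-specific bounds: evaluate whichever weighting model the query uses at the most favourable statistics kept in the lexicon. The types below are hypothetical placeholders; the precise approximations used in Terrier are described in the team's dynamic pruning publications.

```java
// Hypothetical per-term lexicon statistics (not Terrier's API).
final class TermStats {
    final int maxTf;         // the largest in-document frequency observed for this term
    final int minDocLength;  // the length of the shortest document containing it
    TermStats(int maxTf, int minDocLength) { this.maxTf = maxTf; this.minDocLength = minDocLength; }
}

interface WeightingModel {
    /** Score contribution for a term occurring tf times in a document of length docLength. */
    double score(int tf, int docLength);
}

public class QueryTimeUpperBounds {
    /**
     * For weighting models that increase with tf and decrease with document length
     * (e.g. BM25-like models), scoring the most favourable case gives a valid upper
     * bound on any real posting's contribution. WAND/MaxScore-style pruning can then
     * skip documents whose summed term bounds cannot enter the current top k, without
     * the index having been built for one particular weighting model.
     */
    public static double upperBound(TermStats stats, WeightingModel model) {
        return model.score(stats.maxTf, stats.minDocLength);
    }
}
```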
Roadmap for 4.x (3/3)

Pluggable compression algorithms – efficiency, adaptability
• Compression has significantly moved on since Terrier's first release… we are now able to integrate state-of-the-art compression schemes, such as PForDelta (a simpler codec is sketched below for illustration)

Plugin expansions – adaptability
• To avoid software bloat, we are moving towards a system of periodic core releases and timely plugin expansions [Catena et al, ECIR 2014]
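To show the sort of work a pluggable codec encapsulates, here is a minimal Java sketch of the classic docid-gap plus variable-byte scheme. It is deliberately simpler than PForDelta (which bit-packs blocks of gaps with an exception mechanism), and is an illustration rather than Terrier's compression layer.

```java
import java.io.ByteArrayOutputStream;

/** Toy docid-gap + variable-byte codec, one of the simplest inverted-index compression schemes. */
public class VByteCodec {

    /** Encodes an ascending list of docids as variable-byte coded gaps. */
    public static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : docIds) {
            int gap = docId - previous;          // gaps are much smaller than raw docids
            previous = docId;
            while (gap >= 128) {                 // 7 payload bits per byte, high bit = "more to come"
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);                      // final byte has the high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes n docids back from the byte stream. */
    public static int[] decode(byte[] bytes, int n) {
        int[] docIds = new int[n];
        int pos = 0, previous = 0;
        for (int i = 0; i < n; i++) {
            int gap = 0, shift = 0, b;
            do {
                b = bytes[pos++] & 0xFF;
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds[i] = previous;
        }
        return docIds;
    }
}
```

Swapping in a different scheme only requires another implementation of the same encode/decode contract, which is the essence of making compression pluggable.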
Putting it All Together: Best Practices

So we took state-of-the-art research in…
• … Weighting Models, Query Expansion, Learning to Rank, Dynamic Pruning, Diversification, Compression

But combining all of these into a coherent, functioning platform – i.e. making the EESA improvements "add up" – has been challenging.

Hence, we have published several papers on best practices, covering compression, learning to rank and efficiency, to name a few:
• The Whens and Hows of Learning to Rank. Craig Macdonald, Rodrygo Santos and Iadh Ounis. Information Retrieval 16(5): 584-628. 2012.
• On the Usefulness of Query Features for Learning to Rank. Craig Macdonald, Rodrygo Santos and Iadh Ounis. In Proceedings of CIKM 2012.
• About Learning Models with Multiple Query Dependent Features. Craig Macdonald, Rodrygo L.T. Santos, Iadh Ounis and Ben He. Transactions on Information Systems 31(3). 2013.
• Efficient and Effective Retrieval using Selective Pruning. Nicola Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings of WSDM 2013.
• On Inverted Index Compression for Search Engine Efficiency. Matteo Catena, Craig Macdonald and Iadh Ounis. In Proceedings of ECIR 2014.
TOWARDS STANDARDISATION IN IR
All IR systems the same? [Craswell et al, TREC 2005 Enterprise Track]
All IR systems the same? PCA plot of TREC 2012 Web track runs, on per-topic ERR@20 [Dincer et al, ECIR 2014]
What makes IR systems different?
• Different query languages? (Grid@CLEF task)
• Different document parsers/tokenisation?
• Different stemming/stopwords configurations?
• Different document representations?
• Different ranking models?
  − Most use very similar statistics, but may not implement all of the same heuristics [Fang & Zhai 04]
• Different query analyses (segmentation, QE, etc)?

Standardisation may help improve some aspects, but would lead to homogeneous systems not sufficiently diverse for pool-driven evaluation forums such as TREC!
Document Representation
Textual documents:
• Unstructured: bag-of-words
• Ordered: positional information (to allow phrases or proximity information)
• Semi-structured: tags/fields/zones
  − E.g. title, body, anchor text
  − Can fields overlap? E.g. H1 ∈ Body?
  − Are positions contiguous across fields?
  Example:
    <HTML>
      <TITLE>term2</TITLE>
      <BODY>
        <H1>term1</H1>
        term3
      </BODY>
    </HTML>
• Structured: no strong use-case, despite efforts from INEX
Ranking (1)
We're all quite clear what information/statistics weighting models need:
• Standard: term (tf), document (length), background (TF) & collection statistics (avg_doc_length)
• Field-based: frequency of terms in each field (e.g. title, body, anchor text)
• Proximity: position of terms in documents, e.g. n-grams (background statistics are more expensive) [Macdonald & Ounis SIGIR 2010 Ngram]

So we have convergence on these statistics - hence, inverted index layouts can be standardised, even when supporting state-of-the-art compression schemes (a sketch of such a posting layout follows this slide)
• See [Catena et al, ECIR 2014]
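As a concrete sketch of the statistics such a standardised posting would need to carry (a hypothetical layout, not Terrier's on-disk format): the total frequency for standard models, per-field frequencies for field-based models such as BM25F/PL2F, and positions for proximity/n-gram models.

```java
/**
 * Illustrative in-memory view of one inverted-index posting that can serve
 * standard, field-based and proximity weighting models (not Terrier's actual format).
 */
public final class FieldPosting {
    public final int docId;
    public final int frequency;            // total occurrences of the term in the document
    public final int[] fieldFrequencies;   // occurrences per field, e.g. [title, body, anchor text]
    public final int[] positions;          // token offsets within the document, for proximity models

    public FieldPosting(int docId, int frequency, int[] fieldFrequencies, int[] positions) {
        this.docId = docId;
        this.frequency = frequency;
        this.fieldFrequencies = fieldFrequencies;
        this.positions = positions;
    }
}
```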
Ranking (2)
Learning to Rank (LTR) needs more features:
• Multiple weighting models [Macdonald et al, TOIS 2012]
• Query independent features: PageRank, doclength, spam scores etc.
• Query features [Macdonald et al, CIKM 2012]

SVMlight/LETOR format is a de facto standard for training LTR (an example follows this slide). Do we need to derive standard baseline features for particular tasks such as Web or patent retrieval?

An IR system should support the efficient and flexible computation of multiple features during ranking:
• We use a representation for each posting that, once retrieved from the inverted index, can be kept in memory for documents making the top k [Macdonald et al, TOIS 2012]
• Lin et al. suggest a document-oriented representation containing the features, like a direct file [Asadi & Lin, J. INRT 2013]
Compromises are needed: memory vs speed vs adaptability
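For reference, a line of the SVMlight/LETOR training format mentioned above holds a relevance label, a query identifier, numbered feature:value pairs, and an optional trailing comment. The document names and feature values below are invented purely for illustration:

```
2 qid:101 1:12.31 2:9.07 3:0.84 # doc-0001  (label 2 = highly relevant)
0 qid:101 1:3.02  2:2.18 3:0.11 # doc-0042  (label 0 = non-relevant)
1 qid:102 1:7.45  2:5.66 3:0.37 # doc-0097
```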
Analysing Query Logs
Query logs represent a rich area for IR:
• They provide evidence of real user needs and permit user modeling (e.g. A/B testing and interleaving)

But: no standard toolkits for typical query log analysis - standardisation or tools for query log analysis are needed.
• What should be logged, and what format for the logs? (a hypothetical record is sketched after this slide)
• What can be mined from these logs?
  − Real queries for query completion/suggestions, diversification
  − User behaviour metrics

Our experience with a commercial medical search engine: clicks recorded, but with insufficient metadata, and no strong idea of the possible analyses/benefits.
Inhibitor: not many academic researchers have access.
Conclusions
Some IR de facto & de jure standards exist already:
• Input/Output: Z39.50, OpenSearch, Solr API
• Evaluation: TREC qrels, SVMlight/LETOR
• Stemming (Porter), Weighting Models (BM25)

Some internal aspects of IR systems could benefit from standardisation:
• Document representation, inverted file contents
• LTR etc - from empirical best practices

However, caution is needed against standardising every aspect of a search system, as this may damage creativity within the research community, à la TREC.
http://terrier.org (the platform)
http://terrierteam.dcs.gla.ac.uk - @terrierteam (the team)