Exploiting Temporal References in Information Retrieval
Exploiting Temporal References in Text Retrieval
Irem Arikan
Advised by: Srikanta Bedathur, Klaus Berberich

Motivation
• Users' information needs often have a temporal dimension, but traditional information retrieval systems do not exploit the temporal content of documents.
• Example query: PM United Kingdom 2000. The search engine is not aware that 2000 is mentioned implicitly by the document.
• Proposed: an approach that recognizes and exploits temporal references in documents to yield better search results.

Example Temporal Queries
Broad queries
• British colony 17th century
• Economic situation Germany 1920s
• President assassination 1950 – 2000
Specific queries
• US president October 1962
• Pope 1940s
• Academy awards best actor 1975
Ambiguous queries
• George Bush 1990 vs. George Bush 2007
• Gulf war 1991 vs. Gulf war 2005

Outline
• Language Modeling for Information Retrieval
• Time Modeling for Temporal Information Retrieval
• Combining Text Relevance with Temporal Relevance
• Experimental Results

Language Modeling for Information Retrieval
• Language model: a statistical model that generates text
• Language modeling: the task of estimating the statistical parameters of a language model
• Language modeling for IR: the problem of estimating the likelihood that a query and a document could have been generated by the same language model
• In practical IR approaches: unigram language model, in which words occur independently

Language Modeling for IR
1) Treat each document as a sample from a language model
   • assume an underlying multinomial probability distribution over words for each document
   • estimate the statistics of this distribution: P[word]
2) Estimate the likelihood that the query is generated by this distribution:
   $P(q \mid d) = \prod_{t \in q} P(t \mid d)$
3) Rank the documents by $P(q \mid d)$
[Figure: a language model $M_d$ with $P[\text{word} \mid M_d]$ is inferred from the document]

Temporal Modeling for Temporal Retrieval
• General approach similar to the LM approach: based on a generative model that generates temporal references
• The temporal model splits the query into two parts: a text query and a temporal query
• Probabilistic mechanism for producing the temporal content of the document: each time reference is generated by a different generative temporal model $M_i^t$, $i = 1, \dots, n$
• To generate a time reference:
   1) first choose a temporal model
   2) then generate a time reference using this temporal model

Temporal Modeling: estimating the temporal query likelihood
• Infer a temporal model $M_i^t$, $i = 1, \dots, n$, from each temporal reference in the document
• Estimate the likelihood $P(q^t \mid d^t)$ that the temporal query is generated by one of the models that generated the temporal content of the document
• Temporal query generation probability:
   $P(q^t \mid d^t) = \frac{1}{n} \sum_{i=1}^{n} P(q^t \mid M_i^t)$
• Open question: $P(q \mid M_i^t) = \,?$

Temporal Modeling: what is a temporal model?
• A probabilistic model that generates temporal references
• What kind of distribution? How can we estimate its parameters?
• Formalize the problem in a goal-oriented way:
   • infer a temporal model from each time interval (the sample time interval)
   • this temporal model should be able to generate all time intervals that are relevant to the sample interval

1. Approach
Assumptions:
• two time intervals are only relevant to each other if they intersect
• the generative model inferred should be able to produce subintervals, superintervals, and overlapping intervals of the interval in the document
• the probability of generating an intersecting time interval should be proportional to the length of the intersection
Example:
• query: 1980 – 1990
• 1980 – 1989 is more relevant than 23 March 1984
[Figure: time axis t with a document interval from s to e and example intervals labelled lOverlap, rOverlap, sup1, sup2, sub1, sub2]
Appropriate probabilistic model:
• two underlying triangular distributions, one for the start and one for the end: $M_i^t = \{ P_s(x), P_e(x) \}$

Triangular Distribution
$$f(x \mid a, b, c) = \begin{cases} \dfrac{2(x - a)}{(b - a)(c - a)} & \text{for } a \le x \le c \\ \dfrac{2(b - x)}{(b - a)(b - c)} & \text{for } c \le x \le b \\ 0 & \text{for any other case} \end{cases}$$
Parameters: $a \in (-\infty, \infty)$; $b$ with $b > a$; $c$ with $a \le c \le b$
Support: $a \le x \le b$
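The triangular density and the uniform mixture over per-reference temporal models can be sketched in Python as follows. This is a minimal illustration, not code from the talk: the names `triangular_pdf`, `temporal_query_likelihood`, and `toy_year_likelihood` are hypothetical, and the per-model likelihood is passed in as a callable because the slides define it separately for each approach. The toy model at the end (a triangle of plus or minus 5 years around each referenced year) is an assumption made only to exercise the two functions.

```python
from typing import Callable, Sequence, TypeVar

Query = TypeVar("Query")
Model = TypeVar("Model")


def triangular_pdf(x: float, a: float, b: float, c: float) -> float:
    """Triangular density with lower bound a, upper bound b, and mode c."""
    if not (a < b and a <= c <= b):
        raise ValueError("require a < b and a <= c <= b")
    if x < a or x > b:
        return 0.0
    if x < c:
        # Rising edge; c > a is guaranteed here because a <= x < c.
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if x > c:
        # Falling edge; b > c is guaranteed here because c < x <= b.
        return 2.0 * (b - x) / ((b - a) * (b - c))
    # x == c: peak of the triangle.
    return 2.0 / (b - a)


def temporal_query_likelihood(
    query: Query,
    models: Sequence[Model],
    model_likelihood: Callable[[Query, Model], float],
) -> float:
    """Uniform mixture: (1/n) * sum_i P(query | M_i) over a document's models."""
    if not models:
        return 0.0
    return sum(model_likelihood(query, m) for m in models) / len(models)


def toy_year_likelihood(query_year: float, ref_year: float) -> float:
    """Hypothetical per-model likelihood: triangle of +/- 5 years around ref_year."""
    return triangular_pdf(query_year, ref_year - 5, ref_year + 5, ref_year)


if __name__ == "__main__":
    # A document that mentions the years 1990 and 2000, queried with 1990.
    print(temporal_query_likelihood(1990, [1990, 2000], toy_year_likelihood))
```

Passing the per-model likelihood as a function keeps the mixture independent of which temporal model is chosen, so either approach from the talk could be plugged in.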
1. Approach
[Figure: densities $P_s(x)$ and $P_e(y)$ drawn over a time axis; labels: l, u, r1, r2, r3, r4, s, e, e + 1, $q_s$ - 1]
• nonzero probability for intersecting intervals:
   • r1 – r3: left overlaps
   • r1 – r4: superintervals
   • r2 – r3: subintervals
   • r2 – r4: right overlaps
• the interval [s, e] has the highest probability
• probability decreases to the left and right, resulting in lower probability for intervals that have smaller intersection lengths

1. Approach
$M_i^t = \{ P_s(x), P_e(x) \}$, $q = \{ q_s, q_e \}$
$P(q \mid M_i^t) = P(q_s, q_e \mid M_i^t) = P(q_s)\, P(q_e \mid q_s)$

2. Approach
Assumptions:
• two time intervals are only relevant to each other if they are positioned close to each other on the time axis and have similar lengths:
   |start1 – start2| < a
   |length1 – length2| < b
• the generative model inferred should be able to produce temporal intervals in some neighbourhood on the time axis
[Figure: intervals on the time axis t, each with start s and length l, differing by ∆s and ∆l]

2. Approach
[Figure: density $P_s(x)$ over s - a, s, s + a and density $P_l(y)$ over l - b, l, l + b]
• the temporal interval x = s, y = l has the highest probability
• probability decreases as the start point moves away from s and as the length moves away from l

2. Approach
$M_i^t = \{ P_s(x), P_l(x) \}$, $q = \{ q_s, q_l \}$
$P(q \mid M_i^t) = P(q_s, q_l \mid M_i^t) = P(q_s)\, P(q_l)$

Combining Text Relevance with Temporal Relevance
$score(q, d) = score_w(q, d) \cdot score_t(q, d)$
• Text relevance: $P(q \mid M_d^w)$
• Temporal relevance: $P(q \mid M_d^t)$
• Filter and re-rank search results by weighting the text relevance score by the temporal relevance

System Architecture
Information Retrieval (IR) with Temporal Extension
[Figure: architecture diagram with components Query, IR System, Index, Result Set, Temporal Query, Temporal Retrieval, Index for temporal references, Result Set]

Experimental Results-1
Query: Spanish painter 18th century
Terrier | Boolean | Our Method
Art_in_Puerto_Rico | Agustín_Esteve | José_del_Castillo
Spanish_art | Acislo_Antonio_Palomino_de_Castro_y_Velasco | Agustín_Esteve
Palazzo_Bianco_(Genoa) | Alvarez | Roybal
Caprichos | Agostino_Scilla_00e6 | Maldonado
List_of_people_from_Antwerp | Bassano | Luis_Egidio_Meléndez

Experimental Results-2
Query: Chancellor Germany 1955
Terrier | Boolean | Our Method
Federal_Minister_for_Special_Affairs_of_Germany | Basic_Law_for_the_Federal_Republic_of_Germany | Occupation_statute
Otto_Gessler | Bonn-Paris_conventions | Second_German_Bundestag
Bonn-Paris_conventions | Bavaria_Party | West_Germany
Occupation_statute | All-German_Bloc_League_of_Expellees_and_Deprived_of_Rights | Bonn-Paris_conventions
Petersberg_Agreement | Anschluss | Konrad_Adenauer

Experimental Results-3
Query: George Bush 1990
Terrier | Boolean | Our Method
George_W._Bush_insider_trading_allegations | Bush_family | President_Bush
Bush_family | Bush_administration | Bush_administration
Early_life_of_George_W._Bush | Andrew_Card | President's Council of Advisors on Science and Technology
George_H._W._Bush | Approval_rating | George_H._W._Bush
C_Boyden_Gray | Brent_Scowcroft | Arbusto_Energy

Thanks!
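As a closing illustration, the second approach and the score combination from the talk can be sketched in Python. This is a sketch under stated assumptions rather than the presented system: it assumes that the start and length densities are symmetric triangles on [s - a, s + a] and [l - b, l + b] (the slides give only the ranges and the peaks), that the temporal score is the uniform average over a document's temporal references, and that text and temporal scores are combined by multiplication. The names `Interval`, `approach2_likelihood`, `temporal_score`, and `combined_score`, and all parameter values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    start: float   # start point on the time axis, e.g. a year
    length: float  # duration in the same unit


def symmetric_triangle(x: float, center: float, half_width: float) -> float:
    """Triangular density peaking at center, zero at distance >= half_width."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    distance = abs(x - center)
    if distance >= half_width:
        return 0.0
    # Peak height is 1 / half_width, falling linearly to zero.
    return (half_width - distance) / (half_width * half_width)


def approach2_likelihood(query: Interval, ref: Interval, a: float, b: float) -> float:
    """P(q | M) = P(q_s) * P(q_l), with assumed triangular P_s and P_l."""
    p_start = symmetric_triangle(query.start, ref.start, a)
    p_length = symmetric_triangle(query.length, ref.length, b)
    return p_start * p_length


def temporal_score(query: Interval, refs: Sequence[Interval], a: float, b: float) -> float:
    """Uniform average of the per-reference likelihoods of one document."""
    if not refs:
        return 0.0
    return sum(approach2_likelihood(query, r, a, b) for r in refs) / len(refs)


def combined_score(text_score: float, query: Interval, refs: Sequence[Interval],
                   a: float, b: float) -> float:
    """Weight the text relevance score by the temporal relevance score."""
    return text_score * temporal_score(query, refs, a, b)


if __name__ == "__main__":
    query = Interval(start=1990, length=1)
    refs = [Interval(start=1990, length=1), Interval(start=2005, length=1)]
    print(combined_score(0.4, query, refs, a=5, b=2))
```

Because the scores are multiplied, a document with no temporal reference near the query gets a combined score of zero, which matches the "filter and re-rank" behaviour described in the slides.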