Composite Retrieval
Composite Retrieval applied to movie recommendation
Sihem Amer-Yahia, Eric Gaussier, Vincent Leroy (LIG/CNRS)
ALICIA meeting, Oct 2nd, 2014

Motivation
• Search results beyond flat ranked lists
– Some items complement each other
– Opportunity to create rich results comprising several items: composite items
[Figure: a ranked list of items vs. composite items]

Composite Item (CI)
• A set I of items (e.g. movies, artists, …)
• A composite item C is a subset of I that satisfies:
– Complementarity: each item contributes to a different aspect of the CI, with no redundancy
– Consistency: all items share some similarities and fit together
– Budget: total cost stays within bounds
• Example: a CI of movies
– Complementarity attribute
• director, release year, genre, actors, …
– Consistency
• same raters, similar tags, …
– Budget
• number of movies, rental cost, movie length, …

Personalized Composite Retrieval (TKDE 2014)
Retrieve a set of k CIs $\{S_1, \dots, S_k\}$ that maximizes:
$$\gamma \sum_{i} \sum_{a,b \in S_i} \mathrm{sim}(a,b) \;+\; (1-\gamma) \sum_{i<j} \Bigl(1 - \max_{x \in S_i,\, y \in S_j} \mathrm{sim}(x,y)\Bigr)$$
The first term is the similarity of the items within each CI $S_i$; the second is the diversity between CIs $S_i$ and $S_j$.

Existing approaches (TKDE 2014)
• NP-hard, by reduction from MAXIMUM EDGE SUBGRAPH
• Heuristics: CI generation expressed as 2 different steps, applied in 2 different orders
– Cluster to achieve consistency (e.g. k-means variants)
– Apply constraints to enforce budget and complementarity
→ Not fully integrated
→ Potentially sub-optimal results

Integrated Algorithm
• State of the art
– k-means with must-link and cannot-link constraints
• Objective
– Support queries such as "Generate k clusters comprising similar objects, 3 of which are of type A and 2 of type B"
→ Focus on complementarity (as a special case) and consistency
→ No budget constraint

Data Model
• Given: items, users, user actions
• Item similarity (consistency):
– Based on user actions
– a and b are similar if many users rated them similarly
• Item complementarity:
– Relies on item attributes
– Each item belongs to a single category

Sketch of the Algorithm
• Build 3 clusters, each with 2 red points and 3 green points
– Initialize with fuzzy clustering, i.e. all points are considered when repositioning the centroids
– Transition to considering only the content of each CI

Datasets
• VK: scraped, and we are not sure it is complete
– 15683422 movie_comments.tsv
– 18350 movie_directors.tsv
– 42417 movies_all.tsv
– 2460027 user_ratings.tsv
– 18736 users_all.txt
• MovieLens (we used the 1M dataset)
– 3883 movies.dat
– 1000209 ratings.dat (at least 20 ratings per user)
– 6040 users.dat

(Very) Preliminary Output
• On the VK dataset:
– 2 Comédie, 2 Action, 2 Drame
– 2 movies <= 1980, 2 movies > 1980 and <= 2000, 2 movies > 2000
– Runs in less than 1 min on my laptop (time is not an issue), but we need to evaluate quality and validate the constraints with you
• On MovieLens

CIs in VK
CI1: 2 Comédie, 2 Action, 2 Drame
name=Les Vieux chats, genre=Comédie, year=2010, originalId=796202
name=Trois soeurs, genre=Comédie, year=2012, originalId=828379
name=Miss Bala, genre=Action, year=2010, originalId=664350
name=Rampart, genre=Action, year=2012, originalId=747997
name=La Nuit d'en face, genre=Drame, year=2012, originalId=812498
name=Rêve et silence, genre=Drame, year=2012, originalId=854345

CIs in VK
CI2: 2 Comédie, 2 Action, 2 Drame
name=Paparazzi, genre=Comédie, year=1997, originalId=59455
name=Mensonges et trahisons et plus si affinités..., genre=Comédie, year=2004, originalId=57814
name=Des Serpents dans l'avion, genre=Action, year=2005, originalId=277082
name=Predator, genre=Action, year=1987, originalId=60897
name=La Fievre du samedi soir, genre=Drame, year=1977, originalId=8093
name=Cocktail, genre=Drame, year=1988, originalId=5590

CIs in MovieLens
CI1: 2 Comedy, 2 Action, 2 Drama
name=Hav Plenty (1997), genre=Comedy, year=1997, originalId=1903
name=Red Dwarf, The (Le Nain rouge) (1998), genre=Comedy, year=1998, originalId=2685
name=Montana (1998), genre=Action, year=1998, originalId=3184
name=Bait (2000), genre=Action, year=2000, originalId=3898
name=Went to Coney Island on a Mission From God... Be Back by Five (1998), genre=Drama, year=1998, originalId=3887
name=Price of Glory (2000), genre=Drama, year=2000, originalId=3482

CIs in MovieLens
CI2: 2 Comedy, 2 Action, 2 Drama
name=Meet the Parents (2000), genre=Comedy, year=2000, originalId=3948
name=Bamboozled (2000), genre=Comedy, year=2000, originalId=3943
name=Highlander: Endgame (2000), genre=Action, year=2000, originalId=3889
name=Get Carter (2000), genre=Action, year=2000, originalId=3946
name=Two Family House (2000), genre=Drama, year=2000, originalId=3951
name=Contender, The (2000), genre=Drama, year=2000, originalId=3952

CIs in VK
CI1: 2 movies <= 1980, 2 movies > 1980 and <= 2000, 2 movies > 2000
name=2001 : L'Odyssée de l'espace, genre=Aventure, year=1968, originalId=4148
name=Les Dents de la Mer, genre=Aventure, year=1975, originalId=10709
name=Jurassic Park, genre=Aventure, year=1993, originalId=7637
name=Toy Story, genre=Animation, year=1996, originalId=53406
name=Vol spécial, genre=Documentaire, year=2011, originalId=777986
name=La Nuit d'en face, genre=Drame, year=2012, originalId=812498

CIs in VK
CI2: 2 movies <= 1980, 2 movies > 1980 and <= 2000, 2 movies > 2000
name=Marathon Man, genre=Thriller, year=1976, originalId=84942
name=Star Wars : Episode IV - Un nouvel espoir (La Guerre des étoiles), genre=Action, year=1977, originalId=64669
name=Predator, genre=Action, year=1987, originalId=60897
name=Piège de cristal, genre=Action, year=1988, originalId=60151
name=Les Brigades du Tigre, genre=Action, year=2005, originalId=108145
name=Des Serpents dans l'avion, genre=Action, year=2005, originalId=277082

CIs in MovieLens
CI1: CI on years; the buckets had to be changed to <= 1980, 1981-1990 and 1991-2000 because this dataset is older
name=Last Time I Saw Paris, The (1954), genre=Drama, year=1954, originalId=972
name=Mass Appeal (1984), genre=Drama, year=1984, originalId=2397
name=Trick or Treat (1986), genre=Horror, year=1986, originalId=2464
name=Snows of Kilimanjaro, The (1952), genre=Adventure, year=1952, originalId=3207
name=Price of Glory (2000), genre=Drama, year=2000, originalId=3482
name=Went to Coney Island on a Mission From God... Be Back by Five (1998), genre=Drama, year=1998, originalId=3887

CIs in MovieLens
CI2: CI on years; the buckets had to be changed to <= 1980, 1981-1990 and 1991-2000 because this dataset is older
name=Phantom of the Opera, The (1943), genre=Drama, year=1943, originalId=3936
name=Sorority House Massacre (1986), genre=Horror, year=1986, originalId=3941
name=Sorority House Massacre II (1990), genre=Horror, year=1990, originalId=3942
name=Get Carter (1971), genre=Thriller, year=1971, originalId=3947
name=Two Family House (2000), genre=Drama, year=2000, originalId=3951
name=Contender, The (2000), genre=Drama, year=2000, originalId=3952

Future Work
• Set it up remotely so it can be tried out
• Apply on
– POIs from Wikipedia
– Retail data from Intermarché
• Extend with
– Personalized composition (add the user's interest as a weight, or form user groups, to provide the best feedback on CIs)
– Support constraints on multiple orthogonal attributes at once
– Support budget constraints (cumulative over objects) such as price, length, …
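The Data Model slide defines item similarity only informally: a and b are similar if many users rated them similarly. The sketch below shows one possible reading, assuming `(user, item, score)` triples such as rows of MovieLens ratings.dat. Under this hypothetical rule, a user "agrees" on a and b if their two scores are at most `tol` apart, and the similarity is the number of agreeing users divided by the number of users who rated either item. The name `rating_similarity`, the parameter `tol` and the agreement rule are illustrative assumptions, not the authors' measure.

```python
from collections import defaultdict


def rating_similarity(ratings, tol=1):
    """Build sim(a, b) from (user, item, score) triples.

    Hypothetical definition (an assumption, not the authors' measure):
    users who rated both items with scores at most `tol` apart agree;
    sim = agreeing users / users who rated a or b, so it lies in [0, 1].
    """
    by_item = defaultdict(dict)  # item -> {user: score}
    for user, item, score in ratings:
        by_item[item][user] = score

    def sim(a, b):
        ra, rb = by_item.get(a, {}), by_item.get(b, {})
        raters = ra.keys() | rb.keys()  # users who rated a or b
        if not raters:
            return 0.0
        agree = sum(1 for u in ra.keys() & rb.keys() if abs(ra[u] - rb[u]) <= tol)
        return agree / len(raters)

    return sim


# u1 agrees on m1/m2, u2 does not, u3 rated only m1: sim = 1 / 3
sim = rating_similarity([("u1", "m1", 5), ("u1", "m2", 4),
                         ("u2", "m1", 1), ("u2", "m2", 5), ("u3", "m1", 3)])
print(sim("m1", "m2"))
```

Dividing by all raters of either item (rather than only the common ones) keeps pairs that share a single agreeing rater from scoring as highly as pairs with many agreeing raters, which matches the "many users" wording.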
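The objective on the Personalized Composite Retrieval slide can be evaluated directly for a candidate set of CIs. This is a minimal sketch assuming each CI is a list of items, `sim` returns values in [0, 1], and the inner sum runs over unordered pairs of distinct items (the slide does not say whether a = b or ordered pairs are included). The function name `objective` is illustrative.

```python
from itertools import combinations


def objective(cis, sim, gamma):
    """gamma * intra-CI similarity + (1 - gamma) * inter-CI diversity."""
    # Consistency term: similarity of every unordered pair of items inside each CI.
    consistency = sum(sim(a, b) for ci in cis for a, b in combinations(ci, 2))
    # Diversity term: for each pair of CIs i < j, 1 minus their most similar cross pair.
    diversity = sum(
        1 - max(sim(x, y) for x in s_i for y in s_j)
        for s_i, s_j in combinations(cis, 2)
    )
    return gamma * consistency + (1 - gamma) * diversity


# Toy similarity: items are similar iff they share a first letter.
same_letter = lambda a, b: 1.0 if a[0] == b[0] else 0.0
print(objective([["a1", "a2"], ["b1", "b2"]], same_letter, 0.5))  # 1.5
```

In the toy call each CI contributes one fully similar pair (consistency 2.0) and the two CIs share no similar cross pair (diversity 1.0), so the score is 0.5 * 2.0 + 0.5 * 1.0.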
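The Sketch of the Algorithm slide gives two phases but no formulas. The following is a minimal sketch under explicit assumptions: items are numeric feature vectors with a category label, phase 1 runs standard fuzzy c-means updates (fuzzifier m = 2) so every point moves every centroid, and phase 2 swaps in a hypothetical greedy assignment that gives each cluster exactly `quota[c]` items of category c, after which each centroid is recomputed from its own CI only. All function names and the greedy rule are illustrative, not the authors' implementation.

```python
import random


def dist2(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fuzzy_step(points, centroids, m=2.0, eps=1e-12):
    # One fuzzy c-means update: every point pulls on every centroid,
    # weighted by its membership u_ij ** m.
    k, dim = len(centroids), len(points[0])
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for p in points:
        inv = [1.0 / (dist2(p, c) + eps) ** (1.0 / (m - 1)) for c in centroids]
        total = sum(inv)
        for j in range(k):
            w = (inv[j] / total) ** m
            den[j] += w
            for d in range(dim):
                num[j][d] += w * p[d]
    return [tuple(x / den[j] for x in num[j]) for j in range(k)]


def quota_assign(points, labels, centroids, quota):
    # Hard assignment: cluster j receives exactly quota[c] items of category c.
    # (item, cluster) pairs are taken greedily by increasing distance.
    remaining = [dict(quota) for _ in centroids]
    members = [[] for _ in centroids]
    used = set()
    pairs = sorted(
        (dist2(p, c), j, i)
        for i, p in enumerate(points)
        for j, c in enumerate(centroids)
    )
    for _, j, i in pairs:
        if i in used or remaining[j].get(labels[i], 0) == 0:
            continue
        members[j].append(i)
        remaining[j][labels[i]] -= 1
        used.add(i)
    return members


def composite_clusters(points, labels, k, quota, fuzzy_iters=10, hard_iters=20, seed=0):
    if sum(quota.values()) <= 0:
        raise ValueError("quota must request at least one item per CI")
    # k CIs with quota[c] items of category c need k * quota[c] such items.
    for cat, n in quota.items():
        if labels.count(cat) < k * n:
            raise ValueError(f"not enough items of category {cat!r}")
    rng = random.Random(seed)
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    # Phase 1: fuzzy clustering, all points reposition the centroids.
    for _ in range(fuzzy_iters):
        centroids = fuzzy_step(points, centroids)
    # Phase 2: only the items inside each CI reposition its centroid.
    members = quota_assign(points, labels, centroids, quota)
    for _ in range(hard_iters):
        centroids = [
            tuple(sum(xs) / len(idx) for xs in zip(*(points[i] for i in idx)))
            for idx in members
        ]
        new_members = quota_assign(points, labels, centroids, quota)
        if new_members == members:
            break
        members = new_members
    return members


# 3 CIs, each with 2 "red" and 3 "green" items, as on the slide.
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(20)]
labs = ["red"] * 8 + ["green"] * 12
print(composite_clusters(pts, labs, k=3, quota={"red": 2, "green": 3}))
```

The greedy rule always fills every quota when each category has at least k * quota[c] items: an item of category c is skipped for a cluster that still needs one only if some cluster has already taken it.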
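The (Very) Preliminary Output slide says the constraints still need to be validated. A minimal sketch of such a check for the year constraints, applied to the years listed in the year-based CIs. `year_bucket` and `satisfies_year_quota` are hypothetical helpers. Bucket edges are assumed to be inclusive upper bounds, so `(1980, 2000)` yields <= 1980, 1981-2000 and > 2000. For MovieLens, `(1980, 1990)` yields <= 1980, 1981-1990 and > 1990, which is 1991-2000 for this older dataset.

```python
from collections import Counter


def year_bucket(year, edges):
    # edges are inclusive upper bounds; years above the last edge go to the final bucket.
    for i, edge in enumerate(edges):
        if year <= edge:
            return i
    return len(edges)


def satisfies_year_quota(years, edges, per_bucket):
    # True if every bucket holds exactly per_bucket movies.
    counts = Counter(year_bucket(y, edges) for y in years)
    return all(counts[b] == per_bucket for b in range(len(edges) + 1))


# Years of VK CI1 and MovieLens CI1 from the year-based outputs.
vk_ci1 = [1968, 1975, 1993, 1996, 2011, 2012]
ml_ci1 = [1954, 1984, 1986, 1952, 2000, 1998]
print(satisfies_year_quota(vk_ci1, (1980, 2000), 2))  # True
print(satisfies_year_quota(ml_ci1, (1980, 1990), 2))  # True
```

Checking with the VK edges instead, `satisfies_year_quota(ml_ci1, (1980, 2000), 2)` returns False, since 1984, 1986, 1998 and 2000 all fall in the middle bucket; this is why the MovieLens buckets had to be changed.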