Recommender Systems Dr. Nava Tintarev (NT) (Part 1 of 2)
Did you do your homework? ;)
Wait, how does this stuff fit together?!
• How do we adapt? (How?)
– Adaptive hypermedia - content and navigation
• What can we adapt to? (To What?)
– User model
• Adaptive hypermedia has grown a lot in the last few years...
Wait, how does this stuff fit together?!
• What can we adapt? (What?)
– Domain model
• Why do we need adaptation? (Why?)
– Adaptation/goal model – goals and tasks
• Where can we apply adaptation? (Where?)
• When can we apply adaptation? (When?)
– Application and Context model
Why use recommender systems?
Information overload
• Too many movies, books, webpages, songs, plumbers, etc.
• Searching is difficult
• What’s good, what’s not?
Systems that make personalized recommendations of goods, services, and people (Kautz)
Psst, recsys aren’t a new thing
• But some factors are...
– User generated content
– Quantity and quality of data
– New domains
– More commercial
What IS a recommender system?
• User identifies one or more objects as being of
interest
• The recommender system suggests other
objects that are similar (infers liking)
But how does this work?!
• Remember those slides on user modeling?
– What kind of info can we use?
– How are we going to get it?
Suppose our user has rated ten movies:
A. Jurassic Park
B. Harry Potter
C. ET
D. Lord of the Rings
E. Alien
F. Terminator
G. 101 Dalmatians
H. Titanic
I. Sleepless in Seattle
J. Mr Bean
Which movie do we recommend next…?
Getting to know a user’s opinion
– Implicit (e.g. viewing time)
– or explicit (e.g. ratings or answering of questions)
– Recall vs. recognition…
• Search vs browse
• Top item, top items...
Example: XLibris
• User reads text and annotates
• System generates links and further reading list
Example: MovieLens
To test out a Movie Recommender go to
http://movielens.umn.edu/
• User rates movies
• The system suggests ‘best bets’
• Users keep rating movies while checking best bets
Similar?
– Today: Similar in content → content-based filtering
– Next lecture: Similar in ‘appreciation’ by other users → collaborative filtering
– Demographic (stereotypes!) and knowledge/utility-based (clever questions!) methods
– Just the tip of the iceberg…
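Content-based filtering
[Table: movies (Starwars, Pretty Woman, Little Mermaid, 101 Dalmatians, Terminator) marked with X against genres (Action, Sci-fi, Comedy, Romance, Children), with Anna's opinions (+, +, -) and one movie marked ? whose opinion is to be predicted]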
But what if Anna really likes Sci-fi but
not Action movies?
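[Table: movies (Starwars, Pretty Woman, 101 Dalmatians, Terminator) marked with X against genres (Action, Sci-fi, Comedy, Romance, Children), with Anna's opinions (+, +, +) and one movie marked ? whose opinion is to be predicted]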
Possible algorithm (1)
• Tend to be ‘classifiers’
• Learn weights (wi) for words so that
Σ wi for words occurring > threshold when the user likes the item
• Initially, weights are 1
• For each rated example determine sum
• If sum above threshold, and user did not like
example, then divide the weights of the words occurring in it by 2
Possible algorithm (2)
• If sum below threshold, and user did like
example, then multiply the weights of the words occurring in it by 2
• Recommend items with highest sum
Example (Step 1)
           Action  Sci-fi  Comedy  Romance  Children  Opinion
Weights    1       1       1       1        1
Movie 1    X       X       X       X        X         -
Threshold = 2
Sum = 5 > 2, and opinion negative, so divide weights by 2
Example (Step 2)
           Action  Sci-fi  Comedy  Romance  Children  Opinion
Weights    0.5     0.5     0.5     0.5      0.5
Movie 2            X       X                          +
Threshold = 2
Sum = 1 < 2, and opinion positive, so multiply by 2
Example (Step 3)
           Action  Sci-fi  Comedy  Romance  Children  Opinion
Weights    0.5     1       1       0.5      0.5
Movie 3    X               X                          +
Threshold = 2
Sum = 1.5 < 2, and opinion positive, so multiply by 2
Example (Step 4)
           Action  Sci-fi  Comedy  Romance  Children  Opinion
Weights    1       1       2       0.5      0.5
Movie 4    X       X               X                  -
Threshold = 2
Sum = 2.5 > 2, and opinion negative, so divide by 2
Example (Step 5)
           Action  Sci-fi  Comedy  Romance  Children
Weights    0.5     0.5     2       0.25     0.5
Threshold = 2
And so on: repeat for all ratings, or run through all the ratings 10 times
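The update rule walked through in the steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecturer's code: the function names (`score`, `train`, `recommend`) and the two candidate items are hypothetical, the genre sets for Movies 2 to 4 are inferred from how the weights change between steps, and leaving the weights untouched when the sum equals the threshold is an assumption (the slides do not cover that case). The rule resembles the Winnow family of mistake-driven classifiers.

```python
GENRES = ["Action", "Sci-fi", "Comedy", "Romance", "Children"]


def score(weights, features):
    """Sum the weights of the features (genres) present in an item."""
    return sum(weights[f] for f in features)


def train(ratings, threshold=2.0, passes=1):
    """Learn genre weights from (features, liked) pairs.

    Weights change only when the current prediction disagrees with the
    user's opinion, and only for the features present in that example.
    """
    weights = {g: 1.0 for g in GENRES}  # initially, weights are 1
    for _ in range(passes):
        for features, liked in ratings:
            total = score(weights, features)
            if total > threshold and not liked:
                for f in features:
                    weights[f] /= 2
            elif total < threshold and liked:
                for f in features:
                    weights[f] *= 2
            # Assumption: total == threshold leaves the weights unchanged.
    return weights


def recommend(weights, candidates):
    """Rank unrated items by summed weight, highest first."""
    return sorted(candidates,
                  key=lambda name: score(weights, candidates[name]),
                  reverse=True)


# Ratings from Steps 1-4; genre sets for Movies 2-4 are inferred.
ratings = [
    ({"Action", "Sci-fi", "Comedy", "Romance", "Children"}, False),  # Movie 1
    ({"Sci-fi", "Comedy"}, True),                                    # Movie 2
    ({"Action", "Comedy"}, True),                                    # Movie 3
    ({"Action", "Sci-fi", "Romance"}, False),                        # Movie 4
]
weights = train(ratings)

# Hypothetical unrated items, for illustration only.
candidates = {
    "Candidate 1": {"Comedy", "Children"},
    "Candidate 2": {"Action", "Romance"},
}
ranking = recommend(weights, candidates)
```

After one pass, `weights` matches the values shown in Step 5, and the comedy-heavy candidate ranks first. Passing `passes=10` corresponds to running through all the ratings 10 times.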
Observations
We use our knowledge
• about the items rated
• about other items
In particular, attributes like type of movie.
Multiple attributes are likely to be important.
Needs something like…
• Description of items in terms of attributes. For example: Type, Director, Actors, ...
• Description via keywords
• Possibility to look at content itself, like the text
Synopsis: Set in late 1930s Arezzo, Italy, Jewish man and poet, Guido Orefice (Roberto Benigni)
uses cunning wit to win over an Italian schoolteacher, Dora (Nicoletta Braschi) who's set to marry
another man. Charming her with "Buongiorno Principessa"…
What has been used
• Features can be automatically extracted, e.g. with TF-IDF or matrix factorization
– 100 words with the highest TF-IDF weights: words that occur more frequently in the item than on average, and so distinguish it from other items
– For example, for restaurant descriptions: words like “noodle”, “shrimp”, “basil”, “exotic”, “salmon”
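As a rough illustration of keyword extraction with TF-IDF, the sketch below weights each word by its relative frequency in a description times the log of the inverse document frequency, then keeps the top-k words. The restaurant descriptions, the function names (`tf_idf`, `top_keywords`) and this particular weighting variant are assumptions made for the example; real systems often add smoothing, stop-word removal and stemming.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # term frequency (relative) times inverse document frequency
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


def top_keywords(doc_weights, k=100):
    """Keep the k highest-weighted words (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical restaurant descriptions, already lower-cased and tokenised.
docs = [
    "spicy noodle soup with shrimp and basil".split(),
    "grilled salmon with lemon and basil".split(),
    "exotic noodle salad with shrimp".split(),
]
weights = tf_idf(docs)
keywords = top_keywords(weights[1], k=3)
```

In this toy corpus "with" occurs in every description, so its weight is zero, while words unique to one description rise to the top of that description's keyword list.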
What has been used
• Feature vectors for the new item and for previous items.
• Common similarity measures for feature vectors: Pearson's r correlation and cosine similarity.
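Both similarity measures are short to write down. The sketch below is a minimal pure-Python version; the genre vectors and item names are hypothetical, and returning 0 for an all-zero vector is an assumption made to avoid division by zero. Pearson's r is computed here as the cosine similarity of the mean-centred vectors, which is equivalent to the usual formula.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)


def pearson_r(x, y):
    """Pearson correlation: cosine similarity after subtracting the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Hypothetical genre vectors: (Action, Sci-fi, Comedy, Romance, Children).
liked_item = [1, 1, 0, 0, 0]
new_items = {
    "Item A": [1, 1, 0, 1, 0],
    "Item B": [0, 0, 1, 0, 1],
}
# Rank new items by similarity to an item the user liked.
ranking = sorted(new_items,
                 key=lambda name: cosine_similarity(liked_item, new_items[name]),
                 reverse=True)
```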
Multimedia Information Retrieval
• Images: photo collections, face recognition
• Video: movie recommendation, Electronic Program Guides
• Spoken documents
• Music
• Other sounds
Concept-based image retrieval
• Key: Concept-based indexing of images
– Based on attributes extracted manually
– Based on logical, high level features
• Systems for image indexing
– ICONCLASS, A&AT, …
• What?
– Time, location, content
Content-based image retrieval
• Key: Automatic indexing of images based on
low-level features
– Color
– Texture
– Shape
– Spatial orientation and layout
– Sketch
[Figure: image input to search]
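To make "low-level features" concrete, here is a hedged sketch of one of the simplest: a coarse colour histogram, compared with histogram intersection. It is not a description of how any particular system works; the function names, the bin count and the tiny pixel lists are all assumptions for illustration, and `bins_per_channel` is assumed to divide 256 evenly.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalised histogram over coarse (r, g, b) bins.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Overlap between two normalised histograms: 0 (disjoint) to 1 (same)."""
    return sum(min(v, h2.get(bin_, 0.0)) for bin_, v in h1.items())


# Hypothetical four-pixel "images", for illustration only.
red_image = [(250, 10, 10)] * 4
blue_image = [(10, 10, 250)] * 4
mixed_image = [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2

query = color_histogram(red_image)
similarity_to_mixed = histogram_intersection(query, color_histogram(mixed_image))
similarity_to_blue = histogram_intersection(query, color_histogram(blue_image))
```

A query image can then be matched against an indexed collection by ranking stored histograms on their intersection with the query's histogram.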
Examples - content based IR
• QBIC - IBM’s Query By Image Content:
http://wwwqbic.almaden.ibm.com
• MIT PhotoBook
(Source of following examples)
http://vismod.media.mit.edu/vismod/demos/photobook/
• Virage: http://www.virage.com
• VisualSeek: http://www.ctr.columbia.edu/VisualSEEk
[Figure: image input to search]
Problems with Content-based Filtering
(1/2)
• Need to know about item content
– requires manual or automatic indexing
• Item features do not capture everything
• “User cold-start” problem
– Needs to learn what content features are
important for the user, so takes time
Problems with Content-based Filtering
(2/2)
• What if user’s interests change?
• Lack of serendipity
[Wikipedia: “the effect by which one accidentally discovers something fortunate, especially while looking for something entirely unrelated”]
Summary
• User identifies one or more objects as being of
interest
• The recommender system suggests other
objects that are similar
• Content-based filtering is one method
• … but it’s not perfect
• Next week – some solutions!
Univ. Carlos III de Madrid
11/06/2009
TF/IDF (extra)
• Term frequency, inverse document frequency
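One common way to write the weight is shown below; this is a standard textbook variant and not necessarily the exact formula from the original slide. Here tf(t, d) is the frequency of term t in document d, N is the number of documents, and df(t) is the number of documents that contain t.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
```

A term scores highly when it is frequent in one document but rare across the collection.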
Cosi…wha? (extra)
• Cosine similarity: a measure of similarity between two vectors of n dimensions, found by taking the cosine of the angle between them
• Value between -1 (different) and 1 (similar)
– 0 => usually independence
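For two n-dimensional feature vectors x and y (symbols chosen here for illustration), the standard definition is:

```latex
\cos(\theta) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}
             = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}
```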
Pearson correlation (extra)
(sample) Mean of X
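The standard sample formula for two variables X and Y observed n times is given below, with the sample mean of X written as x̄ (and likewise ȳ for Y). The symbols are chosen here for illustration, since the slide's own formula did not survive in the transcript.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

This is the cosine similarity of the two vectors after each has had its mean subtracted.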