CSE 6240: Web Search and Text Mining


Memory-Based Collaborative Filtering
Yi Zhen
College of Computing
Georgia Institute of Technology
Yi Zhen (Georgia Tech)
1 Recommendation Methods
2 Memory-based Collaborative Filtering
3 Experiments
4 Summary
Recommendation Methods
Content-based Recommender Systems (Lops et al. 2011)
Content-based vs. Collaborative Recommendation
Content-based recommendation: explicit profiling of users and items
— user independence: each user is profiled independently
— data collection: requires domain knowledge and is time-consuming
— profile flexibility: profiles are domain-specific
Collaborative recommendation: relies on past user behavior (rating, purchasing)
— avoids extensive data collection
— needs little domain knowledge: domain-independent
— discovers implicit patterns that are impossible to profile explicitly
Memory-based Collaborative Filtering
Collaborative Filtering (CF): Problem Formulation
Users: $u, v \in U$; Items: $i, j \in I$
Ratings: $r_{ui}$: degree of preference of user $u$ for item $i$
– $r_{ui} > r_{uj} \Rightarrow$ user $u$ prefers item $i$ to item $j$
Problem. Given the observed ratings, predict the missing ones.
Incomplete rating matrix
         Casablanca  Godfather  Harry Potter  Lion King
David        5           4           2            ?
John         3           2           ?            5
Jenny        5           2           5            ?
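A sparse dictionary is one natural way to hold an incomplete rating matrix: only observed ratings are stored, and every absent (user, item) key plays the role of a "?" to be predicted. The minimal sketch below assumes the toy values shown in the code; the names `RATINGS` and `missing_cells` are illustrative and do not come from the slides.

```python
from typing import Dict, List, Tuple

# user -> {item -> rating}; an absent key plays the role of "?" in the matrix.
Ratings = Dict[str, Dict[str, float]]

# Illustrative toy data (an assumed reading of the example matrix).
RATINGS: Ratings = {
    "David": {"Casablanca": 5, "Godfather": 4, "Harry Potter": 2},
    "John": {"Casablanca": 3, "Godfather": 2, "Lion King": 5},
    "Jenny": {"Casablanca": 5, "Godfather": 2, "Harry Potter": 5},
}


def missing_cells(ratings: Ratings) -> List[Tuple[str, str]]:
    """Return every (user, item) pair with no observed rating, i.e. the CF targets."""
    items = sorted({item for row in ratings.values() for item in row})
    return [(u, i) for u in sorted(ratings) for i in items if i not in ratings[u]]


print(missing_cells(RATINGS))
# [('David', 'Lion King'), ('Jenny', 'Lion King'), ('John', 'Harry Potter')]
```

Storing only observed entries matters in practice because real rating matrices are mostly empty.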
Collaborative Filtering: Methods
Memory-based CF
User centric: for a given user with past rating history, how to
recommend other items to her?
Item centric: for a given item rated by some users before, to which
other users should we recommend it?
The duality between users and items
Model-based CF
In the User Centric World
Problem. For a given user with past purchasing and/or rating history,
how to recommend new items to her?
User-based CF
Find other similar users for the given user
Recommend items those similar users liked
Item-based CF
Find other similar items for items rated by the given user
Recommend those items with high ratings
User-based CF (Breese et al., 1998)
Notation: an active user $a$, an item $i$, and the rating $r_{ai}$
For any user $u$, let $I_u = \{ j \mid r_{uj} \neq ? \}$
Mean user rating:
$$\bar{r}_u = \frac{1}{|I_u|} \sum_{j \in I_u} r_{uj}$$
Prediction for $r_{ai}$:
$$\hat{r}_{ai} = \bar{r}_a + \kappa \sum_{u} \mathrm{sim}(a, u)\,(r_{ui} - \bar{r}_u)$$
where $u$ ranges over the set of neighbors and $\kappa$ is a normalization factor
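The prediction rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the similarities sim(a, u) are supplied precomputed in a dict (the hypothetical `sims` argument), and the normalization factor is taken to be κ = 1 / Σ|sim(a, u)|, one common choice that keeps the correction on the rating scale.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def mean_rating(ratings: Ratings, u: str) -> float:
    """r_bar_u: average of user u's observed ratings (over I_u)."""
    row = ratings[u]
    return sum(row.values()) / len(row)


def predict_user_based(ratings: Ratings, sims: Dict[str, float], a: str, i: str) -> float:
    """r_hat_ai = r_bar_a + kappa * sum_u sim(a, u) * (r_ui - r_bar_u).

    `sims` maps each (fixed) neighbor u of the active user a to sim(a, u).
    Neighbors without a rating for item i cannot contribute and are skipped.
    kappa = 1 / sum |sim(a, u)| over contributing neighbors (assumed choice).
    """
    weighted, norm = 0.0, 0.0
    for u, s in sims.items():
        if i in ratings[u]:
            weighted += s * (ratings[u][i] - mean_rating(ratings, u))
            norm += abs(s)
    base = mean_rating(ratings, a)
    # With no usable neighbor, fall back to the active user's mean rating.
    return base if norm == 0 else base + weighted / norm


ratings = {
    "a": {"x": 4, "y": 2},
    "u": {"x": 5, "y": 3, "z": 4},
    "v": {"x": 1, "z": 2},
}
print(predict_user_based(ratings, {"u": 0.8, "v": 0.2}, "a", "z"))
```

Predicting deviations from each neighbor's mean, rather than raw ratings, corrects for users who rate systematically high or low.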
Limitations
The set of neighbors is fixed and independent of the item to be
predicted
The best k neighbors may not even have an opinion about the
particular item
Solution:
Dynamically select the k best neighbors who have rated the item, $N(a, i)$:
— users who tend to rate similarly to $a$ (neighbors)
— and who have actually rated $i$
$$\hat{r}_{ai} = \bar{r}_a + \kappa \sum_{u \in N(a,i)} \mathrm{sim}(a, u)\,(r_{ui} - \bar{r}_u)$$
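Dynamic neighbor selection changes only which users enter the sum: candidates are first filtered to those who rated $i$, then the top k by similarity are kept. A minimal Python sketch, assuming any user-similarity function is passed in as `sim` and again taking κ = 1 / Σ|sim(a, u)|; the function names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}
Sim = Callable[[str, str], float]       # sim(a, u), any user-similarity measure


def neighbors_who_rated(ratings: Ratings, sim: Sim, a: str, i: str, k: int) -> List[str]:
    """N(a, i): the k users most similar to a among those who actually rated i."""
    candidates = [u for u in ratings if u != a and i in ratings[u]]
    return heapq.nlargest(k, candidates, key=lambda u: sim(a, u))


def predict_dynamic(ratings: Ratings, sim: Sim, a: str, i: str, k: int) -> float:
    """r_hat_ai = r_bar_a + kappa * sum over u in N(a, i) of sim(a, u) * (r_ui - r_bar_u)."""

    def mean(u: str) -> float:
        return sum(ratings[u].values()) / len(ratings[u])

    nbrs = neighbors_who_rated(ratings, sim, a, i, k)
    norm = sum(abs(sim(a, u)) for u in nbrs)  # kappa = 1 / norm (assumed choice)
    if norm == 0:
        return mean(a)
    return mean(a) + sum(sim(a, u) * (ratings[u][i] - mean(u)) for u in nbrs) / norm


ratings = {
    "a": {"x": 4},
    "u": {"x": 4, "z": 5},
    "v": {"x": 1, "z": 1},
    "w": {"x": 4},  # most similar to a, but has not rated z
}
S = {("a", "u"): 0.9, ("a", "v"): 0.1, ("a", "w"): 0.95}


def sim(a: str, u: str) -> float:
    return S[(a, u)]


print(neighbors_who_rated(ratings, sim, "a", "z", 1))  # ['u']
print(predict_dynamic(ratings, sim, "a", "z", 1))
```

In the toy data, user "w" is the most similar to "a" but is excluded because it has no opinion on item "z", which is exactly the case the fixed-neighborhood rule handles poorly.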
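Similarity between Users
K-nearest neighbor:
$$\mathrm{sim}(a, u) = \begin{cases} 1 & \text{if } u \in \mathrm{Neighborhood}(a) \\ 0 & \text{otherwise} \end{cases}$$
Pearson correlation coefficient:
$$\mathrm{sim}(a, u) = \frac{\sum_{i} (r_{ai} - \bar{r}_a)(r_{ui} - \bar{r}_u)}{\sqrt{\sum_{i} (r_{ai} - \bar{r}_a)^2}\,\sqrt{\sum_{i} (r_{ui} - \bar{r}_u)^2}}$$
where the summation is over $i \in I_a \cap I_u \equiv I_{au}$
Cosine similarity:
$$\mathrm{sim}(a, u) = \frac{\sum_{i \in I_{au}} r_{ai}\, r_{ui}}{\sqrt{\sum_{k \in I_a} r_{ak}^2}\,\sqrt{\sum_{k \in I_u} r_{uk}^2}}$$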
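The two measures above translate directly into code. A minimal sketch, assuming each user's observed ratings are held in a non-empty dict and that the means $\bar{r}_a, \bar{r}_u$ are taken over each user's full $I_u$ (as defined for user-based CF), while the Pearson sums run over the co-rated items $I_{au}$. Returning 0.0 when a denominator vanishes is an assumed convention.

```python
import math
from typing import Dict

Row = Dict[str, float]  # one user's observed ratings: item -> rating (assumed non-empty)


def mean(row: Row) -> float:
    return sum(row.values()) / len(row)


def pearson(ra: Row, ru: Row) -> float:
    """Pearson correlation over co-rated items I_au; means taken over each user's I_u."""
    common = ra.keys() & ru.keys()
    ma, mu = mean(ra), mean(ru)
    num = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    da = math.sqrt(sum((ra[i] - ma) ** 2 for i in common))
    du = math.sqrt(sum((ru[i] - mu) ** 2 for i in common))
    return 0.0 if da == 0 or du == 0 else num / (da * du)


def cosine(ra: Row, ru: Row) -> float:
    """Cosine similarity: co-rated dot product over the norms of each full rating row."""
    num = sum(ra[i] * ru[i] for i in ra.keys() & ru.keys())
    na = math.sqrt(sum(r * r for r in ra.values()))
    nu = math.sqrt(sum(r * r for r in ru.values()))
    return 0.0 if na == 0 or nu == 0 else num / (na * nu)


a = {"x": 1, "y": 2, "z": 3}
print(pearson(a, {"x": 2, "y": 4, "z": 6}))  # perfectly correlated
print(pearson(a, {"x": 3, "y": 2, "z": 1}))  # perfectly anti-correlated
print(cosine(a, {"w": 5}))                    # no co-rated items
```

Pearson is invariant to a user's rating offset and scale, whereas cosine on raw ratings is not, which is one reason mean-centering appears in many CF variants.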
User Similarity Extensions
Inverse user frequency: down-weight items that appear in many $I_u$
— analogous to inverse document frequency in IR
— many variations on this, e.g., $\log(M/M_i)$, where $M$ is the number of users and $M_i$ is the number of sets $I_u$ in which item $i$ appears
Case amplification: make $\mathrm{sim}(a, u)$ more extreme
Support/confidence of user similarities
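The first two extensions can be sketched briefly. Inverse user frequency follows the $\log(M/M_i)$ variant given above; for case amplification this sketch assumes the sign-preserving power transform $w' = \mathrm{sign}(w)\,|w|^{\rho}$ with $\rho > 1$, and the default $\rho = 2.5$ is an assumption, not a value stated on the slides. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def inverse_user_frequency(ratings: Ratings) -> Dict[str, float]:
    """f_i = log(M / M_i): M users in total, M_i of them have item i in their I_u."""
    m = len(ratings)
    counts = Counter(item for row in ratings.values() for item in row)
    return {item: math.log(m / mi) for item, mi in counts.items()}


def amplify(w: float, rho: float = 2.5) -> float:
    """Case amplification: keep the sign of w and raise |w| to rho > 1.

    Weights near +/-1 barely change while weak weights shrink toward 0.
    rho = 2.5 is an assumed default, not a value stated on the slides.
    """
    return math.copysign(abs(w) ** rho, w)


ratings = {"u1": {"x": 5, "y": 3}, "u2": {"x": 4}, "u3": {"x": 2, "z": 1}}
print(inverse_user_frequency(ratings))  # "x" is rated by everyone, so its weight is 0
print(amplify(0.9), amplify(0.3), amplify(-0.3))
```

As with IDF, an item everyone rates carries little information about who is similar to whom, so its weight drops to zero.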
Issues for User-based Methods
Scaling issues: complexity $O(M^2 N)$
$M$: number of users; $N$: number of items
in practice closer to $O(M^2)$, because each user likes only a small number of items
Some remedies:
sampling users
clustering users
offline computation of user similarity: inappropriate when user activity changes frequently
other fast similarity computation methods, e.g., hashing
Item-based CF (Sarwar et al., 2001)
Problem. For a given user with past rating history, how do we recommend other items to her?
For an item, compute its correlation with other items
For the given user, aggregate her previous ratings of the items highly correlated with the current item
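The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions: item correlation is measured with the "adjusted cosine" (each user's mean is subtracted before comparing two item columns, the variant that appears in the experiments), and the aggregation is a similarity-weighted sum of the user's own ratings over the top-k similar items she has rated. Both are choices for illustration; the function names and the optional `k` parameter are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Item-item similarity over users who rated both i and j, after
    subtracting each user's mean rating (the "adjusted cosine" variant)."""
    num = di = dj = 0.0
    for row in ratings.values():
        if i in row and j in row:
            m = sum(row.values()) / len(row)
            num += (row[i] - m) * (row[j] - m)
            di += (row[i] - m) ** 2
            dj += (row[j] - m) ** 2
    return 0.0 if di == 0 or dj == 0 else num / math.sqrt(di * dj)


def predict_item_based(ratings: Ratings, a: str, i: str, k: Optional[int] = None) -> float:
    """Aggregate a's own ratings of the (top-k) items most similar to i:
    r_hat_ai = sum_j sim(i, j) * r_aj / sum_j |sim(i, j)|  (weighted-sum choice)."""
    sims = sorted(
        ((adjusted_cosine(ratings, i, j), j) for j in ratings[a] if j != i),
        reverse=True,
    )[:k]  # k=None keeps every item a has rated
    norm = sum(abs(s) for s, _ in sims)
    if norm == 0:
        return sum(ratings[a].values()) / len(ratings[a])  # fall back to a's mean
    return sum(s * ratings[a][j] for s, j in sims) / norm


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "a": {"A": 5, "C": 1},
}
print(adjusted_cosine(ratings, "A", "B"), adjusted_cosine(ratings, "B", "C"))
print(predict_item_based(ratings, "a", "B", k=1))
```

Here item B moves with A and against C across users, so the single nearest item to B is A, and the active user's rating of A drives the prediction.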
Item-based CF (2)
Offline computation of item similarity: complexity $O(MN^2)$
Online look-up of similar items does not depend on $M$ or $N$, but rather on how many items the user purchased/rated in the past
Item similarity is more stable than user similarity; hence, it works for users with limited data, even a single purchase/rating
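The offline/online split can be illustrated with a precomputed table of each item's k most similar items. This is a minimal sketch under assumptions: plain cosine over item columns (missing ratings treated as 0) stands in for whatever item similarity is used, ratings are assumed positive, and the names `build_item_table` and `recommend` are hypothetical. The online step loops only over the user's rated items and their k neighbors.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]        # user -> {item -> rating}
Table = Dict[str, List[Tuple[str, float]]]   # item -> [(similar item, sim), ...]


def build_item_table(ratings: Ratings, k: int) -> Table:
    """Offline step: for every item keep its k most similar items.

    Plain cosine over item columns (missing ratings treated as 0) is used here
    for brevity; the all-pairs loop is what makes this step expensive.
    """
    cols: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, row in ratings.items():
        for i, r in row.items():
            cols[i][u] = r
    norms = {i: math.sqrt(sum(r * r for r in c.values())) for i, c in cols.items()}
    table: Table = {}
    for i, ci in cols.items():
        scored = []
        for j, cj in cols.items():
            if j == i:
                continue
            dot = sum(r * cj.get(u, 0.0) for u, r in ci.items())
            if dot > 0:
                scored.append((j, dot / (norms[i] * norms[j])))
        scored.sort(key=lambda p: p[1], reverse=True)
        table[i] = scored[:k]
    return table


def recommend(table: Table, user_row: Dict[str, float], n: int) -> List[str]:
    """Online step: touches only the items the user rated (and their k neighbors)."""
    scores: Dict[str, float] = defaultdict(float)
    for j, r in user_row.items():
        for i, s in table.get(j, []):
            if i not in user_row:
                scores[i] += s * r
    return sorted(scores, key=scores.__getitem__, reverse=True)[:n]


ratings = {
    "u1": {"A": 5, "B": 5},
    "u2": {"A": 4, "B": 4, "C": 1},
    "u3": {"C": 5, "D": 5},
}
table = build_item_table(ratings, k=1)
print(recommend(table, {"A": 5}, n=1))  # ['B']
```

A user who has rated a single item ("A" above) still gets a recommendation, matching the point that item-based CF works for users with very limited data.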
Experiments
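Evaluation Metrics
Given a test set of ratings $T$, compare the true ratings $r_{ui}$ with the estimates $\hat{r}_{ui}$
Root mean squared error (RMSE):
$$\mathrm{RMSE} = \left( \frac{1}{|T|} \sum_{r_{ui} \in T} (r_{ui} - \hat{r}_{ui})^2 \right)^{1/2}$$
Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{|T|} \sum_{r_{ui} \in T} |r_{ui} - \hat{r}_{ui}|$$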
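Both metrics are one-liners once true and predicted ratings are paired up. A minimal sketch, assuming the test set has already been turned into two aligned sequences; the helper names are hypothetical.

```python
import math
from typing import Sequence


def _check(true: Sequence[float], pred: Sequence[float]) -> None:
    if len(true) != len(pred) or not true:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error over the test set T."""
    _check(true, pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error over the test set T."""
    _check(true, pred)
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


true, pred = [4, 3, 5], [3, 3, 3]  # errors 1, 0, 2
print(mae(true, pred))   # 1.0
print(rmse(true, pred))  # sqrt(5/3), about 1.291
```

Because errors are squared before averaging, RMSE penalizes a few large mistakes more heavily than MAE does, which is why it comes out larger on this example.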
Benchmark: MovieLens Data
A public dataset from movielens.org
MovieLens 100K
— users with 20+ ratings
— 100,000 ratings in a 943 × 1682 user-item matrix
MovieLens 1M
— 1M ratings, 4K movies, and 6K users
— about 4% of the ratings are observed
MovieLens 10M
— 10M ratings, 100K tags, 10K movies, and 72K users
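As a quick check on the "about 4%" figure: density is the number of observed ratings divided by the number of user-item cells. Using the rounded sizes above, and the same computation for MovieLens 100K:

```latex
\text{density}_{1M} \approx \frac{10^{6}}{(6 \times 10^{3})(4 \times 10^{3})} = \frac{10^{6}}{2.4 \times 10^{7}} \approx 4.2\%
\qquad
\text{density}_{100K} = \frac{100{,}000}{943 \times 1682} = \frac{100{,}000}{1{,}586{,}126} \approx 6.3\%
```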
Item-based CF: Compare Item Similarities
[Figure: Relative performance of different similarity measures. MAE (axis range 0.66 to 0.86) for adjusted cosine, pure cosine, and correlation.]
Item-based CF: Item Neighborhood Size
[Figure: Sensitivity of the neighborhood size. MAE (axis range 0.736 to 0.751) vs. number of neighbors (10 to 200) for itm-itm and itm-reg.]
Item-based vs. User-based
[Figure: Item-item vs. user-user at selected neighborhood sizes (at x = 0.8). MAE (axis range 0.725 to 0.755) vs. number of neighbors (10, 20, 60, 90, 125, 200) for user-user, item-item, and item-item-regression. A second panel, whose title is truncated in the source, plots MAE (axis range 0.72 to 0.84) for nonpers, user-user, and item-item-regression.]
Summary
Simple and reasonable assumption
Easy to implement
Large storage and computation cost
Hard to deal with cold-start users or items


Similar documents

Events - PDAC Mining Matters Diamond in the Rough Golf Classic

Gonio, Georgia 18-25 September, 2013

NBI Complaint vs Tallado PDF 13 - Sagip

golf 2016 - Georgia Bridgemen

Modeling the Spread and Control of Ebola in West Africa

Bulletin Board

Conference on the Americas - Georgia Southwestern State University

Tech-2 Brochure

Why Settle? Chartering Georgia

Transforming Georgia Tech’s Procure to Pay Process 24 Mar 2011