KDD 2014

SMVC: Semi-Supervised Multi-View Clustering in Subspace Projections
Stephan Günnemann, Ines Färber, Matthias Rüdiger, Thomas Seidl
Introduction to Multi-View Clustering
• For complex data, a single clustering is often not sufficient
  → multiple valuable clustering interpretations exist
  – Each highlighting different aspects
  …
• Existing approaches:
  – Mostly consider just one data representation (full-space)
  → Trade-off between quality and diversity of the clusterings
Multi-View Clustering in Subspace Projections
• Few methods exploit different data representations
  – Exploit arbitrary distortions of the data
  → Difficult to interpret; the quality of the clustering is not prioritized
• We argue: different groupings naturally occur in subspace projections
[Figure: example with two subspace views of the same objects. View 1, "characteristics for 'health status'": the dimensions sport activity and consumption of fruit separate the clusters "healthy" and "unhealthy". View 2, "characteristics for 'taste of music'": the dimensions attended rock concerts and attended classic concerts separate the clusters "loves Rock", "loves Classic", and "prefers other music". Further dimensions (e.g. read books, shoe size) and the fullspace do not show these groupings.]
– Intuitive semantic interpretation
– No obfuscation of patterns through irrelevant dimensions in the fullspace
Introduction to Semi-Supervised Clustering
• External knowledge can improve the clustering quality, especially for complex patterns
  [Plot: clustering quality vs. degree of supervision]
• Various solutions exist for traditional fullspace clustering
  – Instance-level constraints (must-link, cannot-link) are very intuitive
  – Soft constraints allow for disagreeing constraints
→ Our contribution: transfer the principle of soft instance-level constraints to multi-view clustering in subspace projections
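To make the constraint types concrete, here is a minimal Python sketch that encodes a few soft must-link/cannot-link constraints as signed weights between object pairs. The pairs and weight values are made-up examples, and the signed-weight encoding simply anticipates the sparse matrix W introduced later in the talk.

```python
import numpy as np

N = 6  # number of objects (toy example)

# Soft instance-level constraints as (i, j, weight):
#   weight > 0  ~ must-link  (i and j should share a cluster in some view)
#   weight < 0  ~ cannot-link (i and j should be separated in some view)
# The magnitude expresses how strongly the constraint is believed.
constraints = [(0, 1, +2.0),   # strong must-link
               (0, 2, +0.5),   # weak must-link
               (3, 4, -1.5)]   # cannot-link

# Sparse, symmetric weight matrix W (dense here only for readability)
W = np.zeros((N, N))
for i, j, w in constraints:
    W[i, j] = W[j, i] = w

print(W)
```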
Challenges
Semi-Supervised Multi-View Clustering in Subspace Projections
• 1st challenge: which dimensions are relevant for which view?
• 2nd challenge: which constraints belong to which view?
Our SMVC approach: novel generative model
[Plate diagram: the classical mixture model with mixture weights π_k, cluster assignments Z_i, and component parameters μ_{k,d}, τ_{k,d} generating the observations X_{i,d} (plates: k ∈ K, i ∈ N, d ∈ D)]
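For orientation, a minimal NumPy sketch of the generative process of the classical (diagonal-covariance) Gaussian mixture model shown in the diagram; all sizes and hyperparameter values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 100, 3, 2                         # objects, clusters, dimensions

pi = rng.dirichlet(np.ones(K))              # mixture weights pi_k
mu = rng.normal(0.0, 5.0, size=(K, D))      # means mu_{k,d}
tau = rng.gamma(2.0, 1.0, size=(K, D))      # precisions tau_{k,d}

z = rng.choice(K, size=N, p=pi)             # cluster assignment z_i ~ Cat(pi)
# observation x_{i,d} ~ N(mu_{z_i,d}, tau_{z_i,d}^{-1})
X = rng.normal(mu[z], 1.0 / np.sqrt(tau[z]))

print(X.shape)  # (N, D)
```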
Extending Classical Mixture Models
• Extending the model to learn multiple views
  [Plate diagrams: the classical mixture model (π_k, Z_i, μ_{k,d}, τ_{k,d}) next to its multi-view extension with per-view mixture weights π_{m,k}, cluster assignments Z_{m,i}, and component parameters μ_{m,k,d}, τ_{m,k,d}, all generating the observations X_{i,d} (plates: m ∈ M, k ∈ K, i ∈ N, d ∈ D)]
  – Allow different groupings for each view:
    → for each view: random variable z_{m,i}
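What the multi-view extension changes in the generative process, as a minimal sketch: every object receives one cluster assignment z_{m,i} per view, drawn from that view's own mixture weights. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 100, 3, 2                              # objects, clusters, views

pi = rng.dirichlet(np.ones(K), size=M)           # per-view mixture weights pi_{m,k}

# Per-view cluster assignments z_{m,i} ~ Cat(pi_m); shape (M, N)
Z = np.stack([rng.choice(K, size=N, p=pi[m]) for m in range(M)])

print(Z.shape)  # every object now has one cluster label per view
```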
Extending Classical Mixture Models
• Extending the model to learn subspaces for each view
  [Plate diagram: the multi-view model extended by a view-assignment variable V_d for each dimension d ∈ D, which selects the view whose component parameters μ_{m,k,d}, τ_{m,k,d} generate the observation X_{i,d}]
  Challenge: In which subspace is the view located?
    → For each dimension: random variable v_d over M
  – Overall multi-view mixture:
    $x_{i,d} \sim \mathcal{N}\big(\mu_{m,k,d},\, \tau_{m,k,d}^{-1}\big)$ where $m = v_d$ and $k = z_{m,i}$
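A minimal sketch of the stated multi-view mixture: each dimension d first draws its view v_d, and the object's cluster assignment in that view then selects the Gaussian component that generates x_{i,d}. All sizes and hyperparameter values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, D, M = 100, 3, 4, 2                        # objects, clusters, dims, views

pi  = rng.dirichlet(np.ones(K), size=M)          # pi_{m,k}
mu  = rng.normal(0.0, 5.0, size=(M, K, D))       # mu_{m,k,d}
tau = rng.gamma(2.0, 1.0, size=(M, K, D))        # tau_{m,k,d}

v = rng.choice(M, size=D)                        # view assignment v_d per dimension
Z = np.stack([rng.choice(K, size=N, p=pi[m]) for m in range(M)])  # z_{m,i}

X = np.empty((N, D))
for d in range(D):
    m = v[d]                                     # view responsible for dimension d
    k = Z[m]                                     # cluster of each object in that view
    # x_{i,d} ~ N(mu_{m,k,d}, tau_{m,k,d}^{-1})
    X[:, d] = rng.normal(mu[m, k, d], 1.0 / np.sqrt(tau[m, k, d]))

print(X.shape)
```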
Extending Classical Mixture Models
• Extending the model to integrate external knowledge
  [Plate diagram: the model is extended by the observed constraint weights W_{i,j} and the latent constraint-to-view assignments C_{i,j} (i, j ∈ N), which influence the per-view cluster assignments Z_{m,i}]
  – Sparse weight matrix W ∈ ℝ^{N×N}:
    • w_{i,j} > 0: there exists a view where i and j are grouped together
    • w_{i,j} < 0: there exists a view where i and j are not grouped together
  Challenge: Which view is responsible for which constraint?
    → For each constraint: random variable c_{i,j} over M
  – Influencing the clustering:
    $p(z_{m,*} \mid \pi_m, W, C) \propto \prod_{i=1}^{N} \pi_{m,\, z_{m,i}} \cdot \prod_{i=1}^{N} \prod_{j>i,\; c_{i,j}=m} e^{\, w_{i,j} \cdot \delta(z_{m,i},\, z_{m,j})}$
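A minimal sketch of the constrained prior above: the unnormalized log-probability of one view's assignments sums the log mixture weights and, for every constraint routed to view m (c_{i,j} = m), adds w_{i,j} whenever the two objects share a cluster (Kronecker delta). Variable names follow the slide; the toy inputs are made up.

```python
import numpy as np

def log_prior_assignments(z_m, log_pi_m, W, C, m):
    """Unnormalized log p(z_{m,*} | pi_m, W, C) for one view m.

    z_m      : (N,) cluster assignments of view m
    log_pi_m : (K,) log mixture weights of view m
    W        : (N, N) constraint weights (0 = no constraint)
    C        : (N, N) constraint-to-view assignments c_{i,j}
    """
    N = len(z_m)
    logp = log_pi_m[z_m].sum()                        # sum_i log pi_{m, z_{m,i}}
    for i in range(N):
        for j in range(i + 1, N):
            if W[i, j] != 0 and C[i, j] == m:         # constraint handled by view m
                logp += W[i, j] * (z_m[i] == z_m[j])  # w_{i,j} * delta(z_{m,i}, z_{m,j})
    return logp

# toy usage with made-up inputs
z_m = np.array([0, 0, 1, 2])
log_pi = np.log(np.array([0.5, 0.3, 0.2]))
W = np.zeros((4, 4)); W[0, 1] = W[1, 0] = 2.0        # one must-link constraint
C = np.zeros((4, 4), dtype=int)
print(log_prior_assignments(z_m, log_pi, W, C, m=0))
```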
Learning the SMVC Model
• Given observations X and constraints W, infer the latent variables L
  – Exact inference of p(L | X, W) is intractable
    → approximate solution via variational inference
    • approximate p(L | X, W) by a tractable family of parameterized distributions q(L | Ψ)
    • iterative coordinate ascent method to optimize the values of Ψ
    • overall complexity: O(M · N · K · D + |W|)
  – Remark: incorporate the constraints only gradually over the iterations:
    • Iteration 1: W′ = 0 · W
    • Iteration 2: W′ = 0.2 · W
    • …
    • Iteration 6: W′ = W
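The gradual incorporation of the constraints can be written as a simple schedule; a minimal sketch assuming the factor grows linearly by 0.2 per iteration and is capped at 1, which matches the values listed above.

```python
import numpy as np

def annealed_constraints(W, iteration, step=0.2):
    """Scale the constraint matrix: 0*W in iteration 1, 0.2*W in iteration 2,
    ..., the full W from iteration 6 onwards (assuming a linear ramp)."""
    factor = min(1.0, step * (iteration - 1))
    return factor * W

W = np.array([[0.0, 2.0], [2.0, 0.0]])          # toy constraint matrix
for t in range(1, 8):
    print(t, annealed_constraints(W, t)[0, 1])  # 0.0, 0.4, 0.8, ..., 2.0, 2.0
```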
Experimental Evaluation
• Compared approaches:
  – Multi-view: SMVC (our novel approach), MVGen, Alt. Clus, Multi-View 1 & 2
    [Günnemann et al., KDD 2012; Cui & Fern, ICDM 2007; Qi & Davidson, KDD 2009]
  – Subspace: Proclus, StatPC
    [Aggarwal et al., SIGMOD 1999; Moise & Sander, KDD 2008]
  – Semi-supervised: PCKMeans, MPCKMeans
    [Basu et al., SDM 2004; Bilenko et al., ICML 2004]
• Main question: How well can constraints support multi-view clustering (in subspace projections)?
Proof of Concept with Synthetic Data
The Dancing Stick Figures Data
• Samples:
  – 900 20×20 images, generated by randomly introducing noise into the above samples
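The described data generation might look roughly like the following sketch; the base templates here are random placeholders (the actual stick-figure samples are only shown as images in the slides), and the noise model and number of templates are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder base samples; in the paper these are the dancing stick figures.
base_samples = [rng.integers(0, 2, size=(20, 20)).astype(float) for _ in range(9)]

images = []
for _ in range(900):
    base = base_samples[rng.integers(len(base_samples))]
    noisy = base + rng.normal(0.0, 0.1, size=(20, 20))   # random pixel noise
    images.append(noisy.clip(0.0, 1.0))

X = np.array(images).reshape(900, -1)   # 900 images as 400-dimensional vectors
print(X.shape)
```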
The CMU Face Data
• Samples:
  – Image set of 3 individuals:
    • 4 facial expressions
    • 4 head positions
    • 2 eye states
    → 96 images (3 · 4 · 4 · 2)
  • Preprocessed via PCA
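The PCA preprocessing step could look like the following scikit-learn sketch; the input is a random stand-in for the 96 flattened face images, and the number of retained components is an assumption, not a value from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
faces = rng.random((96, 960))            # stand-in for 96 flattened face images

pca = PCA(n_components=10)               # number of components is an assumption
faces_reduced = pca.fit_transform(faces)

print(faces_reduced.shape)               # (96, 10)
```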
Future Challenges
• Consider classical subspace challenges:
  – Global noise dimensions
  – Local noise dimensions (relevant cluster dimensions slightly deviate from view)
  – Overlapping relevant dimensions of views
  – Accounting for noise objects
• Complex representation of constraints:
  – So far only one constraint per object pair
• Active learning of constraints:
  – Random constraints are not the best choice
  – More constraints do not always help more
  [Plots: clustering quality vs. # constraints]
  [Ian Davidson: "Two approaches to understanding when constraints help clustering", KDD 2012]
SMVC: Semi-Supervised Multi-View Clustering in Subspace Projections
Stephan Günnemann, Ines Färber, Matthias Rüdiger, Thomas Seidl
[Summary figure: the complete SMVC graphical model, coupling instance-level user constraints (observed weights W_{i,j}, latent constraint-to-view assignments C_{i,j}) with multiple clustering views in subspace projections (per-view mixture weights π_{m,k}, cluster assignments Z_{m,i}, component parameters μ_{m,k,d}, τ_{m,k,d}, and dimension-to-view assignments V_d generating the observations X_{i,d}). Non-informative hyperpriors λ_m, α_d, β_d, κ_d, μ_d complete the model. Challenges: learn the association of dimensions to views and the association of constraints to views; both are learned through Bayesian inference.]