KDD 2014
Transcription
SMVC: Semi-Supervised Multi-View Clustering in Subspace Projections
Stephan Günnemann, Ines Färber, Matthias Rüdiger, Thomas Seidl

Introduction to Multi-View Clustering
• For complex data, a single clustering is often not sufficient → multiple valuable clustering interpretations exist
  – Each highlighting different aspects of the data
• Existing approaches:
  – Mostly consider just one data representation (the full-space) → trade-off between quality and diversity of the clusterings

Multi-View Clustering in Subspace Projections
• Few methods exploit different data representations
  – They allow arbitrary distortions of the data → difficult to interpret, and the quality of the clustering is not prioritized
• We argue: different groupings naturally occur in subspace projections
  [Figure: the same persons grouped in two subspaces — characteristics for "health status" (sport activity, consumption of fruit, books read, shoe size; healthy vs. unhealthy) and characteristics for "taste of music" (attended rock concerts, attended classic concerts; loves Rock, loves Classic, prefers other music) — while the full-space shows neither grouping clearly]
  – Intuitive semantic interpretation
  – No obfuscation of patterns through irrelevant dimensions of the full-space

Introduction to Semi-Supervised Clustering
• External knowledge can improve the clustering quality, especially for complex patterns
  [Figure: clustering quality increases with the degree of supervision]
• Various solutions exist for traditional full-space clustering
  – Instance-level constraints (must-link, cannot-link) are very intuitive
  – Soft constraints allow for disagreeing constraints (a small encoding sketch follows after the next slide)
• Our contribution: transfer the principle of soft instance-level constraints to multi-view clustering in subspace projections

Challenges of Semi-Supervised Multi-View Clustering in Subspace Projections
• 1st challenge: which dimensions are relevant for which view?
• 2nd challenge: which constraints belong to which view?
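To make the soft instance-level constraints from the previous slide concrete, here is a minimal sketch (not part of the talk) of how must-link and cannot-link pairs can be encoded as signed weights — the form the SMVC model later consumes as the sparse matrix W. The function name, the fixed weight parameter, and the dense NumPy array are illustrative assumptions; in practice W would be sparse.

```python
import numpy as np

def build_constraint_matrix(n, must_link, cannot_link, weight=1.0):
    """Encode soft instance-level constraints as a signed weight matrix.

    must_link / cannot_link are lists of (i, j) index pairs. A positive
    entry rewards grouping i and j together in some view, a negative entry
    penalizes it; the magnitude expresses how strongly the user believes
    the constraint, which is what makes the constraints "soft" and lets
    disagreeing constraints coexist.
    """
    W = np.zeros((n, n))
    for i, j in must_link:
        W[i, j] = W[j, i] = +weight
    for i, j in cannot_link:
        W[i, j] = W[j, i] = -weight
    return W

# Example: 6 objects, two must-link pairs and one cannot-link pair.
W = build_constraint_matrix(6, must_link=[(0, 1), (2, 3)], cannot_link=[(0, 4)])
```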
Our SMVC Approach: A Novel Generative Model
[Plate diagram: classical mixture model with mixture weights π_k, cluster assignments Z_i (i ∈ N), component means μ_{k,d} and precisions τ_{k,d} (k ∈ K, d ∈ D), and observations X_{i,d}]

Extending Classical Mixture Models
• Extending the model to learn multiple views
  [Plate diagram: one mixture per view — π_{m,k}, Z_{m,i}, μ_{m,k,d}, τ_{m,k,d}, replicated over an additional plate m ∈ M, generating the observations X_{i,d}]
  – Allow different groupings for each view → for each view, a cluster-assignment random variable z_{m,i}
• Extending the model to learn subspaces for each view
  [Plate diagram: as above, with an additional node V_d attached to every dimension d ∈ D]
  – Challenge: in which subspace is each view located? → for each dimension, a random variable v_d over M
  – Overall multi-view mixture (a sampling sketch of this generative process follows below):
    x_{i,d} ∼ 𝒩(μ_{m,k,d}, τ_{m,k,d}^{-1})  where m = v_d and k = z_{m,i}
• Extending the model to integrate external knowledge
  [Plate diagram: as above, with observed constraint weights W_{i,j} and latent constraint-to-view assignments C_{i,j} for object pairs i, j ∈ N]
  – Sparse weight matrix W ∈ ℝ^{N×N}:
    • w_{i,j} > 0: there exists a view in which i and j are grouped together
    • w_{i,j} < 0: there exists a view in which i and j are not grouped together
  – Challenge: which view is responsible for which constraint? → for each constraint, a random variable c_{i,j} over M
  – Influencing the clustering (a log-space sketch of this prior follows below):
    p(z_{m,*} | π_m, W, C) ∝ ∏_{i=1}^{N} π_{m,z_{m,i}} · ∏_{i=1}^{N} ∏_{j>i, c_{i,j}=m} e^{w_{i,j} · δ(z_{m,i}, z_{m,j})}

Learning the SMVC Model
• Given observations X and constraints W, infer the latent variables L
  – Exact inference of p(L | X, W) is intractable → approximate solution via variational inference
    • approximate p(L | X, W) by a tractable family of parameterized distributions q(L | Ψ)
    • iterative coordinate-ascent method to optimize the values of Ψ
    • overall complexity: O(M · N · K · D + |W|)
  – Remark: incorporate the constraints only slowly, iteration by iteration (see the annealing sketch below):
    • Iteration 1: W′ = 0 · W
    • Iteration 2: W′ = 0.2 · W
    • …
    • Iteration 6: W′ = W

Experimental Evaluation
• Compared approaches:
  – Multi-view: SMVC (our novel approach), MVGen (Günnemann et al., KDD 2012), Alt. Clus. (Cui and Fern, ICDM 2007), Multi-View 1 & 2 (Qi and Davidson, KDD 2009)
  – Subspace: Proclus (Aggarwal et al., SIGMOD 1999), StatPC (Moise and Sander, KDD 2008)
  – Semi-supervised: PCKMeans (Basu et al., SDM 2004), MPCKMeans (Bilenko et al., ICML 2004)
• Main question: how well can constraints support multi-view clustering (in subspace projections)?
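As a reading aid for the generative model above, the following is a minimal forward-sampling sketch, assuming Categorical cluster assignments and Gaussian observations as on the slides. The concrete sizes, the randomly chosen parameters, and all variable names beyond the slide symbols (v_d, z_{m,i}, π, μ, τ) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, D, N = 2, 3, 6, 100                       # views, clusters per view, dimensions, objects

# Model parameters, here fixed by hand; in SMVC they are latent and inferred.
view_of_dim = rng.integers(0, M, size=D)        # v_d: the view that owns dimension d
pi = rng.dirichlet(np.ones(K), size=M)          # pi_{m,k}: mixture weights of view m
mu = rng.normal(0.0, 5.0, size=(M, K, D))       # mu_{m,k,d}: cluster means
tau = np.full((M, K, D), 4.0)                   # tau_{m,k,d}: cluster precisions

# Each view groups all objects independently ...
z = np.stack([rng.choice(K, size=N, p=pi[m]) for m in range(M)])   # z_{m,i}

# ... and each dimension is generated by the view it is assigned to:
# x_{i,d} ~ N(mu_{m,k,d}, tau_{m,k,d}^{-1}) with m = v_d and k = z_{m,i}.
X = np.empty((N, D))
for d in range(D):
    m = view_of_dim[d]
    k = z[m]                                    # cluster of every object in view m
    X[:, d] = rng.normal(mu[m, k, d], 1.0 / np.sqrt(tau[m, k, d]))
```

The same data matrix X can thus carry one valid grouping per view, each visible only in the dimensions that this view owns.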
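The constraint-weighted assignment prior and the gradual incorporation of W translate almost literally from the two formulas above. This is a hedged sketch rather than the authors' code: the helper names (log_assignment_prior, annealed_constraints), the dense handling of W, and the explicit double loop are assumptions chosen for readability.

```python
import numpy as np

def log_assignment_prior(z_m, pi_m, W, c, m):
    """Unnormalized log of p(z_{m,*} | pi_m, W, C) for a single view m.

    z_m[i]  : cluster of object i in view m
    pi_m[k] : mixture weight of cluster k in view m
    W[i, j] : soft constraint weight (> 0 must-link, < 0 cannot-link)
    c[i, j] : c_{i,j}, the view held responsible for the constraint on (i, j)
    """
    logp = np.sum(np.log(pi_m[z_m]))            # sum_i log pi_{m, z_{m,i}}
    n = len(z_m)
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] != 0 and c[i, j] == m:
                # exp(w_{i,j} * delta(z_{m,i}, z_{m,j})) contributes only
                # when both objects fall into the same cluster of view m.
                logp += W[i, j] * (z_m[i] == z_m[j])
    return logp

def annealed_constraints(W, iteration, step=0.2):
    """Constraint schedule from the talk: W' = 0, 0.2*W, 0.4*W, ..., W."""
    return min(1.0, step * (iteration - 1)) * W
```

With step = 0.2 the schedule reproduces the slide values: 0 · W in iteration 1, 0.2 · W in iteration 2, and the full W from iteration 6 onward.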
Proof of Concept with Synthetic Data

The Dancing Stick Figures Data
• Samples:
  [Figure: sample stick-figure poses]
  – 900 images of size 20×20, generated by randomly introducing noise into the above samples

The CMU Face Data
• Samples:
  – Image set of 3 individuals:
    • 4 facial expressions
    • 4 head positions
    • 2 eye states
    → 96 images
  – Preprocessed via PCA

Future Challenges
• Consider classical subspace challenges:
  – Global noise dimensions
  – Local noise dimensions (relevant cluster dimensions deviate slightly from the view)
  – Overlapping relevant dimensions of views
  – Accounting for noise objects
• Complex representation of constraints:
  – So far, only one constraint per object pair
• Active learning of constraints:
  – Random constraints are not the best choice
  – More constraints do not always help
  [Figure: two plots of clustering quality against the number of constraints]
  – See Ian Davidson: Two approaches to understanding when constraints help clustering, KDD 2012

SMVC: Semi-Supervised Multi-View Clustering in Subspace Projections
Stephan Günnemann, Ines Färber, Matthias Rüdiger, Thomas Seidl
• Instance-level user constraints → multiple clustering views in subspace projections
• Challenge: learn the association of dimensions to views (random variables V_d)
• Challenge: learn the association of constraints to views (random variables C_{i,j})
• Both associations are learned through Bayesian inference
[Plate diagram: full SMVC model with non-informative priors (λ_m, μ_d, κ_d, α_d, β_d) on the model parameters π_{m,k}, μ_{m,k,d}, τ_{m,k,d}]