Poster - University of Sussex

Transcription

Convex Relaxation of Mixture Regression with Efficient Algorithms
Authors
Novi Quadrianto 1 | Tibério S. Caetano 1 | John Lim 1 | Dale Schuurmans 2
1: NICTA & Australian National University | 2: University of Alberta
Abstract
• We develop a convex relaxation of maximum a posteriori estimation of a mixture of regression model.
• We reformulate the relaxation problem to eliminate the need for general semidefinite programming.
• We provide two reformulations that admit fast algorithms. The first is a max-min spectral reformulation exploiting quasi-Newton descent. The second is a min-min reformulation consisting of fast alternating steps of closed-form updates.
• We present experiments on a real-world problem of motion segmentation from video data.
Mixture of Regression Problem
[Figure: linear regression (left) and mixture of regression (right); Gaffney & Smyth 1999]
Given
• a labeled training set D comprising t input-output pairs {(x1, y1), …, (xt, yt)};
• the assumption that the output variable y is generated by a mixture of k components;
• no information about which component of the mixture generates each output variable yi.
Find
• a regression model f : X → Y for each of the mixture components.
The Model
Denote
• X ∈ R^{t×n} as the matrix of inputs and y ∈ R^{t×1} as the vector of output variables;
• Π ∈ {0, 1}^{t×k}, Π1 = 1 and max(diag(Π^T Π)) ≤ γt (bounding the size of the largest component) as the hidden assignment matrix.
Assume
• a linear regression model yi | xi, πi = ψi w + εi, εi ∼ N(0, σ²), w.r.t. a feature representation ψi = πi ⊗ xi.
Gaussian Likelihood
With the Gaussian noise model, our likelihood is then
p(yi | xi, πi; w) = (1/√(2πσ²)) exp(−(1/(2σ²)) (ψi w − yi)²).
Log-Posterior Optimization
With the additional assumption of a Gaussian prior on the parameter w, the minimization of the (negative) log-posterior takes the form
min_{Π,w} [ (1/(2σ²)) w^T Ψ^T Ψ w − (1/σ²) y^T Ψ w + (α/2) w^T w ],   (1)
s.t. constraints on the assignment matrix Π.
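As a concrete illustration (our own sketch, not from the poster; σ² = α = 1 and all sampled quantities are assumptions made for the example), the objective in (1) can be evaluated for a given assignment matrix Π and stacked weight vector w as follows:

import numpy as np

rng = np.random.default_rng(0)
t, n, k = 30, 1, 5                # points, input dimension, mixture components
sigma2, alpha = 1.0, 1.0          # noise variance and prior precision (assumed)

X = rng.normal(size=(t, n))                 # inputs, one row per data point
Pi = np.eye(k)[rng.integers(0, k, size=t)]  # hard assignments: Pi in {0,1}^{t x k}, Pi 1 = 1
w = rng.uniform(size=k * n)                 # stacked per-component weight vector
Psi = np.stack([np.kron(Pi[i], X[i]) for i in range(t)])  # psi_i = pi_i (x) x_i
y = Psi @ w + rng.normal(size=t)            # responses under the assumed model

# Objective (1): (1/(2s^2)) w'Psi'Psi w - (1/s^2) y'Psi w + (a/2) w'w.
obj = (w @ Psi.T @ Psi @ w / (2 * sigma2)
       - y @ Psi @ w / sigma2
       + alpha / 2 * (w @ w))
print(obj)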
Semidefinite Relaxation
Problem (1) can be relaxed to
min_{M: tr M = t, γtI ⪰ M ⪰ 0} max_c [ −(σ²/2) c^T c − (1/(2α)) (y/σ² − c)^T (M ∘ XX^T) (y/σ² − c) ].   (2)
(Sketchy) Steps
• Compute the convex conjugate of the log-partition function (i.e. (1/(2σ²)) w^T Ψ^T Ψ w).
• Relax the constraints on the assignment matrix:
{ΠΠ^T : Π ∈ {0, 1}^{t×k}, Π1 = 1, max(diag(Π^T Π)) ≤ γt} ⊆ {M : M ∈ R^{t×t}, tr M = t, γtI ⪰ M ⪰ 0}.
Note: The relaxation of Problem (1) to (2) refers only to loosening the set of feasible solutions; no relaxation has been introduced in the objective function. However, solving (2) directly is prohibitive for medium- to large-scale problems, as it requires a general semidefinite solver.
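The set inclusion above is easy to verify numerically: every M = ΠΠ^T built from a feasible assignment matrix lands in the relaxed set. A minimal sketch (our own illustration; γ is an assumed value):

import numpy as np

rng = np.random.default_rng(1)
t, k, gamma = 30, 5, 0.4          # gamma bounds the largest component by gamma * t

# Draw assignment matrices until the largest-component constraint holds.
while True:
    Pi = np.eye(k)[rng.integers(0, k, size=t)]   # Pi in {0,1}^{t x k}, Pi 1 = 1
    if np.diag(Pi.T @ Pi).max() <= gamma * t:    # component sizes sit on this diagonal
        break

M = Pi @ Pi.T                                    # candidate point of the relaxed set
eigs = np.linalg.eigvalsh(M)
print(np.isclose(np.trace(M), t))                # tr M = t
print(eigs.min() >= -1e-9)                       # M >= 0
print(eigs.max() <= gamma * t + 1e-9)            # gamma t I >= M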
Algorithm 1: Max-Min Reformulation
The "hammer" (Overton & Womersley 1993)
Let V ∈ R^{t×t}, V = V^T, with eigendecomposition P^T V P = Λ((λ1, …, λt)) and eigenvalues sorted in decreasing order; then
max_{M: tr(M) = q, I ⪰ M ⪰ 0} tr(M V^T) = Σ_{i=1}^{q} λi   and   argmax_{M: tr(M) = q, I ⪰ M ⪰ 0} tr(M V^T) ∋ Pq Pq^T,
where Pq collects the eigenvectors corresponding to the q largest eigenvalues.
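A small numerical check of the lemma (our own sketch): the claimed maximizer Pq Pq^T is feasible and attains the sum of the q largest eigenvalues.

import numpy as np

rng = np.random.default_rng(2)
t, q = 8, 3

A = rng.normal(size=(t, t))
V = (A + A.T) / 2                         # a symmetric V
lam, P = np.linalg.eigh(V)                # eigh returns ascending eigenvalues
lam, P = lam[::-1], P[:, ::-1]            # reorder so lambda_1 >= ... >= lambda_t

M = P[:, :q] @ P[:, :q].T                 # claimed maximizer P_q P_q^T
print(np.isclose(np.trace(M @ V.T), lam[:q].sum()))  # attains sum of top-q eigenvalues
eigs = np.linalg.eigvalsh(M)
print(np.isclose(np.trace(M), q), eigs.min() >= -1e-9, eigs.max() <= 1 + 1e-9)  # feasible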
(Sketchy) Steps
• Simply by interchanging min_M and max_c, invoking the distributivity property, rearranging the terms and defining ȳ := y/σ², problem (2) can be rewritten as
max_c [ −(σ²/2) c^T c − (1/(2α)) max_{M: tr M = t, γtI ⪰ M ⪰ 0} tr(M (XX^T ∘ (ȳ − c)(ȳ − c)^T)) ].
• Let q = max{u : u ∈ {1, …, t}, u ≤ γ⁻¹} and define M̄ := (q/t) M to transform the constraint set from {M : tr M = t, γtI ⪰ M ⪰ 0} to {M̄ : tr M̄ = q, I ⪰ M̄ ⪰ 0}.
• To use the "hammer", Problem (2) is reformulated to
max_c [ −(σ²/2) c^T c − (t/(2αq)) max_{M̄: tr M̄ = q, I ⪰ M̄ ⪰ 0} tr(M̄ (XX^T ∘ (ȳ − c)(ȳ − c)^T)) ].
The inner maximization is now exactly a sum of the q largest eigenvalues, so no semidefinite solver is needed and the outer problem in c can be solved by quasi-Newton descent.
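As a concrete illustration (our own sketch; σ², α, q and the data are assumptions for the example), the objective below evaluates the inner maximization through the top-q eigenvalue sum; since the objective is concave in c, its negation can be handed to an off-the-shelf quasi-Newton routine, here SciPy's L-BFGS-B as a stand-in for the poster's solver:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
t, n, q = 30, 2, 5
sigma2, alpha = 1.0, 1.0                  # illustrative constants

X = rng.normal(size=(t, n))
y = rng.normal(size=t)
y_bar = y / sigma2                        # ybar := y / sigma^2

def neg_objective(c):
    # Inner max solved by the "hammer": sum of the q largest eigenvalues of
    # XX' o (ybar - c)(ybar - c)', where o denotes the Hadamard product.
    r = y_bar - c
    K = (X @ X.T) * np.outer(r, r)
    top_q = np.sort(np.linalg.eigvalsh(K))[-q:].sum()
    return sigma2 / 2 * (c @ c) + t / (2 * alpha * q) * top_q

# The eigenvalue sum is smooth wherever the q-th and (q+1)-th eigenvalues
# separate, so a quasi-Newton method is a reasonable fit.
res = minimize(neg_objective, np.zeros(t), method="L-BFGS-B")
print(-res.fun)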
Algorithm 2: Min-Min Reformulation
Problem (2) is also equivalent to
min_{M: I ⪰ M ⪰ 0, tr M = 1/γ} min_A [ (1/σ²) y^T diag(XA^T) + (1/(2σ²)) diag(XA^T)^T diag(XA^T) + (α/(2γt)) tr(A^T M^{-1} A) ].
Steps: (2) =
min_{M: I ⪰ M ⪰ 0, tr M = 1/γ} max_{c, C: C = Λ(c − ȳ)X} [ −(σ²/2) c^T c − (γt/(2α)) tr(C^T M C) ]
= min_{M: I ⪰ M ⪰ 0, tr M = 1/γ} min_A max_{c, C} [ −(σ²/2) c^T c − (γt/(2α)) tr(C^T M C) + tr(A^T C) − tr(A^T Λ(c − ȳ)X) ].
Lastly, c and C can be solved in closed form as c = −(1/σ²) diag(XA^T) and C = (α/(γt)) M^{-1} A.
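For concreteness, a minimal sketch (our own; M is chosen as a simple feasible point and all constants are illustrative) that evaluates the equivalent min-min objective at a pair (M, A) and recovers c and C through the closed forms above:

import numpy as np

rng = np.random.default_rng(4)
t, n = 30, 2
sigma2, alpha, gamma = 1.0, 1.0, 0.2      # illustrative constants

X = rng.normal(size=(t, n))
y = rng.normal(size=t)

M = np.eye(t) / (gamma * t)               # feasible: I >= M >= 0 and tr M = 1/gamma
A = rng.normal(size=(t, n))               # free variable of the inner minimization
d = np.diag(X @ A.T)                      # diag(XA')

# Equivalent min-min objective at (M, A).
obj = (y @ d / sigma2
       + d @ d / (2 * sigma2)
       + alpha / (2 * gamma * t) * np.trace(A.T @ np.linalg.solve(M, A)))
print(obj)

c = -d / sigma2                                   # c = -(1/sigma^2) diag(XA')
C = alpha / (gamma * t) * np.linalg.solve(M, A)   # C = (alpha/(gamma t)) M^{-1} A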
Applications
Toy Data
• Dataset: 30 synthetic data points are generated according to yi = (πi ⊗ xi)w + εi, with xi ∈ R, εi ∼ N(0, 1) and w ∼ U(0, 1). The response variable yi is assumed to be generated from a mixture of 5 components.
• Performance comparison with EM (100 random restarts were used to avoid poor local optima).
• Error rates are 0.347 ± 0.086 (EM) and 0.280 ± 0.063 (Max-Min) across 10 different runs.
[Figure: toy data segmentation results; panels: Ground Truth, Max-Min, EM]
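A minimal sketch of this data-generating recipe (our own reading of it; the exact sampling details on the poster may differ):

import numpy as np

rng = np.random.default_rng(5)
t, k = 30, 5                          # 30 points from a mixture of 5 components

x = rng.normal(size=t)                # xi in R
w = rng.uniform(size=k)               # per-component weights, w ~ U(0, 1)
z = rng.integers(0, k, size=t)        # hidden component of each point

# Since xi is scalar here, (pi_i (x) xi) w reduces to w[z_i] * xi.
y = w[z] * x + rng.normal(size=t)     # eps_i ~ N(0, 1)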
Motion Segmentation from Video Data
• Dataset: Hopkins 155 (http://www.vision.jhu.edu/data/hopkins155/).
• Given a pair of corresponding points pi and qi from two frames and k motion groups, we have the following epipolar equation (Li 2007): qi^T (Σ_{j=1}^{k} πij Fj) pi = 0.
• Redefine [qi^x pi^x, qi^x pi^y, …, qi pi]^T := xi and vec(Fj) := wj; our problem is now Σ_{j=1}^{k} πij xi^T wj = 0.
[Figure: motion segmentation results; panels: Ground Truth, EM, Max-Min, Min-Min]
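To illustrate the reduction (our own sketch; it assumes homogeneous 3-vectors and column-major vectorization, and the poster's coordinate ordering may differ), q^T F p can be rewritten as a linear function of vec(F):

import numpy as np

rng = np.random.default_rng(6)

F = rng.normal(size=(3, 3))              # a fundamental-matrix-like 3x3 matrix
p = np.append(rng.normal(size=2), 1.0)   # homogeneous image points
q = np.append(rng.normal(size=2), 1.0)

# q' F p == (p (x) q)' vec(F) with column-major vectorization; the entries of
# p (x) q are exactly the coordinate products q^a p^b, up to ordering.
x = np.kron(p, q)
w = F.flatten(order="F")                 # vec(F)
print(np.isclose(q @ F @ p, x @ w))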
References
• S. Gaffney and P. Smyth. Trajectory clustering with mixtures of regression models. In ACM SIGKDD, pages 63–72, 1999.
• M. Overton and R. Womersley. Optimality conditions and duality theory for minimizing sums of the
largest eigenvalues of symmetric matrices. Mathematical Programming, 62:321–357, 1993.
• H. Li. Two-view motion segmentation from linear programming relaxation. In CVPR, 2007.