Surpassing Human-Level Face Verification Performance on LFW with GaussianFace (Supplementary Material)

Chaochao Lu    Xiaoou Tang
Department of Information Engineering, The Chinese University of Hong Kong
{lc013, xtang}@ie.cuhk.edu.hk
Optimization
As discussed in the main text, learning the GaussianFace model amounts to minimizing the following marginal likelihood,

$$\mathcal{L}_{Model} = -\log p(\mathbf{Z}_T, \theta \mid \mathbf{X}_T) - \beta \mathcal{M}. \qquad (1)$$
For the model optimization, we first expand Equation (1) to obtain the following equation (ignoring the constant terms):

$$\mathcal{L}_{Model} = -\log P_T + \beta P_T \log P_T + \frac{\beta}{S}\sum_{i=1}^{S}\left(P_{T,i}\log P_i - P_{T,i}\log P_{T,i}\right), \qquad (2)$$

where $P_i = p(\mathbf{Z}_i, \theta \mid \mathbf{X}_i)$, and $P_{i,j}$ means that its corresponding covariance function is computed on both $\mathbf{X}_i$ and $\mathbf{X}_j$.
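As a quick illustration of Equation (2), the sketch below evaluates $\mathcal{L}_{Model}$ (up to constants) from placeholder probabilities. The values of $\beta$, $P_T$, $P_i$, and $P_{T,i}$ are illustrative assumptions only; in the model they come from the posteriors computed on each domain.

```python
import numpy as np

# Numeric sketch of Equation (2). The probabilities below are illustrative
# placeholders; in the model they are posteriors computed on each domain.
beta = 0.5                            # weight of the multi-task term
P_T = 0.8                             # P_T = p(Z_T, theta | X_T), target domain
P_i = np.array([0.7, 0.6, 0.9])       # P_i for S = 3 source domains
P_Ti = np.array([0.75, 0.65, 0.85])   # P_{T,i}: covariance computed on X_T and X_i
S = len(P_i)

L_model = (-np.log(P_T) + beta * P_T * np.log(P_T)
           + beta / S * np.sum(P_Ti * np.log(P_i) - P_Ti * np.log(P_Ti)))
print(L_model)
```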
To obtain the optimal $\theta$ and $\mathbf{Z}$, we need to optimize Equation (2) with respect to $\theta$ and $\mathbf{Z}$, respectively. We first present the derivatives with respect to the hyper-parameters $\theta$. It is easy to get
$$\frac{\partial \mathcal{L}_{Model}}{\partial \theta_j} = \left(\beta(\log P_T + 1) - \frac{1}{P_T}\right)\frac{\partial P_T}{\partial \theta_j} + \frac{\beta}{S}\sum_{i=1}^{S}\frac{P_{T,i}}{P_i}\cdot\frac{\partial P_i}{\partial \theta_j} + \frac{\beta}{S}\sum_{i=1}^{S}\left(\log P_i - \log P_{T,i} - 1\right)\frac{\partial P_{T,i}}{\partial \theta_j}.$$

The above equation depends on the form of $\frac{\partial P_i}{\partial \theta_j}$, which can be written (ignoring the constant terms) as follows:

$$\frac{\partial P_i}{\partial \theta_j} = P_i\,\frac{\partial \log P_i}{\partial \theta_j} \approx P_i\left(\frac{\partial \log p(\mathbf{X}_i \mid \mathbf{Z}_i, \theta)}{\partial \theta_j} + \frac{\partial \log p(\mathbf{Z}_i)}{\partial \theta_j} + \frac{\partial \log p(\theta)}{\partial \theta_j}\right).$$

The above three terms can be easily obtained (ignoring the constant terms) by

$$\frac{\partial \log p(\mathbf{X}_i \mid \mathbf{Z}_i, \theta)}{\partial \theta_j} \approx -\frac{D}{2}\mathrm{Tr}\left(\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_j}\right) + \frac{1}{2}\mathrm{Tr}\left(\mathbf{K}^{-1}\mathbf{X}_i\mathbf{X}_i^{\top}\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_j}\right),$$

$$\frac{\partial \log p(\mathbf{Z}_i)}{\partial \theta_j} \approx -\frac{1}{\sigma^2}\frac{\partial J_i^*}{\partial \theta_j} = -\frac{1}{\lambda\sigma^2}\left(\mathbf{a}^{\top}\frac{\partial \mathbf{K}}{\partial \theta_j}\mathbf{a} - \mathbf{a}^{\top}\frac{\partial \mathbf{K}}{\partial \theta_j}\tilde{\mathbf{A}}\mathbf{a} + \mathbf{a}^{\top}\mathbf{K}\tilde{\mathbf{A}}\frac{\partial \mathbf{K}}{\partial \theta_j}\tilde{\mathbf{A}}\mathbf{K}\mathbf{a} - \mathbf{a}^{\top}\mathbf{K}\tilde{\mathbf{A}}\frac{\partial \mathbf{K}}{\partial \theta_j}\mathbf{a}\right),$$

$$\frac{\partial \log p(\theta)}{\partial \theta_j} = \frac{1}{\theta_j},$$

where $\tilde{\mathbf{A}} = \mathbf{A}(\lambda\mathbf{I}_n + \mathbf{A}\mathbf{K}\mathbf{A})^{-1}\mathbf{A}$.

Similarly, the derivatives with respect to the latent space $\mathbf{Z}$ can also be obtained.
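The first of the three terms above is the standard GP data-term gradient, so it can be sanity-checked numerically. Below is a minimal NumPy sketch, assuming an RBF kernel with a single log length-scale standing in for one hyper-parameter $\theta_j$ (the paper's actual covariance function is not reproduced here); the analytic gradient is compared against a finite-difference estimate.

```python
import numpy as np

def rbf_kernel(Z, log_ell):
    # RBF kernel on latent points Z (N x Q); log_ell is a hypothetical
    # log length-scale standing in for one hyper-parameter theta_j.
    ell = np.exp(log_ell)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2) + 1e-6 * np.eye(len(Z))

def dK_dtheta(Z, log_ell):
    # Analytic dK / d(log ell); the jitter term has zero derivative.
    ell = np.exp(log_ell)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2) * (d2 / ell**2)

def log_lik(X, Z, log_ell):
    # log p(X | Z, theta) for the GP data term, constants dropped:
    # -D/2 log|K| - 1/2 Tr(K^{-1} X X^T)
    K = rbf_kernel(Z, log_ell)
    _, logdet = np.linalg.slogdet(K)
    Ki = np.linalg.inv(K)
    return -0.5 * X.shape[1] * logdet - 0.5 * np.trace(Ki @ X @ X.T)

def grad_log_lik(X, Z, log_ell):
    # The first term from the text:
    # -D/2 Tr(K^{-1} dK) + 1/2 Tr(K^{-1} X X^T K^{-1} dK)
    K, dK = rbf_kernel(Z, log_ell), dK_dtheta(Z, log_ell)
    Ki, D = np.linalg.inv(K), X.shape[1]
    return -0.5 * D * np.trace(Ki @ dK) + 0.5 * np.trace(Ki @ X @ X.T @ Ki @ dK)

rng = np.random.default_rng(0)
Z, X = rng.normal(size=(5, 2)), rng.normal(size=(5, 3))
eps = 1e-6
fd = (log_lik(X, Z, 0.3 + eps) - log_lik(X, Z, 0.3 - eps)) / (2 * eps)
print(grad_log_lik(X, Z, 0.3), fd)  # the two numbers should match closely
```

The small jitter added to $\mathbf{K}$ only stabilizes the inversion; it is constant in $\theta_j$ and therefore drops out of the derivative.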
GaussianFace Inference for Face Verification
In the section GaussianFace Model for Face Verification, when given a test feature vector $\mathbf{x}^*$, we need to estimate its latent representation $\mathbf{z}^*$. In this paper, we apply the same inference method as in (Urtasun and Darrell 2007). For convenience, a brief review is given here.

Given $\mathbf{x}^*$, $\mathbf{z}^*$ can be obtained by optimizing

$$\mathcal{L}_{Inf} = \frac{\|\mathbf{x}^* - \mu(\mathbf{z}^*)\|^2}{2\sigma^2(\mathbf{z}^*)} + \frac{D}{2}\ln\sigma^2(\mathbf{z}^*) + \frac{1}{2}\|\mathbf{z}^*\|^2, \qquad (3)$$

where

$$\mu(\mathbf{z}^*) = \mu + \mathbf{X}^{\top}\mathbf{K}^{-1}\mathbf{K}_*, \qquad \sigma^2(\mathbf{z}^*) = \mathbf{K}_{**} - \mathbf{K}_*^{\top}\mathbf{K}^{-1}\mathbf{K}_*,$$

$\mathbf{K}_*$ is the vector with elements $k_\theta(\mathbf{z}^*, \mathbf{z}_i)$ for all other latent points $\mathbf{z}_i$ in the model, and $\mathbf{K}_{**} = k_\theta(\mathbf{z}^*, \mathbf{z}^*)$.
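The following is a minimal sketch of this inference step, assuming an RBF kernel, mean-centered training features, and random placeholder data (none of which are taken from the paper): it computes $\mu(\mathbf{z}^*)$ and $\sigma^2(\mathbf{z}^*)$ as above and minimizes $\mathcal{L}_{Inf}$ with a general-purpose optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, Q, D = 20, 2, 5                     # points, latent dim, feature dim
Z = rng.normal(size=(N, Q))            # learned latent points (placeholder)
X = rng.normal(size=(N, D))            # training features (placeholder)
mu = X.mean(axis=0)                    # mean mu; X is centered below
Xc = X - mu

def k(a, b):                           # assumed RBF covariance k_theta(a, b)
    return np.exp(-0.5 * ((a - b) ** 2).sum(-1))

K = k(Z[:, None, :], Z[None, :, :]) + 1e-6 * np.eye(N)
Ki = np.linalg.inv(K)

def L_inf(z_star, x_star):
    Ks = k(Z, z_star)                  # vector of k_theta(z*, z_i)
    mu_z = mu + Xc.T @ Ki @ Ks         # predictive mean mu(z*)
    var_z = k(z_star, z_star) - Ks @ Ki @ Ks   # predictive variance sigma^2(z*)
    var_z = max(var_z, 1e-9)           # numerical floor
    return (np.sum((x_star - mu_z) ** 2) / (2 * var_z)
            + 0.5 * D * np.log(var_z)
            + 0.5 * np.sum(z_star ** 2))

x_star = rng.normal(size=D)
res = minimize(L_inf, x0=np.zeros(Q), args=(x_star,))
print(res.x)                           # estimated latent representation z*
```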
Further Validations: Shuffling the Source-Target Datasets
To further prove the validity of our model, we also treat Multi-PIE and MORPH in turn as the target-domain dataset, with the remaining datasets as the source-domain datasets.
Figure 1: The accuracy rate (%) of the GaussianFace model on (a) Multi-PIE and (b) MORPH, plotted against the number of source domains (SD), for GaussianFace-BC, GaussianFace-FE, and GaussianFace-FE + GaussianFace-BC.
The target-domain dataset is split into two mutually exclusive parts: one, consisting of 20,000 matched pairs and 20,000 mismatched pairs, is used for training; the other is used for testing. In the test set, similarly to the LFW protocol, we select 10 mutually exclusive subsets, where each subset consists of 300 matched pairs and 300 mismatched pairs. The experimental results are presented in Figure 1. Each time a dataset is added to the training set, the performance improves, even though the types of data in the training set are very different.
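For concreteness, below is a small sketch of this split protocol, assuming a pool of at least 23,000 pairs of each type (20,000 for training plus 10 x 300 for testing); the index arrays are placeholders standing in for actual image pairs.

```python
import numpy as np

# Sketch of the split protocol: 20,000 matched and 20,000 mismatched pairs
# for training, then 10 mutually exclusive test subsets of 300 matched +
# 300 mismatched pairs each, following the LFW-style protocol.
rng = np.random.default_rng(0)
matched = rng.permutation(23000)       # matched-pair indices (placeholder)
mismatched = rng.permutation(23000)    # mismatched-pair indices (placeholder)

train = {"matched": matched[:20000], "mismatched": mismatched[:20000]}
test_subsets = [
    {"matched": matched[20000 + 300 * s : 20000 + 300 * (s + 1)],
     "mismatched": mismatched[20000 + 300 * s : 20000 + 300 * (s + 1)]}
    for s in range(10)
]

# Accuracy would then be averaged over the 10 subsets, e.g.:
# mean_acc = np.mean([evaluate(model, sub) for sub in test_subsets])
```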
References

Urtasun, R., and Darrell, T. 2007. Discriminative Gaussian process latent variable model for classification. In ICML.