INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED
DATA
By
HESAMODDIN SALEHIAN
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2014
© 2014 Hesamoddin Salehian
To the memory of my mother, who devoted her life to my education and was always
truly my encouragement. To my wife, who has always been supportive and proud of my
work, and shared many challenges and sacrifices on the way to completing my PhD. To my
father, who taught me to persist and work hard throughout my life. And to my brothers, who
have always been my leaders in education and taught me to be ambitious with high
goals.
ACKNOWLEDGMENTS
First and foremost, I would like to thank my advisor, Dr. Baba C. Vemuri, for his
persistent support in making this dissertation possible. His creativity, excellent knowledge and
patience encouraged me all along my PhD study. This dissertation would not have been
completed without his support.
I would also like to thank my committee members, Dr. Arunava Banerjee, Dr. Anand
Rangarajan, Dr. William Hager and Dr. John Forder, for making valuable comments and
providing wonderful advice. Dr. Banerjee and Dr. Rangarajan have always been very
supportive and generous with their time, and taught me fundamental and advanced
machine learning concepts. Dr. Hager had a great impact on my knowledge of linear
algebra and matrix analysis. Dr. Forder kindly provided data for medical imaging
applications.
Also, special thanks to Dr. Jeffrey Ho for his excellent support throughout my PhD. I
had the honor of collaborating with him on several publications, and I would like to thank
him for his insightful guidance, dedication, and his wonderful attitude.
I cannot express my gratitude enough to my deceased mother, Zahra Khatibi, who
devoted her entire life to my education, and was always an excellent encouragement
and support all along this road. I never got a chance to say goodbye to her when
she passed away overseas, but her memory was the strongest encouragement to
overcome all the difficulties on the way to completing this degree and to make her wishes
come true.
I am very thankful to my kind wife, Pegah, who has always been proud of my
accomplishments and has been by my side through highest highs and lowest lows. I
cannot imagine how this dissertation could have been completed, without her persistent
help and support.
Special thanks to my father, Manouchehr Salehian, who has always been my role
model of hard work, strength and great character, and my older brothers, Hamid
and Hamed, who have truly been my leaders in education, in music and in sports, since I
was a little child.
Last, but not least, I want to thank my former lab-mate, Dr. Guang Cheng, for his
help and guidance and his excellent work in our several collaborations. In addition, I am
thankful to my friendly and knowledgeable colleagues in CVGMI Laboratory, Yuchen,
Meizhu, Ting, Dohyung, Wenxing, Yan, Yuanxiang, Jiaqi, Ted, Rudrasis, Monami, and
others.
The research in this dissertation was in part supported by NIH grant NS066340
to Dr. Baba C. Vemuri. I also received the Student Travel Award from MICCAI’14
Conference, and the Internship Program at Google. I gratefully acknowledge the
permission granted by IEEE and Springer to reuse materials from my previous
publications in this dissertation.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES
  AND ITS APPLICATIONS

  2.1 Background
  2.2 Incremental Stein Mean Computation
  2.3 Properties of Pn Equipped with the Stein Distance
      2.3.1 Global Non-Positive Curvature Spaces
      2.3.2 Discussion
  2.4 Experiments
      2.4.1 Performance of the Incremental Stein Center
      2.4.2 Application to K-means Clustering
      2.4.3 Application to Image Retrieval
      2.4.4 Application to Shape Retrieval

3 INCREMENTAL FRÉCHET MEAN ESTIMATOR ON SPHERE

  3.1 Background
  3.2 Preliminaries
      3.2.1 Riemannian Geometry of Sphere
      3.2.2 Gnomonic Projection
  3.3 Incremental Fréchet Mean Estimator on Sphere
      3.3.1 Angle Bisector Theorem
      3.3.2 Lower Bound for tn
      3.3.3 Upper Bound for tn
      3.3.4 Convergence of iFME
  3.4 Experiments
      3.4.1 Synthetic Experiments
      3.4.2 Application to Incremental Shape-Preserving Fréchet Mean of SPD Matrices

4 IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS
  TO MOVEMENT DISORDER CLASSIFICATION

  4.1 Background
  4.2 Preliminaries
      4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields
      4.2.2 Schild’s Ladder Approximation of Parallel Transport
  4.3 iPGA: Incremental Principal Geodesic Analysis
      4.3.1 Incremental Fréchet Mean Estimator
      4.3.2 Incremental Principal Geodesic Analysis on P_n^m
      4.3.3 Incremental Principal Geodesic Analysis on S^k
  4.4 Synthetic Experiments
      4.4.1 Manifold of SPD Tensor Fields
      4.4.2 Unit Sphere S^k
  4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls
      4.5.1 Classification Results using Deformation Tensor Features
      4.5.2 Classification Results using Shape Features

5 SUMMARY AND DISCUSSION

REFERENCES

BIOGRAPHICAL SKETCH
LIST OF TABLES

2-1 Average shape retrieval precision (%) for the MPEG7 database, for different Binary Code (BC) lengths.
2-2 Time (in seconds) comparison for shape retrieval.
4-1 Summary of Riemannian geometry of the space of n×n positive definite matrices, Pn, as well as the unit k-dimensional sphere, S^k.
4-2 Incremental PGA Algorithm for SPD Tensor Fields.
4-3 Incremental PGA Algorithm on Unit Sphere.
4-4 Classification results of iPGA, PGA, PCA using SPD tensor field features.
4-5 Classification results of iPGA, PGA, PCA using shape descriptor features.
LIST OF FIGURES

2-1 Schematic view of x1, x2, x3, x4 in Reshetnyak’s quadruple comparison.
2-2 Illustration of the proof of Reshetnyak’s inequality for the quadruple (I, D2↓, X3, X4↓), from the quadruple (I, D2↓, X3↓, X4↓).
2-3 Error comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.
2-4 Time comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.
2-5 Illustration of the incremental mean updates in K-means clustering.
2-6 Time comparison of the K-means clustering using various methods.
2-7 Error comparison of the K-means clustering.
2-8 Time consumption in initializing hashing functions.
2-9 Comparison of retrieval accuracy, for techniques specified in Fig. 2-8.
2-10 Example results of proposed retrieval system, based on the incremental Stein mean, with 640-bit binary codes.
3-1 Gnomonic Projection.
3-2 Use of Euclidean weights to update iFME in S^k does not necessarily correspond to the same weights in the tangent space.
3-3 Fréchet mean of samples on S^k does not necessarily coincide with the arithmetic mean of projected points in the tangent space.
3-4 Comparison of the ratio of variances (defined in Eq. 3–25) between iFME and FM, for different values of ϕ.
3-5 Time comparison between iFME and FM, for different values of ϕ.
3-6 Visual comparison of the mean tensor obtained from shape-preserving iFME on the product manifold (top row), and iFME applied on P(3) (bottom row).
3-7 Comparison of FA values between iFME on P(3) and iFME on the product manifold.
4-1 Illustration of Schild’s Ladder algorithm, described in Eq. 4–9.
4-2 Schematic illustration of the algorithm in Table 4-2.
4-3 Step-by-step illustration of the iPGA algorithm on S^k, summarized in Table 4-3.
4-4 Estimation of the projection πS(X) to the 1-D principal geodesic submanifold (red curve).
4-5 Time consumption and residual error comparison between iPGA (proposed) and PGA on P_n^m.
4-6 Mean angular error of iPGA estimates w.r.t. PGA on S^10000.
4-7 Time comparison of incremental and non-incremental PGA estimators on S^10000.
4-8 S0 images of a control and a Parkinson subject, along with the computed atlas.
4-9 Population of Substantia Nigra regions extracted from the control brain images.
4-10 Comparison of incremental (bottom row) and non-incremental (top row) results of (1) Fréchet means (left column), (2) PGA with the coefficient 1.5√λ (middle column), and (3) PGA with the coefficient 3√λ (right column).
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED
DATA
By
Hesamoddin Salehian
December 2014
Chair: Baba C. Vemuri
Major: Computer Engineering
Manifold-valued features are ubiquitous in many applications in computer vision,
machine learning and medical image analysis. Statistical analysis of a population
of such data is commonly encountered in many tasks in the aforementioned fields,
such as object recognition, shape analysis, facial expression analysis, longitudinal
studies quantifying, for example, disease related changes in structure/function, and
many others. In this dissertation we present a suite of efficient incremental tools and
techniques for statistical analysis of a given population of manifold-valued data. Most of
the existing tools suffer from computational and storage (memory) inefficiency, due to
the complexities introduced when dealing with manifold-valued features. Therefore, an
incremental technique is an appealing choice in these applications, because, when the
input population is augmented, one only needs to update the most recently estimated
statistical feature (e.g., mean, principal component, etc.), without having to re-compute it
from scratch.
We start the dissertation with efficient statistical analysis algorithms of a population
of Symmetric Positive Definite (SPD) matrices. In this regard, we first propose a novel
incremental algorithm to compute the mean of a population of SPD matrices, based on
the recently introduced Stein distance. It is known that the compute time of the Stein
distance between two SPD matrices is far less than that required for computing the
geodesic distance using the canonical GL-invariant metric. However, there is no closed
form solution for the Stein mean of a group of SPD tensors, which is defined as the
minimizer of the sum of squared Stein distances. Therefore, our incremental Stein mean
estimator plays a crucial role to speed up many applications dealing with SPD matrices.
In a wide variety of applications the input data lies on a sphere, which is an example
of a Riemannian manifold with constant positive sectional curvature. We develop a novel
incremental mean computation algorithm for features lying on a sphere, which is one
of the most widely used manifolds in science and engineering problems. Although
there are several convergence results in the recent literature for many manifestations of
an incremental mean estimator, these analyses are all limited to non-positively
curved spaces. We analytically show the convergence of the incremental method to
the true mean on the sphere, when the number of samples tends to infinity. To the best
of our knowledge, there is no similar convergence analysis in the literature for
positively curved spaces. We provide several synthetic and real data experiments to
illustrate the effectiveness and efficiency of the proposed incremental method.
Next, we continue the statistical analysis of manifold-valued data, with the
introduction of a novel incremental Principal Geodesic Analysis (PGA) algorithm.
PGA is the non-linear counterpart of the well-known Principal Component Analysis
(PCA), and is applicable to manifold-valued data. However, the existing PGA algorithms
are computationally very expensive, especially for very large datasets. Using our incremental
method, we show considerable gains in computation time over the standard PGA
algorithm, while retaining the same accuracy.
CHAPTER 1
INTRODUCTION
In many applications in computer vision, machine learning and medical imaging,
features do not belong to a vector space. For instance, having a unit norm is a constraint
which is frequently imposed on a group of vectors, but it is easy to verify that this
fundamental constraint is not preserved under linear operations. Therefore,
these types of data can be best interpreted as features belonging to some manifold. To
mention a few: Symmetric Positive Definite (SPD) matrices, which frequently appear in
computer vision and medical imaging, belong to a Riemannian manifold with negative
sectional curvature [47]; and most of the popular image features, such as SIFT [32], are
defined on spheres, due to normalization.
Statistical analysis of manifold-valued features is encountered in most of the
applications mentioned above, either to characterize the uncertainty of the noisy data,
or to compare and classify the observations in group difference and longitudinal studies.
However, due to the lack of the vector space structure, standard statistical analysis tools,
e.g., the arithmetic mean, Principal Component Analysis (PCA), etc., cannot be directly
applied to a group of these features. In this dissertation, we introduce computationally
efficient tools for statistical analysis of a given population of manifold-valued data. This is
achieved by developing incremental algorithms for computing the statistics.
Finding the mean of a population of manifold-valued features has received a lot of
attention in recent years. Computing the mean of data lying on a manifold can be
achieved through minimization of the sum of squared geodesic distances between the
manifold-valued data points and the unknown mean. Mathematically speaking, for a set
of given points, x_i, on a Riemannian manifold M,

\mu^* = \operatorname{argmin}_{\mu \in M} \sum_{i=1}^{n} d^2(x_i, \mu) \qquad (1–1)
This cost function is usually called the Fréchet function in the literature, and its global
minimizer is referred to as the Fréchet mean [15]. The uniqueness of the Fréchet mean
for general manifolds cannot be guaranteed, unless certain conditions are satisfied [52].
Consequently, any point that is a local minimizer of the above sum of squared distances
is known as Karcher mean. For Riemannian manifolds with non-positive sectional
curvatures, Cartan showed that the Fréchet mean always exists and is unique [38, p.
222]. Later, Grove and Karcher in [16] generalized Cartan’s theorem and proved
the uniqueness of this center of mass in general Riemannian manifolds, but only for
samples within a geodesic ball of small enough radius. We refer the interested reader
to [1, 15, 52] for further details.
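Although (1–1) generally lacks a closed form solution, the Fréchet mean can be computed by gradient descent on the Fréchet function, using the exponential and logarithm maps of the manifold. The following is a minimal sketch of that batch scheme in Python; the exp_map and log_map callables are hypothetical placeholders for the manifold at hand, not part of any specific library:

```python
import numpy as np

def frechet_mean(points, exp_map, log_map, step=0.5, iters=100, tol=1e-8):
    # Batch Fréchet mean by gradient descent on the Fréchet function (1-1).
    # exp_map(p, v): point reached from p along tangent vector v.
    # log_map(p, q): tangent vector at p pointing toward q.
    mu = points[0]
    for _ in range(iters):
        # The descent direction is the mean of the log maps of the samples.
        grad = sum(log_map(mu, x) for x in points) / len(points)
        if np.linalg.norm(grad) < tol:
            break
        mu = exp_map(mu, step * grad)
    return mu
```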
Among various examples of Riemannian manifolds, we are particularly interested in
the statistical analysis of features lying on one of these two well-known manifolds
which widely appear in computer vision, medical image analysis and machine learning
literature: (i) the space of (n × n) Symmetric Positive Definite (SPD) matrices which is
denoted by P(n), and is a Riemannian manifold with negative sectional curvature [47],
(ii) the k-dimensional unit sphere embedded in Rk+1 , which is denoted by Sk , and is a
standard instance of positively curved spaces [11].
Symmetric Positive Definite (SPD) matrices have been widely used in many
computer vision and medical imaging applications. For instance, structure tensors
and covariance descriptors are ubiquitous in computer vision problems, including but
not limited to classification, object tracking and recognition. Also, in medical imaging,
they are often encountered in Diffusion Tensor Imaging (DTI), Conductance Imaging,
elastography, etc. In DTI, they are used to characterize the diffusion of water molecules,
in elastography, the elasticity tensor is used to describe the material properties of the
tissue and so on and so forth. Cauchy-Green deformation tensors are another example
of such matrices which appear in fluid and solid mechanics.
On the other hand, spherical features are frequently used in many applications in
computer vision and machine learning. To mention a few: any probability distribution
function can be parameterized using its square root density, thus mapping it to a point
on a hyper-sphere in an infinite dimensional Hilbert space [45]; (3 × 3) rotation
matrices can be represented by unit quaternions, which are points on the unit sphere
S3 embedded in R4 [18]; also, any directional feature in R3, due to normalization, inherently
lies on the unit sphere S2 [33].
It is known that geodesic distance computation on P(n) is computationally
inefficient, especially for large matrix dimensions. The Stein distance is a recently
proposed alternative [9], which is more efficient. However, the lack of a closed form solution
for the Stein mean of more than two SPD matrices makes it less appealing, because
iterative optimization techniques must be employed to compute the mean. In Chapter
2, we present a novel incremental algorithm to compute the Fréchet mean of a group
of SPD matrices, based on the Stein distance. Through several synthetic and real
data experiments, we demonstrate significant time gains achieved by our incremental
method, compared to its non-incremental counterpart, while the accuracies of the two
methods are very similar.
Further, in Chapter 3, the incremental Fréchet mean estimator for data lying
on the sphere is presented. The existing incremental mean computation techniques in
the literature, e.g., [6, 21, 30, 46], are applicable to non-positively curved Riemannian
manifolds, while the sphere is a space with positive sectional curvature [11]. Therefore,
the convergence results in the aforementioned references are not directly applicable to this
case. We analytically prove the convergence of the incremental estimator to the true
Fréchet mean for symmetric distributions, when the number of samples tends to infinity.
To the best of our knowledge, there are no similar convergence results for positively
curved manifolds in the literature. We demonstrate the efficiency of our incremental
method in several applications.
Principal Component Analysis (PCA) is a well-known statistical analysis tool which
is widely used in the literature. The non-linear version of PCA is called Principal Geodesic
Analysis (PGA) and was first introduced in [14]. PGA has been applied to many
problems in the past decade. To mention a few, in medical imaging literature, it was
used in [13, 14, 57] and [55] for statistical shape analysis and tensor field classification,
respectively. Also, in computer vision it was applied to facial gender classification [53]
and motion compression [48]. We continue the statistical analysis of manifold-valued
data, by presenting a novel incremental PGA (iPGA) algorithm for both a population of
SPD tensor fields, as well as spherical features, in Chapter 4. To this end, we present a
novel iPGA method using the incremental Fréchet mean estimation technique presented
in [21], and reformulate the PGA algorithm in [55] in an incremental form. In order
to illustrate the effectiveness and accuracy of the proposed method we compare the
performance of iPGA and the batch-mode PGA via synthetic and real data experiments.
CHAPTER 2
INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES AND ITS
APPLICATIONS
2.1 Background
Finding the mean of data lying on Pn can be achieved through a minimization
process. More formally, the mean of a set of N data points x_i ∈ P_n is defined by

x^* = \operatorname{argmin}_{x} \sum_{i=1}^{N} d^2(x_i, x) \qquad (2–1)
where d is the chosen distance/divergence. Depending on the choice of d, different
types of means are obtained. Many techniques have been published on computing
the mean SPD matrix based on different kinds of similarity distances/divergences. In
[51], symmetrized Kullback-Leibler divergence was used to measure the similarities
between SPD matrices, and the mean was computed in closed-form and applied to
texture and diffusion tensor image (DTI) segmentation. Fréchet mean was obtained
by using the GL-invariant (GL denotes the general linear group, i.e., the group of n × n
invertible matrices) Riemannian metric on Pn and used for DTI segmentation in [28]
and for interpolation in [34]. Another popular distance is the so called Log-Euclidean
distance introduced in [12] and used for computing the mean. More recently, in [9] the
LogDet divergence was introduced and applied for tensor clustering and covariance
tracking. Each of these distances and divergences possesses its own properties
with regard to invariance to group transformations/operations. For instance, the
natural geodesic distance derived from the GL-invariant metric is GL-invariant. The
© 2013 IEEE. Reprinted with minor changes, with permission, from H. Salehian,
G. Cheng, B.C. Vemuri and J. Ho, "Recursive Estimation of the Stein Center of SPD
Matrices and Its Applications", In Computer Vision (ICCV), 2013 IEEE International
Conference on, pp. 1793-1800. IEEE, December 2013. [39]
LogEuclidean distance is invariant to the group of rigid motions, and so on. Among
these distances/divergences, the LogDet divergence was shown in [9] to possess interesting
bounding properties with regard to the natural Riemannian distance, and to be much
more computationally attractive for computing the mean. However, no closed form
expression exists for computing the mean using the LogDet divergence for more than
two matrices. When the number of samples in the population is large and the size of the
SPD matrices is large, it would be desirable to have a computationally more attractive
algorithm for computing the mean using this divergence.
An incremental form can effectively address this problem. An incremental formulation
leads to considerable efficiency in mean computation, because for each new sample,
all one needs to do is update the old estimate. Consequently, the algorithm only needs to
keep track of the most recently computed mean, while computing the mean in batch
mode requires one to store all previously given samples. This can prove to be quite
storage intensive for large problems. Thus, by using an incremental formula we can
significantly reduce the time and storage consumption. Recently, in [6] recursive
algorithms to estimate the mean SPD matrix based on the natural GL-invariant
Riemannian metric and symmetrized KL-divergence were proposed and applied to
the task of DTI segmentation. Also in [54] a recursive form of Log-Euclidean based
mean was introduced. In this chapter we present a novel incremental algorithm for
computing the mean of a set of SPD matrices, using the Stein metric.
The Jensen-Bregman LogDet (JBLD) divergence was recently introduced in [9] for
(n × n) SPD matrices. Compared to the standard approaches, the JBLD has a much
lower computational cost since the formula does not require any eigen decompositions
of the SPD matrices. Moreover, it has been shown to be useful for nearest
neighbor retrieval [9]. However, JBLD is not a metric on Pn , since it does not satisfy the
triangle inequality. In [44] the authors proved that the square root of JBLD is a metric,
which is called the Stein metric. Unfortunately, the mean of SPD matrices based on the
Stein metric cannot be computed in closed form for more than two matrices [5, 9].
Therefore, iterative optimization schemes are applied to find the mean for a given set
of SPD matrices. The computational efficiency of these iterative schemes suffers
considerably, especially when the number of samples and the size of the matrices are large. This
makes the Stein based mean inefficient for computer vision applications which deal with
huge amounts of data. In this chapter, we introduce an efficient incremental formula
to compute the Stein mean. To illustrate the effectiveness of the proposed algorithm, we
first show that applying the incremental Stein mean estimator to the task of K-means
clustering leads to significant gain in compute time when compared to using the batch
mode Stein center, as well as other recursive mean estimators based on aforementioned
distances/divergences. Furthermore, we develop a novel hashing technique which is a
generalization of the work in [20] to SPD matrices.
The key contributions are: (i) derivation of a closed form solution to the weighted
Stein center of two matrices which is then used in the formulation of the incremental
form for the Stein center estimation of more than two SPD matrices. (ii) Empirical
evidence of convergence of the incremental estimator of Stein mean to the true
Stein mean is shown. (iii) A new hashing technique for image indexing and retrieval
using covariance descriptors. (iv) Synthetic and real data experiments depicting
significant gains in computation time for SPD matrix clustering and image retrieval
(using covariance descriptor features), using our incremental Stein center estimator.
The rest of this chapter is organized as follows: in Section 2.2 we present the
incremental algorithm to find the Stein distance based mean of a set of SPD matrices.
Then in Section 2.3 we provide an overview of the important properties of Pn equipped
with the Stein distance. Section 2.4 includes empirical evidence of the convergence
of the incremental Stein mean estimator to the true Stein mean. Further, we present a set of
synthetic and real data experiments showing the improvements in compute time of SPD
matrix clustering and hashing.
2.2 Incremental Stein Mean Computation
The action of the general linear group of n × n invertible matrices (denoted by GL(n))
on Pn is defined as follows: ∀g ∈ GL(n), ∀X ∈
Pn , X [g] = gXg T , where T denotes the matrix transpose operation. Let A and B be
any two points in Pn . The geodesic distance on this manifold is defined by the following
GL(n)-invariant Riemannian metric:
d_R(A, B)^2 = \mathrm{trace}\big(\mathrm{Log}(A^{-1}B)^2\big), \qquad (2–2)
where Log is the matrix logarithm. The mean of a set of N SPD matrices based on the
above Riemannian metric is called the Fréchet mean, and is defined as
X^* = \operatorname{argmin}_{X} \sum_{i=1}^{N} d_R^2(X, X_i), \qquad (2–3)
where X ∗ is the Fréchet mean, and Xi are the given matrix-valued data. However,
computation of the distance using (2–2) requires an eigen decomposition of the matrix,
which for large matrices slows down the computation considerably. Furthermore, the
minimization problem (2–3) does not have a closed form solution in general (for more
than two matrices) and iterative schemes such as the gradient descent technique are
employed to find the solution.
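For reference, a direct transcription of (2–2) in Python might look as follows; the matrix logarithm call is where the expensive decomposition alluded to above takes place (the helper name is our own):

```python
import numpy as np
from scipy.linalg import logm

def gl_invariant_distance(A, B):
    # Eq. (2-2): d_R(A, B)^2 = trace(Log(A^{-1} B)^2)
    L = logm(np.linalg.solve(A, B))   # matrix logarithm of A^{-1} B
    return np.sqrt(np.trace(L @ L).real)
```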
Recently in [9], the Jensen-Bregman LogDet (JBLD) divergence was introduced to
measure similarity/dissimilarity between SPD matrices. It is defined as
D_{LD}(A, B) = \log\det\Big(\frac{A+B}{2}\Big) - \frac{1}{2}\log\det(AB), \qquad (2–4)
where A and B are two given SPD matrices. It can be seen that JBLD is much more
computationally efficient than the Riemannian metric, as no eigen decomposition
is required. JBLD is however not a metric, because it does not satisfy the triangle
inequality. However, in [44], it was shown that the square root of JBLD divergence is a
metric, i.e., it is non-negative definite, symmetric and satisfies the triangle inequality.
This new metric is called the Stein metric and is defined by

d_S(A, B) = \sqrt{D_{LD}(A, B)}, \qquad (2–5)
where DLD is defined in (2–4). Clearly, Stein metric can also be computed efficiently.
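As a concrete illustration, the Stein distance can be evaluated with a few determinant computations and no eigen decomposition. Below is a minimal Python sketch (the function name is ours; slogdet is used to avoid overflow in the determinants):

```python
import numpy as np

def stein_distance(A, B):
    # JBLD divergence (2-4): logdet((A+B)/2) - (1/2) * logdet(A B)
    _, logdet_mid = np.linalg.slogdet((A + B) / 2.0)
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_B = np.linalg.slogdet(B)
    jbld = logdet_mid - 0.5 * (logdet_A + logdet_B)
    # Stein metric (2-5): square root of the divergence; clip tiny negatives
    # caused by floating point round-off
    return np.sqrt(max(jbld, 0.0))
```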
Accordingly, the mean of a set of SPD tensors based on the Stein metric is defined by

X^* = \operatorname{argmin}_{X} \sum_{i=1}^{N} d_S^2(X, X_i). \qquad (2–6)
Let X_1, X_2, \ldots, X_N \in P_n be a set of SPD matrices. The incremental Stein mean can be
defined as

M_1 = X_1 \qquad (2–7)

M_{k+1}(w_{k+1}) = \operatorname{argmin}_{M} \; (1 - w_{k+1})\, d_S^2(M_k, M) + w_{k+1}\, d_S^2(X_{k+1}, M) \qquad (2–8)

where w_{k+1} = \frac{1}{k+1}, M_k is the old mean of k SPD matrices, X_{k+1} is the new incoming
sample and Mk+1 is the updated mean for k + 1 matrices. Note that (2–8) can be thought
of as a weighted Stein mean between the old mean and the new sample point, with the
weight being set to be the same as in Euclidean mean update.
Now, we show that (2–8) has a closed form solution for SPD matrices. Let A and B
be two matrices in Pn . The weighted mean of A and B, denoted by C , with the weights
being wa and wb such that wa + wb = 1, should minimize (2–8). Therefore, one can
compute the gradient of this objective function and set it to zero to find the minimizer C:

w_a\Big[\Big(\frac{C+A}{2}\Big)^{-1} - C^{-1}\Big] + w_b\Big[\Big(\frac{C+B}{2}\Big)^{-1} - C^{-1}\Big] = 0 \qquad (2–9)
Multiplying both sides of (2–9) by the matrices C, C + A and C + B in the right order yields:

C A^{-1} C + (w_b - w_a)\, C (I - A^{-1} B) - B = 0 \qquad (2–10)
It can be verified that for any matrices A, B and C in P_n satisfying (2–10), the matrices
A^{-1/2} C A^{-1/2} and A^{-1/2} B A^{-1/2} commute. In other words,

A^{-1} C A^{-1} B = A^{-1} B A^{-1} C \qquad (2–11)
Left multiplication of (2–10) by A^{-1} yields

A^{-1} C A^{-1} C + (w_b - w_a)\, A^{-1} C (I - A^{-1} B) = A^{-1} B \qquad (2–12)
Using the equality in (2–11), the equation above can be rewritten in matrix quadratic form:

\Big(A^{-1} C + \frac{w_b - w_a}{2}\,(I - A^{-1} B)\Big)^2 = A^{-1} B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1} B)^2 \qquad (2–13)

Taking the square root of both sides and rearranging yields

A^{-1} C = \sqrt{A^{-1} B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1} B)^2} - \frac{w_b - w_a}{2}\,(I - A^{-1} B) \qquad (2–14)
Therefore, the solution of (2–10) for C can be written in the following closed form:

C = A\Big[\sqrt{A^{-1} B + \frac{(w_b - w_a)^2}{4}\,(I - A^{-1} B)^2} - \frac{w_b - w_a}{2}\,(I - A^{-1} B)\Big] \qquad (2–15)
It can be verified that the solution in (2–15) satisfies Eq. (2–11). Therefore, Eq. (2–8) for
incremental Stein mean estimation can be rewritten as

M_{k+1} = M_k\Big[\sqrt{M_k^{-1} X_{k+1} + \frac{(2w_{k+1} - 1)^2}{4}\,(I - M_k^{-1} X_{k+1})^2} - \frac{2w_{k+1} - 1}{2}\,(I - M_k^{-1} X_{k+1})\Big] \qquad (2–16)
with wk+1 , Mk , Mk+1 and Xk+1 being the same as in (2–8).
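For concreteness, a direct transcription of the update (2–16) might look as follows; the function name is ours, and scipy.linalg.sqrtm is one plausible choice for the matrix square root:

```python
import numpy as np
from scipy.linalg import sqrtm

def incremental_stein_mean_update(M_k, X_new, k):
    # Weight on the new sample, as in Eq. (2-8): w_{k+1} = 1/(k+1)
    w = 1.0 / (k + 1)
    I = np.eye(M_k.shape[0])
    G = np.linalg.solve(M_k, X_new)   # M_k^{-1} X_{k+1}
    D = I - G
    # Closed form update of Eq. (2-16)
    M_next = M_k @ (sqrtm(G + ((2*w - 1)**2 / 4.0) * (D @ D))
                    - ((2*w - 1) / 2.0) * D)
    return np.real(M_next)  # sqrtm may leave negligible imaginary round-off

# Streaming usage: fold samples X_1, X_2, ... into the running mean
# M = X[0]; for k in range(1, N): M = incremental_stein_mean_update(M, X[k], k)
```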
2.3 Properties of Pn Equipped with the Stein Distance
In this section we briefly remark on the metric geometry of Pn equipped with the
Stein metric. Both the Stein metric dS and the GL(n)-invariant Riemannian metric dR
are GL(n)-invariant. However, their similarity does not go beyond this GL(n)-invariance.
In particular, we first show in this section that Pn equipped with the Stein metric is not
a global Non-Positive Curvature (NPC) space defined in [46]. Lack of this important
property makes it impossible to directly apply the convergence results of the incremental
mean estimators on global NPC spaces, provided in [46], to our incremental Stein mean
estimator. However, we will show that the Stein metric still shares important similarities
and features with global NPC spaces that can serve as a strong piece of evidence in favor
of the algorithm’s convergence.
2.3.1 Global Non-Positive Curvature Spaces
In [46], Sturm provided a study of probability theory on metric spaces of
non-positive curvature (so called global NPC spaces). An important requirement for
this type of space is that, aside from being a metric space, the distance between two
arbitrary points in the space M, denoted by d_M, can be realized as the arc-length of a
length-minimizing path (geodesic) joining the two points. Non-positive curvature, in this
broader context, is formulated using several important inequalities, the foremost
of which is the following inequality among three arbitrary points x, y, z ∈ M and the
geodesic path γ(t) joining x, y (with γ(0) = x, γ(1) = y):
d_M^2(z, \gamma(t)) \le (1-t)\, d_M^2(z, x) + t\, d_M^2(z, y) - t(1-t)\, d_M^2(x, y). \qquad (2–17)
This important inequality then implies the following well-known Reshetnyak’s quadruple
comparison: for all x_1, x_2, x_3, x_4 \in M, we have

d_M^2(x_1, x_3) + d_M^2(x_2, x_4) \le d_M^2(x_2, x_3) + d_M^2(x_1, x_4) + d_M^2(x_1, x_2) + d_M^2(x_3, x_4).
Figure 2-1. Schematic view of x1 , x2 , x3 , x4 in Reshetnyak’s quadruple comparison.
Reshetnyak’s quadruple comparison is a particularly useful result for deducing important
theorems for global NPC spaces (see [46] and the references therein). In particular,
for any global NPC space M and a set of samples, x1 , x2 , ... defined on M, its Fréchet
mean (or barycenter in [46]) will be a unique point on M. Besides, the incremental
mean estimator (similar to [6]) will asymptotically converge to the true Fréchet mean.
Proposition 2.1. Pn with Stein metric is not a global NPC space.
Proof. (Sketch) Proposition 2.3 in [46] states that if a metric space (M, dM ) is a global
NPC space, then it is a geodesic space. However, we show in the following proposition
that (P_n, d_S) is not a geodesic space.
Proposition 2.2. Let x, y be two arbitrary points in Pn . Their midpoints, ma , ms , with
respect to the affine-invariant Riemannian metric and the Stein metric, respectively,
coincide:
ma = ms .
However, in general, we have d_S(x, m_s) = d_S(y, m_s) but

d_S(x, m_s) \ne \frac{1}{2}\, d_S(x, y).
Proof. (Sketch) The coincidence of the midpoints is a consequence of [5]. The difference
between d_S(x, m_s) and ½ d_S(x, y) can easily be shown with a counter-example. Let x = 1
and y = 4, where x, y ∈ P_1; then the coincidence of the midpoints implies that m_s = m_a = 2.
But it can be verified that d_S(x, m_s) = d_S(y, m_s) = 0.2427, while ½ d_S(x, y) = 0.2362,
hence d_S(x, m_s) ≠ ½ d_S(x, y). Therefore, based on Proposition 1.2 in [46], (P_n, d_S) is not
a geodesic space.
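The counter-example is easy to verify numerically with the scalar form of the Stein distance (Eq. 2–18 in Lemma 1 below); the small sketch below, with our own helper name, reproduces the values quoted above:

```python
import math

def stein_scalar(x, y):
    # Stein distance between positive reals (matrices in P_1), Eq. (2-18)
    return math.sqrt(math.log((x + y) / (2.0 * math.sqrt(x * y))))

x, y, m = 1.0, 4.0, 2.0          # m is the common midpoint m_a = m_s
print(stein_scalar(x, m))        # ~0.2427  (= d_S(x, m_s))
print(stein_scalar(y, m))        # ~0.2427  (= d_S(y, m_s))
print(0.5 * stein_scalar(x, y))  # ~0.2362  (= d_S(x, y) / 2)
```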
However, the following proposition illustrates that Pn with Stein metric shares an
important similarity with global NPC spaces, although it is not one.
Proposition 2.3. P_n with the Stein metric satisfies Reshetnyak’s quadruple comparison. In
other words, for all x_1, x_2, x_3, x_4 \in P_n, the quadruple inequality displayed in Section 2.3.1 is satisfied.
To prove the theorem we will need to make use of the following lemmas.
Lemma 1. For any quadruple of positive real numbers (matrices in P_1), Reshetnyak’s
inequality holds.
Proof. For positive real numbers x and y, the Stein distance can be rewritten as

d_S(x, y) = \sqrt{\log\frac{x+y}{2\sqrt{xy}}} \qquad (2–18)

Therefore, Reshetnyak’s inequality can be expressed as the following sum of real log functions:

\log\frac{x_1+x_3}{2\sqrt{x_1 x_3}} + \log\frac{x_2+x_4}{2\sqrt{x_2 x_4}} \le \log\frac{x_1+x_2}{2\sqrt{x_1 x_2}} + \log\frac{x_2+x_3}{2\sqrt{x_2 x_3}} + \log\frac{x_3+x_4}{2\sqrt{x_3 x_4}} + \log\frac{x_4+x_1}{2\sqrt{x_4 x_1}}

\Rightarrow\; \log\frac{(x_1+x_3)(x_2+x_4)}{4\sqrt{x_1 x_2 x_3 x_4}} \le \log\frac{(x_1+x_2)(x_2+x_3)(x_3+x_4)(x_4+x_1)}{16\, x_1 x_2 x_3 x_4}

\Rightarrow\; \frac{(x_1+x_3)(x_2+x_4)}{4\sqrt{x_1 x_2 x_3 x_4}} \le \frac{(x_1+x_2)(x_2+x_3)(x_3+x_4)(x_4+x_1)}{16\, x_1 x_2 x_3 x_4}

\Rightarrow\; \Big(a + \frac{1}{a} + b + \frac{1}{b}\Big) \le \frac{1}{4}\Big(b + \frac{1}{b} + c + \frac{1}{c}\Big)\Big(a + \frac{1}{a} + c + \frac{1}{c}\Big) \qquad (2–19)

where a = \sqrt{x_1 x_2 / (x_3 x_4)}, b = \sqrt{x_1 x_4 / (x_2 x_3)} and c = \sqrt{x_1 x_3 / (x_2 x_4)}.

But for any positive number x, x + 1/x \ge 2. Therefore,

A = a + \frac{1}{a} \ge 2, \qquad B = b + \frac{1}{b} \ge 2, \qquad C = c + \frac{1}{c} \ge 2.
So, inequality (2–19) can be rewritten as

4(A + B) \le (C + A)(C + B) \;\Leftrightarrow\; g(C) := C^2 + C(A + B) + AB - 4(A + B) \ge 0 \qquad (2–20)

We already know that

g(2) = AB - 2(A + B) + 4 = (A - 2)(B - 2) \ge 0 \qquad (2–21)

since A \ge 2 and B \ge 2. On the other hand, g is increasing in C for C > 0, because A + B > 0.
Hence g(C) \ge g(2) \ge 0 for all C \ge 2, which shows the correctness of Eq. 2–20.
Lemma 2. For any quadruple of diagonal matrices in P_n, Reshetnyak’s inequality is
satisfied.

Proof. The previous result can be immediately extended to diagonal matrices
in P_n. Let X and Y be diagonal matrices, and let x_i and y_i be their diagonal elements,
respectively. Then the Stein distance between X and Y is given by

d_S^2(X, Y) = \sum_{i=1}^{n} \Big[\log\Big(\frac{x_i + y_i}{2}\Big) - \frac{1}{2}\log(x_i y_i)\Big] = \sum_{i=1}^{n} d_S^2(x_i, y_i)

Now, let X, Y, Z and W be diagonal matrices with diagonal elements x_i, y_i, z_i
and w_i, respectively. Based on Lemma 1, the inequality for each i is satisfied, resulting in n
inequalities for real numbers. Summing up these inequalities and using the identity above
completes the proof.
Lemma 3. Let A and B be two SPD matrices. There is a matrix P for which P^T A P = I
and P^T B P = D^↓, where I is the identity matrix and D^↓ is a diagonal matrix whose
diagonal elements are sorted in decreasing order.

Proof. (Based on the intuition from [44]) Let A = U \Lambda U^T, and define S = \Lambda^{-1/2}. Now
define C = S^T U^T B U S; since C is SPD, there exists an orthogonal matrix V such that
C = V D^↓ V^T, where D^↓ is diagonal with elements sorted in decreasing order.

The proof follows by setting P = U S V, because

P^T A P = V^T S^T U^T (U \Lambda U^T) U S V = V^T \Lambda^{-1/2} \Lambda \Lambda^{-1/2} V = I \qquad (2–22)

and, by construction of P,

P^T B P = V^T S^T U^T B U S V = V^T C V = D^↓ \qquad (2–23)
Proof of Proposition 2.3 Let A1 , A2 , A3 and A4 be the given quadruple. Based on
Lemma 3, there exists a matrix P such that P T A1 P = I and P T A2 P = D2↓ , where I is the
identity matrix and D2↓ is a diagonal matrix in which the diagonal elements are sorted in
decreasing order. Assume that P T A3 P = X3 and P T A4 P = X4 . Therefore, based on the
congruence invariance of the Stein metric, it will be sufficient to prove the inequality for
the new quadruple (I , D2↓ , X3 , X4 ).
Let X_i^↓ be the diagonal matrix with diagonal elements being the eigenvalues of X_i,
sorted in decreasing order. Based on Lemma 2, Reshetnyak’s inequality holds for the
quadruple (I, D_2^↓, X_3^↓, X_4^↓), as all these matrices are diagonal. Mathematically,

d_S^2(I, X_4^↓) + d_S^2(D_2^↓, X_3^↓) \le d_S^2(I, D_2^↓) + d_S^2(D_2^↓, X_4^↓) + d_S^2(X_4^↓, X_3^↓) + d_S^2(X_3^↓, I) \qquad (2–24)
Now, we want to show the inequality for (I, D_2^↓, X_3, X_4^↓), where X_3^↓ is replaced by X_3.
To this end, we make use of the congruence invariance property of the Stein metric.
There exists a matrix Q for which Q^T D_2^↓ Q = I and Q^T X_3^↓ Q = Y_3^↓, where I is the identity
and Y_3^↓ is a diagonal matrix with decreasing diagonal elements. Suppose I, X_3 and X_4^↓
are moved to Y_1, Y_3 and Y_4 by the congruent transform Q, respectively. Based on the
congruence invariance, the inequality holds for (Y_1, I, Y_3^↓, Y_4):

d_S^2(Y_1, Y_4) + d_S^2(I, Y_3^↓) \le d_S^2(Y_1, I) + d_S^2(I, Y_4) + d_S^2(Y_4, Y_3^↓) + d_S^2(Y_3^↓, Y_1) \qquad (2–25)
Moreover, it has been shown in [44] that for all pairs of SPD matrices, d_S(A, B) \ge
d_S(A^↓, B^↓), and in the special case, d_S(I, A) = d_S(I, A^↓). Accordingly, d_S(X_3^↓, X_4^↓) \le
d_S(X_3, X_4^↓) and d_S(I, X_3^↓) = d_S(I, X_3). Based on the congruence invariance property,
these two relations can be extended to d_S(Y_3^↓, Y_4) \le d_S(Y_3, Y_4) and d_S(Y_1, Y_3^↓) =
d_S(Y_1, Y_3). Furthermore, in the new quadruple we can see that d_S(I, Y_3^↓) =
d_S(I, Y_3). According to these relations we can replace Y_3^↓ by Y_3 in Eq. 2–25, which
implies that

d_S^2(Y_1, Y_4) + d_S^2(I, Y_3) \le d_S^2(Y_1, I) + d_S^2(I, Y_4) + d_S^2(Y_4, Y_3) + d_S^2(Y_3, Y_1) \qquad (2–26)
At the end, we can apply the group action Q^{-1} to get the original quadruple, which
proves the inequality for (I, D_2^↓, X_3, X_4^↓). The sequence of the above group actions is
illustrated in Fig. 2-2. Note that the curves between each pair of points are drawn
only for demonstration of the corresponding Stein distances, and they do not represent
geodesic curves.
Figure 2-2. Illustration of the proof of Reshetnyak’s inequality for the quadruple
(I , D2↓ , X3 , X4↓ ), from the quadruple (I , D2↓ , X3↓ , X4↓ ).
In the last step we prove the inequality for (I, D_2^↓, X_3, X_4), where X_4^↓ is replaced
by X_4. Similar to the above, we apply the congruence invariance in the following manner:
there exists a matrix R for which R^T X_3 R = I and R^T X_4^↓ R = Z_4^↓. The matrices I, X_4
and D_2^↓ are moved to Z_1, Z_4 and Z_2, respectively, under this transformation. Congruence
invariance implies that

d_S^2(Z_1, Z_4^↓) + d_S^2(Z_2, I) \le d_S^2(Z_1, Z_2) + d_S^2(Z_2, Z_4^↓) + d_S^2(Z_4^↓, I) + d_S^2(I, Z_1) \qquad (2–27)

In a similar fashion to the last part, we have d_S(Z_1, Z_4^↓) = d_S(Z_1, Z_4) and also
d_S(Z_2, Z_4^↓) \le d_S(Z_2, Z_4). Using these relations we end up with the following inequality:

d_S^2(Z_1, Z_4) + d_S^2(Z_2, I) \le d_S^2(Z_1, Z_2) + d_S^2(Z_2, Z_4) + d_S^2(Z_4, I) + d_S^2(I, Z_1) \qquad (2–28)

Applying the group action R^{-1} asserts that

d_S^2(I, X_4) + d_S^2(D_2^↓, X_3) \le d_S^2(I, D_2^↓) + d_S^2(D_2^↓, X_4) + d_S^2(X_4, X_3) + d_S^2(X_3, I) \qquad (2–29)

Finally, we use the group action P^{-1} to get the original quadruple:

d_S^2(A_1, A_4) + d_S^2(A_2, A_3) \le d_S^2(A_1, A_2) + d_S^2(A_2, A_4) + d_S^2(A_4, A_3) + d_S^2(A_3, A_1) \qquad (2–30)

which completes the proof. □
2.3.2 Discussion
If P_n equipped with the Stein metric were a global Non-Positive Curvature (NPC)
space, Sturm’s results would guarantee that M_{k+1} obtained from (2–16) converges to the unique Stein
expectation as k → ∞ [46]. Unfortunately, as shown in this section, it is not a geodesic
space, and consequently not a global NPC space. Therefore, the proof of convergence
for our case requires further effort. However, we present empirical evidence for 100
SPD matrices randomly drawn from a Log-normal distribution to indicate that the
incremental estimates of the Stein mean converge to the batch mode Stein mean (see
Fig. 2-3).
2.4 Experiments
In this section, we present several synthetic and real data experiments. All of the
execution times reported in this section are for experiments performed on a machine
with a 2.67GHz Intel Core i7 CPU and 8GB RAM.
2.4.1 Performance of the Incremental Stein Center
To illustrate the performance of the proposed incremental algorithm, we generate
100 i.i.d. samples from a Log-normal distribution [41] on P3, with the variance and
expectation set to 0.25 and the identity matrix, respectively. Then, we input these random
samples to the incremental Stein based mean estimator (ISM) and its non-incremental
counterpart (SM). To compare the accuracy of ISM and SM we compute the Stein
distance between the ground truth and the computed estimate. Further, the computation
time for each newly acquired sample is recorded. We repeat this experiment 20 times
and plot the average error and the average computation time at each step. Fig. 2-3
depicts the accuracies of ISM and SM in the same plot. It can be seen that for the
Figure 2-3. Error comparison of the incremental (red) versus non-incremental (blue)
Stein mean computation for data on P3 .
given 100 samples, as desired, the accuracy of the incremental and non-incremental
algorithms are almost the same. It should be noted that ISM computes the new mean
by simple matrix operations, e.g., summations and multiplications, which makes it very
fast for any number of samples. This means that the incremental Stein based mean
is computationally far more efficient, especially when the number of samples is very
large and the samples are input incrementally, for example as in clustering and some
segmentation algorithms.
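As an aside, one simple way to generate such Log-normal samples on P_n, sketched below, is to exponentiate random symmetric matrices; the symmetrization step and the use of sigma as the spread parameter are our assumptions, not the exact protocol of [41]:

```python
import numpy as np
from scipy.linalg import expm

def sample_log_normal_spd(n=3, sigma=0.25, rng=None):
    # Draw a Gaussian matrix, symmetrize it, and map it to P_n via the
    # matrix exponential; the result is an SPD sample centered at identity.
    rng = rng or np.random.default_rng()
    A = rng.normal(scale=sigma, size=(n, n))
    return expm((A + A.T) / 2.0)
```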
2.4.2 Application to K-means Clustering
In this section we evaluate the performance of our proposed incremental algorithm
applied to K-means clustering. The two fundamental components of the K-means
algorithm at each step are: (i) distance computation and (ii) the mean update. Due
to the computational efficiency involved in evaluating the Stein metric, the distances
can be efficiently computed. However, due to the lack of a closed form formula for
computing the Stein mean, the cluster center update is more time consuming. To tackle
this problem we employ our incremental Stein mean estimator.
Figure 2-4. Time comparison of the incremental (red) versus non-incremental (blue)
Stein mean computation for data on P3 .
To this end, at the end of each K-means iteration, only the matrices that changed
cluster membership in the previous iteration are considered. Then, each cluster center
is updated only by applying the changes imposed by the matrices that most recently
changed cluster membership. For instance, let C_1^i and C_2^i be the centers of the first
and second clusters at the end of the i-th iteration. Also, let X be a matrix which has
moved from the first cluster to the second one. Therefore, we can directly update C_1^i by
removing X from it to get C_1^{i+1}, and adding X to C_2^i in its update, to get C_2^{i+1}. This will
significantly decrease the computation time of the K-means algorithm, especially for
huge datasets. This process is shown in Fig. 2-5.
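A sketch of this bookkeeping is given below. The join step is the standard incremental update with Euclidean weight 1/(k+1); treating the leave step as the same closed form evaluated with the negative weight -1/(k-1) mirrors the exact Euclidean downdate and is our assumption here, not a formula derived in the text:

```python
import numpy as np
from scipy.linalg import sqrtm

def weighted_stein_mean(A, B, w_b):
    # Closed form weighted Stein mean of A and B, Eq. (2-15),
    # with weights w_a = 1 - w_b on A and w_b on B.
    I = np.eye(A.shape[0])
    G = np.linalg.solve(A, B)        # A^{-1} B
    D = I - G
    d = 2.0 * w_b - 1.0              # w_b - w_a
    return np.real(A @ (sqrtm(G + (d * d / 4.0) * (D @ D)) - (d / 2.0) * D))

def move_sample(X, src_center, src_count, dst_center, dst_count):
    # Sample X leaves the source cluster and joins the destination cluster.
    dst_center = weighted_stein_mean(dst_center, X, 1.0 / (dst_count + 1))
    # Negative-weight "removal" step: an assumption, see the note above.
    src_center = weighted_stein_mean(src_center, X, -1.0 / (src_count - 1))
    return src_center, src_count - 1, dst_center, dst_count + 1
```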
To illustrate the efficiency resulting from using our proposed incremental Stein
mean (ISM) update, we compared its performance to the non-incremental Stein
mean (SM), as well as the following three widely used mean computation techniques:
Fréchet mean (FM), symmetric Kullback-Leibler mean (KLsM) and Log-Euclidean
mean (LEM). Furthermore, to show the effectiveness of the Stein metric in K-means
distance computation, we included comparisons to the following recursive mean
Figure 2-5. Illustration of the incremental mean updates in K-means clustering.
estimators recently introduced in the literature: Recursive Log-Euclidean mean (RLEM) [54],
Incremental Fréchet Expectation Estimator (IFEE) and Recursive KLs mean (RKLsM)
in [6]. We should emphasize that for each of these mean estimators we used the
corresponding distance/divergence in the K-means algorithm.
The efficiency of the proposed K-means algorithm is investigated in the following
set of experiments. We tested our algorithm in three different scenarios namely, with
increasing (i) number of samples, (ii) matrix size, and (iii) number of clusters. For each
scenario we generated samples from a mixture of Log-normal distributions, where the
expectation of each component is assumed to be the true cluster center. To measure the
error in clustering, we compute the geodesic distance between each estimated cluster
center and its true value, and take the summation of error values over all clusters.
Fig. 2-6 depicts the time comparison between the aforementioned K-means
clustering techniques. It is clearly evident that the proposed method (ISM) is significantly
faster than the other competing methods, in all the aforementioned settings of the
experiment. There are two reasons that support the time efficiency of ISM: (i) incremental
update of the Stein mean, which is achieved via the closed form expression in Eq. 2–16,
(ii) fast distance computation, by exploiting the Stein metric, as the Stein distance is
computed using a simple matrix determinant followed by a scalar logarithm, while
the Log-Euclidean, GL-invariant Riemannian distances and the KLs divergence,
require complicated matrix operations, e.g., matrix logarithm, inverse and square
root. Consequently, it can be seen in Fig. 2-6 that for large datasets, the recursive
Log-Euclidean, Fréchet and KLs mean methods are as slow as their non-recursive
counterparts, since a substantial portion of time is consumed in the distance computation
task involved in the algorithm.
Furthermore, Fig. 2-7 depicts the error defined earlier, for each experiment. It can
be seen that, in all the cases, the accuracy of the ISM estimator is very close to the
other competing methods, and in particular to the non-incremental Stein mean (SM) and
Fréchet mean (FM). Thus, accuracy wise, the proposed ISM estimator is as good as the
best in the class but far more computationally efficient. These experiments verify that the
proposed incremental method is a computationally attractive candidate for the task of
K-means clustering in the space of SPD matrices.
2.4.3 Application to Image Retrieval
In this section, we present results of applying our incremental Stein mean estimator
to the image hashing and retrieval problem. To this end, we present a novel hashing
function which is a generalization of spherical hashing applied to SPD matrices. The
spherical hashing was introduced in [20] for binary encoding of large scale image
databases. However, it cannot be applied as-is (without modifications) to the space of
SPD matrices, since it was developed for inputs in a vector space. In this section
we describe our extensions to the spherical hashing technique in order to deal with
SPD matrices (which are elements of a Riemannian manifold with negative sectional
curvature).
Figure 2-6. Time comparison of the K-means clustering using various methods. Figure
(a) is the result for increasing number of clusters, with 1000 samples on P2 .
In (b) the database size is increased from 400 to 2000, with 5 clusters, on
P2 . Finally, in (c) the matrix dimension is increasing with 1000 samples and 3
clusters.
Figure 2-7. Error comparison of the K-means clustering using techniques specified in
Fig. 2-6. (a), (b) and (c) are the results for varying number of clusters,
number of samples and matrix dimensions, respectively.
Given a population of SPD matrices, our hashing function is based on the distances
to a set of fixed pivot points. Let P1 , P2 , ..., Pk be the set of produced pivot points for the
given population. The hashing function is denoted by H(X ) = (h1 (X ), ..., hk (X )), with X
being the given SPD matrix, and each h_i defined by

h_i(X) = \begin{cases} 0 & \text{if } \mathrm{dist}(P_i, X) > r_i \\ 1 & \text{if } \mathrm{dist}(P_i, X) \le r_i \end{cases} \qquad (2–31)
where dist(., .) denotes any distance defined on the manifold of SPD matrices. The
value of h_i(X) indicates whether the given matrix X is inside the geodesic ball formed
around Pi , with the radius ri . In our experiments we used the Stein distance defined in
Equation (2–5), because it is more computationally appealing for large datasets.
An appropriate choice of pivot points as well as radii is crucial to guarantee the
accuracy of the hashing. In order to locate the pivot points we have employed the
K-means clustering based on the Stein mean, which was discussed in Section 2.4.2.
Furthermore, the radius r_i is picked such that the hashing function h_i satisfies

\Pr[h_i(X) = 1] = \frac{1}{2} \qquad (2–32)
which guarantees that each geodesic ball contains half of the samples. Based on this
framework, each member of a set of (n × n) SPD matrices is mapped to a binary code
of length k. To measure similarity/dissimilarity between binary codes, the spherical
Hamming distance described in [20] is used.
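A minimal sketch of this encoding step is shown below (function names are ours; it reuses the stein_distance helper sketched earlier, and picks each radius as the median pivot-to-data distance, which is one simple way to satisfy (2–32) on the training set):

```python
import numpy as np

def fit_radii(pivots, data, dist):
    # Eq. (2-32): the median distance puts half the samples inside each ball
    return [np.median([dist(P, X) for X in data]) for P in pivots]

def hash_spd(X, pivots, radii, dist):
    # Eq. (2-31): bit i is 1 iff X lies inside the ball of radius r_i around P_i
    return np.array([1 if dist(P, X) <= r else 0
                     for P, r in zip(pivots, radii)], dtype=np.uint8)

# Usage sketch: pivots from Stein-mean K-means, dist = stein_distance
# radii = fit_radii(pivots, train_matrices, stein_distance)
# code = hash_spd(query_matrix, pivots, radii, stein_distance)
```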
In order to evaluate the performance of the proposed incremental Stein mean
algorithm in this image hashing framework, we first located the pivot points by exploiting
four of the K-means clustering techniques discussed in Section 2.4.2: ISM, SM, IFEE
and RLEM. Then, the retrieval precision for each method is measured and compared.
Experiments were performed on the COREL image database [29], which contains
10K images categorized into 80 classes. For each image a set of feature vectors was
computed, of the form

f = [I_r, I_g, I_b, I_L, I_A, I_B, I_x, I_y, I_{xx}, I_{yy}, |G_{0,0}(x, y)|, \ldots, |G_{2,1}(x, y)|] \qquad (2–33)
where the first three components represent the RGB color channels, the second three
encode the Lab color dimensions, and the next four specify the first and second order
gradients at each pixel. Further, as in [17], G_{u,v}(x, y) represents the response of a 2D
Gabor wavelet, centered at (x, y) with scale v and orientation u. Finally, for the set of N
feature vectors extracted from each image, f1 , f2 , ..., fN , a covariance matrix was created
using
\mathrm{Cov} = \frac{1}{N} \sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^T \qquad (2–34)
where \bar{f} is the mean feature vector. Therefore, from this dataset ten thousand 16×16 covariance
matrices were extracted.
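The per-image descriptor of (2–34) is a one-liner in practice; a sketch (with our own naming) follows:

```python
import numpy as np

def covariance_descriptor(features):
    # features: (N, d) array, one feature vector (Eq. 2-33) per pixel.
    f_bar = features.mean(axis=0)                  # mean vector f-bar
    centered = features - f_bar
    return centered.T @ centered / len(features)   # Eq. (2-34)
```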
To compare the time efficiency, we record the total time to compute the pivots,
and also to find the radii, for each aforementioned technique. Furthermore, a set of
1000 random queries was picked from the dataset, and for each query its 10 nearest
neighbors were retrieved based on the spherical Hamming distance. The retrieval
precision for each query was measured by the ratio of the number of correct matches to the
total number of retrieved images, namely 10. The total precision is then computed by
averaging these accuracies.
Fig. 2-8 shows the time taken by each method. As expected, it can be observed
that the incremental Stein mean estimator significantly outperforms other methods,
especially for longer binary codes. The incremental framework provides an efficient way
to update the mean covariance matrix. Further, IFEE, which is based on the GL-invariant
Riemannian metric, is much more computationally expensive than our incremental
Stein method. Fig. 2-9 depicts the accuracy of each technique. It can be seen that
the incremental Stein mean estimator provides almost the same accuracy as the
non-incremental Stein mean as well as IFEE. Therefore, the accuracy and computational
Figure 2-8. Time consumption in initializing hashing functions, for incremental Stein
mean (ISM), non-incremental Stein mean (SM), recursive LogEuclidean
mean (RLEM) and Incremental Fréchet expectation estimator (IFEE), over
increasing binary code lengths.
efficiency of our proposed method makes it an appealing choice for image indexing and
retrieval on huge datasets. Fig. 2-10 shows the outputs of the proposed system for four
sample queries. Note that all of the retrieved images shown in Fig. 2-10 belong to the
same class in the provided ground truth.
2.4.4 Application to Shape Retrieval
In this section, the image hashing technique presented in Section 2.4.3 is evaluated
in a shape retrieval experiment, using the MPEG-7 database [27], which consists of 70
different objects with 20 shapes per object, for a total of 1400 shapes. To extract the
covariance features from each shape, we first partition the image into four equal areas
and compute the 2 × 2 covariance matrices constructed from the (x, y) coordinates of the
edge points in each region. Finally, we combine these matrices into a single block
diagonal matrix, resulting in an 8 × 8 covariance descriptor.
Figure 2-9. Comparison of retrieval accuracy, for techniques specified in Fig. 2-8.
Table 2-1. Average shape retrieval precision (%) for the MPEG7 database, for different
Binary Code (BC) lengths.
BC Length   ISM     SM      IFEE    RLEM
64          60.67   62.10   61.46   61.15
128         63.59   64.65   64.69   63.23
192         69.69   69.63   70.10   68.19
256         73.13   73.13   73.84   70.14
We used the same methods as in Section 2.4.3 to compare the shape retrieval
speed and precision. Table 2-1 contains the retrieval precision comparison; it can be
seen that ISM provides roughly the same retrieval accuracy as IFEE, while Table 2-2
shows that ISM is significantly faster than all the competing methods.
Figure 2-10. Example results of the proposed retrieval system, based on the incremental
Stein mean, with 640-bit binary codes. The leftmost column in each row
represents the query image, and the rest of the columns show the 5 most
similar images retrieved. The retrieved images are sorted in increasing
order of Hamming distance to the query, where the Hamming distance is
given below each image.
Table 2-2. Time (in seconds) comparison for shape retrieval.
BC Length   ISM      SM       IFEE     RLEM
64          48.76    104.61   381.14   397.66
128         53.44    185.80   366.60   415.62
192         89.04    189.89   380.41   397.66
256         105.33   196.61   368.63   398.23
CHAPTER 3
INCREMENTAL FRÉCHET MEAN ESTIMATOR ON SPHERE
3.1 Background
In many applications in computer vision, machine learning and medical imaging,
the data lie on a sphere. To mention a few: directional data, which often appear in
computer vision, are points on the unit sphere S2 [33]; any 3 × 3 rotation matrix can be
parameterized by a unit quaternion, which can be represented by a point on the
3-dimensional unit sphere S3 [18]; and square-root density functions are points on a
hypersphere embedded in an infinite-dimensional Hilbert space [45].

In most of the aforementioned applications, mean computation is a fundamental
component, for instance in the interpolation and smoothing of Orientation Distribution
Functions (ODFs) [8], in the estimation of the mean rotation from several corresponding
pairs of points in multi-view geometry [18], and in the statistical analysis of directional
data [33].
The Riemannian geometry of the sphere has been well studied in the past decades
[11, 38]. Given a set of n points, X1, X2, ..., Xn, on the sphere, the Riemannian center of
mass, M, is defined as the (global) minimizer of the sum of squared geodesic distances,

M = argmin_Y Σ_{i=1}^{n} d²(Xi, Y)     (3–1)
where d(·, ·) is the intrinsic distance on the sphere. We will henceforth refer to this
center of mass as the Fréchet mean, as opposed to the Karcher mean frequently used
in the literature, because the Karcher mean often refers to a local minimizer, while the
Fréchet mean is the global minimizer of this cost function. For detailed discussions we refer
the reader to [1, 24]. It is known that there is no closed-form solution for this objective
function, the so-called Fréchet function, on the sphere, and iterative schemes like
gradient descent must be employed. Therefore, the task of Fréchet mean computation
can be computationally expensive, especially for very large datasets.

The material in this chapter, with minor changes, is to be submitted to the
Information Processing in Medical Imaging (IPMI) conference, Springer, 2015.
In this chapter, we propose an incremental method to estimate the Fréchet mean
of a set of samples on the sphere. The incremental update of the mean is
computationally efficient because, given the mean estimated for n samples, Mn, and a
new sample Xn+1, one can update the mean to Mn+1 in one shot; no iterative
optimization algorithm needs to be employed to compute the new mean from scratch.
Therefore, the incremental technique speeds up the computation significantly.
Moreover, an incremental method only needs to keep track of the most recently
computed Fréchet mean, which provides considerable savings in space. Although this
significant time/space efficiency comes at the cost of lower accuracy, the major part of
this chapter is devoted to showing that, in the limit over the number of samples, our
incremental technique converges to the true Fréchet mean for symmetric distributions.
In [6], the authors proposed an incremental Fréchet mean estimator for the
manifold of n × n SPD matrices, denoted by P(n), and provided a convergence analysis
of the incremental estimator to the true Fréchet mean. However, it is known that the
space of SPD matrices is a Riemannian manifold with non-positive sectional curvature
[34], while the sphere is an example of a positively curved Riemannian manifold [38].
This does indeed make a significant difference in proving convergence. In particular,
the following two items are the most important obstacles in extending the convergence
analysis in [6] to a similar estimator on the sphere:
First, the existence and uniqueness of the minimizer of the Fréchet function for a
set of samples on a complete Riemannian manifold with positive sectional curvature is
not guaranteed [1]. This is a consequence of the fact that the Fréchet function is not
necessarily convex on the entire manifold. Several authors have restricted the geodesic
ball containing the data points to guarantee the convexity of the Fréchet function
[1, 25]. It was shown in [25] that if the sample points belong to a geodesic ball of radius
π/2 on a unit sphere Sk, the (L2) minimizer of the Fréchet function will exist and will be
unique. Therefore, in the rest of the chapter we assume that the samples belong only to
the (northern) hemisphere of Sk.
Second, the well-known parallelogram law in Euclidean space has a counterpart,
the so-called semi-parallelogram law, in any complete negatively curved Riemannian
manifold M [46]: for any pair of points X, Y ∈ M, there exists a point M ∈ M such that

∀Z ∈ M,  d²(Z, M) ≤ (1/2) d²(X, Z) + (1/2) d²(Y, Z) − (1/4) d²(X, Y)     (3–2)
Note that equality is attained only in a Euclidean space. This inequality is of
crucial importance in the convergence analysis of the incremental Fréchet mean on
non-positively curved spaces [6, 21, 46]. However, for a positively curved space, e.g.,
the sphere, the opposite inequality holds; hence, further effort is required to prove the
convergence of the incremental Fréchet mean estimator on the sphere.
To the best of our knowledge, no convergence analysis has been proposed in the
literature for an incremental Fréchet mean estimator on any positively curved
Riemannian manifold. In this chapter, we show that the incremental estimator converges
to the true Fréchet mean in the limit over the number of samples. We employ the
well-known concept of the gnomonic projection from computer vision [22] to map the
sample points to a (linear) projection space, in order to simplify the convergence proof.
The rest of this chapter is organized as follows. In Section 3.2 we briefly introduce
the Riemannian geometry of the sphere as well as the gnomonic projection, and provide
the notation used in the rest of the chapter. The main convergence result will be
provided in section 3.3, along with the necessary theorems and lemmas. Finally, section
3.4 contains the experiments illustrating the efficiency and accuracy of our incremental
method.
3.2 Preliminaries
3.2.1 Riemannian Geometry of Sphere
Here, we provide a brief introduction to the Riemannian geometry of the sphere.
For more details, the reader is referred to [8, 45]. Let Sk denote the k-dimensional unit
sphere embedded in Rk+1, i.e., Sk = {X ∈ Rk+1 : ||X|| = 1}, where ||·|| is the L2 norm
of a vector. The sphere is not closed under vector operations (e.g., given X, Y ∈ Sk,
X + Y does not necessarily belong to Sk), hence it is not a vector space, but a
Riemannian manifold with constant positive sectional curvature [38]. Let T_X Sk denote
the tangent space of Sk at a point X. For any two tangent vectors U, V ∈ T_X Sk,
the inner product between U = [u1, u2, ..., uk+1] and V = [v1, v2, ..., vk+1] is defined by:

⟨U, V⟩ = Σ_{i=1}^{k+1} u_i v_i     (3–3)
Curve length on the sphere can be measured, and the geodesic distance between
any given points X, Y ∈ Sk can be computed by

d(X, Y) = cos⁻¹(⟨X, Y⟩)     (3–4)
The exponential map of a given vector V ∈ T_X Sk is defined by

Exp_X(V) = X cos(||V||) + (V/||V||) sin(||V||)     (3–5)
and the log map of Y ∈ Sk at any point X ∈ Sk is obtained by

Log_X(Y) = ((Y − X cos(ϕ)) / ||Y − X cos(ϕ)||) ϕ     (3–6)
where ϕ = cos⁻¹(⟨X, Y⟩) = d(X, Y). Using the exponential and log maps, the geodesic
curve between any pair of points X, Y ∈ Sk is given by

γ(t) = X #_t Y = Exp_X(t Log_X(Y))     (3–7)
with γ(0) = X and γ(1) = Y. The geodesic curve is an arc of the great circle (the
unit-radius circle centered at the origin) that connects X and Y.
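The maps above translate directly into code. Below is a minimal NumPy sketch of
Eqs. 3–4 to 3–7; the clamping and small-norm guards are numerical-safety choices of
ours, not part of the formulas.

```python
import numpy as np

def dist(x, y):                      # Eq. 3-4: geodesic distance
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def exp_map(x, v):                   # Eq. 3-5: exponential map at x
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return x * np.cos(nv) + (v / nv) * np.sin(nv)

def log_map(x, y):                   # Eq. 3-6, with phi = d(x, y)
    phi = dist(x, y)
    if phi < 1e-12:
        return np.zeros_like(x)
    u = y - x * np.cos(phi)
    return u / np.linalg.norm(u) * phi

def geodesic(x, y, t):               # Eq. 3-7: x #_t y
    return exp_map(x, t * log_map(x, y))
```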
Using the geodesic distance provided above, one can define the Fréchet mean of
a set of points on the sphere as the minimizer of the sum of squared geodesic
distances. Formally speaking, let X1, X2, ..., Xn ∈ Sk be n given points. Then, the
Fréchet mean is defined by:

µ* = argmin_{µ ∈ Sk} Σ_{i=1}^{n} d²(Xi, µ)     (3–8)
Let B(C, ρ) be the geodesic ball centered at C with radius ρ, i.e., B(C, ρ) = {Q ∈
Sk : d(C, Q) < ρ}. The authors in [1] showed that for any C ∈ Sk and for data samples
in B(C, π/2), the minimizer of the Fréchet function exists and is unique (and also
belongs to B(C, π/2)). Therefore, in the rest of the chapter, we assume that this
condition is satisfied for any set of given points, Xi. For simplicity, we are particularly
interested in samples belonging to the northern hemisphere, in which case C is the
north pole, e.g., C = [0, 0, 1] ∈ S2, and ρ = π/2. Note that, based on the strict inequality
in the definition of B(C, ρ), d(C, Q) < π/2, hence the equator is excluded from the
geodesic ball.
3.2.2 Gnomonic Projection
On the unit k-dimensional sphere Sk, the gnomonic projection of a point X ∈ Sk
is defined as the intersection of the tangent plane at the north pole with the line that
passes through the origin, O = [0, 0, ..., 0], and X [22]. For instance, in Fig. 3-1, xn+1
is the projection of Xn+1 ∈ Sk.
The gnomonic projection is not well-defined for points on the equator, because
they are projected to infinity in the tangent plane, but this will not affect our statistical
analysis, since we assume that the data points belong to the open hemisphere, with the
equator excluded.
Using this gnomonic projection, the geodesic curve between any pair of points,
X and Y , on the hemisphere is projected to a straight line connecting x and y in the
projection space [18], where x and y are the projections of X and Y, respectively. We
employed the gnomonic projection to simplify the statistical analysis of points on the
sphere.

Figure 3-1. Gnomonic Projection
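A minimal sketch of the projection just described, under the assumption (ours) that the
north pole is the last coordinate axis:

```python
import numpy as np

def gnomonic(X):
    """Project X in S^k (last coordinate > 0) to the tangent plane at the north pole."""
    return X[:-1] / X[-1]            # scale so the lifted point has last coordinate 1

def gnomonic_inverse(x):
    """Lift a point of the tangent plane back to the open northern hemisphere."""
    X = np.append(x, 1.0)
    return X / np.linalg.norm(X)
```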
3.3 Incremental Fréchet Mean Estimator on Sphere
With the background materials established so far, we are now ready to present
our incremental Fréchet Mean Estimator (iFME) on the sphere. The proposed method
is motivated by the idea in [6], which mirrors the Euclidean case: given the old mean,
Mn−1, and the new sample, Xn, define the new mean, Mn, as the weighted mean of Mn−1
and Xn with the weights being (n−1)/n and 1/n, respectively. From a geometric
viewpoint, this corresponds to choosing the point on the geodesic curve between Mn−1
and Xn with parameter t = 1/n.
Formally speaking, let X1, X2, ..., XN be a set of N samples on the sphere Sk, all of
which belong to the geodesic ball B(C, π/2), where C is the north pole. Also, let Mn be
the iFME estimate after the nth given sample, Xn, which is defined by:

M1 = X1     (3–9)
Mn = Mn−1 #_{1/n} Xn     (3–10)

where A #_t B is the geodesic curve from A to B (∈ Sk) parameterized by t, and 1/n is
our weighting scheme, which is henceforth called the Euclidean weight. In the rest of the
chapter, we will show that as the number of given samples, N, tends to infinity, the iFME
estimates converge to the Fréchet mean of the distribution from which the samples
are drawn.
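Eqs. 3–9 and 3–10 amount to a one-line update per sample. A minimal sketch, reusing
the exp_map/log_map/geodesic helpers sketched in Section 3.2.1:

```python
def ifme(samples):
    """Incremental Frechet mean estimate on the unit sphere (Eqs. 3-9, 3-10)."""
    M = samples[0]                              # Eq. 3-9: M_1 = X_1
    for n, X in enumerate(samples[1:], start=2):
        M = geodesic(M, X, 1.0 / n)             # Eq. 3-10: M_n = M_{n-1} #_{1/n} X_n
    return M
```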
Our strategy is based on the idea of projecting the spherical samples, Xi, onto the
tangent plane and performing the convergence analysis in this linear space, on the
projected samples xi, instead. We take advantage of the fact that the geodesic curve
between any pair of points on the hemisphere is projected to a straight line in the
tangent space at the north pole via the gnomonic projection [18]. According to the law
of large numbers in Euclidean space [3], the arithmetic mean of a set of samples
converges to the mean of the distribution from which the samples are drawn, as the
number of samples tends to infinity.
Despite the simplifications afforded by the gnomonic projection in the statistical
analysis of the iFME estimates on the sphere, there are two important obstacles that
must be considered. Suppose the true Fréchet mean of the input samples, Xi, is the
north pole. Then, it can be shown by counterexamples that:

(1) The use of Euclidean weights, 1/n, to update the iFME estimates on Sk does not
    necessarily correspond to the same weighting scheme between the old mean and
    the new sample in the projection space.

(2) The mean of the projected samples, xi's, does not necessarily coincide with the
    north pole.
The first fact above can be illustrated using two sample points on a unit circle
(S1), X1 = π/6 and X2 = π/3, whose midpoint is M = π/4. Then, the midpoint of the
gnomonic projections of X1 and X2, denoted by x1 and x2, is
m̂ = (tan(π/3) + tan(π/6))/2 = 1.1547 ≠ tan(π/4) = m (see Fig. 3-2).

Figure 3-2. Illustration of the counterexample showing that the use of Euclidean weights
to update iFME in Sk does not necessarily correspond to the same weights
in the tangent space.
To observe the second fact, consider three points, X1, X2, X3 ∈ S1, respectively
equal to π/4, π/12 and −π/3 (Fig. 3-3). Although the Fréchet mean of these points is
located at the north pole (c), the arithmetic mean of their gnomonic projections, ĉ, is
not. Nevertheless, in Lemma 1, we will show that for sample points which are
symmetrically distributed around the north pole, the mean of the projected samples
coincides with the north pole.
Lemma 1. For a set of samples Xi ∈ Sk which are symmetrically distributed
around the north pole, C, the arithmetic mean of the projected points, xi, in the tangent
plane at the north pole is the north pole. By symmetry we mean that ∀Xi ∈ X =
{X1, X2, ..., XN}, ∃Xj ∈ X such that Xi #_{1/2} Xj = C.

Proof Sketch. By the symmetry assumption on the input, one can divide the
samples in X into N/2 disjoint pairs of points on Sk, i.e., Pm = {Xm,1, Xm,2},
1 ≤ m ≤ N/2, such that ∀m, Xm,1 #_{1/2} Xm,2 = C, and ∪_{m=1}^{N/2} Pm = X. Then,
for the gnomonic projection of each pair of points, the midpoint coincides with the north
pole, using the fact that ∀ϕ, tan(ϕ) + tan(−ϕ) = 0. Therefore, the mean of the projected
points in the tangent plane reduces to the mean of N/2 points, all located at the north
pole. Hence, the result holds. ■

Figure 3-3. Demonstration of the counterexample to prove that the Fréchet mean of
samples on Sk does not necessarily coincide with the arithmetic mean of
projected points in the tangent space.
In the rest of this section, we assume that the population of samples is
symmetrically distributed around the Fréchet mean. Besides, without loss of generality,
we assume that the true Fréchet mean of the N given samples is located at the north
pole. Since the gnomonic projection space is centered at the north pole, this
assumption significantly simplifies our convergence analysis. However, a similar
convergence proof can be worked out for an arbitrary Fréchet mean, with the projection
space anchored at the mean location.
In what follows, we prove that the use of Euclidean weights, i.e., wn = 1/n, to
update the incremental Fréchet mean on the sphere corresponds to a set of weights in
the projection space, denoted henceforth by tn, for which the convergence of the
incremental mean to the true Fréchet mean can be shown.
3.3.1 Angle Bisector Theorem
The relation between the weights on the sphere and the corresponding weights in
the projection space can be obtained in closed form, depending upon the point at which
the projection space is anchored.
In Fig. 3-1, Mn and Mn+1 denote the iFME estimates for n and n + 1 given samples,
respectively, and Xn+1 denotes the (n + 1)st sample. Further, mn , mn+1 , xn+1 are the
corresponding points in the projection space. Based on the Angle Bisector Theorem [2]:
tn = ||mn − mn+1|| / ||xn+1 − mn+1||
   = (||O − mn|| / ||O − xn+1||) × (sin(d(Mn, Mn+1)) / sin(d(Mn+1, Xn+1)))     (3–11)
where d(·, ·) is the geodesic distance on the hemisphere. Note that in the standard law
of large numbers, tn = 1/n. In the next sections, we assume that the input samples, Xi,
lie within the geodesic ball B(C, ϕ), where 0 < ϕ < π/2. Then, we bound the values that
tn can possibly take, with respect to the radius ϕ.
3.3.2 Lower Bound for tn
To find the lower bound for tn, we find lower bounds for each fraction on the right
hand side of Eq. 3–11. The first term reaches its minimum value when Mn is located at
the north pole and Xn+1 is located on the boundary of the geodesic ball B(C, ϕ). In this
case, ||O − mn|| = 1 and ||O − xn+1|| = 1/cos(ϕ). This implies that:

||O − mn|| / ||O − xn+1|| ≥ cos(ϕ)     (3–12)
Next, note that based on the definition of iFME, the second fraction in 3–11 can be
rewritten as:

sin(d(Mn, Mn+1)) / sin(d(Mn+1, Xn+1)) = sin(d(Mn, Mn+1)) / sin(n × d(Mn, Mn+1))
                                      = 1 / U_{n−1}(cos(d(Mn, Mn+1)))     (3–13)

where U_{n−1}(x) is the Chebyshev polynomial of the second kind [42]. For any
x ∈ [−1, 1], the maximum of U_{n−1}(x) is reached at x = 1, for which U_{n−1}(1) = n.
Therefore, U_{n−1}(x) ≤ n and 1/U_{n−1}(x) ≥ 1/n. This implies that:

sin(d(Mn, Mn+1)) / sin(n × d(Mn, Mn+1)) = 1 / U_{n−1}(cos(d(Mn, Mn+1))) ≥ 1/n     (3–14)
From inequalities 3–12 and 3–14,
tn ≥ cos(ϕ)/n     (3–15)

Note that when ϕ tends to zero, cos(ϕ) converges to one and the above bound tends to
1/n, which is the Euclidean case. On the other hand, if ϕ tends to π/2, then cos(ϕ)
tends to zero and this bound becomes very small.
3.3.3 Upper Bound for tn
First, the upper bound for the first term in 3–11 is reached when Mn is on the edge
of the geodesic ball and Xn+1 is given at the north pole. Therefore,

||O − mn|| / ||O − xn+1|| ≤ 1/cos(ϕ)     (3–16)
Finding the upper bound for the sine term, however, is more involved. Note that the
maximum of the angle between OMn and OXn+1, denoted by α, is reached when Mn
and Xn+1 are both on the edge of the geodesic ball, i.e., α ≤ 2ϕ. Therefore, ϕ ∈ [0, π/2)
implies that α ∈ [0, π).
Further, it has been shown in the Appendix that the following inequality holds for any
α ∈ (0, π):

sin(nα/(n+1)) / sin(α/(n+1)) ≥ n cos²(α/2) ≥ n cos²(ϕ)     (3–17)

From 3–16 and 3–17,
tn ≤ 1/(n cos³(ϕ))     (3–18)
In summary, we showed that once the iFME algorithm is employed with Euclidean
weights on the sphere, the sequence of corresponding weights, tn, in the projection
space satisfies the following inequality. In the next section, we prove the main theorem
of convergence using these bounds.

cos(ϕ)/n ≤ tn ≤ 1/(n cos³(ϕ))     (3–19)
3.3.4 Convergence of iFME
So far, we have derived analytical bounds for the sequence of weights, tn, in the
projection space corresponding to the Euclidean weights on the sphere (Eq. 3–19). We
now prove the convergence of the iFME estimates to the true Fréchet mean of the
samples as the sample size tends to infinity. We first show that the incremental mean in
the projection space using tn is unbiased.
Theorem 1. Let x1, x2, ... be i.i.d. samples from a distribution in Rk. Also, let mn be
the incremental estimate corresponding to the nth given sample, xn, defined by: (i)
m1 = x1, (ii) mn = tn xn + (1 − tn) mn−1. Then, mn is an unbiased estimator of E[x].

Proof. For n = 2, m2 = t2 x2 + (1 − t2) x1, hence E[m2] = t2 E[x] + (1 − t2) E[x] = E[x].
Now, by the induction hypothesis, E[mn−1] = E[x]. Then, E[mn] = tn E[x] +
(1 − tn) E[x] = E[x], hence the result. ■
Theorem 2. Let var[mn] denote the variance of the nth incremental estimate
(defined above), with cos(ϕ)/n ≤ tn ≤ 1/(n cos³(ϕ)), ∀ϕ ∈ [0, π/2). Then, ∃p ∈ (0, 1]
such that var[mn]/var[x] ≤ (n^p cos⁶(ϕ))⁻¹.

First note that var[mn] = tn² var[x] + (1 − tn)² var[mn−1]. Since 0 ≤ tn ≤ 1, one can
see that var[mn] ≤ var[x] for all n. Besides, for each n, the maximum of the right hand
side is achieved when tn attains either its minimum or its maximum value. Therefore,
we need to prove the theorem for the following two values of tn: (i) tn = cos(ϕ)/n and
(ii) tn = 1/(n cos³(ϕ)). These two cases are discussed in Lemma 2 and Lemma 3,
respectively.
Lemma 2. With the same assumptions as in Theorem 2, and tn = 1/(n cos³(ϕ)),
∀n and ∀ϕ ∈ [0, π/2), the following inequality is satisfied: var[mn]/var[x] ≤ (n cos⁶(ϕ))⁻¹.

Proof. For n = 1, var[m1] = var[x], which yields the result since cos(ϕ) ≤ 1. Now,
assume by induction that var[mn−1]/var[x] ≤ ((n − 1) cos⁶(ϕ))⁻¹. Then,

var[mn]/var[x] = tn² + (1 − tn)² var[mn−1]/var[x]
             ≤ tn² + (1 − tn)² × 1/((n − 1) cos⁶(ϕ))
             ≤ 1/(n² cos⁶(ϕ)) + (1 − 1/(n cos³(ϕ)))² × 1/((n − 1) cos⁶(ϕ))
             ≤ 1/(n² cos⁶(ϕ)) + (1 − 1/n)² × 1/((n − 1) cos⁶(ϕ))
             = 1/(n² cos⁶(ϕ)) + (n − 1)/(n² cos⁶(ϕ)) = 1/(n cos⁶(ϕ))     (3–20)
■
Lemma 3. With the same assumptions as in Theorem 2, and tn = cos(ϕ)/n, ∀n and
∀ϕ ∈ [0, π/2), the following inequality is satisfied: var[mn]/var[x] ≤ n⁻ᵖ for some
0 < p ≤ 1.

Proof. For n = 1, var[m1] = var[x], which yields the result since cos(ϕ) ≤ 1. Now,
assume by induction that var[mn−1]/var[x] ≤ (n − 1)⁻ᵖ. Then,

var[mn]/var[x] = tn² + (1 − tn)² var[mn−1]/var[x]
             ≤ tn² + (1 − tn)² × 1/(n − 1)^p
             ≤ cos²(ϕ)/n² + ((n − cos(ϕ))²/n²) × 1/(n − 1)^p
             = ((n − 1)^p cos²(ϕ) + cos²(ϕ) − 2n cos(ϕ) + n²) / (n²(n − 1)^p)     (3–21)

Now, it suffices to show that the numerator of the above expression is not greater
than n^{2−p}(n − 1)^p. In other words:

(n − 1)^p cos²(ϕ) + cos²(ϕ) − 2n cos(ϕ) + n² − n^{2−p}(n − 1)^p ≤ 0     (3–22)

The above quadratic function of cos(ϕ) is non-positive when

n (1 − (n−1)^{p/2} √(((n−1)/n)^p + 1/n^p − 1)) / (1 + (n−1)^p)
    ≤ cos(ϕ) ≤
n (1 + (n−1)^{p/2} √(((n−1)/n)^p + 1/n^p − 1)) / (1 + (n−1)^p)     (3–23)

The inequality on the right is satisfied for all values of cos(ϕ). Besides, it is easy to see
that the function on the left hand side is increasing w.r.t. n, hence it attains its minimum
over all n > 1 at n = 2. This implies that:

1 − √(2^{1−p} − 1) ≤ cos(ϕ)
⇒ ϕ ≤ cos⁻¹(1 − √(2^{1−p} − 1))
⇒ 0 < p ≤ 1 − log₂[(1 − cos(ϕ))² + 1]     (3–24)

Note that p > 0 for all ϕ < π/2. ■
Proof of Theorem 2. With the above two results, it is easy to see that ∀ϕ ∈ [0, π/2),
there exists a p satisfying 0 < p ≤ 1 such that

- If tn = cos(ϕ)/n, then var[mn]/var[x] ≤ 1/n^p ≤ 1/(n^p cos⁶(ϕ)), because cos(ϕ) ≤ 1.
- If tn = 1/(n cos³(ϕ)), then var[mn]/var[x] ≤ 1/(n cos⁶(ϕ)) ≤ 1/(n^p cos⁶(ϕ)), because p ≤ 1.

These two pieces together complete the proof of convergence. ■
The inequality in Theorem 2 implies that as n → ∞, for any ϕ ∈ [0, π/2), the
variance of the iFME estimates in the projection space tends to zero. Besides, when ϕ
approaches π/2, the corresponding power of n, as well as cos(ϕ), becomes very small,
hence the rate of convergence becomes slower.
3.4 Experiments
3.4.1 Synthetic Experiments
We now evaluate the effectiveness of the iFME algorithm against the
non-incremental Fréchet Mean (FM) of a set of samples on the sphere, using
synthetically generated data. To this end, a set of samples, Xi ∈ S2, is generated on
the boundary of the geodesic ball B(C, ϕ), where ϕ < π/2 and C is the north pole.

Note that the value of ϕ controls the variance of the input samples. Further, the
variance of any set of samples on the boundary of B(C, ϕ) can be computed in closed
form and equals Var[X] = ϕ², since ∀i, d(Xi, C) = ϕ.
We tried 4 different values of ϕ, i.e., ϕ ∈ {0.70, 1, 1.21, 1.40}. For each value of ϕ,
a set of 20 points is randomly picked on the boundary of B(C, ϕ) and fed into both the
iFME and FM algorithms. Because of the randomness in generating the samples, we
repeated this experiment 100 times for each ϕ.
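A minimal sketch of this sampling scheme on S2, under the stated setup (C the north
pole); the function name and seeding are ours:

```python
import numpy as np

def sample_ball_boundary(phi, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    C = np.array([0.0, 0.0, 1.0])                        # north pole
    theta = rng.uniform(0.0, 2.0 * np.pi, n_samples)     # random tangent directions
    pts = []
    for t in theta:
        v = phi * np.array([np.cos(t), np.sin(t), 0.0])  # tangent vector, ||v|| = phi
        pts.append(C * np.cos(phi) + (v / phi) * np.sin(phi))  # Exp_C(v), Eq. 3-5
    return np.array(pts)                                 # every point has d(X_i, C) = phi

X = sample_ball_boundary(phi=0.70, n_samples=20)
```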
Let iFMn,i and FMn,i respectively denote the iFME and FM estimates of the mean
for n given samples in the ith trial, where 1 ≤ i ≤ 100 and 1 ≤ n ≤ 20. Therefore, for
each number of samples, we obtain a population of iFME and FM estimates from the
different trials. Accordingly, for both methods, we are able to compute the ratio of the
estimator variance to the data variance, i.e., for any 1 ≤ n ≤ 20,
iRn = (1/Var[X]) (1/100) Σ_{i=1}^{100} d²(iFMn,i, C)
Rn  = (1/Var[X]) (1/100) Σ_{i=1}^{100} d²(FMn,i, C)     (3–25)

where iRn and Rn are the ratios of variances for iFME and FM, respectively, and
Var[X] = ϕ² (see above).
Note that if iRn tends to zero for large values of n, then variance of iFME tends to
zero, hence the iFME estimates converge to the true Fréchet mean. We want to
emphasize that in a Euclidean space, Rn = iRn = 1/n for any population of sample
points. Besides, in [6, 21] it was shown that for non-positively curved spaces, e.g., P(n),
the inequality iRn ≤ 1/n holds for any n.

Figure 3-4. The comparison of the ratio of variances (defined in Eq. 3–25) between
iFME and FM, for different values of ϕ.
Fig. 3-4 illustrates the ratios defined in Eq. 3–25 for iFME and FM, over different
values of ϕ. It is evident from the plots that iFME's ratio is close to that of the
non-incremental version, FM, especially for smaller ϕ. In the rightmost column, ϕ = 1.4,
which is relatively close to π/2, and the input variance is very large. It can be seen that
even in this case, iFME is still competitive with FM with respect to accuracy.
Fig. 3-5 compares the time consumption of iFME and FM in the above
experiments. We need to emphasize that FM computes the mean iteratively, and its
speed depends upon the initial value. Therefore, in order to make a fair comparison, for
each new sample Xn, we used FMn−1 as the initial value of the gradient descent
method to compute the mean over the augmented dataset. From the figure, one can
see that iFME is significantly faster than FM, especially for a large number of samples.
More importantly, the time consumed by iFME remains roughly the same for all values
of ϕ, while FM gets considerably slower as the sample variance increases. This is not
surprising, because our incremental method updates the mean in one shot, while FM
re-computes the mean from scratch. It is also worth mentioning that for n = 2, the
Fréchet mean can be computed in closed form and no iterative scheme is needed. This
explains the jumps in the time plots of FM in Fig. 3-5.
Figure 3-5. The time comparison between iFME and FM, for different values of ϕ.
3.4.2 Application to Incremental Shape-Preserving Fréchet Mean of SPD Matrices
In this section, we illustrate the effectiveness and accuracy of iFME on the sphere
in the shape-preserving Fréchet mean computation of a group of 3 × 3 SPD matrices.
As described earlier, the space of n × n SPD matrices, denoted by P(n), is not a vector
space, but a Riemannian manifold with non-positive sectional curvature [47].
The Fréchet mean is defined as the minimizer of the sum of squared geodesic
distances on P(n) [34]. The authors in [6] proposed an incremental method to estimate
the Fréchet mean on P(n), and provided convergence results in the limit over the
number of samples. However, it is known that the Fréchet mean on P(n) does not
necessarily preserve the diffusion anisotropy, which depends on the shape of the
tensor. For a more detailed discussion, we refer the reader to Fig. 1 in [50]. In many
applications, including interpolation of diffusion MR data [4], it is more appealing to
compute a shape-preserving mean over the given population.
The idea of separating shape and orientation in diffusion data was introduced by
the authors in [35] and later in [4]. More recently, Wang et al. [50] applied this idea to
3 × 3 diffusion tensors and presented a Kalman filter on this product manifold.
The eigen-decomposition of a 3 × 3 SPD matrix, D, is D = UΛU T , where U belongs
to the space of 3 × 3 special orthogonal matrices, denoted by SO(3), and Λ is a diagonal
matrix, with positive elements. The matrix Λ controls the shape of the tensor, and U
models the orientation. Following the idea in [4], we break down the mean computation
of SPD matrices into separate mean computations for the orientations and the shapes.

We now present a novel incremental shape-preserving mean for a group of
3 × 3 SPD matrices. First, the mean of the positive diagonal elements of the shape
components can be computed incrementally, as the space of such matrices is
isomorphic to (R+)³. Besides, the elements of SO(3) can be parameterized by unit
quaternions, which belong to the northern hemisphere of the 3-dimensional unit sphere
S3 [18]; hence our iFME technique is applicable to these elements.
Formally speaking, let X1, X2, ... be a population of matrices in P(3). Also, assume
that U*_{n−1} and Λ*_{n−1}, respectively, denote the orientation and shape components
of the incremental mean of n − 1 given samples. Then,

U*_n = U*_{n−1} #_{1/n} Un     (3–26)
where Un is the orientation part of the sample Xn. Further, the mean of the shape part,
Λ*_n, is updated using the geometric mean of the diagonal elements.
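A minimal sketch of one such update step, reusing the spherical `geodesic` helper from
Section 3.2.1; the quaternion conversions via scipy, the hemisphere sign fix, and the
handling of eigenvector orientation are implementation choices of ours, and eigenvalue
ordering/matching subtleties are ignored for simplicity.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def update_shape_preserving_mean(U_mean, lam_mean, D, n):
    """Fold the n-th SPD sample D (3x3) into the running (orientation, shape) mean."""
    lam, U = np.linalg.eigh(D)                 # D = U diag(lam) U^T
    if np.linalg.det(U) < 0:
        U[:, 0] = -U[:, 0]                     # make U a proper rotation
    # Orientation: one geodesic step of size 1/n on S^3 (unit quaternions).
    q_mean = Rotation.from_matrix(U_mean).as_quat()
    q_new = Rotation.from_matrix(U).as_quat()
    if np.dot(q_mean, q_new) < 0:
        q_new = -q_new                         # stay on the same hemisphere
    q = geodesic(q_mean, q_new, 1.0 / n)       # Eq. 3-26 on the quaternion sphere
    U_mean = Rotation.from_quat(q).as_matrix()
    # Shape: incremental geometric mean of the positive eigenvalues.
    lam_mean = np.exp(((n - 1) * np.log(lam_mean) + np.log(lam)) / n)
    return U_mean, lam_mean
```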
We evaluated the accuracy of this novel incremental estimator in a synthetic data
experiment. A set of 150 SPD matrices in P(3) was randomly generated in the following
manner: the shape component of each tensor has diagonal elements 1 + r, 0.25 + r and
0.25 + r, where r ∈ [0, 0.1] is picked randomly. Moreover, the orientation part was
sampled from a log-Normal distribution on S3, centered at [1, 0, 0, 0], which
corresponds to the identity rotation matrix, with the variance set to 0.2.
We then input each sample SPD matrix to both iFME on P(3) and the proposed
shape-preserving iFME on the manifold of shapes and orientations, i.e., SO(3) × (R+)³.
For each increment, the means of both methods are computed and displayed in Fig.
3-6, along with the ground-truth mean. Furthermore, to compare the accuracy of the
two methods, we measured the Fractional Anisotropy (FA) of the output tensor at each
increment. The FA value of an SPD matrix is a scalar measuring the anisotropy of the
tensor, and is defined by

FA = √(1/2) √((λ1 − λ2)² + (λ2 − λ3)² + (λ1 − λ3)²) / √(λ1² + λ2² + λ3²)     (3–27)

Figure 3-6. Visual comparison of the mean tensor obtained from shape-preserving
iFME on the product manifold (top row), and iFME applied on P(3) (bottom
row). The rightmost column shows the ground truth.
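Eq. 3–27 is a direct computation from the eigenvalues; a minimal sketch:

```python
import numpy as np

def fractional_anisotropy(lam):
    """FA of an SPD matrix from its three eigenvalues (Eq. 3-27)."""
    l1, l2, l3 = lam
    num = np.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2)
    den = np.sqrt(2.0 * (l1 ** 2 + l2 ** 2 + l3 ** 2))
    return num / den
```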
Since the sample matrices were generated with very similar shapes, it is expected
that the FA value of the mean sample does not drastically change. Fig. 3-7 illustrates
the FA values computed from the iFME on P(3) as well as the iFME on the product
manifold. Although both of the incremental techniques are initialized equally, it is evident
that the FA values of iFME on P(3) rapidly drop after only 15 increments. In contrast,
the shape-preserving version of iFME remains close to the ground truth, for any number
of given samples. Fig. 3-6 demonstrates the significant differences between these two
estimates, visually.

Figure 3-7. Comparison of FA values between iFME on P(3), and iFME on the product
manifold. The ground truth is the incremental geometric mean of the
samples' FA values, at each increment.
Appendix
Lemma 1: For any angle α ∈ (0, π), the following inequality holds:

sin(nα/(n+1)) / sin(α/(n+1)) ≥ n cos²(α/2)     (3–28)

This lemma has been proven by Mr. Rudrasis Chakraborty.
Proof: Let

f(θ) = sin(nθ) − n cos²((n+1)θ/2) sin(θ),   θ ∈ (0, α/(n+1)), α ∈ (0, π), n ≥ 1     (3–29)

Then

f_θ = n cos(nθ) + 2n cos((n+1)θ/2) sin((n+1)θ/2) ((n+1)/2) sin(θ)
      − n cos²((n+1)θ/2) cos(θ)     (3–30)

Solving f_θ = 0 for θ ∈ [0, π/(n+1)), we get θ = 0. But f_θθ|_{θ=0} = 0, so we check
f_θθθ:

f_θθθ|_{θ=0} = −n³ + 1.5n(n+1)² + n > 0,   n ≥ 1

So f has a minimum at θ = 0 on [0, α/(n+1)), and

f|_{θ=0} = 0     (3–31)

Thus f ≥ 0 for n ≥ 1. For θ ∈ (0, α/(n+1)), sin(θ) > 0, so f/sin(θ) ≥ 0, where

f/sin(θ) = sin(nθ)/sin(θ) − n cos²((n+1)θ/2)

Hence,

sin(nθ)/sin(θ) − n cos²((n+1)θ/2) ≥ 0     (3–32)

and substituting θ = α/(n+1) yields the inequality in Eq. 3–28. ■
CHAPTER 4
IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS TO
MOVEMENT DISORDER CLASSIFICATION
4.1 Background
Principal Geodesic Analysis (PGA) captures variability in the data by using the
concept of principal geodesic subspaces which in this case are sub-manifolds of the
Riemannian manifold on which the given data lie. In order to achieve this goal, it is
required to know the Riemannian structure of the manifold, specifically, the geodesic
distance, the Riemannian log and exp maps and the Fréchet mean. For definitions of
Riemannian log and exp maps, the geodesic distance as well the Fréchet mean, see
section 4.2. PGA relies on use of the linear vector space structure of the tangent space
at the Fréchet mean by projecting all of the data points to this tangent space and then
performing standard PCA in this tangent space followed by projection of the principal
vectors back to the manifold using the Riemannian exp map yielding principal geodesic
subspaces. The representation of each manifold-valued data point in the principal
geodesic subspace has to be achieved by finding the closest (in the sense of geodesic
distance) point in the subspace to the given data point. This however involves a hard
optimization problem. The standard PGA however does a linear approximation by
projecting the given data point to the aforementioned tangent space, finding the closest
point to the principal linear subspace defined by the principal vectors in this tangent
space and then projecting it back to the manifold using the exp map [13, 14, 36]. Exact
PGA reported in literature by several researchers tries to solve this hard optimization
without the linear approximation [37, 43]. A generalization of the PGA reported in
[14, 36] to symmetric positive definite diffusion tensor fields was presented in [55]. In
[55], it was demonstrated that the Fréchet mean of several given (registered) tensor
fields computed using a voxel-wise Fréchet mean over the field is equivalent to the
Fréchet mean computed using the Fréchet mean in a product space representation
of the tensor fields. However, for higher order statistics, such as variance, such an
equivalence does not hold. This statement however holds for any manifold-valued fields,
not just for the diffusion tensor fields.

©2014 Springer. Reprinted with minor changes, with permission, from H. Salehian,
D. Vaillancourt, and B. C. Vemuri. "iPGA: Incremental Principal Geodesic Analysis with
Applications to Movement Disorder Classification." In Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2014, pp. 765-772. Springer International
Publishing, October 2014. [40]
When dealing with large amounts of data, specifically manifold-valued fields (e.g.,
diffusion tensor fields, deformation tensor fields, ODF fields, etc.), performing PGA can
be computationally quite expensive. That said, if we have a large number of tensor
fields to perform statistical analysis upon, and if we are provided the data incrementally,
rather than performing PGA from scratch in a batch mode each time a new data set is
provided, it would be computationally more efficient to perform PGA once for a given
data pool and then simply update the PGA each time a new data set is provided. To this
end, we propose a novel incremental PGA or iPGA algorithm in which we incrementally
update the Fréchet mean and the principal sub-manifolds rather than performing PGA in
a batch mode. This will lead to significant savings in computation time.
In the past few decades, the problem of incrementally updating PCA has been
well studied in the literature, e.g., [56]. However, these methods require the data
samples to live in a Euclidean space, and hence are not directly applicable to the PGA
problem. On the other hand, Cheng et al. [6] and Ho et al. [21] have reported
incremental algorithms for computing the Fréchet expectation of a given set of SPD
matrices. Besides, we have shown in the previous chapter the convergence of a similar
incremental Fréchet mean estimator for samples living on a sphere.
Our iPGA algorithm is a novel combination of the incremental Fréchet expectation
algorithm of [6, 21], and the linearized PGA in [55]. We apply our iPGA to two types
of popular manifold-valued data: (1) a group of SPD tensor fields derived from high
angular resolution diffusion magnetic resonance images (HARDI), and (2) a population
of samples on a high-dimensional unit sphere, derived from 3-D shapes. Based on
these two iPGA techniques, the classification of patients with movement disorders is
performed. We present synthetic experiments depicting the effectiveness and accuracy
of iPGA, compared to the batch-mode PGA. Furthermore, in the real data experiments,
given 67 human brain HARDI data, our iPGA based nearest neighbor classifier aims
to distinguish between controls, Parkinson’s Disease (PD) and Essential Tremor (ET)
patients. Our results demonstrate the effectiveness of iPGA, compared to the batch
mode scheme.
The rest of the chapter is organized as follows. Section 4.2 contains background
material on differential geometry of the space of SPD tensor fields. Further, a brief
review of the differential geometry of sphere is provided. Next, in section 4.3 the
proposed iPGA techniques applicable to both SPD tensor fields, and the spherical
samples, are described in detail. Moreover, sections 4.4 and 4.5 contain synthetic and
real data experiments, comparing PGA and iPGA with respect to computation time and
accuracy.
4.2 Preliminaries
4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields
The Riemannian geometry of k−dimensional unit sphere, Sk , has been discussed
in section 3.2. Table 4-1 summarizes the Riemannian operations on Sk , as well as the
space of n × n SPD matrices, Pn , for convenience.
Based on the Riemannian geometry of Pn summarized in Table 4-1, we now
briefly introduce the basic relevant concepts of the Riemannian geometry of the space
of SPD tensor fields, denoted by Pnm, following the notation from [55]. For details on
the Riemannian geometry of Pn we refer the reader to [13]. Pn is the space of n × n
symmetric positive definite (SPD) matrices, which is a Riemannian manifold with GL(n),
the general linear group, as the symmetry group. This can be easily generalized to
Pnm, the product space of Pn, using the product Riemannian structure. In particular,
expressions for the Riemannian geodesic distance, log and exponential maps can be
easily derived. Specifically, the group GL(n)^m acts transitively on Pnm, with the group
action specified by
Φ_G(X) = (G1 X1 G1^T, ..., Gm Xm Gm^T)     (4–1)
where each Gi ∈ GL(n) is an n × n invertible matrix and Xi is an n × n positive-definite
matrix. The tangent space of Pnm at any point can be identified with Sym(n)^m,
because the tangent space of a product manifold is the product of the tangent spaces.
Let Y, Z ∈ T_M Pnm be two tangent vectors at M ∈ Pnm. The inner product between
the two vectors using the product Riemannian metric is given by

⟨Y, Z⟩_M = Σ_{i=1}^{m} tr(Yi Mi⁻¹ Zi Mi⁻¹)     (4–2)
The Riemannian exponential map at M maps a tangent vector Y to a point in Pnm, and
is given by

Exp_M(Y) = (G1 exp(G1⁻¹ Y1 G1⁻ᵀ) G1^T, ..., Gm exp(Gm⁻¹ Ym Gm⁻ᵀ) Gm^T)     (4–3)

where Gi ∈ GL(n) is such that M = (G1 G1^T, ..., Gm Gm^T).
Given X ∈ Pnm, the log map at M is given by

Log_M(X) = (G1 log(G1⁻¹ X1 G1⁻ᵀ) G1^T, ..., Gm log(Gm⁻¹ Xm Gm⁻ᵀ) Gm^T)     (4–4)
Using this definition of the log map in Pnm, the geodesic distance between M and X
is computed as

d(M, X) = ||Log_M(X)|| = √( Σ_{i=1}^{m} tr(log²(Gi⁻¹ Xi Gi⁻ᵀ)) )     (4–5)
Table 4-1. Summary of the Riemannian geometry of the space of n × n positive definite
matrices, Pn, and of the unit k-dimensional sphere, Sk. In the table, X, Y ∈ Pn
and U, V ∈ T_X Pn; similarly, x, y ∈ Sk and u, v ∈ T_x Sk.

Pn:
  ⟨U, V⟩_X = tr(U X⁻¹ V X⁻¹)
  Exp_X(U) = X^{1/2} exp(X^{−1/2} U X^{−1/2}) X^{1/2}
  Log_X(Y) = X^{1/2} log(X^{−1/2} Y X^{−1/2}) X^{1/2}
  d_{Pn}(X, Y) = √(tr(log²(G⁻¹ X G⁻ᵀ))), where Y = G G^T
  γ(t) = Exp_X(t Log_X(Y))
  X̂ = arg min_{X ∈ Pn} (1/N) Σ_{i=1}^{N} d_{Pn}²(X, Xi)

Sk:
  ⟨u, v⟩ = Σ_{i=1}^{k+1} u_i v_i
  Exp_x(u) = x cos(||u||) + (u/||u||) sin(||u||)
  Log_x(y) = ((y − x cos(ϕ))/||y − x cos(ϕ)||) ϕ,  ϕ = cos⁻¹(⟨x, y⟩)
  d_{Sk}(x, y) = cos⁻¹(⟨x, y⟩)
  α(t) = Exp_x(t Log_x(y))
  x̂ = arg min_{x ∈ Sk} (1/N) Σ_{i=1}^{N} d_{Sk}²(x, xi)
Using the expression for the geodesic distance given above, we can define the
(intrinsic) mean of N tensor fields as the tensor field which minimizes the following sum
of squared geodesic distances:

M = arg min_{M ∈ Pnm} (1/N) Σ_{i=1}^{N} d(M, Xi)²     (4–6)
Since the Fréchet mean is unique on Pn [13], this shows that M will be unique as well,
and it can be computed using an iterative algorithm similar to the one in [13]. After
obtaining the intrinsic mean M of the input tensor fields X1 , ... , XN , we compute the
modes of variation using the PGA algorithm for tensor fields described in [55].
4.2.2 Schild’s Ladder Approximation of Parallel Transport
Given two points X0 and Xp on a Riemannian manifold M, with the geodesic curve
γ(t) such that γ(0) = X0 and γ(1) = Xp , the Schild’s Ladder algorithm approximates the
parallel transport of any vector V ∈ TX0 M along γ [31].
This algorithm requires the geodesic curve, log-map and exp-map defined on the
manifold, hence it is applicable to both Sk and Pnm, using their corresponding
Riemannian operations summarized in Table 4-1.
Figure 4-1. Illustration of Schild’s Ladder algorithm, described in Eq. 4–9.
Let X1, X2, ..., Xp−1 be some intermediate points on γ(t). Then, the parallel transport
of V to T_{Xp} M, denoted by Γ_{X0→Xp}(V), is approximated by:

A0 = Exp_{X0}(V)     (4–7)
Bi = Xi #_{1/2} Ai−1,   Ai = Xi−1 #_2 Bi,   ∀ 1 ≤ i ≤ p     (4–8)
Γ_{X0→Xp}(V) = Log_{Xp}(Ap)     (4–9)
where X #_{1/2} Y denotes the midpoint of the geodesic curve between X and Y, and
X #_2 Y is obtained by following the geodesic from X through Y for twice its length. For
more information, the reader is referred to [19, 31]. Figure 4-1 illustrates the algorithm
described above.
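A minimal sketch of the ladder in Eqs. 4–7 to 4–9, written here for the unit sphere and
reusing the exp_map/log_map/geodesic helpers from Section 3.2.1; `waypoints` is the
list X0, X1, ..., Xp of points along the base geodesic.

```python
def schilds_ladder(waypoints, V):
    """Approximate the parallel transport of V from T_{X0} to T_{Xp}."""
    A = exp_map(waypoints[0], V)                 # Eq. 4-7: A_0 = Exp_{X0}(V)
    for i in range(1, len(waypoints)):
        B = geodesic(waypoints[i], A, 0.5)       # B_i = X_i #_{1/2} A_{i-1}
        A = geodesic(waypoints[i - 1], B, 2.0)   # A_i = X_{i-1} #_2 B_i  (Eq. 4-8)
    return log_map(waypoints[-1], A)             # Eq. 4-9
```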
On the manifold of SPD matrices, the parallel transport from an arbitrary point X0 to
the identity matrix, I, is equivalent to the transformation by the group action [26]. Therefore,
for the case of SPD tensor fields, we apply the group action wherever applicable, as it is
more computationally efficient and accurate compared to the parallel transport using the
Schild’s ladder.
4.3 iPGA: Incremental Principal Geodesic Analysis
In order to develop the incremental Principal Geodesic Analysis on the space of
SPD tensor fields and the unit sphere, we first need to develop incremental Fréchet
mean update techniques applicable to tensor fields and the spherical samples. We will
address this sub-problem in the following paragraphs.
4.3.1 Incremental Fréchet Mean Estimator
As described earlier, the Fréchet mean of a group of manifold valued features is
defined as the minimizer of the sum of squared geodesic distances. Unfortunately,
this minimization problem does not have a closed-form solution for a population of size
greater than two on most Riemannian manifolds, including Pnm and Sk.
In Section 3.3 we introduced an incremental algorithm to estimate the Fréchet
mean of a group of samples on the sphere, and proved its convergence to the mean of
the distribution the samples are drawn from, as the number of samples tends to infinity.
Similarly, in [21], the authors presented an incremental Fréchet mean estimator,
IFME, for SPD matrices (not SPD tensor fields). Given the estimated Fréchet mean of
the first k SPD tensors, denoted by Mk, and the new sample Xk+1, IFME locates the
new mean, Mk+1, on the geodesic curve between Mk and Xk+1 using the Euclidean
weight. More formally,

Mk+1 = Exp_{Mk}(t Log_{Mk}(Xk+1)),  where t = 1/(k + 1)     (4–10)
We now generalize the above incremental Fréchet mean formula to the case
where the data samples are SPD tensor fields (not just SPD matrices), using the exp
and log maps defined earlier on the product manifold of SPD tensor fields. Let
Mk = (Mk,1, ..., Mk,m) denote the estimated Fréchet mean of the first k samples, and
Xk+1 = (Xk+1,1, ..., Xk+1,m) be the new given tensor field. Based on the IFME algorithm
and the product space representation chosen here, it is straightforward to generalize
IFME to the product space of tensor fields Pnm. Thus, the new mean is obtained by
updating the old mean via the following equation:

Mk+1 = (Exp_{Mk,1}((1/(k+1)) Log_{Mk,1}(Xk+1,1)), ...,
        Exp_{Mk,m}((1/(k+1)) Log_{Mk,m}(Xk+1,m)))     (4–11)
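A minimal sketch of this update for a single SPD matrix (Eq. 4–10; the tensor-field
update of Eq. 4–11 applies the same step voxel-wise), using matrix exp/log from scipy
as an implementation choice:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def spd_geodesic_step(M, X, t):
    """Return Exp_M(t Log_M(X)) on P(n), i.e., the point M #_t X."""
    G = np.real(sqrtm(M))       # M = G G^T with G symmetric
    Gi = inv(G)
    return G @ expm(t * np.real(logm(Gi @ X @ Gi.T))) @ G.T

def ifme_spd(samples):
    M = samples[0]
    for k, X in enumerate(samples[1:], start=1):
        M = spd_geodesic_step(M, X, 1.0 / (k + 1))  # Eq. 4-10 with t = 1/(k+1)
    return M

# For commuting samples this reduces to the matrix geometric mean:
samples = [np.eye(3) * s for s in (1.0, 2.0, 4.0)]
M = ifme_spd(samples)           # approximately 2 * I
```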
4.3.2 Incremental Principal Geodesic Analysis on Pnm
In this section we will develop the incremental version of the PGA algorithm in [55]
applicable to SPD tensor fields. Very briefly, in [55], the PGA computation problem on
the space of SPD tensor fields is approximated by applying PCA in the tangent plane
anchored at the Fréchet mean, in the following manner. First, the Fréchet mean, M, of
the set of tensor fields is computed. Next, each tensor field is projected to the tangent
space at the mean (i.e., T_M Pnm) using the log map, then transformed to the tangent
space at the identity. This tangent space is a standard Euclidean space, denoted by
T_I Pnm, where I is the tensor field consisting of m identity matrices. The ordinary PCA
algorithm is then performed in T_I Pnm, and the obtained principal components are
transformed back to T_M Pnm. Note that this operation of transforming to the identity is
crucial, since the inner product defined on Pnm corresponds to the Euclidean inner
product only at the identity I.
Equipped with the incremental Fréchet mean estimator, IFME, on the space of SPD
tensor fields, we are ready to reformulate this algorithm in an incremental form. In a
similar fashion, each SPD tensor field is projected using the log map and transformed
(by applying the group action) to T_I Pnm. More formally, let Xi denote the ith tensor
field, and Mk be the Fréchet mean of the k given samples. Define
Yi = Log_{Mk}(Xi) ∈ T_{Mk} Pnm. Each Yi is then transformed to T_I Pnm to obtain Zi.
Accordingly, the data matrix at T_I Pnm, denoted by Ak, can be constructed, where its
ith column corresponds to Zi in vectorized form.
In our algorithm, we keep track of the data matrix, Ak, at T_I Pnm. Let Xk+1 and Mk
denote the new SPD tensor field and the Fréchet mean over all previous k tensor fields,
respectively. Then, to update the principal components, we need to augment the data
matrix with an appropriate vector which represents Xk+1 in T_I Pnm.

In order to find this vector, we first locate the new Fréchet mean Mk+1 using Eq.
4–11, then project Xk+1 to the tangent space at Mk+1, i.e., Yk+1 = Log_{Mk+1}(Xk+1).
This tangent vector is moved to T_I Pnm using the group action on Pnm as shown
below, where G = (G1, ..., Gm), and G is such that ∀i, Mk+1,i = Gi Gi^T:

Zk+1 = Φ_{G⁻¹}(Yk+1) = (G1⁻¹ Yk+1,1 G1⁻ᵀ, ..., Gm⁻¹ Yk+1,m Gm⁻ᵀ)     (4–12)

Now, the old data matrix Ak and the vector Zk+1 are both in T_I Pnm, which is a
standard Euclidean space.
However, we should emphasize that the data matrix Ak contains the transformed log
maps of the first k data points, at the old mean, i.e., Mk , while Zk+1 is the transformed
log vector of the (k + 1)st sample at the new mean, i.e., Mk+1. Consequently, while the
mean of the log vectors in Ak is the zero vector, the columns of [Ak Zk+1] will no longer
be zero-mean. This will affect the estimation accuracy of the principal components,
especially for smaller values of k, for which Mk and Mk+1 are farther from each other.
Hence, the data matrix Ak should first be updated accordingly, before it is augmented
with the new log vector.
Given the old data matrix Ak, the basic algorithm for this update problem consists
of the following steps: (1) compute the exp maps of all k log vectors at the identity, to
retrieve the first k data samples; (2) compute the log maps of these data samples at the
new mean. It is evident that this method significantly slows down the incremental PGA,
and hence is not a reasonable choice.
Instead, we apply the following faster heuristic solution. Let Yi = Log_{Mk}(Xi) be
the log map of the ith data matrix at the old mean, and Zi be the corresponding
transformed vector in T_I Pnm. Also, assume that Lk+1 = Log_{Mk+1}(Mk), and Tk+1 is
its translated vector in T_I Pnm. Then, the updated vector is obtained as Ẑi = Zi + Tk+1.
Note that this algorithm gives an accurate solution in linear spaces. Also, as will be
shown shortly in the experiments, it does not sacrifice much accuracy in estimating
PGA, especially when k gets larger. Besides, this method is significantly faster,
because for each new sample Tk+1 is computed only once, and is added to all columns
of Ak. This way, the old data matrix Ak is updated to Âk.

Now, we can augment the updated data matrix with the new log vector,
Ak+1 = [Âk Zk+1], and perform PCA on the new data matrix. At the end, the new
principal components are transformed back to T_{Mk+1} Pnm using the transformation
Φ_G, where Φ and G are the same as in Eq. 4–12. This method is summarized in
Table 4-2, and Fig. 4-2 illustrates the variables used in the algorithm.

Table 4-2. Incremental PGA Algorithm for SPD Tensor Fields
1: Input the data matrix Ak for k samples, the new tensor field Xk+1, and the old mean Mk
2: Compute Mk+1 from Xk+1 and Mk, using Eq. 4–11
3: Yk+1 = Log_{Mk+1}(Xk+1)
4: Zk+1 = Φ_{G⁻¹}(Yk+1), defined in Eq. 4–12
5: Compute Lk+1 = Log_{Mk+1}(Mk) and Tk+1 = Φ_{G⁻¹}(Lk+1)
6: Add Tk+1 to every column of Ak to obtain Âk
7: Perform standard PCA on Ak+1 = [Âk Zk+1]
8: Translate the jth principal component, Pj, back to T_{Mk+1} Pnm, via Qj = Φ_G(Pj)

Figure 4-2. Schematic illustration of the algorithm in Table 4-2.
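Steps 6 and 7 of Table 4-2 happen entirely in the flat space T_I, so plain NumPy
suffices. Below is a minimal sketch of that Euclidean part; the manifold-specific maps
producing Z_{k+1} and T_{k+1} are assumed to be given as vectors, and the demo data
are random stand-ins.

```python
import numpy as np

def ipga_update(A, z_new, t_shift, num_components=2):
    """Shift the old columns by T_{k+1}, augment with Z_{k+1}, and redo PCA in T_I."""
    A_hat = A + t_shift[:, None]             # step 6: update every column of A_k
    A_new = np.column_stack([A_hat, z_new])  # step 7: augment with the new log vector
    # PCA via SVD (columns are approximately zero-mean under the heuristic above):
    U, s, _ = np.linalg.svd(A_new, full_matrices=False)
    return A_new, U[:, :num_components]      # principal directions in T_I

d, k = 3600, 10
A = np.random.randn(d, k)
A -= A.mean(axis=1, keepdims=True)
A_new, P = ipga_update(A, np.random.randn(d), 0.01 * np.random.randn(d))
```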
4.3.3 Incremental Principal Geodesic Analysis on Sk
We now introduce the iPGA algorithm on Sk, in a fashion very similar to the iPGA
algorithm on Pnm proposed so far. Specifically, we discuss the modifications that
should be made to the previously described iPGA in order to make it suitable for
spherical samples.

Table 4-3. Incremental PGA Algorithm on the Unit Sphere
1: Input the data matrix Ak = [v1, ..., vk] for k samples, the new sample xk+1, and the old mean mk
2: Compute mk+1 from xk+1 and mk, using Eq. 3–10
3: yk+1 = Log_{mk+1}(xk+1)
4: Parallel transport zk+1 = Γ_{mk+1→n}(yk+1), defined in Eq. 4–9, where n is the north pole
5: Compute rk+1 = Log_{mk+1}(mk) and tk+1 = Γ_{mk+1→n}(rk+1)
6: Add tk+1 to every column of Ak to obtain Âk = [v̂1, ..., v̂k]
7: Perform standard PCA on Ak+1 = [Âk zk+1]
8: Parallel transport the jth principal component, pj, back to T_{mk+1} Sk, via qj = Γ_{n→mk+1}(pj)
First, note that the convergence analysis of iFME on Pn in [21] is not directly
applicable to the unit sphere. However, in Section 3.3 we provided the convergence
proof of iFME on the sphere, using tools from the gnomonic projection. As an
application, the iFME method is used here to develop the iPGA algorithm on the
sphere.
Second, the inner product between any two tangent vectors of Sk is equivalent to
the standard Euclidean inner product (see Table 4-1), and is independent of the point
at which the vectors are anchored. Consequently, standard PCA can be employed on
the tangent plane at any point of Sk, in contrast to the PGA algorithm on Pnm. However,
in our incremental PGA technique, we always keep track of the data matrix at the north
pole (or any other arbitrary point on sphere), because this way only the new log vector
needs to be translated for each new sample.
Third, the group action applied in the case of Pnm is replaced with the parallel
transport, approximated by the Schild’s Ladder technique, which was described in
4.2.2. With these modifications being made, the new iPGA technique on Sk can be
summarized in Table 4-3.
Figure 4-3. Step by step illustration of the iPGA algorithm on Sk , summarized in Table
4-3. From left to right, and top to bottom steps 1 through 8 are shown,
respectively.
4.4 Synthetic Experiments
In this section we present several experiments with synthetically generated data,
using the proposed iPGA methods on both Sk and Pnm. The accuracy and efficiency of
the proposed algorithms are evaluated against their non-incremental PGA counterparts.
4.4.1 Manifold of SPD Tensor Fields
Data Description: We synthetically generated a group of 25 tensor fields, each of
size 16 × 16. The 3 × 3 SPD matrices in all tensor fields are ellipsoidal. There are two
types of SPD matrices in each tensor field, whose principal eigenvectors differ by 90
degrees. In the generated tensor fields, the angles of the principal eigenvectors of the
first and second matrix types are uniformly chosen in [0, π] and [π/2, 3π/2],
respectively.
Time Consumption: Given a pool of tensor fields, they are incrementally input
(in random order) to both the iPGA and PGA algorithms, and the CPU time consumed
(on an Intel i7 2.76GHz CPU with 8GB RAM) by each method to compute the principal
components is recorded. We repeat this experiment 10 times on the data pool of 25
tensor fields and plot the average time/accuracy for each method. The left plot in Fig.
4-5 demonstrates that the time consumption of iPGA is significantly less than that of
PGA, especially for a large number of input data samples.
Error Measurement: In order to measure the accuracy of each method, we
computed the residual sum defined in [43] for the estimated principal components. For
N input tensor fields, the residual sum is defined by (1/N) Σ_{j=1}^{N} d²(Xj, π̂_{SU}(Xj)),
where d is the geodesic distance on Pnm, and π̂_{SU}(Xj) is the estimated projection of
Xj onto the geodesic subspace spanned by the principal components, denoted by SU.
The projection, π_{SU}, is estimated in the tangent space (see Eq. 6 in [43] for details).
This estimation is illustrated in Fig. 4-4. The bar chart on the right in Fig. 4-5 depicts the
error comparison between PGA and iPGA at each iteration. It can be observed that
iPGA's residual error is very close to PGA's. Thus, from an accuracy viewpoint, iPGA is
on an equal footing with PGA, but from a computational efficiency viewpoint, it is
significantly better.

Figure 4-4. Estimation of the projection π_{SU}(X) to the 1-D principal geodesic
submanifold (red curve).

Figure 4-5. Time consumption and residual error comparison between iPGA (proposed)
and PGA on Pnm.
4.4.2 Unit Sphere Sk
We generated a group of 25 random samples on a high-dimensional unit-sphere,
i.e., S10000 . We picked this very high dimensional space, in order to simulate the data
points we are going to deal with in the real data experiments.
Figure 4-6. Mean angular error of iPGA estimates w.r.t. PGA on S10000 .
We fed the samples into both the PGA and iPGA methods defined on the sphere,
incrementally, and recorded the time consumed by each method to estimate the
principal components. Also, in order to evaluate the accuracy of iPGA for each new
sample, we considered the PGA estimate as the ground truth and measured the angle
between the first principal components obtained from iPGA and PGA in the tangent
plane at the north pole. This error is henceforth called the angular error.
The experiment is repeated 500 times and the average plots are shown here.
Figure 4-6 illustrates the angular error of iPGA over the number of samples. It can be
seen that the angular error of iPGA with respect to PGA is bounded by 10 degrees and
keeps decreasing as the sample size gets larger. Besides, it is evident from Fig. 4-7
that the time consumed by iPGA is significantly less than by the non-incremental
version, which makes it an appealing choice especially for large data dimensionality.
4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls
In this section we present an application of iPGA to real data sets. We applied the
proposed iPGA techniques on both the unit sphere and the space of SPD tensor fields.
Our real data consist of HARDI acquisitions from patients with Parkinson's disease
(PD), essential tremor (ET) and controls. The goal here is to be able to
automatically discriminate between these groups using features derived from the data.

Figure 4-7. Time comparison of incremental and non-incremental PGA estimators on
S10000.

Earlier work in this context in the field of movement disorders involved the use of
DTI based ROI analysis specifically using scalar valued measures such as fractional
anisotropy [49]. They showed that DTI had high potential of being a non-invasive early
trait biomarker. All our HARDI data were acquired using a 3T Philips MR scanner with
the following parameters: TR = 7748 ms, TE = 86 ms, b-values: 0, 1000 s/mm², 64
gradient directions, and voxel size = 2 × 2 × 2 mm³.
4.5.1 Classification Results using Deformation Tensor Features
In the first part, we perform the classification task using SPD tensor field features.
We use the ensemble average propagators (EAPs) at each voxel, estimated using
the technique in [23]. We extract the Cauchy deformation tensor field, which is
computed from a non-rigid registration of the given EAP fields to the control atlas
EAP field (constructed using the approach in [7]); see Figure 4-8. The Cauchy
deformation tensor is defined as $\sqrt{JJ^T}$, where $J$ is the Jacobian of the deformation
at each voxel; it is a $3 \times 3$ SPD matrix in this case. This gives us an SPD field as a
derived feature corresponding to each given EAP field. We use the iPGA described
earlier and a nearest-neighbor rule based on geodesic distance to classify the probe
data set. Note that the geodesic distance in this case is the distance between the
probe data set and the geodesic submanifold representation of each class, namely
PD, ET, and Controls.
Figure 4-8. (a) and (b) are the corresponding S0 (zero magnetic gradient) slices of the
atlas and a control subject, respectively, and (c) shows the EAPs of the
same slice as in (b), with the Substantia Nigra as the ROI. Similarly, (d) and
(e) are the corresponding S0 slices of the atlas and a Parkinson subject,
respectively, and (f) illustrates the EAPs computed for the slice in (e), with
the Substantia Nigra as the ROI.
Table 4-4. Classification results of iPGA, PGA, and PCA using SPD tensor field features

              Control vs. PD           Control vs. ET           PD vs. ET
              iPGA    PGA    PCA       iPGA    PGA    PCA       iPGA    PGA    PCA
Accuracy      89.00   89.95  56.37     86.44   87.13  63.43     89.18   90.28  58.53
Sensitivity   92.72   93.33  65.29     87.01   88.94  66.27     95.57   96.47  64.71
Specificity   85.28   86.57  47.45     85.87   85.32  60.59     82.79   84.09  52.35
The probe is assigned the label of the class with the smallest geodesic distance.
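For illustration, here is a minimal sketch of the Cauchy deformation tensor computation
(not of the registration pipeline itself), using scipy.linalg.sqrtm for the principal matrix
square root; the (V, 3, 3) array layout for per-voxel Jacobians is an assumption made
for this sketch:

```python
import numpy as np
from scipy.linalg import sqrtm

def cauchy_deformation_tensor(J):
    """Cauchy deformation tensor sqrt(J J^T) for a 3x3 Jacobian J.

    J J^T is symmetric positive definite for a non-degenerate deformation,
    so its principal matrix square root is SPD as well; np.real discards the
    negligible imaginary round-off that sqrtm may introduce."""
    return np.real(sqrtm(J @ J.T))

def cauchy_field(jacobians):
    """Map a (V, 3, 3) array of per-voxel Jacobians to a (V, 3, 3) SPD field."""
    return np.stack([cauchy_deformation_tensor(J) for J in jacobians])
```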
Classification was performed on 26 PD, 16 ET, and 25 control subjects using the PGA
of the Cauchy deformation tensor fields described above: 10 subjects each from the
PD and control groups, and 6 subjects from the ET group, were randomly picked as the
test set, and the remaining subjects were used for training. The experiment was
repeated 300 times and the mean values are reported. Table 4-4 summarizes the
accuracy for each method, where
$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$, $\text{Sensitivity} = \frac{TP}{TP+FN}$, and $\text{Specificity} = \frac{TN}{TN+FP}$,
with TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false
positives, and false negatives, respectively. For comparison, we also used
the standard PCA method, which is applied to a vectorized version of the tensor fields.
The size of the tensor fields was restricted to the ROIs instead of the whole image;
thus, the dimensionality was $600 \times 6 = 3600$, and we used just the first two principal
components in all competing methods to achieve the classification reported in the table.
From the table, it is evident that iPGA and PGA provide very similar accuracies in all
three classifications. Further, iPGA is considerably more accurate than PCA, because
the latter does not take the non-linearity of $P_n^m$ into account.
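The classification rule itself can be sketched as follows; dist_to_submanifold is a
hypothetical helper returning the geodesic distance $d(X, \hat{\pi}_S(X))$ for a fitted class
subspace, as in the residual-sum sketch earlier:

```python
def classify_probe(probe, class_submanifolds, dist_to_submanifold):
    """Nearest geodesic-submanifold classifier.

    class_submanifolds:  dict label -> fitted iPGA subspace for that class
                         (e.g., 'PD', 'ET', 'Control')
    dist_to_submanifold: (X, subspace) -> geodesic distance d(X, pi_hat(X))
                         (hypothetical helper)
    Returns the label whose principal geodesic submanifold is closest."""
    dists = {label: dist_to_submanifold(probe, S)
             for label, S in class_submanifolds.items()}
    return min(dists, key=dists.get)
```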
4.5.2 Classification Results using Shape Features
In the second part, we evaluated the iPGA algorithm on the unit sphere in the task of
movement-disorder classification. To this end, we used the shape of the Substantia
Nigra region in the brain images as the discriminant feature. Recently, in [10], the
Schrödinger Distance Transform (SDT) was introduced and applied to represent point
clouds (in 2-D or 3-D) as points on an infinite-dimensional Hilbert sphere.
The Substantia Nigra region was hand-segmented in all rigidly aligned datasets,
consisting of 25 control, 24 PD, and 15 ET images. We first collected the same number
of random samples on the boundary of each 3-D shape, and applied the SDT
technique to represent each shape as a point on the unit sphere. The 3-D shape
domain was set to $28 \times 28 \times 15$, resulting in 11760-dimensional unit vectors from the
SDT; the samples therefore live on the manifold $S^{11759}$. Figure 4-9 shows the
extracted Substantia Nigra shapes in the 25 control images.
Once all shapes are represented as points on the unit sphere, we can apply our
incremental PGA method to the spherical features. Figure 4-10 illustrates the mean
shape, along with the first principal component from the PGA and iPGA methods, with
coefficients $1.5\sqrt{\lambda}$ and $3\sqrt{\lambda}$, where $\lambda$ is the variance corresponding to the first
principal component estimated by each method.
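The shapes displayed along the principal geodesic can be generated, in sketch form,
with the sphere exponential map; shape_along_pg is a hypothetical helper name used
only for this illustration:

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential map on the unit sphere at base point p,
    applied to a tangent vector v at p."""
    t = np.linalg.norm(v)
    if np.isclose(t, 0.0):
        return p.copy()
    return np.cos(t) * p + np.sin(t) * v / t

def shape_along_pg(mean, pc, lam, coeff):
    """Point on the first principal geodesic at coefficient coeff * sqrt(lam),
    e.g. coeff = 1.5 or 3; pc is a unit tangent vector at the mean."""
    return sphere_exp(mean, coeff * np.sqrt(lam) * pc)
```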
Table 4-5. Classification results of iPGA, PGA, and PCA using shape descriptor features

              Control vs. PD           Control vs. ET           PD vs. ET
              iPGA    PGA    PCA       iPGA    PGA    PCA       iPGA    PGA    PCA
Accuracy      91.46   92.95  67.32     88.28   90.14  75.69     86.13   87.58  64.60
Sensitivity   87.98   90.93  51.96     86.34   88.18  77.87     80.54   82.38  48.36
Specificity   94.94   94.98  82.69     92.16   94.05  71.32     97.32   98.00  97.08
Figure 4-9. Population of Substantia Nigra regions extracted from the control brain
images.
Next, a PGA-based classification was performed in a manner similar to the previous
section. We randomly selected 10 control, 10 PD, and 5 ET images as the test set, and
used the remaining images for training. The classification task was repeated 300 times
using different training sets and the average accuracy was computed. The
classification results using the shape descriptors are summarized in Table 4-5. It can
be seen that the accuracy of iPGA is reasonably close to that of PGA, while both
outperform the standard linear PCA.
Figure 4-10. Comparison of incremental (bottom row) and non-incremental (top row)
results: (1) Fréchet means (left column), (2) PGA with coefficient $1.5\sqrt{\lambda}$
(middle column), and (3) PGA with coefficient $3\sqrt{\lambda}$ (right column).
CHAPTER 5
SUMMARY AND DISCUSSION
In this dissertation we developed novel incremental algorithms for statistical analysis
of manifold-valued data. In the first part, we proposed an incremental (intrinsic) mean
computation technique for the space of Symmetric Positive Definite (SPD) matrices,
based on the Stein distance. The key contribution was the derivation of a closed-form
solution for the weighted Stein mean of two SPD matrices, which was then used to
develop an incremental algorithm for computing the Stein mean of a population of
SPD matrices. Using this incremental Stein mean estimator, we experimentally
demonstrated significant gains in computation time over the non-incremental
counterpart while maintaining approximately the same accuracy. Second, we
presented a new incremental algorithm for computing the Fréchet mean of samples on
a sphere, together with a proof that this algorithm converges as the number of samples
tends to infinity. Several applications to data that live on the sphere were considered,
and the results showed superior performance of our incremental algorithm over its
non-incremental counterpart. Finally, we presented a novel incremental algorithm for
Principal Geodesic Analysis (PGA) applicable to the manifold of SPD matrices as well
as the sphere, and demonstrated significant time gains using our incremental
algorithm while maintaining accuracy approximately equal to that of the
non-incremental counterpart.
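To convey the flavor of these incremental estimators, the following is a minimal sketch
of an incremental Fréchet-mean update on the unit sphere, assuming only the standard
sphere exponential and logarithm maps; the precise weighting scheme and the
convergence analysis of the dissertation's estimator are as developed in the preceding
chapters, and this sketch illustrates only the update rule's general form:

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere at base point p."""
    t = np.linalg.norm(v)
    return p.copy() if np.isclose(t, 0.0) else np.cos(t) * p + np.sin(t) * v / t

def sphere_log(p, x):
    """Log map on the unit sphere: tangent vector at p pointing toward x."""
    c = np.clip(np.dot(p, x), -1.0, 1.0)
    w = x - c * p
    n = np.linalg.norm(w)
    return np.zeros_like(p) if np.isclose(n, 0.0) else np.arccos(c) * w / n

def incremental_frechet_mean(samples):
    """Running Fréchet-mean estimate on the sphere: the k-th sample moves
    the current estimate 1/k of the way along the geodesic toward it."""
    mean = np.asarray(samples[0], dtype=float)
    for k, x in enumerate(samples[1:], start=2):
        mean = sphere_exp(mean, sphere_log(mean, x) / k)
    return mean
```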
REFERENCES
[1] Afsari, Bijan. “Riemannian Lp center of mass: Existence, uniqueness, and
convexity.” Proceedings of the American Mathematical Society 139 (2011).2:
655–673.
[2] Amarasinghe, GW. “On the standard lengths of angle bisectors and the angle
bisector theorem.” Global Journal of Advanced Research on Classical and Modern
Geometries 1 (2012).1.
[3] Baisnab, A. P. and Jas, Manoranjan. Elements of Probability and Statistics. Tata
McGraw-Hill Education, 1993.
[4] Cetingul, Hasan Ertan, Afsari, Bijan, Wright, Margaret J, Thompson, Paul M, and
Vidal, René. “Group action induced averaging for HARDI processing.” Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on. IEEE, 2012,
1389–1392.
[5] Chebbi, Z. and Moakher, M. “Means of Hermitian positive-definite matrices based
on the log-determinant divergence function.” Linear Algebra and its Applications 40
(2012).
[6] Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba C. “Efficient recursive
algorithms for computing the mean diffusion tensor and applications to DTI
segmentation.” ECCV. Springer, 2012.
[7] Cheng, Guang, Vemuri, Baba C, Hwang, Min-Sig, Howland, Dena, and Forder,
John R. “Atlas construction from high angular resolution diffusion imaging data
represented by Gaussian Mixture fields.” Biomedical Imaging: From Nano to Macro,
2011 IEEE International Symposium on. IEEE, 2011, 549–552.
[8] Cheng, Jian, Ghosh, Aurobrata, Jiang, Tianzi, and Deriche, Rachid. “A Riemannian
framework for orientation distribution function computing.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009. 911–918.
[9] Cherian, A., Sra, S., Banerjee, A., and Papanikolopoulos, N. “Efficient similarity
search for covariance matrices via the Jensen-Bregman LogDet Divergence.” ICCV.
2011, 2399–2406.
[10] Deng, Yan, Rangarajan, Anand, Eisenschenk, Stephan, and Vemuri, Baba C. “A
Riemannian Framework for Matching Point Clouds Represented by the Schrodinger
Distance Transform.” Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2014.
[11] Do Carmo, Manfredo P. Riemannian geometry. Springer, 1992.
[12] Fillard, P., Arsigny, V., Pennec, X., Thompson, M., and Ayache, N. “Extrapolation of
sparse tensor fields: application to the modeling of brain variability.” International
Conference on Information Processing in Medical Imaging (IPMI). 2005.
[13] Fletcher, P Thomas and Joshi, Sarang. “Riemannian geometry for the statistical
analysis of diffusion tensor data.” Signal Processing 87 (2007).2: 250–262.
[14] Fletcher, P.T., Lu, C., Pizer, S.M., and Joshi, S. “Principal geodesic analysis for the
study of nonlinear statistics of shape.” Medical Imaging, IEEE Transactions on 23
(2004).8: 995–1005.
[15] Fréchet, Maurice. “Les éléments aléatoires de nature quelconque dans un espace
distancié.” Annales de l’institut Henri Poincaré. vol. 10. Presses universitaires de
France, 1948, 215–310.
[16] Grove, Karsten and Karcher, Hermann. “How to conjugate C¹-close group actions.”
Mathematische Zeitschrift 132 (1973).1: 11–20.
[17] Harandi, M., Sanderson, C., Hartley, R., and Lovell, B.C. “Sparse Coding and
Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach.”
European Conference on Computer Vision (ECCV). 2012.
[18] Hartley, Richard, Trumpf, Jochen, Dai, Yuchao, and Li, Hongdong. “Rotation
averaging.” International journal of computer vision 103 (2013).3: 267–305.
[19] Hauberg, Søren, Lauze, François, and Pedersen, Kim Steenstrup. “Unscented
kalman filtering on riemannian manifolds.” Journal of mathematical imaging and
vision 46 (2013).1: 103–120.
[20] Heo, Jae-Pil, Lee, YoungWoon, He, Junfeng, Chang, Shih-Fu, and Yoon, Sung-eui.
“Spherical Hashing.” IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR). 2012.
[21] Ho, Jeffrey, Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba. “Recursive
Karcher Expectation Estimators And Geometric Law of Large Numbers.” Proceedings of the Sixteenth International Conference on Artificial Intelligence and
Statistics. 2013, 325–332.
[22] Horn, Berthold. Robot vision. MIT press, 1986.
[23] Jian, Bing and Vemuri, Baba C. “A Unified Computational Framework for
Deconvolution to Reconstruct Multiple Fibers From DWMRI.” IEEE TMI 26 (2007):
1464–1471.
[24] Karcher, Hermann. “Riemannian Center of Mass and so called karcher mean.”
arXiv preprint arXiv:1407.2087 (2014).
[25] Kendall, Wilfrid S. “Probability, convexity, and harmonic maps with small image I:
uniqueness and fine existence.” Proceedings of the London Mathematical Society 3
(1990).2: 371–406.
[26] Kim, Hyunwoo J, Adluru, Nagesh, Bendlin, Barbara B, Johnson, Sterling C, Vemuri,
Baba C, and Singh, Vikas. “Canonical Correlation Analysis on Riemannian
Manifolds and Its Applications.” Computer Vision–ECCV 2014. Springer, 2014.
251–267.
[27] Latecki, Longin Jan, Lakamper, Rolf, and Eckhardt, T. “Shape descriptors for
non-rigid shapes with a single closed contour.” CVPR. 2000, 424–429.
[28] Lenglet, C., Rousson, M., and Deriche, R. “DTI segmentation by statistical surface
evolution.” IEEE Transactions on Medical Imaging 25 (2006).6: 685–700.
[29] Li, Jia and Wang, James Z. “Automatic linguistic indexing of pictures by a statistical
modeling approach.” PAMI (2003).
[30] Lim, Yongdo and Pálfia, Miklós. “Weighted inductive means.” Linear Algebra and its
Applications 453 (2014): 59–83.
[31] Lorenzi, Marco, Ayache, Nicholas, and Pennec, Xavier. “Schild's Ladder for the
parallel transport of deformations in time series of images.” Information Processing
in Medical Imaging. Springer, 2011, 463–474.
[32] Lowe, David G. “Object recognition from local scale-invariant features.” Computer
vision, 1999. The proceedings of the seventh IEEE international conference on.
vol. 2. IEEE, 1999, 1150–1157.
[33] Mardia, Kanti V and Jupp, Peter E. Directional statistics, vol. 494. John Wiley &
Sons, 2009.
[34] Moakher, M. and Batchelor, P. G. SPD Matrices: From Geometry to Applications
and Visualization. Visual. and Proc. of Tensor Fields, 2006, 285–298.
[35] Ncube, Sentibaleng and Srivastava, Anuj. “A novel Riemannian metric for
analyzing HARDI data.” SPIE Medical Imaging. International Society for Optics
and Photonics, 2011, 79620Q–79620Q.
[36] Pennec, Xavier. “Intrinsic statistics on Riemannian manifolds: Basic tools for
geometric measurements.” JMIV 25 (2006).1: 127–154.
[37] Said, Salem, Courty, Nicolas, Le Bihan, Nicolas, Sangwine, Stephen J, et al. “Exact
principal geodesic analysis for data on SO(3).” Proceedings of the 15th European
Signal Processing Conference, EUSIPCO-2007. 2007, 1700–1705.
[38] Sakai, Takashi. Riemannian geometry, vol. 149. American Mathematical Soc.,
1996.
[39] Salehian, Hesamoddin, Cheng, Guang, Vemuri, Baba C, and Ho, Jeffrey.
“Recursive Estimation of the Stein Center of SPD Matrices and Its Applications.”
Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013,
1793–1800.
[40] Salehian, Hesamoddin, Vaillancourt, David, and Vemuri, Baba C. “iPGA:
Incremental Principal Geodesic Analysis with Applications to Movement Disorder
Classification.” Medical Image Computing and Computer-Assisted Intervention–
MICCAI 2014. Springer, 2014. 765–772.
[41] Schwartzman, Armin. Random ellipsoids and false discovery rates: Statistics for
diffusion tensor imaging data. Ph.D. thesis, Stanford University, 2006.
[42] Sloane, Neil JA et al. “The on-line encyclopedia of integer sequences.” 2003.
[43] Sommer, Stefan, Lauze, François, Hauberg, Søren, and Nielsen, Mads. “Manifold
valued statistics, exact principal geodesic analysis and the effect of linear
approximations.” Computer Vision–ECCV 2010. Springer, 2010. 43–56.
[44] Sra, S. “Positive Definite Matrices and the Symmetric Stein Divergence.” Available
on the author's website at http://people.kyb.tuebingen.mpg.de/suvrit/ (2011).
[45] Srivastava, Anuj, Jermyn, Ian, and Joshi, Shantanu. “Riemannian analysis of
probability density functions with applications in vision.” Computer Vision and
Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, 1–8.
[46] Sturm, K. T. “Probability Measures on Metric Spaces of Nonpositive Curvature.”
Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces. 2003.
[47] Terras, A. Harmonic Analysis on Symmetric Spaces and Applications.
Springer-Verlag, 1985.
[48] Tournier, Maxime, Wu, Xiaomao, Courty, Nicolas, Arnaud, Elise, and Reveret,
Lionel. “Motion compression using principal geodesics analysis.” Computer
Graphics Forum. vol. 28. Wiley Online Library, 2009, 355–364.
[49] Vaillancourt, DE, Spraker, MB, Prodoehl, J, Abraham, I, Corcos, DM, Zhou,
XJ, Comella, CL, and Little, DM. “High-resolution diffusion tensor imaging in
the substantia nigra of de novo Parkinson disease.” Neurology 72 (2009).16:
1378–1384.
[50] Wang, Yuanxiang, Salehian, Hesamoddin, Cheng, Guang, and Vemuri, Baba.
“Tracking on the Product Manifold of Shape and Orientation for Tractography from
Diffusion MRI.” Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2013, 3051–3056.
[51] Wang, Z. and Vemuri, B. “Tensor field segmentation using region based active
contour model.” European Conference on Computer Vision (ECCV). 2004,
304–315.
[52] Woods, Roger P. “Characterizing volume and surface deformations in an atlas
framework: theory, applications, and implementation.” NeuroImage 18 (2003).3:
769–788.
[53] Wu, Jing, Smith, William AP, and Hancock, Edwin R. “Weighted principal geodesic
analysis for facial gender classification.” IAPR. Springer, 2007, 331–339.
[54] Wu, Yi, Wang, Jinqiao, and Lu, Hanqing. “Real-Time Visual Tracking via
Incremental Covariance Model Update on Log-Euclidean Riemannian Manifold.”
CCPR. 2009.
[55] Xie, Yuchen, Vemuri, Baba C, and Ho, Jeffrey. “Statistical analysis of tensor fields.”
Medical Image Computing and Computer-Assisted Intervention–MICCAI 2010.
Springer, 2010. 682–689.
[56] Zha, Hongyuan and Simon, Horst D. “On updating problems in latent semantic
indexing.” SIAM Journal on Scientific Computing 21 (1999).2: 782–791.
[57] Zhang, Miaomiao and Fletcher, P Thomas. “Probabilistic Principal Geodesic
Analysis.” NIPS. 2013.
BIOGRAPHICAL SKETCH
Hesamoddin Salehian was born in 1987 in Tehran, Iran. He graduated from
high school in Semnan, Iran, in 2006. He received his Bachelor of Science degree
in Computer Engineering from Sharif University of Technology, Tehran, Iran, in June
2010. He earned his Master of Science degree in Computer Engineering from the
University of Florida, Gainesville, in September 2014, and his Doctor of Philosophy
degree in Computer Engineering from the University of Florida in December 2014. His
research interests revolve around Medical Image Analysis, Computer Vision and
Machine Learning.