INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED DATA

By

HESAMODDIN SALEHIAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Hesamoddin Salehian

To the memory of my mother, who devoted her life to my education and was always truly my encouragement; my wife, who has always been supportive and proud of my work, and shared many challenges and sacrifices on the way to completing my PhD; my father, who taught me to persist and work hard throughout my life; and my brothers, who have always been my leaders in education and taught me to be ambitious and set high goals.

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Dr. Baba C. Vemuri, for his persistent support in making this dissertation possible. His creativity, excellent knowledge and patience encouraged me all along my PhD study. This dissertation would not have been completed without his support. I would also like to thank my committee members, Dr. Arunava Banerjee, Dr. Anand Rangarajan, Dr. William Hager and Dr. John Forder, for their valuable comments and wonderful advice. Dr. Banerjee and Dr. Rangarajan have always been very supportive and generous with their time, and taught me fundamental and advanced machine learning concepts. Dr. Hager had a great impact on my knowledge of linear algebra and matrix analysis. Dr. Forder kindly provided data for the medical imaging applications. Also, special thanks to Dr. Jeffrey Ho for his excellent support throughout my PhD. I had the honor to collaborate with him on several publications, and I would like to thank him for his insightful guidance, dedication and wonderful attitude.

I cannot express my gratitude enough to my late mother, Zahra Khatibi, who devoted her entire life to my education, and was an excellent encouragement and support all along this road. I never got a chance to say goodbye to her when she passed away overseas, but her memory was the strongest encouragement to overcome all the difficulties toward completing this degree and making her wishes come true. I am very thankful to my kind wife, Pegah, who has always been proud of my accomplishments and has been by my side through the highest highs and the lowest lows. I cannot imagine how this dissertation could have been completed without her persistent help and support. Special thanks to my father, Manouchehr Salehian, who has always been my role model of hard work, strength and great personality, and to my older brothers, Hamid and Hamed, who were truly my leaders in education, music and sport, from when I was a little child until the present. Last, but not least, I want to thank my former lab-mate, Dr. Guang Cheng, for his help and guidance and his excellent work in our several collaborations. Besides, I am thankful to my friendly and knowledgeable colleagues in the CVGMI Laboratory: Yuchen, Meizhu, Ting, Dohyung, Wenxing, Yan, Yuanxiang, Jiaqi, Ted, Rudrasis, Monami, and others.

The research in this dissertation was in part supported by NIH grant NS066340 to Dr. Baba C. Vemuri. I also received a Student Travel Award from the MICCAI'14 conference and an internship at Google. I gratefully acknowledge the permission granted by IEEE and Springer to reuse materials from my previous publications in this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES AND ITS APPLICATIONS
  2.1 Background
  2.2 Incremental Stein Mean Computation
  2.3 Properties of Pn Equipped with the Stein Distance
    2.3.1 Global Non-Positive Curvature Spaces
    2.3.2 Discussion
  2.4 Experiments
    2.4.1 Performance of the Incremental Stein Center
    2.4.2 Application to K-means Clustering
    2.4.3 Application to Image Retrieval
    2.4.4 Application to Shape Retrieval

3 INCREMENTAL FRÉCHET MEAN ESTIMATOR ON SPHERE
  3.1 Background
  3.2 Preliminaries
    3.2.1 Riemannian Geometry of Sphere
    3.2.2 Gnomonic Projection
  3.3 Incremental Fréchet Mean Estimator on Sphere
    3.3.1 Angle Bisector Theorem
    3.3.2 Lower Bound for tn
    3.3.3 Upper Bound for tn
    3.3.4 Convergence of iFME
  3.4 Experiments
    3.4.1 Synthetic Experiments
    3.4.2 Application to Incremental Shape-Preserving Fréchet Mean of SPD Matrices

4 IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS TO MOVEMENT DISORDER CLASSIFICATION
  4.1 Background
  4.2 Preliminaries
    4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields
    4.2.2 Schild's Ladder Approximation of Parallel Transport
  4.3 iPGA: Incremental Principal Geodesic Analysis
    4.3.1 Incremental Fréchet Mean Estimator
    4.3.2 Incremental Principal Geodesic Analysis on Pnm
    4.3.3 Incremental Principal Geodesic Analysis on Sk
  4.4 Synthetic Experiments
    4.4.1 Manifold of SPD Tensor Fields
    4.4.2 Unit Sphere Sk
  4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls
    4.5.1 Classification Results using Deformation Tensor Features
    4.5.2 Classification Results using Shape Features
5 SUMMARY AND DISCUSSION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Average shape retrieval precision (%) for the MPEG7 database, for different Binary Code (BC) lengths.
2-2 Time (in seconds) comparison for shape retrieval.
4-1 Summary of Riemannian geometry of the space of n×n positive definite matrices, Pn, as well as the unit k-dimensional sphere, Sk.
4-2 Incremental PGA Algorithm for SPD Tensor Fields.
4-3 Incremental PGA Algorithm on Unit Sphere.
4-4 Classification results of iPGA, PGA, PCA using SPD tensor field features.
4-5 Classification results of iPGA, PGA, PCA using shape descriptor features.

LIST OF FIGURES

2-1 Schematic view of x1, x2, x3, x4 in Reshetnyak's quadruple comparison.
2-2 Illustration of the proof of Reshetnyak's inequality for the quadruple (I, D2↓, X3, X4↓), from the quadruple (I, D2↓, X3↓, X4↓).
2-3 Error comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.
2-4 Time comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.
2-5 Illustration of the incremental mean updates in K-means clustering.
2-6 Time comparison of the K-means clustering using various methods.
2-7 Error comparison of the K-means clustering.
2-8 Time consumption in initializing hashing functions.
2-9 Comparison of retrieval accuracy, for techniques specified in Fig. 2-8.
2-10 Example results of proposed retrieval system, based on the incremental Stein mean, with 640-bit binary codes.
3-1 Gnomonic Projection.
3-2 Use of Euclidean weights to update iFME in Sk does not necessarily correspond to the same weights in the tangent space.
3-3 Fréchet mean of samples on Sk does not necessarily coincide with the arithmetic mean of projected points in the tangent space.
3-4 Comparison of the ratio of variances (defined in Eq. 3-25) between iFME and FM, for different values of ϕ.
3-5 Time comparison between iFME and FM, for different values of ϕ.
3-6 Visual comparison of the mean tensor obtained from shape-preserving iFME on the product manifold (top row), and iFME applied on P(3) (bottom row).
3-7 Comparison of FA values between iFME on P(3), and iFME on the product manifold.
4-1 Illustration of Schild's Ladder algorithm, described in Eq. 4-9.
4-2 Schematic illustration of the algorithm in Table 4-2.
4-3 Step-by-step illustration of the iPGA algorithm on Sk, summarized in Table 4-3.
4-4 Estimation of the projection πS(X) to the 1-D principal geodesic submanifold (red curve).
4-5 Time consumption and residual error comparison between iPGA (proposed) and PGA on Pnm.
4-6 Mean angular error of iPGA estimates w.r.t. PGA on S10000.
4-7 Time comparison of incremental and non-incremental PGA estimators on S10000.
4-8 S0 images of a control and a Parkinson subject, along with the computed atlas.
4-9 Population of Substantia Nigra regions extracted from the control brain images.
4-10 Comparison of incremental (bottom row) and non-incremental (top row) results of (1) Fréchet means (left column), (2) PGA with the coefficient 1.5√λ (middle column), and (3) PGA with the coefficient 3√λ (right column).

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INCREMENTAL ALGORITHMS FOR STATISTICAL ANALYSIS OF MANIFOLD VALUED DATA

By Hesamoddin Salehian

December 2014
Chair: Baba C. Vemuri
Major: Computer Engineering

Manifold-valued features are ubiquitous in many applications in computer vision, machine learning and medical image analysis. Statistical analysis of a population of such data is commonly encountered in many tasks in the aforementioned fields, such as object recognition, shape analysis, facial expression analysis, longitudinal studies quantifying, for example, disease-related changes in structure/function, and many others. In this dissertation we present a suite of efficient incremental tools and techniques for statistical analysis of a given population of manifold-valued data. Most of the existing tools suffer from computational and storage (memory) inefficiency, due to the complexities introduced when dealing with manifold-valued features. An incremental technique is therefore an appealing choice in these applications because, when the input population is augmented, one only needs to update the most recently estimated statistical feature (e.g., mean, principal component, etc.), without having to re-compute it from scratch.

We start the dissertation with efficient statistical analysis algorithms for a population of Symmetric Positive Definite (SPD) matrices. In this regard, we first propose a novel incremental algorithm to compute the mean of a population of SPD matrices, based on the recently introduced Stein distance. It is known that the compute time of the Stein distance between two SPD matrices is far less than that required for computing the geodesic distance using the canonical GL-invariant metric. However, there is no closed-form solution for the Stein mean of a group of SPD tensors, which is defined as the minimizer of the sum of squared Stein distances. Therefore, our incremental Stein mean estimator plays a crucial role in speeding up many applications dealing with SPD matrices.

In a wide variety of applications the input data lies on a sphere, which is an example of a Riemannian manifold with positive constant sectional curvature.
We develop a novel incremental mean computation algorithm for features lying on a sphere, which is one of the most widely used manifolds in science and engineering problems. Although there are several convergence results in the recent literature for many variants of incremental mean estimators, these analyses are all limited to non-positively curved spaces. We analytically show the convergence of the incremental method to the true mean on the sphere, as the number of samples tends to infinity. To the best of our knowledge, there is no similar convergence analysis in the literature for positively curved spaces. We provide several synthetic and real data experiments to illustrate the effectiveness and efficiency of the proposed incremental method.

Next, we continue the statistical analysis of manifold-valued data with the introduction of a novel incremental Principal Geodesic Analysis (PGA) algorithm. PGA is the non-linear counterpart of the well-known Principal Component Analysis (PCA), and is applicable to manifold-valued data. However, the existing PGA algorithms are computationally very expensive, especially for very large datasets. Using our incremental method, we show considerable gains in computation time over the standard PGA algorithm, while retaining the same accuracy.

CHAPTER 1
INTRODUCTION

In many applications in computer vision, machine learning and medical imaging, features do not belong to a vector space. For instance, having unit norm is a constraint frequently imposed on a group of vectors, but it is easy to verify that this constraint is not preserved under linear operations. Therefore, these types of data are best interpreted as features belonging to some manifold. To mention a few examples, Symmetric Positive Definite (SPD) matrices, which frequently appear in computer vision and medical imaging, belong to a Riemannian manifold with negative sectional curvature [47], and most popular image features, such as SIFT [32], are often defined on spheres due to normalization.

Statistical analysis of manifold-valued features is encountered in most of the applications mentioned above, either to characterize the uncertainty of the noisy data, or to compare and classify the observations in group difference and longitudinal studies. However, due to the lack of a vector space structure, standard statistical analysis tools, e.g., the arithmetic mean, Principal Component Analysis (PCA), etc., cannot be directly applied to a group of these features. In this dissertation, we introduce computationally efficient tools for statistical analysis of a given population of manifold-valued data. This is achieved by developing incremental algorithms for computing the statistics.

Finding the mean of a population of manifold-valued features has received a lot of attention in recent years. Computing the mean of data lying on a manifold can be achieved through minimization of the sum of squared geodesic distances between the manifold-valued data points and the unknown mean. Mathematically speaking, for a set of given points x_i on a Riemannian manifold M,

\mu^* = \operatorname{argmin}_{\mu \in M} \sum_{i=1}^{n} d^2(x_i, \mu)    (1–1)

This cost function is usually called the Fréchet function in the literature, and its global minimizer is referred to as the Fréchet mean [15]. The uniqueness of the Fréchet mean for general manifolds cannot be guaranteed unless some conditions are satisfied [52]. Consequently, any point that is a local minimizer of the above sum of squared distances is known as a Karcher mean. In practice, the minimizer of (1–1) is found iteratively; a sketch is given below.
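The standard iterative scheme is a geodesic gradient descent (the Karcher flow). The following is a minimal Python sketch under the assumption that the manifold at hand exposes exponential and logarithm maps; `exp_map` and `log_map` are hypothetical callables standing in for the manifold-specific operations defined later in this dissertation.

```python
import numpy as np

def frechet_mean(samples, exp_map, log_map, max_iter=100, tau=0.5, tol=1e-10):
    """Karcher flow for Eq. 1-1: repeatedly shoot along the negative
    Riemannian gradient, which at mu is -(2/n) * sum_i Log_mu(x_i)."""
    mu = samples[0]                       # any sample is a reasonable start
    for _ in range(max_iter):
        g = sum(log_map(mu, x) for x in samples) / len(samples)
        if np.linalg.norm(g) < tol:       # gradient (up to a factor) vanished
            break
        mu = exp_map(mu, tau * g)         # step toward the mean
    return mu
```

Each iteration of this batch scheme touches every sample, which is exactly the cost that the incremental estimators developed in this dissertation avoid.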
For Riemannian manifolds with non-positive sectional curvature, Cartan showed that the Fréchet mean always exists and is unique [38, p. 222]. Later, Grove and Karcher [16] worked to generalize Cartan's theorem, and proved the uniqueness of this center of mass on general Riemannian manifolds, but only for samples within a geodesic ball of small enough radius. We refer the interested reader to [1, 15, 52] for further details.

Among the various examples of Riemannian manifolds, we are particularly interested in the statistical analysis of features lying on one of the following two well-known manifolds, which appear widely in the computer vision, medical image analysis and machine learning literature: (i) the space of n × n Symmetric Positive Definite (SPD) matrices, denoted by P(n), which is a Riemannian manifold with negative sectional curvature [47]; and (ii) the k-dimensional unit sphere embedded in R^{k+1}, denoted by S^k, which is a standard instance of a positively curved space [11].

Symmetric Positive Definite matrices are widely used in many computer vision and medical imaging applications. For instance, structure tensors and covariance descriptors are ubiquitous in computer vision problems, including but not limited to classification, object tracking and recognition. In medical imaging, they are often encountered in Diffusion Tensor Imaging (DTI), conductance imaging, elastography, etc. In DTI they are used to characterize the diffusion of water molecules; in elastography the elasticity tensor describes the material properties of the tissue; and so on. Cauchy-Green deformation tensors, which appear in fluid and solid mechanics, are another example of such matrices.

Spherical features, on the other hand, are frequently used in many applications in computer vision and machine learning. To mention a few, any probability density function can be parameterized by its square-root density, which maps it to a point on a hypersphere in an infinite-dimensional Hilbert space [45]; 3 × 3 rotation matrices can be represented by unit quaternions, which are points on the unit sphere S3 in R4 [18]; and any directional feature inherently lies on a unit sphere, due to normalization [33].

It is known that geodesic distance computation on P(n) is computationally inefficient, especially for large matrix dimensions. The Stein distance is a recently proposed alternative [9], which is more efficient. However, the lack of a closed-form solution for the Stein mean of more than two SPD matrices makes it less appealing, because iterative optimization techniques must be employed to compute the mean. In Chapter 2, we present a novel incremental algorithm to compute the Fréchet mean of a group of SPD matrices based on the Stein distance. Through several synthetic and real data experiments, we demonstrate significant time gains achieved by our incremental method compared to its non-incremental counterpart, while the accuracies of the two methods are very similar.

Further, in Chapter 3, the incremental Fréchet mean estimator for data lying on the sphere is presented. The existing incremental mean computation techniques in the literature, e.g., [6, 21, 30, 46], are applicable to non-positively curved Riemannian manifolds, while the sphere is a space with positive sectional curvature [11].
Therefore, the convergence results in the aforementioned references are not directly applicable to this case. We analytically prove the convergence of the incremental estimator to the true Fréchet mean for symmetric distributions, as the number of samples tends to infinity. To the best of our knowledge, there are no similar convergence results for positively curved manifolds in the literature. We demonstrate the efficiency of our incremental method in several applications.

Principal Component Analysis (PCA) is a well-known statistical analysis tool which is widely used in the literature. The non-linear version of PCA is called Principal Geodesic Analysis (PGA) and was first introduced in [14]. PGA has been applied to many problems in the past decade. To mention a few, in the medical imaging literature it was used in [13, 14, 57] and [55] for statistical shape analysis and tensor field classification, respectively. In computer vision, it was applied to facial gender classification [53] and motion compression [48]. We continue the statistical analysis of manifold-valued data by presenting a novel incremental PGA (iPGA) algorithm, for both a population of SPD tensor fields and spherical features, in Chapter 4. To this end, we present a novel iPGA method using the incremental Fréchet mean estimation technique presented in [21], and reformulate the PGA algorithm in [55] in an incremental form. In order to illustrate the effectiveness and accuracy of the proposed method, we compare the performance of iPGA and the batch-mode PGA via synthetic and real data experiments.

CHAPTER 2
INCREMENTAL ESTIMATION OF THE STEIN CENTER OF SPD MATRICES AND ITS APPLICATIONS

(© 2013 IEEE. Reprinted with minor changes, with permission, from H. Salehian, G. Cheng, B. C. Vemuri and J. Ho, "Recursive Estimation of the Stein Center of SPD Matrices and Its Applications," in Computer Vision (ICCV), 2013 IEEE International Conference on, pp. 1793-1800, December 2013. [39])

2.1 Background

Finding the mean of data lying on Pn can be achieved through a minimization process. More formally, the mean of a set of N data points x_i ∈ Pn is defined by

x^* = \operatorname{argmin}_{x} \sum_{i=1}^{N} d^2(x_i, x)    (2–1)

where d is the chosen distance/divergence. Depending on the choice of d, different types of means are obtained. Many techniques have been published for computing the mean SPD matrix based on different kinds of similarity distances/divergences. In [51], the symmetrized Kullback-Leibler divergence was used to measure the similarities between SPD matrices, and the mean was computed in closed form and applied to texture and diffusion tensor image (DTI) segmentation. The Fréchet mean was obtained by using the GL-invariant (GL denotes the general linear group, i.e., the group of n × n invertible matrices) Riemannian metric on Pn and used for DTI segmentation in [28] and for interpolation in [34]. Another popular distance is the so-called Log-Euclidean distance, introduced in [12] and used for computing the mean. More recently, in [9] the LogDet divergence was introduced and applied to tensor clustering and covariance tracking. Each of these distances and divergences possesses its own invariance properties with respect to group transformations/operations. For instance, the natural geodesic distance derived from the GL-invariant metric is GL-invariant, the Log-Euclidean distance is invariant to the group of rigid motions, and so on.
Among these distances/divergences, the LogDet divergence was shown in [9] to possess interesting bounding properties with respect to the natural Riemannian distance, and to be much more computationally attractive for computing the mean. However, no closed-form expression exists for the mean of more than two matrices under the LogDet divergence. When the number of samples in the population is large and the SPD matrices themselves are large, it is desirable to have a computationally more attractive algorithm for computing the mean using this divergence. An incremental formulation effectively addresses this problem. It leads to considerable efficiency in mean computation because, for each new sample, all one needs to do is update the old mean. Consequently, the algorithm only needs to keep track of the most recently computed mean, whereas computing the mean in batch mode requires one to store all previously given samples, which can prove quite storage intensive for large problems. Thus, by using an incremental formula we can significantly reduce both time and storage consumption. Recently, in [6], recursive algorithms to estimate the mean SPD matrix based on the natural GL-invariant Riemannian metric and the symmetrized KL-divergence were proposed and applied to the task of DTI segmentation. Also, in [54] a recursive form of the Log-Euclidean mean was introduced.

In this chapter we present a novel incremental algorithm for computing the mean of a set of SPD matrices using the Stein metric. The Jensen-Bregman LogDet (JBLD) divergence was recently introduced in [9] for n × n SPD matrices. Compared to the standard approaches, the JBLD has a much lower computational cost, since its formula does not require any eigen decompositions of the SPD matrices. Moreover, it has been shown to be useful for nearest neighbor retrieval [9]. However, JBLD is not a metric on Pn, since it does not satisfy the triangle inequality. In [44] the authors proved that the square root of JBLD is a metric, called the Stein metric. Unfortunately, the mean of SPD matrices based on the Stein metric cannot be computed in closed form for more than two matrices [5, 9]. Therefore, iterative optimization schemes must be applied to find the mean of a given set of SPD matrices. The computational efficiency of these iterative schemes suffers considerably when the number of samples and the size of the matrices are large. This makes the Stein-based mean inefficient for computer vision applications that deal with huge amounts of data.

In this chapter, we introduce an efficient incremental formula to compute the Stein mean. To illustrate the effectiveness of the proposed algorithm, we first show that applying the incremental Stein mean estimator to the task of K-means clustering leads to a significant gain in compute time when compared to using the batch-mode Stein center, as well as other recursive mean estimators based on the aforementioned distances/divergences. Furthermore, we develop a novel hashing technique which generalizes the work in [20] to SPD matrices. The key contributions are: (i) derivation of a closed-form solution for the weighted Stein center of two matrices, which is then used to formulate the incremental Stein center estimator for more than two SPD matrices; (ii) empirical evidence of convergence of the incremental Stein mean estimator to the true Stein mean;
(iii) a new hashing technique for image indexing and retrieval using covariance descriptors; and (iv) synthetic and real data experiments depicting significant gains in computation time for SPD matrix clustering and image retrieval (using covariance descriptor features) with our incremental Stein center estimator.

The rest of this chapter is organized as follows: in Section 2.2 we present the incremental algorithm to find the Stein-distance-based mean of a set of SPD matrices. In Section 2.3 we provide an overview of the important properties of Pn equipped with the Stein distance. Section 2.4 includes the empirical evidence of the convergence of the incremental Stein mean estimator to the true Stein mean. Further, we present a set of synthetic and real data experiments showing the improvements in compute time for SPD matrix clustering and hashing.

2.2 Incremental Stein Mean Computation

The natural action of the general linear group of n × n invertible matrices (denoted by GL(n)) on Pn is defined as follows: ∀g ∈ GL(n), ∀X ∈ Pn, X[g] = gXg^T, where T denotes the matrix transpose. Let A and B be any two points in Pn. The geodesic distance on this manifold is defined by the following GL(n)-invariant Riemannian metric:

d_R(A, B)^2 = \operatorname{trace}(\operatorname{Log}(A^{-1}B)^2),    (2–2)

where Log is the matrix logarithm. The mean of a set of N SPD matrices based on the above Riemannian metric is called the Fréchet mean, and is defined as

X^* = \operatorname{argmin}_X \sum_{i=1}^{N} d_R^2(X, X_i),    (2–3)

where X^* is the Fréchet mean and the X_i are the given matrix-valued data. However, computing the distance using (2–2) requires an eigen decomposition of the matrix, which for large matrices slows down the computation considerably. Furthermore, the minimization problem (2–3) does not have a closed-form solution in general (for more than two matrices), and iterative schemes such as gradient descent are employed to find the solution.

Recently, in [9], the Jensen-Bregman LogDet (JBLD) divergence was introduced to measure similarity/dissimilarity between SPD matrices. It is defined as

D_{LD}(A, B) = \operatorname{logdet}\left(\frac{A+B}{2}\right) - \frac{1}{2}\operatorname{logdet}(AB),    (2–4)

where A and B are two given SPD matrices. It can be seen that JBLD is much more computationally efficient than the Riemannian metric, as no eigen decomposition is required. JBLD is, however, not a metric, because it does not satisfy the triangle inequality. In [44] it was shown that the square root of the JBLD divergence is a metric, i.e., it is non-negative definite, symmetric and satisfies the triangle inequality. This new metric is called the Stein metric and is defined by

d_S(A, B) = \sqrt{D_{LD}(A, B)},    (2–5)

where D_{LD} is defined in (2–4). Clearly, the Stein metric can also be computed efficiently. Accordingly, the mean of a set of SPD tensors based on the Stein metric is defined by

X^* = \operatorname{argmin}_X \sum_{i=1}^{N} d_S^2(X, X_i).    (2–6)

Let X_1, X_2, ..., X_N ∈ Pn be a set of SPD matrices. The incremental Stein mean is defined by

M_1 = X_1    (2–7)

M_{k+1}(w_{k+1}) = \operatorname{argmin}_M (1 - w_{k+1})\, d_S^2(M_k, M) + w_{k+1}\, d_S^2(X_{k+1}, M)    (2–8)

where w_{k+1} = 1/(k+1), M_k is the old mean of k SPD matrices, X_{k+1} is the new incoming sample and M_{k+1} is the updated mean for k + 1 matrices. Note that (2–8) can be thought of as a weighted Stein mean between the old mean and the new sample point, with the weight set as in the Euclidean mean update.
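To make the computational contrast concrete, the following Python sketch (an illustration, not the implementation used in our experiments) computes both distances: d_R in (2–2) requires a matrix logarithm, and hence an eigen decomposition, while d_S in (2–4) and (2–5) needs only log-determinants.

```python
import numpy as np
from scipy.linalg import logm, solve

def riemannian_distance(A, B):
    """GL-invariant geodesic distance of Eq. 2-2 (matrix log needed)."""
    L = logm(solve(A, B))                 # Log(A^{-1} B)
    return np.sqrt(np.trace(L @ L).real)

def stein_distance(A, B):
    """Stein metric of Eqs. 2-4 and 2-5 (log-determinants only)."""
    jbld = (np.linalg.slogdet((A + B) / 2.0)[1]
            - 0.5 * (np.linalg.slogdet(A)[1] + np.linalg.slogdet(B)[1]))
    return np.sqrt(max(jbld, 0.0))        # guard tiny negative round-off
```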
Now, we show that (2–8) has a closed-form solution for SPD matrices. Let A and B be two matrices in Pn. The weighted mean of A and B, denoted by C, with weights w_a and w_b such that w_a + w_b = 1, should minimize (2–8). Therefore, one can compute the gradient of this objective function and set it to zero to find the minimizer C:

w_a\left[\left(\frac{C+A}{2}\right)^{-1} - C^{-1}\right] + w_b\left[\left(\frac{C+B}{2}\right)^{-1} - C^{-1}\right] = 0    (2–9)

Multiplying both sides of (2–9) by the matrices C, C + A and C + B in the right order yields

C A^{-1} C + (w_b - w_a)\, C (I - A^{-1}B) - B = 0    (2–10)

It can be verified that for any matrices A, B and C in Pn satisfying (2–10), the matrices A^{-1/2} C A^{-1/2} and A^{-1/2} B A^{-1/2} commute. In other words,

A^{-1} C A^{-1} B = A^{-1} B A^{-1} C    (2–11)

Left multiplication of (2–10) by A^{-1} yields

A^{-1} C A^{-1} C + (w_b - w_a)\, A^{-1} C (I - A^{-1}B) = A^{-1} B    (2–12)

Using the equality in (2–11), the equation above can be rewritten in the following quadratic matrix form:

\left(A^{-1}C + \frac{w_b - w_a}{2}(I - A^{-1}B)\right)^2 = A^{-1}B + \frac{(w_b - w_a)^2}{4}(I - A^{-1}B)^2    (2–13)

Taking the square root of both sides and rearranging yields

A^{-1} C = \sqrt{A^{-1}B + \frac{(w_b - w_a)^2}{4}(I - A^{-1}B)^2} - \frac{w_b - w_a}{2}(I - A^{-1}B)    (2–14)

Therefore, the solution of (2–10) for C can be written in the following closed form:

C = A\left[\sqrt{A^{-1}B + \frac{(w_b - w_a)^2}{4}(I - A^{-1}B)^2} - \frac{w_b - w_a}{2}(I - A^{-1}B)\right]    (2–15)

It can be verified that the solution in (2–15) satisfies Eq. (2–11). Therefore, Eq. (2–8) for incremental Stein mean estimation can be rewritten as

M_{k+1} = M_k\left[\sqrt{M_k^{-1}X_{k+1} + \frac{(2w_{k+1}-1)^2}{4}(I - M_k^{-1}X_{k+1})^2} - \frac{2w_{k+1}-1}{2}(I - M_k^{-1}X_{k+1})\right]    (2–16)

with w_{k+1}, M_k, M_{k+1} and X_{k+1} as in (2–8).
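The closed form (2–16) translates directly into code. The following sketch, assuming NumPy/SciPy, performs one update and then streams an entire sample set through it; it is a minimal illustration of Eqs. (2–7), (2–8) and (2–16).

```python
import numpy as np
from scipy.linalg import sqrtm

def stein_weighted_mean(M_k, X_new, w):
    """Weighted Stein mean of Eq. 2-16 with weight w on the new sample."""
    n = M_k.shape[0]
    G = np.linalg.solve(M_k, X_new)       # M_k^{-1} X_{k+1}
    D = np.eye(n) - G                     # I - M_k^{-1} X_{k+1}
    c = (2.0 * w - 1.0) / 2.0
    S = sqrtm(G + (c * c) * (D @ D))      # principal matrix square root
    return (M_k @ (S - c * D)).real       # drop numerical imaginary dust

def incremental_stein_mean(samples):
    """Stream samples through Eqs. 2-7 and 2-8 with w_{k+1} = 1/(k+1)."""
    M = samples[0]
    for k, X in enumerate(samples[1:], start=1):
        M = stein_weighted_mean(M, X, 1.0 / (k + 1))
    return M
```

As a sanity check, with w = 1/2 the update reduces to M(M^{-1}X)^{1/2}, the matrix geometric mean of M and X, consistent with the midpoint coincidence noted in Section 2.3 below.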
Reshetnyak’s quadruple comparison is a particularly useful result for deducing important theorems for global NPC spaces (see [46] and the references therein). In particular, for any global NPC space M and a set of samples, x1 , x2 , ... defined on M, its Fréchet mean (or barycenter in [46]) will be a unique point on M. Besides, the incremental mean estimator (similar to [6]) will asymptotically converge to the true Fréchet mean. Proposition 2.1. Pn with Stein metric is not a global NPC space. Proof. (Sketch) Proposition 2.3 in [46] states that if a metric space (M, dM ) is a global NPC space, then it is a geodesic space. However we show in the following proposition that (Pn , dS ) is not a geodesic space. Proposition 2.2. Let x, y be two arbitrary points in Pn . Their midpoints, ma , ms , with respect to the affine-invariant Riemannian metric and the Stein metric, respectively, coincide: ma = ms . However, in general, we have dS (x, ms ) = dS (y , ms ) but dS (x, ms ) ̸= 1 dS (x, y ). 2 Proof. (Sketch) The coincidence of midpoint is a consequence of [5]. The difference between dS (x, ms ) and 12 dS (x, y ) can be easily shown with a counter-example. Let x = 1 24 and y = 4, where x, y ∈ P1 , then the coincidence of midpoint implies that ms = ma = 2. But, it can be verified that dS (x, ms ) = dS (y , ms ) = 0.2427, while 12 dS (x, y ) = 0.2362, hence dS (x, ms ) ̸= 21 dS (x, y ). Therefore, based on Proposition 1.2 in [46], (Pn , dS ) is not a geodesic space. However, the following proposition illustrates that Pn with Stein metric shares an important similarity with global NPC spaces, although it is not one. Proposition 2.3. Pn with Stein metric satisfies Reshetnyak’s quadruple comparison. In other words, for all x1 , x2 , x3 , x4 ∈ Pn , the inequality in 2.3.1 is satisfied. To prove the theorem we will need to make use of the following lemmas. Lemma 1. For any quadruple of positive real numbers (matrices in P1 ) the Reshetnyak’s inequality holds. Proof. For positive real numbers, x and y , Stein distance can be rewritten as: √ dS (x, y ) = x +y log √ 2 xy (2–18) Therefore, the Reshetnyak’s inequality can be expressed by the following summation of real log functions x1 + x3 x2 + x4 x1 + x2 x2 + x3 x3 + x4 x4 + x1 log √ + log √ ≤ log √ + log √ + log √ + log √ 2 x1 x3 2 x2 x4 2 x1 x2 2 x2 x3 2 x3 x4 2 x4 x1 ⇒ log (x1 + x3 )(x2 + x4 ) (x1 + x2 )(x2 + x3 )(x3 + x4 )(x4 + x1 ) ≤ log √ 4 x1 x2 x3 x4 16x1 x2 x3 x4 (x1 + x3 )(x2 + x4 ) (x1 + x2 )(x2 + x3 )(x3 + x4 )(x4 + x1 ) ≤ √ 4 x1 x2 x3 x4 16x1 x2 x3 x4 √ √ √ √ x1 x2 x3 x4 x2 x3 x1 x4 ⇒( + + + )≤ x3 x4 x1 x2 x1 x4 x2 x3 √ √ √ √ √ √ √ √ 1 x1 x3 x2 x4 x2 x3 x1 x4 x2 x4 x1 x3 x1 x2 x3 x4 ( + + + )( + + + ) 4 x2 x4 x1 x3 x1 x4 x2 x3 x1 x3 x2 x4 x3 x4 x1 x2 ⇒ ⇒ (a + 1 1 1 1 1 1 1 + b + ) ≤ (b + + c + )(a + + c + ) a b 4 b c a c 25 (2–19) where a = √ x1 x2 , x3 x4 √ b= x1 x4 x2 x3 and c = But, for any positive number x, x + 1 x √ x1 x3 . x2 x4 ≥ 2. Therefore, A=a+ 1 ≥2 a B =b+ 1 ≥2 b C =c+ 1 ≥2 c So, the inequality 3–19 can be rewritten as 4(A + B) ≤ (C + A)(C + B) ⇒ C 2 + C (A + B) + AB − 4(A + B) ≥ 0 ⇒ (A + B)(C − 4) + C 2 + AB ≥ 0 (2–20) We already know that, C ≥2 ⇒ C − 4 ≥ −2 ⇒ (C − 4)(A + B) ≥ −8 (2–21) since A ≥ 2 and B ≥ 2 and hence A + B ≥ 4. On the other hand: C 2 ≥ 4 and also AB ≥ 4. Summing up these two inequalities with Eq. 2–21 shows the correctness of Eq. 2–20. Lemma 2. For any quadruple of diagonal matrices on Pn , the Reshetnyak’s inequality is satisfied. Proof. 
Lemma 2. For any quadruple of diagonal matrices in Pn, Reshetnyak's inequality is satisfied.

Proof. The previous result extends immediately to diagonal matrices in Pn. Let X and Y be diagonal matrices, and let x_i and y_i be their diagonal elements, respectively. Then the Stein distance between X and Y is

d_S^2(X, Y) = \sum_{i=1}^{n}\left[\log\frac{x_i + y_i}{2} - \frac{1}{2}\log(x_i y_i)\right] = \sum_{i=1}^{n} d^2(x_i, y_i)

Now, let X, Y, Z and W be diagonal matrices with diagonal elements x_i, y_i, z_i and w_i, respectively. By Lemma 1, the inequality is satisfied for each i, resulting in n inequalities for real numbers. Summing up these inequalities and using the identity above completes the proof.

Lemma 3. Let A and B be two SPD matrices. There is a matrix P for which P^T A P = I and P^T B P = D↓, where I is the identity matrix and D↓ is a diagonal matrix whose diagonal elements are sorted in decreasing order.

Proof. (Based on the intuition from [44]) Let A = UΛU^T, and define S = Λ^{-1/2}U. Now define C = S^T U^T B U S; since C is symmetric, there exists an orthogonal matrix V such that C = V D↓ V^T, where D↓ is diagonal with elements sorted in decreasing order. The proof follows by setting P = USV, because

P^T A P = V^T S^T U^T (UΛU^T) U S V = V^T U^T Λ^{-1/2} Λ\, Λ^{-1/2} U V = I    (2–22)

and, by construction of P,

P^T B P = V^T S^T U^T B U S V = V^T C V = D↓    (2–23)

Proof of Proposition 2.3. Let A_1, A_2, A_3 and A_4 be the given quadruple. By Lemma 3, there exists a matrix P such that P^T A_1 P = I and P^T A_2 P = D2↓, where I is the identity matrix and D2↓ is a diagonal matrix whose diagonal elements are sorted in decreasing order. Let P^T A_3 P = X_3 and P^T A_4 P = X_4. By the congruence invariance of the Stein metric, it suffices to prove the inequality for the new quadruple (I, D2↓, X_3, X_4).

Let X_i↓ be the diagonal matrix whose diagonal elements are the eigenvalues of X_i, sorted in decreasing order. By Lemma 2, Reshetnyak's inequality holds for the quadruple (I, D2↓, X3↓, X4↓), as all these matrices are diagonal. Mathematically,

d_S^2(I, X4↓) + d_S^2(D2↓, X3↓) \le d_S^2(I, D2↓) + d_S^2(D2↓, X4↓) + d_S^2(X4↓, X3↓) + d_S^2(X3↓, I)    (2–24)

Now, we want to show the inequality for (I, D2↓, X_3, X4↓), where X3↓ is replaced by X_3. To this end, we make use of the congruence invariance property of the Stein metric. There exists a matrix Q for which Q^T D2↓ Q = I and Q^T X3↓ Q = Y3↓, where I is the identity and Y3↓ is a diagonal matrix with decreasing diagonal elements. Suppose I, X_3 and X4↓ are moved to Y_1, Y_3 and Y_4 by the congruence transform Q, respectively. By congruence invariance, the inequality holds for (Y_1, I, Y3↓, Y_4):

d_S^2(Y_1, Y_4) + d_S^2(I, Y3↓) \le d_S^2(Y_1, I) + d_S^2(I, Y_4) + d_S^2(Y_4, Y3↓) + d_S^2(Y3↓, Y_1)    (2–25)

Moreover, it was shown in [44] that for all pairs of SPD matrices, d_S(A, B) ≥ d_S(A↓, B↓), and in the special case, d_S(I, A) = d_S(I, A↓). Accordingly, d_S(X3↓, X4↓) ≤ d_S(X_3, X4↓) and d_S(I, X3↓) = d_S(I, X_3). By the congruence invariance property, these two relations extend to d_S(Y3↓, Y_4) ≤ d_S(Y_3, Y_4) and d_S(Y_1, Y3↓) = d_S(Y_1, Y_3). Furthermore, in the new quadruple we clearly have d_S(I, Y3↓) = d_S(I, Y_3). Using these relations we can replace Y3↓ by Y_3 in Eq. (2–25), which implies

d_S^2(Y_1, Y_4) + d_S^2(I, Y_3) \le d_S^2(Y_1, I) + d_S^2(I, Y_4) + d_S^2(Y_4, Y_3) + d_S^2(Y_3, Y_1)    (2–26)

Finally, we apply the group action Q^{-1} to recover the original quadruple, which proves the inequality for (I, D2↓, X_3, X4↓).
The sequence of the above group actions is illustrated in Fig. 2-2. Note that the curves between each pair of points are drawn only to indicate the corresponding Stein distances; they do not represent geodesic curves.

Figure 2-2. Illustration of the proof of Reshetnyak's inequality for the quadruple (I, D2↓, X3, X4↓), from the quadruple (I, D2↓, X3↓, X4↓).

In the last step we prove the inequality for (I, D2↓, X_3, X_4), where X4↓ is replaced by X_4. Similarly to the above, we apply congruence invariance in the following manner: there exists a matrix R for which R^T X_3 R = I and R^T X4↓ R = Z4↓. The matrices I, X_4 and D2↓ are moved to Z_1, Z_4 and Z_2, respectively, under this transformation. Congruence invariance implies that

d_S^2(Z_1, Z4↓) + d_S^2(Z_2, I) \le d_S^2(Z_1, Z_2) + d_S^2(Z_2, Z4↓) + d_S^2(Z4↓, I) + d_S^2(I, Z_1)    (2–27)

In a similar fashion to the last part, we have d_S(Z_1, Z4↓) = d_S(Z_1, Z_4) and also d_S(Z_2, Z4↓) ≤ d_S(Z_2, Z_4). Using these relations we end up with the following inequality:

d_S^2(Z_1, Z_4) + d_S^2(Z_2, I) \le d_S^2(Z_1, Z_2) + d_S^2(Z_2, Z_4) + d_S^2(Z_4, I) + d_S^2(I, Z_1)    (2–28)

Applying the group action R^{-1} asserts that

d_S^2(I, X_4) + d_S^2(D2↓, X_3) \le d_S^2(I, D2↓) + d_S^2(D2↓, X_4) + d_S^2(X_4, X_3) + d_S^2(X_3, I)    (2–29)

Finally, we use the group action P^{-1} to recover the original quadruple:

d_S^2(A_1, A_4) + d_S^2(A_2, A_3) \le d_S^2(A_1, A_2) + d_S^2(A_2, A_4) + d_S^2(A_4, A_3) + d_S^2(A_3, A_1)    (2–30)

which completes the proof. □

2.3.2 Discussion

If Pn equipped with the Stein metric were a global Non-Positive Curvature (NPC) space [46], Sturm's results would imply that the estimate M_{k+1} given by (2–16) converges to the unique Stein expectation as k → ∞ [46]. Unfortunately, as shown in this section, it is not a geodesic space, and consequently not a global NPC space. Therefore, a proof of convergence for our case requires further effort. However, we present empirical evidence, for 100 SPD matrices randomly drawn from a log-Normal distribution, indicating that the incremental estimates of the Stein mean converge to the batch-mode Stein mean (see Fig. 2-3).

2.4 Experiments

In this section, we present several synthetic and real data experiments. All execution times reported in this section are for experiments performed on a machine with a 2.67GHz Intel-7 CPU and 8GB RAM.

2.4.1 Performance of the Incremental Stein Center

To illustrate the performance of the proposed incremental algorithm, we generate 100 i.i.d. samples from a Log-normal distribution [41] on P3, with the variance and expectation set to 0.25 and the identity matrix, respectively. We then input these random samples to the incremental Stein-based mean estimator (ISM) and its non-incremental counterpart (SM). To compare the accuracy of ISM and SM we compute the Stein distance between the ground truth and the computed estimate. Further, the computation time for each newly acquired sample is recorded. We repeat this experiment 20 times and plot the average error and the average computation time at each step. Fig. 2-3 depicts the accuracies of ISM and SM in the same plot.

Figure 2-3. Error comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

It can be seen that, for the given 100 samples, as desired, the accuracies of the incremental and non-incremental algorithms are almost the same.
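For completeness, the synthetic setup can be reproduced along the following lines. The sampler below is one simple construction of a log-Normal distribution on P3 (exponentiate a random Gaussian symmetric matrix); the exact parameterization of [41] may differ, so this should be read as an assumption-laden sketch. It reuses incremental_stein_mean from Section 2.2.

```python
import numpy as np
from scipy.linalg import expm

def sample_lognormal_spd(n=3, sigma=0.5, rng=None):
    """Draw X = expm(S), with S a symmetric Gaussian matrix: one common
    construction of a log-Normal sample on P_n centered at the identity."""
    rng = rng or np.random.default_rng()
    G = rng.normal(scale=sigma, size=(n, n))
    return expm((G + G.T) / 2.0)

rng = np.random.default_rng(1)
samples = [sample_lognormal_spd(rng=rng) for _ in range(100)]
M_inc = incremental_stein_mean(samples)   # streaming estimate of the Stein mean
```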
It should be noted that ISM computes the new mean by simple matrix operations, e.g., sums and multiplications, which makes it very fast for any number of samples.

Figure 2-4. Time comparison of the incremental (red) versus non-incremental (blue) Stein mean computation for data on P3.

This means that the incremental Stein-based mean is computationally far more efficient, especially when the number of samples is very large and the samples arrive incrementally, for example as in clustering and some segmentation algorithms.

2.4.2 Application to K-means Clustering

In this section we evaluate the performance of our proposed incremental algorithm applied to K-means clustering. The two fundamental components of the K-means algorithm at each step are: (i) distance computation and (ii) the mean update. Owing to the computational efficiency of the Stein metric, the distances can be computed efficiently. However, due to the lack of a closed-form formula for the Stein mean, the cluster center update is more time consuming. To tackle this problem we employ our incremental Stein mean estimator.

To this end, at the end of each K-means iteration, only the matrices that changed cluster membership in the previous iteration are considered. Each cluster center is then updated only by applying the changes imposed by the matrices that most recently changed cluster membership. For instance, let C1_i and C2_i be the centers of the first and second clusters at the end of the i-th iteration, and let X be a matrix which has moved from the first cluster to the second one. We can then directly update C1_i by removing X from it to get C1_{i+1}, and update C2_i by adding X to it, to get C2_{i+1}. This significantly decreases the computation time of the K-means algorithm, especially for huge datasets. This process is shown in Fig. 2-5, and a sketch of the center-update step follows.

Figure 2-5. Illustration of the incremental mean updates in K-means clustering.
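The following sketch illustrates the update step, reusing stein_weighted_mean from Section 2.2. For simplicity it folds newly gained members into a center with the incremental weight of Eq. 2-8 and only tracks the departures by count; the symmetric removal ("downdate") step of Fig. 2-5 used in our implementation is analogous but elided here. Names such as centers, counts and moves are illustrative.

```python
def apply_membership_changes(centers, counts, moves):
    """Update cluster centers from a list of moves (X, src, dst): X left
    cluster src and joined cluster dst. Gains are folded in incrementally
    via Eq. 2-16 with weight 1/(count + 1); removals are elided in this
    sketch (see Fig. 2-5 for the full scheme)."""
    for X, src, dst in moves:
        counts[src] -= 1                                  # X left src
        centers[dst] = stein_weighted_mean(
            centers[dst], X, 1.0 / (counts[dst] + 1))     # X joined dst
        counts[dst] += 1
    return centers, counts
```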
To illustrate the efficiency resulting from our proposed incremental Stein mean (ISM) update, we compared its performance to the non-incremental Stein mean (SM), as well as the following three widely used mean computation techniques: the Fréchet mean (FM), the symmetrized Kullback-Leibler mean (KLsM) and the Log-Euclidean mean (LEM). Furthermore, to show the effectiveness of the Stein metric in the K-means distance computation, we included comparisons to the following recursive mean estimators recently introduced in the literature: the recursive Log-Euclidean mean (RLEM) [54], and the Incremental Fréchet Expectation Estimator (IFEE) and recursive KLs mean (RKLsM) of [6]. We emphasize that for each of these mean estimators we used the corresponding distance/divergence in the K-means algorithm.

The efficiency of the proposed K-means algorithm is investigated in the following set of experiments. We tested our algorithm in three different scenarios, namely with increasing (i) number of samples, (ii) matrix size, and (iii) number of clusters. For each scenario we generated samples from a mixture of Log-normal distributions, where the expectation of each component is taken to be the true cluster center. To measure the clustering error, we compute the geodesic distance between each estimated cluster center and its true value, and sum the error values over all clusters.

Fig. 2-6 depicts the time comparison between the aforementioned K-means clustering techniques.

Figure 2-6. Time comparison of the K-means clustering using various methods. (a) shows the result for an increasing number of clusters, with 1000 samples on P2. In (b) the database size is increased from 400 to 2000, with 5 clusters, on P2. In (c) the matrix dimension is increased, with 1000 samples and 3 clusters.

It is clearly evident that the proposed method (ISM) is significantly faster than the competing methods in all the aforementioned settings of the experiment. Two factors account for the time efficiency of ISM: (i) the incremental update of the Stein mean, achieved via the closed-form expression in Eq. (2–16), and (ii) fast distance computation by exploiting the Stein metric, as the Stein distance is computed using a simple matrix determinant followed by a scalar logarithm, while the Log-Euclidean and GL-invariant Riemannian distances and the KLs divergence require complicated matrix operations, e.g., matrix logarithm, inverse and square root. Consequently, it can be seen in Fig. 2-6 that for large datasets the recursive Log-Euclidean, Fréchet and KLs mean methods are as slow as their non-recursive counterparts, since a substantial portion of the time is consumed by the distance computations involved in the algorithm.

Furthermore, Fig. 2-7 depicts the clustering error defined earlier for each experiment.

Figure 2-7. Error comparison of the K-means clustering using the techniques specified in Fig. 2-6. (a), (b) and (c) show the results for varying number of clusters, number of samples and matrix dimensions, respectively.

It can be seen that in all cases the accuracy of the ISM estimator is very close to that of the other competing methods, and in particular to the non-incremental Stein mean (SM) and Fréchet mean (FM). Thus, accuracy-wise, the proposed ISM estimator is as good as the best in its class, but far more computationally efficient. These experiments verify that the proposed incremental method is a computationally attractive candidate for the task of K-means clustering in the space of SPD matrices.

2.4.3 Application to Image Retrieval

In this section, we present results of applying our incremental Stein mean estimator to the image hashing and retrieval problem. To this end, we present a novel hashing function which generalizes spherical hashing to SPD matrices. Spherical hashing was introduced in [20] for binary encoding of large-scale image databases. However, it cannot be applied as-is (without modifications) to the space of SPD matrices, since it was developed for inputs in a vector space. In this section we describe our extensions of the spherical hashing technique to deal with SPD matrices (which are elements of a Riemannian manifold with negative sectional curvature).

Given a population of SPD matrices, our hashing function is based on the distances to a set of fixed pivot points. Let P_1, P_2, ..., P_k be the set of pivot points produced for the given population. The hashing function is denoted by H(X) = (h_1(X), ..., h_k(X)), with X the given SPD matrix, and each h_i is defined by

h_i(X) = \begin{cases} 0 & \text{if } \operatorname{dist}(P_i, X) > r_i \\ 1 & \text{if } \operatorname{dist}(P_i, X) \le r_i \end{cases}    (2–31)

where dist(·, ·) denotes any distance defined on the manifold of SPD matrices. The value of h_i(X) indicates whether the given matrix X is inside the geodesic ball centered at P_i with radius r_i. In our experiments we used the Stein distance defined in Equation (2–5), because it is more computationally appealing for large datasets.
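A compact sketch of this encoder is shown below, reusing stein_distance from Section 2.2. Here stein_kmeans stands in for the Stein-mean K-means procedure of Section 2.4.2 and is assumed rather than implemented; the radii follow the balancing rule discussed next (Eq. 2–32).

```python
import numpy as np

def fit_radii(pivots, data):
    """Median-distance radii, so each ball captures half the data (Eq. 2-32)."""
    return [np.median([stein_distance(P, X) for X in data]) for P in pivots]

def hash_code(X, pivots, radii):
    """Binary code H(X) of Eq. 2-31: bit i is 1 iff X falls inside ball i."""
    return np.array([int(stein_distance(P, X) <= r)
                     for P, r in zip(pivots, radii)], dtype=np.uint8)

# pivots = stein_kmeans(data, k)            # hypothetical: Section 2.4.2
# radii  = fit_radii(pivots, data)
# codes  = [hash_code(X, pivots, radii) for X in data]
```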
An appropriate choice of pivot points as well as radii is crucial to guarantee the accuracy of the hashing. In order to locate the pivot points we employ the K-means clustering based on the Stein mean, discussed in Section 2.4.2. Furthermore, the radius r_i is picked such that the hashing function h_i satisfies

\Pr[h_i(X) = 1] = \frac{1}{2}    (2–32)

which guarantees that each geodesic ball contains half of the samples. Based on this framework, each member of a set of n × n SPD matrices is mapped to a binary code of length k. To measure similarity/dissimilarity between binary codes, the spherical Hamming distance described in [20] is used.

In order to evaluate the performance of the proposed incremental Stein mean algorithm in this image hashing framework, we first located the pivot points by employing four of the K-means clustering techniques discussed in Section 2.4.2: ISM, SM, IFEE and RLEM. Then the retrieval precision for each method was measured and compared. Experiments were performed on the COREL image database [29], which contains 10K images categorized into 80 classes. For each image a set of feature vectors was computed of the form

f = [I_r, I_g, I_b, I_L, I_A, I_B, I_x, I_y, I_{xx}, I_{yy}, |G_{0,0}(x, y)|, ..., |G_{2,1}(x, y)|]    (2–33)

where the first three components represent the RGB color channels, the next three encode the Lab color dimensions, and the next four specify the first- and second-order gradients at each pixel. Further, as in [17], G_{u,v}(x, y) is the response of a 2D Gabor wavelet centered at (x, y) with scale v and orientation u. Finally, from the set of N feature vectors extracted from each image, f_1, f_2, ..., f_N, a covariance matrix was created using

\operatorname{Cov} = \frac{1}{N}\sum_{i=1}^{N}(f_i - \bar{f})(f_i - \bar{f})^T    (2–34)

where \bar{f} is the mean vector. Therefore, ten thousand 16 × 16 covariance matrices were extracted from this dataset; a sketch of this construction is given below.
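The descriptor construction of Eqs. 2-33 and 2-34 amounts to a biased sample covariance over per-pixel feature vectors; a minimal sketch (the feature extraction itself is omitted) is:

```python
import numpy as np

def covariance_descriptor(features):
    """Eq. 2-34: covariance of an (N, d) array of feature vectors f_i,
    here d = 16 for the features of Eq. 2-33."""
    F = np.asarray(features, dtype=float)
    D = F - F.mean(axis=0)                # subtract the mean vector f-bar
    return (D.T @ D) / F.shape[0]         # (1/N) sum (f_i - f)(f_i - f)^T
```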
To compare time efficiency, we record the total time needed to compute the pivots and to find the radii, for each of the aforementioned techniques. Furthermore, a set of 1000 random queries was picked from the dataset, and for each query its 10 nearest neighbors were retrieved based on the spherical Hamming distance. The retrieval precision for each query was measured as the number of correct matches divided by the total number of retrieved images, namely 10. The total precision was then computed by averaging these accuracies.

Fig. 2-8 shows the time taken by each method.

Figure 2-8. Time consumption in initializing hashing functions, for the incremental Stein mean (ISM), non-incremental Stein mean (SM), recursive Log-Euclidean mean (RLEM) and Incremental Fréchet Expectation Estimator (IFEE), over increasing binary code lengths.

As expected, the incremental Stein mean estimator significantly outperforms the other methods, especially for longer binary codes: the incremental framework provides an efficient way to update the mean covariance matrix. Further, IFEE, which is based on the GL-invariant Riemannian metric, is much more computationally expensive than our incremental Stein method. Fig. 2-9 depicts the accuracy of each technique.

Figure 2-9. Comparison of retrieval accuracy, for the techniques specified in Fig. 2-8.

It can be seen that the incremental Stein mean estimator provides almost the same accuracy as the non-incremental Stein mean as well as IFEE. Therefore, the accuracy and computational efficiency of our proposed method make it an appealing choice for image indexing and retrieval on huge datasets. Fig. 2-10 shows the outputs of the proposed system for four sample queries. Note that all of the retrieved images shown in Fig. 2-10 belong to the same class in the provided ground truth.

Figure 2-10. Example results of the proposed retrieval system, based on the incremental Stein mean, with 640-bit binary codes. The leftmost column in each row is the query image, and the remaining columns show the 5 most similar images retrieved, sorted in increasing order of the Hamming distance to the query, which is specified below each image.

2.4.4 Application to Shape Retrieval

In this section, the image hashing technique presented in Section 2.4.3 is evaluated in a shape retrieval experiment using the MPEG-7 database [27], which consists of 70 different objects with 20 shapes per object, for a total of 1400 shapes. To extract the covariance features from each shape, we first partition the image into four equal areas and compute the 2 × 2 covariance matrices constructed from the (x, y) coordinates of the edge points in each region. Finally, we combine these matrices into a single block-diagonal matrix, resulting in an 8 × 8 covariance descriptor.

We used the same methods as in Section 2.4.3 to compare shape retrieval speed and precision. Table 2-1 contains the retrieval precision comparison; it can be seen that ISM provides roughly the same retrieval accuracy as IFEE, while Table 2-2 shows that ISM is significantly faster than all the competing methods.

Table 2-1. Average shape retrieval precision (%) for the MPEG7 database, for different Binary Code (BC) lengths.

BC Length   ISM     SM      IFEE    RLEM
64          60.67   62.10   61.46   61.15
128         63.59   64.65   64.69   63.23
192         69.69   69.63   70.10   68.19
256         73.13   73.13   73.84   70.14

Table 2-2. Time (in seconds) comparison for shape retrieval.

BC Length   ISM     SM      IFEE    RLEM
64          48.76   104.61  381.14  397.66
128         53.44   185.80  366.60  415.62
192         89.04   189.89  380.41  397.66
256         105.33  196.61  368.63  398.23

CHAPTER 3
INCREMENTAL FRÉCHET MEAN ESTIMATOR ON SPHERE

(The material in this chapter, with minor changes, is to be submitted to Information Processing in Medical Imaging (IPMI), Springer, 2015.)

3.1 Background

In many applications in computer vision, machine learning and medical imaging, the data lies on a sphere. To mention a few, directional data, which often appear in computer vision, are points on the unit sphere S2 [33]. Furthermore, any 3 × 3 rotation matrix can be parameterized by a unit quaternion, which can be represented by a point on the 3-dimensional unit sphere S3 [18]. Also, square-root density functions are points on a hypersphere embedded in an infinite-dimensional Hilbert space [45]. In most of the aforementioned applications, mean computation is a fundamental component: for instance, in the interpolation and smoothing of Orientation Distribution Functions (ODFs) [8], in the estimation of the mean rotation from several corresponding pairs of points in multi-view geometry [18], and in the statistical analysis of directional data [33].

The Riemannian geometry of the sphere has been well studied in the past decades [11, 38]. Given a set of n points X_1, X_2, ..., X_n on the sphere, the Riemannian center of mass M is defined as the (global) minimizer of the sum of squared geodesic distances,

M = \operatorname{argmin}_Y \sum_{i=1}^{n} d^2(X_i, Y)    (3–1)

where d(·) is the intrinsic distance defined on the sphere.
We will henceforth refer to this center of mass as the Fréchet mean, as opposed to the Karcher mean which is frequently used in the literature, because the Karcher mean often refers to a local solution, while the Fréchet mean is the global minimizer of this cost function. For detailed discussions we refer the reader to [1, 24]. It is known that there is no closed form solution for this objective function, the so-called Fréchet function, on the sphere, and iterative schemes like gradient descent must be employed. Therefore, the task of Fréchet mean computation can be computationally expensive, especially for very large datasets.

(The material in this chapter, with minor changes, is to be submitted to Information Processing in Medical Imaging (IPMI), Springer, 2015.)

In this chapter, we propose an incremental method to estimate the Fréchet mean of a set of samples on the sphere. The incremental way to update the mean is computationally efficient because, given the mean estimated for n samples, M_n, and a new sample X_{n+1}, one can update the mean to M_{n+1} in one shot; no iterative optimization algorithm needs to be employed to compute the new mean from scratch. Therefore, the incremental technique significantly speeds up the computation. Moreover, an incremental method only needs to keep track of the most recently computed Fréchet mean, and this provides considerable efficiency in space consumption. Although this significant time/space efficiency comes at the cost of lower accuracy, the major part of this chapter is devoted to showing that in the limit (over the number of samples), our incremental technique converges to the true Fréchet mean, for symmetric distributions.

In [6], the authors proposed an incremental Fréchet mean estimator for the manifold of (n × n) SPD matrices, denoted by P(n), and provided the convergence analysis of the incremental estimator to the true Fréchet mean. However, it is known that the space of SPD matrices is a Riemannian manifold with non-positive sectional curvature [34], while the sphere is an example of a positively curved Riemannian manifold [38]. This does indeed make a significant difference to proving the convergence. Specifically, the following two items are the most important obstacles in extending the convergence analysis in [6] to a similar estimator on the sphere.

First, the existence and uniqueness of the minimizer of the Fréchet function for a set of samples on a complete Riemannian manifold with positive sectional curvature is not guaranteed [1]. This is a consequence of the fact that the Fréchet function is not necessarily convex on the entire manifold. Several authors have restricted the geodesic ball containing the data points in order to guarantee the convexity of the Fréchet function [1, 25]. It was shown in [25] that if the sample points belong to a geodesic ball of radius π/2 on a unit sphere S^k, the (L2) minimizer of the Fréchet function exists and is unique. Therefore, in the rest of the chapter we assume that the samples belong only to the (northern) hemisphere of S^k.

Second, the well-known parallelogram law in Euclidean space has its counterpart, the so-called semi-parallelogram law, in any complete non-positively curved Riemannian manifold \mathcal{M} [46]: for any pair of points X, Y ∈ \mathcal{M}, there exists a point M ∈ \mathcal{M} (the geodesic midpoint of X and Y) such that, ∀Z ∈ \mathcal{M},

d^2(Z, M) \le \frac{1}{2} d^2(X, Z) + \frac{1}{2} d^2(Y, Z) - \frac{1}{4} d^2(X, Y)    (3–2)

Note that the equality is satisfied only in a Euclidean space.
This inequality is of crucial importance in the convergence analysis of the incremental Fréchet mean on non-positively curved spaces [6, 21, 46]. However, for a positively curved space, e.g., the sphere, the opposite inequality holds; hence, further effort must be made to prove the convergence of an incremental Fréchet mean estimator on the sphere. To the best of our knowledge, there is no convergence analysis in the literature for the incremental Fréchet mean estimator on any positively curved Riemannian manifold. In this chapter, we show that the incremental estimator converges to the true Fréchet mean in the limit over the number of samples. We employ the well-known concept of the Gnomonic Projection from computer vision [22] to project the sample points to a (linear) projection space, in order to simplify the convergence proof.

The rest of this chapter is organized as follows. In Section 3.2 we briefly introduce the Riemannian geometry of the sphere as well as the gnomonic projection, and provide the notation used in the rest of the chapter. The main convergence result is provided in Section 3.3, along with the necessary theorems and lemmas. Finally, Section 3.4 contains the experiments illustrating the efficiency and accuracy of our incremental method.

3.2 Preliminaries

3.2.1 Riemannian Geometry of Sphere

Here, we provide a brief introduction to the Riemannian geometry of the sphere. For more details, the reader is referred to [8, 45]. Let S^k denote the k-dimensional unit sphere embedded in R^{k+1}, i.e., S^k = {X ∈ R^{k+1} : ||X|| = 1}, where ||.|| is the L2 norm of a vector. It is evident that the sphere is not closed under vector operations, e.g., given X, Y ∈ S^k, X + Y does not necessarily belong to S^k; hence it is not a vector space, but a Riemannian manifold with constant positive sectional curvature [38]. Let T_X S^k denote the tangent space of S^k at the point X. For any two tangent vectors U = [u_1, u_2, ..., u_{k+1}] and V = [v_1, v_2, ..., v_{k+1}] in T_X S^k, the inner product is defined by

\langle U, V \rangle = \sum_{i=1}^{k+1} u_i v_i    (3–3)

Curve length on the sphere can be measured, and the geodesic distance between any given points X, Y ∈ S^k can be computed by

d(X, Y) = \cos^{-1}(\langle X, Y \rangle)    (3–4)

The exponential map of a given vector V ∈ T_X S^k is defined by

\mathrm{Exp}_X(V) = X \cos(||V||) + \frac{V}{||V||} \sin(||V||)    (3–5)

and the log map of Y ∈ S^k at any point X ∈ S^k is obtained by

\mathrm{Log}_X(Y) = \frac{Y - X\cos(\phi)}{||Y - X\cos(\phi)||} \, \phi, \quad \text{where } \phi = \cos^{-1}(\langle X, Y \rangle)    (3–6)

Using the exponential and log maps, the geodesic curve between any pair of points X, Y ∈ S^k is given by

\gamma(t) = X \#_t Y = \mathrm{Exp}_X(t \, \mathrm{Log}_X(Y))    (3–7)

with γ(0) = X and γ(1) = Y. The geodesic curve is part of the great circle, i.e., the unit-radius circle, that connects X and Y.

Using the geodesic distance given above, one can define the Fréchet mean of a set of points on the sphere as the minimizer of the sum of squared geodesic distances. Formally speaking, let X_1, X_2, ..., X_n ∈ S^k be n given points. Then, the Fréchet mean is defined by

\mu^* = \operatorname{argmin}_{\mu \in S^k} \sum_{i=1}^{n} d^2(X_i, \mu)    (3–8)

Let B(C, ρ) be the geodesic ball centered at C with radius ρ, i.e., B(C, ρ) = {Q ∈ S^k : d(C, Q) < ρ}. The authors in [1] showed that for any C ∈ S^k and for data samples in B(C, π/2), the minimizer of the Fréchet function exists and is unique (and also belongs to B(C, π/2)). Therefore, in the rest of the chapter, we assume that this condition is satisfied for any set of given points X_i.
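The operations in Eqs. 3–4 through 3–8 translate directly into code. The following is a minimal NumPy sketch (the function names are ours, not from an existing library); the Fréchet mean is computed by the standard Riemannian gradient descent mentioned above, and is assumed to be applied to samples inside a geodesic ball B(C, π/2) so that the minimizer is unique.

```python
import numpy as np

def sphere_dist(x, y):
    """Geodesic distance on S^k (Eq. 3-4)."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def sphere_exp(x, v):
    """Exponential map Exp_x(v) (Eq. 3-5)."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * v / nv

def sphere_log(x, y):
    """Log map Log_x(y) (Eq. 3-6), with phi = arccos(<x, y>)."""
    phi = sphere_dist(x, y)
    if phi < 1e-12:
        return np.zeros_like(x)
    u = y - np.cos(phi) * x
    return phi * u / np.linalg.norm(u)

def frechet_mean(X, iters=100, tol=1e-10):
    """Batch Fréchet mean (Eq. 3-8) by gradient descent: repeatedly move
    along the mean of the log maps until that mean vanishes."""
    mu = X[0]
    for _ in range(iters):
        g = np.mean([sphere_log(mu, x) for x in X], axis=0)
        if np.linalg.norm(g) < tol:
            break
        mu = sphere_exp(mu, g)
    return mu
```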
For simplicity, we are particularly interested in samples belonging to the northern hemisphere, in which case C is the north pole, e.g., C = [0, 0, 1] ∈ S^2, and ρ = π/2. Note that, because of the strict inequality in the definition of B(C, ρ), we have d(C, Q) < π/2; hence the equator is excluded from the geodesic ball.

3.2.2 Gnomonic Projection

On a unit k-dimensional sphere S^k, the Gnomonic Projection of any point X ∈ S^k is defined as the intersection of the tangent plane at the north pole with the line that passes through the origin, O = [0, 0, ..., 0], and X [22]. For instance, in Fig. 3-1, x_{n+1} is the projection of X_{n+1} ∈ S^k. The gnomonic projection is not well-defined for points on the equator, because they are projected to infinity in the tangent plane, but this does not affect our statistical analysis, since we assume that the data points belong to the hemisphere with the equator excluded.

Figure 3-1. Gnomonic projection.

Under this gnomonic projection, the geodesic curve between any pair of points X and Y on the hemisphere is projected to the straight line connecting x and y in the projection space [18], where x and y are the projections of X and Y, respectively. We employ the gnomonic projection to simplify the statistical analysis of points on the sphere.

3.3 Incremental Fréchet Mean Estimator on Sphere

With the background material established so far, we are now ready to present our incremental Fréchet Mean Estimator (iFME) on the sphere. The proposed method is motivated by the idea in [6], which mirrors the Euclidean case: given the old mean, M_{n−1}, and the new sample, X_n, define the new mean, M_n, as the weighted mean of M_{n−1} and X_n with weights (n−1)/n and 1/n, respectively. From a geometric viewpoint, this corresponds to the choice of the point on the geodesic curve between M_{n−1} and X_n with parameter t = 1/n. Formally speaking, let X_1, X_2, ..., X_N be a set of N samples on the sphere S^k, which all belong to the geodesic ball B(C, π/2), where C is the north pole. Also, let M_n be the iFME estimate for the nth given sample, X_n, defined by

M_1 = X_1    (3–9)
M_n = M_{n-1} \#_{\frac{1}{n}} X_n    (3–10)

where A \#_t B is the geodesic curve from A to B (∈ S^k) parameterized by t, and 1/n is our weighting scheme, henceforth called the Euclidean weight. (A short sketch of this update appears below.) In the rest of the chapter, we show that as the number of given samples, N, tends to infinity, the iFME estimates converge to the Fréchet mean of the distribution from which the samples are drawn.

Our strategy is based on the idea of projecting the spherical samples X_i to the tangent plane and performing the convergence analysis on the projected samples x_i in this linear space instead. We take advantage of the fact that the geodesic curve between any pair of points on the hemisphere is projected to a straight line in the tangent space at the north pole via the gnomonic projection [18]. According to the law of large numbers in Euclidean space [3], the arithmetic mean of a set of samples converges to the mean of the distribution from which the samples are drawn, as the number of samples tends to infinity.

Despite the simplifications that the gnomonic projection brings to the statistical analysis of the iFME estimates on the sphere, there are two important obstacles that must be considered.
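Before turning to these obstacles, here is a minimal sketch of the iFME update of Eqs. 3–9 and 3–10, together with the gnomonic projection, reusing the sphere_exp and sphere_log helpers from the sketch in Section 3.2.1 (the function names are ours).

```python
def ifme_update(M_prev, X_new, n):
    """One iFME step (Eq. 3-10): move from the previous estimate toward
    the new sample by parameter t = 1/n along the geodesic."""
    return sphere_exp(M_prev, sphere_log(M_prev, X_new) / n)

def ifme(X):
    """Incremental Fréchet mean of a stream of points on S^k (Eqs. 3-9, 3-10)."""
    M = X[0]
    for n, x in enumerate(X[1:], start=2):
        M = ifme_update(M, x, n)
    return M

def gnomonic(X, C):
    """Gnomonic projection of X onto the tangent plane at C: the point where
    the ray from the origin through X meets the plane {p : <p, C> = 1}.
    Assumes <X, C> > 0, i.e., X lies in the open hemisphere around C."""
    return X / np.dot(X, C)
```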
Suppose the true Fréchet mean of the input samples X_i is the north pole. Then, it can be shown by counterexamples that: (1) the use of the Euclidean weights, 1/n, to update the iFME estimates on S^k does not necessarily correspond to the same weighting scheme between the old mean and the new sample in the projection space; (2) the mean of the projected samples, the x_i's, does not necessarily coincide with the north pole.

The first fact above can be illustrated using two sample points on a unit circle (S^1), X_1 = π/6 and X_2 = π/3, whose midpoint is M = π/4. The midpoint of the gnomonic projections of X_1 and X_2, denoted by x_1 and x_2, is

\hat{m} = \frac{\tan(\pi/3) + \tan(\pi/6)}{2} = 1.1547 \ne \tan(\pi/4) = m

(see Fig. 3-2).

Figure 3-2. Illustration of the counterexample showing that the use of Euclidean weights to update iFME in S^k does not necessarily correspond to the same weights in the tangent space.

To observe the second fact, consider three points X_1, X_2, X_3 in S^1, respectively equal to π/4, π/12 and −π/3 (Fig. 3-3). Although the Fréchet mean of these points is located at the north pole (c), the arithmetic mean of the gnomonic projections, ĉ, is not. Nevertheless, in Lemma 1 we show that for sample points which are symmetrically distributed around the north pole, the mean of the projected samples coincides with the north pole.

Figure 3-3. Demonstration of the counterexample to prove that the Fréchet mean of samples on S^k does not necessarily coincide with the arithmetic mean of the projected points in the tangent space.

Lemma 1. For a set of samples X_i ∈ S^k which are symmetrically distributed around the north pole C, the arithmetic mean of the projected points x_i in the tangent plane at the north pole is the north pole. By symmetry we mean that ∀X_i ∈ X = {X_1, X_2, ..., X_N}, ∃X_j ∈ X such that X_i \#_{1/2} X_j = C.

Proof Sketch. By the symmetry assumption on the input, one can divide the samples in X into N/2 disjoint pairs of points on S^k, P_m = {X_{m,1}, X_{m,2}}, 1 ≤ m ≤ N/2, such that ∀m, X_{m,1} \#_{1/2} X_{m,2} = C, and ∪_{m=1}^{N/2} P_m = X. Then, for the gnomonic projections of each pair of points, the midpoint coincides with the north pole, using the fact that ∀ϕ, tan(ϕ) + tan(−ϕ) = 0. Therefore, the mean of the projected points in the tangent plane reduces to the mean of N/2 sample points, all located at the north pole. Hence, the result holds. ■

In the rest of this section, we assume that the population of samples is symmetrically distributed around the Fréchet mean. Besides, without loss of generality, we assume that the true Fréchet mean of the N given samples is located at the north pole. Since the gnomonic projection space is centered at the north pole, this assumption significantly simplifies our convergence analysis. However, a similar convergence proof can be worked out for any arbitrary Fréchet mean, with the projection space anchored at the mean location.

In what follows, we prove that the use of the Euclidean weights, w_n = 1/n, to update the incremental Fréchet mean on the sphere corresponds to a set of weights in the projection space, denoted henceforth by t_n, for which the convergence of the incremental mean to the true Fréchet mean can be shown.

3.3.1 Angle Bisector Theorem

The relation between the weights on the sphere and the corresponding weights in the projection space can be obtained in closed form, depending on the point where the projection space is anchored. In Fig. 3-1, M_n and M_{n+1} denote the iFME estimates for n and n + 1 given samples, respectively, and X_{n+1} denotes the (n + 1)st sample.
Further, m_n, m_{n+1}, x_{n+1} are the corresponding points in the projection space. Based on the Angle Bisector Theorem [2]:

t_n = \frac{||m_n - m_{n+1}||}{||x_{n+1} - m_{n+1}||} = \frac{||O - m_n||}{||O - x_{n+1}||} \times \frac{\sin(d(M_n, M_{n+1}))}{\sin(d(M_{n+1}, X_{n+1}))}    (3–11)

where d(\cdot) is the geodesic distance on the hemisphere. Note that in the standard law of large numbers, t_n = 1/n. In the next sections, we assume that the input samples X_i lie within the geodesic ball B(C, ϕ), where 0 < ϕ < π/2. We then bound the values that t_n can possibly take, with respect to the radius ϕ.

3.3.2 Lower Bound for t_n

To find the lower bound for t_n, we find lower bounds for each fraction on the right hand side of Eq. 3–11. The first term attains its minimum value when M_n is located at the north pole and X_{n+1} is located on the boundary of the geodesic ball B(C, ϕ). In this case, ||O − m_n|| = 1 and ||O − x_{n+1}|| = 1/cos(ϕ). This implies that

\frac{||O - m_n||}{||O - x_{n+1}||} \ge \cos(\phi)    (3–12)

Next, note that based on the definition of iFME, the second fraction in 3–11 can be rewritten as

\frac{\sin(d(M_n, M_{n+1}))}{\sin(d(M_{n+1}, X_{n+1}))} = \frac{\sin(d(M_n, M_{n+1}))}{\sin(n \, d(M_n, M_{n+1}))} = \frac{1}{U_{n-1}(\cos(d(M_n, M_{n+1})))}    (3–13)

where U_{n−1}(x) is the Chebyshev polynomial of the second kind [42]. For any x ∈ [−1, 1], the maximum of U_{n−1}(x) is attained at x = 1, where U_{n−1}(1) = n. Therefore U_{n−1}(x) ≤ n and 1/U_{n−1}(x) ≥ 1/n. This implies that

\frac{\sin(d(M_n, M_{n+1}))}{\sin(n \, d(M_n, M_{n+1}))} = \frac{1}{U_{n-1}(\cos(d(M_n, M_{n+1})))} \ge \frac{1}{n}    (3–14)

From inequalities 3–12 and 3–14,

t_n \ge \frac{\cos(\phi)}{n}    (3–15)

Note that as ϕ tends to zero, cos(ϕ) converges to one and the above bound tends to 1/n, which is the Euclidean case. On the other hand, if ϕ tends to π/2, then cos(ϕ) tends to zero and this bound becomes very small.

3.3.3 Upper Bound for t_n

First, the upper bound for the first term in 3–11 is attained when M_n is on the boundary of the geodesic ball and X_{n+1} is given at the north pole. Therefore,

\frac{||O - m_n||}{||O - x_{n+1}||} \le \frac{1}{\cos(\phi)}    (3–16)

Finding the upper bound for the sine term is, however, quite involved. Note that the maximum of the angle between OM_n and OX_{n+1}, denoted by α, is attained when M_n and X_{n+1} are both on the boundary of the geodesic ball, i.e., α ≤ 2ϕ. Therefore, ϕ ∈ [0, π/2) implies that α ∈ [0, π). Further, it is shown in the Appendix that the following inequality holds for any α ∈ (0, π):

\frac{\sin\!\left(\frac{n\alpha}{n+1}\right)}{\sin\!\left(\frac{\alpha}{n+1}\right)} \ge n \cos^2\!\left(\frac{\alpha}{2}\right) \ge n \cos^2(\phi)    (3–17)

From 3–16 and 3–17,

t_n \le \frac{1}{n \cos^3(\phi)}    (3–18)

In summary, we have shown that once the iFME algorithm is employed with the Euclidean weights on the sphere, the sequence of corresponding weights t_n in the projection space satisfies the following inequality:

\frac{\cos(\phi)}{n} \le t_n \le \frac{1}{n \cos^3(\phi)}    (3–19)

In the next section, we prove the main theorem of convergence using these bounds.

3.3.4 Convergence of iFME

So far, we have derived analytical bounds for the sequence of weights t_n in the projection space corresponding to the Euclidean weights on the sphere (Eq. 3–19). We now prove the convergence of the iFME estimates to the true Fréchet mean of the samples, as the sample size tends to infinity. We first show that the incremental mean in the projection space using t_n is unbiased.

Theorem 1. Let x_1, x_2, ... be i.i.d. samples from a distribution in R^k. Also, let m_n be the incremental estimate corresponding to the nth given sample, x_n, defined by: (i) m_1 = x_1, (ii) m_n = t_n x_n + (1 − t_n) m_{n−1}. Then, m_n is an unbiased estimator of E[x].

Proof. For n = 2, m_2 = t_2 x_2 + (1 − t_2) x_1, hence E[m_2] = t_2 E[x] + (1 − t_2) E[x] = E[x].
Now, by the induction hypothesis, E[m_{n−1}] = E[x]. Then, E[m_n] = t_n E[x] + (1 − t_n) E[x] = E[x], hence the result. ■

Theorem 2. Let var[m_n] denote the variance of the nth incremental estimate (defined above), with cos(ϕ)/n ≤ t_n ≤ 1/(n cos^3(ϕ)), ∀ϕ ∈ [0, π/2). Then ∃p ∈ (0, 1] such that

\frac{var[m_n]}{var[x]} \le \frac{1}{n^p \cos^6(\phi)}

First note that var[m_n] = t_n^2 var[x] + (1 − t_n)^2 var[m_{n−1}]. Since 0 ≤ t_n ≤ 1, one can see that var[m_n] ≤ var[x] for all n. Besides, for each n, the maximum of the right hand side is attained when t_n takes either its minimum or its maximum value. Therefore, we need to prove the theorem for the following two values of t_n: (i) t_n = 1/(n cos^3(ϕ)) and (ii) t_n = cos(ϕ)/n. These two cases are discussed in Lemma 2 and Lemma 3, respectively.

Lemma 2. With the same assumptions as in Theorem 2, and t_n = 1/(n cos^3(ϕ)), ∀n and ∀ϕ ∈ [0, π/2), the following inequality is satisfied: var[m_n]/var[x] ≤ (n cos^6(ϕ))^{−1}.

Proof. For n = 1, var[m_1] = var[x], which yields the result since cos(ϕ) ≤ 1. Now, assume by induction that var[m_{n−1}]/var[x] ≤ ((n − 1) cos^6(ϕ))^{−1}. Then,

\frac{var[m_n]}{var[x]} = t_n^2 + (1 - t_n)^2 \frac{var[m_{n-1}]}{var[x]} \le t_n^2 + \frac{(1 - t_n)^2}{(n-1)\cos^6(\phi)}
\le \frac{1}{n^2\cos^6(\phi)} + \left(1 - \frac{1}{n\cos^3(\phi)}\right)^2 \frac{1}{(n-1)\cos^6(\phi)}
\le \frac{1}{n^2\cos^6(\phi)} + \left(1 - \frac{1}{n}\right)^2 \frac{1}{(n-1)\cos^6(\phi)}
= \frac{1}{n^2\cos^6(\phi)} + \frac{n-1}{n^2\cos^6(\phi)} = \frac{1}{n\cos^6(\phi)}    (3–20)

■

Lemma 3. With the same assumptions as in Theorem 2, and t_n = cos(ϕ)/n, ∀n and ∀ϕ ∈ [0, π/2), the following inequality is satisfied: var[m_n]/var[x] ≤ n^{−p} for some 0 < p ≤ 1.

Proof. For n = 1, var[m_1] = var[x], which yields the result. Now, assume by induction that var[m_{n−1}]/var[x] ≤ (n − 1)^{−p}. Then,

\frac{var[m_n]}{var[x]} = t_n^2 + (1 - t_n)^2 \frac{var[m_{n-1}]}{var[x]} \le \frac{\cos^2(\phi)}{n^2} + \frac{(n - \cos(\phi))^2}{n^2 (n-1)^p}
= \frac{(n-1)^p \cos^2(\phi) + \cos^2(\phi) - 2n\cos(\phi) + n^2}{n^2 (n-1)^p}    (3–21)

Now, it suffices to show that the numerator of the above expression is not greater than n^{2−p}(n − 1)^p. In other words,

(n-1)^p \cos^2(\phi) + \cos^2(\phi) - 2n\cos(\phi) + n^2 - n^{2-p}(n-1)^p \le 0    (3–22)

The above quadratic function of cos(ϕ) is non-positive when

\frac{n\left(1 - (n-1)^{p/2}\sqrt{\left(\frac{n-1}{n}\right)^p + \frac{1}{n^p} - 1}\right)}{1 + (n-1)^p} \le \cos(\phi) \le \frac{n\left(1 + (n-1)^{p/2}\sqrt{\left(\frac{n-1}{n}\right)^p + \frac{1}{n^p} - 1}\right)}{1 + (n-1)^p}    (3–23)

The inequality on the right is satisfied for all values of cos(ϕ). Besides, it is easy to see that the function on the left hand side is increasing with respect to n, hence attains its minimum over all n > 1 at n = 2. This implies that

1 - \sqrt{2^{1-p} - 1} \le \cos(\phi) \;\Rightarrow\; \phi \le \cos^{-1}\!\left(1 - \sqrt{2^{1-p} - 1}\right) \;\Rightarrow\; 0 < p \le 1 - \log_2\!\left[(1 - \cos(\phi))^2 + 1\right]    (3–24)

Note that p > 0 for all ϕ < π/2. ■

Proof of Theorem 2. With the above two results, it is easy to see that ∀ϕ ∈ [0, π/2) there exists a p with 0 < p ≤ 1 such that:

- If t_n = cos(ϕ)/n, then var[m_n]/var[x] ≤ 1/n^p ≤ 1/(n^p cos^6(ϕ)), because cos(ϕ) ≤ 1.
- If t_n = 1/(n cos^3(ϕ)), then var[m_n]/var[x] ≤ 1/(n cos^6(ϕ)) ≤ 1/(n^p cos^6(ϕ)), because p ≤ 1.

These two pieces together complete the proof of convergence. ■

The inequality in Theorem 2 implies that as n → ∞, for any ϕ ∈ [0, π/2), the variance of the iFME estimates in the projection space tends to zero. Besides, as ϕ approaches π/2, the corresponding power of n, as well as cos(ϕ), become very small, hence the rate of convergence becomes slower.
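The bounds in Eq. 3–19 are easy to probe numerically. The sketch below, which reuses the sphere and gnomonic helpers defined earlier in this chapter (the setup and names are ours), draws a current estimate M_n and a new sample X_{n+1} inside B(C, ϕ), performs one iFME step with the Euclidean weight 1/(n+1), and computes the induced projection-space weight t_n from Eq. 3–11 as the ratio of segment lengths on the projected line.

```python
rng = np.random.default_rng(0)

def random_in_ball(C, phi):
    """A random point of S^2 inside the geodesic ball B(C, phi)."""
    v = rng.normal(size=3)
    v -= np.dot(v, C) * C                         # random tangent direction at C
    v *= (phi * rng.random()) / np.linalg.norm(v)
    return sphere_exp(C, v)

C, phi = np.array([0.0, 0.0, 1.0]), 1.0
for n in (2, 5, 10, 50):
    Mn, Xn1 = random_in_ball(C, phi), random_in_ball(C, phi)
    Mn1 = sphere_exp(Mn, sphere_log(Mn, Xn1) / (n + 1))  # Euclidean weight 1/(n+1)
    mn, mn1, xn1 = gnomonic(Mn, C), gnomonic(Mn1, C), gnomonic(Xn1, C)
    tn = np.linalg.norm(mn - mn1) / np.linalg.norm(xn1 - mn1)
    lo, hi = np.cos(phi) / n, 1.0 / (n * np.cos(phi) ** 3)
    print(f"n={n:3d}  t_n={tn:.5f}  within bounds (Eq. 3-19): {lo <= tn <= hi}")
```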
3.4 Experiments

3.4.1 Synthetic Experiments

We now evaluate the effectiveness of the iFME algorithm, compared to the non-incremental Fréchet Mean (FM), on a set of samples on the sphere, using synthetically generated data. To this end, a set of samples X_i ∈ S^2 is generated on the boundary of the geodesic ball B(C, ϕ), where ϕ < π/2 and C is the north pole. Note that the value of ϕ controls the variance of the input samples. Further, the variance of any given set of samples on the boundary of B(C, ϕ) can be computed in closed form and equals Var[X] = ϕ^2, since ∀i, d(X_i, C) = ϕ.

We tried four different values of ϕ, namely ϕ ∈ {0.70, 1, 1.21, 1.40}. For each value of ϕ, 20 points are randomly picked on the boundary of B(C, ϕ) and fed into both the iFME and FM algorithms. Because of the randomness in generating the samples, we repeated this experiment 100 times for each ϕ. Let iFM_{n,i} and FM_{n,i} respectively denote the iFME and FM estimates of the mean for n given samples in the ith trial, where 1 ≤ i ≤ 100 and 1 ≤ n ≤ 20. Therefore, for each number of samples, we obtain a population of iFME and FM estimates from the different trials. Accordingly, for both methods, we can compute the ratio of the estimator variance to the data variance, i.e., for any 1 ≤ n ≤ 20,

iR_n = \frac{1}{Var[X]} \left( \frac{1}{100} \sum_{i=1}^{100} d^2(iFM_{n,i}, C) \right), \qquad R_n = \frac{1}{Var[X]} \left( \frac{1}{100} \sum_{i=1}^{100} d^2(FM_{n,i}, C) \right)    (3–25)

where iR_n and R_n are the ratios of variances for iFME and FM, respectively, and Var[X] = ϕ^2 (see above). Note that if iR_n tends to zero for large values of n, then the variance of iFME tends to zero, hence the iFME estimates converge to the true Fréchet mean. We want to emphasize that in a Euclidean space, R_n = iR_n = 1/n for any population of sample points. Besides, in [6, 21] it was shown that for non-positively curved spaces, e.g., P(n), the inequality iR_n ≤ 1/n holds for all n.

Figure 3-4. Comparison of the ratio of variances (defined in Eq. 3–25) between iFME and FM, for different values of ϕ.

Fig. 3-4 illustrates the ratios defined in Eq. 3–25 for iFME and FM over the different values of ϕ. It is evident from the plots that iFME's ratio is close to that of the non-incremental version, FM, especially for smaller values of ϕ. In the right-most column, ϕ = 1.4, which is relatively close to π/2, and the input variance is very large. It can be seen that even in this case, iFME is still competitive with FM with respect to accuracy.

Fig. 3-5 compares the time consumed by iFME and FM in the above experiments. We need to emphasize that FM computes the mean iteratively, and its speed depends on the initial value. Therefore, in order to make a fair comparison, for each new sample X_n we used FM_{n−1} as the initial value of the gradient descent method to compute the mean over the augmented dataset. From the figure, one can see that iFME is significantly faster than FM, especially for a large number of samples. More importantly, the time consumed by iFME remains roughly the same for all values of ϕ, while FM gets considerably slower when the sample variance increases. This is not surprising, because our incremental method updates the mean in one shot, while FM re-computes the mean from scratch. It is also worth mentioning that for n = 2 the Fréchet mean can be computed in closed form, and no iterative scheme is needed; this explains the jumps in the time plots of FM in Fig. 3-5.

Figure 3-5. Time comparison between iFME and FM, for different values of ϕ.
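A compact version of the iFME half of this experiment can be sketched as follows, reusing the helpers defined earlier in this chapter (sphere_exp, sphere_dist, ifme_update); the sampling scheme and names are ours. Points are drawn uniformly on the boundary circle of B(C, ϕ), which is a distribution symmetric around C, so the true Fréchet mean is the north pole; the FM curve R_n would be computed analogously by running the batch frechet_mean on each prefix of the stream.

```python
rng = np.random.default_rng(1)

def ratio_curve(phi, n_samples=20, trials=100):
    """Monte Carlo estimate of iR_n (Eq. 3-25): mean squared geodesic
    distance of the iFME estimates to the true mean C, divided by phi^2."""
    C = np.array([0.0, 0.0, 1.0])
    d2 = np.zeros(n_samples)
    for _ in range(trials):
        M = None
        for n in range(1, n_samples + 1):
            v = rng.normal(size=3)
            v -= np.dot(v, C) * C
            x = sphere_exp(C, phi * v / np.linalg.norm(v))  # point on the boundary
            M = x if n == 1 else ifme_update(M, x, n)
            d2[n - 1] += sphere_dist(M, C) ** 2
    return d2 / (trials * phi ** 2)     # iR_n for n = 1, ..., n_samples

print(ratio_curve(0.70)[-1])            # should shrink as n grows
```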
3.4.2 Application to Incremental Shape-Preserving Fréchet Mean of SPD Matrices

In this section, we illustrate the effectiveness and accuracy of iFME on the sphere in the shape-preserving Fréchet mean computation for a group of 3 × 3 SPD matrices. As described earlier, the space of n × n SPD matrices, denoted by P(n), is not a vector space, but a Riemannian manifold with negative sectional curvature [47]. The Fréchet mean is defined as the minimizer of the sum of squared geodesic distances on P(n) [34]. The authors in [6] proposed an incremental method to estimate the Fréchet mean on P(n), and provided convergence results in the limit over the number of samples. However, it is known that the Fréchet mean on P(n) does not necessarily preserve the diffusion anisotropy, which depends on the shape of the tensor. For a more detailed discussion, we refer the reader to Fig. 1 in [50]. In many applications, including the interpolation of diffusion MR data [4], it is more appealing to compute a shape-preserving mean over the given population. The idea of separating shape and orientation in diffusion data was put forward by the authors in [35] and later in [4]. More recently, Wang et al. [50] applied this idea to 3 × 3 diffusion tensors and presented a Kalman filter on this new product manifold.

The eigen-decomposition of a 3 × 3 SPD matrix D is D = UΛU^T, where U belongs to the space of 3 × 3 special orthogonal matrices, denoted by SO(3), and Λ is a diagonal matrix with positive elements. The matrix Λ controls the shape of the tensor, and U models the orientation. Following the idea in [4], we break the mean computation of SPD matrices into separate mean computations for orientations and shapes.

We now present a novel incremental shape-preserving mean for a group of 3 × 3 SPD matrices. First, the mean of the positive diagonal elements of the shape components can be computed incrementally, as the space of such matrices is isomorphic to (R^+)^3. Besides, the elements of SO(3) can be parameterized by unit quaternions, which belong to the northern hemisphere of the 3-dimensional unit sphere, S^3 [18]; hence our iFME technique is applicable to these elements. Formally speaking, let X_1, X_2, ... be a population of matrices in P(3). Also, let U^*_{n−1} and Λ^*_{n−1} respectively denote the orientation and shape components of the incremental mean of the first n − 1 given samples. Then,

U^*_n = U^*_{n-1} \#_{\frac{1}{n}} U_n    (3–26)

where U_n is the orientation part of the sample X_n. Further, the mean of the shape part, Λ^*_n, is updated using the geometric mean of the diagonal elements. (A sketch of this update appears below.)

We evaluated the accuracy of this novel incremental estimator in a synthetic data experiment. A set of 150 SPD matrices in P(3) was randomly generated in the following manner: the shape component of each tensor was assigned 1 + r, 0.25 + r and 0.25 + r as its diagonal elements, where r ∈ [0, 0.1] is picked randomly. Moreover, the orientation part was sampled from a log-Normal distribution on S^3, centered at [1, 0, 0, 0] (which corresponds to the identity rotation matrix), with the variance set to 0.2. We then input each sample SPD matrix both to iFME on P(3) and to the proposed shape-preserving iFME on the manifold of shapes and orientations, i.e., SO(3) × (R^+)^3. For each increment, the means of both methods are computed and displayed in Fig. 3-6, along with the ground-truth mean.
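A minimal sketch of this shape-preserving update is given below, reusing sphere_exp and sphere_log on S^3 for the quaternion (orientation) part. It is a simplification of the scheme described above, with assumptions we make for illustration: the eigenvalues are assumed to stay consistently ordered across samples, the quaternion sign is flipped to stay on the same hemisphere as the running mean, the incremental geometric mean of the shape part is computed in log space, and SciPy's Rotation class is used for the matrix/quaternion conversions.

```python
from scipy.spatial.transform import Rotation

def shape_orientation_split(D):
    """D = U diag(lam) U^T with U in SO(3); lam is the shape part."""
    lam, U = np.linalg.eigh(D)
    if np.linalg.det(U) < 0:
        U[:, 0] = -U[:, 0]              # flip one axis to land in SO(3)
    return lam, Rotation.from_matrix(U).as_quat()

def shape_preserving_update(lam_mean, q_mean, D_new, n):
    """One increment of the mean on SO(3) x (R+)^3 (cf. Eq. 3-26)."""
    lam, q = shape_orientation_split(D_new)
    if np.dot(q, q_mean) < 0:
        q = -q                          # keep q on the hemisphere of q_mean
    # incremental geometric mean of the eigenvalues, in log space
    lam_mean = np.exp(((n - 1) * np.log(lam_mean) + np.log(lam)) / n)
    # iFME step on S^3 for the orientation part
    q_mean = sphere_exp(q_mean, sphere_log(q_mean, q) / n)
    return lam_mean, q_mean

def mean_tensor(lam_mean, q_mean):
    """Recompose the shape-preserving mean tensor."""
    U = Rotation.from_quat(q_mean).as_matrix()
    return U @ np.diag(lam_mean) @ U.T
```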
Figure 3-6. Visual comparison of the mean tensor obtained from the shape-preserving iFME on the product manifold (top row) and iFME applied on P(3) (bottom row). The rightmost column shows the ground truth.

Furthermore, to compare the accuracy of these two methods, we measured the Fractional Anisotropy (FA) of the output tensor at each increment. The FA value of an SPD matrix is a scalar measuring the anisotropy of the tensor, and is defined by

FA = \sqrt{\frac{1}{2}} \, \frac{\sqrt{(\lambda_1 - \lambda_2)^2 + (\lambda_2 - \lambda_3)^2 + (\lambda_1 - \lambda_3)^2}}{\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}    (3–27)

Since the sample matrices were generated with very similar shapes, it is expected that the FA value of the mean does not change drastically. Fig. 3-7 illustrates the FA values computed from iFME on P(3) as well as from iFME on the product manifold. Although both incremental techniques are initialized identically, it is evident that the FA value of iFME on P(3) drops rapidly after only 15 increments. In contrast, the shape-preserving version of iFME remains close to the ground truth for any number of given samples. Fig. 3-6 demonstrates the significant differences between these two estimates, visually.

Figure 3-7. Comparison of FA values between iFME on P(3) and iFME on the product manifold. The ground truth is the incremental geometric mean of the samples' FA values at each increment.

Appendix

Lemma.¹ For any angle α ∈ (0, π), the following inequality holds:

\frac{\sin\!\left(\frac{n\alpha}{n+1}\right)}{\sin\!\left(\frac{\alpha}{n+1}\right)} \ge n \cos^2\!\left(\frac{\alpha}{2}\right)    (3–28)

¹ This lemma has been proven by Mr. Rudrasis Chakraborty.

Proof. Let

f(\theta) = \sin(n\theta) - n \cos^2\!\left(\frac{n+1}{2}\theta\right) \sin(\theta), \quad \theta \in (0, \alpha/(n+1)), \; \alpha \in (0, \pi), \; n \ge 1    (3–29)

Then

f_\theta = n\cos(n\theta) + n(n+1)\cos\!\left(\frac{n+1}{2}\theta\right)\sin\!\left(\frac{n+1}{2}\theta\right)\sin(\theta) - n\cos^2\!\left(\frac{n+1}{2}\theta\right)\cos(\theta)    (3–30)

Setting Eq. 3–30 to zero for θ ∈ [0, π/(n+1)), we get θ = 0. But f_{θθ}|_{θ=0} = 0, so we check f_{θθθ}:

f_{\theta\theta\theta}\big|_{\theta=0} = -n^3 + \frac{3}{2}n(n+1)^2 + n > 0, \quad n \ge 1

So, at θ = 0, f has a minimum for θ ∈ (0, α/(n+1)), and

f\big|_{\theta=0} = 0    (3–31)

Thus f ≥ 0 for n ≥ 1. For θ ∈ (0, α/(n+1)), sin(θ) > 0, and thus f/sin(θ) ≥ 0:

\frac{f}{\sin(\theta)} = \frac{\sin(n\theta)}{\sin(\theta)} - n\cos^2\!\left(\frac{n+1}{2}\theta\right) \ge 0    (3–32)

Setting θ = α/(n+1) yields Eq. 3–28. ■

CHAPTER 4
IPGA: INCREMENTAL PRINCIPAL GEODESIC ANALYSIS WITH APPLICATIONS TO MOVEMENT DISORDER CLASSIFICATION

4.1 Background

Principal Geodesic Analysis (PGA) captures variability in the data by using the concept of principal geodesic subspaces, which in this case are sub-manifolds of the Riemannian manifold on which the given data lie. In order to achieve this goal, one needs to know the Riemannian structure of the manifold, specifically the geodesic distance, the Riemannian log and exp maps, and the Fréchet mean. For definitions of the Riemannian log and exp maps, the geodesic distance, and the Fréchet mean, see Section 4.2. PGA relies on the linear vector space structure of the tangent space at the Fréchet mean: all of the data points are projected to this tangent space, standard PCA is performed there, and the principal vectors are projected back to the manifold using the Riemannian exp map, yielding principal geodesic subspaces. The representation of each manifold-valued data point in the principal geodesic subspace has to be achieved by finding the closest (in the sense of geodesic distance) point in the subspace to the given data point. This, however, involves a hard optimization problem. The standard PGA instead makes a linear approximation by projecting the given data point to the aforementioned tangent space, finding the closest point in the principal linear subspace defined by the principal vectors in this tangent space, and then projecting it back to the manifold using the exp map [13, 14, 36].
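The linearized PGA just described (log-map at the Fréchet mean, PCA in the tangent space, exp-map back) can be sketched generically for the sphere using the helpers from Chapter 3. This is our illustrative sketch of the standard tangent-space approximation, not the exact tensor-field implementation of [55] described later, which additionally transports the data to the tangent space at the identity.

```python
def tangent_pga(X, n_components=2):
    """Linearized PGA on S^k: PCA of the log-mapped data at the Fréchet mean.
    Returns the mean, principal directions (as rows), and their variances."""
    mu = frechet_mean(X)
    V = np.stack([sphere_log(mu, x) for x in X])   # (N, k+1) tangent vectors
    _, s, Wt = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
    return mu, Wt[:n_components], s[:n_components] ** 2 / len(X)

# A principal geodesic is traced by mapping a principal direction back:
# gamma_j(t) = sphere_exp(mu, t * w_j) for the j-th principal direction w_j.
```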
Exact PGA, reported in the literature by several researchers, tries to solve this hard optimization problem without the linear approximation [37, 43]. A generalization of the PGA reported in [14, 36] to symmetric positive definite diffusion tensor fields was presented in [55]. In [55], it was demonstrated that the Fréchet mean of several given (registered) tensor fields computed using a voxel-wise Fréchet mean over the field is equivalent to the Fréchet mean computed using a product space representation of the tensor fields. However, for higher order statistics, such as the variance, such an equivalence does not hold. This statement, moreover, holds for any manifold-valued fields, not just for diffusion tensor fields.

(© 2014 Springer. Reprinted with minor changes, with permission, from H. Salehian, D. Vaillancourt, and B. C. Vemuri, "iPGA: Incremental Principal Geodesic Analysis with Applications to Movement Disorder Classification," in Medical Image Computing and Computer-Assisted Intervention, MICCAI 2014, pp. 765-772, Springer International Publishing, October 2014. [40])

When dealing with large amounts of data, specifically manifold-valued fields, e.g., diffusion tensor fields, deformation tensor fields, ODF fields, etc., performing PGA can be computationally quite expensive. That said, if we have a large number of tensor fields to perform statistical analysis upon, and if the data are provided incrementally, then rather than performing PGA from scratch in batch mode each time a new data set is provided, it is computationally more efficient to perform PGA once for a given data pool and then simply update the PGA each time a new data set arrives. To this end, we propose a novel incremental PGA, or iPGA, algorithm in which we incrementally update the Fréchet mean and the principal sub-manifolds rather than performing PGA in batch mode. This leads to significant savings in computation time.

In the past few decades, the problem of incrementally updating PCA has been well studied in the literature, e.g., [56]. However, these methods require the data samples to live in a Euclidean space, and hence are not directly applicable to the PGA problem. On the other hand, Cheng et al. [6] and Ho et al. [21] have reported incremental algorithms for computing the Fréchet expectation of a given set of SPD matrices. Besides, we have shown in the previous chapter the convergence of a similar incremental Fréchet mean estimator for samples living on a sphere. Our iPGA algorithm is a novel combination of the incremental Fréchet expectation algorithm of [6, 21] and the linearized PGA in [55]. We apply our iPGA to two types of popular manifold-valued data: (1) a group of SPD tensor fields derived from high angular resolution diffusion magnetic resonance images (HARDI), and (2) a population of samples on a high-dimensional unit sphere, derived from 3-D shapes. Based on these two iPGA techniques, the classification of patients with movement disorders is performed. We present synthetic experiments depicting the effectiveness and accuracy of iPGA, compared to the batch-mode PGA. Furthermore, in the real data experiments, given 67 human brain HARDI data sets, our iPGA-based nearest neighbor classifier aims to distinguish between controls, Parkinson's Disease (PD) and Essential Tremor (ET) patients. Our results demonstrate the effectiveness of iPGA, compared to the batch-mode scheme.

The rest of the chapter is organized as follows.
Section 4.2 contains background material on the differential geometry of the space of SPD tensor fields, together with a brief review of the differential geometry of the sphere. Next, in Section 4.3, the proposed iPGA techniques, applicable to both SPD tensor fields and spherical samples, are described in detail. Sections 4.4 and 4.5 contain synthetic and real data experiments, comparing PGA and iPGA with respect to computation time and accuracy.

4.2 Preliminaries

4.2.1 Riemannian Geometry of the Space of SPD Tensor Fields

The Riemannian geometry of the k-dimensional unit sphere, S^k, was discussed in Section 3.2. For convenience, Table 4-1 summarizes the Riemannian operations on S^k as well as on the space of n × n SPD matrices, P_n. Based on the Riemannian geometry of P_n summarized in Table 4-1, we now briefly introduce the basic relevant concepts of the Riemannian geometry of the space of SPD tensor fields, denoted by P_n^m, following the notation from [55]. For details on the Riemannian geometry of P_n we refer the reader to [13]. P_n is the space of n × n symmetric positive definite (SPD) matrices, which is a Riemannian manifold with GL(n), the general linear group, as the symmetry group. This can be easily generalized to P_n^m, the m-fold product space of P_n, using the product Riemannian structure. In particular, expressions for the Riemannian geodesic distance, log and exponential maps can be easily derived. Specifically, the group GL(n)^m acts transitively on P_n^m with the group action specified by

\phi_{\mathbf{G}}(\mathbf{X}) = (G_1 X_1 G_1^T, \ldots, G_m X_m G_m^T)    (4–1)

where each G_i ∈ GL(n) is an n × n invertible matrix and X_i is an n × n positive-definite matrix. The tangent space of P_n^m at any point can be identified with Sym(n)^m, because the tangent space of a product manifold is the product of the tangent spaces. Let Y, Z ∈ T_M P_n^m be two tangent vectors at M ∈ P_n^m. The inner product between two vectors using the product Riemannian metric is given by

\langle \mathbf{Y}, \mathbf{Z} \rangle_{\mathbf{M}} = \sum_{i=1}^{m} \mathrm{tr}(Y_i M_i^{-1} Z_i M_i^{-1})    (4–2)

The Riemannian exponential map at M maps the tangent vector Y to a point in P_n^m and is given by

\mathrm{Exp}_{\mathbf{M}}(\mathbf{Y}) = \left( G_1 \exp(G_1^{-1} Y_1 G_1^{-T}) G_1^T, \ldots, G_m \exp(G_m^{-1} Y_m G_m^{-T}) G_m^T \right)    (4–3)

where G_i ∈ GL(n) is such that M = (G_1 G_1^T, ..., G_m G_m^T). Given X ∈ P_n^m, the log map at M is given by

\mathrm{Log}_{\mathbf{M}}(\mathbf{X}) = \left( G_1 \log(G_1^{-1} X_1 G_1^{-T}) G_1^T, \ldots, G_m \log(G_m^{-1} X_m G_m^{-T}) G_m^T \right)    (4–4)

Using this definition of the log map in P_n^m, the geodesic distance between M and X is computed as

d(\mathbf{M}, \mathbf{X}) = \|\mathrm{Log}_{\mathbf{M}}(\mathbf{X})\| = \sqrt{\sum_{i=1}^{m} \mathrm{tr}\left( \log^2(G_i^{-1} X_i G_i^{-T}) \right)}    (4–5)

Table 4-1. Summary of the Riemannian geometry of the space of n × n positive definite matrices, P_n, and of the unit k-dimensional sphere, S^k. In the table, X, Y ∈ P_n and U, V ∈ T_X P_n; similarly, x, y ∈ S^k and u, v ∈ T_x S^k.
P_n:
  ⟨U, V⟩_X = tr(U X^{-1} V X^{-1})
  Exp_X(U) = X^{1/2} exp(X^{-1/2} U X^{-1/2}) X^{1/2}
  Log_X(Y) = X^{1/2} log(X^{-1/2} Y X^{-1/2}) X^{1/2}
  d_{P_n}(X, Y) = sqrt( tr( log^2(G^{-1} Y G^{-T}) ) ), with X = G G^T
  γ(t) = Exp_X(t Log_X(Y))
  X̂ = argmin_{X ∈ P_n} (1/N) Σ_{i=1}^{N} d_{P_n}^2(X, X_i)

S^k:
  ⟨u, v⟩ = Σ_{i=1}^{k+1} u_i v_i
  Exp_x(u) = x cos(||u||) + (u/||u||) sin(||u||)
  Log_x(y) = ((y − x cos(ϕ))/||y − x cos(ϕ)||) ϕ, with ϕ = cos^{-1}(⟨x, y⟩)
  d_{S^k}(x, y) = cos^{-1}(⟨x, y⟩)
  α(t) = Exp_x(t Log_x(y))
  x̂ = argmin_{x ∈ S^k} (1/N) Σ_{i=1}^{N} d_{S^k}^2(x, x_i)

Using the expression for the geodesic distance given above, we can define the (intrinsic) mean of N tensor fields as the tensor field that minimizes the following sum of squared geodesic distances:

\mathbf{M} = \operatorname{arg\,min}_{\mathbf{M} \in P_n^m} \frac{1}{N} \sum_{i=1}^{N} d(\mathbf{M}, \mathbf{X}_i)^2    (4–6)

Since the Fréchet mean is unique on P_n [13], M is unique as well, and it can be computed using an iterative algorithm similar to the one in [13]. After obtaining the intrinsic mean M of the input tensor fields X_1, ..., X_N, we compute the modes of variation using the PGA algorithm for tensor fields described in [55].

4.2.2 Schild's Ladder Approximation of Parallel Transport

Given two points X_0 and X_p on a Riemannian manifold \mathcal{M}, with a geodesic curve γ(t) such that γ(0) = X_0 and γ(1) = X_p, the Schild's Ladder algorithm approximates the parallel transport of any vector V ∈ T_{X_0}\mathcal{M} along γ [31]. This algorithm requires only the geodesic curve, the log map and the exp map defined on the manifold, and hence is applicable to both S^k and P_n^m, using their corresponding Riemannian operations summarized in Table 4-1.

Figure 4-1. Illustration of the Schild's Ladder algorithm, described in Eq. 4–9.

Let X_1, X_2, ..., X_{p−1} be intermediate points on γ(t). Then, the parallel transport of V to T_{X_p}\mathcal{M}, denoted by Γ_{X_0→X_p}(V), is approximated by:

A_0 = \mathrm{Exp}_{X_0}(V)    (4–7)
B_i = X_i \#_{1/2} A_{i-1}, \quad A_i = X_{i-1} \#_2 B_i, \quad \forall 1 \le i \le p    (4–8)
\Gamma_{X_0 \to X_p}(V) = \mathrm{Log}_{X_p}(A_p)    (4–9)

where X \#_{1/2} Y denotes the midpoint of the geodesic curve between X and Y, and X \#_2 Y is obtained by following the geodesic from X through Y for twice its length. For more information, the reader is referred to [19, 31]. Figure 4-1 illustrates the algorithm described above.

On the manifold of SPD matrices, the parallel transport from an arbitrary point X_0 to the identity matrix, I, is equivalent to the transform using the group action [26]. Therefore, for the case of SPD tensor fields, we apply the group action wherever applicable, as it is computationally more efficient and more accurate than parallel transport using the Schild's ladder.

4.3 iPGA: Incremental Principal Geodesic Analysis

In order to develop incremental Principal Geodesic Analysis on the space of SPD tensor fields and on the unit sphere, we first need to develop incremental Fréchet mean update techniques applicable to tensor fields and to spherical samples. We address this sub-problem in the following paragraphs.

4.3.1 Incremental Fréchet Mean Estimator

As described earlier, the Fréchet mean of a group of manifold-valued features is defined as the minimizer of the sum of squared geodesic distances. Unfortunately, this minimization problem does not have a closed form solution for a population of size greater than two in most Riemannian manifolds, including P_n^m and S^k. In Section 3.3 we introduced an incremental algorithm to estimate the Fréchet mean of a group of samples on the sphere, and proved its convergence to the mean of the distribution from which the samples are drawn, as the number of samples tends to infinity.
Similarly, in [21], the authors presented an incremental Fréchet mean estimator, IFME, for SPD matrices (not SPD tensor fields). Given the estimated Fréchet mean of the first k SPD tensors, denoted by M_k, and the new sample X_{k+1}, IFME locates the new mean, M_{k+1}, on the geodesic curve between M_k and X_{k+1} using the Euclidean weight. More formally,

M_{k+1} = \mathrm{Exp}_{M_k}\!\left( t \, \mathrm{Log}_{M_k}(X_{k+1}) \right), \quad t = \frac{1}{k+1}    (4–10)

We now generalize the above incremental Fréchet mean formula to the case where the data samples are SPD tensor fields (not just SPD matrices), using the exp and log maps defined earlier on the product manifold of SPD tensor fields. Let M_k = (M_{k,1}, ..., M_{k,m}) denote the estimated Fréchet mean of the first k samples, and let X_{k+1} = (X_{k+1,1}, ..., X_{k+1,m}) be the newly given tensor field. Based on the IFME algorithm and the product space representation chosen here, it is straightforward to generalize IFME to the product space of tensor fields, P_n^m. The new mean is obtained by updating the old mean via the following equation:

\mathbf{M}_{k+1} = \left( \mathrm{Exp}_{M_{k,1}}\!\left( \tfrac{1}{k+1} \mathrm{Log}_{M_{k,1}}(X_{k+1,1}) \right), \ldots, \mathrm{Exp}_{M_{k,m}}\!\left( \tfrac{1}{k+1} \mathrm{Log}_{M_{k,m}}(X_{k+1,m}) \right) \right)    (4–11)

4.3.2 Incremental Principal Geodesic Analysis on P_n^m

In this section we develop the incremental version of the PGA algorithm in [55], applicable to SPD tensor fields. Very briefly, in [55] the PGA computation problem on the space of SPD tensor fields is approximated by applying PCA in the tangent plane anchored at the Fréchet mean, in the following manner. First, the Fréchet mean, M, of the set of tensor fields is computed. Next, each tensor field is projected to the tangent space at the mean (i.e., T_M P_n^m) using the log map, and then transformed to the tangent space at the identity. This tangent space is a standard Euclidean space, denoted by T_I P_n^m, where I is the tensor field consisting of m identity matrices. The ordinary PCA algorithm is then performed in T_I P_n^m, and the obtained principal components are transformed back to T_M P_n^m. Note that this operation of transforming to the identity is crucial, since the inner product defined for P_n^m corresponds to the inner product in the Euclidean space only at the identity I.

Equipped with the incremental Fréchet mean estimator, IFME, on the space of SPD tensor fields, we are ready to reformulate this algorithm in an incremental form. In a similar fashion, each SPD tensor field is projected using the log map and transformed (by applying the group action) to T_I P_n^m. More formally, let X_i denote the ith tensor field, and M_k be the Fréchet mean of the k given samples. Define Y_i = Log_{M_k}(X_i) ∈ T_{M_k} P_n^m. Each Y_i is then transformed to T_I P_n^m to obtain Z_i. Accordingly, the data matrix in T_I P_n^m, denoted by A_k, can be constructed, where its ith column corresponds to Z_i in vectorized form.

In our algorithm, we keep track of the data matrix, A_k, in T_I P_n^m. Let X_{k+1} and M_k denote the new SPD tensor field and the Fréchet mean over all previous k tensor fields, respectively. Then, to update the principal components, we need to augment the data matrix with an appropriate vector which represents X_{k+1} in T_I P_n^m. In order to find this vector, we first locate the new Fréchet mean M_{k+1} using Eq. 4–11, and then project X_{k+1} to the tangent space at M_{k+1}, i.e., Y_{k+1} = Log_{M_{k+1}}(X_{k+1}). This tangent vector is moved to T_I P_n^m using the group action on P_n^m as shown below, where G = (G_1, ..., G_m) and G is such that ∀i, M_{k+1,i} = G_i G_i^T.
\mathbf{Z}_{k+1} = \Phi_{\mathbf{G}^{-1}}(\mathbf{Y}_{k+1}) = \left( G_1^{-1} Y_{k+1,1} G_1^{-T}, \ldots, G_m^{-1} Y_{k+1,m} G_m^{-T} \right)    (4–12)

Now, the old data matrix A_k and the vector Z_{k+1} are both in T_I P_n^m, which is a standard Euclidean space. However, we should emphasize that the data matrix A_k contains the transformed log maps of the first k data points at the old mean, M_k, while Z_{k+1} is the transformed log vector of the (k+1)st sample at the new mean, M_{k+1}. Consequently, while the mean of the log vectors in A_k is the zero vector, the columns of [A_k Z_{k+1}] will no longer be zero-mean. This affects the estimation accuracy of the principal components, especially for smaller values of k, for which M_k and M_{k+1} are farther from each other. Hence, the data matrix A_k should first be updated accordingly, before it is augmented with the new log vector.

Given the old data matrix A_k, the basic algorithm for this update problem consists of the following steps: (1) compute the exp maps of all k log vectors at the identity, to retrieve the first k data samples; (2) obtain the log maps of the retrieved samples at the new mean. It is evident that this method significantly slows down the incremental PGA, and hence is not a reasonable choice. Instead, we apply the following faster heuristic. Let Y_i = Log_{M_k}(X_i) be the log map of the ith data sample at the old mean, and Z_i its transformed vector in T_I P_n^m. Also, let L_{k+1} = Log_{M_{k+1}}(M_k), and let T_{k+1} be its translated vector in T_I P_n^m. Then, the updated vectors are obtained by adding T_{k+1} to each column of A_k, i.e., Ẑ_i = Z_i + T_{k+1}. Note that this heuristic gives an exact solution in linear spaces. Also, as will be shown shortly in the experiments, it does not sacrifice much accuracy in estimating the PGA, especially as k gets larger. Besides, this method is significantly faster, because for each new sample T_{k+1} is computed only once and added to all columns of A_k. In this way, the old data matrix A_k is updated to Â_k. Now, we can augment the updated data matrix with the new log vector, A_{k+1} = [Â_k Z_{k+1}], and perform PCA on the new data matrix. At the end, the new principal components are transformed back to T_{M_{k+1}} P_n^m using the transformation Φ_G, where Φ and G are as in Eq. 4–12. This method is summarized in Table 4-2; Fig. 4-2 illustrates the variables used in the algorithm.

Table 4-2. Incremental PGA algorithm for SPD tensor fields.
1: Input: the data matrix A_k for k samples, the new tensor field X_{k+1}, and the old mean M_k
2: Compute M_{k+1} from X_{k+1} and M_k, using Eq. 4–11
3: Y_{k+1} = Log_{M_{k+1}}(X_{k+1})
4: Z_{k+1} = Φ_{G^{-1}}(Y_{k+1}), as defined in Eq. 4–12
5: Compute L_{k+1} = Log_{M_{k+1}}(M_k) and T_{k+1} = Φ_{G^{-1}}(L_{k+1})
6: Add T_{k+1} to every column of A_k to obtain Â_k
7: Perform standard PCA on A_{k+1} = [Â_k Z_{k+1}]
8: Transform the jth principal component, P_j, back to T_{M_{k+1}} P_n^m via Q_j = Φ_G(P_j)

Figure 4-2. Schematic illustration of the algorithm in Table 4-2.

4.3.3 Incremental Principal Geodesic Analysis on S^k

We now introduce the iPGA algorithm applicable on S^k, in a fashion very similar to the iPGA algorithm on P_n^m proposed so far. Specifically, we discuss the modifications that must be made to the previously discussed iPGA in order to make it suitable for spherical samples.
Table 4-3. Incremental PGA algorithm on the unit sphere.
1: Input: the data matrix A_k = [v_1, ..., v_k] for k samples, the new sample x_{k+1}, and the old mean m_k
2: Compute m_{k+1} from x_{k+1} and m_k, using Eq. 3–10
3: y_{k+1} = Log_{m_{k+1}}(x_{k+1})
4: Parallel transport z_{k+1} = Γ_{m_{k+1}→n}(y_{k+1}), as defined in Eq. 4–9, where n is the north pole
5: Compute r_{k+1} = Log_{m_{k+1}}(m_k) and t_{k+1} = Γ_{m_{k+1}→n}(r_{k+1})
6: Add t_{k+1} to every column of A_k to obtain Â_k = [v̂_1, ..., v̂_k]
7: Perform standard PCA on A_{k+1} = [Â_k z_{k+1}]
8: Parallel transport the jth principal component, p_j, back to T_{m_{k+1}} S^k via q_j = Γ_{n→m_{k+1}}(p_j)

First, note that the convergence analysis of IFME on P_n in [21] is not directly applicable to the unit sphere. However, in Section 3.3 we provided the convergence proof of iFME on the sphere, using tools from the gnomonic projection. As an application, the iFME method is used here to develop the iPGA algorithm on the sphere. Second, the inner product between any two tangent vectors of S^k is equivalent to the standard Euclidean inner product (see Table 4-1), and is independent of the point at which the vectors are anchored. Consequently, standard PCA can be employed on the tangent plane at any point of S^k, in contrast to the PGA algorithm on P_n^m. However, in our incremental PGA technique, we always keep track of the data matrix at the north pole (or any other arbitrary point on the sphere), because in this way only the new log vector needs to be translated for each new sample. Third, the group action applied in the case of P_n^m is replaced with parallel transport, approximated by the Schild's Ladder technique described in Section 4.2.2. With these modifications in place, the new iPGA technique on S^k is summarized in Table 4-3.

Figure 4-3. Step by step illustration of the iPGA algorithm on S^k, summarized in Table 4-3. From left to right and top to bottom, steps 1 through 8 are shown, respectively.

4.4 Synthetic Experiments

In this section we present several experiments with synthetically generated data, using the proposed iPGA methods on both S^k and P_n^m. The accuracy and efficiency of the proposed algorithms are evaluated against their non-incremental PGA counterparts.

4.4.1 Manifold of SPD Tensor Fields

Data Description: We synthetically generated a group of 25 SPD tensor fields of size 16 × 16. The 3 × 3 SPD matrices in all tensor fields are ellipsoidal. There are two types of SPD matrices in each tensor field, whose principal eigenvectors differ by 90 degrees. In the generated tensor fields, the angles of the principal eigenvectors of the first and second types of matrices are uniformly chosen in [0, π] and [π/2, 3π/2], respectively.

Time Consumption: Given a pool of tensor fields, they are incrementally input (in random order) to both the iPGA and PGA algorithms, and the CPU time consumed (on an Intel-7 2.76GHz CPU with 8GB RAM) by each method to compute the principal components is recorded. We repeat this experiment 10 times on the data pool of 25 tensor fields and plot the average time/accuracy for each method. The left plot in Fig. 4-5 demonstrates that the time consumption of iPGA is significantly less than that of PGA, especially for a large number of input data samples.

Error Measurement: In order to measure the accuracy of each method, we computed the residual sum defined in [43] for the estimated principal components. For N input tensor fields, the residual sum is defined by

\frac{1}{N} \sum_{j=1}^{N} d^2(\mathbf{X}_j, \hat{\pi}_{S_U}(\mathbf{X}_j))

where d is the geodesic distance on P_n^m, and π̂_{S_U}(X_j) is the estimated projection of X_j onto the geodesic subspace spanned by the principal components, denoted by S_U. The projection, π_{S_U}, is estimated in the tangent space (see Eq. 6 in [43] for details).
This estimation is illustrated in Fig. 4-4. The bar chart on the right in Fig. 4-5 depicts the error comparison between PGA and iPGA at each iteration. It can be observed that iPGA's residual error is very close to PGA's.

Figure 4-4. Estimation of the projection π_S(X) onto the 1-D principal geodesic submanifold (red curve).

Figure 4-5. Time consumption and residual error comparison between iPGA (proposed) and PGA on P_n^m.

Thus, from an accuracy viewpoint, iPGA is on an equal footing with PGA, but from a computational efficiency viewpoint, it is significantly better.

4.4.2 Unit Sphere S^k

We generated a group of 25 random samples on a high-dimensional unit sphere, S^10000. We picked this very high dimensional space in order to simulate the data points we deal with in the real data experiments.

We fed the samples into both the PGA and iPGA methods defined on the sphere, incrementally, and recorded the time consumed by each method to estimate the principal components. Also, in order to evaluate the accuracy of iPGA for each new sample, we considered the PGA estimate as ground truth and measured the angle between the first principal components obtained from iPGA and PGA, in the tangent plane at the north pole. This error is henceforth called the angular error. The experiment was repeated 500 times and the average plots are shown here.

Figure 4-6. Mean angular error of iPGA estimates w.r.t. PGA on S^10000.

Figure 4-6 illustrates the angular error of iPGA over the number of samples. It can be seen that the angular error of iPGA with respect to PGA is bounded by 10 degrees and keeps decreasing as the sample size gets larger. Besides, it is evident from Figure 4-7 that the time consumed by iPGA is significantly less than that of the non-incremental version, which makes it an appealing choice especially for large data dimensionality.

Figure 4-7. Time comparison of incremental and non-incremental PGA estimators on S^10000.

4.5 Real Data Experiments: Classification of PD vs. ET vs. Controls

In this section we present an application of iPGA to real data sets. We applied the proposed iPGA techniques on both the unit sphere and the space of SPD tensor fields. Our real data consist of HARDI acquisitions from patients with Parkinson's disease (PD), essential tremor (ET) and controls. The goal here is to automatically discriminate between these groups using features derived from the data. Earlier work in this context in the field of movement disorders involved the use of DTI-based ROI analysis, specifically using scalar valued measures such as fractional anisotropy [49], and showed that DTI had high potential as a non-invasive early trait biomarker. All our HARDI data were acquired using a 3T Phillips MR scanner with the following parameters: TR = 7748 ms, TE = 86 ms, b-values of 0 and 1000 s/mm², 64 gradient directions, and voxel size = 2 × 2 × 2 mm³.

4.5.1 Classification Results using Deformation Tensor Features

In the first part, we perform the classification task using SPD tensor field features. We use the ensemble average propagators (EAPs) at each voxel, estimated using the technique in [23]. We extract the Cauchy deformation tensor field, which is computed from a non-rigid registration of the given EAP fields to the control atlas EAP field (constructed using the approach in [7]); see Figure 4-8. The Cauchy deformation tensor is defined as \sqrt{JJ^T}, where J is the Jacobian of the deformation at each voxel. The Cauchy deformation tensor is an SPD matrix of size (3, 3) in this case.
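The per-voxel Cauchy deformation tensor can be computed with a few lines of linear algebra; the following is a minimal sketch (the function name is ours), taking the matrix square root of J J^T through the eigen-decomposition of that symmetric matrix.

```python
import numpy as np

def cauchy_deformation_tensor(J):
    """sqrt(J J^T) for a 3x3 Jacobian J: eigendecompose the symmetric
    matrix J J^T and take the square roots of its eigenvalues."""
    w, V = np.linalg.eigh(J @ J.T)
    w = np.clip(w, 1e-12, None)     # guard against tiny negative round-off
    return V @ np.diag(np.sqrt(w)) @ V.T
```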
This gives us an SPD field as a derived feature corresponding to each given EAP field. We use the iPGA described earlier together with a nearest geodesic-distance-based neighbor rule to classify the probe data set. Note that the geodesic distance in this case is the distance between the probe data set and the geodesic submanifold representation of each class, namely PD, ET and Controls. The probe is assigned the label of the class with the smallest geodesic distance.

Figure 4-8. (a) and (b) are the corresponding S0 (zero magnetic gradient) slices of the atlas and a control subject, respectively, and (c) shows the EAPs of the same slice as in (b), with the Substantia Nigra as the ROI. Similarly, (d) and (e) are the corresponding S0 slices of the atlas and a Parkinson subject, respectively, and (f) illustrates the EAPs computed for the slice in (e), with the Substantia Nigra as the ROI.

Classification is performed on 26 PD, 16 ET and 25 control subjects using the PGA of the Cauchy deformation tensor fields described above, where 10 subjects each from PD and control, as well as 6 subjects from ET, were randomly picked as the test group, and the rest of the subjects were used for training. The experiment is repeated 300 times and the mean values are reported. Table 4-4 summarizes the accuracy for each method, where

Accuracy = (TP + TN)/(TP + TN + FP + FN), Sensitivity = TP/(TP + FN), Specificity = TN/(FP + TN)

and FN denotes the number of False Negatives, and similarly for TP, TN and FP.

Table 4-4. Classification results (%) of iPGA, PGA and PCA using SPD tensor field features.
               Control vs. PD           Control vs. ET           PD vs. ET
               iPGA    PGA    PCA       iPGA    PGA    PCA       iPGA    PGA    PCA
Accuracy       89.00   89.95  56.37     86.44   87.13  63.43     89.18   90.28  58.53
Sensitivity    92.72   93.33  65.29     87.01   88.94  66.27     95.57   96.47  64.71
Specificity    85.28   86.57  47.45     85.87   85.32  60.59     82.79   84.09  52.35

For comparison, we also used the standard PCA method, applied to a vectorized version of the tensor fields. The size of the tensor fields was restricted to the ROIs instead of the whole image; thus, the dimensionality was 600 × 6 = 3600, and we used just the first two principal components in all competing methods to achieve the classification reported in the table. From the table, it is evident that iPGA and PGA provide very similar accuracies in all three classifications. Further, iPGA is considerably more accurate than PCA, because the latter does not take the non-linearity of P_n^m into account.

4.5.2 Classification Results using Shape Features

In the second part, we evaluated the iPGA algorithm on the unit sphere in the task of movement disorder classification. To this end, we used the shape of the Substantia Nigra region in the brain images as the discriminating feature. Recently, in [10], a Schrodinger Distance Transform (SDT) was introduced and applied to represent point clouds (in 2-D or 3-D) as points on an infinite dimensional Hilbert sphere. The shape of the Substantia Nigra region was hand-segmented in all rigidly aligned datasets, consisting of 25 control, 24 PD and 15 ET images. We first collected the same number of random samples on the boundary of each 3-D shape and applied the SDT technique to represent each shape as a point on a unit sphere. The 3-D shape domain was set to 28 × 28 × 15, resulting in 11760-dimensional unit vectors from the SDT. Therefore, the samples live on the S^11759 manifold. Figure 4-9 demonstrates the extracted shapes of the Substantia Nigra in the 25 control images.
Once all shapes are represented as points on the unit sphere, we can apply our incremental PGA method to these spherical features. Figure 4-10 illustrates the mean shape, along with the first principal components obtained by the PGA and iPGA methods, shown with coefficients 1.5√λ and 3√λ, where λ is the coefficient (variance) corresponding to the first principal component estimated by each method.

Figure 4-9. Population of Substantia Nigra regions extracted from the control brain images.

Table 4-5. Classification results of iPGA, PGA and PCA using shape descriptor features

                        Accuracy   Sensitivity   Specificity
Control vs. PD   iPGA     91.46       87.98         94.94
                 PGA      92.95       90.93         94.98
                 PCA      67.32       51.96         82.69
Control vs. ET   iPGA     88.28       86.34         92.16
                 PGA      90.14       88.18         94.05
                 PCA      75.69       77.87         71.32
PD vs. ET        iPGA     86.13       80.54         97.32
                 PGA      87.58       82.38         98.00
                 PCA      64.60       48.36         97.08

Next, a PGA-based classification was performed in a manner similar to the previous section. We randomly selected 10 control, 10 PD and 5 ET images as the test set, and used the remaining images for training. The classification task was repeated 300 times using different training sets and the average accuracy was computed. The classification results using the shape descriptors are summarized in Table 4-5. The accuracy of iPGA is reasonably close to that of PGA, and both outperform the standard linear PCA.

Figure 4-10. Comparison of incremental (bottom row) and non-incremental (top row) results: (1) Fréchet means (left column), (2) PGA with coefficient 1.5√λ (middle column), and (3) PGA with coefficient 3√λ (right column).

CHAPTER 5
SUMMARY AND DISCUSSION

In this dissertation we developed novel incremental algorithms for the statistical analysis of manifold-valued data. In the first part, we proposed an incremental (intrinsic) mean computation technique for the space of symmetric positive definite (SPD) matrices, based on the Stein distance. The key contribution was the derivation of a closed-form solution for the weighted Stein mean of two SPD matrices, which was then used to develop an incremental algorithm for computing the Stein mean of a population of SPD matrices. Using this incremental Stein mean estimator, we experimentally demonstrated significant gains in computation time over the non-incremental counterpart while maintaining approximately the same accuracy.

Second, we presented a new incremental algorithm for computing the Fréchet mean of samples on the sphere, together with a proof that this estimator converges as the number of samples tends to infinity. Several applications to data that live on the sphere were considered, and the results show superior performance of our incremental algorithm over its non-incremental counterpart.

Finally, we presented a novel incremental algorithm for Principal Geodesic Analysis (PGA) applicable to the manifold of SPD matrices as well as the sphere. We demonstrated significant time gains using our incremental algorithm, while maintaining accuracy approximately the same as that of the non-incremental counterpart.
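As a closing illustration, the recursion at the heart of the incremental sphere mean summarized above fits in a few lines: the k-th sample pulls the running estimate a fraction 1/k of the way along the great-circle arc joining them. This is a minimal sketch assuming unit-norm samples, not the exact implementation evaluated in the experiments.

```python
import numpy as np

def slerp(x, y, t):
    # Point at parameter t on the geodesic (great-circle arc) from x to y.
    theta = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    if theta < 1e-12:
        return x
    return (np.sin((1.0 - t) * theta) * x + np.sin(t * theta) * y) / np.sin(theta)

def incremental_frechet_mean(samples):
    # Recursive estimator: move the running mean toward the k-th sample
    # by 1/k along the connecting geodesic.
    mean = samples[0]
    for k, x in enumerate(samples[1:], start=2):
        mean = slerp(mean, x, 1.0 / k)
    return mean
```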
REFERENCES

[1] Afsari, Bijan. "Riemannian L^p center of mass: existence, uniqueness, and convexity." Proceedings of the American Mathematical Society 139 (2011).2: 655–673.

[2] Amarasinghe, GW. "On the standard lengths of angle bisectors and the angle bisector theorem." Global Journal of Advanced Research on Classical and Modern Geometries 1 (2012).1.

[3] Baisnab, A. P. and Jas, Manoranjan. Elements of Probability and Statistics. Tata McGraw-Hill Education, 1993.

[4] Cetingul, Hasan Ertan, Afsari, Bijan, Wright, Margaret J, Thompson, Paul M, and Vidal, René. "Group action induced averaging for HARDI processing." Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on. IEEE, 2012, 1389–1392.

[5] Chebbi, Z. and Moakher, M. "Means of Hermitian positive-definite matrices based on the log-determinant divergence function." Linear Algebra and its Applications 40 (2012).

[6] Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba C. "Efficient recursive algorithms for computing the mean diffusion tensor and applications to DTI segmentation." ECCV. Springer, 2012.

[7] Cheng, Guang, Vemuri, Baba C, Hwang, Min-Sig, Howland, Dena, and Forder, John R. "Atlas construction from high angular resolution diffusion imaging data represented by Gaussian mixture fields." Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on. IEEE, 2011, 549–552.

[8] Cheng, Jian, Ghosh, Aurobrata, Jiang, Tianzi, and Deriche, Rachid. "A Riemannian framework for orientation distribution function computing." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009, 911–918.

[9] Cherian, A., Sra, S., Banerjee, A., and Papanikolopoulos, N. "Efficient similarity search for covariance matrices via the Jensen-Bregman LogDet divergence." ICCV. 2011, 2399–2406.

[10] Deng, Yan, Rangarajan, Anand, Eisenschenk, Stephan, and Vemuri, Baba C. "A Riemannian Framework for Matching Point Clouds Represented by the Schrodinger Distance Transform." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.

[11] Do Carmo, Manfredo P. Riemannian Geometry. Springer, 1992.

[12] Fillard, P., Arsigny, V., Pennec, X., Thompson, M., and Ayache, N. "Extrapolation of sparse tensor fields: application to the modeling of brain variability." International Conference on Information Processing in Medical Imaging (IPMI). 2005.

[13] Fletcher, P Thomas and Joshi, Sarang. "Riemannian geometry for the statistical analysis of diffusion tensor data." Signal Processing 87 (2007).2: 250–262.

[14] Fletcher, P.T., Lu, C., Pizer, S.M., and Joshi, S. "Principal geodesic analysis for the study of nonlinear statistics of shape." Medical Imaging, IEEE Transactions on 23 (2004).8: 995–1005.

[15] Fréchet, Maurice. "Les éléments aléatoires de nature quelconque dans un espace distancié." Annales de l'institut Henri Poincaré, vol. 10. Presses universitaires de France, 1948, 215–310.

[16] Grove, Karsten and Karcher, Hermann. "How to conjugate C^1-close group actions." Mathematische Zeitschrift 132 (1973).1: 11–20.

[17] Harandi, M., Sanderson, C., Hartley, R., and Lovell, B.C. "Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach." European Conference on Computer Vision (ECCV). 2012.

[18] Hartley, Richard, Trumpf, Jochen, Dai, Yuchao, and Li, Hongdong. "Rotation averaging." International Journal of Computer Vision 103 (2013).3: 267–305.

[19] Hauberg, Søren, Lauze, François, and Pedersen, Kim Steenstrup. "Unscented Kalman filtering on Riemannian manifolds." Journal of Mathematical Imaging and Vision 46 (2013).1: 103–120.

[20] Heo, Jae-Pil, Lee, YoungWoon, He, Junfeng, Chang, Shih-Fu, and Yoon, Sung-eui. "Spherical Hashing." IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). 2012.

[21] Ho, Jeffrey, Cheng, Guang, Salehian, Hesamoddin, and Vemuri, Baba.
"Recursive Karcher Expectation Estimators and Geometric Law of Large Numbers." Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics. 2013, 325–332.

[22] Horn, Berthold. Robot Vision. MIT Press, 1986.

[23] Jian, Bing and Vemuri, Baba C. "A Unified Computational Framework for Deconvolution to Reconstruct Multiple Fibers From DWMRI." IEEE TMI 26 (2007): 1464–1471.

[24] Karcher, Hermann. "Riemannian Center of Mass and so called karcher mean." arXiv preprint arXiv:1407.2087 (2014).

[25] Kendall, Wilfrid S. "Probability, convexity, and harmonic maps with small image I: uniqueness and fine existence." Proceedings of the London Mathematical Society 3 (1990).2: 371–406.

[26] Kim, Hyunwoo J, Adluru, Nagesh, Bendlin, Barbara B, Johnson, Sterling C, Vemuri, Baba C, and Singh, Vikas. "Canonical Correlation Analysis on Riemannian Manifolds and Its Applications." Computer Vision–ECCV 2014. Springer, 2014, 251–267.

[27] Latecki, Longin Jan, Lakamper, Rolf, and Eckhardt, T. "Shape descriptors for non-rigid shapes with a single closed contour." CVPR. 2000, 424–429.

[28] Lenglet, C., Rousson, M., and Deriche, R. "DTI segmentation by statistical surface evolution." IEEE Transactions on Medical Imaging 25 (2006).6: 685–700.

[29] Li, Jia and Wang, James Z. "Automatic linguistic indexing of pictures by a statistical modeling approach." PAMI (2003).

[30] Lim, Yongdo and Pálfia, Miklós. "Weighted inductive means." Linear Algebra and its Applications 453 (2014): 59–83.

[31] Lorenzi, Marco, Ayache, Nicholas, and Pennec, Xavier. "Schild's Ladder for the parallel transport of deformations in time series of images." Information Processing in Medical Imaging. Springer, 2011, 463–474.

[32] Lowe, David G. "Object recognition from local scale-invariant features." Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2. IEEE, 1999, 1150–1157.

[33] Mardia, Kanti V and Jupp, Peter E. Directional Statistics, vol. 494. John Wiley & Sons, 2009.

[34] Moakher, M. and Batchelor, P. G. "SPD Matrices: From Geometry to Applications and Visualization." Visualization and Processing of Tensor Fields. 2006, 285–298.

[35] Ncube, Sentibaleng and Srivastava, Anuj. "A novel Riemannian metric for analyzing HARDI data." SPIE Medical Imaging. International Society for Optics and Photonics, 2011, 79620Q.

[36] Pennec, Xavier. "Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements." JMIV 25 (2006).1: 127–154.

[37] Said, Salem, Courty, Nicolas, Le Bihan, Nicolas, Sangwine, Stephen J, et al. "Exact principal geodesic analysis for data on SO(3)." Proceedings of the 15th European Signal Processing Conference, EUSIPCO-2007. 2007, 1700–1705.

[38] Sakai, Takashi. Riemannian Geometry, vol. 149. American Mathematical Soc., 1996.

[39] Salehian, Hesamoddin, Cheng, Guang, Vemuri, Baba C, and Ho, Jeffrey. "Recursive Estimation of the Stein Center of SPD Matrices and Its Applications." Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, 1793–1800.

[40] Salehian, Hesamoddin, Vaillancourt, David, and Vemuri, Baba C. "iPGA: Incremental Principal Geodesic Analysis with Applications to Movement Disorder Classification." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014. Springer, 2014, 765–772.

[41] Schwartzman, Armin. Random Ellipsoids and False Discovery Rates: Statistics for Diffusion Tensor Imaging Data. Ph.D. thesis, Stanford University, 2006.

[42] Sloane, Neil JA et al.
"The On-Line Encyclopedia of Integer Sequences." 2003.

[43] Sommer, Stefan, Lauze, François, Hauberg, Søren, and Nielsen, Mads. "Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations." Computer Vision–ECCV 2010. Springer, 2010, 43–56.

[44] Sra, S. "Positive Definite Matrices and the Symmetric Stein Divergence." Available on the author's website at http://people.kyb.tuebingen.mpg.de/suvrit/ (2011).

[45] Srivastava, Anuj, Jermyn, Ian, and Joshi, Shantanu. "Riemannian analysis of probability density functions with applications in vision." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007, 1–8.

[46] Sturm, K. T. "Probability Measures on Metric Spaces of Nonpositive Curvature." Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces. 2003.

[47] Terras, A. Harmonic Analysis on Symmetric Spaces and Applications. Springer-Verlag, 1985.

[48] Tournier, Maxime, Wu, Xiaomao, Courty, Nicolas, Arnaud, Elise, and Reveret, Lionel. "Motion compression using principal geodesics analysis." Computer Graphics Forum, vol. 28. Wiley Online Library, 2009, 355–364.

[49] Vaillancourt, DE, Spraker, MB, Prodoehl, J, Abraham, I, Corcos, DM, Zhou, XJ, Comella, CL, and Little, DM. "High-resolution diffusion tensor imaging in the substantia nigra of de novo Parkinson disease." Neurology 72 (2009).16: 1378–1384.

[50] Wang, Yuanxiang, Salehian, Hesamoddin, Cheng, Guang, and Vemuri, Baba. "Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013, 3051–3056.

[51] Wang, Z. and Vemuri, B. "Tensor field segmentation using region based active contour model." European Conference on Computer Vision (ECCV). 2004, 304–315.

[52] Woods, Roger P. "Characterizing volume and surface deformations in an atlas framework: theory, applications, and implementation." NeuroImage 18 (2003).3: 769–788.

[53] Wu, Jing, Smith, William AP, and Hancock, Edwin R. "Weighted principal geodesic analysis for facial gender classification." IAPR. Springer, 2007, 331–339.

[54] Wu, Yi, Wang, Jinqiao, and Lu, Hanqing. "Real-Time Visual Tracking via Incremental Covariance Model Update on Log-Euclidean Riemannian Manifold." CCPR. 2009.

[55] Xie, Yuchen, Vemuri, Baba C, and Ho, Jeffrey. "Statistical analysis of tensor fields." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2010. Springer, 2010, 682–689.

[56] Zha, Hongyuan and Simon, Horst D. "On updating problems in latent semantic indexing." SIAM Journal on Scientific Computing 21 (1999).2: 782–791.

[57] Zhang, Miaomiao and Fletcher, P Thomas. "Probabilistic Principal Geodesic Analysis." NIPS. 2013.

BIOGRAPHICAL SKETCH

Hesamoddin Salehian was born in 1987 in Tehran, Iran. He graduated from high school in Semnan, Iran, in 2006. He received his Bachelor of Science degree in Computer Engineering from Sharif University of Technology, Tehran, Iran, in June 2010. He earned his Master of Science degree in Computer Engineering from the University of Florida, Gainesville, in September 2014. He received his Doctor of Philosophy degree in Computer Engineering from the University of Florida in December 2014. His research interests revolve around Medical Image Analysis, Computer Vision and Machine Learning.