On Methods for Solving Symmetric Systems of Linear Equations Arising in Optimization

TOVE ODLAND

Doctoral Thesis
Stockholm, Sweden 2015

TRITA-MAT-A 2015:05
ISRN KTH/MAT/A-15/05-SE
ISBN 978-91-7595-514-8
Department of Mathematics
Royal Institute of Technology
100 44 Stockholm, Sweden

Academic dissertation which, with the permission of Kungl Tekniska Högskolan, is submitted for public examination for the degree of Doctor of Technology on Friday, June 12, 2015, at 14.00 in hall F3, Lindstedtsvägen 26, Kungl Tekniska Högskolan, Stockholm.

© Tove Odland, 2015

Print: Universitetsservice US AB, Stockholm, 2015

To F and S.B.

"You are the sky; everything else is just weather." -Ani Pema Chödrön

Abstract

In this thesis we present research on mathematical properties of methods for solving symmetric systems of linear equations that arise in various optimization problem formulations and in methods for solving such problems.

In the first and third paper (Paper A and Paper C), we consider the connection between the method of conjugate gradients and quasi-Newton methods on strictly convex quadratic optimization problems, or equivalently on a symmetric system of linear equations with a positive definite matrix. We state conditions on the quasi-Newton matrix and the update matrix such that the search directions generated by the corresponding quasi-Newton method and the method of conjugate gradients respectively are parallel.

In Paper A, we derive such conditions on the update matrix based on a sufficient condition to obtain mutually conjugate search directions. These conditions are shown to be equivalent to the one-parameter Broyden family. Further, we derive a one-to-one correspondence between the Broyden parameter and the scaling between the search directions from the method of conjugate gradients and a quasi-Newton method employing some well-defined update scheme in the one-parameter Broyden family.

In Paper C, we give necessary and sufficient conditions on the quasi-Newton matrix and on the update matrix such that equivalence with the method of conjugate gradients holds for the corresponding quasi-Newton method. We show that the set of quasi-Newton schemes admitted by these necessary and sufficient conditions is strictly larger than the one-parameter Broyden family. In addition, we show that this set of quasi-Newton schemes includes an infinite number of symmetric rank-one update schemes.

In the second paper (Paper B), we utilize an unnormalized Krylov subspace framework for solving symmetric systems of linear equations. These systems may be incompatible and the matrix may be indefinite/singular. Such systems of symmetric linear equations arise in constrained optimization. In the case of an incompatible symmetric system of linear equations we give a certificate of incompatibility based on a projection on the null space of the symmetric matrix and characterize a minimum-residual solution. Further, we derive a minimum-residual method, give explicit recursions for the minimum-residual iterates and characterize a minimum-residual solution of minimum Euclidean norm.
Keywords: symmetric system of linear equations, method of conjugate gradients, quasi-Newton method, unconstrained optimization, unconstrained quadratic optimization, Krylov subspace method, unnormalized Lanczos vectors, minimum-residual method

Sammanfattning

In this thesis we consider mathematical properties of methods for solving symmetric systems of linear equations that arise in formulations of, and methods for, a variety of optimization problems.

In the first and third paper (Paper A and Paper C), the connection between the method of conjugate gradients and quasi-Newton methods is examined when these are applied to unconstrained strictly convex quadratic optimization problems, or equivalently to a symmetric system of linear equations with a positive definite symmetric matrix. We state conditions on the quasi-Newton matrix and the update matrix such that the search direction obtained from the corresponding quasi-Newton method is parallel to the search direction obtained from the method of conjugate gradients. In the first paper (Paper A), conditions on the update matrix are derived based on a sufficient condition for obtaining mutually conjugate search directions. These conditions on the quasi-Newton method are shown to be equivalent to the update scheme belonging to the one-parameter Broyden family. We also derive a one-to-one correspondence between the Broyden parameter and the scaling between the search directions from the method of conjugate gradients and a quasi-Newton method employing some well-defined update scheme from the one-parameter Broyden family. In the third paper (Paper C), we give necessary and sufficient conditions on a quasi-Newton method such that the stated equivalence with the method of conjugate gradients is obtained. The set of quasi-Newton schemes that satisfy these conditions is strictly larger than the one-parameter Broyden family. We also show that this set of quasi-Newton schemes contains an infinite number of update schemes in which the update matrix is a symmetric matrix of rank one.

In the second paper (Paper B), a framework for unnormalized Krylov subspace methods is used to solve symmetric systems of linear equations. These systems of equations may lack a solution and the matrix may be indefinite/singular. Symmetric systems of linear equations of this type arise in a variety of formulations of, and methods for, constrained optimization problems. In the case where the symmetric system of linear equations lacks a solution, we give a certificate of this based on a projection onto the null space of the symmetric matrix and characterize a minimum-residual solution. We also derive a minimum-residual method in this framework and give explicit recursion formulas for this method. In the case where the symmetric system of linear equations lacks a solution, we characterize a minimum-residual solution of minimum Euclidean norm.

Keywords: symmetric systems of linear equations, the method of conjugate gradients, quasi-Newton methods, unconstrained optimization, unconstrained quadratic optimization, Krylov subspace methods, unnormalized Lanczos vectors, the minimum-residual method

Acknowledgments

First of all I would like to express my deepest gratitude to my academic advisor and co-author Professor Anders Forsgren. Your mathematical expertise, attention to detail and patience have made it a privilege to work with you these last five years. It has been an inspiration to work with a researcher who sees all of the possibilities and is fearless in the face of a sometimes conservative community.
Secondly, I would like to thank my academic co-advisor Professor Krister Svanberg for supporting me and for giving me valuable pointers when writing this thesis.

My officemate Hildur Æsa Oddsdóttir, I have really appreciated sharing a room with you. You are an expert, not only in your academic field, but also when it comes to the Stockholm housing situation, where to buy the best electronic equipment and so much more. We have had so many great discussions on everything from feminism to LaTeX. Thank you for being a great co-worker and friend.

I would also like to thank the faculty members and postdocs at the Division of Optimization and Systems Theory at KTH and my fellow students, past and present, in the Applied and Computational Mathematics doctoral program.

I gratefully acknowledge that the research described in this thesis has been partly financially supported by the Swedish Research Council (VR).

Thank you to my family, my partner Jonas and my friends for your love and support.

Stockholm, May 2015
Tove Odland

Table of Contents

Abstract
Acknowledgments
Table of Contents

I Introduction
  I.1 Introduction to optimization
  I.2 Unconstrained optimization
    I.2.1 Linesearch methods for unconstrained optimization
    I.2.2 Quasi-Newton methods
    I.2.3 The method of conjugate gradients
    I.2.4 Connection between the method of conjugate gradients and quasi-Newton methods on quadratic problems
  I.3 Symmetric systems of linear equations arising in constrained optimization
    I.3.1 Solving symmetric systems of linear equations
    I.3.2 Methods based on the symmetric Lanczos process
  I.4 Summary of papers
    I.4.1 Contribution by co-author
  I.5 References

A Paper A
  A.1 Introduction
  A.2 Background
    A.2.1 Conjugate gradient method
    A.2.2 Quasi-Newton methods
    A.2.3 Background results
  A.3 Results
    A.3.1 Update matrices defined by Proposition A.3.3
    A.3.2 Relation between δk and φk
    A.3.3 Remarks on when the one-parameter Broyden family is well-defined
  A.4 Conclusions
  A.5 References

B Paper B
  B.1 Introduction
    B.1.1 Notation
  B.2 Background
    B.2.1 An extended representation of the unnormalized Lanczos vectors
  B.3 Properties of the unnormalized Krylov framework
    B.3.1 Convergence in the unnormalized Krylov framework
    B.3.2 An unnormalized Krylov algorithm
    B.3.3 On the case when normalization is well defined
  B.4 Connection to the minimum-residual method
    B.4.1 Convergence of the minimum-residual method
    B.4.2 A minimum-residual algorithm based on the unnormalized Krylov method
  B.5 Summary and conclusion
  B.6 Appendix
    B.6.1 A result on the Lanczos vectors
    B.6.2 Properties of the sequence {γk}
    B.6.3 The method of conjugate gradients
  B.7 References

C Paper C
  C.1 Introduction
  C.2 Background
    C.2.1 Previous results on a sufficient condition for equivalence for CG and QN
  C.3 Results on the equivalence of CG and QN on (QP)
    C.3.1 Necessary and sufficient conditions for equivalence of CG and QN
    C.3.2 Symmetric rank-one update schemes
    C.3.3 Review of known results for the one-parameter Broyden family
  C.4 Conclusion
  C.5 Appendix
  C.6 References

Introduction

I.1 Introduction to optimization

Optimization is used in a great number of applications such as telecommunications, intensity-modulated radiation therapy treatment planning, crew scheduling, modeling of metabolic reaction networks, structure/topology design, portfolio theory and traffic modeling, to mention just a few. Depending on the application, the difficulty may lie in the modeling and/or in solving the posed optimization problem.

An optimization problem is a decision problem on the form
\[
\min_{x \in F} f(x), \tag{I.1}
\]
where x ∈ Rn is the vector of decision variables, f, the objective function, is some function from Rn → R and F is the feasible region for x.

The field of optimization can roughly be divided into two major parts. The first part is posing the problem (I.1), that is, finding a suitable f and F to represent the problem at hand. This requires consideration since the choices made highly affect how hard the problem will be to solve. We will consider formulations where f is some smooth function over F, which means that f will be continuously differentiable as many times as required. The second part is to solve the posed problem (I.1) and to determine the quality of the solution obtained. The choices made in the modeling will determine which procedures for solving (I.1) will be adequate. In this thesis we focus on the mathematics behind the second part and consider in great detail methods for solving (I.1) and the symmetric systems of linear equations that arise in various formulations of optimization problems and methods for solving these.

When solving a problem (I.1), the goal is to find a vector x⋆ that is a global minimizer, i.e. x⋆ ∈ F and f(x⋆) ≤ f(x) for all x ∈ F.
In some cases, a globally optimal solution can not be attained and then a locally optimal solution is sought, such a solution x� satisfy f (x� ) ≤ f (x) in some neighborhood of x� . In order to find x� one can use some method that systematically searches F without evaluating all feasible points. Depending on the characteristics of f and F there are different methods that are suitable. In general these methods work the following way: given a starting guess x0 , the method generates a sequence of iterates x1 , x2 , . . . , where each new iterate is in some way a better estimation than the previous of x� . Note that in some methods the intermediate iterates may not be feasible to F. This process continues until 1 2 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION one reaches an iterate that satisfies some stopping criteria based on feasibility and some conditions that guarantee local or global optimality. All the work in this thesis is done assuming exact arithmetic. There are many aspects that needs to be considered when implementing an algorithm and solving a problem in finite arithmetic that are beyond the scope of the work presented in this thesis. However, understanding the mechanics of the inner-most parts of optimization methods is crucial and we have devoted this thesis to the mathematics behind this. Notation The letters i, j and k are used to denote integer indices, other lowercase letters such as g, x, p and c denote column vectors in Rn , possibly with super- and/or subscripts. For a symmetric matrix H ∈ Rn×n , H � 0 denotes that H is positive definite, i.e. that v T Hv > 0 for all 0 �= v ∈ Rn . Analogously, H � 0 is used to denote that H is positive semidefinite, i.e. that v T Hv ≥ 0 for all v ∈ Rn . The null space and range space of H are denoted by N (H) and R(H) respectively. I.2 Unconstrained optimization Consider an unconstrained optimization problem, i.e. F = Rn in (I.1), given by min f (x), x∈Rn (P) where the function f : Rn → R is smooth. For such a general unconstrained problem it holds that if, i) ∇f (x� ) = 0, ii) ∇2 f (x� ) � 0, then x� is locally optimal solution. The above conditions are known as second-order sufficient optimality conditions in that they are sufficient for local optimality, see, e.g., [26, Chapter 11] and [1, Chapter 4]. An optimization problem is convex if the feasible region is convex and the objective function f is a convex function over the feasible region. If an optimization problem is convex, then all locally optimal solutions are globally optimal solutions and therefore this is a desirable property to have for a problem one wants to solve. See, e.g., [26, Chapter 11] for an introduction on the topic of convexity. It holds that an unconstrained problem (P) is convex if and only if ∇2 f (x) � 0 for all x ∈ Rn . Note that if f is a convex function on Rn then x� is a globally optimal solution if and only if ∇f (x� ) = 0, see. e.g. [1, Chapter 4]. For this reason, many methods for solving (P) will search for an iterate where the gradient of the objective function is zero, a so-called stationary point to the problem. In the next section a certain class of such methods known as linesearch methods is introduced. I NTRODUCTION 3 I.2.1 Linesearch methods for unconstrained optimization For our purposes, methods for unconstrained optimization problems (P) are divided into two classes, methods where second-derivative information is available and used and methods which only use first-derivative information. 
It deserves to be mentioned that there are also so-called derivative-free methods that uses neither first- nor second-derivative information, see, e.g., [9] for an introduction to such methods. We consider so-called linesearch methods that takes us from xk to the next iterate xk+1 by taking some step θk along a non-zero search direction pk , (I.2) xk+1 = xk + θk pk . Different linesearch methods arise depending on how the search direction pk is generated at each step k. A general requirement on a linesearch method is that it should have the descent property, i.e. f (xk + θk pk ) < f (xk ) for all k. This property is for example guaranteed if ∇f (xk )T pk < 0, i.e. pk is a descent direction of f at xk . Then there always exists a positive θk such that f (xk + θk pk ) < f (xk ) is satisfied, see, e.g. [22, Chapter 3]. There are many strategies on how to choose the steplength θk . Ideally, one would like to find the optimal value of θk corresponding to θk = argmin f (xk + θpk ), θ∈R given xk and pk . Using such a value of θk is referred to the linesearch method employing exact linesearch. However, it is not always worthwhile to use exact linesearch, especially for iterates that are far from the optimal solution. See, e.g., [41, Chapter 3] for a discussion on other strategies for choosing θk and their benefits. A common stopping criteria for linesearch methods is ∇f (x) = 0, i.e. the current iterate is a stationary point. If a linesearch method is implemented and used in finite precision then this stopping criteria will be �∇f (x)� < �, where � should be chosen as some small number in an appropriate manner. In addition it should be pointed out that for finite precision computations there are more stable choices for stopping criteria and one can also consider using a combination of such choices, see, e.g. [1, Chapter 11] and [22, Chapter 8]. When second-derivative information is available the perhaps most famous line-search method is Newton’s method, for which the search direction pk is given by solving ∇2 f (xk )pk = −∇f (xk ). (I.3) Newton’s method makes a quadratic model of f at xk . If ∇ f (xk ) � 0 then the unit step along pk takes us to the minimum of this quadratic model. If ∇2 f (xk ) is not positive definite then the quadratic model may not have a minimum and in this case there are strategies in which one uses a related positive-definite matrix instead of ∇2 f (xk ) in (I.3), see, e.g., [22, Chapter 4]. If Newton’s method is started sufficiently close to x� , then the convergence rate of the method is at least quadratic, see, e.g. [37, Chapter 7]. Note that if f is a strictly convex quadratic function then Newton’s methods finds the optimal solution in one iteration. 2 4 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION In this thesis we are concerned with methods for problems on the form (P) where second-derivative information is not available, but for which the gradient, ∇f (x), is available for all x ∈ Rn . One of the oldest and simplest methods of this kind is the method of steepest descent, also known as gradient descent, see, e.g., [45] for references to the origin of this method. In the method of steepest descent, the search direction pk is chosen in the negative direction of the gradient at xk , i.e. pk = −∇f (xk ). 
(I.4) Even though the search direction pk will always be a descent direction for ∇f (xk ) �= 0, (∇f (xk )T pk = −∇f (xk )T ∇f (xk ) < 0) this method may have a very poor convergence rate for some starting guesses and displaying a behavior known as ’zig-zagging’. Further, with the norm of the gradient as a stopping criteria one might run into problems as this norm tends to oscillate. For a discussion on the magnitude of the oscillations of the gradient norm in the method of steepest descent see, e.g., [40]. Next a class of first-order methods that is studied in two of the appended papers (Paper A and Paper C), namely quasi-Newton methods, is introduced. I.2.2 Quasi-Newton methods The first quasi-Newton method was suggested by Davidon in 1959 [11], using the term variable metric method. Davidon’s work caught the attention of the optimization community when Fletcher and Powell [17] presented a slightly modified method that outdid all other existing methods of its time. The paper by Fletcher and Powell marked the start of a period of intense research of quasi-Newton methods. For a thorough treatment of quasiNewton methods see, e.g., [41, Chapter 8], [16, Chapter 3] and [37, Chapter 9] and the references therein. An interesting note is that Davidon’s original paper was not published until 1991. Quasi-Newton methods comprise a class of methods that mimic Newton’s method by using quasi-Newton matrix Bk that is an approximation of the true Hessian ∇2 f (xk ) in (I.3). In effect, in a quasi-Newton method the search direction at step k is generated by solving Bk pk = −∇f (xk ), (I.5) usually with B0 = I, i.e. p0 is the steepest descent step. Different quasi-Newton methods arise depending on the choice of Bk in each iteration. The matrix Bk is usually given by adding an update matrix Uk to the previous approximation Bk−1 , i.e. Bk = Bk−1 + Uk . (I.6) One often considers updating a Cholesky factorization of Bk , then (I.5) can be solved in order of n2 operation. Further if the update matrix is of low rank, then one does not need to recompute the Cholesky factorization at each step, see, e.g., [22, Chapter 4]. In many paper, especially the earlier ones, an approximation of the inverse Hessian, Mk , is used and (I.5) then takes the form pk = −Mk gk . From a theoretical point of view I NTRODUCTION 5 it makes little difference if one works with Bk or Mk . In the work presented in this thesis we choose to work with quasi-Newton methods on the form given by (I.5). A famous class update schemes for quasi-Newton methods is the one-parameter Broyden family [3]. This family of update schemes depends on φk , a free parameter, known as the Broyden parameter. The one-parameter Broyden family includes the DFP (DavidonFletcher-Powell) update scheme, for φk = 1, which was the first quasi-Newton scheme to be suggested in [11, 17], the BFGS update scheme, for φk = 0, discovered independently by Broyden [4], Fletcher [15], Goldfarb [23] and Shanno [47] and the symmetric rank-one update scheme, SR1. All update schemes in the one-parameter Broyden family are of rank two except SR1 which is of rank one. There is a subclass of the one-parameter Broyden family called the convex Broyden family which is given by all convex combinations of the BFGS and DFP update schemes. Other parametric formulations of quasi-Newton update schemes are possible, see, e.g., [47, 49]. The update schemes in the one-parameter Broyden family can become not well defined e.g. 
for certain values (degenerate values) of the Broyden parameter φ, see, e.g., [19]. In many treatments the matrix Bk is assumed to be symmetric, for example the oneT parameter Broyden family has the property of hereditary symmetry, i.e. if Bk−1 = Bk−1 T then Bk = Bk . It is also possible to consider nonsymmetric update schemes, e.g. the Huang family of updates which has three degrees of freedom, see [34]. The symmetric part of the Huang family of updates can be shown to be equivalent to a class of updates schemes with two degrees of freedom defined by the self-scaling quasi-Newton algorithm by Oren and Luenberger [42]. In this thesis quasi-Newton methods for which Bk = BkT are the main focus. An interesting characteristic of the SR1 update scheme is that on a quadratic problem given any set of linearly independent search directions, the optimal solution will be found in at most n steps without the use of linear searches in each step, see, e.g. [10, 15]. In [52], the class of methods presented in [34] is modified to obtain this characteristic. For very large problems it might be too expensive to compute and store the full n × n matrix Bk at each step. In this case one can consider a limited-memory quasi-Newton method that uses an implicit representation of Bk , see, e.g., [41, Chapter 9]. Quasi-Newton methods can be shown to have Q-superlinear convergence, see, [13]. Note that, for this convergence, it is not required the {Bk } tends to the true Hessian at x� , only that it does so along the search directions {pk }. In 1972, Dixon [14] showed that all well-defined update schemes in the one-parameter Broyden family generate parallel search directions on (P) for any smooth function if perfect linesearch is used. Perfect linesearch is a generalization of exact linesearch for smooth nonconvex functions. For a strictly convex quadratic functions these search directions will in addition be mutually conjugate with respect to the Hessian and hence linearly independent. Hence, a quasi-Newton method using any well-defined update scheme in the one-parameter Broyden family has the property of finite termination on the corresponding quadratic optimization problem, see, e.g., [34]. Further, on a quadratic problem the search directions generated by a quasi-Newton method, using any well-defined update scheme from the one-parameter Broyden family, 6 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION will be parallel with those generated by the method of conjugate gradients(introduced in the next section) provided that both methods employ exact linesearch. I.2.3 The method of conjugate gradients Consider an unconstrained quadratic optimization problem 1 min q(x) = minn xT Hx + cT x, x∈R 2 x∈Rn (QP) where H = H T � 0. Note that solving (QP) is equivalent to solving a symmetric system of linear equations on the form Hx + c = 0, (I.7) where H = H T � 0. For (QP), the second-order sufficient optimality conditions are given by i) Hx� + c = 0, ii) H � 0. If H � 0 then there exists a unique solution to (I.7) or equivalently a unique optimal solution to (QP) given by x� = −H −1 c. For the case H � 0 there exist an infinite number of solutions to (I.7) if c ∈ R(H) and no solutions if c ∈ / R(H). In this section the method of conjugate gradients by Hestenes and Stiefel [33] is considered. This is a method for solving a system of symmetric linear equations on the form (I.7) and which can also be expressed as a linesearch method for (QP). 
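Before describing the linesearch form, the solvability cases for (I.7) noted above can be made concrete with a small sketch (a hypothetical numpy illustration, not taken from the thesis): for a singular positive semidefinite H there are infinitely many solutions when c ∈ R(H) and none when c ∉ R(H), in which case only a minimum-residual solution exists; the incompatible case is the situation treated in Paper B.

```python
import numpy as np

# H is singular and positive semidefinite: H = diag(1, 1, 0).
H = np.diag([1.0, 1.0, 0.0])

c_compat = np.array([1.0, 2.0, 0.0])    # c in R(H): infinitely many solutions
c_incompat = np.array([1.0, 2.0, 3.0])  # c not in R(H): no solution

# Minimum-norm least-squares solutions of Hx + c = 0 via the pseudoinverse.
x1 = -np.linalg.pinv(H) @ c_compat
x2 = -np.linalg.pinv(H) @ c_incompat

print(np.linalg.norm(H @ x1 + c_compat))    # ~0: the system is compatible
print(np.linalg.norm(H @ x2 + c_incompat))  # 3.0: only a minimum-residual solution exists
```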
The method of conjugate gradients is a first-order method and hence the matrix H only needs to be available through matrix-vector multiplication. In a linesearch description of the method of conjugate gradients for solving (QP) the gradient of the objective function at xk is denoted by gk, i.e. gk = Hxk + c. In accordance with (I.2) the gradient is updated in each step as
\[
g_{k+1} = g_k + \theta_k H p_k. \tag{I.8}
\]
On (QP) it is natural to use exact linesearch for θk. Consider θ to be unknown and solve for the optimal θ,
\[
\min_{\theta\in\mathbb{R}} q(x_k + \theta p_k) = \min_{\theta\in\mathbb{R}} \tfrac{1}{2}\theta^2 p_k^T H p_k + \theta\, p_k^T g_k + q(x_k).
\]
Note that q(xk) is a constant term, hence
\[
0 = \frac{d}{d\theta}\bigl[q(x_k + \theta p_k)\bigr] = \theta\, p_k^T H p_k + p_k^T g_k.
\]
It follows that the exact linesearch step is given by
\[
\theta_k = -\frac{p_k^T g_k}{p_k^T H p_k}. \tag{I.9}
\]
As Hpk is already needed for the computation of gk+1 in (I.8), calculating the exact linesearch step only entails two vector-vector multiplications.

In a linesearch description of the method of conjugate gradients the search directions are given by
\[
p_0^{CG} = -g_0, \qquad p_k^{CG} = -g_k + \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}\, p_{k-1}^{CG}, \quad k \ge 1. \tag{I.10}
\]
For the method of conjugate gradients it holds that $g_i^T g_j = 0$ for all $i \ne j$ and also that $g_k^T p_j^{CG} = 0$ for all $j = 0, \ldots, k-1$. In fact the gradients {gk} in the method of conjugate gradients form an orthogonal basis for the Krylov subspaces generated by H and c,
\[
\mathcal{K}_0(c,H) = \{0\}, \qquad \mathcal{K}_k(c,H) = \operatorname{span}\{c, Hc, H^2 c, \ldots, H^{k-1} c\}, \quad k \ge 1, \tag{I.11}
\]
i.e., it holds that $g_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp}$. Let r be the minimum index k for which $\mathcal{K}_{k+1}(c,H) = \mathcal{K}_k(c,H)$; then after generating r mutually orthogonal, and hence linearly independent, gradients the stopping criterion is reached as $g_r = 0$. Hence, on (QP) the method of conjugate gradients has the property of finite termination. Further, the search directions given by (I.10) form an H-orthogonal basis for the expanding Krylov subspaces given by (I.11), i.e. the search directions are mutually conjugate with respect to H. See, e.g., [30] for an introduction to Krylov subspaces and Krylov subspace methods for solving systems of linear equations.

There are several other ways to derive the method of conjugate gradients: for example, as a conjugate directions method where the Gram-Schmidt conjugation is based on the orthogonal gradients, as minimization of q(x) where x belongs to the expanding Krylov subspaces, as minimization of $\|x_k - x^\star\|_H$ or equivalently $\|Hx_k + c\|_{H^{-1}}$, and as a method for solving a symmetric system of linear equations based on a normalized symmetric Lanczos process. For a variety of treatments of the method of conjugate gradients see, e.g., [12, Chapter 6], [25, Chapter 10], [41, Chapter 5] and [50] and the references therein.

The method of conjugate gradients was first extended to a nonlinear version for solving general unconstrained problems (P) by Fletcher and Reeves [18]. For a survey of the Fletcher-Reeves and other nonlinear versions of the method of conjugate gradients see, e.g., [32]. In [48], the topic of the method of conjugate gradients with inexact line searches is considered.

Next the connection between the method of conjugate gradients and quasi-Newton methods on (QP), or equivalently for solving a symmetric system of linear equations with a positive definite matrix, is discussed. This is one of the central themes of the thesis. Besides being important in its own right, quadratic optimization problems are often at the core of algorithms for more challenging nonlinear optimization problems.
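As a concrete summary of Section I.2.3, the following sketch (a hypothetical numpy implementation, not part of the thesis; the function name conjugate_gradients is made up for illustration) implements the linesearch form of the method of conjugate gradients, i.e. the recursions (I.8) and (I.10) with the exact linesearch step (I.9).

```python
import numpy as np

def conjugate_gradients(H, c, tol=1e-10, max_iter=None):
    """Linesearch form of CG for min 0.5 x'Hx + c'x, i.e. for solving
    Hx + c = 0 with H symmetric positive definite."""
    n = len(c)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    g = H @ x + c                        # gradient g_k = Hx_k + c; here g_0 = c
    p = -g                               # p_0 = -g_0, cf. (I.10)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        Hp = H @ p
        theta = -(p @ g) / (p @ Hp)      # exact linesearch step, cf. (I.9)
        x = x + theta * p                # iterate update, cf. (I.2)
        g_new = g + theta * Hp           # gradient recursion, cf. (I.8)
        beta = (g_new @ g_new) / (g @ g) # quotient in (I.10)
        p = -g_new + beta * p
        g = g_new
    return x

# Example on a small symmetric positive definite system.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A.T @ A + 5 * np.eye(5)
c = rng.standard_normal(5)
x = conjugate_gradients(H, c)
print(np.linalg.norm(H @ x + c))         # ~0 at termination
```

In exact arithmetic the loop terminates in at most n iterations, in accordance with the finite-termination property discussed above.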
Further, many problems tend to behave quadratically in the region of an optimal solution. In addition, some methods, for example the method of conjugate gradients and BFGS methods, are considered effective when one wants to obtain an approximate solution to a quadratic problem with desirable characteristics in a timely manner, see, e.g., [6] for an article on intensity-modulated radiation therapy treatment planning where this property is used.

I.2.4 Connection between the method of conjugate gradients and quasi-Newton methods on quadratic problems

Even though the method of conjugate gradients and quasi-Newton methods may seem to be very different kinds of methods, it is well known that, when solving (QP) or equivalently when solving (I.7) with $H = H^T \succ 0$, the method of conjugate gradients and a quasi-Newton method using any well-defined update from the one-parameter Broyden family generate parallel search directions provided that exact linesearch is used in both methods, see, e.g., [16, Chapter 3]. This implies that the corresponding x-iterates will be identical in exact arithmetic. In [38], this result is shown for the DFP update scheme, and in [39], Nazareth shows that for the BFGS update scheme the search directions are, in addition to being parallel, of the same length. The connection between the method of conjugate gradients and the BFGS update scheme for a quasi-Newton method is further explored by Buckley [5], where one of the results is that the BFGS update scheme can be carried out using any of the previous approximation matrices and equivalence with the method of conjugate gradients still holds on (QP).

The search direction, pk, in a quasi-Newton method, given by solving (I.5), is determined by the choice of Bk in each iteration k. One can consider what the precise conditions are on Bk and/or the update matrix Uk such that pk and $p_k^{CG}$ are parallel, i.e. $p_k = \delta_k p_k^{CG}$ for some non-zero scalar δk. In two of the appended papers in this thesis (Paper A and Paper C), this topic is investigated.

Next, symmetric systems of linear equations that arise in problem formulations and methods for constrained optimization problems are considered.

I.3 Symmetric systems of linear equations arising in constrained optimization

Consider a constrained optimization problem on the form
\[
\begin{array}{rl}
\displaystyle\min_{x\in\mathbb{R}^n} & f(x) \\
\text{subject to} & h(x) = 0, \\
& g(x) \ge 0,
\end{array} \tag{I.12}
\]
where f, h and g are smooth functions. Depending on further characteristics of the functions f, h and g there are different methods that are suitable when one wants to solve (I.12). Some methods in which symmetric systems of linear equations arise are mentioned next.

Consider a problem on the form of (I.12) with a quadratic objective function and linear equality constraints,
\[
\begin{array}{rl}
\displaystyle\min_{x\in\mathbb{R}^n} & \tfrac{1}{2}x^T H x + c^T x, \\
\text{subject to} & Ax = b,
\end{array} \tag{I.13}
\]
Solving a symmetric system of linear equations on the form (I.14) is important, not only for finding an optimal solution to (I.13) itself, but also in methods for solving more complex problems. An example of this is methods that employ sequential quadratic programming (SQP). An SQP-method is a method for solving (I.12) where in each step a search direction is found by solving a quadratic subproblem. For simplicity consider (I.12) with only equality constraints, then the subproblem will be on the form (I.13) where the objective function is a quadratic model of the Lagrangian function L(x, λ) = f (x) − λT h(x) at the current iterates (xk , λk ) and the constraints are given by a linearization of h around the current iterate xk . The first-order necessary optimality conditions of such a quadratic subproblem is given by (I.14), a symmetric system of linear equations with a matrix that is symmetric and indefinite. Such an SQP-method is equivalent to applying Newton iterations to the first-order necessary optimality conditions of (I.12) itself. Note that there are SQP-methods for (I.12) with both inequality and equality constraints. The iterates of an SQP-method will in general not be feasible to the original problem. See, e.g., [26, Chapter 15] for an introduction to SQP. Consider a quadratic problem with linear inequality constraints, minx∈Rn subject to 1 T 2 x Hx Ax ≥ b. + cT x, (I.15) When it comes to handling inequality constraints we present two approches. The first one is to consider an active-set method, in which one keeps track of the set of active constraints and treat them as equality constraints while ignoring the inactive constraints. In each iteration one then gets a problem on the form (I.13) and obtains a search direction to a new iterate at which the set of active constraints is updated. This implies that in each iteration one solves a symmetric system of linear equations with an indefinite symmetric matrix. The iterates of an active-set method will always be feasible to the original problem. See, e.g., [41, Chapter 15] for an introduction to active-set methods. The second approach entails moving in the interior of the feasible region, and in a sense letting all constraints be inactive but eventually reaching an optimal solution at which some constraints are active. Such a method is called an interior-point method. The main idea of interior-point methods is to perturb the first-order necessary optimality conditions of the 10 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION given problem. The perturbed optimality conditions for (I.15) for some µ > 0 are given by Ax − s = b, Hx − AT λ = −c, SΛe = µe, where S and Λ are diagonal matrices with the elements of s and λ on their respective diagonals and e is a vector of ones. The inequalities s ≥ 0 and λ ≥ 0 are kept implicitly in the method. These perturbed optimality conditions are called the primal-dual nonlinear equations. An primal-dual interior-point method is based on applying Newton iterations to the primal-dual nonlinear equations. Such an iteration can be written on the form of a symmetric system of linear equations with a matrix that is symmetric and indefinite given by � �� � � � ∆x H AT Hx + c − AT λ =− , −1 −1 −∆λ A −SΛ Ax − b − µΛ e where ∆s has been eliminated. Interior-point methods are illustrated here for solving (I.15), however these methods can be applied directly to (I.12) where the inequality constraints can be handled by a barrier transformation. 
There are also interior-point methods for solving linear programming problems on standard form. Symmetric systems of linear equations with an indefinite symmetric matrix arise in all of the above mentioned cases. See, e.g., [20, 21, 41, 53] for introductions to interior-point methods. I.3.1 Solving symmetric systems of linear equations Symmetric systems of linear equations on the form Hx + c = 0, (I.16) with H = H T arise in many methods for constrained optimization as indicated in the previous section. In addition, finding a solution to (I.16), if such a solution exists, or in the incompatible case a minimum-residual solution, argminx∈IRn �Hx + c�22 , are both important problems in the field of numerical linear algebra. As the only assumption on (I.16) is that H = H T , it may be the case that H is indefinite/singular and the system of linear equations (I.16) may be incompatible. Note that H and c in (I.16) represents the whole matrix on the left-hand side and the whole right-hand side of a symmetric system of linear equations, not to be confused with the H and c that appears as a components in e.g. the matrix and right-hand side of (I.14). In this thesis the mathematics behind so-called Krylov subspace methods for solving (I.16) is considered. These methods are based on generating an orthogonal basis for the expanding Krylov subspaces generated by H and c given in (I.11). In particular, this orthogonal basis will be generated by a symmetric version of a process given by Lanczos in [35, 36] and this process is commonly referred to as a symmetric Lanczos process. There has been a significant amount of research devoted to the theory on using Lanczos I NTRODUCTION 11 processes to solve (I.16) both for H symmetric and H non symmetric, see, e.g., [24,29,43]. The discussion here on solving (I.16) is aimed towards the treatment of this problem in the appended papers, it should be pointed out that solving (I.16) can also be done by applying different matrix-factorization techniques to H, see, e.g., [25, Chapter 5]. The symmetric Lanczos process is given as follows. Choose q0 = c/�c�2 such that �q0 �2 = 1, and generate {qk }rk=1 as −β0 q1 = −Hq0 + α0 q0 −βk qk+1 = −Hqk + αk qk + βk−1 qk−1 , k = 1, . . . , r − 1, (I.17) where the coefficients α and β are chosen such that one gets orthonormal vectors {qk }r−1 k=1 . These choices are given by, α0 = and β0 = q0T Hq0 , q0T q0 q0T Hq1 , q0T q0 αk = βk = qkT Hqk , qkT qk (I.18) k = 1, . . . r − 1, T qk−1 Hqk , T q qk−1 k−1 (I.19) k = 1, . . . r − 1. ⊥ Then {qk }r−1 k=0 are such that qk ∈ Kk+1 (c, H) ∩ Kk (c, H) and �qk �2 = 1 for all k = 0, . . . , r−1 and qr = 0. The three-term recurrence in (I.17) is due to the fact that H = H T , see, e.g. [46, Chapter 6]. Note that if one only requires that the generated vectors should be orthogonal then there is one degree of freedom in each iteration and (I.17) may be expressed as q1 = θ0 (−Hq0 + α0 q0 ) qk+1 = θk (−Hqk + αk qk + βk−1 qk−1 ), (I.20) k = 1, . . . , r − 1, where the scalar θk represents the available scaling factor in each iteration k and α and β are as in (I.18) and (I.19) respectively. These vectors {qk }rk=0 given by (I.20) are referred to as Lanczos vectors. Let Qk be the matrix with the Lanczos vectors q0 , q1 , . . . , qk as columns vectors, then (I.20) may be written on matrix form as, HQk = Qk+1 T̄k = Qk Tk − 1 qk+1 eTk+1 , θk where α0 − θ1 0 Tk = β0 .. . .. . .. .. . . 1 − θk−1 βk−1 αk , T̄k = � Tk − θ1k eTk+1 � . 
For the choice of $\theta_k = -\frac{1}{\beta_k}$ one obtains the recursion (I.17), and it holds that $\|q_k\|_2 = \|q_0\|_2$ for all k. Further, for this choice of θk, Tk will be a symmetric matrix. Changing the set of $\{\theta_k\}_{k=0}^{r-1}$ can be seen as a similarity transform of Tk, see, e.g., [29]. Many methods for solving (I.16) use matrix-factorization techniques on Tk or $\bar{T}_k$ given in (I.21). For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., [8, 43]. In the work presented in this thesis we choose to work with the recursion in (I.20) directly.

As the set of vectors $\{q_k\}_{k=0}^{r-1}$ form an orthogonal basis for the Krylov subspaces in (I.11), each vector may be expressed as
\[
q_k = \sum_{j=0}^{k} \gamma_k^{(j)} H^j c
    = H \Bigl( \sum_{j=1}^{k} \gamma_k^{(j)} H^{j-1} c \Bigr) + \gamma_k^{(0)} c,
    \qquad k = 0, \ldots, r,
\]
for some scalars $\{\gamma_k^{(j)}\}_{j=0}^{k}$. The following choice of θk in (I.20),
\[
\theta_0 = \frac{1}{\alpha_0}, \qquad \theta_k = \frac{1}{\alpha_k + \beta_{k-1}}, \quad k = 1, \ldots, r-1, \tag{I.22}
\]
will give a vector $q_{k+1}$ on the form of a gradient $g_{k+1}$, i.e. with $\gamma_{k+1}^{(0)} = 1$. Requiring $\gamma_k^{(0)} = 1$ is usually referred to as the normalization/consistency condition. We will denote by a normalized symmetric Lanczos process the symmetric Lanczos process where the particular choice of scaling (I.22) is used and the generated vectors are made to satisfy the normalization condition. Note that (I.22) may not always be a well-defined expression, so a normalized symmetric Lanczos process may suffer a so-called pivot breakdown. The idea of utilizing an unnormalized symmetric Lanczos process in order to avoid such breakdowns is due to Gutknecht [27, 28]. (Gutknecht considers the nonsymmetric case and uses the unnormalized framework to avoid pivot breakdowns. When solving (I.16) for a nonsymmetric H there are several possible breakdowns for methods based on a nonsymmetric Lanczos process, see, e.g., [29, 46]. For the symmetric case the pivot breakdown is the only one that can happen.) In an unnormalized setting there is no restriction on the sign or value of $\gamma_k^{(0)}$. One can show that this quantity cannot be zero in two consecutive iterations. This property is used in, for example, composite step biconjugate gradient methods and other methods employing look-ahead techniques to avoid breakdowns, see, e.g., [2, 7, 31].

I.3.2 Methods based on the symmetric Lanczos process

A symmetric Lanczos process (normalized or not) is by itself not sufficient to obtain a solution to (I.16). Hence, a method for solving (I.16) needs to manage additional quantities such that a solution can be extracted when the stopping criterion is reached. We shall mention a few such Krylov subspace methods; for a thorough treatment of such methods, both for the symmetric and nonsymmetric case, see, e.g., [46, Chapters 5, 6, 7] and [29].

The method of conjugate gradients may be viewed as a method for solving (I.16) based on the normalized symmetric Lanczos process. In the normalized case search directions may be defined and the three-term recurrence in (I.20) may be reduced to a two-term recurrence using these search directions. Note that when viewing the method of conjugate gradients as a method based on a normalized symmetric Lanczos process it becomes clear that the search directions are not necessary for the formulation but rather may be viewed as a product of the normalization itself. The breakdown of the method of conjugate gradients when solving (I.16) corresponds exactly to the case when normalization is not possible.
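For reference, the following sketch (a hypothetical numpy implementation, not part of the thesis; the function name symmetric_lanczos is made up for illustration) shows the textbook symmetric Lanczos recurrence corresponding to (I.17)–(I.19) with unit-norm vectors; the unnormalized process used in Paper B differs in the choice of the scaling factors θk in (I.20).

```python
import numpy as np

def symmetric_lanczos(H, c, m):
    """Normalized symmetric Lanczos sketch: builds an orthonormal basis
    q_0, ..., q_{m-1} of the Krylov subspace K_m(c, H) for H = H^T."""
    n = len(c)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q_prev = np.zeros(n)
    q = c / np.linalg.norm(c)            # q_0 = c / ||c||_2
    beta_prev = 0.0
    for k in range(m):
        Q[:, k] = q
        w = H @ q
        alpha[k] = q @ w                              # alpha_k = q_k' H q_k
        w = w - alpha[k] * q - beta_prev * q_prev     # three-term recurrence (I.17)
        beta[k] = np.linalg.norm(w)                   # beta_k > 0 normalizes q_{k+1}
        if beta[k] == 0.0:        # invariant subspace reached (use a tolerance in practice)
            return Q[:, :k + 1], alpha[:k + 1], beta[:k]
        q_prev, q = q, w / beta[k]
        beta_prev = beta[k]
    return Q, alpha, beta[:-1]
```

A quick check of the construction is that Q from a run on a small symmetric matrix satisfies Q.T @ Q ≈ I, in line with the orthonormality of the Lanczos vectors stated above.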
For the case H = H T � 0 no breakdown of the method of conjugate gradients will occur, and further it holds that if H = H T � 0 and c ∈ R(H) normalization will always be possible. As mentioned it is also possible to consider a method based on the unnormalized symmetric Lanczos process in order to avoid pivot-breakdowns. Such an unnormalized Krylov subspace method is described in Paper B and in terms of the survey [29] this method is called an inconsistent ORes version of the method of conjugate gradients. Further in the case when (I.16) is incompatible one can consider a minimum-residual method that delivers a minimum-residual solution to (I.16). Minimum residual methods was first suggested in Lanczos’ early paper [36] and by Stiefel in [51]. The name, minimum-residual method, is adopted from the implementation of the method, MINRES, by Paige and Saunders, see [44]. In this implementation, which is based on the symmetric Lanczos process, the optimal solution of minx∈IRn �Hx + c�22 obtained is in general not of minimum Euclidean norm. In [8], Choi, Paige and Saunders present a MINRESlike algorithm, MINRES-QLP, that does deliver a minimum-residual solution of minimum Euclidean norm. In the second paper of this thesis (Paper B) the mathematical properties of methods based on an unnormalized symmetric Lanczos process are considered and both a minimumresidual solution and a minimum-residual solution of minimum Euclidean norm in the case when (I.16) is incompatible are characterized. I.4 Summary of papers In this section a short summary of each of the appended papers is given, highlighting the main contributions. The common theme of the three papers is the in-depth study of the mathematical properties of methods for solving symmetric systems of linear equations arising in various optimization problem formulations and derivation of new theoretical results. Paper A A. Forsgren and T. Odland. On the connection between the conjugate gradient method and quasi-Newton methods on quadratic problems. Computational Optimization and Applications, 60 (2015), pp 377-392. We derive conditions on the update matrix Uk , and hence on the quasi-Newton matrix Bk such that the search directions generated by the corresponding quasi-Newton method and the method of conjugate gradients are parallel when applied to an unconstrained strictly 14 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION convex quadratic optimization problem or equivalently a symmetric system of linear equations with a symmetric positive definite matrix. The conditions state that the range of the update matrix must lie in the last two dimensions of the Krylov subspace generated by H and c and that in addition the so-called quasi-Newton condition must be satisfied. These conditions on Uk , that are based on a sufficient condition to obtain mutually conjugate search directions, are shown to be equivalent to the quasi-Newton scheme belonging to the one-parameter Broyden family. An explicit formula for the one-to-one correspondence between the Broyden parameter and the scaling of the search direction from the corresponding quasi-Newton method compared to the search direction in the method of conjugate gradients is stated. Paper B A. Forsgren and T. Odland. On solving symmetric systems of linear equations in an unnormalized Krylov subspace framework. arXiv:1409.4937 [math.OC], 2014. We study the mathematical properties of an unnormalized Krylov subspace framework for solving a symmetric system of linear equations Hx + c = 0. 
This framework is based on an unnormalized symmetric Lanczos process for generating an orthogonal basis for the expanding Krylov subspaces generated by H and c. Using only the three-term recurrence relations for a triple of quantities associated with each orthogonal vector from the unnormalized symmetric Lanczos process we give conditions on whether the symmetric system of linear equations is compatible or not. In the compatible case a solution is obtained and in the incompatible case we obtain a certificate of incompatibility based on a projection on the null space of H and characterize a minimum-residual solution to the symmetric system of linear equations. Further we derive a minimum-residual method in the unnormalized Krylov subspace framework and give explicit recursions of the minimum residual iterates. In the case of an incompatible symmetric system of linear equations we characterize a minimum-residual solution of minimum Euclidean norm. Paper C A. Forsgren and T. Odland. On the equivalence of the method of conjugate gradients and quasi-Newton methods on quadratic problems. arXiv:1503.01892 [math.OC], 2015. Submitted to Computational Optimization and Applications. We derive necessary and sufficient conditions on the quasi-Newton matrix Bk and the update matrix Uk such that the search direction in the corresponding quasi-Newton method is parallel to the search direction in the method of conjugate gradients on an unconstrained strictly convex quadratic problem or equivalently a symmetric system of linear equations with a symmetric positive definite matrix. We show that the set of quasi-Newton schemes admitted by these necessary and sufficient conditions are strictly larger than the one-parameter Broyden family. Further we show that this set of quasi-Newton schemes contains an infinite number of symmetric rank-one I NTRODUCTION 15 update schemes. The necessary and sufficient conditions derived in this paper admit nonsymmetric quasi-Newton schemes as well, however it is shown that symmetry of Bk can be achieved without effort. I.4.1 Contribution by co-author Throughout the work Anders Forsgren has acted as an advisor in the classical sense and is therefore co-author of all three papers. I.5 References [1] N. Andréasson, A. Evgrafov, and M. Patriksson. An Introduction to Continuous Optimization. Professional Publishing Svc., 2007. [2] R. E. Bank and T. F. Chan. An analysis of the composite step biconjugate gradient method. Numer. Math., 66(3):295–319, 1993. [3] C. G. Broyden. Quasi-Newton methods and their application to function minimisation. Math. Comp., 21:368–381, 1967. [4] C. G. Broyden. The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J. Inst. Math. Appl., 6:222–231, 1970. [5] A. Buckley. Termination and equivalence results for conjugate gradient algorithms. Math. Programming, 29(1):64–76, 1984. [6] F. Carlsson and A. Forsgren. Iterative regularization in intensity-modulated radiation therapy optimization. Medical physics, 33(1):225–234, 2006. [7] T. F. Chan and T. Szeto. Composite step product methods for solving nonsymmetric linear systems. SIAM J. Sci. Comput., 17(6):1491–1508, 1996. [8] S.-C. T. Choi, C. C. Paige, and M. A. Saunders. MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems. SIAM J. Sci. Comput., 33(4):1810–1836, 2011. [9] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to derivative-free optimization, volume 8 of MPS/SIAM Series on Optimization. 
Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA, 2009. [10] J. Cullum and R. K. Brayton. Some remarks on the symmetric rank-one update. J. Optim. Theory Appl., 29(4):493–519, 1979. [11] W. C. Davidon. Variable metric method for minimization. SIAM J. Optim., 1(1):1–17, 1991. 16 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION [12] J. W. Demmel. Applied numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997. [13] J. E. Dennis, Jr. and J. J. Moré. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28:549–560, 1974. [14] L. C. W. Dixon. Quasi-Newton algorithms generate identical points. Math. Programming, 2:383–387, 1972. [15] R. Fletcher. A new approach to variable metric algorithms. The Computer Journal, 13(3):317–322, Jan. 1970. [16] R. Fletcher. Practical methods of optimization. A Wiley-Interscience Publication. John Wiley & Sons Ltd., Chichester, second edition, 1987. [17] R. Fletcher and M. J. D. Powell. A rapidly convergent descent method for minimization. Comput. J., 6:163–168, 1963/1964. [18] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Comput. J., 7:149–154, 1964. [19] R. Fletcher and J. W. Sinclair. Degenerate values for Broyden methods. J. Optim. Theory Appl., 33(3):311–324, 1981. [20] A. Forsgren, P. E. Gill, and J. R. Shinnerl. Stability of symmetric ill-conditioned systems arising in interior methods for constrained optimization. SIAM J. Matrix Anal. Appl., 17:187–211, 1996. [21] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for nonlinear optimization. SIAM Rev., 44(4):525–597 (2003), 2002. [22] P. E. Gill, W. Murray, and M. H. Wright. Practical optimization. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], London, 1981. [23] D. Goldfarb. A family of variable-metric methods derived by variational means. Math. Comp., 24:23–26, 1970. [24] G. H. Golub and D. P. O’Leary. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Rev., 31(1):50–102, 1989. [25] G. H. Golub and C. F. Van Loan. Matrix computations, volume 3 of Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, 1983. [26] I. Griva, S. G. Nash, and A. Sofer. Linear and nonlinear optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 2009. [27] M. H. Gutknecht. The unsymmetric Lanczos algorithms and their relations to Pade approximation, continued fractions, and the QD algorithm. Preliminary Proceedings of the Copper Mountain Conference on Iterative methods, April, 1990. I NTRODUCTION 17 [28] M. H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms, part I. SIAM J. Matrix Anal. Appl., 13(2):594–639, 1992. [29] M. H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. In Acta numerica, 1997, volume 6 of Acta Numer., pages 271–397. Cambridge Univ. Press, Cambridge, 1997. [30] M. H. Gutknecht. A brief introduction to Krylov space methods for solving linear systems. In Frontiers of Computational Science, pages 53–62. Springer-Verlag Berlin, 2007. International Symposium on Frontiers of Computational Science, Nagoya Univ, Nagoya, Japan, Dec 12-13, 2005. [31] M. H. Gutknecht and K. J. Ressel. Look-ahead procedures for Lanczos-type product methods based on three-term Lanczos recurrences. SIAM J. 
Matrix Anal. Appl., 21(4):1051–1078, 2000. [32] W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pac. J. Optim., 2(1):35–58, 2006. [33] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49:409–436 (1953), 1952. [34] H. Y. Huang. Unified approach to quadratically convergent algorithms for function minimization. J. Optimization Theory Appl., 5:405–423, 1970. [35] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Research Nat. Bur. Standards, 45:255–282, 1950. [36] C. Lanczos. Solution of systems of linear equations by minimized-iterations. J. Research Nat. Bur. Standards, 49:33–53, 1952. [37] D. G. Luenberger. Linear and nonlinear programming. Addison-Wesley Pub Co, Boston, MA, second edition, 1984. [38] G. E. Myers. Properties of the conjugate-gradient and Davidon methods. J. Optimization Theory Appl., 2:209–219, 1968. [39] L. Nazareth. A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal., 16(5):794–800, 1979. [40] J. Nocedal, A. Sartenaer, and C. Zhu. On the behavior of the gradient norm in the steepest descent method. Computational Optimization and Applications, 22(1):5–35, 2002. [41] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999. 18 O N M ETHODS FOR S YMMETRIC L INEAR E QUATIONS ARISING IN O PTIMIZATION [42] S. S. Oren and D. G. Luenberger. Self-scaling variable metric (SSVM) algorithms. I. Criteria and sufficient conditions for scaling a class of algorithms. Management Sci., 20:845–862, 1973/74. Mathematical programming. [43] C. C. Paige. Krylov subspace processes, Krylov subspace methods, and iteration polynomials. In Proceedings of the Cornelius Lanczos International Centenary Conference (Raleigh, NC, 1993), pages 83–92, Philadelphia, PA, 1994. SIAM. [44] C. C. Paige and M. A. Saunders. Solutions of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(4):617–629, 1975. [45] S. S. Petrova and A. D. Solov’ev. The origin of the method of steepest descent. Historia Mathematica, 24(4):361 – 375, 1997. [46] Y. Saad. Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, second edition, 2003. [47] D. F. Shanno. Conditioning of quasi-Newton methods for function minimization. Math. Comp., 24:647–656, 1970. [48] D. F. Shanno. Conjugate gradient methods with inexact searches. Math. Oper. Res., 3(3):244–256, 1978. [49] D. F. Shanno and P. C. Kettler. Optimal conditioning of quasi-Newton methods. Math. Comp., 24:657–664, 1970. [50] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical report, Pittsburgh, PA, USA, 1994. [51] E. Stiefel. Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme. Comment. Math. Helv., 29:157–179, 1955. [52] A. Tits. Some investigations about a unified approach to quadratically convergent algorithms for function minimization. J. Optimization Theory Appl., 20(4):489–496, 1976. [53] S. J. Wright. Primal-dual interior-point methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.