Advanced Calculus

Michael E. Taylor
Contents

0. One variable calculus
1. The derivative
2. Inverse function and implicit function theorem
3. Fundamental local existence theorem for ODE
4. Constant coefficient linear systems; exponentiation of matrices
5. Variable coefficient linear systems of ODE; Duhamel's principle
6. Dependence of solutions on initial data, and other parameters
7. Flows and vector fields
8. Lie brackets
9. Differential equations from classical mechanics
10. Central force problems
11. The Riemann integral in n variables
12. Integration on surfaces
13. Differential forms
14. Products and exterior derivatives of forms
15. The general Stokes formula
16. The classical Gauss, Green, and Stokes formulas
17. Holomorphic functions and harmonic functions
18. Variational problems; the stationary action principle
19. The symplectic form; canonical transformations
20. Topological applications of differential forms
21. Critical points and index of a vector field
A. Metric spaces, compactness, and all that
B. Sard's Theorem

Introduction

The goal of this text is to present a compact treatment of the basic results of Advanced Calculus, to students who have had 3 semesters of calculus, including an introductory treatment of multivariable calculus, and who have had a course in linear algebra.

We begin with an introductory section, §0, presenting the elements of one-dimensional calculus. We first define the integral of a class of functions on an interval. We then introduce the derivative, and establish the Fundamental Theorem of Calculus, relating differentiation and integration as essentially inverse operations. Further results are dealt with in the exercises, such as the change of variable formula for integrals, and the Taylor formula with remainder.

In §1 we define the derivative of a function F : O → R^m, when O is an open subset of R^n, as a linear map from R^n to R^m. We establish some basic properties, such as the chain rule.
We use the one-dimensional integral as a tool to show that, if the matrix of first order partial derivatives of F is continuous on O, then F is differentiable on O. We also discuss two convenient multi-index notations for higher derivatives, and derive the Taylor formula with remainder for a smooth function F on O ⊂ R^n.

In §2 we establish the inverse function theorem, stating that a smooth map F : O → R^n with an invertible derivative DF(p) has a smooth inverse defined near q = F(p). We derive the implicit function theorem as a consequence of this.

In §3 we establish the fundamental local existence theorem for systems of ordinary differential equations. Historically and logically, differential equations is a major subject in calculus. Our treatment emphasizes the role of ODE, to a greater extent than most contemporary treatments of 'Advanced Calculus,' in this respect following such classics as [Wil].

Sections 4-8 develop further the theory of ODE. We discuss explicit solutions to constant coefficient linear systems in §4, via the matrix exponential. In §4 we also derive a number of basic results on matrix theory and linear algebra. We proceed to some general results on variable coefficient linear systems, in §5, and on dependence of solutions to nonlinear systems on parameters, in §6. This leads to a geometrical picture of ODE theory, in §§7-8, via a study of flows generated by vector fields, and the role of the Lie bracket of vector fields.

In §9 we discuss a major source of ODE problems, namely problems arising from classical mechanics, expressed as differential equations via Newton's laws. We show how some can be converted to first order systems of a sort called Hamiltonian systems. We will return to general results on such systems in §§18 and 19. For the present, in §10 we treat a special class of equations, arising from central force problems, including particularly the Kepler problem, for planetary motion.
In §11 we take up the multidimensional Riemann integral. The basic definition is quite parallel to the one-dimensional case, but a number of central results, such as the change of variable formula, while parallel in statement to the one-dimensional case, require more elaborate demonstrations in higher dimensions. We also study the reduction of multiple integrals to iterated integrals, a convenient computational device.

In §12 we pass from integrals of functions defined on domains in R^n to integrals of functions defined on n-dimensional surfaces. This includes the study of surface area. Surface area is defined via a 'metric tensor,' an object which is a little more complicated than a vector field.

In §13 we introduce a further class of objects which are defined on surfaces, differential forms. A k-form can be integrated over a k-dimensional surface, endowed with an extra piece of structure, an 'orientation.' The change of variable formula established in §11 plays a central role in establishing this. Properties of differential forms are developed further in the next two sections. In §14 we define exterior products of forms, and interior products of forms with vector fields. Then we define the exterior derivative of a form, and the Lie derivative of a form, and relate these concepts via an important identity. We establish the Poincaré Lemma, characterizing when a k-form is the exterior derivative of some (k − 1)-form, on a ball in R^n.

Section 15 is devoted to the general Stokes formula, an important integral identity which contains as special cases classical identities of Green, Gauss, and Stokes. These special cases are discussed in some detail in §16.

In §17 we use Green's Theorem to derive fundamental properties of holomorphic functions of a complex variable. Sprinkled throughout earlier sections are some allusions to functions of complex variables, particularly in some of the exercises in §§1, 2, and 6.
Readers who have had no prior exposure to complex variables should be able to return to these exercises after getting through §17. In this section, we also discuss some results on the closely related study of harmonic functions. One result is Liouville's Theorem, stating that a bounded harmonic function on all of R^n must be constant. When specialized to holomorphic functions on C = R^2, this yields a proof of the Fundamental Theorem of Algebra.

In §18 we delve more deeply into matters introduced in §9, showing how many problems in classical physics can be systematically treated as variational problems, by writing down a 'Lagrangian,' and showing how a broad class of variational problems can be converted to Hamiltonian systems. Other variational problems, such as finding geodesics, i.e., length-minimizing curves on surfaces, are also treated. In §19 we introduce a differential form, called the symplectic form, and show how its use clarifies the study of Hamiltonian systems.

In §20 we derive some topological applications of differential forms. We define the Brouwer degree of a map between two compact oriented surfaces, and establish some of its basic properties, as a consequence of a characterization of which n-forms on a compact oriented surface of dimension n are exact. We establish a number of topological results, including Brouwer's fixed point theorem, and the Jordan curve theorem and the Jordan-Brouwer separation theorem (in the smooth case). In the exercises we sketch a degree-theory proof of the Fundamental Theorem of Algebra, which one can contrast with the proof given in §17, using complex analysis. In §21 we use degree theory to study the index of a vector field, an important quantity defined in terms of the nature of the critical points of a vector field.

There are two appendices at the end of these notes.
Appendix A covers some basic notions of metric spaces and compactness, used from time to time throughout the notes, particularly in the study of the Riemann integral and in the proof of the fundamental existence theorem for ODE. Appendix B discusses the easy case of Sard's Theorem, used at one point in the development of degree theory in §20.

There are also scattered exercises intended to give the reader a fresh perspective on topics familiar from previous study. We mention in particular exercises on determinants in §1, on trigonometric functions in §3, and on the cross product in §5.

0. One variable calculus

In this brief discussion of one variable calculus, we first consider the integral, and then relate it to the derivative.

We will define the Riemann integral of a bounded function over an interval I = [a, b] on the real line. To do this, we partition I into smaller intervals. A partition P of I is a finite collection of subintervals {J_k : 0 ≤ k ≤ N}, disjoint except for their endpoints, whose union is I. We can order the J_k so that J_k = [x_k, x_{k+1}], where

(0.1) x_0 < x_1 < ⋯ < x_N < x_{N+1}, x_0 = a, x_{N+1} = b.

We call the points x_k the endpoints of P. We set

(0.2) ℓ(J_k) = x_{k+1} − x_k, maxsize(P) = max_{0≤k≤N} ℓ(J_k).

We then set

(0.3) \overline{I}_P(f) = Σ_k sup_{J_k} f(x) ℓ(J_k), \underline{I}_P(f) = Σ_k inf_{J_k} f(x) ℓ(J_k).

Note that \underline{I}_P(f) ≤ \overline{I}_P(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently 'fine.'

To be more precise, if P and Q are two partitions of I, we say P refines Q, and write P ≻ Q, if P is formed by partitioning each interval in Q. Equivalently, P ≻ Q if and only if all the endpoints of Q are also endpoints of P. It is easy to see that any two partitions have a common refinement; just take the union of their endpoints, to form a new partition. Note also that

(0.4) P ≻ Q ⟹ \overline{I}_P(f) ≤ \overline{I}_Q(f), and \underline{I}_P(f) ≥ \underline{I}_Q(f).
Consequently, if P_j, j = 1, 2, are any two partitions and Q is a common refinement, we have

(0.5) \underline{I}_{P_1}(f) ≤ \underline{I}_Q(f) ≤ \overline{I}_Q(f) ≤ \overline{I}_{P_2}(f).

Now, whenever f : I → R is bounded, the following quantities are well defined:

(0.6) \overline{I}(f) = inf_{P ∈ Π(I)} \overline{I}_P(f), \underline{I}(f) = sup_{P ∈ Π(I)} \underline{I}_P(f),

where Π(I) is the set of all partitions of I. Clearly, by (0.5), \underline{I}(f) ≤ \overline{I}(f). We then say that f is Riemann integrable provided \underline{I}(f) = \overline{I}(f), and in such a case, we set

(0.7) ∫_a^b f(x) dx = ∫_I f(x) dx = \underline{I}(f) = \overline{I}(f).

We will denote the set of Riemann integrable functions on I by R(I).

We derive some basic properties of the Riemann integral.

Proposition 0.1. If f, g ∈ R(I), then f + g ∈ R(I), and

(0.8) ∫_I (f + g) dx = ∫_I f dx + ∫_I g dx.

Proof. If J_k is any subinterval of I, then

sup_{J_k} (f + g) ≤ sup_{J_k} f + sup_{J_k} g,

so, for any partition P, we have \overline{I}_P(f + g) ≤ \overline{I}_P(f) + \overline{I}_P(g). Also, using common refinements, we can simultaneously approximate \overline{I}(f) and \overline{I}(g) by \overline{I}_P(f) and \overline{I}_P(g). Thus the characterization (0.6) implies \overline{I}(f + g) ≤ \overline{I}(f) + \overline{I}(g). A parallel argument implies \underline{I}(f + g) ≥ \underline{I}(f) + \underline{I}(g), and the proposition follows.

Next, there is a fair supply of Riemann integrable functions.

Proposition 0.2. If f is continuous on I, then f is Riemann integrable.

Proof. Any continuous function on a compact interval is uniformly continuous; let ω(δ) be a modulus of continuity for f, so

(0.9) |x − y| ≤ δ ⟹ |f(x) − f(y)| ≤ ω(δ), ω(δ) → 0 as δ → 0.

Then

(0.10) maxsize(P) ≤ δ ⟹ \overline{I}_P(f) − \underline{I}_P(f) ≤ ω(δ) · ℓ(I),

which yields the proposition.

We denote the set of continuous functions on I by C(I). Thus Proposition 0.2 says C(I) ⊂ R(I).

The last argument, showing that every continuous function is Riemann integrable, also provides a criterion on a partition P guaranteeing that \overline{I}_P(f) and \underline{I}_P(f) are close to ∫_I f dx, when f is continuous. The following is a useful extension. Let f ∈ R(I), take ε > 0, and let P_0 be a partition such that

(0.11) \overline{I}_{P_0}(f) − ε ≤ ∫_I f dx ≤ \underline{I}_{P_0}(f) + ε.
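The upper and lower sums (0.3), and the squeeze (0.5)-(0.7), are easy to experiment with numerically. The following sketch (hypothetical helper `darboux_sums`, not from the text) approximates the sup and inf on each subinterval of a uniform partition by sampling on a fine grid, which is exact for the monotone integrand used here; as the partition is refined, the two sums pinch the integral.

```python
# Approximate the upper and lower Darboux sums (0.3) of f over [a, b]
# for the uniform partition into N subintervals. The sup and inf over
# each subinterval are approximated by sampling on a fine grid (exact
# for monotone f, as in the example below).

def darboux_sums(f, a, b, N, samples=200):
    upper = lower = 0.0
    h = (b - a) / N
    for k in range(N):
        xs = [a + k * h + j * h / samples for j in range(samples + 1)]
        vals = [f(x) for x in xs]
        upper += max(vals) * h
        lower += min(vals) * h
    return upper, lower

if __name__ == "__main__":
    f = lambda x: x * x        # integral over [0, 1] is 1/3
    for N in (10, 100, 1000):
        U, L = darboux_sums(f, 0.0, 1.0, N)
        print(N, L, U)         # both sums approach 1/3 as N grows
```

For f(x) = x² on [0, 1] the gap between the sums telescopes to (f(1) − f(0))/N, illustrating (0.10) with ω(δ) = 2δ.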
Let

(0.12) M = sup_I |f(x)|, δ = minsize(P_0) = min {ℓ(J_k) : J_k ∈ P_0}.

Proposition 0.3. Under the hypotheses above, if P is any partition of I satisfying

(0.13) maxsize(P) ≤ δ/k,

then

(0.14) \overline{I}_P(f) − ε_1 ≤ ∫_I f dx ≤ \underline{I}_P(f) + ε_1, with ε_1 = ε + 2Mℓ(I)/k.

Proof. Consider on the one hand those intervals in P which are contained in intervals in P_0 and on the other hand those intervals in P which are not contained in intervals in P_0 (whose lengths sum to ≤ ℓ(I)/k). Let P_1 denote the minimal common refinement of P and P_0. We obtain:

\overline{I}_P(f) ≤ \overline{I}_{P_1}(f) + 2Mℓ(I)/k, \underline{I}_P(f) ≥ \underline{I}_{P_1}(f) − 2Mℓ(I)/k.

Since also \overline{I}_{P_1}(f) ≤ \overline{I}_{P_0}(f) and \underline{I}_{P_1}(f) ≥ \underline{I}_{P_0}(f), this implies (0.14).

The following corollary is sometimes called Darboux's Theorem.

Corollary 0.4. Let P_ν be any sequence of partitions of I into ν intervals J_{νk}, 1 ≤ k ≤ ν, such that maxsize(P_ν) = δ_ν → 0, and let ξ_{νk} be any choice of one point in each interval J_{νk} of the partition P_ν. Then, whenever f ∈ R(I),

(0.15) ∫_I f(x) dx = lim_{ν→∞} Σ_{k=1}^ν f(ξ_{νk}) ℓ(J_{νk}).

However, one should be warned that, once such a specific choice of P_ν and ξ_{νk} has been made, the limit on the right side of (0.15) might exist for a bounded function f that is not Riemann integrable. This and other phenomena are illustrated by the following example of a function which is not Riemann integrable. For x ∈ I, set

(0.16) ϑ(x) = 1 if x ∈ Q, ϑ(x) = 0 if x ∉ Q,

where Q is the set of rational numbers. Now every interval J ⊂ I of positive length contains points in Q and points not in Q, so for any partition P of I we have \overline{I}_P(ϑ) = ℓ(I) and \underline{I}_P(ϑ) = 0, hence

(0.17) \overline{I}(ϑ) = ℓ(I), \underline{I}(ϑ) = 0.

Note that, if P_ν is a partition of I into ν equal subintervals, then we could pick each ξ_{νk} to be rational, in which case the limit on the right side of (0.15) would be ℓ(I), or we could pick each ξ_{νk} to be irrational, in which case this limit would be zero.
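Corollary 0.4 says that for f ∈ R(I) the tagged sums (0.15) converge regardless of how the tags ξ_{νk} are chosen, in contrast to the ϑ example. A minimal sketch (the helper `tagged_sum` is an illustrative name, not notation from the text) checks this for f = sin on [0, π], whose integral is 2, using random tags:

```python
import math
import random

# Illustration of Darboux's Theorem (Corollary 0.4): for f Riemann
# integrable, the tagged sums (0.15) converge to the integral no
# matter how the tags xi_{nu k} are chosen. Here: uniform partitions
# of [0, pi] into nu subintervals, with a random tag in each.

def tagged_sum(f, a, b, nu, rng):
    h = (b - a) / nu
    return sum(f(a + (k + rng.random()) * h) * h for k in range(nu))

if __name__ == "__main__":
    rng = random.Random(0)
    for nu in (10, 100, 1000):
        print(nu, tagged_sum(math.sin, 0.0, math.pi, nu, rng))
        # the sums approach 2 = integral of sin over [0, pi]
```

Running the same experiment with f = ϑ (tags rational versus irrational) would instead give the two different limits ℓ(I) and 0 described above.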
Alternatively, we could pick half of them to be rational and half to be irrational, and the limit would be (1/2)ℓ(I).

Associated to the Riemann integral is a notion of size of a set S, called content. If S is a subset of I, define the 'characteristic function'

(0.18) χ_S(x) = 1 if x ∈ S, 0 if x ∉ S.

We define 'upper content' cont⁺ and 'lower content' cont⁻ by

(0.19) cont⁺(S) = \overline{I}(χ_S), cont⁻(S) = \underline{I}(χ_S).

We say S 'has content,' or 'is contented,' if these quantities are equal, which happens if and only if χ_S ∈ R(I), in which case the common value of cont⁺(S) and cont⁻(S) is

(0.20) m(S) = ∫_I χ_S(x) dx.

It is easy to see that

(0.21) cont⁺(S) = inf { Σ_{k=1}^N ℓ(J_k) : S ⊂ J_1 ∪ ⋯ ∪ J_N },

where the J_k are intervals. Here, we require S to be in the union of a finite collection of intervals.

There is a more sophisticated notion of the size of a subset of I, called Lebesgue measure. The key to the construction of Lebesgue measure is to cover a set S by a countable (either finite or infinite) set of intervals. The outer measure of S ⊂ I is defined by

(0.22) m*(S) = inf { Σ_{k≥1} ℓ(J_k) : S ⊂ ∪_{k≥1} J_k }.

Here {J_k} is a finite or countably infinite collection of intervals. Clearly

(0.23) m*(S) ≤ cont⁺(S).

Note that, if S = I ∩ Q, then χ_S = ϑ, defined by (0.16). In this case it is easy to see that cont⁺(S) = ℓ(I), but m*(S) = 0. Zero is the 'right' measure of this set. More material on the development of measure theory can be found in a number of books, including [Fol] and [T2].

It is useful to note that ∫_I f dx is additive in I, in the following sense.

Proposition 0.5. If a < b < c, f : [a, c] → R, f_1 = f|_{[a,b]}, f_2 = f|_{[b,c]}, then

(0.24) f ∈ R([a, c]) ⟺ f_1 ∈ R([a, b]) and f_2 ∈ R([b, c]),

and, if this holds,

(0.25) ∫_a^c f dx = ∫_a^b f_1 dx + ∫_b^c f_2 dx.

Proof.
Since any partition of [a, c] has a refinement for which b is an endpoint, we may as well consider a partition P = P_1 ∪ P_2, where P_1 is a partition of [a, b] and P_2 is a partition of [b, c]. Then

(0.26) \overline{I}_P(f) = \overline{I}_{P_1}(f_1) + \overline{I}_{P_2}(f_2), \underline{I}_P(f) = \underline{I}_{P_1}(f_1) + \underline{I}_{P_2}(f_2),

so

(0.27) \overline{I}_P(f) − \underline{I}_P(f) = {\overline{I}_{P_1}(f_1) − \underline{I}_{P_1}(f_1)} + {\overline{I}_{P_2}(f_2) − \underline{I}_{P_2}(f_2)}.

Since both terms in braces in (0.27) are ≥ 0, we have equivalence in (0.24). Then (0.25) follows from (0.26) upon taking sufficiently fine partitions.

Let I = [a, b]. If f ∈ R(I), then f ∈ R([a, x]) for all x ∈ [a, b], and we can consider the function

(0.28) g(x) = ∫_a^x f(t) dt.

If a ≤ x_0 ≤ x_1 ≤ b, then

(0.29) g(x_1) − g(x_0) = ∫_{x_0}^{x_1} f(t) dt,

so, if |f| ≤ M,

(0.30) |g(x_1) − g(x_0)| ≤ M |x_1 − x_0|.

In other words, if f ∈ R(I), then g is Lipschitz continuous on I.

A function g : (a, b) → R is said to be differentiable at x ∈ (a, b) provided there exists the limit

(0.31) lim_{h→0} (1/h)[g(x + h) − g(x)] = g′(x).

When such a limit exists, g′(x), also denoted dg/dx, is called the derivative of g at x. Clearly g is continuous wherever it is differentiable.

The next result is part of the Fundamental Theorem of Calculus.

Theorem 0.6. If f ∈ C([a, b]), then the function g, defined by (0.28), is differentiable at each point x ∈ (a, b), and

(0.32) g′(x) = f(x).

Proof. Parallel to (0.29), we have, for h > 0,

(0.33) (1/h)[g(x + h) − g(x)] = (1/h) ∫_x^{x+h} f(t) dt.

If f is continuous at x, then, for any ε > 0, there exists δ > 0 such that |f(t) − f(x)| ≤ ε whenever |t − x| ≤ δ. Thus the right side of (0.33) is within ε of f(x) whenever h ∈ (0, δ]. Thus the desired limit exists as h ↘ 0. A similar argument treats h ↗ 0.

The next result is the rest of the Fundamental Theorem of Calculus.

Theorem 0.7. If G is differentiable and G′(x) is continuous on [a, b], then

(0.34) ∫_a^b G′(t) dt = G(b) − G(a).

Proof. Consider the function

(0.35) g(x) = ∫_a^x G′(t) dt.
We have g ∈ C([a, b]), g(a) = 0, and, by Theorem 0.6,

g′(x) = G′(x), ∀ x ∈ (a, b).

Thus f(x) = g(x) − G(x) is continuous on [a, b], and

(0.36) f′(x) = 0, ∀ x ∈ (a, b).

We claim that (0.36) implies f is constant on [a, b]. Granted this, since f(a) = g(a) − G(a) = −G(a), we have f(x) = −G(a) for all x ∈ [a, b], so the integral (0.35) is equal to G(x) − G(a) for all x ∈ [a, b]. Taking x = b yields (0.34).

The fact that (0.36) implies f is constant on [a, b] is a consequence of the following result, known as the Mean Value Theorem.

Theorem 0.8. Let f : [a, β] → R be continuous, and assume f is differentiable on (a, β). Then ∃ ξ ∈ (a, β) such that

(0.37) f′(ξ) = (f(β) − f(a))/(β − a).

Proof. Replacing f(x) by f̃(x) = f(x) − κ(x − a), where κ is the right side of (0.37), we can assume without loss of generality that f(a) = f(β). Then we claim that f′(ξ) = 0 for some ξ ∈ (a, β). Indeed, since [a, β] is compact, f must assume a maximum and a minimum on [a, β]. If f(a) = f(β), one of these must be assumed at an interior point, ξ, at which f′ clearly vanishes.

Now, to see that (0.36) implies f is constant on [a, b], if not, ∃ β ∈ (a, b] such that f(β) ≠ f(a). Then just apply Theorem 0.8 to f on [a, β]. This completes the proof of Theorem 0.7.

If a function G is differentiable on (a, b), and G′ is continuous on (a, b), we say G is a C^1 function, and write G ∈ C^1((a, b)). Inductively, we say G ∈ C^k((a, b)) provided G′ ∈ C^{k−1}((a, b)).

An easy consequence of the definition (0.31) of the derivative is that, for any real constants a, b, and c,

f(x) = ax^2 + bx + c ⟹ df/dx = 2ax + b.

Now, it is a simple enough step to replace a, b, c by y, z, w in these formulas. Having done that, we can regard y, z, and w as variables, along with x:

F(x, y, z, w) = yx^2 + zx + w.

We can then hold y, z and w fixed (e.g., set y = a, z = b, w = c), and then differentiate with respect to x; we get

∂F/∂x = 2yx + z,

the partial derivative of F with respect to x.
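The partial derivative just computed can be checked against its difference-quotient definition. A small sketch (the helper `partial_x` is an illustrative name, not notation from the text) does this for F(x, y, z, w) = yx² + zx + w:

```python
# Numerical check of the partial derivative computed above: for
# F(x, y, z, w) = y x^2 + z x + w, the difference quotient in x
# approaches dF/dx = 2 y x + z as h -> 0.

def F(x, y, z, w):
    return y * x**2 + z * x + w

def partial_x(F, x, y, z, w, h=1e-6):
    # one-sided difference quotient approximating the partial derivative
    return (F(x + h, y, z, w) - F(x, y, z, w)) / h

if __name__ == "__main__":
    x, y, z, w = 1.5, 2.0, 3.0, 4.0
    print(partial_x(F, x, y, z, w), 2 * y * x + z)   # both near 9.0
```

Here the error of the difference quotient is of size yh, so shrinking h improves the agreement until rounding error takes over.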
Generally, if F is a function of n variables, x_1, ..., x_n, we set

∂F/∂x_j (x_1, ..., x_n) = lim_{h→0} (1/h)[F(x_1, ..., x_{j−1}, x_j + h, x_{j+1}, ..., x_n) − F(x_1, ..., x_{j−1}, x_j, x_{j+1}, ..., x_n)],

where the limit exists. Section one carries on with a further investigation of the derivative of a function of several variables.

Exercises

1. Let c > 0 and let f : [ac, bc] → R be Riemann integrable. Working directly with the definition of integral, show that

∫_a^b f(cx) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

More generally, show that

∫_{a−d/c}^{b−d/c} f(cx + d) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

2. Let f : I × S → R be continuous, where I = [a, b] and S ⊂ R^n. Take ϕ(y) = ∫_I f(x, y) dx. Show that ϕ is continuous on S.
Hint. If f_j : I → R are continuous and |f_1(x) − f_2(x)| ≤ δ on I, then

|∫_I f_1 dx − ∫_I f_2 dx| ≤ ℓ(I)δ.

3. With f as in problem 2, suppose g_j : S → R are continuous and a ≤ g_0(y) < g_1(y) ≤ b. Take ϕ(y) = ∫_{g_0(y)}^{g_1(y)} f(x, y) dx. Show that ϕ is continuous on S.
Hint. Make a change of variables, linear in y, to reduce this to problem 2.

4. Suppose f : (a, b) → (c, d) and g : (c, d) → R are differentiable. Show that h(x) = g(f(x)) is differentiable and

(0.38) h′(x) = g′(f(x)) f′(x).

This is the chain rule.

5. If f_1 and f_2 are differentiable on (a, b), show that f_1(x)f_2(x) is differentiable and

(0.39) (d/dx)(f_1(x)f_2(x)) = f_1′(x)f_2(x) + f_1(x)f_2′(x).

If f_2(x) ≠ 0, ∀ x ∈ (a, b), show that f_1(x)/f_2(x) is differentiable and

(0.40) (d/dx)(f_1(x)/f_2(x)) = (f_1′(x)f_2(x) − f_1(x)f_2′(x)) / f_2(x)^2.

6. If f : (a, b) → R is differentiable, show that

(0.41) (d/dx) f(x)^n = n f(x)^{n−1} f′(x),

for all positive integers n. If f(x) ≠ 0 ∀ x ∈ (a, b), show that this holds for all n ∈ Z.
Hint. To get (0.41) for positive n, use induction and apply (0.39).

7. Let ϕ : [a, b] → [A, B] be C^1 on a neighborhood J of [a, b], with ϕ′(x) > 0 for all x ∈ [a, b]. Assume ϕ(a) = A, ϕ(b) = B.
Show that the identity

(0.42) ∫_A^B f(y) dy = ∫_a^b f(ϕ(t)) ϕ′(t) dt,

for any f ∈ C(J), follows from the chain rule and the fundamental theorem of calculus.
Hint. Replace b by x, B by ϕ(x), and differentiate. Note that this result contains that of Problem 1.
Try to establish (0.42) directly by working with the definition of the integral as a limit of Riemann sums.

8. Show that, if f and g are C^1 on a neighborhood of [a, b], then

(0.43) ∫_a^b f(s)g′(s) ds = − ∫_a^b f′(s)g(s) ds + [f(b)g(b) − f(a)g(a)].

This transformation of integrals is called 'integration by parts.'

9. Let f : (−a, a) → R be a C^{j+1} function. Show that, for x ∈ (−a, a),

(0.44) f(x) = f(0) + f′(0)x + (f″(0)/2)x^2 + ⋯ + (f^{(j)}(0)/j!)x^j + R_j(x),

where

(0.45) R_j(x) = ∫_0^x ((x − s)^j / j!) f^{(j+1)}(s) ds = (x^{j+1}/(j + 1)!) ∫_0^1 f^{(j+1)}((1 − t^{1/(j+1)})x) dt.

This is Taylor's formula with remainder.
Hint. Use induction. If (0.44)-(0.45) holds for 0 ≤ j ≤ k, show that it holds for j = k + 1, by showing that

(0.46) ∫_0^x ((x − s)^k / k!) f^{(k+1)}(s) ds = (f^{(k+1)}(0)/(k + 1)!) x^{k+1} + ∫_0^x ((x − s)^{k+1}/(k + 1)!) f^{(k+2)}(s) ds.

To establish this, use the integration by parts formula (0.43), with f(s) replaced by f^{(k+1)}(s), and with appropriate g(s).

1. The derivative

Let O be an open subset of R^n, and F : O → R^m a continuous function. We say F is differentiable at a point x ∈ O, with derivative L, if L : R^n → R^m is a linear transformation such that, for y ∈ R^n, small,

(1.1) F(x + y) = F(x) + Ly + R(x, y)

with

(1.2) ‖R(x, y)‖/‖y‖ → 0 as y → 0.

We denote the derivative at x by DF(x) = L. With respect to the standard bases of R^n and R^m, DF(x) is simply the matrix of partial derivatives,

(1.3) DF(x) = (∂F_j/∂x_k),

so that, if v = (v_1, ..., v_n) (regarded as a column vector), then

(1.4) DF(x)v = (Σ_k (∂F_1/∂x_k)v_k, ..., Σ_k (∂F_m/∂x_k)v_k).

It will be shown below that F is differentiable whenever all the partial derivatives exist and are continuous on O.
In such a case we say F is a C^1 function on O. More generally, F is said to be C^k if all its partial derivatives of order ≤ k exist and are continuous.

In (1.2), we can use the Euclidean norm on R^n and R^m. This norm is defined by

(1.5) ‖x‖ = (x_1^2 + ⋯ + x_n^2)^{1/2}

for x = (x_1, ..., x_n) ∈ R^n. Any other norm would do equally well.

We now derive the chain rule for the derivative. Let F : O → R^m be differentiable at x ∈ O, as above, let U be a neighborhood of z = F(x) in R^m, and let G : U → R^k be differentiable at z. Consider H = G ∘ F. We have

(1.6) H(x + y) = G(F(x + y))
= G(F(x) + DF(x)y + R(x, y))
= G(z) + DG(z)(DF(x)y + R(x, y)) + R_1(x, y)
= G(z) + DG(z)DF(x)y + R_2(x, y),

with ‖R_2(x, y)‖/‖y‖ → 0 as y → 0. Thus G ∘ F is differentiable at x, and

(1.7) D(G ∘ F)(x) = DG(F(x)) · DF(x).

Another useful remark is that, by the fundamental theorem of calculus, applied to ϕ(t) = F(x + ty),

(1.8) F(x + y) = F(x) + ∫_0^1 DF(x + ty)y dt,

provided F is C^1. For a typical application, see (6.6).

A closely related application of the fundamental theorem of calculus is that, if we assume F : O → R^m is differentiable in each variable separately, and that each ∂F/∂x_j is continuous on O, then

(1.9) F(x + y) = F(x) + Σ_{j=1}^n [F(x + z_j) − F(x + z_{j−1})] = F(x) + Σ_{j=1}^n A_j(x, y) y_j,
A_j(x, y) = ∫_0^1 (∂F/∂x_j)(x + z_{j−1} + t y_j e_j) dt,

where z_0 = 0, z_j = (y_1, ..., y_j, 0, ..., 0), and {e_j} is the standard basis of R^n. Now (1.9) implies F is differentiable on O, as we stated below (1.4). As is shown in many calculus texts, one can use the mean value theorem instead of the fundamental theorem of calculus, and obtain a slightly sharper result. We leave the reconstruction of this argument to the reader.

We now describe two convenient notations to express higher order derivatives of a C^k function f : Ω → R, where Ω ⊂ R^n is open. In one, let J be a k-tuple of integers between 1 and n; J = (j_1, ..., j_k).
We set

(1.10) f^{(J)}(x) = ∂_{j_k} ⋯ ∂_{j_1} f(x), ∂_j = ∂/∂x_j.

We set |J| = k, the total order of differentiation. As will be seen in the exercises, ∂_i ∂_j f = ∂_j ∂_i f provided f ∈ C^2(Ω). It follows that, if f ∈ C^k(Ω), then ∂_{j_k} ⋯ ∂_{j_1} f = ∂_{ℓ_k} ⋯ ∂_{ℓ_1} f whenever {ℓ_1, ..., ℓ_k} is a permutation of {j_1, ..., j_k}. Thus, another convenient notation to use is the following. Let α be an n-tuple of non-negative integers, α = (α_1, ..., α_n). Then we set

(1.11) f^{(α)}(x) = ∂_1^{α_1} ⋯ ∂_n^{α_n} f(x), |α| = α_1 + ⋯ + α_n.

Note that, if |J| = |α| = k and f ∈ C^k(Ω),

(1.12) f^{(J)}(x) = f^{(α)}(x), with α_i = #{ℓ : j_ℓ = i}.

Correspondingly, there are two expressions for monomials in x = (x_1, ..., x_n):

(1.13) x^J = x_{j_1} ⋯ x_{j_k}, x^α = x_1^{α_1} ⋯ x_n^{α_n},

and x^J = x^α provided J and α are related as in (1.12). Both these notations are called 'multi-index' notations.

We now derive Taylor's formula with remainder for a smooth function F : Ω → R, making use of these multi-index notations. We will apply the one variable formula (0.44)-(0.45), i.e.,

(1.14) ϕ(t) = ϕ(0) + ϕ′(0)t + (1/2)ϕ″(0)t^2 + ⋯ + (1/k!)ϕ^{(k)}(0)t^k + r_k(t),

with

(1.15) r_k(t) = (1/k!) ∫_0^t (t − s)^k ϕ^{(k+1)}(s) ds,

given ϕ ∈ C^{k+1}(I), I = (−a, a). Let us assume 0 ∈ Ω, and that the line segment from 0 to x is contained in Ω. We set ϕ(t) = F(tx), and apply (1.14)-(1.15) with t = 1. Applying the chain rule, we have

(1.16) ϕ′(t) = Σ_{j=1}^n ∂_j F(tx) x_j = Σ_{|J|=1} F^{(J)}(tx) x^J.

Differentiating again, we have

(1.17) ϕ″(t) = Σ_{|J|=1,|K|=1} F^{(J+K)}(tx) x^{J+K} = Σ_{|J|=2} F^{(J)}(tx) x^J,

where, if |J| = k, |K| = ℓ, we take J + K = (j_1, ..., j_k, k_1, ..., k_ℓ). Inductively, we have

(1.18) ϕ^{(k)}(t) = Σ_{|J|=k} F^{(J)}(tx) x^J.

Hence, from (1.14) with t = 1,

F(x) = F(0) + Σ_{|J|=1} F^{(J)}(0) x^J + ⋯ + (1/k!) Σ_{|J|=k} F^{(J)}(0) x^J + R_k(x),

or, more briefly,

(1.19) F(x) = Σ_{|J|≤k} (1/|J|!) F^{(J)}(0) x^J + R_k(x),

where

(1.20) R_k(x) = (1/k!) Σ_{|J|=k+1} (∫_0^1 (1 − s)^k F^{(J)}(sx) ds) x^J.
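The one variable formula (1.14)-(1.15) underlying this derivation can be verified numerically. A small sketch (helper names `taylor_poly` and `remainder_integral` are illustrative, not from the text) checks, for ϕ = exp and k = 4, that ϕ(1) minus its degree-4 Taylor polynomial equals the integral remainder (1.15), approximated by a midpoint rule; exp is used because every derivative ϕ^{(k+1)} is again exp.

```python
import math

# Numerical check of the one variable Taylor formula (1.14)-(1.15)
# for phi = exp, with k = 4 and t = 1: the remainder r_k(t) matches
# the integral (1.15), approximated here by a midpoint rule.

def taylor_poly(k, t):
    # phi(0) + phi'(0) t + ... + phi^{(k)}(0) t^k / k!, with phi = exp,
    # so every derivative at 0 equals 1
    return sum(t**j / math.factorial(j) for j in range(k + 1))

def remainder_integral(k, t, n=20000):
    # (1/k!) * integral_0^t (t - s)^k phi^{(k+1)}(s) ds, phi = exp
    h = t / n
    total = sum((t - (j + 0.5) * h)**k * math.exp((j + 0.5) * h)
                for j in range(n))
    return total * h / math.factorial(k)

if __name__ == "__main__":
    k, t = 4, 1.0
    lhs = math.exp(t) - taylor_poly(k, t)
    rhs = remainder_integral(k, t)
    print(lhs, rhs)   # the two values agree to high accuracy
```

The same mechanism, applied to ϕ(t) = F(tx) via (1.16)-(1.18), produces the multivariable remainder (1.20).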
This gives Taylor's formula with remainder for F ∈ C^{k+1}(Ω), in the J-multi-index notation. We also want to write the formula in the α-multi-index notation. We have

(1.21) Σ_{|J|=k} F^{(J)}(tx) x^J = Σ_{|α|=k} ν(α) F^{(α)}(tx) x^α,

where

(1.22) ν(α) = #{J : α = α(J)},

and we define the relation α = α(J) to hold provided the condition (1.12) holds, or equivalently provided x^J = x^α. Thus ν(α) is uniquely defined by

(1.23) Σ_{|α|=k} ν(α) x^α = Σ_{|J|=k} x^J = (x_1 + ⋯ + x_n)^k.

One sees that, if |α| = k, then ν(α) is equal to the product of the number of combinations of k objects, taken α_1 at a time, times the number of combinations of k − α_1 objects, taken α_2 at a time, ⋯ times the number of combinations of k − (α_1 + ⋯ + α_{n−1}) objects, taken α_n at a time. Thus

(1.24) ν(α) = C(k, α_1) C(k − α_1, α_2) ⋯ C(k − α_1 − ⋯ − α_{n−1}, α_n) = k!/(α_1! α_2! ⋯ α_n!).

In other words, for |α| = k,

(1.25) ν(α) = k!/α!, where α! = α_1! ⋯ α_n!.

Thus the Taylor formula (1.19) can be rewritten

(1.26) F(x) = Σ_{|α|≤k} (1/α!) F^{(α)}(0) x^α + R_k(x),

where

(1.27) R_k(x) = Σ_{|α|=k+1} ((k + 1)/α!) (∫_0^1 (1 − s)^k F^{(α)}(sx) ds) x^α.

Exercises

1. Let M_{n×n} be the space of complex n × n matrices, det : M_{n×n} → C the determinant. Show that, if I is the identity matrix, D det(I)B = Tr B, i.e.,

(1.28) (d/dt) det(I + tB)|_{t=0} = Tr B.

2. If A(t) = (a_{jk}(t)) is a curve in M_{n×n}, use the expansion of (d/dt) det A(t) as a sum of n determinants, in which the rows of A(t) are successively differentiated, to show that, for A ∈ M_{n×n},

(1.29) D det(A)B = Tr(Cof(A)^t · B),

where Cof(A) is the cofactor matrix of A.

3. Suppose A ∈ M_{n×n} is invertible. Using

det(A + tB) = (det A) det(I + tA^{−1}B),

show that D det(A)B = (det A) Tr(A^{−1}B). Comparing the result of exercise 2, deduce Cramer's formula:

(1.30) (det A) A^{−1} = Cof(A)^t.

4. Identify R^2 and C via z = x + iy. Then multiplication by i on C corresponds to applying

J = [ 0 −1 ; 1 0 ].

Let O ⊂ R^2 be open, f : O → R^2 be C^1. Say f = (u, v).
Regard Df(x, y) as a 2 × 2 real matrix. One says f is holomorphic, or complex analytic, provided the Cauchy-Riemann equations hold:

∂u/∂x = ∂v/∂y, ∂u/∂y = −∂v/∂x.

Show that this is equivalent to the condition

Df(x, y) J = J Df(x, y).

Generalize to O open in C^m, f : O → C^n.

5. If f is C^1 on a region in R^2 containing [a, b] × {y}, show that

(d/dy) ∫_a^b f(x, y) dx = ∫_a^b (∂f/∂y)(x, y) dx.

Hint. Show that the left side is equal to

lim_{h→0} ∫_a^b (1/h) ∫_0^h (∂f/∂y)(x, y + s) ds dx.

6. Suppose F : O → R^m is a C^2 function. Applying the fundamental theorem of calculus, first to G_j(x) = F(x + he_j) − F(x) (as a function of h) and then to H_{jk}(x) = G_j(x + he_k) − G_j(x), where {e_j} is the standard basis of R^n, show that, if x ∈ O and h is small, then

F(x + he_j + he_k) − F(x + he_k) − F(x + he_j) + F(x) = ∫_0^h ∫_0^h (∂/∂x_k)(∂F/∂x_j)(x + se_j + te_k) ds dt.

(Regard this simply as an iterated integral.) Similarly show that this quantity is equal to

∫_0^h ∫_0^h (∂/∂x_j)(∂F/∂x_k)(x + se_j + te_k) dt ds.

Taking h → 0, deduce that

(1.31) (∂/∂x_k)(∂F/∂x_j)(x) = (∂/∂x_j)(∂F/∂x_k)(x).

Hint. Use problem 5. You should be able to avoid use of results on double integrals, such as established in §11. Arguments that use the mean value theorem instead of the fundamental theorem of calculus can be found in many calculus texts.

7. Considering the power series

f(x) = f(y) + f′(y)(x − y) + ⋯ + (f^{(j)}(y)/j!)(x − y)^j + R_j(x, y),

show that

∂R_j/∂y = −(1/j!) f^{(j+1)}(y)(x − y)^j, R_j(x, x) = 0.

Use this to re-derive (0.45).

Auxiliary exercises on determinants

If M_{n×n} denotes the space of n × n complex matrices, we want to show that there is a map

(1.32) Det : M_{n×n} → C

which is uniquely specified as a function ϑ : M_{n×n} → C satisfying:

(a) ϑ is linear in each column a_j of A,
(b) ϑ(Ã) = −ϑ(A) if Ã is obtained from A by interchanging two columns,
(c) ϑ(I) = 1.

1. Let A = (a_1, ..., a_n), where the a_j are column vectors; a_j = (a_{1j}, ..., a_{nj})^t.
Show that, if (a) holds, we have the expansion

(1.33) Det A = Σ_j a_{j1} Det(e_j, a_2, ..., a_n) = ⋯ = Σ_{j_1,...,j_n} a_{j_1 1} ⋯ a_{j_n n} Det(e_{j_1}, e_{j_2}, ..., e_{j_n}),

where {e_1, ..., e_n} is the standard basis of C^n.

2. Show that, if (b) and (c) also hold, then

(1.34) Det A = Σ_{σ ∈ S_n} (sgn σ) a_{σ(1)1} a_{σ(2)2} ⋯ a_{σ(n)n},

where S_n is the set of permutations of {1, ..., n}, and

(1.35) sgn σ = Det(e_{σ(1)}, ..., e_{σ(n)}) = ±1.

To define sgn σ, the 'sign' of a permutation σ, we note that every permutation σ can be written as a product of transpositions: σ = τ_1 ⋯ τ_ν, where a transposition of {1, ..., n} interchanges two elements and leaves the rest fixed. We say sgn σ = 1 if ν is even and sgn σ = −1 if ν is odd. It is necessary to show that sgn σ is independent of the choice of such a product representation. (Referring to (1.35) begs the question until we know that Det is well defined.)

3. Let σ ∈ S_n act on a function of n variables by

(1.36) (σf)(x_1, ..., x_n) = f(x_{σ(1)}, ..., x_{σ(n)}).

Let P be the polynomial

(1.37) P(x_1, ..., x_n) = Π_{1≤j<k≤n} (x_j − x_k).

Show that

(1.38) (σP)(x) = (sgn σ) P(x),

and that this implies that sgn σ is well defined.

4. Deduce that there is a unique determinant satisfying (a)-(c), and that it is given by (1.34).

5. Show that (1.34) implies

(1.39) Det A = Det A^t.

Conclude that one can replace columns by rows in the characterization (a)-(c) of determinants.
Hint. a_{σ(j)j} = a_{ℓτ(ℓ)} with ℓ = σ(j), τ = σ^{−1}. Also, sgn σ = sgn τ.

6. Show that, if (a)-(c) hold (for rows), it follows that

(d) ϑ(Ã) = ϑ(A) if Ã is obtained from A by adding cρ_ℓ to ρ_k, for some c ∈ C,

where ρ_1, ..., ρ_n are the rows of A.
Re-prove the uniqueness of ϑ satisfying (a)-(d) (for rows) by applying row operations to A until either some row vanishes or A is converted to I.

7. Show that

(1.40) Det(AB) = (Det A)(Det B).

Hint. For fixed B ∈ M_{n×n}, compare ϑ_1(A) = Det(AB) and ϑ_2(A) = (Det A)(Det B).
For uniqueness, use an argument from problem 6. 8. Show that (1.41) 1 0 Det ... a12 a22 .. . ··· ··· 0 an2 ··· a1n 1 0 a2n 0 a22 = Det .. .. ... . . ann 0 an2 ··· ··· ··· 0 a2n = Det A11 .. . ann where A11 = (ajk )2≤j,k≤n . Hint. Do the first identity by the analogue of (d), for columns. Then exploit uniqueness for Det on M(n−1)×(n−1) . 9. Deduce that Det(ej , a2 , . . . , an ) = (−1)j−1 Det A1j where Akj is formed by deleting the k th column and the j th row from A. 10. Deduce from the first sum in (1.33) that (1.42) Det A = n X (−1)j−1 aj1 Det A1j . j=1 More generally, for any k ∈ {1, . . . , n}, (1.43) Det A = n X (−1)j−k ajk Det Akj . j=1 This is called an expansion of Det A by minors, down the k th column. 11. Show that (1.44) Det a11 a12 a22 ··· ··· .. . a1n a2n = a11 a22 · · · ann . .. . ann Hint. Use (1.41) and induction. 23 2. Inverse function and implicit function theorem The Inverse Function Theorem, together with its corollary the Implicit Function Theorem is a fundamental result in several variable calculus. First we state the Inverse Function Theorem. Theorem 2.1. Let F be a C k map from an open neighborhood Ω of p0 ∈ Rn to Rn , with q0 = F (p0 ). Suppose the derivative DF (p0 ) is invertible. Then there is a neighborhood U of p0 and a neighborhood V of q0 such that F : U → V is one to one and onto, and F −1 : V → U is a C k map. (One says F : U → V is a diffeomorphism.) Using the chain rule, it is easy to reduce to the case p0 = q0 = 0 and DF (p0 ) = I, the identity matrix, so we suppose this has been done. Thus (2.1) F (u) = u + R(u), R(0) = 0, DR(0) = 0. For v small, we want to solve (2.2) F (u) = v. This is equivalent to u + R(u) = v, so let (2.3) Tv (u) = v − R(u). Thus solving (2.2) is equivalent to solving (2.4) Tv (u) = u. We look for a fixed point u = K(v) = F −1 (v). Also, we want to prove that DK(0) = I, i.e., that K(v) = v+r(v) with r(v) = o(kvk). 
If we succeed in doing this, it follows easily that, for general x close to 0, ³ ¡ ¢´−1 DK(x) = DF K(x) , and a simple inductive argument shows that K is C k if F is C k . A tool we will use to solve (2.4) is the following general result, known as the Contraction Mapping Principle. Theorem 2.2. Let X be a complete metric space, and let T : X → X satisfy (2.5) dist(T x, T y) ≤ r dist(x, y), 24 for some r < 1. (We say T is a contraction.) Then T has a unique fixed point x. For any y0 ∈ X, T k y0 → x as k → ∞. Proof. Pick y0 ∈ X and let yk = T k y0 . Then dist(yk+1 , yk ) ≤ rk dist(y1 , y0 ), so dist(yk+m , yk ) ≤ dist(yk+m , yk+m−1 ) + · · · + dist(yk+1 , yk ) ¡ ¢ ≤ rk + · · · + rk+m−1 dist(y1 , y0 ) (2.6) ¡ ¢−1 ≤ rk 1 − r dist(y1 , y0 ). It follows that (yk ) is a Cauchy sequence, so it converges; yk → x. Since T yk = yk+1 and T is continuous, it follows that T x = x, i.e., x is a fixed point. Uniqueness of the fixed point is clear from the estimate dist(T x, T x0 ) ≤ r dist(x, x0 ), which implies dist(x, x0 ) = 0 if x and x0 are fixed points. This proves Theorem 2.2. Returning to the problem of solving (2.4), we consider (2.7) Tv : Xv −→ Xv with (2.8) Xv = {u ∈ Ω : ku − vk ≤ Av } where we set (2.9) Av = sup kR(w)k. kwk≤2kvk We claim that (2.7) holds if kvk is sufficiently small. To prove this, note that Tv (u) − v = −R(u), so we need to show that, provided kvk is small, u ∈ Xv implies kR(u)k ≤ Av . But indeed, if u ∈ Xv , then kuk ≤ kvk + Av , which is ≤ 2kvk if kvk is small, so then kR(u)k ≤ sup kR(w)k = Av ; this establishes (2.7). kwk≤2kvk Note that if kvk is small enough, the map (2.7) is a contraction map, so there exists a unique fixed point u = K(v) ∈ Xv . Note that, since u ∈ Xv , (2.10) kK(v) − vk ≤ Av = o(kvk), so the inverse function theorem is proved. Thus if DF is invertible on the domain of F, F is a local diffeomorphism, though stronger hypotheses are needed to guarantee that F is a global diffeomorphism onto its range. 
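The fixed-point iteration behind the proof can be carried out numerically. The following is a minimal sketch, not from the text: we take n = 1 with a specific remainder R(u) = 0.1 u² (so R(0) = 0, DR(0) = 0 as in (2.1)) and iterate T_v(u) = v − R(u) from (2.3)-(2.4); the function name `invert` and the choice of R are illustrative assumptions.

```python
# Sketch of the contraction iteration (2.3)-(2.4) for F(u) = u + R(u),
# with the hypothetical choice R(u) = 0.1*u**2 (so (2.1) holds).
def R(u):
    return 0.1 * u**2

def invert(v, steps=50):
    """Fixed-point iteration u_{k+1} = T_v(u_k) = v - R(u_k)."""
    u = v  # initial guess
    for _ in range(steps):
        u = v - R(u)
    return u

v = 0.3
u = invert(v)                       # u = K(v), the solution of F(u) = v
assert abs(u + R(u) - v) < 1e-12    # F(u) = v holds at the fixed point
# |K(v) - v| <= A_v = sup_{|w| <= 2|v|} |R(w)|, as in (2.9)-(2.10)
assert abs(u - v) <= 0.1 * (2 * abs(v))**2
```

Since |T_v′(u)| = 0.2|u| is small near 0, T_v is a contraction for small v, and the iteration converges geometrically, matching Theorem 2.2.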
Here is one result along these lines.

Proposition 2.3. If Ω ⊂ Rⁿ is open and convex, F : Ω → Rⁿ is C¹, and the symmetric part of DF(u) is positive definite for each u ∈ Ω, then F is one to one on Ω.

Proof. Suppose F(u₁) = F(u₂), u₂ = u₁ + w. Consider φ : [0, 1] → R given by φ(t) = w · F(u₁ + tw). Thus φ(0) = φ(1), so φ′(t₀) must vanish for some t₀ ∈ (0, 1), by the mean value theorem. But φ′(t) = w · DF(u₁ + tw)w > 0, if w ≠ 0, by the hypothesis on DF. This shows F is one to one.

We can obtain the following Implicit Function Theorem as a consequence of the Inverse Function Theorem.

Theorem 2.4. Suppose U is a neighborhood of x₀ ∈ R^k, V a neighborhood of z₀ ∈ R^ℓ, and

(2.11) F : U × V −→ R^ℓ

is a C^k map. Assume D_z F(x₀, z₀) is invertible; say F(x₀, z₀) = u₀. Then the equation F(x, z) = u₀ defines z = f(x, u₀) for x near x₀, with f a C^k map.

To prove this, consider H : U × V → R^k × R^ℓ defined by

(2.12) H(x, z) = (x, F(x, z)).

We have

(2.13) DH = [ I  0 ; D_x F  D_z F ]

(a block lower triangular matrix). Thus DH(x₀, z₀) is invertible, so J = H⁻¹ exists and is C^k, by the Inverse Function Theorem. It is clear that J(x, u₀) has the form

(2.14) J(x, u₀) = (x, f(x, u₀)),

and f is the desired map.

Exercises

1. Suppose F : U → Rⁿ is a C² map, U open in Rⁿ, p ∈ U, and DF(p) is invertible. With q = F(p), define a map N on a neighborhood of p by

(2.15) N(x) = x + DF(x)⁻¹ (q − F(x)).

Show that there exist ε > 0 and C < ∞ such that, for 0 ≤ r < ε,

‖x − p‖ ≤ r ⟹ ‖N(x) − p‖ ≤ C r².

Conclude that, if ‖x₁ − p‖ ≤ r with r < min(ε, 1/2C), then x_{j+1} = N(x_j) defines a sequence converging very rapidly to p. This is the basis of Newton's method, for solving F(p) = q for p.
Hint. Apply F to both sides of (2.15).

2. Applying Newton's method to f(x) = 1/x, show that you get a fast approximation to division using only addition and multiplication.
Hint. Carry out the calculation of N(x) in this case and notice a 'miracle.'

3.
Identify R2n with Cn via z = x + iy, as in exercise 4 of §1. Let U ⊂ R2n be open, F : U → R2n be C 1 . Assume p ∈ U, DF (p) invertible. If F −1 : V → U is given as in Theorem 2.1, show that F −1 is holomorphic provided F is. 4. Let O ⊂ Rn be open. We say a function f ∈ C ∞ (O) is real analytic provided that, for each x0 ∈ O, we have a convergent power series expansion (2.16) f (x) = X 1 f (α) (x0 )(x − x0 )α , α! α≥0 valid in a neighborhood of x0 . Show that we can let x be complex in (2.16), and obtain an extension of f to a neighborhood of O in Cn . Show that the extended function is holomorphic, i.e., satisfies the Cauchy-Riemann equations. Remark. It can be shown that, conversely, any holomorphic function has a power series expansion. See §17. For the next exercise, assume this as known. 5. Let O ⊂ Rn be open, p ∈ O, f : O → Rn be real analytic, with Df (p) invertible. Take f −1 : V → U as in Theorem 2.1. Show f −1 is real analytic. Hint. Consider a holomorphic extension F : Ω → Cn of f and apply exercise 3. 27 3. Fundamental local existence theorem for ODE The goal of this section is to establish existence of solutions to an ODE (3.1) dy/dt = F (t, y), y(t0 ) = y0 . We will prove the following fundamental result. Theorem 3.1. Let y0 ∈ O, an open subset of Rn , I ⊂ R an interval containing t0 . Suppose F is continuous on I × O and satisfies the following Lipschitz estimate in y : (3.2) kF (t, y1 ) − F (t, y2 )k ≤ Lky1 − y2 k for t ∈ I, yj ∈ O. Then the equation (3.1) has a unique solution on some t-interval containing t0 . To begin the proof, we note that the equation (3.1) is equivalent to the integral equation Z t (3.3) y(t) = y0 + F (s, y(s)) ds. t0 Existence will be established via the Picard iteration method, which is the following. Guess y0 (t), e.g., y0 (t) = y0 . Then set Z t ¡ ¢ (3.4) yk (t) = y0 + F s, yk−1 (s) ds. t0 We aim to show that, as k → ∞, yk (t) converges to a (unique) solution of (3.3), at least for t close enough to t0 . 
To do this, we use the Contraction Mapping Principle, established in §2. We look for a fixed point of T, defined by Z t (3.5) (T y)(t) = y0 + F (s, y(s))ds. t0 Let (3.6) © ª X = u ∈ C(J, Rn ) : u(t0 ) = y0 , sup ku(t) − y0 k ≤ K . t∈J Here J = [t0 − ε, t0 + ε], where ε will be chosen, sufficiently small, below. K is picked so {y : ky − y0 k ≤ K} is contained in O, and we also suppose J ⊂ I. Then there exists M such that (3.7) sup s∈J,ky−y0 k≤K kF (s, y)k ≤ M. 28 Then, provided (3.8) ε ≤ K/M, we have (3.9) T : X → X. Now, using the Lipschitz hypothesis (3.2), we have, for t ∈ J, Z t k(T y)(t) − (T z)(t)k ≤ (3.10) Lky(s) − z(s)kds t0 ≤ ε L sup ky(s) − z(s)k s∈J assuming y and z belong to X. It follows that T is a contraction on X provided one has (3.11) ε < 1/L in addition to the hypotheses above. This proves Theorem 3.1. In view of the lower bound on the length of the interval J on which the existence theorem works, it is easy to show that the only way a solution can fail to be globally defined, i.e., to exist for all t ∈ I, is for y(t) to ‘explode to infinity’ by leaving every compact set K ⊂ O, as t → t1 , for some t1 ∈ I. Often one wants to deal with a higher order ODE. There is a standard method of reducing an nth order ODE (3.12) y (n) (t) = f (t, y, y 0 , . . . , y (n−1) ) to a first order system. One sets u = (u0 , . . . , un−1 ) with (3.13) u0 = y, uj = y (j) , and then (3.14) du/dt = (u1 , . . . , un−1 , f (t, u0 , . . . , un−1 )) = g(t, u). If y takes values in Rk , then u takes values in Rkn . If the system (3.1) is non-autonomous, i.e., if F explicitly depends on t, it can be converted to an autonomous system (one with no explicit t-dependence) as follows. Set z = (t, y). We then have (3.15) dz/dt = (1, dy/dt) = (1, F (z)) = G(z). 29 Sometimes this process destroys important features of the original system (3.1). For example, if (3.1) is linear, (3.15) might be nonlinear. Nevertheless, the trick of converting (3.1) to (3.15) has some uses. 
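The Picard iteration (3.4) can be run by hand in simple cases. The following is an illustrative sketch, not part of the text: for dy/dt = y, y(0) = 1, each iterate is a polynomial, and the k-th iterate is the degree-k Taylor polynomial of eᵗ (compare exercise 5 below). Polynomials are represented as coefficient lists.

```python
import math

# Picard iterates (3.4) for dy/dt = y, y(0) = 1, with y_0(t) = 1.
# A polynomial sum c_j t^j is stored as the list [c_0, c_1, ...].
def picard_step(coeffs):
    """Return y_k(t) = 1 + integral_0^t y_{k-1}(s) ds from y_{k-1}."""
    return [1.0] + [c / (j + 1) for j, c in enumerate(coeffs)]

y = [1.0]                  # y_0(t) = 1
for _ in range(10):
    y = picard_step(y)

# y_10 is the degree-10 Taylor polynomial of e^t: coefficients 1/j!
for j, c in enumerate(y):
    assert abs(c - 1.0 / math.factorial(j)) < 1e-15

t = 0.5
val = sum(c * t**j for j, c in enumerate(y))
assert abs(val - math.exp(t)) < 1e-9
```

The geometric-rate convergence seen here is exactly what the contraction estimate (3.10) predicts on a short interval.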
Many systems of ODE are difficult to solve explicitly. There is one very basic class of ODE which can be solved explicitly, in terms of integrals, namely the single first order linear ODE:

(3.16) dy/dt = a(t)y + b(t), y(0) = y₀,

where a(t) and b(t) are continuous real or complex valued functions. Set

(3.17) A(t) = ∫₀ᵗ a(s) ds.

Then (3.16) can be written as

(3.18) e^{A(t)} (d/dt)(e^{−A(t)} y) = b(t).

See exercise 5 below for a comment on the exponential function. From (3.18) we get

(3.19) y(t) = e^{A(t)} y₀ + e^{A(t)} ∫₀ᵗ e^{−A(s)} b(s) ds.

Compare formulas (4.42) and (5.8), in subsequent sections.

Exercises

1. Solve the initial value problem dy/dt = y², y(0) = a, given a ∈ R. On what t-interval is the solution defined?

2. Under the hypotheses of Theorem 3.1, if y solves (3.1) for t ∈ [T₀, T₁], and y(t) ∈ K, compact in O, for all such t, prove that y(t) extends to a solution for t ∈ [S₀, S₁], with S₀ < T₀ and S₁ > T₁, as stated below (3.11).

3. Let M be a compact smooth surface in Rⁿ. Suppose F : Rⁿ → Rⁿ is a smooth map (vector field), such that, for each x ∈ M, F(x) is tangent to M, i.e., the line γₓ(t) = x + tF(x) is tangent to M at x, at t = 0. Show that, if x ∈ M, then the initial value problem dy/dt = F(y), y(0) = x has a solution for all t ∈ R, and y(t) ∈ M for all t.
Hint. Locally, straighten out M to be a linear subspace of Rⁿ, to which F is tangent. Use uniqueness. Material in §2 helps do this local straightening. Reconsider this problem after reading §7.

4. Show that the initial value problem

dx/dt = −x(x² + y²), dy/dt = −y(x² + y²), x(0) = x₀, y(0) = y₀

has a solution for all t ≥ 0, but not for all t < 0, unless (x₀, y₀) = (0, 0).

5. Let a ∈ R. Show that the unique solution to u′(t) = au(t), u(0) = 1 is given by

(3.20) u(t) = Σ_{j≥0} (aʲ/j!) tʲ.

We denote this function by u(t) = e^{at}, the exponential function. We also write exp(t) = eᵗ.
Hint. Integrate the series term by term and use the fundamental theorem of calculus. Alternative.
Setting u0 (t) = 1, and using the Picard iteration method (3.4) to k P define the sequence uk (t), show that uk (t) = aj tj /j! j=0 6. Show that, for all s, t ∈ R, (3.21) ea(s+t) = eas eat . Hint. Show that u1 (t) = ea(s+t) and u2 (t) = eas eat solve the same initial value problem. 7. Show that exp : R → (0, ∞) is a diffeomorphism. We denote the inverse by log : (0, ∞) −→ R. 31 Show that v(x) = log x solves the ODE dv/dx = 1/x, v(1) = 0, and deduce that Z x (3.22) 1 8. Let a ∈ R, i = given by √ 1 dy = log x. y −1. Show that the unique solution to f 0 (t) = iaf, f (0) = 1 is (3.23) f (t) = ∞ X (ia)j j=0 j! tj . We denote this function by f (t) = eiat . Show that, for all t ∈ R, eia(s+t) = eias eiat . (3.24) 9. Write (3.25) it e = ∞ X (−1)j j=0 (2j)! Show that t 2j ∞ X (−1)j 2j+1 +i t = u(t) + iv(t). (2j + 1)! j=0 u0 (t) = −v(t), v 0 (t) = u(t). We denote these functions by u(t) = cos t, v(t) = sin t. The identity eit = cos t + i sin t (3.26) is called Euler’s formula. Auxiliary exercises on trigonometric functions We use the definitions of sin t and cos t given in problem 9 of the last exercise set. 1. Use (3.24) to derive the identities (3.27) sin(x + y) = sin x cos y + cos x sin y cos(x + y) = cos x cos y − sin x sin y. 32 2. Use (3.26)-(3.27) to show that (3.28) sin2 t + cos2 t = 1, cos2 t = ¡ 1 2 ¢ 1 + cos 2t . 3. Show that γ(t) = (cos t, sin t) is a map of R onto the unit circle S 1 ⊂ R2 with non-vanishing derivative, and, as t increases, γ(t) moves monotonically, counterclockwise. We define π to be the smallest number t1 ∈ (0, ∞) such that γ(t1 ) = (−1, 0), so cos π = −1, sin π = 0. Show that 2π is the smallest number t2 ∈ (0, ∞) such that γ(t2 ) = (1, 0), so cos 2π = 1, Show that sin 2π = 0. cos(t + 2π) = cos t, sin(t + 2π) = sin t cos(t + π) = − cos t, sin(t + π) = − sin t. Show that γ(π/2) = (0, 1), and that cos(t + 21 π) = − sin t, sin(t + 12 π) = cos t. 4. Show that sin : (− 12 π, 21 π) → (−1, 1) is a diffeomorphism. 
We denote its inverse by arcsin : (−1, 1) −→ (− 12 π, 12 π). Show that u(t) = arcsin t solves the ODE du 1 =√ , u(0) = 0. dt 1 − t2 ¡ ¢ Hint. Apply the chain rule to sin u(t) = t. Deduce that, for t ∈ (−1, 1), Z (3.29) t √ arcsin t = 0 5. Show that 1 e 3 πi = 1 2 + √ 3 2 i, 1 dx . 1 − x2 e 6 πi = √ 3 2 + 12 i. 33 √ ¢ ¡ 1 1 3 Hint. First compute 12 + 23 i and use problem 3. Then compute e 2 πi e− 3 πi . For intuition behind these formulas, look at Fig.3.1. 6. Show that sin 16 π = 12 , and hence that π = 6 where a0 = 1, an+1 = Z 1 2 √ 0 2n+1 2n+2 an . ∞ X an ³ 1 ´2n+1 dx = , 2n + 1 2 1 − x2 n=0 Show that π X an ³ 1 ´2n+1 4−k − . < 6 n=0 2n + 1 2 3(2k + 3) k Using a calculator, sum the series over 0 ≤ n ≤ 20, and verify that π ≈ 3.141592653589 · · · 7. For x 6= (k + 12 )π, k ∈ Z, set tan x = sin x . cos x Show that 1 + tan2 x = 1/ cos2 x. Show that w(x) = tan x satisfies the ODE dw = 1 + w2 , dx w(0) = 0. 8. Show that tan : (− 12 π, 12 π) → R is a diffeomorphism. Denote the inverse by arctan : R −→ (− 12 π, 12 π). Show that Z (3.30) arctan y = 0 y dx . 1 + x2 34 4. Constant coefficient linear systems; exponentiation of matrices Let A be an n × n matrix, real or complex. We consider the linear ODE (4.1) dy/dt = Ay; y(0) = y0 . In analogy to the scalar case, we can produce the solution in the form y(t) = etA y0 , (4.2) where we define etA = (4.3) ∞ k X t k=0 k! Ak . We will establish estimates implying the convergence of this infinite series for all real t, indeed for all complex t. Then term by term differentiation is valid, and gives (4.1). To discuss convergence of (4.3), we need the notion of the norm of a matrix. If u = (u1 , . . . , un ) belongs to Rn or to Cn , set, as in (1.5), (4.4) ¡ ¢1 kuk = |u1 |2 + · · · + |un |2 2 . Then, if A is an n × n matrix, set (4.5) kAk = sup{kAuk : kuk ≤ 1}. The norm (4.4) possesses the following properties: (4.6) kuk ≥ 0, kuk = 0 if and only if u = 0, (4.7) kcuk = |c| kuk, for real or complex c, (4.8) ku + vk ≤ kuk + kvk. 
The last, known as the triangle inequality, follows from Cauchy’s inequality: (4.9) |(u, v)| ≤ kuk · kvk, 35 where the inner product is (u, v) = u1 v 1 + · · · + un v n . To deduce (4.8) from (4.9), just square both sides of (4.8). To prove (4.9), use (u − v, u − v) ≥ 0 to get 2 Re (u, v) ≤ kuk2 + kvk2 . Then replace u by eiθ u to deduce 2|(u, v)| ≤ kuk2 + kvk2 . Next, replace u by tu and v by t−1 v, to get 2|(u, v)| ≤ t2 kuk2 + t−2 kvk2 , for any t > 0. Picking t so that t2 = kvk/kuk, we have Cauchy’s inequality (4.9). Granted (4.6)-(4.8), we easily get kAk ≥ 0, (4.10) kcAk = |c| kAk, kA + Bk ≤ kAk + kBk. Also, kAk = 0 if and only if A = 0. The fact that kAk is the smallest constant K such that kAuk ≤ Kkuk gives (4.11) kABk ≤ kAk · kBk. In particular, (4.12) kAk k ≤ kAkk . This makes it easy to check convergence of the power series (4.3). Power series manipulations can be used to establish the identity (4.13) esA etA = e(s+t)A . Another way to prove this is the following. Regard t as fixed; denote the left side of (4.13) as X(s) and the right side as Y (s). Then differentiation with respect to s gives, respectively (4.14) X 0 (s) = AX(s), X(0) = etA , Y 0 (s) = AY (s), Y (0) = etA , so uniqueness of solutions to the ODE implies X(s) = Y (s) for all s. We note that (4.13) is a special case of the following. 36 Proposition 4.1. et(A+B) = etA etB for all t, if and only if A and B commute. Proof. Let (4.15) Y (t) = et(A+B) , Z(t) = etA etB . Note that Y (0) = Z(0) = I, so it suffices to show that Y (t) and Z(t) satisfy the same ODE, to deduce that they coincide. Clearly Y 0 (t) = (A + B)Y (t). (4.16) Meanwhile Z 0 (t) = AetA etB + etA BetB . (4.17) Thus we get the equation (4.16) for Z(t) provided we know that etA B = BetA if AB = BA. (4.18) This folows from the power series expansion for etA , together with the fact that (4.19) Ak B = BAk for all k ≥ 0, if AB = BA. 
For the converse, if Y (t) = Z(t) for all t, then etA B = BetA , by (4.17), and hence, taking the t-derivative, etA AB = BAetA ; setting t = 0 gives AB = BA. If A is in diagonal form (4.20) a1 A= .. . an then clearly (4.21) etA = eta1 .. . etan The following result makes it useful to diagonalize A in order to compute etA . Proposition 4.2. If K is an invertible matrix and B = KAK −1 , then (4.22) etB = K etA K −1 . Proof. This follows from the power series expansion (4.3), given the observation that (4.23) B k = K Ak K −1 . 37 In view of (4.20)-(4.22), it is convenient to record a few standard results about eigenvalues and eigenvectors here. Let A be an n × n matrix over F, F = R or C. An eigenvector of A is a nonzero u ∈ F n such that (4.24) Au = λu for some λ ∈ F. Such an eigenvector exists if and only if A − λI : F n → F n is not invertible, i.e., if and only if (4.25) det(A − λI) = 0. Now (4.25) is a polynomial equation, so it always has a complex root. This proves the following. Proposition 4.3. Given an n × n matrix A, there exists at least one (complex) eigenvector u. Of course, if A is real and we know there is a real root of (4.25) (e.g., if n is odd) then a real eigenvector exists. One important class of matrices guaranteed to have real eigenvalues is the class of self adjoint matrices. The adjoint of an n × n complex matrix is specified by the identity (Au, v) = (u, A∗ v). Proposition 4.4. If A = A∗ , then all eigenvalues of A are real. Proof. Au = λu implies (4.26) λkuk2 = (λu, u) = (Au, u) = (u, Au) = (u, λu) = λkuk2 . Hence λ = λ, if u 6= 0. We now establish the following important result. Theorem 4.5. If A = A∗ , then there is an orthonormal basis of Cn consisting of eigenvectors of A. Proof. Let u1 be one unit eigenvector; Au1 = λu1 . Existence is guaranteed by Proposition 4.3. Let V = (u1 )⊥ be the orthogonal complement of the linear span of u1 . Then dim V is n − 1 and (4.27) A : V → V, if A = A∗ . The result follows by induction on n. 
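Theorem 4.5 can be checked concretely in low dimensions. The following is an illustrative sketch, not from the text: for the symmetric 2 × 2 matrix A = [[2, 1], [1, 2]] (a choice made here), the eigenvalues are 1 and 3, and the vectors (1, −1)/√2, (1, 1)/√2 form an orthonormal eigenbasis, as the theorem guarantees.

```python
import math

# Verify an orthonormal eigenbasis for the symmetric matrix A = [[2,1],[1,2]].
A = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
eigs = [(1.0, (s, -s)), (3.0, (s, s))]   # (eigenvalue, unit eigenvector)

def matvec(M, u):
    return tuple(sum(M[i][j] * u[j] for j in range(2)) for i in range(2))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

for lam, u in eigs:
    Au = matvec(A, u)
    assert all(abs(Au[i] - lam * u[i]) < 1e-12 for i in range(2))  # Au = lam u
    assert abs(dot(u, u) - 1.0) < 1e-12                            # unit vector

u1, u2 = eigs[0][1], eigs[1][1]
assert abs(dot(u1, u2)) < 1e-12                                    # orthogonal
```

With this basis, (4.20)-(4.22) give e^{tA} = K diag(eᵗ, e^{3t}) K⁻¹, where K has u₁, u₂ as columns.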
Corollary 4.6. If A = At is a real symmetric matrix, then there is an orthonormal basis of Rn consisting of eigenvectors of A. Proof. By Proposition 4.4 and the remarks following Proposition 4.3, there is one unit eigenvector u1 ∈ Rn . The rest of the proof is as above. 38 The proof of the last four results rests on the fact that every nonconstant polynomial has a complex root. This is the Fundamental Theorem of Algebra. Two proofs of this result are given in §17 and §20. An alternative approach to Proposition 4.3 when A = A∗ , yielding Proposition 4.4-Corollary 4.6, is given in one of the exercises below. Given an ODE in upper triangular form, a11 ∗ ∗ .. (4.28) dy/dt = . ∗ y ann you can solve the last ODE for yn , as it is just dyn /dt = ann yn . Then you get a single inhomogeneous ODE for yn−1 , which can be solved as demonstrated in (0.6)(0.9), and you can continue inductively to solve. Thus it is often useful to be able to put an n × n matrix A in upper triangular form, with respect to a convenient choice of basis. We will establish two results along these lines. The first is due to Schur. Theorem 4.7. For any n × n matrix A, there is an orthonormal basis u1 , . . . , un of Cn with respect to which A is in upper triangular form. This result is equivalent to: Proposition 4.8. For any A, there is a sequence of vector spaces Vj of dimension j, contained in Cn , with (4.29) Vn ⊃ Vn−1 ⊃ · · · ⊃ V1 and (4.30) A : Vj −→ Vj . To see the equivalence, if we are granted (4.29)-(4.30), pick un ⊥Vn−1 , a unit vector, then pick un−1 ∈ Vn−1 such that un−1 ⊥Vn−2 , and so forth. Meanwhile, Proposition 4.8 is a simple inductive consequence of the following result. Lemma 4.9. For any matrix A acting on Vn , there is a linear subspace Vn−1 , of codimension 1, such that A : Vn−1 → Vn−1 . Proof. Use Proposition 4.3, applied to A∗ . There is a vector v1 such that A∗ v1 = λv1 . Let Vn−1 = (v1 )⊥ . This completes the proof of the Lemma, hence of Theorem 4.7. 
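The back-substitution scheme for an upper triangular system (4.28) can be tested numerically. The following is a sketch under assumptions made here (a 2 × 2 system with distinct diagonal entries a ≠ d): solve the last equation first, feed it into the first via the integral formula (3.19), and compare with the truncated power series (4.3) for e^{tA}.

```python
import math

# Upper triangular system dy/dt = Ay, A = [[a, b], [0, d]], a != d.
a, b, d = 1.0, 2.0, 3.0            # illustrative choices
y0 = (1.0, 1.0)

def exact(t):
    """Back-substitution: y2 = e^{dt} y2(0); then (3.19) for y1."""
    y2 = math.exp(d * t) * y0[1]
    y1 = (math.exp(a * t) * y0[0]
          + b * (math.exp(d * t) - math.exp(a * t)) / (d - a) * y0[1])
    return (y1, y2)

def expm_times(t, v, terms=60):
    """e^{tA} v via the truncated power series (4.3)."""
    A = [[a, b], [0.0, d]]
    out = [0.0, 0.0]
    term = list(v)                 # holds (tA)^k v / k!, starting at k = 0
    for k in range(terms):
        out = [out[i] + term[i] for i in range(2)]
        term = [t * sum(A[i][j] * term[j] for j in range(2)) / (k + 1)
                for i in range(2)]
    return tuple(out)

t = 0.7
ye, ys = exact(t), expm_times(t, y0)
assert all(abs(ye[i] - ys[i]) < 1e-9 for i in range(2))
```

When a = d the formula above degenerates, and the solution involves t e^{at}; that is the phenomenon analyzed in Lemma 4.10 below.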
Let's look more closely at what you can say about solutions to an ODE which has been put in the form (4.28). As mentioned, we can obtain yⱼ inductively by solving nonhomogeneous scalar ODEs

(4.31) dyⱼ/dt = aⱼⱼ yⱼ + bⱼ(t),

where bⱼ(t) is a linear combination of y_{j+1}(t), . . . , y_n(t), and the formula (0.9) applies, with A(t) = aⱼⱼ t. We have y_n(t) = C e^{a_{nn} t}, so b_{n−1}(t) is a multiple of e^{a_{nn} t}. If a_{n−1,n−1} ≠ a_{nn}, y_{n−1}(t) will be a linear combination of e^{a_{nn} t} and e^{a_{n−1,n−1} t}, but if a_{n−1,n−1} = a_{nn}, y_{n−1}(t) may be a linear combination of e^{a_{nn} t} and t e^{a_{nn} t}. Further integration will involve ∫ p(t) e^{αt} dt, where p(t) is a polynomial. That no other sort of function will arise is guaranteed by the following result.

Lemma 4.10. If p(t) ∈ P_n, the space of polynomials of degree ≤ n, and α ≠ 0, then

(4.32) ∫ p(t) e^{αt} dt = q(t) e^{αt} + C

for some q(t) ∈ P_n.

Proof. The map p = Tq defined by (d/dt)(q(t) e^{αt}) = p(t) e^{αt} is a map on P_n; in fact we have

(4.33) Tq(t) = αq(t) + q′(t).

It suffices to show T : P_n → P_n is invertible. But D = d/dt is nilpotent on P_n; D^{n+1} = 0. Hence

T⁻¹ = α⁻¹(I + α⁻¹D)⁻¹ = α⁻¹(I − α⁻¹D + · · · + α⁻ⁿ(−D)ⁿ).

Note that this gives a neat formula for the integral (4.32). For example,

(4.34) ∫ tⁿ e^{−t} dt = −(tⁿ + ntⁿ⁻¹ + · · · + n!) e^{−t} + C = −n! (1 + t + t²/2 + · · · + tⁿ/n!) e^{−t} + C.

This could also be established by integration by parts and induction. Of course, when α = 0 in (4.32) the result is different; q(t) is a polynomial of degree n + 1.

Now the implication for the solution to (4.28) is that all the components of y(t) are products of polynomials and exponentials. By Theorem 4.7, we can draw the same conclusion about the solution to dy/dt = Ay for any n × n matrix A. We can formally state the result as follows.

Proposition 4.11. For any n × n matrix A,

(4.35) e^{tA} v = Σⱼ e^{λⱼ t} vⱼ(t),

where {λⱼ} is the set of eigenvalues of A and vⱼ(t) are Cⁿ-valued polynomials. All the vⱼ(t) are constant when A is diagonalizable.
40 To see that the λj are the eigenvalues of A, note that, in the upper triangular case, only the exponentials eajj t arise, and in that case the eigenvalues are precisely the diagonal elements. If we let Eλ denote the space of Cn -valued functions of the form V (t) = eλt v(t), where v(t) is a Cn -valued polynomial, then Eλ is invariant under the action of both d/dt and A, hence of d/dt − A. Hence, if a sum V1 (t) + · · · + Vk (t), Vj (t) ∈ Eλj (with λj s distinct) is annihilated by d/dt − A, so is each term in this sum. Therefore, if (4.35) is a sum over the distinct eigenvalues λj of A, it follows that each term eλj t vj (t) is annihilated by d/dt − A, or equivalently is of the form etA wj where wj = vj (0). This leads to the following conclusion. Set (4.36) Gλ = {v ∈ Cn : etA v = etλ v(t), v(t) polynomial}. Then Cn has a direct sum decomposition (4.37) Cn = Gλ1 + · · · + Gλk where λ1 , . . . , λk are the distinct eigenvalues of A. Furthermore, each Gλj is invariant under A, and (4.38) Aj = A|Gλj has exactly one eigenvalue, λj . This last statement holds because, when v ∈ Gλj , etA v involves only the exponential eλj t . We say Gλj is the generalized eigenspace of A, with eigenvalue λj . Of course, Gλj contains ker (A − λj I). Now Bj = Aj − λj I has only 0 as an eigenvalue. It is subject to the following result. Lemma 4.12. If B : Ck → Ck has only 0 as an eigenvalue, then B is nilpotent, i.e., (4.39) B m = 0 for some m. Proof. Let Wj = B j (Ck ); then Ck ⊃ W1 ⊃ W2 ⊃ · · · is a sequence of finite dimensional vector spaces, each invariant under B. This sequence must stabilize, so for some m, B : Wm → Wm bijectively. If Wm 6= 0, B has a nonzero eigenvalue. We next discuss the famous Jordan normal form of a complex n × n matrix. The result is the following. Theorem 4.13. If A is an n × n matrix, then there is a basis of Cn with respect to which A becomes a direct sum of blocks of the form λj 1 .. . λ j (4.40) . . . 
1 λj 41 In light of the decomposition (4.37) and Lemma 4.12, it suffices to establish the Jordan normal form for a nilpotent matrix B. Given v0 ∈ Ck , let m be the smallest integer such that B m v0 = 0; m ≤ k. If m = k, then {v0 , Bv0 , . . . , B m−1 v0 } gives a basis of Ck putting B in Jordan normal form. We then say v0 is a cyclic vector for B, and Ck is generated by v0 . We call {v0 , . . . , B m−1 v0 } a string. We will have a Jordan normal form precisely if we can write Ck as a direct sum of cyclic subspaces. We establish that this can be done by induction on the dimension. Thus, inductively, we can suppose W1 = B(Ck ) is a direct sum of cyclic subspaces, so W1 has a basis that is a union of strings, let’s say a union of d strings {vj , Bvj , . . . , B `j vj }, 1 ≤ j ≤ d. In this case, ker B ∩ W1 = N1 has dimension d, and the vectors B `j vj , 1 ≤ j ≤ d, span N1 . Furthermore, each vj has the form vj = Bwj for some wj ∈ Ck . Now dim ker B = k −r ≥ d, where r = dim W1 . Let {z1 , . . . , zk−r−d } span a subspace of ker B complementary to N1 . Then the strings {wj , vj = Bwj , . . . , B `j vj }, 1 ≤ j ≤ d, and {z1 }, . . . , {zk−r−d } generate cyclic subspaces whose direct sum is Ck , giving the Jordan normal form. The argument above is part of an argument of Filippov. In fact, Filippov’s proof contains a further clever twist, enabling one to prove Theorem 4.13 without using the decomposition (4.37). However, since we got this decomposition almost for free as a byproduct of the ODE analysis in Proposition 4.11, I decided to make use of it. See Strang [Str] for Filippov’s proof. We have seen how constructing etA solves the equation (4.1). We can also use it to solve an inhomogeneous equation, of the form (4.41) dy/dt = Ay + b(t); y(0) = y0 . Direct calculation shows that the solution is given by Z (4.42) t tA y(t) = e y0 + e(t−s)A b(s) ds. 0 Note how this partially generalizes the formula (3.19). 
This formula is a special case of Duhamel’s Principle, which will be discussed further in §5. Exercises 1. In addition to the ‘operator norm’ kAk of an n × n matrix, defined by (4.5), we consider the ‘Hilbert-Schmidt norm’ kAkHS , defined by kAk2HS = X j,k |ajk |2 , 42 ¡ ¢ if A = ajk . Show that kAk ≤ kAkHS . Hint. If r1 , . . . , rn are the rows of A, then for u ∈ Cn , Au has entries rj · u, 1 ≤ j ≤ n. Use Cauchy’s inequality (4.9) to estimate |rj · u|2 . Show also that X |ajk |2 ≤ kAk2 for each k, j and hence kAk2HS ≤ nkAk2 . Hint. kAk ≥ kAek k for each standard basis vector ek . 2. Show that, in analogy with (4.11), we have kABkHS ≤ kAkHS kBkHS . Indeed, show that kABkHS ≤ kAk · kBkHS , the first factor on the right being the operator norm kAk. 3. Let X be an n × n matrix. Show that det eX = eT r X . Hint. Use a normal form. In problems 4-5 we will deal with the set SO(n) of real n × n matrices T such that T t T = I and Det T > 0 (hence Det T = 1). Also, let Skew(n) denote the set of real n × n skew-symmetric matrices, and let M (n) denote the set of real n × n real matrices. Set Exp(X) = eX . 4. Show that Exp : Skew(n) −→ SO(n). Show that this map is onto. 43 5. Show that the map Exp : Skew(n) → M (n) satisfies D Exp(0)A = A, for A ∈ Skew(n), so D Exp(0) is the inclusion ι : Skew(n) ,→ M (n). n−1 6. Let A : Rn → Rn be symmetric. = {x ∈ ¯ Let Q(x) = (Ax, x). Let v1 ∈ S n R : |x| = 1} be a point where Q¯S n−1 assumes a maximum. Show that v1 is an eigenvector of A. Hint. Show that ∇Q(v1 ) is parallel to ∇E(v1 ), where E(x) = (x, x). Use this result to give an alternative proof of Corollary 4.6. Extend this argument to establish Theorem 4.5. 44 5. Variable coefficient linear systems of ODE. Duhamel’s principle Let A(t) be a continuous n × n matrix valued function of t ∈ I. We consider the general linear homogeneous ODE (5.1) dy/dt = A(t)y, y(0) = y0 . The general theory of §2 gives local solutions. We claim the solutions here exist for all t ∈ I. 
This follows from the discussion after the proof of Theorem 3.1, together with the following estimate on the solution to (5.1).

Proposition 5.1. If ‖A(t)‖ ≤ M for t ∈ I, then the solution to (5.1) satisfies

(5.2) ‖y(t)‖ ≤ e^{M|t|} ‖y₀‖.

It suffices to prove this for t ≥ 0. Then z(t) = e^{−Mt} y(t) satisfies

(5.3) dz/dt = C(t)z, z(0) = y₀,

with C(t) = A(t) − M. Hence C(t) satisfies

(5.4) Re (C(t)u, u) ≤ 0 for all u ∈ Cⁿ.

Thus (5.2) is a consequence of the next energy estimate, of independent interest.

Proposition 5.2. If z solves (5.3) and if (5.4) holds for C(t), then ‖z(t)‖ ≤ ‖z(0)‖ for t ≥ 0.

Proof. We have

(5.5) (d/dt)‖z(t)‖² = (z′(t), z(t)) + (z(t), z′(t)) = 2 Re (C(t)z(t), z(t)) ≤ 0.

Thus we have global existence for (5.1). There is a matrix valued function S(t, s) such that the unique solution to (5.1) satisfies

(5.6) y(t) = S(t, s)y(s).

Using this solution operator, we can treat the inhomogeneous equation

(5.7) dy/dt = A(t)y + b(t), y(0) = y₀.

Indeed, direct calculation yields

(5.8) y(t) = S(t, 0)y₀ + ∫₀ᵗ S(t, s)b(s) ds.

This identity is known as Duhamel's Principle. We next prove an identity which might be called the 'noncommutative fundamental theorem of calculus.'

Proposition 5.3. If A(t) is a continuous matrix function and S(t, 0) is defined as above, then

(5.9) S(t, 0) = lim_{n→∞} e^{(t/n)A((n−1)t/n)} · · · e^{(t/n)A(0)},

where there are n factors on the right.

Proof. To prove this at t = T, divide the interval [0, T] into n equal parts. Set y = S(t, 0)y₀ and define z_n(t) by z_n(0) = y₀, and

(5.10) dz_n/dt = A(jT/n) z_n for t ∈ (jT/n, (j + 1)T/n),

requiring continuity across each endpoint of these intervals. We see that

(5.11) dz_n/dt = A(t) z_n + R_n(t)

with

(5.12) ‖R_n(t)‖ ≤ δ_n ‖z_n(t)‖, δ_n → 0 as n → ∞.

Meanwhile we see that, on [0, T], ‖z_n(t)‖ ≤ C_T ‖y₀‖. We want to compare z_n(t) and y(t). We have

(5.13) (d/dt)(z_n − y) = A(t)(z_n − y) + R_n(t); z_n(0) − y(0) = 0.
Hence Duhamel’s principle gives Z (5.14) t zn (t) − y(t) = S(t, s)Rn (s)ds, 0 and since we have an a priori bound kS(t, s)k ≤ K for |s|, |t| ≤ T, we get (5.15) kzn (t) − y(t)k ≤ KT CT δn ky0 k → 0 as n → ∞, |t| ≤ T. In particular, zn (T ) → y(T ) as n → ∞. Since zn (T ) is given by the right side of (5.9) with t = T, this proves (5.9). 46 Exercises. 1. Let A(t) and X(t) be n × n matrices, satisfying dX/dt = A(t)X. We form the Wronskian W (t) = det X(t). Show that W satisfies the ODE dW/dt = a(t)W, a(t) = T r A(t). Hint. Use exercise 2 of §1 to write dW/dt = T r(Cof(X)t dX/dt), and use Cramer’s formula, (det X)X −1 = Cof(X)t . Alternative. Write X(t + h) = ehA(t) X(t) + O(h2 ) and use exercise 3 of §4 to write det ehA(t) = eha(t) , hence W (t + h) = eha(t) W (t) + O(h2 ). 2. Let u(t) = ky(t)k2 , for a solution y to (5.1). Show that (5.16) provided kA(t)k ≤ equality du/dt ≤ M (t)u(t), 1 2 M (t). Such a differential inequality implies the integral inZ (5.17) u(t) ≤ A + t M (s)u(s)ds, t ≥ 0, 0 with A = u(0). The following is a Gronwall inequality, namely if (5.17) holds for a real valued function u, then, provided M (s) ≥ 0, we have, for t ≥ 0, Z t N (t) (5.18) u(t) ≤ Ae , N (t) = M (s)ds. 0 Prove this. Note that the quantity dominating u(t) in (5.18) is equal to U, solving U (0) = A, dU/dt = M (t)U (t). 3. Generalize the Gronwall inequality of exercise 2 as follows. Let U be a real valued solution to (5.19) dU/dt = F (t, U ), U (0) = A, and let u satisfy the integral inequality Z t (5.20) u(t) ≤ A + F (s, u(s))ds. 0 47 Then prove that (5.21) u(t) ≤ U (t), for t ≥ 0, provided ∂F/∂u ≥ 0. Show that this continues to hold if we replace (5.19) by Z t (5.19’) U (t) ≥ A + F (s, U (s))ds. 0 Hint. Set v = u − U. Then (5.19’)-(5.20) imply Z t £ ¤ v(t) ≤ F (s, u(s)) − F (s, U (s)) ds 0 Z t = M (s)v(s)ds 0 where Z 1 M (s) = ¡ ¢ Fu s, τ u(s) + (1 − τ )U (s) dτ. 0 Thus (5.17) applies, with A = 0. 4. 
Let x(t) be a smooth curve in R3 ; assume it is parametrized by arclength, so T (t) = x0 (t) has unit length; T (t) · T (t) = 1. Differentiating, we have T 0 (t)⊥T (t). The curvature is defined to be κ(t) = kT 0 (t)k. If κ(t) 6= 0, we set N (t) = T 0 /kT 0 k, so T 0 = κN, and N is a unit vector orthogonal to T. We define B(t) by (5.22) B = T × N. Note that (T, N, B) form an orthonormal basis of R3 for each t, and (5.23) T = N × B, N = B × T. By (5.22) we have B 0 = T × N 0 . Deduce that B 0 is orthogonal to both T and B, hence parallel to N. We set B 0 = −τ N, for smooth τ (t), called the torsion. 5. From N 0 = B 0 × T + B × T 0 and the formulas for T 0 and B 0 above, deduce the following system, called the Frenet-Serret formula: T0 = (5.24) κN N 0 = −κT 0 B = + τB − τN 48 Form the 3 × 3 matrix (5.25) 0 A(t) = κ 0 −κ 0 τ 0 −τ 0 and deduce that the 3 × 3 matrix F (t) whose columns are T, N, B : F = (T, N, B) satisfies the ODE dF/dt = F A(t). 6. Derive the following converse to the Frenet-Serret formula. Let T (0), N (0), B(0) be an orthonormal set in R3 , such that B(0) = T (0) × N (0), let κ(t) and τ (t) be given smooth functions, and solve the system (5.24). Show that there is a unique curve x(t) such that x(0) = 0 and T (t), N (t), B(t) are associated to x(t) by the construction in problem 4, so in particular the curve has curvature κ(t) and torsion τ (t). Hint. To prove that (5.22)-(5.23) hold for all t, consider the next exercise. 7. Let A(t) be a smooth n × n real matrix function which is skew adjoint for all t (of which (5.25) is an example). Suppose F (t) is a real n × n matrix function satisfying dF/dt = F A(t). If F (0) is an orthogonal matrix, show that F (t) is orthogonal for all t. Hint. Set J(t) = F (t)∗ F (t). Show that J(t) and J0 (t) = I both solve the initial value problem dJ/dt = [J, A(t)], J(0) = I. 8. Let U1 = T, U2 = N, U3 = B, and set ω(t) = τ T + κB. Show that (5.24) is equivalent to Uj0 = ω × Uj , 1 ≤ j ≤ 3. 9. Suppose τ and κ are constant. 
Show that ω is constant, so T(t) satisfies the constant coefficient ODE

T′(t) = ω × T(t).

Note that ω · T(0) = τ. Show that, after a translation and rotation, x(t) takes the form

γ(t) = (κλ^{−2} cos λt, κλ^{−2} sin λt, τλ^{−1} t), λ² = κ² + τ².

Auxiliary exercises on the cross product

1. If u, v ∈ R³, show that the formula

(5.26) w · (u × v) = Det
[ w1  u1  v1 ]
[ w2  u2  v2 ]
[ w3  u3  v3 ]

for u × v = κ(u, v) defines uniquely a bilinear map κ : R³ × R³ → R³. Show that it satisfies

i × j = k, j × k = i, k × i = j,

where {i, j, k} is the standard basis of R³.

2. We say T ∈ SO(3) provided that T is a real 3 × 3 matrix satisfying T^t T = I and Det T > 0 (hence Det T = 1). Show that

(5.27) T ∈ SO(3) =⇒ Tu × Tv = T(u × v).

Hint. Multiply the 3 × 3 matrix in problem 1 on the left by T.

3. Show that, if θ is the angle between u and v in R³, then

(5.28) |u × v| = |u| |v| |sin θ|.

Hint. Check this for u = i, v = ai + bj, and use problem 2 to show this suffices.

4. Show that κ : R³ → Skew(3), given by

(5.29) κ(y1, y2, y3) =
[  0   −y3   y2 ]
[  y3   0   −y1 ]
[ −y2   y1   0  ]

satisfies

(5.30) Kx = y × x, K = κ(y).

Show that, with [A, B] = AB − BA,

(5.31) κ(x × y) = [κ(x), κ(y)], Tr (κ(x)κ(y)^t) = 2 x · y.

6. Dependence of solutions on initial data, and other parameters

We consider how a solution to an ODE depends on the initial conditions. Consider a nonlinear system

(6.1) dy/dt = F(y), y(0) = x.

As noted in the introduction, we can consider an autonomous system, such as (6.1), without loss of generality. Suppose F : U → R^n is smooth, U ⊂ R^n open; for simplicity we assume U is convex. Say y = y(t, x). We want to examine smoothness in x.

Note that formally differentiating (6.1) with respect to x suggests that W = D_x y(t, x) satisfies an ODE called the linearization of (6.1):

(6.2) dW/dt = DF(y)W, W(0) = I.

In other words, w(t, x) = D_x y(t, x)w_0 satisfies

(6.3) dw/dt = DF(y)w, w(0) = w_0.

To justify this, we want to compare w(t) and

(6.4) z(t) = y_1(t) − y(t) = y(t, x + w_0) − y(t, x).
It would be convenient to show that z satisfies an ODE similar to (6.3). Indeed, z(t) satisfies dz/dt = F (y1 ) − F (y) = Φ(y1 , y)z (6.5) z(0) = w0 , where Z (6.6) Φ(y1 , y) = 1 ¡ ¢ DF τ y1 + (1 − τ )y dτ. 0 If we assume (6.7) kDF (u)k ≤ M for u ∈ U, 51 then the solution operator S(t, 0) of the linear ODE d/dt − B(t), with B(y) = Φ(y1 (t), y(t)), satisfies a bound kS(t, 0)k ≤ e|t|M as long as y(t), y1 (t) ∈ U. Hence ky1 (t) − y(t)k ≤ e|t|M kw0 k. (6.8) This establishes that y(t, x) is Lipschitz in x. To continue, since Φ(y, y) = DF (y), we rewrite (6.5) as dz/dt = Φ(y + z, y)z = DF (y)z + R(y, z) (6.9) w(0) = w0 . where (6.10) F ∈ C 1 (U ) =⇒ kR(y, z)k = o(kzk) = o(kw0 k). Now comparing the ODE (6.9) with (6.3), we have (6.11) (d/dt)(z − w) = DF (y)(z − w) + R(y, z) (z − w)(0) = 0. Then Duhamel’s principle yields Z (6.12) z(t) − w(t) = t ¡ ¢ S(t, s)R y(s), z(s) ds, 0 so by the bound kS(t, s)k ≤ e|t−s|M and (6.10) we have (6.13) z(t) − w(t) = o(kw0 k). This is precisely what is required to show that y(t, x) is differentiable with respect to x, with derivative W = Dx y(t, x) satisfying (6.2). We state our first result. Proposition 6.1. If F ∈ C 1 (U ), and if solutions to (6.1) exist for t ∈ (−T0 , T1 ), then for each such t, y(t, x) is C 1 in x, with derivative Dx y(t, x) = W (t, x) satisfying (6.2). So far we have shown that y(t, x) is both Lipschitz and differentiable in x, but the continuity of W (t, x) in x follows easily by comparing the ODEs of the form (6.2) for W (t, x) and W (t, x + w0 ), in the spirit of the analysis of (6.11). If F possesses further smoothness, we can obtain higher differentiability of y(t, x) in x by the following trick. Couple (6.1) and (6.2), to get an ODE for (y, W ) : (6.14) dy/dt = F (y) dW/dt = DF (y)W with initial condition (6.15) y(0) = x, W (0) = I. We can reiterate the argument above, getting results on Dx (y, W ), i.e., on Dx2 y(t, x), and continue, proving: 52 Proposition 6.2. If F ∈ C k (U ), then y(t, x) is C k in x. 
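Proposition 6.1 can be checked numerically: integrate the coupled system (6.14)-(6.15) and compare W(t, x)w₀ with a difference quotient of the flow in the direction w₀. A minimal sketch, with an illustrative pendulum-type field F (my choice, not from the text), integrated by a standard RK4 loop:

```python
import numpy as np

def F(y):
    # illustrative field (a pendulum): F(y) = (y2, -sin y1)
    return np.array([y[1], -np.sin(y[0])])

def DF(y):
    # Jacobian of F
    return np.array([[0.0, 1.0], [-np.cos(y[0]), 0.0]])

def flow(x, T=1.0, steps=2000):
    # RK4 for dy/dt = F(y), y(0) = x
    y = np.array(x, float)
    h = T / steps
    for _ in range(steps):
        k1 = F(y); k2 = F(y + h/2*k1); k3 = F(y + h/2*k2); k4 = F(y + h*k3)
        y = y + (h/6)*(k1 + 2*k2 + 2*k3 + k4)
    return y

def flow_with_W(x, T=1.0, steps=2000):
    # RK4 for the coupled system (6.14)-(6.15): dy/dt = F(y), dW/dt = DF(y)W
    y, W = np.array(x, float), np.eye(2)
    h = T / steps
    for _ in range(steps):
        ky1, kW1 = F(y), DF(y) @ W
        ky2, kW2 = F(y + h/2*ky1), DF(y + h/2*ky1) @ (W + h/2*kW1)
        ky3, kW3 = F(y + h/2*ky2), DF(y + h/2*ky2) @ (W + h/2*kW2)
        ky4, kW4 = F(y + h*ky3), DF(y + h*ky3) @ (W + h*kW3)
        y = y + (h/6)*(ky1 + 2*ky2 + 2*ky3 + ky4)
        W = W + (h/6)*(kW1 + 2*kW2 + 2*kW3 + kW4)
    return y, W

x = np.array([0.5, 0.2])
w0 = np.array([1.0, -0.3])
eps = 1e-6
yT, W = flow_with_W(x)
fd = (flow(x + eps*w0) - flow(x)) / eps   # difference quotient of the flow
err = np.linalg.norm(W @ w0 - fd)
```

The mismatch err is of size eps, consistent with y(t, x) being C¹ in x with derivative W satisfying (6.2).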
Similarly we can consider dependence of the solution to a system of the form

(6.16) dy/dt = F(τ, y), y(0) = x,

on a parameter τ, assuming F is smooth jointly in τ, y. This result can be deduced from the previous one by the following trick: consider the ODE

(6.17) dy/dt = F(z, y), dz/dt = 0; y(0) = x, z(0) = τ.

Thus we get smoothness of y(t, τ, x) in (τ, x). As one special case, let F(τ, y) = τF(y). In this case y(t_0, τ, x) = y(τt_0, 1, x), so we can improve the conclusion of Proposition 6.2 to:

(6.18) F ∈ C^k(U) =⇒ y ∈ C^k jointly in (t, x).

It is also true that if F is analytic, then one has analytic dependence of solutions on parameters, especially on t, so that power series techniques work in that case. One approach to the proof of this is given in the exercises below.

Exercises

1. Let Ω be open in R^{2n}, identified with C^n via z = x + iy. Let X : Ω → R^{2n} have components X = (a_1, . . . , a_n, b_1, . . . , b_n), where a_j(x, y) and b_j(x, y) are real valued. Denote the solution to du/dt = X(u), u(0) = z, by u(t, z). Assume f_j(z) = a_j(z) + ib_j(z) is holomorphic in z, which is to say its derivative commutes with J, acting on R^{2n} = C^n as multiplication by i. Show that, for each t, u(t, z) is holomorphic in z, i.e., D_z u(t, z) commutes with J.
Hint. Use the linearized equation (6.2) to show that K(t) = [W(t), J] satisfies the ODE dK/dt = DX(z)K, K(0) = 0.

2. If O ⊂ R^n is open and F : O → R^n is real analytic, show that the solution y(t, x) to (6.1) is real analytic in x.
Hint. With F = (a_1, . . . , a_n), take holomorphic extensions f_j(z) of a_j(x) and use exercise 1. Using the trick leading to (6.18), show that y(t, x) is real analytic jointly in (t, x).

7. Flows and vector fields

Let U ⊂ R^n be open. A vector field on U is a smooth map

(7.1) X : U −→ R^n.

Consider the corresponding ODE

(7.2) dy/dt = X(y), y(0) = x,

with x ∈ U. A curve y(t) solving (7.2) is called an integral curve of the vector field X. It is also called an orbit.
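The notions of integral curve and flow can be illustrated numerically. A minimal sketch (the rotation field X(x, y) = (−y, x) is an illustrative choice of mine, not from the text; its integral curves are circles), checking the computed integral curve against the exact solution and checking the composition law for the flow:

```python
import math

def X(p):
    # illustrative vector field X(x, y) = (-y, x); integral curves are circles
    return (-p[1], p[0])

def flow(t, p, steps=2000):
    # RK4 integration of dy/dt = X(y), y(0) = p, up to time t
    h = t / steps
    x = p
    for _ in range(steps):
        k1 = X(x)
        k2 = X((x[0] + h/2*k1[0], x[1] + h/2*k1[1]))
        k3 = X((x[0] + h/2*k2[0], x[1] + h/2*k2[1]))
        k4 = X((x[0] + h*k3[0], x[1] + h*k3[1]))
        x = (x[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             x[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
    return x

p = (1.0, 0.0)
q = flow(0.8, p)
exact = (math.cos(0.8), math.sin(0.8))          # the exact integral curve
err_exact = math.hypot(q[0] - exact[0], q[1] - exact[1])

# composition law for the flow: following the field for time 0.3 and then
# 0.5 should agree with following it for time 0.8
a = flow(0.5, flow(0.3, p))
err_group = math.hypot(a[0] - q[0], a[1] - q[1])
```

Both errors are at the level of the integrator's accuracy, consistent with uniqueness of integral curves.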
For fixed t, write (7.3) t (x). y = y(t, x) = FX t The locally defined FX , mapping (a subdomain of) U to U, is called the flow generated by the vector field X. The vector field X defines a differential operator on scalar functions, as follows: £ ¤ h x) − f (x) LX f (x) = lim h−1 f (FX h→0 (7.4) ¯ t = (d/dt)f (FX x)¯t=0 . We also use the common notation (7.5) LX f (x) = Xf, that is, we apply X to f as a first order differential operator. Note that, if we apply the chain rule to (7.4) and use (7.2), we have (7.6) LX f (x) = X(x) · ∇f (x) X = aj (x)∂f /∂xj , P if X = aj (x)ej , with {ej } the standard basis of Rn . In particular, using the notation (7.5), we have (7.7) aj (x) = Xxj . In the notation (7.5), (7.8) X= X aj (x)∂/∂xj . We note that X is a derivation, i.e., a map on C ∞ (U ), linear over R, satisfying (7.9) X(f g) = (Xf )g + f (Xg). Conversely, any derivation on C ∞ (U ) defines a vector field, i.e., has the form (7.8), as we now show. 54 Proposition 7.1. If X is a derivation on C ∞ (U ), then X has the form (7.8). P Proof. Set aj (x) = Xxj , X # = aj (x)∂/∂xj , and Y = X − X # . Then Y is a derivation satisfying Y xj = 0 for each j; we aim to show that Y f = 0 for all f. Note that, whenever Y is a derivation 1 · 1 = 1 ⇒ Y · 1 = 2Y · 1 ⇒ Y · 1 = 0, i.e., Y annihilates constants. Thus in this case Y annihilates all polynomials of degree ≤ 1. Now we show Y f (p) = 0 for all p ∈ U. Without loss of generality, we can suppose R1 p = 0, the origin. Then, by (1.8), we can take bj (x) = 0 (∂j f )(tx)dt, and write X f (x) = f (0) + bj (x)xj . It immediately follows that Y f vanishes at 0, so the proposition is proved. A fundamental fact about vector fields is that they can be ‘straightened out’ near points where they do not vanish. To see this, suppose a smooth vector field X is given on U such that, for a certain p ∈ U, X(p) 6= 0. Then near p there is a hypersurface M which is nowhere tangent to X. 
We can choose coordinates near p so that p is the origin and M is given by {x_n = 0}. Thus we can identify a point x′ ∈ R^{n−1} near the origin with x′ ∈ M. We can define a map

(7.10) F : M × (−t_0, t_0) −→ U

by

(7.11) F(x′, t) = F_X^t(x′).

This is C^∞ and has surjective derivative, so by the Inverse Function Theorem it is a local diffeomorphism. This defines a new coordinate system near p, in which the flow generated by X has the form

(7.12) F_X^s(x′, t) = (x′, t + s).

If we denote the new coordinates by (u_1, . . . , u_n), we see that the following result is established.

Theorem 7.2. If X is a smooth vector field on U with X(p) ≠ 0, then there exists a coordinate system (u_1, . . . , u_n) centered at p (so u_j(p) = 0) with respect to which

(7.13) X = ∂/∂u_n.

We now make some elementary comments on vector fields in the plane. Here, the object is to find the integral curves of

(7.14) f(x, y)∂/∂x + g(x, y)∂/∂y,

that is, to solve

(7.15) x′ = f(x, y), y′ = g(x, y).

This implies

(7.16) dy/dx = g(x, y)/f(x, y),

or, written in differential form notation (which will be discussed more thoroughly in §13),

(7.17) g(x, y)dx − f(x, y)dy = 0.

Suppose we manage to find an explicit solution to (7.16):

(7.18) y = ϕ(x), x = ψ(y).

Often it is not feasible to do so, but beginning chapters of elementary texts on ODE give methods for doing so in some cases. Then the original system becomes

(7.19) x′ = f(x, ϕ(x)), y′ = g(ψ(y), y).

In other words, we have reduced ourselves to integrating vector fields on the line. We have

(7.20) ∫ [f(x, ϕ(x))]^{−1} dx = t + C_1, ∫ [g(ψ(y), y)]^{−1} dy = t + C_2.

If (7.18) can be explicitly achieved, it may be that one integral or the other in (7.20) is easier to evaluate. With either x or y solved as a function of t, the other is determined by (7.18).
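The reduction (7.16)-(7.20) can be carried out concretely. A small sketch for the illustrative field X = y ∂/∂x + x ∂/∂y (my choice, not from the text): here (7.16) gives dy/dx = x/y, so y = ϕ(x) = (x² + c)^{1/2} with c = y² − x², and the quadrature ∫ (x² + c)^{−1/2} dx = t + C gives x(t) = sinh(t + C) when c = 1. The code checks this against direct RK4 integration of (7.15):

```python
import math

# X = y ∂/∂x + x ∂/∂y, i.e. x' = y, y' = x
def rhs(p):
    return (p[1], p[0])

x, y = 0.0, 1.0          # initial point; here c = y² - x² = 1, C = 0
h, steps = 1e-4, 10000   # integrate up to t = 1
for _ in range(steps):
    k1 = rhs((x, y))
    k2 = rhs((x + h/2*k1[0], y + h/2*k1[1]))
    k3 = rhs((x + h/2*k2[0], y + h/2*k2[1]))
    k4 = rhs((x + h*k3[0], y + h*k3[1]))
    x, y = (x + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

t = 1.0
err_x = abs(x - math.sinh(t))     # prediction of the quadrature (7.20)
err_c = abs(y*y - x*x - 1.0)      # y² - x² is constant on integral curves
```

The integrated curve matches the quadrature formula, and the relation y² − x² = c defining ϕ is preserved along the curve.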
One case when the planar vector field can be integrated explicitly (locally) is when there is a smooth u, with nonvanishing gradient, explicitly given, such that (7.21) Xu = 0 where X is the vector field (7.14). One says u is a conserved quantity. In such a case, let w be any smooth function such that (u, w) form a local coordinate system. In this coordinate system, (7.22) X = b(u, w)∂/∂w 56 by (7.7), so (7.23) Xv = 1 with Z (7.24) w v(u, w) = b(u, s)−1 ds, w0 and the local coordinate system (u, v) linearizes X. Exercises 1. Suppose h(x, y) is homogeneous of degree 0, i.e., h(rx, ry) = h(x, y), so h(x, y) = k(x/y). Show that the ODE dy/dx = h(x, y) is changed to a separable ODE for u = u(x), if u = y/x. 2. Using exercise 1, discuss constructing the integral curves of a vector field X = f (x, y)∂/∂x + g(x, y)∂/∂y when f (x, y) and g(x, y) are homogeneous of degree a, i.e., f (rx, ry) = ra f (x, y) for r > 0, and similarly for g. 3. Describe the integral curves of (x2 + y 2 )∂/∂x + xy∂/∂y. 4. Describe the integral curves of A(x, y)∂/∂x + B(x, y)∂/∂y when A(x, y) = a1 x + a2 y + a3 , B(x, y) = b1 x + b2 y + b3 . 57 5. Let X = f (x, y)(∂/∂x) + g(x, y)(∂/∂y) be a vector field on a disc Ω ⊂ R2 . Suppose div X = 0, i.e., ∂f /∂x + ∂g/∂y = 0. Show that a function u(x, y) such that ∂u/∂x = g, ∂u/∂y = −f is given by a line integral. Show that Xu = 0 and hence integrate X. Reconsider this problem after reading §14. 6. Find the integral curves of the vector field X = (2xy + y 2 + 1) 7. Show that ∂ ∂ + (x2 + 1 − y 2 ) . ∂x ∂y div(ev X) = ev (div X + Xv). Hence, if X is a vector field on Ω ⊂ R2 as in exercise 5, show that you can integrate X if you can construct a function v(x, y) such that Xv = −div X. Construct such v if either (div X)/f (x, y) = ϕ(x) or (div X)/g(x, y) = ψ(y). 8. Find the integral curves of the vector field X = 2xy ∂ ∂ + (x2 + y 2 − 1) . ∂x ∂y Let X be a vector field on Rn , with a critical point at 0, i.e., X(0) = 0. 
Suppose that, for x ∈ R^n near 0,

(7.25) X(x) = Ax + R(x), ‖R(x)‖ = O(‖x‖²),

where A is an n × n matrix. We call Ax the linearization of X at 0.

9. Suppose all the eigenvalues of A have negative real part. Construct a quadratic polynomial Q : R^n → [0, ∞), such that Q(0) = 0, (∂²Q/∂x_j∂x_k) is positive definite, and such that, for any integral curve x(t) of X as in (7.25),

(d/dt)Q(x(t)) < 0 if t ≥ 0,

provided x(0) = x_0 (≠ 0) is close enough to 0. Deduce that, for small enough C, if ‖x_0‖ ≤ C, then x(t) exists for all t ≥ 0 and x(t) → 0 as t → ∞.
Hint. Take Q(x) = ⟨x, x⟩, using exercise 10 below.

10. Let A be an n × n matrix, all of whose eigenvalues λ_j have negative real part. Show there exists a Hermitian inner product ⟨ , ⟩ on C^n such that Re ⟨Au, u⟩ < 0 for nonzero u ∈ C^n.
Hint. Put A in Jordan normal form, but with ε's instead of 1's above the diagonal, where ε is small compared with |Re λ_j|.

8. Lie brackets

If F : V → W is a diffeomorphism between two open domains in R^n, and Y is a vector field on W, we define a vector field F_# Y on V so that

(8.1) F^t_{F_# Y} = F^{−1} ◦ F^t_Y ◦ F,

or equivalently, by the chain rule,

(8.2) F_# Y(x) = D(F^{−1})(F(x)) Y(F(x)).

In particular, if U ⊂ R^n is open and X is a vector field on U, defining a flow F^t, then for a vector field Y, F^t_# Y is defined on most of U, for |t| small, and we can define the Lie derivative:

(8.3) L_X Y = lim_{h→0} h^{−1} (F^h_# Y − Y) = (d/dt)F^t_# Y |_{t=0},

as a vector field on U.

Another natural construction is the operator-theoretic bracket:

(8.4) [X, Y] = XY − YX,

where the vector fields X and Y are regarded as first order differential operators on C^∞(U). One verifies that (8.4) defines a derivation on C^∞(U), hence a vector field on U.

The basic elementary fact about the Lie bracket is the following.

Theorem 8.1. If X and Y are smooth vector fields, then

(8.5) L_X Y = [X, Y].

Proof. Let us first verify the identity in the special case

X = ∂/∂x_1, Y = Σ_j b_j(x)∂/∂x_j.
P P t Then F# Y = bj (x + te1 )∂/∂xj . Hence, in this case LX Y = (∂bj /∂x1 )∂/∂xj , and a straightforward calculation shows this is also the formula for [X, Y ], in this case. Now we verify (8.5) in general, at any point x0 ∈ U. First, if X is nonvanishing at x0 , we can choose a local coordinate system so the example above gives the identity. By continuity, we get the identity (8.5) on the closure of the set of points x0 where X(x0 ) 6= 0. Finally, if x0 has a neighborhood where X = 0, clearly LX Y = 0 and [X, Y ] = 0 at x0 . This completes the proof. 60 Corollary 8.2. If X and Y are smooth vector fields on U, then (8.6) t t (d/dt)FX# Y = FX# [X, Y ] for all t. t+s t+s s t Proof. Since locally FX = FX FX , we have the same identity for FX# , which yields (8.6) upon taking the s-derivative. We make some further comments about cases when one can explicitly integrate a vector field X in the plane, exploiting ‘symmetries’ which might be apparent. In fact, suppose one has in hand a vector field Y such that (8.7) [X, Y ] = 0. By (8.6), this implies FYt # X = X for all t; this connection will be pursued further in the next section. Suppose one has an explicit hold on the flow generated by Y, so one can produce explicit local coordinates (u, v) with respect to which (8.8) Y = ∂/∂u. In this coordinate system, write X = a(u, v)∂/∂u+b(u, v)∂/∂v. The condition (8.7) implies ∂a/∂u = 0 = ∂b/∂u, so in fact we have (8.9) X = a(v)∂/∂u + b(v)∂/∂v. Integral curves of (8.9) satisfy (8.10) u0 = a(v), v 0 = b(v) and can be found explicitly in terms of integrals; one has Z (8.11) b(v)−1 dv = t + C1 , and then Z (8.12) u= a(v(t))dt + C2 . More generally than (8.7), we can suppose that, for some constant c, (8.13) [X, Y ] = cX, which by (8.6) is the same as (8.14) FYt # X = e−ct X. 
61 An example would be (8.15) X = f (x, y)∂/∂x + g(x, y)∂/∂y where f and g satisfy ‘homogeneity’ conditions of the form (8.16) f (ra x, rb y) = ra−c f (x, y), g(ra x, rb y) = rb−c g(x, y), for r > 0; in such a case one can take explicitly FYt (x, y) = (eat x, ebt y). (8.17) Now, if one again has (8.8) in a local coordinate system (u, v), then X must have the form £ ¤ (8.18) X = ecu a(v)∂/∂u + b(v)∂/∂v which can be explicitly integrated, since (8.19) u0 = ecu a(v), v 0 = ecu b(v) =⇒ du/dv = a(v)/b(v). The hypothesis (8.13) implies that the linear span (over R) of X and Y is a two dimensional ‘solvable Lie algebra.’ Sophus Lie devoted a good deal of effort to examining when one could use constructions of solvable Lie algebras of vector fields to explicitly integrate vector fields; his investigations led to his foundation of what is now called the theory of Lie groups. Exercises 1. Verify that the bracket (8.4) satisfies the ‘Jacobi identity’ [X, [Y, Z]] − [Y, [X, Z]] = [[X, Y ], Z], i.e., [LX , LY ]Z = L[X,Y ] Z. 2. Find the integral curves of X = (x + y 2 ) ∂ ∂ +y ∂x ∂y using (8.16). 3. Find the integral curves of X = (x2 y + y 5 ) ∂ ∂ + (x2 + xy 2 + y 4 ) . ∂x ∂y 62 9. Differential equations from classical mechanics Consider the equations of motion which arise from Newton’s law F = ma, when the force F is given by (9.1) F = − grad V (x), V being potential energy. We get the ODE m d2 x/dt2 = −∂V /∂x. (9.2) We can convert this into a first order system for (x, ξ), where (9.3) ξ = m dx/dt is the momentum. We have (9.4) dx/dt = ξ/m, dξ/dt = −∂V /∂x. This is called a Hamiltonian system. Now consider the total energy (9.5) f (x, ξ) = |ξ|2 /2m + V (x). Note that ∂f /∂ξ = ξ/m, ∂f /∂x = ∂V /∂x. Thus (9.4) is of the form (9.6) dxj /dt = ∂f /∂ξj , dξj /dt = −∂f /∂xj . Hence we’re looking for the integral curves of the vector field (9.7) n h X ∂f ∂ i ∂f ∂ − . Hf = ∂ξ ∂x ∂x ∂ξ j j j j j=1 For smooth f (x, ξ), we call Hf , defined by (9.7), a Hamiltonian vector field. 
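Along any integral curve of H_f the Hamiltonian f itself is constant (conservation of energy). A quick numerical check, using the pendulum-type Hamiltonian f(x, ξ) = ξ²/2 − cos x (an illustrative choice of mine, with m = 1) and an RK4 loop for the system (9.6):

```python
import math

# Hamilton's equations (9.6) for f(x, ξ) = ξ²/2 - cos x:
#   x' = ∂f/∂ξ = ξ,   ξ' = -∂f/∂x = -sin x
def rhs(s):
    x, xi = s
    return (xi, -math.sin(x))

def f(s):
    return 0.5*s[1]**2 - math.cos(s[0])

s = (1.2, 0.0)      # illustrative initial data
E0 = f(s)
h, steps = 1e-3, 5000   # integrate to t = 5
for _ in range(steps):
    k1 = rhs(s)
    k2 = rhs((s[0] + h/2*k1[0], s[1] + h/2*k1[1]))
    k3 = rhs((s[0] + h/2*k2[0], s[1] + h/2*k2[1]))
    k4 = rhs((s[0] + h*k3[0], s[1] + h*k3[1]))
    s = (s[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
         s[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

drift = abs(f(s) - E0)   # energy drift; tiny, at the integrator's accuracy
```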
Note that, directly from (9.7), (9.8) Hf f = 0. A useful notation is the Poisson bracket, defined by (9.9) {f, g} = Hf g. 63 One verifies directly from (9.7) that (9.10) {f, g} = −{g, f }, generalizing (9.8). Also a routine calculation verifies that (9.11) [Hf , Hg ] = H{f,g} . As noted at the end of §7, if X is a vector field in the plane, and we have explicitly a function u with nonvanishing gradient such that Xu = 0, then X can be explicitly integrated. These comments apply to X = Hf , u = f, in case Hf is a planar Hamiltonian vector field. We can rephrase this description, as follows. If x ∈ R, ξ ∈ R, integral curves of (9.12) x0 = ∂f /∂ξ, ξ 0 = −∂f /∂x lie on a level set (9.13) f (x, ξ) = E. Suppose that, locally, this set is described by (9.14) x = ϕ(ξ), or ξ = ψ(x). Then we have one of the following ODEs: (9.15) x0 = fξ (x, ψ(x)), or ξ 0 = −fx (ϕ(ξ), ξ), and hence we have Z ¡ ¢−1 fξ x, ψ(x) dx = t + C, (9.16) or Z (9.17) − ¡ ¢−1 fx ϕ(ξ), ξ dξ = t + C 0 . Thus solving (9.12) is reduced to a quadrature, i.e., a calculation of an explicit integral, (9.16) or (9.17). If the planar Hamiltonian vector field Hf arises from describing motion in a force field on a line, via Newton’s laws (9.2), so (9.18) f (x, ξ) = ξ 2 /2m + V (x), then the second curve in (9.14) is (9.19) £ ¡ ¢¤ 1 ξ = ± (2m) E − V (x) 2 64 and the formula (9.16) becomes Z (9.20) ±(m/2) 1 2 £ ¤−1 E − V (x) dx = t + C, defining x implicitly as a function of t. In some cases, the integral (9.20) can be evaluated by elementary means. This includes the trivial case of a constant force, where V (x) = cx, also the case of the ‘harmonic oscillator’ or linearized spring, where V (x) = cx2 . It also includes the case of motion of a rocket in space, along a line through the center of a planet, where V (x) = −K/|x|. The case V (x) = −K cos x arises in the analysis of the pendulum (see (18.38)). In that case (9.20) is an elliptic integral, rather than one that arises in first year calculus. 
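The relation (d/dt)g = {f, g} along an orbit of H_f (exercise 4 below) can be tested directly. A sketch for the harmonic-oscillator Hamiltonian f = (x² + ξ²)/2 and the observable g = xξ (both illustrative choices of mine), where the flow is known exactly and {f, g} = f_ξ g_x − f_x g_ξ = ξ² − x²:

```python
import math

def orbit(t, x0, xi0):
    # exact flow of Hf for f = (x² + ξ²)/2:  x' = ξ,  ξ' = -x
    return (x0*math.cos(t) + xi0*math.sin(t),
            -x0*math.sin(t) + xi0*math.cos(t))

def g(x, xi):
    return x * xi

x0, xi0 = 1.0, 0.5
t, h = 0.9, 1e-5

# centered difference quotient of t -> g(orbit(t))
gp = g(*orbit(t + h, x0, xi0))
gm = g(*orbit(t - h, x0, xi0))
dg_dt = (gp - gm) / (2*h)

# Poisson bracket {f, g} = ξ² - x², evaluated at the same point of the orbit
x, xi = orbit(t, x0, xi0)
poisson = xi**2 - x**2

err = abs(dg_dt - poisson)
```

The two quantities agree up to the O(h²) error of the difference quotient.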
For Hamiltonian vector fields in higher dimensions, more effort is required to understand the resulting flows. One important tractable class arises from central force problems. We treat this in the next section. Hamiltonian vector fields arise in treating many problems in addition to those derived from Newton’s laws in Cartesian coordinates. In §18 we study a broad class of variational problems, which lead to Hamiltonian systems. This variational approach has many convenient features, such as allowing easy formulation of the equations of motion in arbitrary coordinate systems. Exercises 1. Verify that [Hf , Hg ] = H{f,g} . 2. Demonstrate that the Poisson bracket satisfies the Jacobi identity {f, {g, h}} − {g, {f, h}} = {{f, g}, h}. Hint. Use exercise 1 above and exercise 1 of §8. 3. Identifying y and ξ, show that a planar vector field X = f (x, y)(∂/∂x) + g(x, y)(∂/∂y) is Hamiltonian if and only if div X = 0. Reconsider problem 5 in §7. 4. Show that on an orbit of Hf . d g(x, ξ) = {f, g} dt 65 10. Central force problems Central force problems give rise to a class of Hamiltonian systems with two degrees of freedom, whose solutions can be found ‘explicitly,’ in terms of integrals. Before treating these problems specifically, we look at a class of Hamiltonians on a region in R4 , of the form (10.1) F (y, η) = F (y1 , η1 , η2 ), i.e., with no y2 dependence. Thus Hamilton’s equations take the form (10.2) y˙ j = ∂F/∂ηj , η˙ 1 = −∂F/∂y1 , η˙ 2 = 0. In particular, η2 is constant on any orbit, say (10.3) η2 = L. This, in addition to F, provides the second conservation law implying integrability; note that {F, η2 } = 0. If F (y1 , η1 , L) = E on an integral curve, we write this relation as (10.4) η1 = ψ(y1 , L, E). We can now pursue an analysis which is a variant of that described by (9.14)(9.20). The first equation in (10.2) becomes (10.5) ¡ ¢ y˙ 1 = Fη1 y1 , ψ(y1 , L, E), L , with solution given implicitly by Z ¡ ¢−1 (10.6) Fη1 y1 , ψ(y1 , L, E), L dy1 = t + C. 
Once you have y1 (t), then you have (10.7) η1 (t) = ψ(y1 (t), L, E), and then the remaining equation in (10.2) becomes (10.8) ¡ ¢ y˙ 2 = Fη2 y1 (t), η1 (t), L , which is solved by an integration. 66 We apply this method to the central force problem: (10.9) x˙ = Gξ (x, ξ), ξ˙ = −Gx (x, ξ), G(x, ξ) = 21 |ξ|2 + v(|x|), x ∈ R2 . Use of polar coordinates is clearly suggested, so we set (10.10) y1 = r, y2 = θ; x1 = r cos θ, x2 = r sin θ. A calculation shows that the system (10.9) takes the form (10.2), with (10.11) F (y, η) = 1 2 ¡ 2 ¢ η1 + y1−2 η22 + v(y1 ). We see that the first pair of ODEs in (10.2) takes the form (10.12) r˙ = η1 , θ˙ = Lr−2 , where L is the constant value of η2 along an integral curve, as in (10.3). The last equation, rewritten as (10.13) r2 θ˙ = L, expresses conservation of angular momentum. The remaining ODE in (10.2) becomes (10.14) η˙ 1 = L2 r−3 − v 0 (r). Note that differentiating the first equation of (10.12) and using (10.14) gives (10.15) r¨ = L2 r−3 − v 0 (r), an equation which can be integrated by the methods described in (9.12)-(9.20). We won’t solve (10.15) by this means here, though (10.15) will be used below, to produce (10.23). For now, we use instead (10.4)-(10.6). In the present case, (10.4) takes the form (10.16) £ ¤1 η1 = ± 2E − 2v(r) − L2 r−2 2 , and since Fη1 = η1 , (10.6) takes the form Z £ ¤− 1 (10.17) ± 2Er2 − 2r2 v(r) − L2 2 rdr = t + C. In the case of the Kepler problem, where v(r) = −K/r, the resulting integral Z ¢− 1 ¡ (10.18) ± 2Er2 + 2Kr − L2 2 rdr = t + C 67 can be evaluated using techniques of first year calculus, by completing the square in 2Er2 + 2Kr − L2 . Once r = r(t) is given, the equation (10.13) provides an integral formula for θ = θ(t). 
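Conservation of angular momentum (10.13) can be observed on a numerically integrated Kepler orbit. A sketch (initial data and step size are illustrative choices of mine), integrating ẋ = ξ, ξ̇ = −Kx/|x|³ in Cartesian coordinates with RK4 and monitoring both x₁ξ₂ − x₂ξ₁ and the energy:

```python
import numpy as np

K = 1.0

def rhs(s):
    # s = (x1, x2, ξ1, ξ2); Kepler force with v(r) = -K/r
    x, xi = s[:2], s[2:]
    r = np.linalg.norm(x)
    return np.concatenate([xi, -K * x / r**3])

s = np.array([1.0, 0.0, 0.0, 0.9])   # E < 0: a bounded (elliptic) orbit
L0 = s[0]*s[3] - s[1]*s[2]           # angular momentum x1 ξ2 - x2 ξ1
E0 = 0.5*(s[2]**2 + s[3]**2) - K/np.hypot(s[0], s[1])

h, steps = 1e-3, 6000                # integrate to t = 6
for _ in range(steps):
    k1 = rhs(s)
    k2 = rhs(s + h/2*k1)
    k3 = rhs(s + h/2*k2)
    k4 = rhs(s + h*k3)
    s = s + (h/6)*(k1 + 2*k2 + 2*k3 + k4)

L1 = s[0]*s[3] - s[1]*s[2]
E1 = 0.5*(s[2]**2 + s[3]**2) - K/np.hypot(s[0], s[1])
```

Both L and E are constant along the orbit, up to the tiny drift of the integrator.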
One of the most remarkable aspects of the analysis of the Kepler problem is the demonstration that orbits all lie on some conic section, given in polar coordinates by (10.19) £ ¤ r 1 + e cos(θ − θ0 ) = ed, where e is the ‘eccentricity.’ We now describe the famous, elementary but ingenious trick used to demonstrate this. The method involves producing a differential equation for r in terms of θ, from (10.13) and (10.15). More precisely, we produce a differential equation for u, defined by (10.20) u = 1/r. By the chain rule, (10.21) dr du du dθ = −r2 = −r2 dt dt dθ dt du = −L , dθ in light of (10.13). Differentiating this with respect to t gives (10.22) d2 r d du d2 u dθ = −L = −L dt2 dt dθ dθ2 dt 2 d u = −L2 u2 2 , dθ again using (10.13). Comparing this with (10.15), we get −L2 u2 (d2 u/dθ2 ) = L2 u3 − v 0 (1/u), or equivalently (10.23) d2 u + u = (Lu)−2 v 0 (1/u). dθ2 In the case of the Kepler problem, v(r) = −K/r, the right side becomes the constant K/L2 , so in this case (10.23) becomes the linear equation (10.24) d2 u + u = K/L2 , 2 dθ with general solution (10.25) u(θ) = A cos(θ − θ0 ) + K/L2 , 68 which is equivalent to the formula (10.19) for a conic section. For more general central force problems, the equation (10.23) is typically not linear, but it is of the form treatable by the method of (9.12)-(9.20). Exercises 1. Solve explicitly w00 (t) = −w(t), for w taking values in R2 = C. Show that |w(t)|2 + |w0 (t)|2 = 2E is constant for each orbit. 2. For w(t) taking values in C, define a new curve by Z(τ ) = w(t)2 , dτ /dt = |w(t)|2 . Show that, if w00 (t) = −w(t), then Z 00 (τ ) = −4EZ(τ )/|Z(τ )|3 , i.e., Z(τ ) solves the Kepler problem. 3. Analyze the equation (10.23) for u(θ) in the following cases. (a) v(r) = −K/r2 , (b) v(r) = Kr2 , (c) v(r) = −K/r + εr2 . Show that, in case (c), u(θ) is typically not periodic in θ. 4. Consider the smooth map Φ : (0, ∞) × R → R2 , defined by Φ(y1 , y2 ) = (x1 , x2 ), with x1 = y1 cos y2 , x2 = y1 sin y2 . 
Then define Ψ : (0, ∞) × R × R2 → R2 × R2 by Ψ(y1 , y2 , η1 , η2 ) = (x1 , x2 , ξ1 , ξ2 ), where (x1 , x2 ) = Φ(y1 , y2 ) and µ ¶ µ ¶ η1 ξ1 t = DΦ(x1 , x2 ) . η2 ξ2 If G(x, ξ) is given by (10.9), show that G ◦ Ψ = F, given by (10.11). How is this related to the calculation that the change of variables (10.10) transforms the Hamiltonian system (10.9) to (10.2)? Remark. Other perspectives on this calculation can be obtained from problem 4 of §18, and from the discussion of canonical transformations in §19. 69 11. The Riemann integral in n variables We define the Riemann integral of a bounded function f : R → R, where R ⊂ Rn is a cell, i.e., a product of intervals R = I1 × · · · × In , where Iν = [aν , bν ] are intervals in R. Recall that a partition of an interval I = [a, b] is a finite collection of subintervals {Jk : 0 ≤ k ≤ N }, disjoint except for their endpoints, whose union is I. We can take Jk = [xk , xk+1 ], where (11.1) a = x0 < x1 < · · · < xN < xN +1 = b. Now, if one has a partition of each Iν into Jν1 ∪ · · · ∪ Jν,N (ν) , then a partition P of R consists of the cells (11.2) Rα = J1α1 × J2α2 × · · · × Jnαn , where 0 ≤ αν ≤ N (ν). For such a partition, define (11.3) maxsize(P) = max diam Rα , α where (diam Rα )2 = `(J1α1 )2 + · · · + `(Jnαn )2 . Here, `(J) denotes the length of an interval J. Each cell has n-dimensional volume (11.4) V (Rα ) = `(J1α1 ) · · · `(Jnαn ). Sometimes we use Vn (Rα ) for emphasis on the dimension. We also use A(R) for V2 (R), and, of course, `(R) for V1 (R). We set X I P (f ) = sup f (x) V (Rα ), (11.5) α I P (f ) = X α Rα inf f (x) V (Rα ). Rα Note that I P (f ) ≤ I P (f ). 
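For a concrete integrand the sums in (11.5) can be evaluated exactly. A sketch with f(x, y) = xy on R = [0, 1]² (an illustrative choice of mine; since this f is increasing in each variable, the sup and inf over each cell are attained at opposite corners), using a uniform n × n partition:

```python
def upper_lower(f, a, b, c, d, n):
    # upper and lower sums over an n x n uniform partition of [a,b] x [c,d];
    # NOTE: the corner evaluation below is valid only because the sample f
    # is increasing in each variable on the rectangle
    hx, hy = (b - a)/n, (d - c)/n
    up = lo = 0.0
    for i in range(n):
        for j in range(n):
            x0, x1 = a + i*hx, a + (i + 1)*hx
            y0, y1 = c + j*hy, c + (j + 1)*hy
            up += f(x1, y1) * hx * hy   # sup over the cell, at upper corner
            lo += f(x0, y0) * hx * hy   # inf over the cell, at lower corner
    return up, lo

f = lambda x, y: x * y
up, lo = upper_lower(f, 0.0, 1.0, 0.0, 1.0, 100)
```

For n = 100 one gets the upper sum 0.255025 and the lower sum 0.245025, squeezing the integral ∫∫ xy dA = 1/4 as the partition is refined.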
These quantities should approximate the Riemann integral of f, if the partition P is sufficiently ‘fine.’ To be more precise, if P and Q are two partitions of R, we say P refines Q, and write P  Q, if each partition of each interval factor Iν of R involved in the definition of Q is further refined in order to produce the partitions of the factors Iν , used to define P, via (11.2). It is easy to see that any two partitions of R have a common refinement. Note also that (11.6) P  Q =⇒ I P (f ) ≤ I Q (f ), and I P (f ) ≥ I Q (f ). 70 Consequently, if Pj are any two partitions of R and Q is a common refinement, we have (11.7) I P1 (f ) ≤ I Q (f ) ≤ I Q (f ) ≤ I P2 (f ). Now, whenever f : R → R is bounded, the following quantities are well defined: (11.8) I(f ) = inf P∈Π(R) I P (f ), I(f ) = sup I P (f ), P∈Π where Π(R) is the set of all partitions of R, as defined above. Clearly, by (11.7), I(f ) ≤ I(f ). We then say that f is Riemann integrable (on R) provided I(f ) = I(f ), and in such a case, we set Z f (x) dV (x) = I(f ) = I(f ). (11.9) R We will denote the set of Riemann integrable functions on R by R(R). If dim R = 2, we will often use dA(x) instead of dV (x) in (11.9), and if dim R = 1, we will often use simply dx. We derive some basic properties of the Riemann integral. First, the proof of Proposition 0.3 and Corollary 0.4 readily extend, to give: Proposition 11.1. Let Pν be any sequence of partitions of R such that maxsize(Pν ) = δν → 0, and let ξνα be any choice of one point in each cell Rνα in the partition Pν . Then, whenever f ∈ R(R), Z (11.10) f (x) dV (x) = lim X ν→∞ f (ξνα ) V (Rνα ). α R Since the right side of (11.10) is clearly linear for each ν, we obtain: Proposition 11.2. If fj ∈ R(R) and cj ∈ R, then c1 f1 + c2 f2 ∈ R(R), and Z Z (11.11) (c1 f1 + c2 f2 ) dV = c1 R Z f1 dV + c2 R f2 dV. R Next, we establish an integrability result analogous to Proposition 0.2. 71 Proposition 11.3. If f is continuous on R, then f ∈ R(R). Proof. 
As in the proof of Proposition 0.2, we have that

(11.12) maxsize(P) ≤ δ =⇒ Ī_P(f) − I_P(f) ≤ ω(δ) · V(R),

where ω(δ) is a modulus of continuity for f on R. This proves the proposition.

When the number of variables exceeds one, it becomes more important to identify some nice classes of discontinuous functions on R which are Riemann integrable. A useful tool for this is the following notion of size of a set S ⊂ R, called content. Extending (0.18)-(0.19), we define 'upper content' cont⁺ and 'lower content' cont⁻ by

(11.13) cont⁺(S) = Ī(χ_S), cont⁻(S) = I(χ_S),

where χ_S is the characteristic function of S. We say S has content, or 'is contented,' if these quantities are equal, which happens if and only if χ_S ∈ R(R), in which case the common value of cont⁺(S) and cont⁻(S) is

(11.14) V(S) = ∫_R χ_S(x) dV(x).

It is easy to see that

(11.15) cont⁺(S) = inf { Σ_{k=1}^N V(R_k) : S ⊂ R_1 ∪ · · · ∪ R_N },

where R_k are cells contained in R. In the formal definition, the R_k in (11.15) should be part of a partition P of R, as defined above, but if {R_1, . . . , R_N} are any cells in R, they can be chopped up into smaller cells, some perhaps thrown away, to yield a finite cover of S by cells in a partition of R, so one gets the same result.

It is easy to see that, for any set S ⊂ R,

(11.16) cont⁺(S) = cont⁺(S̄),

where S̄ is the closure of S. We note that, generally, for a bounded function f on R,

(11.17) Ī(f) + I(1 − f) = V(R).

This follows directly from (11.5). In particular, given S ⊂ R,

(11.18) cont⁻(S) + cont⁺(R \ S) = V(R).

Using this together with (11.16), with S and R \ S switched, we have

(11.19) cont⁻(S) = cont⁻(S°),

where S° is the interior of S. The difference S̄ \ S° is called the boundary of S, and denoted bS. Note that

(11.20) cont⁻(S) = sup { Σ_{k=1}^N V(R_k) : R_1 ∪ · · · ∪ R_N ⊂ S },

where here we take {R_1, . . . , R_N} to be cells within a partition P of R, and let P vary over all partitions of R.
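Upper content as in (11.15) can be estimated by counting covering cells. A sketch (the parabola graph g(x) = x² over [0, 1] is an illustrative choice of mine) showing that the number of grid cells of side 1/N meeting the graph grows only like N, so the total covering area is O(1/N), i.e., the graph is a nil set:

```python
def cells_meeting_graph(N):
    # count cells of the uniform N x N grid on [0,1] x [0,1] that meet the
    # graph of g(x) = x²; g is increasing, so over the column [x0, x1] the
    # graph sweeps the values [g(x0), g(x1)]
    count = 0
    for i in range(N):
        x0, x1 = i / N, (i + 1) / N
        gmin, gmax = x0 * x0, x1 * x1
        j0 = int(gmin * N)
        j1 = min(int(gmax * N), N - 1)
        count += j1 - j0 + 1
    return count

area_100 = cells_meeting_graph(100) * (1 / 100)**2
area_1000 = cells_meeting_graph(1000) * (1 / 1000)**2
```

The covering area shrinks roughly like 2/N; this is a one-variable instance of the graph construction used in Proposition 11.7.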
Now, given a partition P of R, classify each cell in P ◦ as either being contained in R \ S, or intersecting bS, or contained in S. Letting P vary over all partitions of R, we see that (11.21) ◦ cont+ (S) = cont+ (bS) + cont− (S). In particular, we have: Proposition 11.4. If S ⊂ R, then S is contented if and only if cont+ (bS) = 0. If a set Σ ⊂ R has the property that cont+ (Σ) = 0, we say that Σ has content zero, or is a nil set. Clearly Σ is nil if and only if Σ is nil. It follows easily from K S Proposition 11.2 that, if Σj are nil, 1 ≤ j ≤ K, then Σj is nil. j=1 ◦ ◦ ◦ If S1 , S2 ⊂ R and S = S1 ∪ S2 , then S = S 1 ∪ S 2 and S = S 1 ∪ S 2 . Hence bS ⊂ b(S1 ) ∪ b(S2 ). It follows then from Proposition 11.4 that, if S1 and S2 are contented, so is S1 ∪ S2 . Clearly, if Sj are contented, so are Sjc = R \ Sj . It follows ¢c ¡ that, if S1 and S2 are contented, so is S1 ∩ S2 = S1c ∪ S2c . A family F of subsets of R such that R ∈ F, Sj ∈ F ⇒ S1 ∪ S2 ∈ F, and S1 ∈ F ⇒ R \ S1 ∈ F is called an algebra of subsets of R. Algebras of sets are automatically closed under finite intersections also. We see that: Proposition 11.5. The family of contented subsets of R is an algebra of sets. The following result specifies a useful class of Riemann integrable functions. Proposition 11.6. If f : R → R is bounded and the set S of points of discontinuity of f is a nil set, then f ∈ R(R). Proof. Suppose |f | ≤ M on R, and take ε > 0. Take a partition P of R, and write P = P 0 ∪ P 00 , where cells in P 0 do not meet S, and cells in P 00 do intersect S. Since cont+ (S) = 0, we can pick P so that the cells in P 00 have total volume ≤ ε. Now f 73 is continuous on each cell in P 0 . Further refining the partition if necessary, we can assume that f varies by ≤ ε on each cell in P 0 . Thus (11.22) £ ¤ I P (f ) − I P (f ) ≤ V (R) + 2M ε. This proves the proposition. To give an example, suppose K ⊂ R is a closed set such that bK is nil. Let f : K → R be continuous. 
Define fe : R → R by (11.23) fe(x) = f (x) for x ∈ K, 0 for x ∈ R \ K. Then the set of points of discontinuity of fe is contained in bK. Hence fe ∈ R(R). We set Z Z (11.24) f dV = fe dV. K R The following is a useful criterion for a set S ⊂ Rn to have content zero. Proposition 11.7. Let Σ ⊂ Rn−1 be a closed bounded set and let g : Σ → R be continuous. Then the graph of g, G= ©¡ ¢ ª x, g(x) : x ∈ Σ is a nil subset of Rn . Proof. Put Σ in a cell R0 ⊂ Rn−1 . Suppose |f | ≤ M on Σ. Take N ∈ Z+ and set ε = M/N. Pick a partition P0 of R0 , sufficiently fine that g varies by at most ε on each set Σ ∩ Rα , for any cell Rα ∈ P0 . Partition the interval I = [−M, M ] into 2N equal intervals Jν , of length ε. Then {Rα × Jν } = {Qαν } forms a partition of R0 × I. Now, over each cell Rα ∈ P0 , there lie at most 2 cells Qαν which intersect G, so cont+ (G) ≤ 2ε · V (R0 ). Letting N → ∞, we have the proposition. Similarly, for any j ∈ {1, . . . , n}, the graph of xj as a continuous function of the complementary variables is a nil set in Rn . So are finite unions of such graphs. Such sets arise as boundaries of many ordinary-looking regions in Rn . Here is a further class of nil sets. Proposition 11.8. Let S be a compact nil subset of Rn and f : S → Rn a Lipschitz map. Then f (S) is a nil subset of Rn . Proof. The Lipschitz hypothesis on f is that, for p, q ∈ S, (11.25) |f (p) − f (q)| ≤ C1 |p − q|. 74 If we cover S with k cells (in a partition), of total volume ≤√α, each cubical with edgesize δ, then f (S) is covered√by k sets of diameter ≤ C1 nδ, hence it can be covered √ by (approximately) kC1 n cubical cells of edgesize δ, having total volume ≤ C1 nα. From this we have the (not very sharp) general bound ¡ ¢ √ (11.26) cont+ f (S) ≤ C1 n cont+ (S), which proves the proposition. In evaluating n-dimensional integrals, it is usually convenient to reduce them to iterated integrals. The following is a special case of a result known as Fubini’s Theorem. Theorem 11.9. 
Let Σ ⊂ Rn−1 be a closed, bounded contented set and let gj : Σ → R be continuous, with g0 (x) < g1 (x) on Σ. Take © ª (11.27) Ω = (x, y) ∈ Rn : x ∈ Σ, g0 (x) ≤ y ≤ g1 (x) . Then Ω is a contented set in Rn . If f : Ω → R is continuous, then Z (11.28) g1 (x) ϕ(x) = f (x, y) dy g0 (x) is continuous on Σ, and Z (11.29) Z f dVn = Ω ϕ dVn−1 , Σ i.e., Z ÃZ Z (11.30) f dVn = Ω ! g1 (x) f (x, y) dy Σ dVn−1 (x). g0 (x) Proof. The continuity of (11.28) is an exercise in one-variable integration; see exercises 2-3 of §0. Let ω(δ) be a modulus of continuity for g0 , g1 , and f, and also ϕ. Put Σ in a cell R0 and let P0 be a partition of R0 . If A ≤ g0 ≤ g1 ≤ B, partition the interval [A, B], and from this and P0 construct a partition P of R = R0 × [A, B]. We denote a cell in P0 by Rα and a cell in P by Rα` = Rα × J` . Pick points ξα` ∈ Rα` . ◦ Write P0 = P00 ∪ P000 ∪ P0000 , consisting respectively of cells inside Σ, meeting bΣ, and inside R0 \ Σ. Similarly write P = P 0 ∪ P 00 ∪ P 000 , consisting respectively of cells ◦ inside Ω, meeting bΩ, and inside R \ Ω, as illustrated in Fig.11.1. For fixed α, let z 0 (α) = {` : Rα` ∈ P 0 }, and let z 00 (α) and z 000 (α) be similarly defined. Note that z 0 (α) 6= ∅ ⇐⇒ Rα ∈ P00 , 75 provided we assume maxsize(P) ≤ δ and 2δ < min g1 (x) − g0 (x), as we will from here on. It follows from (0.9)-(0.10) that Z ¯ X ¯ f (ξα` )`(J` ) − ¯ (11.31) S ¯ ¯ f (x, y) dy ¯ ≤ (B − A)ω(δ), ∀ x ∈ Rα , A(α) `∈z 0 (α) where B(α) J` = [A(α), B(α)]. Note that A(α) and B(α) are within 2ω(δ) of g0 (x) `∈z 0 (α) and g1 (x), respectively, for all x ∈ Rα , if Rα ∈ P00 . Hence, if |f | ≤ M, ¯Z ¯ ¯ (11.32) B(α) ¯ ¯ f (x, y) dy − ϕ(x)¯ ≤ 4M ω(δ). A(α) Thus, with C = B − A + 4M, ¯ X ¯ ¯ ¯ f (ξα` )`(J` ) − ϕ(x)¯ ≤ Cω(δ), (11.33) ¯ ∀ x ∈ Rα ∈ P00 . `∈z 0 (α) Multiplying by Vn−1 (Rα ) and summing over Rα ∈ P00 , we have ¯ X ¯ X ¯ ¯ (11.34) f (ξα` )Vn (Rα` ) − ϕ(xα )Vn−1 (Rα )¯ ≤ CV (R0 )ω(δ), ¯ Rα` ∈P 0 Rα ∈P00 where xα is an arbitrary point in Rα . 
Now, if P0 is a sufficiently fine partition of R0 , it follows from theR proof of Proposition 11.6 that the second term in (11.34) is arbitrarily close to ϕ dVn−1 , since Σ bΣ has content zero. Furthermore, an argument such as used to prove Proposition 11.7 shows that bΩ has content zero, and one verifies that, R for a sufficiently fine partition, the first sum in (11.34) is arbitrarily close to f dVn . This proves the desired identity (11.29). Ω We next take up the change of variables formula for multiple integrals, extending the one variable formula, (0.38). We begin with a result on linear changes of variables. The set of invertible real n × n matrices is denoted Gl(n, R). Proposition 11.10. Let f be a continuous function with compact support in Rn . If A ∈ Gl(n, R), then Z Z (11.35) f (x) dV = |Det A| f (Ax) dV. Proof. Let G be the set of elements A ∈ Gl(n, R) for which (11.35) is true. Clearly I ∈ G. Using Det A−1 = (Det A)−1 , and Det AB = (Det A)(Det B), we conclude 76 that G is a subgroup of Gl(n, R). To prove the proposition, it will suffice to show that G contains all elements of the following 3 forms, since the method of applying elementary row operations to reduce a matrix shows that any element of Gl(n, R) is a product of a finite number of these elements. Here, {ej : 1 ≤ j ≤ n} denotes the standard basis of Rn , and σ a permutation of {1, . . . , n}. A1 ej = eσ(j) , A2 ej = cj ej , cj 6= 0 (11.36) A3 e2 = e2 + ce1 , A3 ej = ej for j 6= 2. The first two cases are elementary consequences of the definition of the Riemann integral, and can be left as exercises. We show that (11.35) holds for transformations of the form A3 by using Theorem 11.9 (in a special case), to reduce it to the case n = 1. Given f ∈ C(R), compactly supported, and b ∈ R, we clearly have. Z (11.37) Z f (x) dx = f (x + b) dx. 
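Before continuing, here is a numerical sanity check of (11.35) (an illustrative sketch; the matrix A and integrand f below are arbitrary choices, not from the text): for a continuous f with compact support and an invertible A, midpoint Riemann sums give ∫ f dV ≈ |Det A| ∫ f(Ax) dV.

```python
# Sketch: check (11.35) numerically for a 2x2 matrix A with Det A = 2.
def f(x, y):
    r2 = x * x + y * y
    return 1.0 - r2 if r2 < 1.0 else 0.0   # continuous, compact support

a11, a12, a21, a22 = 2.0, 1.0, 0.0, 1.0    # the matrix A
det_A = a11 * a22 - a12 * a21              # = 2

def grid_integral(g, lo, hi, n):
    """Midpoint Riemann sum of g over the square [lo,hi]^2."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h, lo + (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

I1 = grid_integral(f, -2.0, 2.0, 400)
I2 = grid_integral(lambda x, y: f(a11 * x + a12 * y, a21 * x + a22 * y),
                   -3.0, 3.0, 400)
# Expect I1 ≈ |Det A| * I2; here I1 = pi/2 exactly.
```

The square [-3, 3]² is chosen large enough to contain the support of f ∘ A.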
Now, for the case A = A3 , with x = (x1 , x0 ), we have Z 0 Z ³Z ´ f (x1 + cx2 , x0 ) dx1 dVn−1 (x0 ) Z ³Z ´ f (x1 , x0 ) dx1 dVn−1 (x0 ), f (x1 + cx2 , x ) dVn (x) = (11.38) = the second identity by (11.37). Thus we get (11.35) in case A = A3 , so the proposition is proved. It is desirable to extend Proposition 11.10 to some discontinuous functions. Given a cell R and f : R → R, bounded, we say (11.39) f ∈ C(R) ⇐⇒ the set of discontinuities of f is nil. Proposition 11.6 implies (11.40) C(R) ⊂ R(R). From the closure of the class of nil sets under finite unions it is clear that C(R) is closed under sums and products, i.e., that C(R) is an algebra of functions on R. We will denote by Cc (Rn ) the set of functions f : Rn → R such that f has compact n support and its ¯ set of discontinuities is nil. Any f ∈ Cc (R ) is supported in some cell R, and f ¯R ∈ C(R). 77 Proposition 11.11. Given A ∈ Gl(n, R), the identity (11.35) holds for all f ∈ Cc (Rn ). Proof. Assume |f | ≤ M. Suppose supp f is in the interior of a cell R. Let S be the set of discontinuities of f. The function g(x) = f (Ax) has A−1 S as its set of discontinuities, which is also nil, by Proposition 11.8. Thus g ∈ Cc (Rn ). Take ε > 0. Let P be a partition of R, of maxsize ≤ δ, such that S is covered by cells in P of total volume ≤ ε. Write P = P 0 ∪ P 00 ∪ P 000 , where P 00 consists of cells meeting S, P 000 consists of all cells touching cells in P 00 , and P 0 consists of all the rest. Then P 00 ∪ P 000 consists of cells of total volume ≤ 3n ε. Thus if we set [ (11.41) B= Rα , P0 we have that B is compact, B ∩ S = ∅, and V (R \ B) ≤ 3n ε. Now construct a continuous, compactly supported function ϕ : Rn → R such that (11.42) ϕ = 1 on B, ϕ = 0 near S, 0 ≤ ϕ ≤ 1, and set (11.43) f = ϕf + (1 − ϕ)f = f1 + f2 . We see that f1 is continuous on Rn , with compact support, |f2 | ≤ M, and f2 is supported on R \ B, which is a union of cells of total volume ≤ 3n ε. 
Now Proposition 11.10 applies directly to f1 , to yield Z Z (11.44) f1 (y) dV (y) = |Det A| f1 (Ax) dV (x). We have f2 ∈ Cc (Rn ), and g2 (x) = f2 (Ax) ∈ Cc (Rn ). Also, g2 is supported by + A−1 √(R n\ B), which by the estimate (11.26) has the property cont (R \ B) ≤ C1 n3 ε. Thus ¯Z ¯ ¯Z ¯ √ ¯ ¯ ¯ ¯ n (11.45) ¯ f2 (y) dV (y)¯ ≤ 3 M ε, ¯ f2 (Ax) dV (x)¯ ≤ C1 n3n M ε. Taking ε → 0, we have the proposition. In particular, if Σ ⊂ Rn is a compact set which is contented, then its characteristic function χΣ , which has bΣ as its set of points of discontinuity, belongs to Cc (Rn ), and we deduce: Corollary 11.12. If Σ ⊂ Rn is a compact, contented set and A ∈ Gl(n, R), then A(Σ) = {Ax : x ∈ Σ} is contented, and ¡ ¢ (11.46) V A(Σ) = |Det A| V (Σ). We now extend Proposition 11.10 to nonlinear changes of variables. 78 Proposition 11.13. Let O and Ω be open in Rn , G : O → Ω a C 1 diffeomorphism, and f a continuous function with compact support in Ω. Then Z Z ¡ ¢ (11.47) f (y) dV (y) = f G(x) |Det DG(x)| dV (x). Ω O Proof. It suffices to prove the result under the additional assumption that f ≥ 0, which we make from here on. If K = supp f, put G−1 (K) inside a rectangle R and let P = {Rα } be a partition of R. Let A be the set of α such that Rα ⊂ O. For α ∈ A, G(Rα ) ⊂ Ω. See Fig.11.2. Pick P fine enough that α ∈ A whenever Rα ∩ G−1 (K) is nonempty. eα = Rα − ξα , a cell with center at the Let ξα be the center of Rα , and let R origin. Then ¡ ¢ eα = ηα + Hα G(ξα ) + DG(ξα ) R (11.48) is an n-dimensional parallelipiped which is close to G(Rα ), if Rα is small enough. See Fig.11.3. In fact, given ε > 0, if δ > 0 is small enough and maxsize(P) ≤ δ, then we have (11.49) ηα + (1 + ε)Hα ⊃ G(Rα ), for all α ∈ A. Now, by (11.46), (11.50) V (Hα ) = |Det DG(α)| V (Rα ). Hence ¡ ¢ V G(Rα ) ≤ (1 + ε)n |Det DG(ξα )| V (Rα ). (11.51) Now we have Z f dV = X α∈A (11.52) ≤ X α∈A Z f dV G(Rα ) ¡ ¢ sup f ◦ G(x) V G(Rα ) Rα ≤ (1 + ε)n X α∈A sup f ◦ G(x) |Det DG(ξα )| V (Rα ). 
Rα The second inequality here uses (11.51) and f ≥ 0. If we set (11.53) h(x) = f ◦ G(x) |Det DG(x)|, 79 then we have (11.54) sup f ◦ G(x) |Det DG(ξα )| ≤ sup h(x) + M ω(δ), Rα Rα provided |f | ≤ M and ω(δ) is a modulus of continuity for DG. Taking arbitrarily fine partitions, we get Z Z (11.55) f dV ≤ h dV. If we apply this result, with G replaced by G−1 , O and Ω switched, and f replaced by h, given by (11.53), we have Z Z Z −1 −1 (11.56) h dV ≤ h ◦ G (y) |Det DG (y)| dV (y) = f dV. The inequalities (11.55) and (11.56) together yield the identity (11.47). Now the same argument used to establish Proposition 11.11 from Proposition 11.10 yields the following improvement of Proposition 11.13. We say f ∈ Cc (Ω) if f ∈ Cc (Rn ) has support in an open set Ω. Theorem 11.14. Let O and Ω be open in Rn , G : O → Ω a C 1 diffeomorphism, and f ∈ Cc (Ω). Then f ◦ G ∈ Cc (O), and Z Z ¡ ¢ (11.57) f (y) dV (y) = f G(x) |Det DG(x)| dV (x). Ω O The most frequently invoked case of the change of variable formula, in the case n = 2, involves the following change from Cartesian to polar coordinates: (11.58) x = r cos θ, y = r sin θ. Thus, take G(r, θ) = (r cos θ, r sin θ). We have µ ¶ cos θ −r sin θ (11.59) DG(r, θ) = , sin θ r cos θ Det DG(r, θ) = r. For example, if ρ ∈ (0, ∞) and Dρ = {(x, y) ∈ R2 : x2 + y 2 ≤ ρ2 }, (11.60) then, for f ∈ C(Dρ ), Z (11.61) Z ρ Z f (x, y) dA = Dρ 2π f (r cos θ, r sin θ)r dθ dr. 0 0 80 To get this, we first apply Theorem 11.4, with O = [ε, ρ] × [0, 2π − ε], then apply Theorem 11.9, then let ε & 0. It is often useful to integrate a function whose support is not bounded.¯ Generally, given a bounded function fR : Rn → R, we say f ∈ R(Rn ) provided f ¯R ∈ R(R) for each cell R ⊂ Rn , and |f | dV ≤ C, for some C < ∞, independent of R. If R f ∈ R(Rn ), we set Z Z (11.62) f dV = lim f dV, Rs = {x ∈ Rn : |xj | ≤ s, ∀ j}. 
(the limit taken as s → ∞, the approximating integrals being over the cubes R_s).

It is easy to see that, if K_ν is any sequence of compact contented subsets of ℝⁿ such that each R_s, for s < ∞, is contained in K_ν for all ν sufficiently large, i.e., for ν ≥ N(s), then, whenever f ∈ R(ℝⁿ),

(11.63)  ∫_{ℝⁿ} f dV = lim_{ν→∞} ∫_{K_ν} f dV.

Change of variables formulas and Fubini's Theorem extend to this case. For example, the limiting case of (11.61) as ρ → ∞ is

(11.64)  f ∈ C(ℝ²) ∩ R(ℝ²) ⟹ ∫_{ℝ²} f(x, y) dA = ∫₀^∞ ∫₀^{2π} f(r cos θ, r sin θ) r dθ dr.

The following is a good example. Take f(x, y) = e^{-x²-y²}. We have

(11.65)  ∫_{ℝ²} e^{-x²-y²} dA = ∫₀^∞ ∫₀^{2π} e^{-r²} r dθ dr = 2π ∫₀^∞ e^{-r²} r dr.

Now, methods of §0 allow the substitution s = r², so ∫₀^∞ e^{-r²} r dr = ½ ∫₀^∞ e^{-s} ds = ½. Hence

(11.66)  ∫_{ℝ²} e^{-x²-y²} dA = π.

On the other hand, Theorem 11.9 extends to give

(11.67)  ∫_{ℝ²} e^{-x²-y²} dA = ∫_{-∞}^∞ ∫_{-∞}^∞ e^{-x²-y²} dy dx = ( ∫_{-∞}^∞ e^{-x²} dx )( ∫_{-∞}^∞ e^{-y²} dy ).

Note that the two factors in the last product are equal. We deduce that

(11.68)  ∫_{-∞}^∞ e^{-x²} dx = √π.

We can generalize (11.67), to obtain (via (11.68))

(11.69)  ∫_{ℝⁿ} e^{-|x|²} dV_n = ( ∫_{-∞}^∞ e^{-x²} dx )ⁿ = π^{n/2}.

The integrals (11.65)-(11.69) are called Gaussian integrals, and their evaluation has many uses. We shall see some in §12.

Exercises

1. Let f : ℝⁿ → ℝ be a function, not assumed bounded. For s ∈ (0, ∞), set

f_s(x) = f(x) if |f(x)| ≤ s, 0 otherwise.

Assume f_s ∈ R(ℝⁿ) for each s < ∞, and assume

∫_{ℝⁿ} |f_s(x)| dx ≤ B < ∞, ∀ s < ∞.

In such a case, say f ∈ R^#(ℝⁿ). Set

∫_{ℝⁿ} f(x) dx = lim_{s→∞} ∫_{ℝⁿ} f_s(x) dx.

Show that this limit exists whenever f ∈ R^#(ℝⁿ). Extend the change of variable formula to such functions. Show that

f(x) = |x|^{-a} e^{-|x|²} ∈ R^#(ℝⁿ) ⟺ a < n.

2. Consider spherical polar coordinates on ℝ³, given by

x = ρ sin φ cos θ,  y = ρ sin φ sin θ,  z = ρ cos φ,

i.e., take G(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). Show that

Det DG(ρ, φ, θ) = ρ² sin φ.

Use this to compute the volume of the unit ball in ℝ³.

12.
Integration on surfaces A smooth m-dimensional surface M ⊂ Rn is characterized by the following property. Given p ∈ M, there is a neighborhood U of p in M and a smooth map ϕ : O → U, from an open set O ⊂ Rm bijectively to U, with injective derivative at each point. Such a map ϕ is called a coordinate chart on M. We call U ⊂ M a coordinate patch. If all such maps ϕ are smooth of class C k , we say M is a surface of class C k . In §15 we will define analogous notions of a C k surface with boundary, and of a C k surface with corners. There is an abstraction of the notion of a surface, namely the notion of a ‘manifold,’ which is useful in more advanced studies. A treatment of calculus on manifolds can be found in [Spi]. ¡ ¢ There is associated an m × m matrix G(x) = gjk (x) of functions on O, defined in terms of the inner product of vectors tangent to M : (12.1) gjk (x) = Dϕ(x)ej · Dϕ(x)ek = n X ∂ϕ` ∂ϕ` , ∂xj ∂xk `=1 where {ej : 1 ≤ j ≤ m} is the standard orthonormal basis of Rm . Equivalently, G(x) = Dϕ(x)t Dϕ(x). (12.2) We call (gjk ) the metric tensor of M, on U. Note that this matrix is positive definite. From a coordinate-independent point of view, the metric tensor on M specifies inner products of vectors tangent to M , using the inner product of Rn . Suppose there is another coordinate chart ψ : Ω → U, and we compare (gjk ) with H = (hjk ), given by (12.3) hjk (y) = Dψ(y)ej · Dψ(y)ek , i.e., H(y) = Dψ(y)t Dψ(y). We can write ϕ = ψ ◦ F, where F : O → Ω is a diffeomorphism. See Fig.12.1. By the chain rule, Dϕ(x) = Dψ(y)DF (x), where y = F (x). Consequently, G(x) = DF (x)t H(y)DF (x), (12.4) or equivalently, (12.5) gjk (x) = X ∂Fi ∂F` hi` (y). ∂xj ∂xk i,` If f : M → R is a continuous function supported on U, we will define the surface integral by Z Z p (12.6) f dS = f ◦ ϕ(x) g(x) dx, M O 83 where (12.7) g(x) = Det G(x). We need to know that this is independent of the choice of coordinate chart ϕ : O → U. 
R Thus, ifpwe use ψ : Ω → U instead, we want to show that (12.6) is equal to f ◦ ψ(y) h(y) dy, where h(y) = Det H(y). Indeed, since f ◦ ψ ◦ F = f ◦ ϕ, we Ω can apply the change of variable formula, Theorem 11.14, to get Z Z p p (12.8) f ◦ ψ(y) h(y) dy = f ◦ ϕ(x) h(F (x)) |Det DF (x)| dx. Ω O Now, (12.4) implies that p p g(x) = |Det DF (x)| h(y), (12.9) so the right side of (12.8) is seen to be equal to (12.6), and our surface integral is well defined, at least for f supported in a coordinate patch. More generally, if f : M → R has compact support, write it as a finite sum of terms, each supported on a coordinate patch, and use (12.6) on each patch. Let us pause to consider the special cases m = 1 and m = 2. For m = 1, we are considering a curve in Rn , say ϕ : [a, b] → Rn . Then G(x) is a 1 × 1 matrix, namely G(x) = |ϕ0 (x)|2 . If we denote the curve in Rn by γ, rather than M, the formula (12.6) becomes Z (12.10) Z f ds = b f ◦ ϕ(x) |ϕ0 (x)| dx. a γ In case m = 2, let us consider a surface M ⊂ R3 , with a coordinate chart ϕ : O → U ⊂ M. For f supported in U, an alternative way to write the surface integral is Z Z (12.11) f dS = f ◦ ϕ(x) |∂1 ϕ × ∂2 ϕ| dx1 dx2 , M O where u × v is the cross product of vectors u and v in R3 . To see this, we compare this integrand with the one in (12.6). In this case, µ ¶ ∂1 ϕ · ∂1 ϕ ∂ 1 ϕ · ∂2 ϕ (12.12) g = Det = |∂1 ϕ|2 |∂2 ϕ|2 − (∂1 ϕ · ∂2 ϕ)2 . ∂2 ϕ · ∂1 ϕ ∂ 2 ϕ · ∂2 ϕ Recall from (5.28) that |u × v| = |u| |v| | sin θ|, where θ is the angle between u and v. Equivalently, since u · v = |u| |v| cos θ, ¡ ¢ (12.13) |u × v|2 = |u|2 |v|2 1 − cos2 θ = |u|2 |v|2 − (u · v)2 . 84 √ Thus we see that |∂1 ϕ × ∂2 ϕ| = g, in this case, and (12.11) is equivalent to (12.6). An important class of surfaces is the class of graphs of smooth functions. Let n−1 u ∈ C 1 (Ω), , and let M be the graph of z = u(x). 
The map ϕ(x) = (x, u(x)), with Ω ⊂ ℝ^{n-1} open, provides a natural coordinate system, in which the metric tensor is given by

(12.14)  g_{jk}(x) = δ_{jk} + (∂u/∂x_j)(∂u/∂x_k).

If u is C¹, we see that g_{jk} is continuous. To calculate g = Det(g_{jk}) at a given point p ∈ Ω, if ∇u(p) ≠ 0, rotate coordinates so that ∇u(p) is parallel to the x₁ axis. We see that

(12.15)  √g = (1 + |∇u|²)^{1/2}.

In particular, the (n-1)-dimensional volume of the surface M is given by

(12.16)  V_{n-1}(M) = ∫_M dS = ∫_Ω (1 + |∇u(x)|²)^{1/2} dx.

Particularly important examples of surfaces are the unit spheres S^{n-1} in ℝⁿ,

S^{n-1} = {x ∈ ℝⁿ : |x| = 1}.

Spherical polar coordinates on ℝⁿ are defined in terms of a smooth diffeomorphism

(12.17)  R : (0, ∞) × S^{n-1} → ℝⁿ \ 0,  R(r, ω) = rω.

If (h_{ℓm}) denotes the metric tensor on S^{n-1} induced from its inclusion in ℝⁿ, we see that the Euclidean metric tensor can be written in block form

(12.18)  (e_{jk}) = ( 1    0
                      0    r² h_{ℓm} ).

Thus

(12.19)  √e = r^{n-1} √h.

We therefore have the following result for integrating a function in spherical polar coordinates:

(12.20)  ∫_{ℝⁿ} f(x) dx = ∫_{S^{n-1}} [ ∫₀^∞ f(rω) r^{n-1} dr ] dS(ω).

We next compute the (n-1)-dimensional area A_{n-1} of the unit sphere S^{n-1} ⊂ ℝⁿ, using (12.20) together with the computation

(12.21)  ∫_{ℝⁿ} e^{-|x|²} dx = π^{n/2},

from (11.69). First note that, whenever f(x) = φ(|x|), (12.20) yields

(12.22)  ∫_{ℝⁿ} φ(|x|) dx = A_{n-1} ∫₀^∞ φ(r) r^{n-1} dr.

In particular, taking φ(r) = e^{-r²} and using (12.21), we have

(12.23)  π^{n/2} = A_{n-1} ∫₀^∞ e^{-r²} r^{n-1} dr = ½ A_{n-1} ∫₀^∞ e^{-s} s^{n/2 - 1} ds,

where we used the substitution s = r² to get the last identity. We hence have

(12.24)  A_{n-1} = 2π^{n/2} / Γ(n/2),

where Γ(z) is Euler's Gamma function, defined for z > 0 by

(12.25)  Γ(z) = ∫₀^∞ e^{-s} s^{z-1} ds.

We need to complement (12.24) with some results on Γ(z) allowing a computation of Γ(n/2) in terms of more familiar quantities. Of course, setting z = 1 in (12.25), we immediately get

(12.26)  Γ(1) = 1.
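The formula (12.24) can be checked against the familiar low-dimensional values (a sketch, using the Gamma function from Python's standard library; not part of the text): A₁ = 2π, the circumference of the unit circle, A₂ = 4π, the area of the unit sphere in ℝ³, and A₃ = 2π².

```python
import math

def sphere_area(n):
    """Area A_{n-1} of the unit sphere S^{n-1} in R^n, by (12.24)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

# Familiar values: length of S^1, area of S^2, and A_3 = 2*pi^2.
vals = [sphere_area(n) for n in (2, 3, 4)]
# vals ≈ [2*pi, 4*pi, 2*pi**2]
```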
Also, setting n = 1 in (12.23), we have Z Z ∞ 1 2 e π =2 −r 2 dr = 0 ∞ 1 e−s s− 2 ds, 0 or (12.27) 1 Γ( 12 ) = π 2 . We can proceed inductively from (12.26)-(12.27) to a formula for Γ( n2 ) for any n ∈ Z+ , using the following. 86 Lemma 12.1. For all z > 0, (12.28) Γ(z + 1) = zΓ(z). Proof. We can write Z ∞³ Γ(z + 1) = − 0 d −s ´ z e s ds = ds Z ∞ e−s 0 d ¡ z¢ s ds, ds the last identity by integration by parts. The last expression here is seen to equal the right side of (12.28). Consequently, for k ∈ Z+ , (12.29) Γ(k) = (k − 1)!, ¢ ¡ ¢ ¡ ¢ 1 ¡ Γ k + 12 = k − 21 · · · 21 π 2 . Thus (12.24) can be rewritten (12.30) A2k−1 = 2π k , (k − 1)! 2π k ¢ ¡ ¢. A2k = ¡ k − 12 · · · 12 We discuss another important example of a smooth surface, in the space M (n) ≈ R of real n × n matrices, namely SO(n), the set of matrices T ∈ M (n) satisfying T t T = I and Det T > 0 (hence Det T = 1). As noted in exercises 4-5 of §4, the exponential map Exp: M (n) → M (n) defined by Exp(A) = eA has the property n2 (12.31) Exp : Skew(n) −→ SO(n), where Skew(n) is the set of skew-symmetric matrices in M (n). Also D Exp(0)A = A; hence (12.32) D Exp(0) = ι : Skew(n) ,→ M (n). It follows from the Inverse Function Theorem that there is a neighborhood O of 0 in Skew(n) which is mapped by Exp diffeomorphically onto a smooth surface U ⊂ M (n), of dimension m = 12 n(n − 1). Furthermore, U is a neighborhood of I in SO(n). For general T ∈ SO(n), we can define maps (12.33) ϕT : O −→ SO(n), ϕT (A) = T Exp(A), and obtain coordinate charts in SO(n), which is consequently a smooth surface of dimension 12 n(n−1) in M (n). Note that SO(n) is a closed bounded subset of M (n); hence it is compact. We use the inner product on M (n) computed componentwise; equivalently, (12.34) hA, Bi = Tr(B t A) = Tr (BAt ). This produces a metric tensor on SO(n). The surface integral over SO(n) has the following important invariance property. 87 ¡ ¢ Proposition 12.2. 
Given f ∈ C SO(n) , if we set (12.35) ρT f (X) = f (XT ), for T, X ∈ SO(n), we have Z Z ρT f dS = (12.36) SO(n) λT f (X) = f (T X), Z λT f dS = SO(n) f dS. SO(n) Proof. Given T ∈ SO(n), the maps RT , LT : M (n) → M (n) defined by RT (X) = XT, LT (X) = T X are easily seen from (12.34) to be isometries. Thus they yield maps of SO(n) to itself which preserve the metric tensor, proving (12.36). ¡ ¢ R Since SO(n) is compact, its total volume V SO(n) = 1 dS is finite. We SO(n) define the integral with respect to ‘Haar measure’ Z Z 1 ¢ (12.37) f (g) dg = ¡ V SO(n) SO(n) f dS. SO(n) This is used in many arguments involving ‘averaging over rotations.’ One example will arise in the proof of Proposition 20.5. Exercises 1. Define ϕ : [0, θ] → R2 to be ϕ(t) = (cos t, sin t). Show that, if 0 < θ ≤ 2π, the image of [0, θ] under ϕ is an arc of the unit circle, of length θ. Deduce that the unit circle in R2 has total length 2π. This result follows also from (12.30). Remark. Use the definition of π given in the auxiliary problem set after §3. This length formula provided the original definition of π, in ancient Greek geometry. 2. Compute the volume of the unit ball B n = {x ∈ Rn : |x| ≤ 1}. Hint. Apply (12.22) with ϕ = χ[0,1] . 3. Suppose M is a surface in Rn of dimension 2, and ¡ ¢ ϕ : O → U ⊂ M is a coordinate chart, with O ⊂ R2 . Set ϕjk (x) = ϕj (x), ϕk (x) , so ϕjk : O → R2 . Show that the formula (12.6) for the surface integral is equivalent to s ³ Z Z ´2 X f dS = f ◦ ϕ(x) Det Dϕjk (x) dx. M O j<k 88 √ Hint. Show that the quantity under is equal to (12.12). 4. If M is an m-dimensional surface, ϕ : O → M ⊂ M a coordinate chart, for J = (j1 , . . . , jm ) set ¡ ¢ ϕJ (x) = ϕj1 (x), . . . , ϕjm (x) , ϕJ : O → R m . Show that the formula (12.6) is equivalent to Z s Z f dS = M f ◦ ϕ(x) X ³ Det DϕJ (x) ´2 dx. j1 <···<jm O √ is equal Hint. Reduce to the following. For fixed x0 ∈ O, the quantity under ¡ ¢ to g(x) at x = x0 , in the case Dϕ(x0 ) = Dϕ1 (x0 ), . . . , Dϕm (x0 ), 0, . . . 
, 0 . Reconsider this problem when working on the exercises for §13. 5. Let M be the graph in Rn+1 of xn+1 = u(x), x ∈ O ⊂ Rn . Let N be the unit ¡ ¢− 1 normal to M given by N = 1 + |∇u|2 2 (−∇u, 1). Show that, for a continuous function f : M → Rn+1 , Z Z ¡ ¢ ¡ ¢ f · N dS = f x, u(x) · −∇u(x), 1 dx. M The left side is often denoted O R M f · dS. 89 13. Differential forms It is very desirable to be able to make constructions which depend as little as possible on a particular choice of coordinate system. The calculus of differential forms, whose study we now take up, is one convenient set of tools for this purpose. We start with the notion of a 1-form. It is an object that gets integrated over a curve; formally, a 1-form on Ω ⊂ Rn is written X (13.1) α= aj (x) dxj . j If γ : [a, b] → Ω is a smooth curve, we set Z (13.2) Z b α= a γ X ¡ ¢ aj γ(t) γj0 (t) dt. In other words, Z (13.3) γ∗α α= γ where I = [a, b] and γ ∗ α = Z P j I aj (γ(t))γj0 (t) is the pull-back of α under the map γ. More generally, if F : O → Ω is a smooth map (O ⊂ Rm open), the pull-back F ∗ α is a 1-form on O defined by X aj (F (y))(∂Fj /∂yk ) dyk . (13.4) F ∗α = j,k The usual change of variable for integrals gives Z Z (13.5) α = F ∗α γ σ if γ is the curve F ◦ σ. If F : O → Ω is a diffeomorphism, and X (13.6) X= bj (x)∂/∂xj is a vector field on Ω, recall that we have the vector field on O : ¡ ¢ (13.7) F# X(y) = DF −1 (p) X(p), p = F (y). 90 If we define a pairing between 1-forms and vector fields on Ω by (13.8) hX, αi = X bj (x)aj (x) = b · a, j a simple calculation gives (13.9) hF# X, F ∗ αi = hX, αi ◦ F. Thus, a 1-form on Ω is characterized at each point p ∈ Ω as a linear transformation of vectors at p to R. More generally, we can regard a k-form α on Ω as a k-multilinear map on vector fields: (13.10) α(X1 , . . . , Xk ) ∈ C ∞ (Ω); we impose the further condition of anti-symmetry: (13.11) α(X1 , . . . , Xj , . . . , X` , . . . , Xk ) = −α(X1 , . . . , X` , . . . , Xj , . . . , Xk ). 
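Returning to (13.1)-(13.2), here is a concrete line-integral computation (an illustrative sketch; the 1-form and curve are standard examples, not taken from the text): for α = -y dx + x dy and the unit circle γ(t) = (cos t, sin t), the pulled-back integrand is identically 1, so the integral is 2π.

```python
import math

# Sketch: evaluate (13.2) for alpha = -y dx + x dy along the unit circle.
# a(gamma(t)) . gamma'(t) = (-sin t)(-sin t) + (cos t)(cos t) = 1, so the
# integral equals the length of the parameter interval, 2*pi.
def integrate_1form(a, gamma, dgamma, t0, t1, n=10000):
    """Midpoint-rule approximation to (13.2)."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        coeffs = a(gamma(t))          # coefficients a_j at gamma(t)
        v = dgamma(t)                 # gamma'(t)
        total += sum(c * w for c, w in zip(coeffs, v)) * h
    return total

alpha = lambda p: (-p[1], p[0])       # (a1, a2) = (-y, x)
gamma = lambda t: (math.cos(t), math.sin(t))
dgamma = lambda t: (-math.sin(t), math.cos(t))
result = integrate_1form(alpha, gamma, dgamma, 0.0, 2 * math.pi)
# result ≈ 2*pi
```

This α is the form dθ discussed in §14: closed on ℝ² \ (0,0) but not globally exact, as the nonzero integral over the closed curve shows.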
There is a special notation we use for k-forms. If 1 ≤ j1 < · · · < jk ≤ n, j = (j1 , . . . , jk ), we set (13.12) α= X aj (x) dxj1 ∧ · · · ∧ dxjk j where (13.13) aj (x) = α(Dj1 , . . . , Djk ), Dj = ∂/∂xj . More generally, we assign meaning to (13.12) summed over all k-indices (j1 , . . . , jk ), where we identify (13.14) dxj1 ∧ · · · ∧ dxjk = (sgn σ) dxjσ(1) ∧ · · · ∧ dxjσ(k) , σ being a permutation of {1, . . . , k}. If any jm = j` (m 6= `), then (12.14) vanishes. A common notation for the statement that α is a k-form on Ω is α ∈ Λk (Ω). (13.15) In particular, we can write a 2-form β as (13.16) β= X bjk (x) dxj ∧ dxk 91 and pick coefficients satisfying bjk (x) P P= −bkj (x). According to (13.12)-(13.13), if we set U = uj (x)∂/∂xj and V = vj (x)∂/∂xj , then (13.17) β(U, V ) = 2 X bjk (x)uj (x)v k (x). P If bjk is not required to be antisymmetric, one gets β(U, V ) = (bjk − bkj )uj v k . If F : O → Ω is a smooth map as above, we define the pull-back F ∗ α of a k-form α, given by (13.12), to be F ∗α = (13.18) X ¡ ¢ aj F (y) (F ∗ dxj1 ) ∧ · · · ∧ (F ∗ dxjk ) j where F ∗ dxj = (13.19) X (∂Fj /∂y` )dy` , ` the algebraic computation in (13.18) being performed using the rule (13.14). Extending (13.9), if F is a diffeomorphism, we have (F ∗ α)(F# X1 , . . . , F# Xk ) = α(X1 , . . . , Xk ) ◦ F. (13.20) If B = (bjk ) is an n × n matrix, then, by (13.14), ³X ´ ´ ³X ´ ³X bnk dxk b2k dxk ∧ · · · ∧ b1k dxk ∧ (13.21) k k k = ³X ´ (sgn σ)b1σ(1) b2σ(2) · · · bnσ(n) dx1 ∧ · · · ∧ dxn σ ¡ ¢ = det B dx1 ∧ · · · ∧ dxn , Hence, if F : O → Ω is a C 1 map between two domains of dimension n, and α = A(x)dx1 ∧ · · · ∧ dxn is an n-form on Ω, then (13.22) F ∗ α = det DF (y) A(F (y)) dy1 ∧ · · · ∧ dyn . Comparison with the change of variable R formula for multiple integrals suggests that one has an intrinsic definition of α when α is an n-form on Ω, n = dim Ω Ω. To implement this, we need to take into account that det DF (y) rather than | det DF (y)| appears in (13.21). 
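The identity (13.21) exhibits the determinant as the signed sum over permutations. One can check this mechanically against a cofactor expansion (a sketch; the test matrix is an arbitrary illustrative choice):

```python
from itertools import permutations
from math import prod

def sgn(sigma):
    """Sign of a permutation given as a tuple of 0-based indices."""
    sigma = list(sigma)
    s = 1
    for i in range(len(sigma)):
        while sigma[i] != i:          # sort by transpositions, flip sign
            j = sigma[i]
            sigma[i], sigma[j] = sigma[j], sigma[i]
            s = -s
    return s

def det_perm(B):
    """Determinant via the permutation sum in (13.21)."""
    n = len(B)
    return sum(sgn(p) * prod(B[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def det_cofactor(B):
    """Determinant by cofactor expansion along the first row."""
    if len(B) == 1:
        return B[0][0]
    return sum((-1) ** j * B[0][j] *
               det_cofactor([row[:j] + row[j + 1:] for row in B[1:]])
               for j in range(len(B)))

B = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]
# Both methods agree: det B = -10.
```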
We say a smooth map F : O → Ω between two open subsets of Rn preserves orientation if det DF (y) is everywhere positive. The object called an ‘orientation’ on Ω can be identified as an equivalence class of nowhere vanishing n-forms on Ω, two such forms being equivalent if one is a multiple of 92 another by a positive function in C ∞ (Ω); the standard orientation on Rn is determined by dx1 ∧ · · · ∧ dxn . If S is an n-dimensional surface in Rn+k , an orientation on S can also be specified by a nowhere vanishing form ω ∈ Λn (S). If such a form exists, S is said to be orientable. The equivalence class of positive multiples a(x)ω is said to consist of ‘positive’ forms. A smooth map ψ : S → M between oriented ndimensional surfaces preserves orientation provided ψ ∗ σ is positive on S whenever σ ∈ Λn (M ) is positive. If S is oriented, one can choose coordinate charts which are all orientation preserving. We mention that there exist surfaces which cannot be oriented. If O, Ω are open in Rn and F : O → Ω is an orientation preserving diffeomorphism, we have Z Z ∗ (13.23) F α = α. O Ω More generally, if S is an n-dimensional surface with an orientation, say the image of an open set O ⊂ Rn by ϕ : O → S, carrying the natural orientation of O, we can set Z Z (13.24) α = ϕ∗ α S O for an n-form α on S. If it takes several coordinate patches to cover S, define R α S by writing α as a sum of forms, each supported R on one patch. We need to show that this definition of α is independent of the choice of S coordinate system on S (as long as the orientation of S is respected). Thus, suppose ϕ : O → U ⊂ S and ψ : Ω → U ⊂ S are both coordinate patches, so that F = ψ −1 ◦ ϕ : O → Ω is an orientation-preserving diffeomorphism, as in Fig.12.1 of the last section. We need to check that, if α is an n-form on S, supported on U, then Z Z ∗ (13.25) ϕ α = ψ ∗ α. O Ω To see this, first note that, for any form α of any degree, (13.26) ψ ◦ F = ϕ =⇒ ϕ∗ α = F ∗ ψ ∗ α. 
P It suffices to check this for α = dxj . Then (13.14) gives ψ ∗ dxj = (∂ψj /∂x` )dx` , so X ∂F` ∂ψj X ∂ϕj F ∗ ψ ∗ dxj = dxm , ϕ∗ dxj = dxm ; ∂xm ∂x` ∂xm m `,m 93 but the identity of these forms follows from the chain rule: Dϕ = (Dψ)(DF ) =⇒ X ∂ψj ∂F` ∂ϕj = . ∂xm ∂x` ∂xm ` Now that we have (13.26), we see that the left side of (13.25) is equal to Z F ∗ (ψ ∗ α), O which is equal to the right side of (13.25), by (13.23). Thus the integral of an n-form over an oriented n-dimensional surface is well defined. Exercises 1. If F : U0 → U1 and G : U1 → U2 are smooth maps and α ∈ Λk (U2 ), then (13.26) implies (G ◦ F )∗ α = F ∗ (G∗ α) in Λk (U0 ). In the special case that Uj = Rn and F and G are linear maps, and k = n, show that this identity implies det(GF ) = (det F )(det G). 2. Let Λk Rn denote ¡n¢the space of k-forms (13.12) with constant coefficients. Show k n that dimR Λ R = k . If T : Rm → Rn is linear, then T ∗ preserves this class of spaces; we denote the map Λk T ∗ : Λk Rn −→ Λk Rm . Similarly, replacing T by T ∗ yields Λk T : Λk Rm −→ Λk Rn . 3. Show that Λk T is uniquely characterized as a linear map from Λk Rm to Λk Rn which satisfies (Λk T )(v1 ∧ · · · ∧ vk ) = (T v1 ) ∧ · · · ∧ (T vk ), vj ∈ Rm . 94 4. If {e1 , . . . , en } is the standard orthonormal basis of Rn , define an inner product on Λk Rn by declaring an orthonormal basis to be {ej1 ∧ · · · ∧ ejk : 1 ≤ j1 < · · · < jk ≤ n}. Show that, if {u1 , . . . , un } is any other orthonormal basis of Rn , then the set {uj1 ∧ · · · ∧ ujk : 1 ≤ j1 < · · · < jk ≤ n} is an orthonormal basis of Λk Rn . 5. Let ϕ : O → Rn be smooth, with O ⊂ Rm open. Show that, for each x ∈ O, kΛm Dϕ(x)ωk2 = Det Dϕ(x)t Dϕ(x), where ω = e1 ∧ · · · ∧ em . Take another look at problem 4 of §12. 95 14. Products and exterior derivatives of forms Having discussed the notion of a differential form as something to be integrated, we now consider some operations on forms. There is a wedge product, or exterior product, characterized as follows. 
If α ∈ Λk (Ω) has the form (13.12) and if X (14.1) β= bi (x)dxi1 ∧ · · · ∧ dxi` ∈ Λ` (Ω), i define (14.2) α∧β = X aj (x)bi (x)dxj1 ∧ · · · ∧ dxjk ∧ dxi1 ∧ · · · ∧ dxi` j,i in Λk+` (Ω). A special case of this arose in (13.18)-(13.21). We retain the equivalence (13.14). It follows easily that (14.3) α ∧ β = (−1)k` β ∧ α. In addition, there is an interior product if α ∈ Λk (Ω) with a vector field X on Ω, producing ιX α = αcX ∈ Λk−1 (Ω), defined by (14.4) (αcX)(X1 , . . . , Xk−1 ) = α(X, X1 , . . . , Xk−1 ). Consequently, if α = dxj1 ∧ · · · ∧ dxjk , Di = ∂/∂xi , then (14.5) c j ∧ · · · ∧ dxj αcDj` = (−1)`−1 dxj1 ∧ · · · ∧ dx ` k c j denotes removing the factor dxj . Furthermore, where dx ` ` i∈ / {j1 , . . . , jk } =⇒ αcDi = 0. If F : O → Ω is a diffeomorphism and α, β are forms and X a vector field on Ω, it is readily verified that (14.6) F ∗ (α ∧ β) = (F ∗ α) ∧ (F ∗ β), F ∗ (αcX) = (F ∗ α)c(F# X). We make use of the operators ∧k and ιk on forms: (14.7) ∧k α = dxk ∧ α, ιk α = αcDk . There is the following useful anticommutation relation: (14.8) ∧k ι` + ι` ∧k = δk` , 96 where δk` is 1 if k = `, 0 otherwise. This is a fairly straightforward consequence of (14.5). We also have (14.9) ∧j ∧k + ∧k ∧j = 0, ιj ιk + ιk ιj = 0. From (14.8)-(14.9) one says that the operators {ιj , ∧j : 1 ≤ j ≤ n} generate a ‘Clifford algebra.’ Another important operator on forms is the exterior derivative: d : Λk (Ω) −→ Λk+1 (Ω), (14.10) defined as follows. If α ∈ Λk (Ω) is given by (13.12), then X (14.11) dα = (∂aj /∂x` )dx` ∧ dxj1 ∧ · · · ∧ dxjk . j,` Equivalently, (14.12) dα = n X ∂` ∧` α `=1 where ∂` = ∂/∂x` and ∧` is given by (14.7). The antisymmetry dxm ∧ dx` = −dx` ∧ dxm , together with the identity ∂ 2 aj /∂x` ∂xm = ∂ 2 aj /∂xm ∂x` , implies (14.13) d(dα) = 0, for any differential form α. We also have a product rule: (14.14) d(α ∧ β) = (dα) ∧ β + (−1)k α ∧ (dβ), α ∈ Λk (Ω), β ∈ Λj (Ω). 
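Identities such as (14.13) and (14.14) are easy to spot-check with a computer algebra system. In the following sketch (our own, not from the text; sympy, with forms on R³ represented simply by their coefficient tuples) we verify d(df) = 0, d(dα) = 0, and the k = 1 case of the product rule:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def d0(f):
    # df, as a 1-form (coefficients of dx, dy, dz)
    return (sp.diff(f, x), sp.diff(f, y), sp.diff(f, z))

def d1(a):
    # d of a 1-form a1 dx + a2 dy + a3 dz,
    # as coefficients of (dy^dz, dz^dx, dx^dy)
    a1, a2, a3 = a
    return (sp.diff(a3, y) - sp.diff(a2, z),
            sp.diff(a1, z) - sp.diff(a3, x),
            sp.diff(a2, x) - sp.diff(a1, y))

def d2(b):
    # d of b1 dy^dz + b2 dz^dx + b3 dx^dy, as the coefficient of dx^dy^dz
    return sp.diff(b[0], x) + sp.diff(b[1], y) + sp.diff(b[2], z)

def wedge11(a, b):
    # (1-form)^(1-form), in the basis (dy^dz, dz^dx, dx^dy)
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def wedge21(b, c):
    # (2-form)^(1-form), as the coefficient of dx^dy^dz
    return b[0]*c[0] + b[1]*c[1] + b[2]*c[2]

f = x**2 * sp.cos(y) * z
alpha = (x*y*z, sp.sin(x) + z**2, x**3 * y)
beta = (y**2, x*z, sp.exp(x))

# (14.13): d(d f) = 0 and d(d alpha) = 0
assert all(sp.simplify(c) == 0 for c in d1(d0(f)))
assert sp.simplify(d2(d1(alpha))) == 0

# (14.14) with k = 1: d(alpha ^ beta) = (d alpha)^beta - alpha^(d beta)
lhs = d2(wedge11(alpha, beta))
rhs = wedge21(d1(alpha), beta) - wedge21(d1(beta), alpha)
assert sp.simplify(lhs - rhs) == 0
```

In this low-dimensional guise the product rule is the classical vector identity div(A × B) = B · curl A − A · curl B.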
The exterior derivative has the following important property under pull-backs: F ∗ (dα) = dF ∗ α, (14.15) if α ∈ Λk (Ω) and F : O → Ω is a smooth map. To see this, extending (14.14) to a formula for d(α ∧ β1 ∧ · · · ∧ β` ) and using this to apply d to F ∗ α, we have dF ∗ α = (14.16) X ∂ ¡ ¢ ¡ ¢ ¡ ¢ aj ◦ F (x) dx` ∧ F ∗ dxj1 ∧ · · · ∧ F ∗ dxjk ∂x` j,` X ¡ ¢¡ ¢ ¡ ¢ ¡ ¢ + (±)aj F (x) F ∗ dxj1 ∧ · · · ∧ d F ∗ dxjν ∧ · · · ∧ F ∗ dxjk . j,ν Now ¡ ¢ X ∂ 2 Fi dxj ∧ dx` = 0, d F ∗ dxi = ∂xj ∂x` j,` 97 so only the first sum in (14.16) contributes to dF ∗ α. Meanwhile, (14.17) F ∗ dα = X ∂aj ¡ ¢ ¡ ¢ ¡ ¢ F (x) (F ∗ dxm ) ∧ F ∗ dxj1 ∧ · · · ∧ F ∗ dxjk , ∂xm j,m so (14.15) follows from the identity (14.18) X ∂aj ¡ X ∂ ¡ ¢ ¢ aj ◦ F (x) dx` = F (x) F ∗ dxm , ∂x` ∂xm m ` which in turn follows from the chain rule. If dα = 0, we say α is closed; if α = dβ for some β ∈ Λk−1 (Ω), we say α is exact. Formula (14.13) implies that every exact form is closed. The converse is not always true globally. Consider the multi-valued angular coordinate θ on R2 \ (0, 0); dθ is a single valued closed form on R2 \ (0, 0) which is not globally exact. As we will see below, every closed form is locally exact. First we introduce another important construction. If α ∈ Λk (Ω) and X is a t , the Lie derivative LX α is defined to be vector field on Ω, generating a flow FX (14.19) ¡ t ¢∗ α|t=0 . LX α = (d/dt) FX Note the formal similarity to the definition (8.2) of LX Y for a vector field Y. Recall the formula (8.4) for LX Y. The following is not only a computationally convenient formula for LX α, but also an identity of fundamental importance. Proposition 14.1. We have (14.20) LX α = d(αcX) + (dα)cX. Proof. First we compare both sides in the special case X = ∂/∂x` = D` . Note that X ¡ t ¢∗ aj (x + te` )dxj1 ∧ · · · ∧ dxjk , FD` α = j so (14.21) LD` α = X ∂aj j ∂x` dxj1 ∧ · · · ∧ dxjk = ∂` α. 
To evaluate the right side of (14.20) with X = D` , use (14.12) to write this quantity as (14.22) n X ¡ ¢ d(ι` α) + ι` dα = ∂j ∧j ι` + ι` ∂j ∧j α. j=1 98 Using the commutativity of ∂j with ∧j and with ι` , and the anticommutation relations (14.8), we see that the right side of (14.22) is ∂` α, which coincides with (14.21). Thus the proposition holds for X = ∂/∂x` . Now we can prove the proposition in general, for a smooth vector field X on Ω. It is to be verified at each point x0 ∈ Ω. If X(x0 ) 6= 0, choose a coordinate system about x0 so X = ∂/∂x1 and use the calculation above. This shows that the desired identity holds on the set of points {x0 ∈ Ω : X(x0 ) 6= 0}, and by continuity it holds on the closure of this set. However, if x0 ∈ Ω has a neighborhood on which X vanishes, it is clear that LX α = 0 near x0 and also αcX and dαcX vanish near x0 . This completes the proof. The identity (14.20) can furnish a formula for the exterior derivative in terms of Lie brackets, as follows. By (8.4) and (13.20) we have, for a k-form ω, X ¡ ¢ (14.23) LX ω (X1 , . . . , Xk ) = X ·ω(X1 , . . . , Xk )− ω(X1 , . . . , [X, Xj ], . . . , Xk ). j Now (14.20) can be rewritten as (14.24) ιX dω = LX ω − dιX ω. This implies (14.25) ¡ ¢ ¡ ¢ (dω)(X0 , X1 , . . . , Xk ) = LX0 ω (X1 , . . . , Xk ) − dιX0 ω (X1 , . . . , Xk ). We can substitute (14.23) into the first term on the right in (14.25). In case ω is a 1-form, the last term is easily evaluated; we get (14.26) (dω)(X0 , X1 ) = X0 · ω(X1 ) − X1 · ω(X0 ) − ω([X0 , X1 ]). More generally, we can tackle the last term on the right side of (14.25) by the same method, using (14.24) with ω replaced by the (k − 1)-form ιX0 ω. In this way we inductively obtain the formula (14.27) k X b` , . . . , Xk ) (dω)(X0 , . . . , Xk ) = (−1)` X` · ω(X0 , . . . , X `=0 + X b` , . . . , X bj , . . . , Xk ). (−1)j+` ω([X` , Xj ], X0 , . . . 
, X 0≤`<j≤k s+t s t Note that from (14.19) and the property FX = FX FX it easily follows that ¡ t ¢∗ ¡ t ¢∗ ¡ t ¢∗ (14.28) (d/dt) FX α = LX FX α = FX LX α. It is useful to generalize this. Let Ft be any smooth family of diffeomorphisms from M into M. Define vector fields Xt on Ft (M ) by (14.29) (d/dt)Ft (x) = Xt (Ft (x)). 99 Then it easily follows that, for α ∈ Λk M, (d/dt)Ft∗ α = Ft∗ LXt α ¤ £ = Ft∗ d(αcXt ) + (dα)cXt . (14.30) In particular, if α is closed, then, if Ft are diffeomorphisms for 0 ≤ t ≤ 1, Z F1∗ α (14.31) − F0∗ α = dβ, 1 β= 0 Ft∗ (αcXt ) dt. Using this, we can prove the celebrated Poincare Lemma. Theorem 14.2. If B is the unit ball in Rn , centered at 0, α ∈ Λk (B), k > 0, and dα = 0, then α = dβ for some β ∈ Λk−1 (B). Proof. Consider the family of maps Ft : B → B given by Ft (x) = tx. For 0 < t ≤ 1 these are diffeomorphisms, and the formula (14.30) applies. Note that F1∗ α = α, F0∗ α = 0. Now a simple limiting argument shows (14.31) remains valid, so α = dβ with Z (14.32) 1 β= 0 Ft∗ (αcV )t−1 dt P where V = r∂/∂r = xj ∂/∂xj . Since F0∗ = 0, the apparent singularity in the integrand is removable. Since in the proof of the theorem we dealt with Ft such that F0 was not a diffeomorphism, we are motivated to generalize (14.31) to the case where Ft : M → N is a smooth family of maps, not necessarily diffeomorphisms. Then (14.29) does not work to define Xt as a vector field, but we do have (14.33) (d/dt)Ft (x) = Z(t, x); Z(t, x) ∈ TFt (x) N. Now in (14.31) we see that ¡ ¢¡ ¢ F ∗ (αcXt )(Y1 , . . . , Yk−1 ) = α Ft (x) Xt , DFt (x)Y1 , . . . , DFt (x)Yk−1 , and we can replace Xt by Z(t, x). Hence, in this more general case, if α is closed, we can write Z 1 ∗ ∗ (14.34) F1 α − F0 α = dβ; β = γt dt 0 where, at x ∈ M, (14.35) ¡ ¢¡ ¢ γt (Y1 , . . . , Yk−1 ) = α Ft (x) Z(t, x), DFt (x)Y1 , . . . , DFt (x)Yk−1 . 100 For an alternative approach to this homotopy invariance, see exercise 4. Exercise 1. 
If α is a k-form, verify the formula (14.14), i.e., d(α∧β) = (dα)∧β +(−1)k α∧dβ. If α is closed and β is exact, show that α ∧ β is exact. 2. Let F be a vector field on U, open in R3 , F = 1-form ϕ = 3 P 1 3 P 1 fj (x)∂/∂xj . Consider the fj (x)dxj . Show that dϕ and curl F are related in the following way: curl F = 3 X gj (x) ∂/∂xj , 1 dϕ = g1 (x)dx2 ∧ dx3 + g2 (x)dx3 ∧ dx1 + g3 (x)dx1 ∧ dx2 . 3. If F and ϕ are related as in exercise 3, show that curl F is uniquely specified by the relation dϕ ∧ α = hcurl F, αiω for all 1-forms α on U ⊂ R3 , where ω = dx1 ∧ dx2 ∧ dx3 is the volume form. 4. Let B be a ball in R3 , F a smooth vector field on B. Show that ∃ u ∈ C ∞ (B) s.t. F = grad u ⇐⇒ curl F = 0. Hint. Compare F = grad u with ϕ = du. 5. Let B be a ball in R3 and G a smooth vector field on B. Show that ∃ vector field F s.t. G = curl F ⇐⇒ div G = 0. Hint. If G = 3 P 1 gj (x)dxj , consider ψ = g1 (x)dx2 ∧dx3 +g2 (x)dx3 ∧dx1 +g3 (x)dx1 ∧ dx2 . Compute dψ. 101 6. Suppose f0 , f1 : X → Y are smoothly homotopic maps, via Φ : X × R → Y, Φ(x, j) = fj (x). Let α ∈ Λk Y. Apply (13.51) to α ˜ = Φ∗ α ∈ Λk (X × R), with ˜ and from Ft (x, s) = (x, s + t), to obtain β˜ ∈ Λk−1 (X × R) such that F1∗ α ˜−α ˜ = dβ, k−1 ∗ ∗ there produce β ∈ Λ (X) such that f1 α − f0 α = dβ. ∗˜ Hint. Use β = ι β, where ι(x) = (x, 0). For the next set of exercises, let Ω be a planar domain, X = f (x, y)∂/∂x + g(x, y)∂/∂y a nonvanishing vector field on Ω. Consider the 1-form α = g(x, y)dx − f (x, y)dy. 7. Let γ : I → Ω be a smooth curve, I = (a, b). Show that the image C = γ(I) is the image of an integral curve of X if and only if γ ∗ α = 0. Consequently, with slight abuse of notation, one describes the integral curves by gdx − f dy = 0. If α is exact, i.e., α = du, conclude the level curves of u are the integral curves of X. 8. A function ϕ is called an integrating factor if α ˜ = ϕα is exact, i.e., if d(ϕα) = 0, provided Ω is simply connected. 
Show that an integrating factor always exists, at least locally. Show that ϕ = ev is an integrating factor if and only if Xv = − div X. Reconsider problem 7 in §7. Find an integrating factor for α = (x2 + y 2 − 1)dx − 2xydy. 9. Let Y be a vector field which you know how to linearize (i.e., conjugate to ∂/∂x) and suppose LY α = 0. Show how to construct an integrating factor for α. Treat the more general case LX α = cα for some constant c. Compare the discussion in §8 of the situation where [X, Y ] = cX. 102 15. The general Stokes formula A basic result in the theory of differential forms is the generalized Stokes formula: Proposition 15.1. Given a compactly supported (k − 1)-form β of class C 1 on an oriented k-dimensional surface M (of class C 2 ) with boundary ∂M, with its natural orientation, Z Z β (15.1) dβ = M ∂M The orientation induced on ∂M is uniquely determined by the following requirement. If M = Rk− = {x ∈ Rk : x1 ≤ 0} (15.2) then ∂M = {(x2 , . . . , xk )} has the orientation determined by dx2 ∧ · · · ∧ dxk . Proof. Using a partition of unity and invariance of the integral and the exterior derivative under coordinate transformations, it suffices to prove this when M has the form (15.2). In that case, we will be able to deduce (15.1) from the fundamental theorem of calculus. Indeed, if (15.3) c j ∧ · · · ∧ dxk , β = bj (x) dx1 ∧ · · · ∧ dx with bj (x) of bounded support, we have (15.4) dβ = (−1)j−1 (∂bj /∂xj ) dx1 ∧ · · · ∧ dxk . If j > 1, we have Z (15.5) Z nZ ∞ dβ = −∞ M o ∂bj dxj dx0 = 0, ∂xj and also κ∗ β = 0, where κ : ∂M → M is the inclusion. On the other hand, for j = 1, we have Z Z nZ 0 o ∂b1 dβ = dx1 dx2 · · · dxk −∞ ∂x1 M Z (15.6) = b1 (0, x0 )dx0 Z = β. ∂M 103 This proves Stokes’ formula (15.1). It is useful to allow singularities in ∂M. 
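The half-space computation (15.5)-(15.6) can be spot-checked symbolically. In the sketch below (our own; the Gaussian is not compactly supported, but it decays fast enough for all the integrals to converge, so it stands in for b₁) we verify (15.1) for β = b dx₂ on M = R²₋ = {x₁ ≤ 0}:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)

# beta = b(x1, x2) dx2 on M = {x1 <= 0}; then d(beta) = (db/dx1) dx1 ^ dx2
b = sp.exp(-x1**2 - x2**2)

# Left side of (15.1): integral of d(beta) over the half-plane x1 <= 0
lhs = sp.integrate(sp.integrate(sp.diff(b, x1), (x1, -sp.oo, 0)),
                   (x2, -sp.oo, sp.oo))

# Right side of (15.1): integral of beta over the boundary {x1 = 0}
rhs = sp.integrate(b.subs(x1, 0), (x2, -sp.oo, sp.oo))

assert sp.simplify(lhs - rhs) == 0   # both sides equal sqrt(pi)
assert sp.simplify(lhs - sp.sqrt(sp.pi)) == 0
```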
We say a point p ∈ M is a corner of dimension ν if there is a neighborhood U of p in M and a C² diffeomorphism of U onto a neighborhood of 0 in

(15.7) K = {x ∈ Rᵏ : x_j ≤ 0 for 1 ≤ j ≤ k − ν},

where k is the dimension of M. If M is a C² surface and every point p ∈ ∂M is a corner (of some dimension), we say M is a C² surface with corners. In such a case, ∂M is a locally finite union of C² surfaces with corners. The following result extends Proposition 15.1.

Proposition 15.2. If M is a C² surface of dimension k, with corners, and β is a compactly supported (k − 1)-form of class C¹ on M, then (15.1) holds.

Proof. It suffices to establish this when β is supported on a small neighborhood of a corner p ∈ ∂M, of the form U described above. Hence it suffices to show that (15.1) holds whenever β is a (k − 1)-form of class C¹, with compact support on K in (15.7); and we can take β to have the form (15.3). Then, for j > k − ν, (15.5) still holds, while, for j ≤ k − ν, we have, as in (15.6),

(15.8) ∫_K dβ = (−1)^{j−1} ∫ { ∫_{−∞}^0 (∂b_j/∂x_j) dx_j } dx_1 ⋯ \widehat{dx_j} ⋯ dx_k
             = (−1)^{j−1} ∫ b_j(x_1, …, x_{j−1}, 0, x_{j+1}, …, x_k) dx_1 ⋯ \widehat{dx_j} ⋯ dx_k
             = ∫_{∂K} β.

The reason we required M to be a surface of class C² (with corners) in Propositions 15.1 and 15.2 is the following. Due to the formulas (13.18)-(13.19) for a pull-back, if β is of class C^j and F is of class C^ℓ, then F*β is generally of class C^μ, with μ = min(j, ℓ − 1). Thus, if j = ℓ = 1, F*β might be only of class C⁰, so there is not a well-defined notion of a differential form of class C¹ on a C¹ surface, though such a notion is well defined on a C² surface. This problem can be overcome, and one can extend Propositions 15.1-15.2 to the case where M is a C¹ surface (with corners) and β is a (k − 1)-form with the property that both β and dβ are continuous. We will not go into the details. Substantially more sophisticated generalizations are given in [Fed].

Exercises

1.
Consider the region Ω ⊂ R² defined by

Ω = {(x, y) : 0 ≤ y ≤ x², 0 ≤ x ≤ 1}.

Show that the boundary points (1, 0) and (1, 1) are corners, but (0, 0) is not a corner. The boundary of Ω is too sharp at (0, 0) to be a corner; it is called a 'cusp.' Extend Proposition 15.2 to treat this region.

2. Suppose U ⊂ Rⁿ is an open set with smooth boundary M = ∂U, and U has the standard orientation, determined by dx₁ ∧ ⋯ ∧ dxₙ. (See the paragraph above (13.23).) Let ϕ ∈ C¹(Rⁿ) satisfy ϕ(x) < 0 for x ∈ U, ϕ(x) > 0 for x ∈ Rⁿ \ Ū, and grad ϕ(x) ≠ 0 for x ∈ ∂U, so grad ϕ points out of U. Show that the natural orientation on ∂U, as defined after Proposition 15.1, is the same as the following. The equivalence class of forms β ∈ Λⁿ⁻¹(∂U) defining the orientation on ∂U satisfies the property that dϕ ∧ β is a positive multiple of dx₁ ∧ ⋯ ∧ dxₙ, on ∂U.

3. Suppose U = {x ∈ Rⁿ : xₙ < 0}. Show that the orientation on ∂U described above is that of (−1)ⁿ⁻¹ dx₁ ∧ ⋯ ∧ dxₙ₋₁.

16. The classical Gauss, Green, and Stokes formulas

The case of (15.1) where M = Ω is a region in R² with smooth boundary yields the classical Green Theorem. In this case, we have

(16.1) β = f dx + g dy,  dβ = (∂g/∂x − ∂f/∂y) dx ∧ dy,

and hence (15.1) becomes the following.

Proposition 16.1. If Ω is a region in R² with smooth boundary, and f and g are smooth functions on Ω, which vanish outside some compact set in Ω, then

(16.2) ∬_Ω (∂g/∂x − ∂f/∂y) dx dy = ∫_{∂Ω} (f dx + g dy).

Note that, if we have a vector field X = X₁ ∂/∂x + X₂ ∂/∂y on Ω, then the integrand on the left side of (16.2) is

(16.3) ∂X₁/∂x + ∂X₂/∂y = div X,

provided g = X₁, f = −X₂. We obtain

(16.4) ∬_Ω div X dx dy = ∫_{∂Ω} (−X₂ dx + X₁ dy).

If ∂Ω is parametrized by arc-length, as γ(s) = (x(s), y(s)), with orientation as defined for Proposition 15.1, then the unit normal ν, to ∂Ω, pointing out of Ω, is given by ν(s) = (ẏ(s), −ẋ(s)), and (16.4) is equivalent to

(16.5) ∬_Ω div X dx dy = ∫_{∂Ω} ⟨X, ν⟩ ds.
This is a special case of Gauss' Divergence Theorem. We now derive a more general form of the Divergence Theorem. We begin with a definition of the divergence of a vector field on a surface M.

Let M be a region in Rⁿ, or an n-dimensional surface in Rⁿ⁺ᵏ, provided with a volume form ω ∈ ΛⁿM. Let X be a vector field on M. Then the divergence of X, denoted div X, is a function on M which measures the rate of change of the volume form under the flow generated by X. Thus it is defined by

(16.6) L_X ω = (div X)ω.

Here, L_X denotes the Lie derivative. In view of the general formula L_X α = (dα)cX + d(αcX), derived in (14.20), since dω = 0 for any n-form ω on M, we have

(16.7) (div X)ω = d(ωcX).

If M = Rⁿ, with the standard volume element

(16.8) ω = dx₁ ∧ ⋯ ∧ dxₙ,

and if

(16.9) X = Σ_j Xʲ(x) ∂/∂x_j,

then

(16.10) ωcX = Σ_{j=1}^n (−1)^{j−1} Xʲ(x) dx₁ ∧ ⋯ ∧ \widehat{dx_j} ∧ ⋯ ∧ dxₙ.

Hence, in this case, (16.7) yields the familiar formula

(16.11) div X = Σ_{j=1}^n ∂_j Xʲ,

where we use the notation

(16.12) ∂_j f = ∂f/∂x_j.

Suppose now that M is endowed with a metric tensor g_{jk}(x). Then M carries a natural volume element ω, determined by the condition that, if one has a coordinate system in which g_{jk}(p₀) = δ_{jk}, then ω(p₀) = dx₁ ∧ ⋯ ∧ dxₙ. This condition produces the following formula, in any oriented coordinate system:

(16.13) ω = √g dx₁ ∧ ⋯ ∧ dxₙ,  g = det(g_{jk}),

as we have seen in (12.7). We now compute div X when the volume element on M is given by (16.13). We have

(16.14) ωcX = Σ_j (−1)^{j−1} Xʲ √g dx₁ ∧ ⋯ ∧ \widehat{dx_j} ∧ ⋯ ∧ dxₙ

and hence

(16.15) d(ωcX) = ∂_j(√g Xʲ) dx₁ ∧ ⋯ ∧ dxₙ.

Here, as below, we use the summation convention. Hence the formula (16.7) gives

(16.16) div X = g^{−1/2} ∂_j(g^{1/2} Xʲ).

We now derive the Divergence Theorem, as a consequence of Stokes' formula, which we recall is

(16.17) ∫_M dα = ∫_{∂M} α,

for an (n − 1)-form α on M, assumed to be a smooth compact oriented surface with boundary.
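As a sanity check on (16.16) (the computation is ours): in polar coordinates on R² the metric is (g_jk) = diag(1, r²), so √g = r and (16.16) reads div X = r⁻¹∂_r(r Xʳ) + ∂_θ X^θ. For the Cartesian field X = x ∂/∂x, whose divergence by (16.11) is 1, converting to polar components and applying (16.16) recovers the same answer:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r*sp.cos(th), r*sp.sin(th)

# Cartesian field X = x d/dx, i.e. components (u, v) = (x, 0); div X = 1 by (16.11).
u, v = x, sp.Integer(0)

# Polar components: write X = Xr d/dr + Xth d/dtheta and match Cartesian components,
# using d/dr = cos th d/dx + sin th d/dy, d/dtheta = -r sin th d/dx + r cos th d/dy.
Xr_, Xth_ = sp.symbols('Xr Xth')
sol = sp.solve([Xr_*sp.cos(th) - Xth_*r*sp.sin(th) - u,
                Xr_*sp.sin(th) + Xth_*r*sp.cos(th) - v], [Xr_, Xth_])
Xr, Xth = sol[Xr_], sol[Xth_]

# (16.16) with sqrt(g) = r:
div_polar = sp.simplify((1/r)*sp.diff(r*Xr, r) + sp.diff(Xth, th))
assert sp.simplify(div_polar - 1) == 0
```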
If α = ωcX, formula (16.7) gives Z (16.18) Z (div X)ω = M ωcX. ∂M This is one form of the Divergence Theorem. We will produce an alternative expression for the integrand on the right before stating the result formally. Given that ω is the volume form for M determined by a Riemannian metric, we can write the interior product ωcX in terms of the volume element ω∂ on ∂M, with its induced Riemannian metric, as follows. Pick coordinates on M, centered at p0 ∈ ∂M, such that ∂M is tangent to the hyperplane {xn = 0} at p0 = 0, and such that gjk (p0 ) = δjk . Then it is clear that, at p0 , j ∗ (ωcX) = hX, νiω∂ (16.19) where ν is the unit vector normal to ∂M, pointing out of M and j : ∂M ,→ M the natural inclusion. The two sides of (16.19), which are both defined in a coordinate independent fashion, are hence equal on ∂M, and the identity (16.18) becomes Z (16.20) Z (div X)ω = M hX, νiω∂ . ∂M Finally, we adopt the following common notation: we denote the volume element on M by dV and that on ∂M by dS, obtaining the Divergence Theorem: Theorem 16.2. If M is a compact surface with boundary, X a smooth vector field on M , then Z Z (16.21) (div X) dV = hX, νi dS, M ∂M where ν is the unit outward-pointing normal to ∂M. 108 The only point left to mention here is that M need not be orientable. Indeed, we can treat the integrals in (16.21) as surface integrals, as in §12, and note that all objects in (16.21) are independent of a choice of orientation. To prove the general case, just use a partition of unity supported on orientable pieces. The definition of the divergence of a vector field given by (16.6), in terms of how the flow generated by the vector field magnifies or diminishes volumes, is a good geometrical characterization, explaining the use of the term ‘divergence.’ We obtain some further integral identities. First, we apply (16.21) with X replaced by uX. 
We have the following ‘derivation’ identity: (16.22) div uX = u div X + hdu, Xi = u div X + Xu, which follows easily from the formula (16.16). The Divergence Theorem immediately gives Z (16.23) Z (div X)u dV + M Z Xu dV = M hX, νiu dS. ∂M Replacing u by uv and using the derivation identity X(uv) = (Xu)v + u(Xv), we have Z Z Z £ ¤ (16.24) (Xu)v + u(Xv) dV = − (div X)uv dV + hX, νiuv dS. M M ∂M It is very useful to apply (16.23) to a gradient vector field X. If v is a smooth function on M, grad v is a vector field satisfying (16.25) hgrad v, Y i = hY, dvi, for any vector field Y, where the brackets on the left are given by the metric tensor on M and those on the right by the natural pairing of vector fields and 1-forms. Hence grad v = X has components X j = g jk ∂k v, where (g jk ) is the matrix inverse of (gjk ). Applying div to grad v, we have the Laplace operator: (16.26) ¡ ¢ 1 1 ∆v = div grad v = g − 2 ∂j g jk g 2 ∂k v . When M is a region in Rn and we use the standard Euclidean metric, so div X is given by (16.11), we have the Laplace operator on Euclidean space: (16.27) ∆v = ∂2v ∂2v + · · · + . ∂x21 ∂x2n 109 Now, setting X = grad v in (16.23), we have Xu = hgrad u, grad vi, and hX, νi = hν, grad vi, which we call the normal derivative of v, and denote ∂v/∂ν. Hence Z Z Z ∂v (16.28) u(∆v) dV = − hgrad u, grad vi dV + u dS. ∂ν M M ∂M If we interchange the roles of u and v and subtract, we have Z Z Z h ∂v ∂u i (16.29) u(∆v) dV = (∆u)v dV + u − v dS. ∂ν ∂ν M M M Formulas (16.28)-(16.29) are also called Green formulas. We will make further use of them in §17. We return to the Green formula (16.2), and give it another formulation. Consider a vector field Z = (f, g, h) on a region in R3 containing the planar surface U = {(x, y, 0) : (x, y) ∈ Ω}. 
If we form i j k (16.30) curl Z = Det ∂x ∂y ∂z f g h we see that the integrand on the left side of (16.2) is the k-component of curl Z, so (16.2) can be written ZZ Z (16.31) (curl Z) · k dA = (Z · T ) ds, U ∂U where T is the unit tangent vector to ∂U. To see how to extend this result, note that k is a unit normal field to the planar surface U. To formulate and prove the extension of (16.31) to any compact oriented surface with boundary in R3 , we use the relation between curl and exterior derivative discussed in exercises 2-3 of §14. In particular, if we set 3 X ∂ F = fj (x) , ∂x j j=1 (16.32) then curl F = 3 P 1 (16.33) ϕ= 3 X fj (x) dxj , j=1 gj (x) ∂/∂xj where dϕ = g1 (x)dx2 ∧ dx3 + g2 (x)dx3 ∧ dx1 + g3 (x)dx1 ∧ dx2 . Now Suppose M is a smooth oriented (n − 1)-dimensional surface with boundary in Rn . Using the orientation of M, we pick a unit normal field N to M as follows. 110 Take a smooth function v which vanishes on M but such that ∇v(x) 6= 0 on M. Thus ∇v is normal to M. Let σ ∈ Λn−1 (M ) define the orientation of M. Then dv ∧ σ = a(x) dx1 ∧ · · · ∧ dxn , where a(x) is nonvanishing on M. For x ∈ M, we take N (x) = ∇v(x)/|∇v(x)| if a(x) > 0, and N (x) = −∇v(x)/|∇v(x)| if a(x) < 0. We call N the ‘upward’ unit normal field to the oriented surface M, in this case. Part of the motivation for this characterization of N is that, if Ω ⊂ Rn is an open set with smooth boundary M = ∂Ω, and we give M the induced orientation, as described in §15, then the upward normal field N just defined coincides with the unit normal field pointing out of Ω. Compare exercises 2-3 of §15. Now, if G = (g1 , . . . , gn ) is a vector field defined near M, then Z (16.34) (N · G) dS = M Z ³X n M ´ c j · · · ∧ dxn . (−1)j−1 gj (x) dx1 ∧ · · · dx j=1 This result follows from (16.19). When n = 3 and G = curl F, we deduce from (16.32)-(16.33) that ZZ ZZ (16.35) dϕ = (N · curl F ) dS. 
M M Furthermore, in this case we have Z Z (16.36) ϕ= (F · T ) ds, ∂M ∂M where T is the unit tangent vector to ∂M, specied as follows by the orientation of ∂M ; if τ ∈ Λ1 (∂M ) defines the orientation of ∂M, then hT, τ i > 0 on ∂M. We call T the ‘forward’ unit tangent vector field to the oriented curve ∂M. By the calculations above, we have the classical Stokes formula: Proposition 16.3. If M is a compact oriented surface with boundary in R3 , and F is a C 1 vector field on a neighborhood of M , then ZZ Z (16.37) (N · curl F ) dS = (F · T ) ds, M ∂M where N is the upward unit normal field on M and T the forward unit tangent field to ∂M. Exercises 111 1. Given a Hamiltonian vector field Hf = n h X ∂f ∂ ∂f ∂ i − , ∂ξ ∂x ∂x ∂ξ j j j j j=1 calculate div Hf directly from (16.11). 2. Show that the identity (16.21) for div (uX) follows from (16.7) and du ∧ (ωcX) = (Xu)ω. Prove this identity, for any n-form ω on M n . What happens if ω is replaced by a k-form, k < n? 3. Relate problem 2 to the calculations (16.38) LuX α = uLX α + du ∧ (ιX α) and (16.39) du ∧ (ιX α) = −ιX (du ∧ α) + (Xu)α, valid for any k-form α. The last identity follows from (14.8). 4. Show that div [X, Y ] = X(div Y ) − Y (div X). 5. Show that, if F : R3 → R3 is a linear rotation, then, for a C 1 vector field Z on R3 , (16.40) F# (curl Z) = curl(F# Z). 6. Let M be the graph in R3 of a smooth function, z = u(x, y), (x, y) ∈ O ⊂ R2 , a bounded region with smooth boundary (maybe with corners). Show that (16.41) Z Z h³ Z ∂F3 ∂F2 ´³ ∂u ´ ³ ∂F1 ∂F3 ´³ ∂u ´ (curl F · N ) dS = − − + − − ∂y ∂z ∂x ∂z ∂x ∂y M O + ³ ∂F 2 ∂x − ∂F1 ´i dx dy, ∂y 112 ¡ ¢ where ∂Fj /∂x and ∂Fj /∂y are evaluated at x, y, u(x, y) . Show that Z Z (16.42) (F · T ) ds = ∂M ¡ ¢ ¡ ¢ e e ∂u Fe1 + Fe3 ∂u ∂x dx + F2 + F3 ∂y dy, ∂O ¡ ¢ where Fej (x, y) = Fj x, y, u(x, y) . Apply Green’s Theorem, with f = Fe1 +Fe3 ∂u ∂x , g = ∂u e e F2 + F3 ∂y , to show that the right sides of (16.41) and (16.42) are equal, hence proving Stokes’ Theorem in this case. 
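Proposition 16.3 can also be spot-checked numerically (this example is ours): take M the upper unit hemisphere and F = (−y, x, 0), so curl F = (0, 0, 2). Both sides of (16.37) should then equal 2π, twice the area enclosed by the boundary circle:

```python
import numpy as np

n = 400
dth, dph = (np.pi/2)/n, (2*np.pi)/n
th = (np.arange(n) + 0.5)*dth       # midpoint grid in the polar angle
ph = (np.arange(n) + 0.5)*dph       # midpoint grid in the azimuthal angle
Th, Ph = np.meshgrid(th, ph, indexing='ij')

# Upper unit hemisphere; upward normal N = (sin t cos p, sin t sin p, cos t).
# For F = (-y, x, 0): curl F = (0, 0, 2), so N . curl F = 2 cos t, dS = sin t dt dp.
surf = np.sum(2*np.cos(Th)*np.sin(Th)) * dth * dph

# Boundary: unit circle (cos s, sin s, 0); forward tangent T = (-sin s, cos s, 0),
# so F . T = sin^2 s + cos^2 s = 1 and the line integral is the length 2*pi.
ds = (2*np.pi)/n
s = (np.arange(n) + 0.5)*ds
line = np.sum(np.sin(s)**2 + np.cos(s)**2) * ds

assert abs(surf - 2*np.pi) < 1e-3
assert abs(surf - line) < 1e-3
```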
7. Let M ⊂ Rn be the graph of a function xn = u(x0 ), x0 = (x1 , . . . , xn−1 ). If n X β= c j ∧ · · · ∧ dxn , (−1)j−1 gj (x) dx1 ∧ · · · ∧ dx j=1 ¡ ¢ as in (16.34), and ϕ(x0 ) = x0 , u(x0 ) , show that ¡ ¢ ∂u ¡ ¢ gj x0 , u(x0 ) ϕ∗ β = (−1)n − gn x0 , u(x0 ) dx1 ∧ · · · ∧ dxn−1 ∂xj j=1 n−1 X = (−1)n−1 G · (−∇u, 1) dx1 ∧ · · · ∧ dxn−1 , where G = (g1 , . . . , gn ), and verify the identity (16.34) in this case. Hint. For the last part, recall problems 2-3 of §15, regarding the orientation of M. 8. Let S be a smooth oriented 2-dimensional surface in R3 , and M an open subset of S, with smooth boundary; see Fig.16.1. Let N be the upward unit normal field to S, defined by its orientation. For x ∈ ∂M, let ν(x) be the unit vector, tangent to M, normal to ∂M, and pointing out of M, and let T be the forward unit tangent vector field to ∂M. Show that, on ∂M, N × ν = T, ν × T = N. 113 17. Holomorphic functions and harmonic functions Let f be a complex-valued C 1 function on a region Ω ⊂ R2 . We identify R2 with C, via z = x + iy, and write f (z) = f (x, y). We say f is holomorphic on Ω if it satisfies the Cauchy-Riemann equation: ∂f 1 ∂f = . ∂x i ∂y (17.1) Note that f (z) = z k has this property, for any k ∈ Z, via an easy calculation of the partial derivatives of (x + iy)k . Thus z k is holomorphic on C if k ∈ Z+ , and on C \ {0}, if k ∈ Z− . Another example is the exponential function, ez = ex eiy . On the other hand, x2 + y 2 is not holomorphic. The following is often useful for producing more holomorphic functions: Lemma 17.1. If f and g are holomorphic on Ω, so is f g. Proof. We have (17.2) ∂ ∂f ∂g (f g) = g+f , ∂x ∂x ∂x ∂ ∂f ∂g (f g) = g+f , ∂y ∂y ∂y so if f and g satisfy the Cauchy-Riemann equation, so does f g. R R We next apply Green’s theorem to the line integral f dz = f (dx + idy). ∂Ω ∂Ω Clearly (16.2) applies to complex-valued functions, and if we set g = if, we get Z (17.3) ZZ ³ ∂f ∂f ´ f dz = i − dx dy. 
∂x ∂y Ω ∂Ω Whenever f is holomorphic, the integrand on the right side of (17.3) vanishes, so we have the following result, known as Cauchy’s Integral Theorem: Theorem 17.2. If f ∈ C 1 (Ω) is holomorphic, then Z (17.4) f (z) dz = 0. ∂Ω Until further notice, we assume Ω is a bounded region in C, with smooth boundary. Using (17.4), we can establish Cauchy’s Integral Formula: 114 Theorem 17.3. If f ∈ C 1 (Ω) is holomorphic and z0 ∈ Ω, then 1 f (z0 ) = 2πi (17.5) Z ∂Ω f (z) dz. z − z0 Proof. Note that g(z) = f (z)/(z − z0 ) is holomorphic on Ω \ {z0 }. Let Dr be the disk of radius r centered at z0 . Pick r so small that Dr ⊂ Ω. Then (17.4) implies Z (17.6) ∂Ω f (z) dz = z − z0 Z ∂Dr f (z) dz. z − z0 To evaluate the integral on the right, parametrize the curve ∂Dr by γ(θ) = z0 +reiθ . Hence dz = ireiθ dθ, so the integral on the right is equal to Z 2π (17.7) 0 f (z0 + reiθ ) ireiθ dθ = i iθ re Z 2π f (z0 + reiθ ) dθ. 0 As r → 0, this tends in the limit to 2πif (z0 ), so (17.5) is established. Suppose f ∈ C 1 (Ω) is holomorphic, z0 ∈ Dr ⊂ Ω, where Dr is the disk of radius r centered at z0 , and suppose z ∈ Dr . Then Theorem 17.3 implies 1 f (z) = 2πi (17.8) Z ∂Ω f (ζ) dζ. (ζ − z0 ) − (z − z0 ) We have the infinite series expansion (17.9) ∞ 1 1 X ³ z − z0 ´n = , (ζ − z0 ) − (z − z0 ) ζ − z0 n=0 ζ − z0 valid as long as |z − z0 | < |ζ − z0 |. Hence, given |z − z0 | < r, this series is uniformly convergent for ζ ∈ ∂Ω, and we have ∞ Z 1 X f (ζ) ³ z − z0 ´n f (z) = dζ. 2πi n=0 ζ − z0 ζ − z0 (17.10) ∂Ω Thus, for z ∈ Dr , f (z) has the convergent power series expansion (17.11) f (z) = ∞ X n=0 n an (z − z0 ) , 1 an = 2πi Z ∂Ω f (ζ) dζ. (ζ − z0 )n+1 115 Note that, when (17.5) is applied to Ω = Dr , the disk of radius r centered at z0 , the computation (17.7) yields (17.12) 1 f (z0 ) = 2π Z 2π 0 1 f (z0 + re ) dθ = `(∂Dr ) Z iθ f (z) ds(z), ∂Dr when f is holomorphic and C 1 on Dr , and `(∂Dr ) = 2πr is the length of the circle ∂Dr . 
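Both (17.5) and the averaging identity (17.12) are easy to test numerically by discretizing the circle ∂D_r (the test function e^z and the center z₀ are our choices):

```python
import numpy as np

f = np.exp                  # holomorphic on all of C
z0, r, n = 0.3 + 0.2j, 1.0, 2000

t = np.linspace(0, 2*np.pi, n, endpoint=False)
z = z0 + r*np.exp(1j*t)     # points on the circle of radius r about z0

# (17.5): (1/(2 pi i)) * contour integral of f(z)/(z - z0) dz, with dz = i r e^{it} dt
dz = 1j*r*np.exp(1j*t)*(2*np.pi/n)
cauchy = np.sum(f(z)/(z - z0)*dz) / (2j*np.pi)

# (17.12): the plain average of f over the circle
mean = np.mean(f(z))

assert abs(cauchy - f(z0)) < 1e-12
assert abs(mean - f(z0)) < 1e-12
```

The equally spaced rule on a periodic integrand converges extremely fast, which is why so few points already give the answer to machine precision.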
This is a mean value property, which extends to harmonic functions on domains in Rn , as we will see below. Note that we can write (17.1) as (∂x + i∂y )f = 0; applying the operator ∂x − i∂y to this gives ∂2f ∂2f + 2 =0 ∂x2 ∂y (17.13) for any holomorphic function. A general C 2 solution to (17.13) on a region Ω ⊂ R2 is called a harmonic function. More generally, if O is an open set in Rn , a function f ∈ C 2 (Ω) is called harmonic if ∆f = 0 on O, where, as in (16.27), (17.14) ∆f = ∂2f ∂2f + · · · + . ∂x21 ∂x2n Generalizing (17.12), we have the following, known as the mean value property of harmonic functions: Proposition 17.4. Let Ω ⊂ Rn be open, u ∈ C 2 (Ω) be harmonic, p ∈ Ω, and BR (p) = {x ∈ Ω : |x − p| ≤ R} ⊂ Ω. Then Z 1 ¢ (17.15) u(p) = ¡ u(x) dS(x). A ∂BR (p) ∂BR (p) For the proof, set (17.16) 1 ψ(r) = A(S n−1 ) Z u(p + rω) dS(ω), S n−1 for 0 < r ≤ R. We have ψ(R) equal to the right side of (17.15), while clearly ψ(r) → u(p) as r → 0. Now Z Z 1 1 ∂u 0 ¢ (17.17) ψ (r) = ω·∇u(p+rω) dS(ω) = ¡ dS(x). n−1 A(S ) ∂ν A ∂Br (p) S n−1 At this point, we establish: ∂Br (p) 116 Lemma 17.5. If O ⊂ Rn is a bounded domain with smooth boundary and u ∈ C 2 (O) is harmonic in O, then Z ∂u (17.18) (x) dS(x) = 0. ∂ν ∂O Proof. Apply the Green formula (16.29), with M = O and v = 1. If ∆u = 0, every integrand in (16.29) vanishes, except the one appearing in (17.18), so this integrates to zero. It follows from this lemma that (17.17) vanishes, so ψ(r) is constant. This completes the proof of (17.15). We can integrate the identity (17.15), to obtain Z 1 ¢ (17.19) u(p) = ¡ u(x) dV (x), V BR (p) BR (p) ¡ ¢ where u ∈ C 2 BR (p) is harmonic. This is another expression of the mean value property. The mean value property of harmonic functions has a number of important consequences. Here we mention one result, known as Liouville’s Theorem. Proposition 17.5. If u ∈ C 2 (Rn ) is harmonic on all of Rn and bounded, then u is constant. Proof. Pick any two points p, q ∈ Rn . 
We have, for any r > 0, Z Z 1 ¢ u(x) dx . (17.20) u(p) − u(q) = ¡ u(x) dx − V Br (0) ¡ ¢ Br (p) Br (q) Note that V Br (0) = Cn rn , where Cn is evaluated in problem 2 of §12. Thus Z Cn (17.21) |u(p) − u(q)| ≤ n |u(x)| dx, r ∆(p,q,r) where (17.22) ³ ´ ³ ´ ∆(p, q, r) = Br (p)4Br (q) = Br (p) \ Br (q) ∪ Br (q) \ Br (p) . Note that, if a = |p − q|, then ∆(p, q, r) ⊂ Br+a (p) \ Br−a (p); hence ¡ ¢ (17.23) V ∆(p, q, r) ≤ C(p, q) rn−1 , r ≥ 1. It follows that, if |u(x)| ≤ M for all x ∈ Rn , then (17.24) |u(p) − u(q)| ≤ M Cn C(p, q) r−1 , ∀ r ≥ 1. Taking r → ∞, we obtain u(p) − u(q) = 0, so u is constant. We will now use Liouville’s Theorem to prove the Fundamental Theorem of Algebra: 117 Theorem 17.6. If p(z) = an z n + an−1 z n−1 + · · · + a1 z + a0 is a polynomial of degree n ≥ 1 (an 6= 0), then p(z) must vanish somewhere in C. Proof. Consider (17.25) f (z) = 1 . p(z) If p(z) does not vanish anywhere in C, then f (z) is holomorphic on all of C. On the other hand, (17.26) f (z) = 1 1 , z n an + an−1 z −1 + · · · + a0 z −n so (17.27) |f (z)| −→ 0, as |z| → ∞. Thus f is bounded on C, if p(z) has no roots. By Proposition 17.5, f (z) must be constant, which is impossible, so p(z) must have a complex root. From the fact that every holomorphic function f on O ⊂ R2 is harmonic, it follows that its real and imaginary parts are harmonic. This result has a converse. Let u ∈ C 2 (O) be harmonic. Consider the 1-form (17.28) α=− ∂u ∂u dx + dy. ∂y ∂x We have dα = −(∆u)dx ∧ dy, so α is closed if and only if u is harmonic. Now, if O is diffeomorphic to a disk, Theorem 14.2 implies that α is exact on O, whenever it is closed, so, in such a case, (17.29) ∆u = 0 on O =⇒ ∃ v ∈ C 1 (O) s.t. α = dv. In other words, (17.30) ∂u ∂v = , ∂x ∂y ∂u ∂v =− . ∂y ∂x This is precisely the Cauchy-Riemann equation (17.1) for f = u + iv, so we have: Proposition 17.7. If O ⊂ R2 is diffeomorphic to a disk and u ∈ C 2 (O) is harmonic, then u is the real part of a holomorphic function on O. 
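As a concrete instance of Proposition 17.7 (the example is ours): for the harmonic function u = x² − y², integrating the Cauchy-Riemann equations (17.30) produces v = 2xy, and u + iv = (x + iy)² = z². A sympy sketch:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = x**2 - y**2

# u is harmonic
assert sp.diff(u, x, 2) + sp.diff(u, y, 2) == 0

# (17.30): v_y = u_x gives v = 2xy (plus a function of x alone);
# the second equation v_x = -u_y then holds with no correction term needed
v = sp.integrate(sp.diff(u, x), y)
assert sp.diff(v, x) + sp.diff(u, y) == 0

# f = u + iv agrees with z^2
assert sp.expand((x + sp.I*y)**2 - (u + sp.I*v)) == 0
```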
The function $v$ (which is unique up to an additive constant) is called the harmonic conjugate of $u$.

We close this section with a brief mention of holomorphic functions on a domain $O \subset \mathbb{C}^n$. We say $f \in C^1(O)$ is holomorphic provided it satisfies

(17.31)   $\dfrac{\partial f}{\partial x_j} = \dfrac{1}{i}\, \dfrac{\partial f}{\partial y_j}, \qquad 1 \le j \le n.$

Suppose $z \in O$, $z = (z_1, \ldots, z_n)$. Suppose $\zeta \in O$ whenever $|z - \zeta| < r$. Then, by successively applying Cauchy's integral formula (17.5) to each complex variable $z_j$, we have

(17.32)   $f(z) = (2\pi i)^{-n} \displaystyle\int_{\gamma_n} \cdots \int_{\gamma_1} f(\zeta)\, (\zeta_1 - z_1)^{-1} \cdots (\zeta_n - z_n)^{-1}\, d\zeta_1 \cdots d\zeta_n,$

where $\gamma_j$ is any simple counterclockwise curve about $z_j$ in $\mathbb{C}$ with the property that $|\zeta_j - z_j| < r/\sqrt{n}$ for all $\zeta_j \in \gamma_j$. Consequently, if $p \in \mathbb{C}^n$ and $O$ contains the 'polydisc' $D = \{z \in \mathbb{C}^n : |z - p| \le \sqrt{n}\,\delta\}$, then, for $z \in \mathring{D}$, the interior of $D$, we have

(17.33)   $f(z) = (2\pi i)^{-n} \displaystyle\int_{C_n} \cdots \int_{C_1} f(\zeta)\, \bigl[(\zeta_1 - p_1) - (z_1 - p_1)\bigr]^{-1} \cdots \bigl[(\zeta_n - p_n) - (z_n - p_n)\bigr]^{-1}\, d\zeta_1 \cdots d\zeta_n,$

where $C_j = \{\zeta \in \mathbb{C} : |\zeta - p_j| = \delta\}$. Then, parallel to (17.8)-(17.11), we have

(17.34)   $f(z) = \displaystyle\sum_{\alpha \ge 0} c_\alpha\, (z - p)^\alpha,$

for $z \in \mathring{D}$, where $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a multi-index, $(z - p)^\alpha = (z_1 - p_1)^{\alpha_1} \cdots (z_n - p_n)^{\alpha_n}$, as in (1.13), and

(17.35)   $c_\alpha = (2\pi i)^{-n} \displaystyle\int_{C_n} \cdots \int_{C_1} f(\zeta)\, (\zeta_1 - p_1)^{-\alpha_1 - 1} \cdots (\zeta_n - p_n)^{-\alpha_n - 1}\, d\zeta_1 \cdots d\zeta_n.$

Thus holomorphic functions on open domains in $\mathbb{C}^n$ have convergent power series expansions.

We refer to [Ahl] and [Hil] for more material on holomorphic functions of one complex variable, and to [Kr] for material on holomorphic functions of several complex variables. A source of much information on harmonic functions is [Kel]. Also further material on these subjects can be found in [T1].

Exercises

1. Look again at problem 4 in §1.

2. Look again at problems 3-5 in §2.

3. Look again at problems 1-2 in §6.

4. Let $O, \Omega$ be open in $\mathbb{C}$. If $f$ is holomorphic on $O$, with range in $\Omega$, and $g$ is holomorphic on $\Omega$, show that $h = g \circ f$ is holomorphic on $O$. Hint. Chain rule.

5.
If $f(x) = \varphi(|x|)$ on $\mathbb{R}^n$, show that

$\Delta f(x) = \varphi''(|x|) + \dfrac{n-1}{|x|}\, \varphi'(|x|).$

In particular, show that $|x|^{-(n-2)}$ is harmonic on $\mathbb{R}^n \setminus 0$, if $n \ge 3$, and $\log |x|$ is harmonic on $\mathbb{R}^2 \setminus 0$.

If $O, \Omega$ are open in $\mathbb{R}^n$, a smooth map $\varphi : O \to \Omega$ is said to be conformal provided the matrix function $G(x) = D\varphi(x)^t\, D\varphi(x)$ is a multiple of the identity, $G(x) = \gamma(x) I$. Recall formula (12.2).

6. Suppose $n = 2$ and $\varphi$ preserves orientation. Show that $\varphi$ (pictured as a function $\varphi : O \to \mathbb{C}$) is conformal if and only if it is holomorphic. If $\varphi$ reverses orientation, $\varphi$ is conformal $\Leftrightarrow$ $\bar{\varphi}$ is holomorphic (we say $\varphi$ is anti-holomorphic).

7. If $O$ and $\Omega$ are open in $\mathbb{R}^2$ and $u$ is harmonic on $\Omega$, show that $u \circ \varphi$ is harmonic on $O$, whenever $\varphi : O \to \Omega$ is a smooth conformal map. Hint. Use problem 4.

8. Let $e^z$ be defined by

$e^z = e^x e^{iy}, \qquad z = x + iy,\ x, y \in \mathbb{R},$

where $e^x$ and $e^{iy}$ are defined by (3.20), (3.25). Show that $e^z$ is holomorphic in $z \in \mathbb{C}$, and

$e^z = \displaystyle\sum_{k=0}^{\infty} \dfrac{z^k}{k!}.$

9. For $z \in \mathbb{C}$, set

$\cos z = \dfrac{1}{2}\bigl(e^{iz} + e^{-iz}\bigr), \qquad \sin z = \dfrac{1}{2i}\bigl(e^{iz} - e^{-iz}\bigr).$

Show that these functions agree with the definitions of $\cos t$ and $\sin t$ given in (3.25)-(3.26), for $z = t \in \mathbb{R}$. Show that $\cos z$ and $\sin z$ are holomorphic in $z \in \mathbb{C}$.

18. Variational problems; the stationary action principle

The calculus of variations consists of the study of stationary points (e.g., maxima and minima) of a real valued function which is itself defined on some space of functions. Here, we let $M$ be a region in $\mathbb{R}^n$, or more generally an $n$-dimensional surface in $\mathbb{R}^{n+k}$, fix two points $p, q \in M$ and an interval $[a, b] \subset \mathbb{R}$, and consider a space of functions $\mathcal{P}$ consisting of smooth curves $u : [a, b] \to M$ satisfying $u(a) = p$, $u(b) = q$. We consider functions $I : \mathcal{P} \to \mathbb{R}$ of the form

(18.1)   $I(u) = \displaystyle\int_a^b F\bigl(u(t), \dot{u}(t)\bigr)\, dt.$

Here $F(x, v)$ is a smooth function on the tangent bundle $TM$, or perhaps on some open subset of $TM$. By definition, the condition for $I$ to be stationary at $u$ is that

(18.2)   $\dfrac{d}{ds} I(u_s)\Big|_{s=0} = 0$

for any smooth family $u_s$ of elements of $\mathcal{P}$ with $u_0 = u$.
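The stationarity condition (18.2) can be probed numerically by finite differences. As a sketch (with a functional and paths of our own choosing), take $F(x,v) = \frac{1}{2}v^2$ on paths from 0 to 1 over $[0,1]$; the straight line should be stationary, while other paths are not.

```python
import numpy as np

t = np.linspace(0, 1, 2001)

def I(u):
    # I(u) = ∫ (1/2) u'^2 dt, via finite differences and the trapezoid rule
    du = np.gradient(u, t)
    y = 0.5*du**2
    return np.sum((y[1:] + y[:-1])/2 * np.diff(t))

u = t.copy()                   # candidate stationary path, u(0)=0, u(1)=1
w = np.sin(np.pi*t)            # variation vanishing at both endpoints
s = 1e-6

# (d/ds) I(u + s w)|_{s=0}, by a central difference: should vanish
deriv = (I(u + s*w) - I(u - s*w)) / (2*s)
assert abs(deriv) < 1e-6

# a non-stationary path, e.g. u(t) = t^2, has a nonzero derivative
deriv2 = (I(t**2 + s*w) - I(t**2 - s*w)) / (2*s)
assert abs(deriv2) > 1e-3      # analytically it equals -4/π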
Note that

(18.3)   $\dfrac{d}{ds} u_s(t)\Big|_{s=0} = w(t)$

defines a tangent vector to $M$ at $u(t)$, and precisely those tangent vectors $w(t)$ vanishing at $t = a$ and at $t = b$ arise from making some variation of $u$ within $\mathcal{P}$. We can compute the left side of (18.2) by differentiating under the integral, and obtaining a formula for this involves considering $t$-derivatives of $w$.

We work in local coordinates on $M$. Since any smooth curve on $M$ can be enclosed by a single coordinate patch, this involves no loss of generality. Then, given (18.3), we have

(18.4)   $\dfrac{d}{ds} I(u_s)\Big|_{s=0} = \displaystyle\int_a^b \bigl[F_x(u, \dot{u})\, w + F_v(u, \dot{u})\, \dot{w}\bigr]\, dt.$

Integrating the last term by parts and recalling that $w(a)$ and $w(b)$ vanish, we see that this is equal to

(18.5)   $\displaystyle\int_a^b \Bigl[F_x(u, \dot{u}) - \dfrac{d}{dt} F_v(u, \dot{u})\Bigr]\, w\, dt.$

It follows that the condition for $u$ to be stationary is precisely that $u$ satisfy the equation

(18.6)   $\dfrac{d}{dt} F_v(u, \dot{u}) - F_x(u, \dot{u}) = 0,$

a second order ODE, called Lagrange's equation. Written more fully, it is

(18.7)   $F_{vv}(u, \dot{u})\, \ddot{u} + F_{vx}(u, \dot{u})\, \dot{u} - F_x(u, \dot{u}) = 0,$

where $F_{vv}$ is the $n \times n$ matrix of second order $v$-derivatives of $F(x, v)$, acting on the vector $\ddot{u}$, etc. This is a nonsingular system as long as $F(x, v)$ satisfies the condition

(18.8)   $F_{vv}(x, v)$ is invertible, as an $n \times n$ matrix, for each $(x, v) = (u(t), \dot{u}(t))$, $t \in [a, b]$.

The ODE (18.6) suggests a particularly important role for

(18.9)   $\xi = F_v(x, v).$

Then, for $(x, v) = (u, \dot{u})$, we have

(18.10)   $\dot{\xi} = F_x(x, v), \qquad \dot{x} = v.$

We claim this system, in $(x, \xi)$ coordinates, is in Hamiltonian form. Note that $(x, \xi)$ gives a local coordinate system under the hypothesis (18.8), by the Inverse Function Theorem. In other words, we will produce a function $E(x, \xi)$ such that (18.10) is the same as

(18.11)   $\dot{x} = E_\xi, \qquad \dot{\xi} = -E_x,$

so the goal is to construct $E(x, \xi)$ such that

(18.12)   $E_x(x, \xi) = -F_x(x, v), \qquad E_\xi(x, \xi) = v,$

when $v = v(x, \xi)$ is defined by inverting the transformation

(18.13)   $(x, \xi) = \bigl(x, F_v(x, v)\bigr) = \lambda(x, v).$
If we set

(18.14)   $E^b(x, v) = E\bigl(\lambda(x, v)\bigr),$

then (18.12) is equivalent to

(18.15)   $E^b_x(x, v) = -F_x + v F_{vx}, \qquad E^b_v(x, v) = v\, F_{vv},$

as follows from the chain rule. This calculation is most easily performed using differential forms. In the differential form notation, our task is to find $E^b(x, v)$ such that

(18.16)   $dE^b = (-F_x + v F_{vx})\, dx + v F_{vv}\, dv.$

It can be seen by inspection that this identity is satisfied by

(18.17)   $E^b(x, v) = F_v(x, v)\, v - F(x, v).$

Thus the ODE (18.7) describing a stationary point for (18.1) has been converted to a first order Hamiltonian system, in the $(x, \xi)$ coordinates, given the hypothesis (18.8) on $F_{vv}$. In view of (18.13), one often writes (18.17) informally as $E(x, \xi) = \xi \cdot v - F(x, v)$.

We make some observations about the transformation $\lambda$ of (18.13). If $v \in T_x M$, then $F_v(x, v)$ acts naturally as a linear functional on $T_x M$. In other words, $\xi = F_v(x, v)$ is naturally regarded as an element of $T_x^* M$, in the cotangent bundle of $M$; it makes invariant sense to regard

(18.18)   $\lambda : TM \longrightarrow T^* M$

(if $F$ is defined on all of $TM$). This map is called the Legendre transformation. As we have already noted, the hypothesis (18.8) is equivalent to the statement that $\lambda$ is a local diffeomorphism.

As an example, suppose $M$ has a metric tensor $g$, and $F(x, v) = \frac{1}{2}\, g(v, v)$. Then the map (18.18) is the identification of $TM$ and $T^* M$ associated with 'lowering indices,' using the metric tensor $g_{jk}$. A straightforward calculation gives $E(x, \xi)$ equal to half the natural square norm on cotangent vectors, in this case. On the other hand, the function $F(x, v) = \sqrt{g(v, v)}$ fails to satisfy hypothesis (18.8). Since this is the integrand for arc length, it is important to incorporate this case into our analysis. This case involves parametrizing a curve by arc length. We will look at the following more general situation.

We say $F(x, v)$ is homogeneous of degree $r$ in $v$ if $F(x, cv) = c^r F(x, v)$ for $c > 0$. Thus $\sqrt{g(v, v)}$ above is homogeneous of degree 1.
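The chain-rule computation behind (18.15) and the formula (18.17) can be verified symbolically. Here is a one-variable sketch ($n = 1$, with $F$ left as an unspecified smooth function):

```python
import sympy as sp

# Check that E^b(x,v) = F_v(x,v) v - F(x,v), as in (18.17), satisfies
# (18.15): E^b_x = -F_x + v F_vx and E^b_v = v F_vv.
x, v = sp.symbols('x v')
F = sp.Function('F')(x, v)

Eb = sp.diff(F, v)*v - F
assert sp.simplify(sp.diff(Eb, x) - (-sp.diff(F, x) + v*sp.diff(F, v, x))) == 0
assert sp.simplify(sp.diff(Eb, v) - v*sp.diff(F, v, 2)) == 0
```

In $E^b_v$ the terms $F_v$ from the product rule and $-F_v$ from $-F$ cancel, which is exactly why (18.17) works.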
When $F$ is homogeneous of degree 1, hypothesis (18.8) is never satisfied. Furthermore, $I(u)$ is independent of the parametrization of a curve in this case; if $\sigma : [a, b] \to [a, b]$ is a diffeomorphism (fixing $a$ and $b$) then $I(u) = I(\tilde{u})$ for $\tilde{u}(t) = u(\sigma(t))$.

Let us look at a function $f(x, v)$ related to $F(x, v)$ by

(18.19)   $f(x, v) = \psi\bigl(F(x, v)\bigr), \qquad F(x, v) = \varphi\bigl(f(x, v)\bigr).$

Given a family $u_s$ of curves as before, we can write

(18.20)   $\dfrac{d}{ds} I(u_s)\Big|_{s=0} = \displaystyle\int_a^b \Bigl[\varphi'\bigl(f(u, \dot{u})\bigr) f_x(u, \dot{u}) - \dfrac{d}{dt}\Bigl\{\varphi'\bigl(f(u, \dot{u})\bigr) f_v(u, \dot{u})\Bigr\}\Bigr]\, w\, dt.$

If $u$ satisfies the condition

(18.21)   $f(u, \dot{u}) = c,$

with $c$ constant, this is equal to

(18.22)   $c' \displaystyle\int_a^b \Bigl[f_x(u, \dot{u}) - \dfrac{d}{dt} f_v(u, \dot{u})\Bigr]\, w\, dt,$

with $c' = \varphi'(c)$. Of course, setting

(18.23)   $J(u) = \displaystyle\int_a^b f(u, \dot{u})\, dt,$

we have

(18.24)   $\dfrac{d}{ds} J(u_s)\Big|_{s=0} = \displaystyle\int_a^b \Bigl[f_x(u, \dot{u}) - \dfrac{d}{dt} f_v(u, \dot{u})\Bigr]\, w\, dt.$

Consequently, if $u$ satisfies (18.21), then $u$ is stationary for $I$ if and only if $u$ is stationary for $J$ (provided $\varphi'(c) \ne 0$).

It is possible that $f(x, v)$ satisfies (18.8) even though $F(x, v)$ does not, as the case $F(x, v) = \sqrt{g(v, v)}$ illustrates. Note that

$f_{v_j v_k} = \psi'(F)\, F_{v_j v_k} + \psi''(F)\, F_{v_j} F_{v_k}.$

Let us specialize to the case $\psi(F) = F^2$, so $f(x, v) = F(x, v)^2$ is homogeneous of degree 2. If $F$ is convex in $v$ and $(F_{v_j v_k})$, a positive semidefinite matrix, annihilates only radial vectors, and if $F > 0$, then $f(x, v)$ is strictly convex, i.e., $f_{vv}$ is positive definite, and hence (18.8) holds for $f(x, v)$. This is the case when $F(x, v) = \sqrt{g(v, v)}$ is the arc length integrand.

If $f(x, v) = F(x, v)^2$ satisfies (18.8), then the stationary condition for (18.23) is that $u$ satisfy the ODE

$f_{vv}(u, \dot{u})\, \ddot{u} + f_{vx}(u, \dot{u})\, \dot{u} - f_x(u, \dot{u}) = 0,$

a nonsingular ODE for which we know there is a unique local solution, with $u(a) = p$, $\dot{u}(a)$ given. We will be able to say such a solution is also stationary for (18.1) once we know that (18.21) holds, i.e., $f(u, \dot{u})$ is constant.
Indeed, if $f(x, v)$ is homogeneous of degree two, then $f_v(x, v)\, v = 2 f(x, v)$, and hence

(18.25)   $e^b(x, v) = f_v(x, v)\, v - f(x, v) = f(x, v).$

But since the equations for $u$ take Hamiltonian form in the coordinates $(x, \xi) = (x, f_v(x, v))$, it follows that $e^b(u(t), \dot{u}(t))$ is constant for $u$ stationary, so (18.21) does hold in this case.

In particular, the equation for a geodesic, i.e., for a critical point for the length functional

$\ell(u) = \displaystyle\int_a^b \sqrt{g(\dot{u}, \dot{u})}\, dt,$

with constant speed, is the same as the equation for a critical point for the functional

$E(u) = \displaystyle\int_a^b g(\dot{u}, \dot{u})\, dt.$

We will write out the ODE explicitly below; see (18.32)-(18.34).

There is a general principle, known as the stationary action principle, or Hamilton's principle, for producing equations of mathematical physics. In this set up, the state of a physical system at a given time is described by a pair $(x, v)$, position and velocity. One has a kinetic energy function $T(x, v)$ and a potential energy function $V(x, v)$, determining the dynamics, as follows. Form the difference

(18.26)   $L(x, v) = T(x, v) - V(x, v),$

known as the Lagrangian. Hamilton's principle states that a path $u(t)$ describing the evolution of the state in this system is a stationary path for the action integral

(18.27)   $I(u) = \displaystyle\int_a^b L(u, \dot{u})\, dt.$

In many important cases, the potential $V = V(x)$ is velocity independent and $T(x, v)$ is a quadratic form in $v$; say $T(x, v) = \frac{1}{2}\, v \cdot G(x) v$ for a symmetric matrix $G(x)$. In that case, we consider

(18.28)   $L(x, v) = \dfrac{1}{2}\, v \cdot G(x) v - V(x).$

Thus we have

(18.29)   $\xi = L_v(x, v) = G(x) v,$

and the conserved quantity (18.17) becomes

(18.30)   $E^b(x, v) = v \cdot G(x) v - \Bigl[\dfrac{1}{2}\, v \cdot G(x) v - V(x)\Bigr] = \dfrac{1}{2}\, v \cdot G(x) v + V(x),$

which is the total energy $T(x, v) + V(x)$. Note that the nondegeneracy condition is that $G(x)$ be invertible; assuming this, we have

(18.31)   $E(x, \xi) = \dfrac{1}{2}\, \xi \cdot G(x)^{-1} \xi + V(x),$

whose Hamiltonian vector field defines the dynamics.
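Conservation of the energy (18.30) along the resulting dynamics can be observed numerically. The following is a sketch with a potential of our own choosing, $V(x) = x^4/4$, integrating Newton's equation $m\ddot{u} = -V_x(u)$ by a standard Runge-Kutta step and watching $E = \frac{1}{2}m\dot{u}^2 + V(u)$.

```python
import numpy as np

m = 2.0
Vx = lambda x: x**3                       # V(x) = x^4/4 (our example)
E = lambda x, v: 0.5*m*v**2 + x**4/4      # total energy (18.30)

def rk4_step(x, v, h):
    # One classical RK4 step for the first-order system (x', v') = (v, -Vx(x)/m)
    def rhs(s):
        return np.array([s[1], -Vx(s[0])/m])
    s = np.array([x, v])
    k1 = rhs(s); k2 = rhs(s + h/2*k1); k3 = rhs(s + h/2*k2); k4 = rhs(s + h*k3)
    s = s + h/6*(k1 + 2*k2 + 2*k3 + k4)
    return s[0], s[1]

x, v = 1.0, 0.0
E0 = E(x, v)
for _ in range(10000):                    # integrate up to t = 10
    x, v = rk4_step(x, v, 1e-3)

assert abs(E(x, v) - E0) < 1e-8           # energy is conserved to O(h^4)
```

The drift is only the $O(h^4)$ discretization error of the integrator; for the exact flow, $E$ is exactly constant, as the text shows.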
Note that, in this case, Lagrange's equation (18.6) takes the form

(18.32)   $\dfrac{d}{dt}\bigl[G(u)\dot{u}\bigr] = \dfrac{1}{2}\, \dot{u} \cdot G_x(u)\dot{u} - V_x(u),$

which can be rewritten as

(18.33)   $\ddot{u} + \Gamma_{\dot{u}\dot{u}} + G(u)^{-1} V_x(u) = 0,$

where $\Gamma_{\dot{u}\dot{u}}$ is a vector whose $\ell$th component is $\Gamma^\ell_{jk}\, \dot{u}^j \dot{u}^k$,

(18.34)   $\Gamma^\ell_{jk} = \dfrac{1}{2}\, g^{\ell m}\Bigl[\dfrac{\partial g_{km}}{\partial x_j} + \dfrac{\partial g_{jm}}{\partial x_k} - \dfrac{\partial g_{jk}}{\partial x_m}\Bigr].$

Here, $(g^{\ell m})$ is the matrix $G(x)^{-1}$, and we use the 'summation convention,' i.e., we sum over repeated indices. The equation (18.33) contains as a special case the geodesic equation for the Riemannian metric $(g_{jk}) = G(x)$, which is what would arise in the case $V = 0$.

We refer to [Ar], [Go] for a discussion of the relation of Hamilton's principle to other formulations of the laws of Newtonian mechanics, but illustrate it briefly with a couple of examples here.

Consider the basic case of motion of a particle in Euclidean space $\mathbb{R}^n$, in the presence of a force field of potential type $F(x) = -\operatorname{grad} V(x)$, as in the beginning of §9. Then

(18.35)   $T(x, v) = \dfrac{1}{2}\, m |v|^2, \qquad V(x, v) = V(x).$

This is of course the special case of (18.28) with $G(x) = mI$, and the ODE satisfied by stationary paths for (18.27) hence has the form

(18.36)   $m\ddot{u} + V_x(u) = 0,$

precisely the equation (10.2) expressing Newton's law $F = ma$.

Next we consider one example where Cartesian coordinates are not used, namely the motion of a pendulum. See Fig.18.1 at the end of this section. We suppose a mass $m$ is at the end of a (massless) rod of length $\ell$, swinging under the influence of gravity. In this case, we can express the potential energy as

(18.37)   $V(\theta) = -mg\ell \cos \theta,$

where $\theta$ is the angle the rod makes with the downward vertical ray, and $g$ denotes the strength of gravity. The speed of the mass at the end of the pendulum is $\ell |\dot{\theta}|$, so the kinetic energy is

(18.38)   $T(\theta, \dot{\theta}) = \dfrac{1}{2}\, m \ell^2 |\dot{\theta}|^2.$

In this case we see that Hamilton's principle leads to the ODE

(18.39)   $\ell \ddot{\theta} + g \sin \theta = 0,$

describing the motion of a pendulum.

Exercises

1.
Suppose that, more generally than (18.28), we have a Lagrangian of the form $L(x, v) = \frac{1}{2}\, v \cdot G(x) v + A(x) \cdot v - V(x)$. Show that (18.30) continues to hold, i.e., $E^b(x, v) = \frac{1}{2}\, v \cdot G(x) v + V(x)$, and that the Hamiltonian function becomes, in place of (18.31),

$E(x, \xi) = \dfrac{1}{2}\, \bigl(\xi - A(x)\bigr) \cdot G(x)^{-1} \bigl(\xi - A(x)\bigr) + V(x).$

Work out the modification to (18.33) when the extra term $A(x) \cdot v$ is included.

2. Work out the differential equations for a planar double pendulum, in the spirit of (18.36)-(18.38). See Fig.18.2. Hint. To compute kinetic and potential energy, think of the plane as the complex plane, with the real axis pointing down. The position of particle 1 is $\ell_1 e^{i\theta_1}$ and that of particle 2 is $\ell_1 e^{i\theta_1} + \ell_2 e^{i\theta_2}$.

3. The statement before (18.4), that any smooth curve $u(s)$ on $M$ can be enclosed by a single coordinate patch, is not strictly accurate, as the curve may have self-intersections. Give a more precise statement.

4. Show that, if one uses polar coordinates, $x = r \cos \theta$, $y = r \sin \theta$, the Euclidean metric tensor $ds^2 = dx^2 + dy^2$ becomes $dr^2 + r^2\, d\theta^2$. Use this together with (18.28)-(18.31) to explain the calculation (10.11), in §10.

19. The symplectic form; canonical transformations

Recall from §9 that a Hamiltonian vector field on a region $\Omega \subset \mathbb{R}^{2n}$, with coordinates $\zeta = (x, \xi)$, is a vector field of the form

(19.1)   $H_f = \displaystyle\sum_{j=1}^n \Bigl[\dfrac{\partial f}{\partial \xi_j}\, \dfrac{\partial}{\partial x_j} - \dfrac{\partial f}{\partial x_j}\, \dfrac{\partial}{\partial \xi_j}\Bigr].$

We want to gain an understanding of Hamiltonian vector fields, free from coordinates. In particular, we ask the following question. Let $F : O \to \Omega$ be a diffeomorphism, $H_f$ a Hamiltonian vector field on $\Omega$. Under what condition on $F$ is $F_\# H_f$ a Hamiltonian vector field on $O$?

A central object in this study is the symplectic form, a 2-form on $\mathbb{R}^{2n}$ defined by

(19.2)   $\sigma = \displaystyle\sum_{j=1}^n d\xi_j \wedge dx_j.$

Note that, if

$U = \displaystyle\sum_j \bigl[u^j(\zeta)\, \partial/\partial x_j + a_j(\zeta)\, \partial/\partial \xi_j\bigr], \qquad V = \displaystyle\sum_j \bigl[v^j(\zeta)\, \partial/\partial x_j + b_j(\zeta)\, \partial/\partial \xi_j\bigr],$

then

(19.3)   $\sigma(U, V) = \displaystyle\sum_{j=1}^n \bigl[-u^j(\zeta)\, b_j(\zeta) + a_j(\zeta)\, v^j(\zeta)\bigr].$

In particular, $\sigma$ satisfies the following nondegeneracy condition: if $U$ has the property that, for some $(x_0, \xi_0) \in \mathbb{R}^{2n}$, $\sigma(U, V) = 0$ at $(x_0, \xi_0)$ for all vector fields $V$, then $U$ must vanish at $(x_0, \xi_0)$.

The relation between the symplectic form and Hamiltonian vector fields is the following.

Proposition 19.1. The vector field $H_f$ is uniquely determined by the identity

(19.4)   $\sigma \lrcorner H_f = -df.$

Proof. The content of the identity is

(19.5)   $\sigma(H_f, V) = -Vf$

for any smooth vector field $V$. If $V$ has the form used in (19.3), then that identity gives

$\sigma(H_f, V) = -\displaystyle\sum_{j=1}^n \bigl[(\partial f/\partial \xi_j)\, b_j(\zeta) + (\partial f/\partial x_j)\, v^j(\zeta)\bigr],$

which coincides with the right side of (19.5). In view of the nondegeneracy of $\sigma$, the proposition is proved.

Note the special case

(19.6)   $\sigma(H_f, H_g) = \{f, g\}.$

The following is an immediate corollary.

Proposition 19.2. If $O, \Omega$ are open in $\mathbb{R}^{2n}$, and $F : O \to \Omega$ is a diffeomorphism preserving $\sigma$, i.e., satisfying

(19.7)   $F^* \sigma = \sigma,$

then for any $f \in C^\infty(\Omega)$, $F_\# H_f$ is Hamiltonian on $O$, and

(19.8)   $F_\# H_f = H_{F^* f}, \qquad \text{where } F^* f(y) = f(F(y)).$

A diffeomorphism satisfying (19.7) is called a canonical transformation, or a symplectic transformation.

Let us now look at the condition on a vector field $X$ on $\Omega$ that the flow $\mathcal{F}_X^t$ generated by $X$ preserve $\sigma$ for each $t$. There is a simple general condition in terms of the Lie derivative for a given form to be preserved.

Lemma 19.3. Let $\alpha \in \Lambda^k(\Omega)$. Then $\bigl(\mathcal{F}_X^t\bigr)^* \alpha = \alpha$ for all $t$ if and only if $\mathcal{L}_X \alpha = 0$.

Proof. This is an immediate consequence of (14.28).

Recall the formula (14.20):

(19.9)   $\mathcal{L}_X \alpha = d(\alpha \lrcorner X) + (d\alpha) \lrcorner X.$

We apply it in the case $\alpha = \sigma$, the symplectic form. Clearly (19.2) implies

(19.10)   $d\sigma = 0,$

so

(19.11)   $\mathcal{L}_X \sigma = d(\sigma \lrcorner X).$

Consequently, $\mathcal{F}_X^t$ preserves the symplectic form $\sigma$ if and only if $d(\sigma \lrcorner X) = 0$ on $\Omega$. In view of Poincare's Lemma, at least locally, one has a smooth function $f(x, \xi)$ such that

(19.12)   $\sigma \lrcorner X = df,$

provided $d(\sigma \lrcorner X) = 0$.
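The identities (19.5) and (19.6) can be checked symbolically from (19.1) and (19.3). Here is a sketch for $n = 1$, writing the Poisson bracket as $\{f, g\} = f_\xi g_x - f_x g_\xi$ (our reading of the §9 convention):

```python
import sympy as sp

x, xi = sp.symbols('x xi')
f, g = sp.Function('f')(x, xi), sp.Function('g')(x, xi)

def sigma(U, V):
    # (19.3) with n = 1: U = (u, a), V = (v, b) in the ∂/∂x, ∂/∂ξ frame
    (u, a), (v, b) = U, V
    return -u*b + a*v

Hf = (sp.diff(f, xi), -sp.diff(f, x))    # components of H_f, by (19.1)
Hg = (sp.diff(g, xi), -sp.diff(g, x))

# (19.5): σ(H_f, V) = -V f, for a constant-coefficient V = v ∂x + b ∂ξ
v, b = sp.symbols('v b')
Vf = v*sp.diff(f, x) + b*sp.diff(f, xi)
assert sp.simplify(sigma(Hf, (v, b)) + Vf) == 0

# (19.6): σ(H_f, H_g) = {f, g}
poisson = sp.diff(f, xi)*sp.diff(g, x) - sp.diff(f, x)*sp.diff(g, xi)
assert sp.simplify(sigma(Hf, Hg) - poisson) == 0
```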
Any two f ’s satisfying (19.12) must differ by a constant, and it follows that such f exists globally provided Ω is simply connected. In view of Proposition 19.1, (19.12) is equivalent to the identity (19.13) In particular, we have established: X = −Hf . 130 Proposition 19.4. The flow generated by a Hamiltonian vector field Hf preserves the symplectic form σ. It follows a fortiori that the flow F t generated by a Hamiltonian vector field Hf leaves invariant the 2n-form v = σ ∧ ··· ∧ σ (n factors), which provides a volume form on Ω. That this volume form is preserved is known as a theorem of Liouville. This result has the following refinement. Let S be a level surface of the function f ; suppose f is non-degenerate on S. Then we can define a (2n − 1)-form w on S (giving rise to a volume element on S) which is also invariant under the flow F t , as follows. Let X be any vector field on Ω such that Xf = 1 on S, and define w = j ∗ (vcX), (19.14) where j : S ,→ Ω is the natural inclusion. We claim this is well defined. Lemma 19.5. The form (19.14) is independent of the choice of X, as long as Xf = 1 on S. Proof. The difference of two such forms is j ∗ (vcY1 ), where Y1 f = 0 on S, i.e., Y1 is tangent to S. Now this form, acting on vectors Y2 , . . . , Y2n , all tangent to S, is merely (j ∗ v)(Y1 , . . . , Y2n ); but obviously j ∗ v = 0, since dim S < 2n. We can now establish the invariance of the form w on S. Proposition 19.6. The form (19.14) is invariant under the flow F t on S. Proof. Since v is invariant under F t , we have t F t∗ w = j ∗ (F t∗ vcF# X) t = j ∗ (vcF# X) ¡ ¢ t = w + j ∗ vc(F# X − X) . t Since F t∗ f = f, we see that (F# X)f = 1 = Xf, so the last term vanishes, by Lemma 19.5, and the proof is complete. Let O ⊂ Rn be open; we claim the symplectic form σ is well defined on T ∗ O = O × Rn , in the following sense. Suppose g : O → Ω is a diffeomorphism, i.e., a coordinate change. 
The map this induces from $T^* O$ to $T^* \Omega$ is

(19.15)   $G(x, \xi) = \Bigl(g(x),\ \bigl((Dg)^t(x)\bigr)^{-1} \xi\Bigr) = (y, \eta).$

Our invariance result is:

(19.16)   $G^* \sigma = \sigma.$

In fact, a stronger result is true. We can write

(19.17)   $\sigma = d\kappa, \qquad \kappa = \displaystyle\sum_j \xi_j\, dx_j,$

where the 1-form $\kappa$ is called the contact form. We claim that

(19.18)   $G^* \kappa = \kappa,$

which implies (19.16), since $G^* d\kappa = d\, G^* \kappa$. To see (19.18), note that

$dy_j = \displaystyle\sum_k (\partial g_j/\partial x_k)\, dx_k, \qquad \eta_j = \displaystyle\sum_\ell H_{j\ell}\, \xi_\ell,$

where $(H_{j\ell})$ is the matrix of $\bigl((Dg)^t\bigr)^{-1}$, i.e., the inverse matrix of $(\partial g_\ell/\partial x_j)$. Hence

(19.19)   $\displaystyle\sum_j \eta_j\, dy_j = \sum_{j,k,\ell} (\partial g_j/\partial x_k)\, H_{j\ell}\, \xi_\ell\, dx_k = \sum_{k,\ell} \delta_{k\ell}\, \xi_\ell\, dx_k = \sum_k \xi_k\, dx_k,$

which establishes (19.18).

As a particular case, a vector field $Y$ on $O$, generating a flow $\mathcal{F}_Y^t$ on $O$, induces a flow $\mathcal{G}_Y^t$ on $T^* O$. Not only does this flow preserve the symplectic form; in fact $\mathcal{G}_Y^t$ is generated by the Hamiltonian vector field $H_\Phi$, where

(19.20)   $\Phi(x, \xi) = \langle Y(x), \xi \rangle = \displaystyle\sum_j \xi_j\, v^j(x)$

if $Y = \sum v^j(x)\, \partial/\partial x_j$.

The symplectic form given by (19.2) can be regarded as a special case of a general symplectic form, which is a closed nondegenerate 2-form on a domain (or manifold) $\Omega$. Often such a form $\omega$ arises naturally, not a priori looking like (19.2). It is a theorem of Darboux that locally one can pick coordinates in such a fashion that $\omega$ does take the standard form (19.2). We present a short proof, due to Moser, of that theorem.

To start, pick $p \in \Omega$, and consider $B = \omega(p)$, a nondegenerate, antisymmetric, bilinear form on the vector space $V = T_p \Omega$. It is a simple exercise in linear algebra that if one has such a form, then $\dim V$ must be even, say $2n$, and $V$ has a basis $\{e_j, f_j : 1 \le j \le n\}$ such that

(19.21)   $B(e_j, e_\ell) = B(f_j, f_\ell) = 0, \qquad B(e_j, f_\ell) = \delta_{j\ell},$

for $1 \le j, \ell \le n$. Using such a basis to impose linear coordinates $(x, \xi)$ on a neighborhood of $p$, taken to the origin, we have $\omega = \omega_0 = \sum d\xi_j \wedge dx_j$ at $p$. Thus Darboux' Theorem follows from:

Proposition 19.7.
If $\omega$ and $\omega_0$ are closed nondegenerate 2-forms on $\Omega$, and $\omega = \omega_0$ at $p \in \Omega$, then there is a diffeomorphism $G_1$ defined on a neighborhood of $p$, such that $G_1(p) = p$ and

(19.22)   $G_1^* \omega = \omega_0.$

Proof. For $t \in [0, 1]$, let

(19.23)   $\omega_t = (1 - t)\omega_0 + t\omega = \omega_0 + t\alpha, \qquad \alpha = \omega - \omega_0.$

Thus $\alpha = 0$ at $p$, and $\alpha$ is a closed 2-form. We can hence write

(19.24)   $\alpha = d\beta$

on a neighborhood of $p$, and if $\beta$ is given by the formula (14.32) in the proof of the Poincare Lemma, we have $\beta = 0$ at $p$. Since, for each $t$, $\omega_t = \omega$ at $p$, we see that each $\omega_t$ is nondegenerate on some common neighborhood of $p$, for $t \in [0, 1]$.

Our strategy will be to produce a smooth family of local diffeomorphisms $G_t$, $0 \le t \le 1$, such that $G_t(p) = p$, $G_0 = \mathrm{id.}$, and such that $G_t^* \omega_t$ is independent of $t$, hence $G_t^* \omega_t = \omega_0$. $G_t$ will be specified by a time-varying family of vector fields, via the ODE

(19.25)   $\dfrac{d}{dt} G_t(x) = X_t\bigl(G_t(x)\bigr), \qquad G_0(x) = x.$

We will have $G_t(p) = p$ provided $X_t(p) = 0$. To arrange that $G_t^* \omega_t$ is independent of $t$, note that, by the product rule,

(19.26)   $\dfrac{d}{dt}\, G_t^* \omega_t = G_t^* \mathcal{L}_{X_t} \omega_t + G_t^*\bigl(d\omega_t/dt\bigr).$

By (19.23), $d\omega_t/dt = \alpha = d\beta$, and by Proposition 14.1,

(19.27)   $\mathcal{L}_{X_t} \omega_t = d(\omega_t \lrcorner X_t),$

since $\omega_t$ is closed. Thus we can write (19.26) as

(19.28)   $\dfrac{d}{dt}\, G_t^* \omega_t = G_t^*\, d\bigl(\omega_t \lrcorner X_t + \beta\bigr).$

This vanishes provided $X_t$ is defined to satisfy

(19.29)   $\omega_t \lrcorner X_t = -\beta.$

Since $\omega_t$ is nondegenerate near $p$, this does indeed uniquely specify a vector field $X_t$ near $p$, for each $t \in [0, 1]$, which vanishes at $p$, since $\beta = 0$ at $p$. The proof of Darboux' Theorem is complete.

Exercises

1. Do the linear algebra exercise stated before Proposition 19.7, as a preparation for the proof of Darboux' Theorem.

2. On $\mathbb{R}^2$, identify $(x, \xi)$ with $(x, y)$, so the symplectic form is $\sigma = dy \wedge dx$. Show that $X = f\, \partial/\partial x + g\, \partial/\partial y$ and $\alpha = g\, dx - f\, dy$ are related by

$\alpha = \sigma \lrcorner X.$

Reconsider problems 7-9 of §14 in light of this.

3. Show that the volume form $w$ on the level surface $S$ of $f$, given by (19.14), can be characterized as follows. Let $S_h$ be the level set $\{f(x, \xi) = c + h\}$, $S = S_0$.
Given any vector field $X$ transversal to $S$, and any open set $O \subset S$ with smooth boundary, let $\widetilde{O}_h$ be the thin set sandwiched between $S$ and $S_h$, lying on orbits of $X$ through $O$. Then, with $v = \sigma \wedge \cdots \wedge \sigma$ the volume form on $\Omega$,

$\displaystyle\int_O w = \lim_{h \to 0} \dfrac{1}{h} \int_{\widetilde{O}_h} v.$

20. Topological applications of differential forms

Differential forms are a fundamental tool in calculus. In addition, they have important applications to topology. We give a few here, starting with simple proofs of some important topological results of Brouwer.

Proposition 20.1. There is no continuous retraction $\varphi : B \to S^{n-1}$ of the closed unit ball $B$ in $\mathbb{R}^n$ onto its boundary $S^{n-1}$.

In fact, it is just as easy to prove the following more general result. The approach we use is adapted from [Kan].

Proposition 20.2. If $M$ is a compact oriented surface with nonempty boundary $\partial M$, there is no continuous retraction $\varphi : M \to \partial M$.

Proof. A retraction $\varphi$ satisfies $\varphi \circ j(x) = x$, where $j : \partial M \hookrightarrow M$ is the natural inclusion. By a simple approximation, if there were a continuous retraction there would be a smooth one, so we can suppose $\varphi$ is smooth.

Pick $\omega \in \Lambda^{n-1}(\partial M)$ to be the volume form on $\partial M$, endowed with some Riemannian metric ($n = \dim M$), so $\int_{\partial M} \omega > 0$. Now apply Stokes' theorem to $\alpha = \varphi^* \omega$. If $\varphi$ is a retraction, $j^* \varphi^* \omega = \omega$, so we have

(20.1)   $\displaystyle\int_{\partial M} \omega = \int_M d\varphi^* \omega.$

But $d\varphi^* \omega = \varphi^* d\omega = 0$, so the integral (20.1) is zero. This is a contradiction, so there can be no retraction.

A simple consequence of this is the famous Brouwer Fixed Point Theorem.

Theorem 20.3. If $F : B \to B$ is a continuous map on the closed unit ball in $\mathbb{R}^n$, then $F$ has a fixed point.

Proof. We are claiming that $F(x) = x$ for some $x \in B$. If not, define $\varphi(x)$ to be the endpoint of the ray from $F(x)$ to $x$, continued until it hits $\partial B = S^{n-1}$. It is clear that $\varphi$ would be a retraction, contradicting Proposition 20.1.

We next show that an even dimensional sphere cannot have a smooth nonvanishing vector field.

Proposition 20.4.
There is no smooth nonvanishing vector field on $S^n$ if $n = 2k$ is even.

Proof. If $X$ were such a vector field, we could arrange it to have unit length, so we would have $X : S^n \to S^n$ with $X(v) \perp v$ for $v \in S^n \subset \mathbb{R}^{n+1}$. Thus there is a unique unit speed geodesic $\gamma_v$ from $v$ to $X(v)$, of length $\pi/2$. Define a smooth family of maps $F_t : S^n \to S^n$ by $F_t(v) = \gamma_v(t)$. Thus $F_0(v) = v$, $F_{\pi/2}(v) = X(v)$, and $F_\pi = A$ would be the antipodal map, $A(v) = -v$. By (14.34), we deduce that $A^* \omega - \omega = d\beta$ is exact, where $\omega$ is the volume form on $S^n$. Hence, by Stokes' theorem,

(20.2)   $\displaystyle\int_{S^n} A^* \omega = \int_{S^n} \omega.$

On the other hand, it is straightforward that $A^* \omega = (-1)^{n+1} \omega$, so (20.2) is possible only when $n$ is odd.

Note that an important ingredient in the proof of both Proposition 20.2 and Proposition 20.4 is the existence of $n$-forms on a compact oriented $n$-dimensional surface $M$ which are not exact (though of course they are closed). We next establish the following important counterpoint to the Poincare lemma.

Proposition 20.5. If $M$ is a compact connected oriented surface of dimension $n$ and $\alpha \in \Lambda^n M$, then $\alpha = d\beta$ for some $\beta \in \Lambda^{n-1}(M)$ if and only if

(20.3)   $\displaystyle\int_M \alpha = 0.$

We have already discussed the necessity of (20.3). To prove the sufficiency, we first look at the case $M = S^n$. In that case, any $n$-form $\alpha$ is of the form $a(x)\omega$, $a \in C^\infty(S^n)$, $\omega$ the volume form on $S^n$, with its standard metric. The group $G = SO(n+1)$ of rotations of $\mathbb{R}^{n+1}$ acts as a transitive group of isometries on $S^n$. In §12 we constructed the integral of functions over $SO(n+1)$, with respect to Haar measure. As noted in §12, we have the map $\mathrm{Exp} : \mathrm{Skew}(n+1) \to SO(n+1)$, giving a diffeomorphism from a ball $\mathcal{O}$ about 0 in $\mathrm{Skew}(n+1)$ onto an open set $U \subset SO(n+1) = G$, a neighborhood of the identity. Since $G$ is compact, we can pick a finite number of elements $\xi_j \in G$ such that the open sets $U_j = \{\xi_j g : g \in U\}$ cover $G$. Pick $\eta_j \in \mathrm{Skew}(n+1)$ such that $\mathrm{Exp}\, \eta_j = \xi_j$.
Define $\Phi_t^j : U_j \to G$ for $0 \le t \le 1$ by

(20.4)   $\Phi_t^j\bigl(\xi_j\, \mathrm{Exp}(A)\bigr) = (\mathrm{Exp}\, t\eta_j)(\mathrm{Exp}\, tA), \qquad A \in \mathcal{O}.$

Now partition $G$ into subsets $\Omega_j$, each of whose boundaries has content zero, such that $\Omega_j \subset U_j$. If $g \in \Omega_j$, set $g(t) = \Phi_t^j(g)$. This family of elements of $SO(n+1)$ defines a family of maps $F_g^t : S^n \to S^n$. Now, as in (14.31), we have

(20.5)   $\alpha = g^* \alpha - d\kappa_g(\alpha), \qquad \kappa_g(\alpha) = \displaystyle\int_0^1 F_g^{t*}\bigl(\alpha \lrcorner X_g^t\bigr)\, dt,$

for each $g \in SO(n+1)$, where $X_g^t$ is the family of vector fields on $S^n$ generated by $F_g^t$, as in (14.29). Therefore,

(20.6)   $\alpha = \displaystyle\int_G g^* \alpha\, dg - d \int_G \kappa_g(\alpha)\, dg.$

Now the first term on the right is equal to $\bar{a}\omega$, where $\bar{a} = \int_G a(g \cdot x)\, dg$ is a constant; in fact, the constant is

(20.7)   $\bar{a} = \dfrac{1}{\mathrm{Vol}\, S^n} \displaystyle\int_{S^n} \alpha.$

Thus in this case (20.3) is precisely what serves to make (20.6) a representation of $\alpha$ as an exact form. This finishes the case $M = S^n$.

For a general compact, oriented, connected $M$, proceed as follows. Cover $M$ with open sets $O_1, \ldots, O_K$ such that each $\overline{O}_j$ is diffeomorphic to the closed unit ball in $\mathbb{R}^n$. Set $U_1 = O_1$, and inductively enlarge each $O_j$ to $U_j$, so that $\overline{U}_j$ is also diffeomorphic to the closed ball, and such that $U_{j+1} \cap U_j \ne \emptyset$, $1 \le j < K$. You can do this by drawing a simple curve from $O_{j+1}$ to a point in $U_j$ and thickening it. Pick a smooth partition of unity $\varphi_j$, subordinate to this cover.

Given $\alpha \in \Lambda^n M$, satisfying (20.3), take $\tilde{\alpha}_j = \varphi_j \alpha$. Most likely $\int \tilde{\alpha}_1 = c_1 \ne 0$, so take $\sigma_1 \in \Lambda^n M$, with compact support in $U_1 \cap U_2$, such that $\int \sigma_1 = c_1$. Set $\alpha_1 = \tilde{\alpha}_1 - \sigma_1$, and redefine $\tilde{\alpha}_2$ to be the old $\tilde{\alpha}_2$ plus $\sigma_1$. Make a similar construction using $\int \tilde{\alpha}_2 = c_2$, and continue. When you are done, you have

(20.8)   $\alpha = \alpha_1 + \cdots + \alpha_K,$

with $\alpha_j$ compactly supported in $U_j$. By construction,

(20.9)   $\displaystyle\int \alpha_j = 0$

for $1 \le j < K$. But then (20.3) implies $\int \alpha_K = 0$ too.

Now pick $p \in S^n$ and define smooth maps

(20.10)   $\psi_j : M \longrightarrow S^n$

which map $U_j$ diffeomorphically onto $S^n \setminus p$, and map $M \setminus U_j$ to $p$. There is a unique $v_j \in \Lambda^n S^n$, with compact support in $S^n \setminus p$, such that $\psi_j^* v_j = \alpha_j$.
Clearly

$\displaystyle\int_{S^n} v_j = 0,$

so by the case $M = S^n$ of Proposition 20.5 already established, we know that $v_j = dw_j$ for some $w_j \in \Lambda^{n-1} S^n$, and then

(20.11)   $\alpha_j = d\beta_j, \qquad \beta_j = \psi_j^* w_j.$

This concludes the proof.

We can sharpen and extend some of the topological results given above, using the notion of the degree of a map between compact oriented surfaces. Let $X$ and $Y$ be compact oriented $n$-dimensional surfaces. We want to define the degree of a smooth map $F : X \to Y$. To do this, assume $Y$ is connected. We pick $\omega \in \Lambda^n Y$ such that

(20.12)   $\displaystyle\int_Y \omega = 1.$

We want to define

(20.13)   $\mathrm{Deg}(F) = \displaystyle\int_X F^* \omega.$

The following result shows that $\mathrm{Deg}(F)$ is indeed well defined by this formula. The key argument is an application of Proposition 20.5.

Lemma 20.6. The quantity (20.13) is independent of the choice of $\omega$, as long as (20.12) holds.

Proof. Pick $\omega_1 \in \Lambda^n Y$ satisfying $\int_Y \omega_1 = 1$, so $\int_Y (\omega - \omega_1) = 0$. By Proposition 20.5, this implies

(20.14)   $\omega - \omega_1 = d\alpha,$

for some $\alpha \in \Lambda^{n-1} Y$. Thus

(20.15)   $\displaystyle\int_X F^* \omega - \int_X F^* \omega_1 = \int_X d F^* \alpha = 0,$

and the lemma is proved.

The following is a most basic property.

Proposition 20.7. If $F_0$ and $F_1$ are homotopic, then $\mathrm{Deg}(F_0) = \mathrm{Deg}(F_1)$.

Proof. As noted in problem 6 of §14, if $F_0$ and $F_1$ are homotopic, then $F_0^* \omega - F_1^* \omega$ is exact, say $d\beta$, and of course $\int_X d\beta = 0$.

We next give an alternative formula for the degree of a map, which is very useful in many applications. A point $y_0 \in Y$ is called a regular value of $F$, provided that, for each $x \in X$ satisfying $F(x) = y_0$, $DF(x) : T_x X \to T_{y_0} Y$ is an isomorphism. The easy case of Sard's Theorem, discussed in Appendix B, implies that most points in $Y$ are regular. Endow $X$ with a volume element $\omega_X$, and similarly endow $Y$ with $\omega_Y$. If $DF(x)$ is invertible, define $J_F(x) \in \mathbb{R} \setminus 0$ by $F^*(\omega_Y) = J_F(x)\, \omega_X$. Clearly the sign of $J_F(x)$, i.e., $\mathrm{sgn}\, J_F(x) = \pm 1$, is independent of the choices of $\omega_X$ and $\omega_Y$, as long as they determine the given orientations of $X$ and $Y$.

Proposition 20.8.
If $y_0$ is a regular value of $F$, then

(20.16)   $\mathrm{Deg}(F) = \displaystyle\sum \bigl\{\mathrm{sgn}\, J_F(x_j) : F(x_j) = y_0\bigr\}.$

Proof. Pick $\omega \in \Lambda^n Y$, satisfying (20.12), with support in a small neighborhood of $y_0$. Then $F^* \omega$ will be a sum $\sum \omega_j$, with $\omega_j$ supported in a small neighborhood of $x_j$, and $\int_X \omega_j = \pm 1$ as $\mathrm{sgn}\, J_F(x_j) = \pm 1$.

The following result is a powerful tool in degree theory.

Proposition 20.9. Let $M$ be a compact oriented surface with boundary, $\dim M = n + 1$. Given a smooth map $F : M \to Y$, let $f = F\big|_{\partial M} : \partial M \to Y$. Then $\mathrm{Deg}(f) = 0$.

Proof. Applying Stokes' Theorem to $\alpha = F^* \omega$, we have

$\displaystyle\int_{\partial M} f^* \omega = \int_M d F^* \omega.$

But $d F^* \omega = F^* d\omega$, and $d\omega = 0$ if $\dim Y = n$, so we are done.

An easy corollary of this is Brouwer's no-retraction theorem. Compare the proof of Proposition 20.2.

Corollary 20.10. If $M$ is a compact oriented surface with nonempty boundary $\partial M$, then there is no smooth retraction $\varphi : M \to \partial M$.

Proof. Without loss of generality, we can assume $M$ is connected. If there were a retraction, then $\partial M = \varphi(M)$ must also be connected, so Proposition 20.9 applies. But then we would have, for the map $\mathrm{id.} = \varphi\big|_{\partial M}$, the contradiction that its degree is both zero and 1.

For another application of degree theory, let $X$ be a compact smooth oriented hypersurface in $\mathbb{R}^{n+1}$, and set $\Omega = \mathbb{R}^{n+1} \setminus X$. Given $p \in \Omega$, define

(20.17)   $F_p : X \longrightarrow S^n, \qquad F_p(x) = \dfrac{x - p}{|x - p|}.$

It is clear that $\mathrm{Deg}(F_p)$ is constant on each connected component of $\Omega$. It is also easy to see that, when $p$ crosses $X$, $\mathrm{Deg}(F_p)$ jumps by $\pm 1$. Thus $\Omega$ has at least two connected components. This is most of the smooth case of the Jordan-Brouwer separation theorem:

Theorem 20.11. If $X$ is a smooth compact oriented hypersurface of $\mathbb{R}^{n+1}$, which is connected, then $\Omega = \mathbb{R}^{n+1} \setminus X$ has exactly 2 connected components.

Proof. $X$ being oriented, it has a smooth global normal vector field. Use this to separate a small collar neighborhood $C$ of $X$ into 2 pieces; $C \setminus X = C_0 \cup C_1$. The collar $C$ is diffeomorphic to $[-1, 1] \times X$, and each $C_j$ is clearly connected.
It suffices to show that any connected component O of Ω intersects either C_0 or C_1. Take p ∈ ∂O. If p ∉ X, then p ∈ Ω, which is open, so p cannot be a boundary point of any component of Ω. Thus ∂O ⊂ X, so O must intersect a C_j. This completes the proof.

Let us note that, of the two components of Ω, exactly one is unbounded, say Ω_0, and the other is bounded; call it Ω_1. Then we claim

(20.18) p ∈ Ω_j ⟹ Deg(F_p) = j.

Indeed, for p very far from X, F_p : X → S^n is not onto, so its degree is 0. And when p crosses X, from Ω_0 to Ω_1, the degree jumps by +1.

For a simple closed curve in R^2, this result is the smooth case of the Jordan curve theorem. That special case of the argument given above can be found in [Sto]. We remark that, with a bit more work, one can show that any compact smooth hypersurface in R^{n+1} is orientable. For one proof, see Appendix B in Chapter 5 of [T1].

The next application of degree theory is useful in the study of closed orbits of planar vector fields. Let C be a simple smooth closed curve in R^2, parametrized by arc length, of total length L. Say C is given by x = γ(t), γ(t + L) = γ(t). Then we have a unit tangent field to C, T(γ(t)) = γ′(t), defining

(20.19) T : C → S^1.

Proposition 20.12. For T given by (20.19), we have

(20.20) Deg(T) = 1.

Proof. Pick a tangent line ℓ to C such that C lies on one side of ℓ, as in Fig. 20.1. Without changing Deg(T), you can flatten out C a little, so it intersects ℓ along a line segment, from γ(L_0) to γ(L) = γ(0), where we take L_0 = L − 2ε, L_1 = L − ε. Now T is close to the map T_s : C → S^1, given by

(20.21) T_s(γ(t)) = (γ(t + s) − γ(t))/|γ(t + s) − γ(t)|,

for any s > 0 small enough; hence T and such T_s are homotopic; hence T and T_s are homotopic for all s ∈ (0, L). Furthermore, we can even let s = s(t) be any continuous function s : [0, L] → (0, L) such that s(0) = s(L).
In particular, T is homotopic to the map V : C → S^1, obtained from (20.21) by taking s(t) = L_1 − t for t ∈ [0, L_0], and s(t) going monotonically from L_1 − L_0 to L_1 for t ∈ [L_0, L]. Note that

V(γ(t)) = (γ(L_1) − γ(t))/|γ(L_1) − γ(t)|, 0 ≤ t ≤ L_0.

The parts of V over the ranges 0 ≤ t ≤ L_0 and L_0 ≤ t ≤ L, respectively, are illustrated in Figures 20.1 and 20.2. We see that V maps the segment of C from γ(0) to γ(L_0) into the lower half of the circle S^1, and it maps the segment of C from γ(L_0) to γ(L) into the upper half of the circle S^1. Therefore V (hence T) is homotopic to a one-to-one map of C onto S^1, preserving orientation, and (20.20) is proved.

Exercises

1. Show that the identity map I : X → X has degree 1.

2. Show that, if F : X → Y is not onto, then Deg(F) = 0.

3. If A : S^n → S^n is the antipodal map, show that Deg(A) = (−1)^{n−1}.

4. Show that the homotopy invariance property given in Proposition 20.7 can be deduced as a corollary of Proposition 20.9.
Hint. Take M = X × [0, 1].

5. Let p(z) = z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0 be a polynomial of degree n ≥ 1. Show that, if we identify S^2 ≈ C ∪ {∞}, then p : C → C has a unique continuous extension p̃ : S^2 → S^2, with p̃(∞) = ∞. Show that Deg p̃ = n. Deduce that p̃ : S^2 → S^2 is onto, and hence that p : C → C is onto. In particular, each nonconstant polynomial in z has a complex root. This result is the Fundamental Theorem of Algebra.

21. Critical points and index of a vector field

A critical point of a vector field V is a point where V vanishes. Let V be a vector field defined on a neighborhood O of p ∈ R^n, with a single critical point, at p. Then, for any small ball B_r about p, with B_r ⊂ O, we have a map

(21.1) V_r : ∂B_r → S^{n−1}, V_r(x) = V(x)/|V(x)|.

The degree of this map is called the index of V at p, denoted ind_p(V); it is clearly independent of r. If V has a finite number of critical points p_j, then the index of V is defined to be

(21.2) Index(V) = Σ_j ind_{p_j}(V).
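The definition (21.1)–(21.2) lends itself to direct numerical illustration for planar vector fields. The following sketch is ours, not part of the text; the function names and sampling parameters are invented for illustration. It computes ind_0(V) as the winding number of V/|V| around a small circle about the origin, for a source, a saddle, and a rotation field.

```python
import math

def winding_number(vectors):
    """Total signed angle swept by a closed loop of nonzero planar
    vectors, divided by 2*pi; this is the degree of the loop in S^1."""
    total = 0.0
    n = len(vectors)
    for i in range(n):
        x0, y0 = vectors[i]
        x1, y1 = vectors[(i + 1) % n]
        # signed angle from v_i to v_{i+1}, via atan2(cross, dot)
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))

def index_at_origin(V, r=0.1, samples=400):
    """ind_0(V) in the sense of (21.1): the degree of
    x -> V(x)/|V(x)| on the circle of radius r about the origin."""
    pts = [(r * math.cos(2 * math.pi * k / samples),
            r * math.sin(2 * math.pi * k / samples))
           for k in range(samples)]
    return winding_number([V(x, y) for x, y in pts])

source = lambda x, y: (x, y)      # index +1
saddle = lambda x, y: (x, -y)     # index -1
rotation = lambda x, y: (-y, x)   # index +1
```

Since the winding number of V along the circle equals that of V/|V|, no normalization is needed before accumulating the angle increments.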
If ψ : O → O′ is an orientation-preserving diffeomorphism, taking p to p and V to W, then we claim

(21.3) ind_p(V) = ind_p(W).

In fact, Dψ(p) is an element of GL(n, R) with positive determinant, so it is homotopic to the identity, and from this it readily follows that V_r and W_r are homotopic maps of ∂B_r → S^{n−1}. Thus one has a well defined notion of the index of a vector field with a finite number of critical points on any oriented surface M.

A vector field V on O ⊂ R^n is said to have a nondegenerate critical point at p provided DV(p) is a nonsingular n × n matrix. The following formula is convenient.

Proposition 21.1. If V has a nondegenerate critical point at p, then

(21.4) ind_p(V) = sgn det DV(p).

Proof. If p is a nondegenerate critical point, and we set ψ(x) = DV(p)x, ψ_r(x) = ψ(x)/|ψ(x)|, for x ∈ ∂B_r, it is readily verified that ψ_r and V_r are homotopic, for r small. The fact that Deg(ψ_r) is given by the right side of (21.4) is an easy consequence of Proposition 20.8.

The following is an important global relation between index and degree.

Proposition 21.2. Let Ω be a smooth bounded region in R^{n+1}. Let V be a vector field on Ω̄, with a finite number of critical points p_j, all in the interior Ω. Define F : ∂Ω → S^n by F(x) = V(x)/|V(x)|. Then

(21.5) Index(V) = Deg(F).

Proof. If we apply Proposition 20.9 to M = Ω̄ \ ∪_j B_ε(p_j), we see that Deg(F) is equal to the sum of the degrees of the maps of ∂B_ε(p_j) to S^n, which gives (21.5).

Next we look at a process of producing vector fields in higher dimensional spaces from vector fields in lower dimensional spaces.

Proposition 21.3. Let W be a vector field on R^n, vanishing only at 0. Define a vector field V on R^{n+k} by V(x, y) = (W(x), y). Then V vanishes only at (0, 0), and we have

(21.6) ind_0 W = ind_{(0,0)} V.

Proof. If we use Proposition 20.8 to compute degrees of maps, and choose y_0 ∈ S^{n−1} ⊂ S^{n+k−1}, a regular value of W_r, and hence also of V_r, this identity follows.
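Formula (21.4) is also easy to test numerically. The following sketch is ours, with invented names; DV(p) is approximated by central differences, and the sign of its determinant reproduces the indices one computes from the definition: +1 for a source, −1 for a saddle, and +1 for a sink.

```python
def sgn_det_jacobian(V, p, h=1e-6):
    """sgn det DV(p) for a planar vector field V, with DV(p)
    approximated by central finite differences; by (21.4) this is
    ind_p(V) when the critical point at p is nondegenerate."""
    x, y = p
    a = (V(x + h, y)[0] - V(x - h, y)[0]) / (2 * h)  # dV1/dx
    b = (V(x, y + h)[0] - V(x, y - h)[0]) / (2 * h)  # dV1/dy
    c = (V(x + h, y)[1] - V(x - h, y)[1]) / (2 * h)  # dV2/dx
    d = (V(x, y + h)[1] - V(x, y - h)[1]) / (2 * h)  # dV2/dy
    det = a * d - b * c
    return (det > 0) - (det < 0)

# Nondegenerate critical points at the origin:
src = lambda x, y: (x, y)          # source: det DV = 1 > 0
sdl = lambda x, y: (x, -y)         # saddle: det DV = -1 < 0
snk = lambda x, y: (-x, -3 * y)    # sink:   det DV = 3 > 0
```

Note that a sink, like a source, has index +1: reversing both eigenvalue signs leaves the determinant's sign unchanged.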
We turn to a more sophisticated variation. Let X be a compact oriented n-dimensional surface in R^{n+k}, W a (tangent) vector field on X with a finite number of critical points p_j. Let Ω be a small tubular neighborhood of X, π : Ω → X mapping z ∈ Ω to the nearest point in X. Let ϕ(z) = dist(z, X)^2. Now define a vector field V on Ω by

(21.7) V(z) = W(π(z)) + ∇ϕ(z).

Proposition 21.4. If F : ∂Ω → S^{n+k−1} is given by F(z) = V(z)/|V(z)|, then

(21.8) Deg(F) = Index(W).

Proof. We see that all the critical points of V are points in X which are critical for W, and, as in Proposition 21.3, Index(W) = Index(V). But Proposition 21.2 implies Index(V) = Deg(F).

Since ϕ(z) is increasing as one goes away from X, it is clear that, for z ∈ ∂Ω, V(z) points out of Ω, provided Ω is a sufficiently small tubular neighborhood of X. Thus F : ∂Ω → S^{n+k−1} is homotopic to the Gauss map

(21.9) N : ∂Ω → S^{n+k−1},

given by the outward pointing normal. This immediately gives:

Corollary 21.5. Let X be a compact oriented surface in R^{n+k}, Ω a small tubular neighborhood of X, and N : ∂Ω → S^{n+k−1} the Gauss map. If W is a vector field on X with a finite number of critical points, then

(21.10) Index(W) = Deg(N).

Clearly the right side of (21.10) is independent of the choice of W. Thus any two vector fields on X with a finite number of critical points have the same index, i.e., Index(W) is an invariant of X. This invariant is denoted

(21.11) Index(W) = χ(X),

and is called the Euler characteristic of X. See the exercises for more results on χ(M).

Exercises

A nondegenerate critical point p of a vector field V is said to be a source if the real parts of the eigenvalues of DV(p) are all positive, a sink if they are all negative, and a saddle if they are all nonzero and there exist some of each sign.
Such a critical point is called a center if all orbits of V close to p are closed orbits, which stay near p; this requires all the eigenvalues of DV(p) to be purely imaginary.

In Problems 1–3, V is a vector field on a region Ω ⊂ R^2.

1. Let V have a nondegenerate critical point at p. Show that

p saddle ⟹ ind_p(V) = −1,
p source ⟹ ind_p(V) = 1,
p sink ⟹ ind_p(V) = 1,
p center ⟹ ind_p(V) = 1.

2. If V has a closed orbit γ, show that the map T : γ → S^1, T(x) = V(x)/|V(x)|, has degree +1.
Hint. Use Proposition 20.12.

3. If V has a closed orbit γ whose inside O is contained in Ω, show that V must have at least one critical point in O, and that the sum of the indices of such critical points must be +1.
Hint. Use Proposition 21.2. If V has exactly one critical point in O, show that it cannot be a saddle.

4. Let M be a compact oriented 2-dimensional surface. Given a triangulation of M, within each triangle construct a vector field, vanishing at 7 points as illustrated in Fig. 21.1, with the vertices as attractors, the center as a repeller, and the midpoints of each side as saddle points. Fit these together to produce a smooth vector field X on M. Show directly that

Index(X) = V − E + F,

where V = # vertices, E = # edges, F = # faces, in the triangulation.

5. More generally, construct a vector field on an n-simplex so that, when a compact oriented n-dimensional surface M is triangulated into simplices, one produces a vector field X on M such that

(21.12) Index(X) = Σ_{j=0}^n (−1)^j ν_j,

where ν_j is the number of j-simplices in the triangulation, i.e., ν_0 = # vertices, ν_1 = # edges, . . . , ν_n = # of n-simplices. See Fig. 21.2 for a picture of a 3-simplex, with its faces (i.e., 2-simplices), edges, and vertices labelled. The right side of (21.12) is one definition of χ(M). As we have seen that the left side of (21.12) is independent of the choice of X, it follows that the right side is independent of the choice of triangulation.

6.
Let M be the sphere S^n, which is homeomorphic to the boundary of an (n + 1)-simplex. Computing the right side of (21.12), show that

(21.13) χ(S^n) = 2 if n is even, 0 if n is odd.

Conclude that, if n is even, there is no smooth nowhere vanishing vector field on S^n.

7. With X = S^n ⊂ R^{n+1}, note that the manifold ∂Ω in (21.9) consists of two copies of S^n, with opposite orientations. Compute the degree of the map N in (21.9)–(21.10), and use this to give another derivation of (21.13), granted (21.11).

8. Consider the vector field R on S^2 generating rotation about an axis. Show that R has two critical points, at the 'poles.' Classify the critical points, compute Index(R), and compare the n = 2 case of (21.13).

9. Show that the computation of the index of a vector field X on a surface M is independent of orientation, and that Index(X) can be defined when M is not orientable.

A. Metric spaces, compactness, and all that

A metric space is a set X, together with a distance function d : X × X → [0, ∞), having the properties that

(A.1) d(x, y) = 0 ⟺ x = y,
d(x, y) = d(y, x),
d(x, y) ≤ d(x, z) + d(y, z).

The third of these properties is called the triangle inequality. An example of a metric space is the set of rational numbers Q, with d(x, y) = |x − y|. Another example is X = R^n, with d(x, y) = √((x_1 − y_1)^2 + · · · + (x_n − y_n)^2).

If (x_ν) is a sequence in X, indexed by ν = 1, 2, 3, . . . , i.e., by ν ∈ Z^+, one says x_ν → y if d(x_ν, y) → 0, as ν → ∞. One says (x_ν) is a Cauchy sequence if d(x_ν, x_μ) → 0 as μ, ν → ∞. One says X is a complete metric space if every Cauchy sequence converges to a limit in X. Some metric spaces are not complete; for example, Q is not complete. You can take a sequence (x_ν) of rational numbers such that x_ν → √2, which is not rational. Then (x_ν) is Cauchy in Q, but it has no limit in Q.
If a metric space X is not complete, one can construct its completion X̂, as follows. Let an element ξ of X̂ consist of an equivalence class of Cauchy sequences in X, where we say (x_ν) ∼ (y_ν) provided d(x_ν, y_ν) → 0. We write the equivalence class containing (x_ν) as [x_ν]. If ξ = [x_ν] and η = [y_ν], we can set

d(ξ, η) = lim_{ν→∞} d(x_ν, y_ν),

and verify that this is well defined, and makes X̂ a complete metric space. If the completion of Q is constructed by this process, you get R, the set of real numbers.

A metric space X is said to be compact provided any sequence (x_ν) in X has a convergent subsequence. Clearly every compact metric space is complete. There are two useful conditions, each equivalent to the characterization of compactness just stated, on a metric space. The reader can establish the equivalence, as an exercise.

(i) If S ⊂ X is a set with infinitely many elements, then there is an accumulation point, i.e., a point p ∈ X such that every neighborhood U of p contains infinitely many points in S.

Here, a neighborhood of p ∈ X is a set containing the ball

(A.2) B_ε(p) = {x ∈ X : d(x, p) < ε},

for some ε > 0.

(ii) Every open cover {U_α} of X has a finite subcover.

Here, a set U ⊂ X is called open if it contains a neighborhood of each of its points. The complement of an open set is said to be closed. Equivalently, K ⊂ X is closed provided that

(A.3) x_ν ∈ K, x_ν → p ∈ X ⟹ p ∈ K.

It is clear that any closed subset of a compact metric space is also compact.

If X_j, 1 ≤ j ≤ m, is a finite collection of metric spaces, with metrics d_j, we can define a product metric space

(A.4) X = ∏_{j=1}^m X_j, d(x, y) = d_1(x_1, y_1) + · · · + d_m(x_m, y_m).

Another choice of metric is δ(x, y) = √(d_1(x_1, y_1)^2 + · · · + d_m(x_m, y_m)^2). The metrics d and δ are equivalent, i.e., there exist constants C_0, C_1 ∈ (0, ∞) such that

(A.5) C_0 δ(x, y) ≤ d(x, y) ≤ C_1 δ(x, y), ∀ x, y ∈ X.

We describe some useful classes of compact spaces.

Proposition A.1.
If X_j are compact metric spaces, 1 ≤ j ≤ m, so is X = ∏_{j=1}^m X_j.

Proof. If (x_ν) is an infinite sequence of points in X, say x_ν = (x_{1ν}, . . . , x_{mν}), pick a convergent subsequence (x_{1ν}) in X_1, and consider the corresponding subsequence of (x_ν), which we relabel (x_ν). Using this, pick a convergent subsequence (x_{2ν}) in X_2. Continue. Having a subsequence such that x_{jν} → y_j in X_j for each j = 1, . . . , m, we then have a convergent subsequence in X.

The following result is useful for calculus on R^n.

Proposition A.2. If K is a closed bounded subset of R^n, then K is compact.

Proof. The discussion above reduces the problem to showing that any closed interval I = [a, b] in R is compact. Suppose S is a subset of I with infinitely many elements. Divide I into two equal subintervals, I_1 = [a, b_1], I_2 = [b_1, b], b_1 = (a + b)/2. Then either I_1 or I_2 must contain infinitely many elements of S. Say I_j does. Let x_1 be any element of S lying in I_j. Now divide I_j in two equal pieces, I_j = I_{j1} ∪ I_{j2}. One of these intervals (say I_{jk}) contains infinitely many points of S. Pick x_2 ∈ I_{jk} to be one such point (different from x_1). Then subdivide I_{jk} into two equal subintervals, and continue. We get an infinite sequence of distinct points x_ν ∈ S, and |x_ν − x_{ν+k}| ≤ 2^{−ν}(b − a), for k ≥ 1. Since R is complete, (x_ν) converges, say to y ∈ I. Any neighborhood of y contains infinitely many points in S, so we are done.

If X and Y are metric spaces, a function f : X → Y is said to be continuous provided x_ν → x in X implies f(x_ν) → f(x) in Y.

Proposition A.3. If X and Y are metric spaces, f : X → Y continuous, and K ⊂ X compact, then f(K) is a compact subset of Y.

Proof. If (y_ν) is an infinite sequence of points in f(K), pick x_ν ∈ K such that f(x_ν) = y_ν. If K is compact, we have a subsequence x_{ν_j} → p in X, and then y_{ν_j} → f(p) in Y.

If f : X → R is continuous, we say f ∈ C(X). A useful corollary of Proposition A.3 is:

Proposition A.4.
If X is a compact metric space and f ∈ C(X), then f assumes a maximum and a minimum value on X.

Proof. We know from Proposition A.3 that f(X) is a compact subset of R. Hence f(X) is bounded, say f(X) ⊂ I = [a, b]. Repeatedly subdividing I into equal halves, as in the proof of Proposition A.2, at each stage throwing out intervals that do not intersect f(X), and keeping only the leftmost and rightmost interval amongst those remaining, we obtain points α ∈ f(X) and β ∈ f(X) such that f(X) ⊂ [α, β]. Then α = f(x_0) for some x_0 ∈ X is the minimum and β = f(x_1) for some x_1 ∈ X is the maximum.

At this point, the reader might take a look at the proof of the Mean Value Theorem, given in §0, which applies this result.

A function f ∈ C(X) is said to be uniformly continuous provided that, for any ε > 0, there exists δ > 0 such that

(A.6) x, y ∈ X, d(x, y) ≤ δ ⟹ |f(x) − f(y)| ≤ ε.

An equivalent condition is that f have a modulus of continuity, i.e., a monotonic function ω : [0, 1) → [0, ∞) such that δ ↘ 0 ⟹ ω(δ) ↘ 0, and such that

(A.7) x, y ∈ X, d(x, y) ≤ δ ≤ 1 ⟹ |f(x) − f(y)| ≤ ω(δ).

Not all continuous functions are uniformly continuous. For example, if X = (0, 1) ⊂ R, then f(x) = sin 1/x is continuous, but not uniformly continuous, on X. The following result is useful, for example, in the development of the Riemann integral in §11.

Proposition A.5. If X is a compact space and f ∈ C(X), then f is uniformly continuous.

Proof. If not, there exist x_ν, y_ν ∈ X and ε > 0 such that d(x_ν, y_ν) ≤ 2^{−ν} but

(A.8) |f(x_ν) − f(y_ν)| ≥ ε.

Taking a convergent subsequence x_{ν_j} → p, we also have y_{ν_j} → p. Now continuity of f at p implies f(x_{ν_j}) → f(p) and f(y_{ν_j}) → f(p), contradicting (A.8).

If X and Y are metric spaces, the space C(X, Y) of continuous maps f : X → Y has a natural metric structure, under some additional hypotheses. We use

(A.9) D(f, g) = sup_{x∈X} d(f(x), g(x)).
This sup exists provided f(X) and g(X) are bounded subsets of Y, where to say B ⊂ Y is bounded is to say d : B × B → [0, ∞) has bounded image. In particular, this supremum exists if X is compact. The following result is useful in the proof of the fundamental local existence theorem for ODE, in §3.

Proposition A.6. If X is a compact metric space and Y is a complete metric space, then C(X, Y), with the metric (A.9), is complete.

We leave the proof as an exercise.

We end this appendix with a couple of slightly more sophisticated results on compactness. The following extension of Proposition A.1 is a special case of Tychonov's Theorem.

Proposition A.7. If {X_j : j ∈ Z^+} are compact metric spaces, so is X = ∏_{j=1}^∞ X_j.

Here, we can make X a metric space by setting

(A.10) d(x, y) = Σ_{j=1}^∞ 2^{−j} d_j(x, y)/(1 + d_j(x, y)).

It is easy to verify that, if x_ν ∈ X, then x_ν → y in X, as ν → ∞, if and only if, for each j, p_j(x_ν) → p_j(y) in X_j, where p_j : X → X_j is the projection onto the j-th factor.

Proof. Following the argument in Proposition A.1, if (x_ν) is an infinite sequence of points in X, we obtain a nested family of subsequences

(A.11) (x_ν) ⊃ (x_{1ν}) ⊃ (x_{2ν}) ⊃ · · · ⊃ (x_{jν}) ⊃ · · ·

such that p_ℓ(x_{jν}) converges in X_ℓ, for 1 ≤ ℓ ≤ j. The next step is a diagonal construction. We set

(A.12) ξ_ν = x_{νν} ∈ X.

Then, for each j, after throwing away a finite number N(j) of elements, one obtains from (ξ_ν) a subsequence of the sequence (x_{jν}) in (A.11), so p_ℓ(ξ_ν) converges in X_ℓ for all ℓ. Hence (ξ_ν) is a convergent subsequence of (x_ν).

Before stating the next result, we establish a simple lemma.

Lemma A.8. If X is a compact metric space, then there is a countable dense subset Σ of X. (One says X is separable.)

Proof. For each ν ∈ Z^+, the collection of balls B_{1/ν}(x) covers X, so there is a finite subcover, {B_{1/ν}(x_{νj}) : 1 ≤ j ≤ N(ν)}. It follows that Σ = {x_{νj} : j ≤ N(ν), ν ∈ Z^+} is a countable dense subset of X.
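The finite covers by balls B_{1/ν}(x_{νj}) in the proof of Lemma A.8 can be produced quite concretely. The following sketch is ours, not part of the text: a greedy selection rule (one standard way to extract such a net) builds a finite ε-net for a sample of the unit square with the Euclidean metric, so that every sample point lies within ε of some selected center.

```python
def greedy_eps_net(points, eps, dist):
    """Greedily select a finite eps-net from a list of points in a
    metric space: a point joins the net only if it is farther than
    eps from every center chosen so far, so at the end every point
    lies within eps of some center."""
    net = []
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net

# Example: a fine grid sampling the (compact) unit square.
d2 = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
net = greedy_eps_net(grid, 0.25, d2)
# net is a small finite subset of grid; every grid point is
# within 0.25 of it
```

Taking ε = 1/ν for ν = 1, 2, 3, . . . and collecting the resulting nets is exactly the countable dense set Σ of the lemma.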
The next result is a special case of Ascoli's Theorem.

Proposition A.9. Let X and Y be compact metric spaces, and fix a modulus of continuity ω(δ). Then

(A.13) C_ω = { f ∈ C(X, Y) : d(f(x), f(y)) ≤ ω(d(x, y)) }

is a compact subset of C(X, Y).

Proof. Let (f_ν) be a sequence in C_ω. Let Σ be a countable dense subset of X. For each x ∈ Σ, (f_ν(x)) is a sequence in Y, which hence has a convergent subsequence. Using a diagonal construction similar to that in the proof of Proposition A.7, we obtain a subsequence (ϕ_ν) of (f_ν) with the property that ϕ_ν(x) converges in Y, for each x ∈ Σ, say

(A.14) x ∈ Σ ⟹ ϕ_ν(x) → ψ(x),

where ψ : Σ → Y. So far, we have not used (A.13), but this hypothesis readily yields

(A.15) d(ψ(x), ψ(y)) ≤ ω(d(x, y)),

for all x, y ∈ Σ. Using the denseness of Σ ⊂ X, we can extend ψ uniquely to a continuous map of X → Y, which we continue to denote ψ. Also, (A.15) holds for all x, y ∈ X, i.e., ψ ∈ C_ω.

It remains to show that ϕ_ν → ψ uniformly on X. Pick ε > 0. Then pick δ > 0 such that ω(δ) < ε/3. Since X is compact, we can cover X by finitely many balls B_δ(x_j), 1 ≤ j ≤ N, x_j ∈ Σ. Pick M so large that ϕ_ν(x_j) is within ε/3 of its limit for all ν ≥ M (when 1 ≤ j ≤ N). Now, for any x ∈ X, picking ℓ ∈ {1, . . . , N} such that d(x, x_ℓ) ≤ δ, we have, for k ≥ 0, ν ≥ M,

(A.16) d(ϕ_{ν+k}(x), ϕ_ν(x)) ≤ d(ϕ_{ν+k}(x), ϕ_{ν+k}(x_ℓ)) + d(ϕ_{ν+k}(x_ℓ), ϕ_ν(x_ℓ)) + d(ϕ_ν(x_ℓ), ϕ_ν(x)) ≤ ε/3 + ε/3 + ε/3,

proving the proposition.

B. Sard's theorem

Let F : O → R^n be a C^1 map, with O open in R^n. If p ∈ O and DF(p) : R^n → R^n is not surjective, then p is said to be a critical point, and F(p) a critical value. The set C of critical points can be a large subset of O, even all of it, but the set of critical values F(C) must be small in R^n, as the following result implies.

Proposition B.1. If F : O → R^n is a C^1 map, C ⊂ O its set of critical points, and K ⊂ O compact, then F(C ∩ K) is a nil subset of R^n.

Proof.
Without loss of generality, we can assume K is a cubical cell. Let P be a partition of K into cubical cells R_α, all of diameter δ. Write P = P′ ∪ P′′, where cells in P′ are disjoint from C, and cells in P′′ intersect C. Pick x_α ∈ R_α ∩ C, for R_α ∈ P′′. Fix ε > 0. Now we have

(B.1) F(x_α + y) = F(x_α) + DF(x_α)y + r_α(y),

and, if δ > 0 is small enough, then |r_α(y)| ≤ ε|y| ≤ εδ, for x_α + y ∈ R_α. Thus F(R_α) is contained in an εδ-neighborhood of the set H_α = F(x_α) + DF(x_α)(R_α − x_α), which is a parallelepiped of dimension ≤ n − 1, and diameter ≤ Mδ, if |DF| ≤ M. Hence

(B.2) cont^+ F(R_α) ≤ C ε δ^n ≤ C′ ε V(R_α), for R_α ∈ P′′.

Thus

(B.3) cont^+ F(C ∩ K) ≤ Σ_{R_α ∈ P′′} cont^+ F(R_α) ≤ C′′ ε.

Taking ε → 0, we have the proof.

This is the easy case of a result known as Sard's Theorem, which also treats the case F : O → R^n when O is an open set in R^m, m > n. Then a more elaborate argument is needed, and one requires more differentiability, namely that F is of class C^k, with k = m − n + 1. A proof can be found in [Mil] or [Stb].

References

[AM] R. Abraham and J. Marsden, Foundations of Mechanics, Benjamin, Reading, Mass., 1978.
[Ahl] L. Ahlfors, Complex Analysis, McGraw-Hill, New York, 1979.
[Ar] V. Arnold, Mathematical Methods of Classical Mechanics, Springer, New York, 1978.
[Ar2] V. Arnold, Geometrical Methods in the Theory of Ordinary Differential Equations, Springer, New York, 1983.
[Bir] G. D. Birkhoff, Dynamical Systems, AMS Colloq. Publ., vol. 9, Providence, RI, 1927.
[Bu] R. Buck, Advanced Calculus, McGraw-Hill, New York, 1978.
[Car] C. Caratheodory, Calculus of Variations and Partial Differential Equations of the First Order, Holden-Day, San Francisco, 1965.
[Fed] H. Federer, Geometric Measure Theory, Springer, New York, 1969.
[Fle] W. Fleming, Functions of Several Variables, Addison-Wesley, Reading, Mass., 1965.
[Fol] G. Folland, Real Analysis: Modern Techniques and Applications, Wiley-Interscience, New York, 1984.
[Fr] P. Franklin, A Treatise on Advanced Calculus, John Wiley, New York, 1955.
[Go] H. Goldstein, Classical Mechanics, Addison-Wesley, New York, 1950.
[Hal] J. Hale, Ordinary Differential Equations, Wiley, New York, 1969.
[Hil] E. Hille, Analytic Function Theory, Chelsea Publ., New York, 1977.
[HS] M. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, 1974.
[Kan] Y. Kannai, An elementary proof of the no-retraction theorem, Amer. Math. Monthly 88 (1981), 264–268.
[Kel] O. Kellogg, Foundations of Potential Theory, Dover, New York, 1953.
[Kr] S. Krantz, Function Theory of Several Complex Variables, Wiley, New York, 1982.
[Lef] S. Lefschetz, Differential Equations, Geometric Theory, J. Wiley, New York, 1957.
[LS] L. Loomis and S. Sternberg, Advanced Calculus, Addison-Wesley, New York, 1968.
[Mil] J. Milnor, Topology from the Differentiable Viewpoint, Univ. Press of Virginia, Charlottesville, 1965.
[Mos] J. Moser, Stable and Random Motions in Dynamical Systems, Princeton Univ. Press, 1973.
[NS] V. Nemytskii and V. Stepanov, Qualitative Theory of Differential Equations, Dover, New York, 1989.
[NSS] H. Nickerson, D. Spencer, and N. Steenrod, Advanced Calculus, Van Nostrand, Princeton, New Jersey, 1959.
[Poi] H. Poincaré, Les Méthodes Nouvelles de la Mécanique Céleste, Gauthier-Villars, 1899.
[Spi] M. Spivak, Calculus on Manifolds, Benjamin, New York, 1965.
[Stb] S. Sternberg, Lectures on Differential Geometry, Prentice Hall, New Jersey, 1964.
[Sto] J. J. Stoker, Differential Geometry, Wiley-Interscience, New York, 1969.
[Str] G. Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich, San Diego, 1988.
[T1] M. Taylor, Partial Differential Equations, MET Press, 1993.
[T2] M. Taylor, Measure Theory and Integration, MET Press, 1993.
[TS] J. M. Thompson and H. Stewart, Nonlinear Dynamics and Chaos, J. Wiley, New York, 1986.
[Wh] E. Whittaker, A Treatise on the Analytical Dynamics of Particles and Rigid Bodies, Dover, New York, 1944.
[Wig] S. Wiggins, Introduction to Applied Dynamical Systems and Chaos, Springer, New York, 1990.
[Wil] E. B. Wilson, Advanced Calculus, Dover, New York, 1958. (Reprint of 1912 ed.)