MATH08007 Linear Algebra S2, 2011/12 Lecture 1
Transcription
0. Prelude - What is Linear Algebra?

The key idea that underlies the course is the notion of LINEARITY. Roughly speaking, a property is linear if, whenever objects X and Y have the property, so do X + Y and λX, where λ is a real or complex scalar. For this to make sense, we need to be able to add our objects X and Y and multiply them by scalars. The first key idea we shall meet is that of sets with a linear structure, that is, sets in which there are natural definitions of addition and scalar multiplication. Simple examples are

(a) R3 (3-dimensional space) with (x1, x2, x3) + (y1, y2, y3) = (x1 + y1, x2 + y2, x3 + y3) and λ(x1, x2, x3) = (λx1, λx2, λx3) for (x1, x2, x3), (y1, y2, y3) ∈ R3 and λ ∈ R.

(b) The set CR[a, b] of all continuous functions f : [a, b] → R, with addition f + g and scalar multiplication λf (λ ∈ R) given by (f + g)(t) = f(t) + g(t) and (λf)(t) = λf(t) for a ≤ t ≤ b.

These two examples, though different in concrete terms, have similar ‘linear’ structures; they are in fact both examples of vector spaces (the precise definition will be given in Chapter 1). Linear algebra is the study of linear structures (vector spaces) and certain mappings between them (linear mappings). To illustrate, consider the following simple examples.

(a) A 2 × 2 real matrix A = (a b; c d) defines a mapping T : R2 → R2 given by T(x1, x2) = (ax1 + bx2, cx1 + dx2). This has the property that T(x + y) = Tx + Ty and T(λx) = λ(Tx); in other words, T respects the linear structure of R2.

(b) For f ∈ CR[a, b], put Tf = ∫_a^b f(t) dt. Here T maps CR[a, b] to R and satisfies T(f + g) = Tf + Tg and T(λf) = λTf so that, as in (a), T respects the linear structures in CR[a, b] and R.

Such mappings T that respect or preserve the linear structure are called linear mappings.

1. Vector spaces

Throughout, R and C denote, respectively, the real and complex numbers, referred to as scalars. F will denote either R or C when it is unimportant which set (field) of scalars we are using. Fn is the set of all n-tuples of elements of F and we write a typical element x of Fn either as a column (vector) with entries x1, x2, . . . , xn or as a row (vector) x = (x1, x2, . . . , xn). In Fn, addition and scalar multiplication are defined coordinatewise by (x1, x2, . . . , xn) + (y1, y2, . . . , yn) = (x1 + y1, x2 + y2, . . . , xn + yn) and λ(x1, x2, . . . , xn) = (λx1, λx2, . . . , λxn). This is the basic example of a vector space over F.

The formal definition of a vector space V over F is as follows. V is a non-empty set with the following structure.
• Given elements x, y ∈ V, we can add them to obtain an element x + y of V.
• Given x ∈ V and λ ∈ F, we can form the product λx, again an element of V.
• We need these (usually natural) operations to obey the familiar algebraic laws of arithmetic:
(a) x + y = y + x for all x, y ∈ V (addition is commutative);
(b) x + (y + z) = (x + y) + z for all x, y, z ∈ V (addition is associative);
(c) there exists an element 0 ∈ V such that 0 + x = x (= x + 0) for all x ∈ V;
(d) given x ∈ V, there is an element −x ∈ V such that x + (−x) = 0 (= (−x) + x); and
(e) λ(x + y) = λx + λy;
(f) (λ + µ)x = λx + µx;
(g) (λµ)x = λ(µx) for all x, y ∈ V and all λ, µ ∈ F; and finally
(h) 1.x = x for all x ∈ V.
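Before the comments on these axioms, here is a minimal numerical sketch (an addition to the notes, not part of the original) of the prelude's example (a): checking T(x + y) = Tx + Ty and T(λx) = λ(Tx) for the mapping T(x) = Ax. The particular matrix and vectors are arbitrary illustrative choices.

```python
import numpy as np

# An arbitrary 2x2 real matrix A = (a b; c d), as in prelude example (a).
A = np.array([[1.0, -2.0],
              [3.0,  0.5]])

def T(x):
    """The mapping T : R^2 -> R^2, x -> Ax."""
    return A @ x

x = np.array([1.0, 2.0])
y = np.array([-3.0, 4.0])
lam = 2.5

# T respects the linear structure of R^2:
print(np.allclose(T(x + y), T(x) + T(y)))   # True: T(x + y) = Tx + Ty
print(np.allclose(T(lam * x), lam * T(x)))  # True: T(lambda x) = lambda (Tx)
```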
[(a)-(d) say that V is an abelian group under +; (e)-(g) are the natural distributive laws linking addition and scalar multiplication; (h) is a natural normalization for scalar multiplication.] We call V a real (resp. complex) vector space when F = R (resp. C). The examples we shall encounter in the course will comprise either sets of points in Fn , with 2 algebraic operations defined coordinatewise, or sets of scalar valued functions defined on a set, with algebraic operations defined pointwise on the set. In both cases, properties (a),(b), (e), (f), (g) and (h) hold automatically. Also, 0 = 0.x (where the 0 on the l.h.s. is the zero element of V and the 0 on the r.h.s. in the scalar 0) and −x = (−1).x for every x ∈ V . The crucial properties to check are • If x, y ∈ V , is x + y in V ? • If x ∈ V and λ ∈ F, is λx in V ? When these two properties hold, we say that V is closed under addition and scalar multiplication. Examples 1. Fn with addition and scalar multiplication defined coordinatewise as above. Here (writing elements as row vectors), 0 = (0, 0, . . . , 0) and −(x1 , x2 , . . . , xn ) = (−x1 , −x2 , . . . , −xn ). 2. The set Mm,n (F) of all m × n matrices with entries in F and with addition and scalar multiplication defined entrywise. Here 0 in Mm,n (F) is just the zero matrix (all entries 0) and −(ai,j ) = (−ai,j ). 3. The set Pn of all polynomials in a variable t with real coefficients and degree less than or equal to n and with addition and scalar multiplication defined pointwise in the variable t. This is a real vector space. (Note that the set of such polynomials of exact degree n is not a vector space.) Lecture 2 1. Vector spaces contd. Subspaces Let V be a vector space over F and let U ⊆ V . Definition We say that U is a subspace of V if, with addition and scalar multiplication inherited from V , U is a vector space in its own right. (1.1) Proposition Let U be a subset of a vector space V over F. Then U is a subspace of V if and only if (a) 0 ∈ U ; (b) x + y ∈ U for all x, y ∈ U ; (c) λx ∈ U for all x ∈ U and all λ ∈ F. Comments 1. The zero element of a subspace of V is the same as the zero element of V . 2. We could replace (a) by just saying that U 6= ∅. 3. V is a subspace of itself; {0} is also a subspace of V . Examples 1. {p ∈ Pn : p(0) = 0} is a subspace of Pn . 3 2. {x ∈ Rn : x1 ≥ 0} is not a subspace of Rn . 3. The subspaces of R2 are just R2 , {0}, and straight lines through 0. Likewise, the subspaces of R3 are just R3 , {0}, and straight lines or planes through 0. Spans Definitions Let V be a vector space over F. (i) Given vectors v1 , v2 , . . . , vk in V , we call a vector u of the form u = λ 1 v1 + λ 2 v2 + · · · + λ k vk for some λ1 , λ2 , . . . , λk ∈ F a linear combination of v1 , . . . , vk . (ii) Let S be a non-empty subset of V . Then the span (or linear span) of S, denoted by span S, is the set of all linear combinations of elements of S. Thus span S = {λ1 v1 + · · · + λk vk : λj ∈ F, vj ∈ S}. Notes 1. An alternative notation to span S is lin S. 2. k here may be different for different elements of span S. 3. S may be infinite but we only consider finite linear combinations. 4. If S is finite, say S = {v1 , . . . , vm }, then span S = {λ1 x1 + · · · + λm xm : λj ∈ F}. (1.2) Theorem Let S be a non-empty subset of a vector space V . Then (i) span S is a subspace of V ; (ii) if U is a subspace of V and S ⊆ U , then span S ⊆ U . [Thus span S is the smallest subspace of V containing S.] Examples 1. 
In R2 , the span of a single non-zero vector v = (a, b) is just the straight line through the point (a, b) and the origin. Similarly, in R3 , the span of a single non-zero vector v = (a, b, c) is just the straight line through the point (a, b, c) and the origin. Lecture 3 1. Vector spaces contd. Examples of spans contd. 4 2. Let v1 = (1, 1, 2) and v2 = (−1, 3, 1) in R3 . Does (1,0,1) belong to span {v1 , v2 }? Try to write (1, 0, 1) as λv1 + µv2 for some λ, µ ∈ R; that is, write (1, 0, 1) = λ(1, 1, 2) + µ(−1, 3, 1). Equating coordinates, this gives λ − µ = 1, λ + 3µ = 0, 2λ + µ = 1. The first two equations give λ = 3/4, µ = −1/4, but then 2λ + µ 6= 1 so the third equation is not satisfied. Thus no such λ, µ exist and (1, 0, 1) does not belong to span {v1 , v2 }. 3. Let P denote the vector space of all real polynomials of arbitrary degree in a variable t, with the natural definitions of addition and scalar multiplication. Then span {1, t, t2 , . . . , tn } = Pn , the vector space of all such polynomials with degree less than or equal to n. Spanning sets Definition A non-empty subset S of a vector space V is a spanning set for V (or spans V ) if span S = V . Examples 1. Let v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (1, 1, 1) in R3 . Then (x1 , x2 , x3 ) = (x1 − x3 )v1 + (x2 − x3 )v2 + x3 v3 and so {v1 , v2 , v3 } spans R3 . 2. Let v1 = (1, 0), v2 = (0, 2), v3 = (3, 1) in R2 . Then (x1 , x2 ) = x1 v1 + (x2 /2)v2 + 0.v3 or (x1 , x2 ) = ³x x1 x1 ´ x1 2 v1 + − v2 + v3 . 2 2 12 6 So span {v1 , v2 , v3 } = R2 here. Note In Example 1 above, the expression for (x1 , x2 , x3 ) as a linear combination of {v1 , v2 , v3 } is unique. However, in Example 2, the expression for (x1 , x2 ) as a linear combination of {v1 , v2 } is not unique. Indeed, we have span {v1 , v2 } = R2 and so v3 is not needed to span R2 . This redundancy in a spanning set will lead to the notion of a basis of a vector space. Linear dependence and independence Definition A set of vectors {v1 , v2 , . . . , vk } in a vector space V over F is said to be linearly dependent if there exist λ1 , λ2 , . . . , λk in F, not all zero, such that λ1 v1 + λ2 v2 + · · · + λk vk = 0 5 (i.e. there is a non-trivial linear combination of the vj ’s equal to zero). The set {v1 , v2 , . . . , vk } is linearly independent if it is not linearly dependent. (We tacitly assume here that the vectors v1 , v2 , . . . , vk are distinct.) Comments 1. If some vj = 0, then {v1 , . . . , vk } is linearly dependent since in that case we have 0.v1 + · · · + 0.vj−1 + 1.vj + 0.vj+1 + · · · + 0.vk = 0. 2. The definition of linear independence can be reformulated as: {v1 , . . . , vk } is linearly independent if λ1 v1 + λ2 v2 + · · · + λk vk = 0 ⇒ λ1 = λ2 = · · · = λk = 0. 3. If v 6= 0 then the set {v} is linearly independent. (1.3) Proposition The vectors v1 , . . . , vk are linearly dependent if and only if one of them is a linear combination of the others. Examples 1. Let v1 = (1, 2, −1), v2 = (0, 1, 2), v1 = (−1, 1, 7) in R3 . Are the vectors v1 , v2 , v3 linearly dependent or linearly independent? Consider λ1 v1 + λ2 v2 + λ3 v3 = 0. Equating coordinates leads to λ1 − λ3 = 0 ; 2λ1 + λ2 + λ3 = 0 ; −λ1 + 2λ2 + 7λ3 = 0 ; and then λ1 = λ3 (from the first equation), 3λ1 +λ2 = 0 (from the second) and 6λ1 +2λ2 = 0 (from the third). So we get (taking λ1 = 1 for instance), the non-trivial solution λ1 = λ3 = 1 and λ2 = −3. Hence v1 − 3v2 + v3 = 0 and the vectors v1 , v2 , v3 are linearly dependent. 2. 
The vectors v1 = (1, 1, 1, −1) and v2 = (2, 1, 1, 1) in R4 are linearly independent since neither is a scalar multiple of the other. Comment Although in this course we shall primarily be concerned with the notions of linear dependence and independence for finite sets of vectors, these notions extend to sets that may be infinite. Specifically, given an arbitrary (possibly infinite) set S in a vector space V over F, S is said to be linearly dependent if, for some k ∈ N, there exist distinct v1 , . . . , vk in S and λ1 , . . . , λk in F, with some λj 6= 0, such that λ1 v1 + · · · + λk vk = 0; and S is linearly independent if it is not linearly dependent. Bases and dimension Definition A set of vectors B in a vector space V is called a basis of V if (i) span B = V, and (ii) B is linearly independent. Examples 6 1. Let v1 = (1, 0) and v2 = (0, 1) in R2 . Then B = {v1 , v2 } is a basis of R2 . Lecture 4 1. Vector spaces contd. Examples of bases contd. 2. Let B = {v1 , v2 } where v1 = (3, 1) and v2 = (1, −2) in R2 . Here, if we try to find λ1 and λ2 such that (x1 , x2 ) = λ1 (3, 1) + λ2 (1, −2) we find (solving the two coordinate equations) that we get (x1 , x2 ) = 2x1 + x2 x1 − 3x2 (3, 1) + (1, −2), 7 7 giving span B = R2 . Also, {v1 , v2 } is linearly independent (neither vector is a scalar multiple of the other). So B is a basis of R2 . 3. B = {1, t, t2 , . . . , tn } is a basis of Pn . It is clear that span B = Pn . For linear independence, suppose that λ0 .1 + λ1 .t + · · · + λn .tn = 0. Differentiate k times (0 ≤ k ≤ n) and put t = 0 to get λk k! = 0. Hence λk = 0 for k = 0, . . . , n and B is linearly independent. Thus B is a basis of Pn . (1.4) Theorem Let B = {v1 , . . . , vn } in a vector space V over F. Then B is a basis of V if and only if each element x ∈ V can be written uniquely as x = λ1 v1 + · · · + λn vn , with λ1 , . . . , λn ∈ F. Informally, the way to think of a basis B of V is that it has two properties: (i) span B = V , so B is big enough to generate V ; (ii) B is linearly independent, so we cannot omit any element of B and still have the span equal to V (using Proposition (1.3)). The aim now is to show that, if a particular basis of V has n elements, then every basis will have exactly n elements and then we are able to define the dimension of V to be n. To do this, we need the following result. (1.5) Theorem (Exchange Theorem) Let {u1 , . . . , um } span V and let {v1 , . . . , vk } be a linearly independent subset of V . Then (i) k ≤ m; (ii) V is spanned by v1 , . . . , vk and m − k of the u’s. To prove this result, we use the following simple lemma. (1.6) Lemma Let S ⊆ V and let u ∈ S. Suppose that u is a linear combination of the other vectors in S, i.e. u ∈ span (S\{u}). Then span S = span (S\{u}). Corollary of (1.5) Let {v1 , . . . , vn } be a basis of V . Then every basis of V has exactly n elements. 7 Lecture 5 1. Vector spaces contd. Definition Let V be a vector space over F. If V has a basis with n elements, then we define the dimension of V , denoted by dim V , to be n. [This is a well-defined concept by the Corollary to Theorem (1.5).] Otherwise, we say that V is infinite dimensional. Examples of bases and dimension 1. In Rn , let ej = (0, 0, . . . , 0, 1, 0, . . . , 0), where the 1 is in the j th coordinate. Then {e1 , . . . , en } is a basis of Rn and dim Rn = n. This basis is often referred to as the standard basis of Rn . 2. From an earlier example, put v1 = (3, 1) and v2 = (1, −2) in R2 . 
Then {v1, v2} spans R2 and {v1, v2} is clearly linearly independent since neither vector is a multiple of the other. Thus {v1, v2} is also a basis of R2. [Note the illustration of the Corollary of Theorem (1.5) here.]

3. The set {1, t, t2, . . . , tn} is a basis of Pn; so dim Pn = n + 1.

4. The vector space, P, of real polynomials of arbitrary degree is infinite dimensional. [If it had a basis {p1, . . . , pn}, then every polynomial in P would have degree no more than the maximum of the degrees of p1, . . . , pn, contradicting the fact that there are polynomials of arbitrarily large degree.]

5. Find a basis of U = {x = (x1, x2, x3, x4) ∈ R4 : x1 + x2 + x3 + x4 = 0}. Putting x4 = −x1 − x2 − x3 leads to the vectors {(1, 0, 0, −1), (0, 1, 0, −1), (0, 0, 1, −1)} spanning U and these vectors are easily checked to be linearly independent, so they form a basis of U; dim U = 3.

Convention If V = {0}, we say that the empty set ∅ is a basis of V and that dim V = 0.

The following results are simple consequences of the Exchange Theorem (1.5). The following simple lemma (see Tut. Qn. 3.1) is useful in their proofs.

(1.7) Lemma Let {v1, . . . , vn} be a linearly independent subset of a vector space V and let v ∈ V. If v ∉ span{v1, . . . , vn}, then {v, v1, . . . , vn} is also linearly independent.

(1.8) Theorem Let dim V = n.
(i) A linearly independent subset of V has at most n elements; if it has n elements then it is a basis of V.
(ii) A spanning subset of V has at least n elements; if it has n elements then it is a basis of V.
(iii) A subset of V containing at least n + 1 elements is linearly dependent; a subset of V with at most n − 1 elements cannot span V.
(iv) Let U be a subspace of V. Then U is finite dimensional and dim U ≤ dim V. If dim U = dim V, then U = V.

(1.9) Theorem (Extension to a basis) Let {v1, . . . , vk} be a linearly independent set in a finite dimensional vector space V. Then there exists a basis of V containing {v1, . . . , vk} (i.e. vectors can be added to {v1, . . . , vk} to obtain a basis).

Lecture 6

1. Vector spaces contd.

Example Let v1 = (1, 0, 1, 0) and v2 = (1, 1, 0, 1). Extend {v1, v2} to obtain a basis of R4. (The set {v1, v2} is clearly linearly independent since both vectors in it are non-zero and v2 is not a multiple of v1.) To do this, apply the procedure detailed in the proof of the Exchange Theorem (1.5), starting with the linearly independent set {v1, v2} and the spanning set comprising the standard basis {e1, e2, e3, e4} of R4. This will yield the basis {v1, v2, e3, e4} of R4. In a similar way, a spanning set can be reduced to a basis.

(1.10) Theorem Let S be a finite set that spans a vector space V. Then some subset of S forms a basis of V and (hence) V is finite dimensional.

Comment The proof of this result actually shows that, given a finite set S in a vector space, there is a linearly independent subset S′ of S with span S′ = span S (so that S′ is a basis of span S).

Example Let v1 = (1, 0, 1), v2 = (0, 1, 1), v3 = (1, −1, 0) and v4 = (1, 0, 0). Find a linearly independent subset of {v1, v2, v3, v4} with the same span. Note that {v1, v2} is linearly independent but v3 = v1 − v2. So span{v1, v2, v3, v4} = span{v1, v2, v4}. Now check whether or not v4 is a linear combination of v1 and v2. Setting v4 = λ1 v1 + λ2 v2 leads (equating coordinates) to λ1 = 1, λ2 = 0 and λ1 + λ2 = 0, so no such λ1, λ2 exist and v4 is not a linear combination of v1 and v2.
Thus {v1 , v2 , v4 } is linearly independent and has the same span as {v1 , v2 , v3 , v4 }. (This span will be all of R3 since dim R3 = 3.) Sums of subspaces Definition The sum U1 + U2 of two subspaces U1 , U2 of a vector space V is defined as U1 + U2 = {u1 + u2 : u1 ∈ U1 and u2 ∈ U2 }. Compare the following result with that of Theorem (1.2). (1.11) Theorem U1 + U2 is a subspace of V containing U1 and U2 . Further, if W is a subspace of V with U1 , U2 ⊆ W , then U1 + U2 ⊆ W . Hence U1 + U2 is the smallest subspace of V that contains both U1 and U2 . Examples 1. In R2 , let U1 = span{v1 } and U2 = span{v2 }, where v1 , v2 6= 0 and neither is a multiple of the other. Then U1 + U2 = R2 . Note that each v ∈ R2 can be written uniquely as v = u1 + u2 with u1 ∈ U1 and u2 ∈ U2 . 9 2. In P4 , let U1 be the subspace of all polynomials of degree less than or equal to 3 and let U2 be the subspace of all even polynomials of degree less than or equal to 4. Then U1 + U2 = P4 . Note that, here, the expression of a given p ∈ P4 as p = p1 + p2 , with p1 ∈ U1 and p2 ∈ U2 , is not unique. Direct sums Suppose that U1 ∩ U2 = {0}. Then each u ∈ U1 + U2 can be expressed uniquely as u = u1 + u2 , with u1 ∈ U1 and u2 ∈ U2 . In this situation, we call write U1 ⊕ U2 for U1 + U2 and call this the direct sum of U1 and U2 . Comment In Example 1 above, R2 = U1 ⊕ U2 whilst, in Example 2, P4 is not the direct sum of U1 and U2 . (1.12) Theorem (The dimension theorem) Let U1 and U2 be subspaces of a finite dimensional vector space. Then (i) dim(U1 + U2 ) = dim U1 + dim U2 − dim(U1 ∩ U2 ); (ii) if U1 ∩ U2 = {0}, then dim(U1 ⊕ U2 ) = dim U1 + dim U2 . Lecture 7 1. Vector spaces contd. Comment Compare the formula in Theorem (1.12)(i) with the formula #(A ∪ B) = #(A) + #(B) − #(A ∩ B) for finite sets A, B, where # denotes ‘number of elements in’. Exercise Verify the results of Theorem (1.12) in the cases of the Examples 1 and 2 of sums of subspaces in Lecture 6. Coordinates Let B = {v1 , . . . , vn } be a basis of a vector space V . Then each v ∈ V can be written uniquely as v = λ1 v1 + · · · + λn vn with the coefficients λj ∈ F. We call λ1 , . . . , λn the coordinates of v with respect to the basis B; we also call (a) the row vector (λ1 , . . . , λn ) the coordinate row vector of v w.r.t. B; and λ1 (b) the column vector ... the coordinate column vector of v w.r.t. B. λn Note: The order in which the vectors in B are listed is important when we consider row or column coordinate vectors. Examples 10 1. Let B be the standard basis {e1 , . . . , en } in Fn (ej = (0, 0, . . . , 0, 1, 0, . . . , 0) with the 1 in the j th place).If x = (x1 , . . . , xn ) ∈ Fn , then the column coordinate vector of x w.r.t. B x1 .. is just . . xn 2. In R2 , consider the basis B = {(1, 0), (2, 3)}. Then the coordinate column vector of ¡ ¢ (4, 9) ∈ R2 w.r.t. B is −2 3 since (4, 9) = −2(1, 0) + 3(2, 3). Change of basis Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector space V . Express each vector in B2 in terms of B1 thus: w1 = p11 v1 + p21 v2 + · · · + pn1 vn w2 = p12 v1 + p22 v2 + · · · + pn2 vn ········· (∗) wn = p1n v1 + p2n v2 + · · · + pnn vn . Note that (∗) can be written as wj = n X pij vi i=1 for j = 1, . . . , n. We call the matrix p11 p12 · · · p21 p22 · · · P = ··· pn1 pn2 · · · p1n p2n pnn the change of basis matrix from B1 to B2 . It is important to note that the columns of the matrix P are the rows of the coefficients in (∗) (or, equivalently, the columns of P are the column coordinate vectors of w1 , . . . 
, wn w.r.t. B1 ). Note that, writing B1 and B2 as rows (v1 , . . . , vn ) and (w1 , . . . , wn ) respectively, (∗) can be written as (w1 , . . . , wn ) = (v1 , . . . , vn )P. Example Let B1 = {e1 , e2 , e3 } be the standard basis of R3 and let B2 = {w1 , w2 , w3 } be the basis given by w1 = (1, 0, 1) ; w2 = (1, 1, 0) ; w3 = (1, 1, 1). Then w1 = 1.e1 + 0.e2 + 1.e3 w2 = 1.e1 + 1.e2 + 0.e3 w3 = 1.e1 + 1.e2 + 1.e3 11 and so the change of basis matrix from B1 to B2 is 1 1 1 P = 0 1 1 . 1 0 1 We also have e1 = 1.w1 + 1.w2 − 1.w3 e2 = −1.w1 + 0.w2 + 1.w3 e3 = 0.w1 − 1.w2 + 1.w3 , so the change of basis matrix from B2 to B1 is 1 −1 0 0 −1 . Q= 1 −1 1 1 Lecture 8 1. Vector spaces contd. Comments 1. Notice that, in the example at the end of Lecture 7, the columns of P are just the elements of B2 written as column vectors. This true is true more generally. Important fact Let B1 = {e1 , . . . , en } be the standard basis of Fn and let B2 = {v1 , . . . , vn } be another basis. Suppose that v1 = (α11 , α21 , . . . , αn1 ) , v2 = (α12 , α22 , . . . , αn2 ) , . . . , vn = (α1n , α2n , . . . , αnn ) (that is, vj = (α1j , α2j , . . . , αnj ) for j = 1, . . . , n). Then the change of basis matrix from B1 to B2 is the matrix α11 α12 · · · α1n α21 α22 · · · α2n .. .. .. , . . . αn1 αn2 · · · αnn the columns of which are just the vectors vj written as column vectors. 2. Note also that, in this same example, P Q = QP = I3 , the 3 × 3 identity matrix. Hence the change of basis matrix P from B1 to B2 is invertible and P −1 = Q, the change of basis matrix from B2 to B1 . This is true in general. (1.13) Theorem Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector space V . Let P be the change of basis matrix from B1 to B2 and Q the change of basis matrix from B2 to B1 . Then P is invertible and Q = P −1 . 12 The importance of a change of basis matrix is that it gives a way of relating thecolumn coorλ1 dinates of a vector relative to two bases. Note that we can express the fact that ... is the λn coordinate column vector of a vector v w.r.t. a basis {v1 , . . . , vn } by writing λ1 (∗) v = (v1 , . . . , vn ) ... . λn Formally, (∗) denotes v = v1 λ1 + · · · + vn λn , but we interpret this as the usual expression v = λ1 v1 + · · · + λn vn for v in terms of {v1 , . . . , vn }. (1.14) Theorem Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector space V . Let P be the change of basis matrix from B1 to B2 and Q the change of basis matrix λ1 from B2 to B1 (so that Q = P −1 ). Let v ∈ V have coordinate column vector ... w.r.t. B1 λn µ1 .. and coordinate column vector . w.r.t. B2 . Then µn µ1 λ1 λ1 µ1 .. . . . . = Q .. and .. = P .. . µn λn λn µn Important Comment Note that, although P expresses B2 in terms of B1 , it is the inverse P −1 that is used to express B2 -coordinates in terms of B1 -coordinates. Examples 1. Let B1 be the standard basis of R2 and let B2 be the basis {(3, −1), (1, 1)}. The change of basis matrix from B1 to B2 is µ ¶ 3 1 P = −1 1 and that from B2 to B1 is µ ¶ 1 1 −1 P = . 4 1 3 µ ¶ a b [General fact: The 2 × 2 matrix is non-singular (invertible) if and only if c d ad − bc 6= 0 and in this case µ ¶−1 µ ¶ 1 a b d −b = .] c d ad − bc −c a −1 Thus the coordinates of (x1 , x2 ) ∈ R2 w.r.t. B2 are given by ¶ µ ¶µ ¶ µ 1 x1 − x2 x1 1 1 −1 = . 4 1 3 x2 4 x1 + 3x2 Thus (x1 , x2 ) = 14 (x1 − x2 )(3, −1) + 14 (x1 + 3x2 )(1, 1). 13 2. 
Let B1 be the standard basis of R3 and let B2 be the basis {w1 , w2 , w3 } where w1 = (1, 0, 1), w2 = (1, 1, 0), w3 = (1, 1, 1). Then the change of basis from B2 to B1 is 1 −1 0 1 0 −1 −1 1 1 (see the example at the end of Lecture 7). Hence the coordinates of the vector (2, 3, 1) ∈ R3 w.r.t. B2 are given by −1 2 1 −1 0 1 0 −1 3 = 1 . 2 1 −1 1 1 Thus (2, 3, 1) = −w1 + w2 + 2w3 . Lecture 9 2. Linear Mappings Let U and V be two vector spaces over F. Definition A linear mapping (or linear transformation) from U to V is a map T : U → V such that (∗) T (x + y) = T x + T y and T (λx) = λT x for all x, y ∈ U and all λ ∈ F. Notes 1. T is a function defined on U with values in V . We usually write T x rather than T (x) for its value at x ∈ U . 2. Note that (∗) can be replaced by the single condition T (λx + µy) = λT x + µT y for all x, y ∈ U and all λ, µ ∈ F. Examples 1. Let T : R3 → R2 be defined by µ ¶ x1 x1 1 −1 2 x2 , T : x2 → 3 1 0 x3 x3 so that T (x1 , x2 , x3 ) = (x1 − x2 + 2x3 , 3x1 + x2 ). Then T is linear. 2. T : Pn → Pn+1 given by (T p)(t) = tp(t) is linear. 3. T : Pn → Pn given by (T p)(t) = p0 (t) + 2p(t) is linear. 14 4. The linear mappings Fn → Fm are precisely the mappings of the form x1 x1 x2 x2 T : .. → A .. , . . xn xn where A is an m × n matrix (fixed for a given T ). Technical terms Let T : U → V be a linear mapping. (a) If T u = v (u ∈ U, v ∈ V ), then v is the image of u under v. (b) T is one-one (or injective) if, for u1 , u2 ∈ U , u1 6= u2 ⇒ T u1 6= T u2 (equivalently, T u1 = T u2 ⇒ u1 = u2 ). (c) T is onto V (or surjective) if, given v ∈ V , v = T u for some u ∈ U . (d) The kernel of T is the set ker T = {u ∈ U : T u = 0} . (e) The image of T is the set im T = {T u : u ∈ U }; im T is also called the range of T , sometimes denoted by ran T . (2.1) Theorem Let T : U → V be linear. Then (i) T 0 = 0; (ii) ker T and im T are subspaces of U and V respectively; (iii) T is one-one (or injective) if and only if ker T = {0}. Example Let T : R2 → R3 be given by µ T x1 x2 ¶ ¶ 1 2 µ x 1 = −1 0 , x2 1 1 so that T (x1 , x2 ) = (x1 + 2x2 , −x1 , x1 + x2 ). Then ker T = {0}, so that T is one-one, whilst im T = span{(1, −1, 1) , (2, 0, 1)}. Note This example illustrates the important fact that, if T x1 x2 T .. = A . xn : Fn → Fm is a linear mapping defined by x1 x2 .. , . xn where A is an m × n matrix, then im T equals the span (in Fm ) of the columns of A 15 Lecture 10 2. Linear Mappings contd. The matrix of a linear mapping Let T : U → V be linear, and suppose also that B1 = {u1 , . . . , un } is a basis of U and that B2 = {v1 , . . . , vm } is a basis of V . The linearity of T means that T is completely determined by the values of T u1 , T u2 , . . . , T un . Suppose then that T u1 = a11 v1 + a21 v2 + · · · + am1 vm T u2 = a12 v1 + a22 v2 + · · · + am2 vm ········· (∗) T un = a1n v1 + a2n v2 + · · · + amn vm . Note that (∗) can be written as T uj = m X aij vi i=1 for j = 1, . . . , n. We call the matrix a11 a12 · · · a21 a22 · · · A= ··· am1 am2 · · · a1n a2n amn the matrix of T with respect to B1 and B2 . Notes 1. The columns of A are coefficients in the rows of (∗). 2. (∗) can be written as (T u1 , . . . , T un ) = (v1 , . . . , vm ) A. 3. A is an m × n matrix, where n = dim U and m = dim V . 4. If U = V and B1 = B2 = B, we just refer to A as the matrix of T w.r.t. B. 5. Let B1 and B2 be two bases of U , let T be the identity mapping T x = x (x ∈ U ) and consider T as a linear mapping from U equipped with basis B1 to U equipped with basis B2 . 
Then the matrix of T is the change of basis matrix from B2 to B1 . Examples 1. Let T be linear mapping Pn → Pn+1 defined by (T p)(t) = tp(t). Then the matrix of T w.r.t. the bases {1, t, t2 , . . . , tn } and {1, t, t2 , . . . , tn+1 } of Pn and Pn+1 respectively is the (n + 1) × n matrix 0 0 0 ··· 0 1 0 0 ··· 0 0 1 0 ··· 0 0 0 1 ··· 0 . .. .. . . . ··· . . 0 0 ··· 0 1 16 For instance, the first two and final columns of this matrix come from noting that, since T (1) = t, T (t) = t2 and T (tn ) = tn+1 , we have T (1) = 0.1 + 1.t + 0.t2 + · · · + 0.tn + 0.tn+1 T (t) = 0.1 + 0.t + 1.t2 + · · · + 0.tn + 0.tn+1 T (tn ) = 0.1 + 0.t + 0.t2 + · · · + 0.tn + 1.tn+1 . 2. Let T : Fn → Fm be given by x1 T : ... → C xn x1 .. , . xn where C is an m × n matrix. Then C is the matrix of T w.r.t. the standard bases of Fn and Fm . 3. Let T : R3 → R3 be defined by T (x1 , x2 , x3 ) = (x2 , x3 , x1 ). Consider the basis B = {u1 , u2 , u3 } of R3 , where u1 = (1, 0, 0) ; u2 = (1, 2, 3) ; u3 = (−1, 1, 1). Then T u1 = (0, 0, 1) = λ1 u1 + λ2 u2 + λ3 u3 , where (after equating the three coordinates and solving the resulting equation for λ1 , λ2 , λ3 ) λ1 = −3, λ2 = 1 and λ3 = −3. Hence T u1 = −3u1 + 1.u2 − 2.u3 . Similarly, T u2 = (2, 3, 1) = 11u1 − 2u2 + 7u3 T u3 = (1, 1, −1) = 8u1 − 2u2 + 5u3 . So the matrix of T w.r.t. B is −3 11 8 1 −2 −2 . −2 7 5 Coordinates The matrix of a linear mapping is related to the corresponding mapping of coordinates as follows. (2.2) Theorem Let T : U → V be linear and have matrix A = (aij ) w.r.t. bases B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vm } of U and V respectively. Let u ∈ U have coordinate column vector λ1 λ = ... λn w.r.t. B1 . Then T u has coordinate column vector λ1 a11 · · · a1n .. .. Aλ = ... · · · . . λn am1 · · · amn 17 w.r.t. B2 . Example Let T : U → V have matrix µ 1 2 −1 −1 3 6 ¶ w.r.t. bases {u1 , u2 , u3 } of U and {v1 , v2 } of V . Then T (3u1 − u2 + 2u3 ) has coordinate column vector µ ¶ µ ¶ 3 −1 1 2 −1 −1 = −1 3 6 6 2 w.r.t. {v1 , v2 }, so that T (3u1 − u2 + 2u3 ) = −v1 + 6v2 . Change of bases (2.3) Theorem Let T : U → V be linear, with matrix A w.r.t. bases B1 of U and B2 of V . Let B10 be another basis of U and B20 another basis of V . Then the matrix C of T w.r.t. B10 and B20 is given by C = Q−1 AP, where P is the change of basis matrix from B1 to B10 and Q is the change of basis matrix from B2 to B20 . Lecture 11 2. Linear Mappings contd. The rank theorem Let T : U → V be linear and recall that ker T = {u ∈ U : T u = 0} is a subspace of U , whilst im T = {T u : u ∈ U } is a subspace of V . Suppose that U and V are finite dimensional. Definitions (i) The nullity of T , null(T ), is defined by null(T ) = dim ker T. (ii) The rank of T , rk(T ), is defined by rk(T ) = dim im T. [Alternative notation: n(T ) and r(T )] (2.4) Theorem (The rank theorem) rk(T ) + null(T ) = dim U. Example Let T : R3 → R4 be defined by T (x1 , x2 , x3 ) = (x1 − x2 , x2 − x3 , x3 − x1 , x1 + x2 − 2x3 ). Then x = (x1 , x2 , x3 ) ∈ ker T ⇔ x1 − x2 = x2 − x3 = x3 − x1 = x1 + x2 − 2x3 = 0 ⇔ x1 = x2 = x3 ⇔ x = x1 (1, 1, 1); 18 thus ker T = span{(1, 1, 1)} and null(T ) = 1. Writing (x1 − x2 , x2 − x3 , x3 − x1 , x1 + x2 − 2x3 ) as x1 (1, 0, −1, 1) + x2 (−1, 1, 0, 1) + x3 (0, −1, 1, −2), we see that im T = span{(1, 0, −1, 1) , (−1, 1, 0, 1) , (0, −1, 1, −2)}. The first two of these vectors are clearly linearly independent but (0, −1, 1, −2) = −(1, 0, −1, 1) − (−1, 1, 0, 1). 
Hence im T = span{(1, 0, −1, 1) , (−1, 1, 0, 1)}, {(1, 0, −1, 1) , (−1, 1, 0, 1)} is a basis of im T and rk(T ) = 2. Thus rk(T ) + null(T ) = 2 + 1 = 3 = dim R3 as expected. Note: dim V does not appear in the statement of the rank theorem; this is not surprising as the space into which T is mapping could be adjusted (say, enlarged) without changing im T . Corollary to the rank theorem Let T : U → V be linear, where U and V are finite dimensional. We have the following. (i) rk(T ) ≤ dim U. (ii) Suppose that dim U = dim V. Then T is one-one (i.e. is injective) if and only it maps U onto V (i.e. is surjective). In this case, T is bijective and the inverse mapping T −1 : V → U is also linear. Comments 1. It is clear that rk(T ) ≤ dim V . Result (i) is saying something less obvious. 2. When T is bijective as in (ii), we say that T is invertible or is a linear isomorphism. Rank and nullity of a matrix Let A be an m × n matrix. Then there is an associated linear mapping TA : Fn → Fm given by x1 x1 TA : ... → A ... . xn xn The rank of A, rk(A), is defined to be the rank of TA and the nullity of A, null(A), is defined to be the nullity of TA . Since im TA is the span in Fm of the columns of A, rk(A) is sometimes referred to as the column rank of A. It equals the maximum number of linearly independent columns of A. One can also consider the maximum number of linearly independent rows of A (equivalently, the dimension of the span in Fn of the rows of A). This is called the row rank of A. It can be shown, though this is not at all obvious, that the row and column ranks of a matrix are always equal (Tut. Qn. 7.1 will indicate how this is proved). 19 Notice also that, if A = (aij ), then null(A) is the maximum number of linearly independent solutions of the system of linear equations a11 x1 + a12 x2 + · · · + a1n xn = 0 a21 x1 + a22 x2 + · · · + a2n xn = 0 ········· am1 x1 + am2 x2 + · · · + amn xn = 0 . Sums and products of linear mappings Let S, T : U → V be two linear mappings. Then we define S + T : U → V by (S + T )u = Su + T u (u ∈ U ) and, for λ ∈ F, we define λS : U → V by (λS)u = λSu (u ∈ U ). It is easy to check that S + T and λS are linear mappings and that, if S has matrix A and T has matrix B w.r.t. some bases of U and V , then S + T has matrix A + B and λS has matrix λA w.r.t. the same bases. Comment: Denote by L(U, V ) the set of all linear mappings from U into V . Then, with the above definitions of addition and scalar multiplication, L(U, V ) is itself a vector space. Suppose now that T : U → V and S : V → W are linear. We get the composition R = S ◦ T : U → W , defined by Ru = S(T u) (u ∈ U ). Again it is easy to check that R is linear. For linear mappings, we normally write such a composition as R = ST and call R the product of S and T . Example Let T : R2 → R3 and S : R3 → R2 be defined by T (x1 , x2 ) = (x1 + x2 , 0, x2 ) and S(x1 , x2 , x3 ) = (x2 , x3 − x1 ). Then ST : R2 → R2 is given by (ST )(x1 , x2 ) = S(x1 + x2 , 0, x2 ) = (0, −x1 ). We also have T S : R3 → R3 given by (T S)(x1 , x2 , x3 ) = T (x2 , x3 − x1 ) = (x2 + x3 − x1 , 0, x3 − x1 ). Note: ST and T S are completely different mappings. 20 Lecture 12 2. Linear Mappings contd. (2.5) Theorem Let T : U → V and S : V → W be linear mappings. Suppose that T has matrix A w.r.t. bases B1 of U and B2 of V , and S has matrix C w.r.t. the bases B2 of V and B3 of W . Then ST has matrix CA w.r.t. B1 and B3 . Notes 1. Given S, T ∈ L(U, U ), both products ST and T S are defined. 
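As a brief numerical aside (an illustration added here, not part of the original notes), Theorem (2.5) can be checked directly on the example above: with respect to the standard bases, T and S have matrices A_T and A_S, the matrix of ST is the product A_S A_T, that of TS is A_T A_S, and the two products are different matrices.

```python
import numpy as np

# Matrices of T(x1,x2) = (x1+x2, 0, x2) and S(x1,x2,x3) = (x2, x3-x1)
# w.r.t. the standard bases (columns are the images of the basis vectors).
A_T = np.array([[1, 1],
                [0, 0],
                [0, 1]])          # 3x2, so T : R^2 -> R^3
A_S = np.array([[ 0, 1, 0],
                [-1, 0, 1]])      # 2x3, so S : R^3 -> R^2

# Theorem (2.5): the matrix of the composition ST is the product A_S A_T.
print(A_S @ A_T)   # [[ 0  0] [-1  0]]  ->  (ST)(x1, x2) = (0, -x1)
print(A_T @ A_S)   # 3x3 matrix of TS, a completely different mapping
```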
However, when dim U > 1, it may not be the case that ST equals T S. This is because, when n > 1, there always exist n × n matrices A, B such that AB 6= BA. 2. When T ∈ L(U, U ), we can form positive powers T n of T by taking n compositions of T with itself. It is customary to define T 0 to be the identity mapping Iu = u on U . Further, when T is invertible, negative powers of T are defined by setting T −n = (T −1 )n for n = 1, 2, . . . . With these definitions, the expected index laws hold for the range of indices for which they make sense. For instance, (T n )(T m ) = T (m+n) holds for general T when m, n ≥ 0 and holds for invertible T when m, n ∈ Z. Singular and non-singular matrices Let A be an n × n matrix with entries in Fn and let In be the n × n identity matrix (i.e. it has 1’s on the diagonal and 0’s elsewhere). The following is a consequence of the Corollary to the rank theorem. (2.6) Theorem The following statements are equivalent. (i) There exists an n × n matrix B such that BA = In . (ii) There exists an n × n matrix B such that AB = In . (iii) The columns of A are linearly independent. x1 (iv) Ax = 0 ⇒ x = 0 [where x = ... ]. xn Definition When (i)-(iv) hold, we say that A is non-singular; otherwise, A is said to be singular. Thus A is singular precisely when its columns are linearly dependent. It can be shown that A is singular if and only if det A = 0 (equivalently, A is non-singular if and only if det A 6= 0). Linear Equations Let T : U → V , let v ∈ V be given, and suppose that we are interested in solving the equation Tx = v (†); i.e. we wish to find all x ∈ U such that T x = v. The basic simple result is the following. (2.7) Theorem The equation (†) has a solution if and only if v ∈ im T. Assume that this is the case and let x = x0 be some solution of (†). Then the general solution of (†) is x = x0 + u, 21 where u ∈ ker T . Comment This result is illustrated , for instance, in the following. 1. Let A be an m × n matrix of m linear equations Ax = b in and considerthe system x1 b1 .. .. the variables x = . , where b = . . This has a solution if and only if b xn bm is in the span of the columns of A and, in that case, the general solution has the form x = x0 + u, where x0 is a particular solution and u is the general solution of the associated homogeneous equation Ax = 0. 2. Let L be a linear differential operator (e.g. a linear combination of derivatives dk x/dtk ) and consider the equation Lx = f , where f (t) is some given function. Then the general solution of this equation has the form x(t) = x0 (t) + u(t), where x0 (t) is a solution of the equation (called, in this context, a particular integral), and u(t) is the general solution of the associated homogeneous equation Lx = 0 (called the complementary function). Eigenvalues, eigenvectors and diagonalization Let S : U → U be a linear mapping, where U is a vector space over F. Definition A scalar λ ∈ F is an eigenvalue of T if there exists u 6= 0 such that T u = λu. Such a (non-zero) vector u is called an eigenvector associated with λ. The set {u : T u = λu}, which equals ker(T − λI) where I is the identity operator Iu = u on U , is called the eigenspace corresponding to λ. It is clearly a subspace of U (being the kernel of a linear mapping). We also have the following related definition. Definition We say that T is diagonalizable if U has a basis consisting of eigenvectors of T . We then have the following simple result that explains the terminology. 
(2.8) Theorem T is diagonalizable if and only if there exists a basis B = {u1 , . . . , un } of U w.r.t. which the matrix of T is diagonal, i.e. has the form λ1 0 · · · 0 0 λ2 · · · 0 ··· ··· . 0 · · · λn Suppose now that A is an n × n matrix, with an associated linear mapping TA : x → Ax from Fn to itself. Then TA is diagonalizable if and only if there exists a non-singular n × n matrix P such that P −1 AP is diagonal. [P will be the change of basis matrix from the standard basis of Fn to the basis consisting of eigenvectors of TA .] In these circumstances, we say that A is diagonalizable. (2.9) Theorem Let λ1 , . . . , λn be distinct eigenvalues of T , with corresponding eigenvectors u1 , . . . , un . Then {u1 , . . . , un } is linearly independent. Corollary If T : U → U is linear, dim U = n and T has n distinct eigenvalues, then T is diagonalizable. 22 Calculation of eigenvalues Let T : U → U be linear, suppose that dim U = n and let T have matrix A w.r.t. some basis of U . Then λ is an eigenvalue of T ⇔ ⇔ ⇔ ⇔ ker(T − λI) 6= {0} T − λI is not invertible A − λIn is singular det(A − λIn ) = 0. Here In denotes the n × n identity matrix. Notice that det(A−λIn ) is a polynomial in λ of degree n. It is called the characteristic polynomial of T ; its roots (in F) are precisely the eigenvalues of T . It is easy to check, using the change of bases formula for the matrix of a linear mapping (Theorem (2.3)), that the characteristic polynomial depends on T but not the particular matrix A representing T . The underlying scalar field is important here. If F = C, then T will always have an eigenvalue, since every complex polynomial has a root in C (the so-called Fundamental Theorem of Algebra). However, when F = R, the characteristic polynomial may have no real roots if n is even, though it will always have at least one real root if n is odd. Thus, a linear mapping on an even dimensional real vector space might not have any eigenvalues, though such a mapping on an odd dimensional real vector space will always have an eigenvalue. Comment Starting with an n × n matrix A with entries in F, the polynomial det(A − λIn ) is also referred to as the characteristic polynomial of A. Its roots are the eigenvalues of the associated linear mapping TA : x → Ax on Fn . Lecture 13 3. Inner product spaces We now extend the idea of an inner (or scalar) product in R2 and R3 to more general vector spaces. In this context, it is necessary to distinguish between real and complex vector spaces. Definition Let V be a vector space over F = R or C. An inner product on V is a map that associates with each pair of vectors x, y ∈ V a scalar hx, yi ∈ F (more formally, x, y → hx, yi is a function from V × V to F) such that, for all x, y and z ∈ V and all λ, µ ∈ F, hx, yi when F = R (i) hy, xi = hx, yi when F = C ; (ii) hλx + µy, zi = λhx, zi + µhy, zi; (iii) hx, xi ≥ 0, with equality if and only if x = 0. A real (resp. complex) inner product space is a vector space over R (resp. over C) endowed with an inner product. Comments 23 1. It follows from (i) that, even when F = C, hx, xi ∈ R for all x ∈ V ; thus (iii) both makes sense and strengthens this. Further, it is easy to see that (ii) implies that h0, 0i = 0, so that, for (iii), we need hx, xi ≥ 0 and hx, xi = 0 ⇒ x = 0. 2. For a real inner product space V , properties (i) and (ii) together imply that hx, λy + µzi = λhx, yi + µhx, zi for all x, y, z ∈ V and all λ, µ ∈ R, and so hx, yi is linear in the second variable y as well as in the first variable x. 
However, for a complex inner product space V , hx, λy + µzi = λhx, yi + µhx, zi for all x, y, z ∈ V and all λ, µ ∈ C. In this case, we say that hx, yi is conjugate linear in the second variable y (it is linear in the first variable x by (ii)) . Examples 1. In Rn , define hx, yi = x1 y1 + x2 y2 + · · · + xn yn for x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ). Then h·, ·i is an inner product, referred to as the standard inner product on Rn . 2. On R2 , define hx, yi = x1 y1 + 4x2 y2 . Then h·, ·i is an inner product on R2 . 3. In Cn , define hz, wi = z1 w1 + z2 w2 + · · · + zn wn for z = (z1 , . . . , zn ) and w = (w1 , . . . , wn ). Then h·, ·i is an inner product, referred to as the standard inner product on Cn . 4. hz, wi = z1 w1 + 4z2 w2 defines an inner product on C2 . 5. On R2 , define hx, yi = 2x1 y1 + x1 y2 + x2 y1 + x2 y2 . Properties (i) and (ii) for an inner product are easily seen to be satisfied. Also hx, xi = 2x21 + 2x1 x2 + x22 = 2(x1 + x2 /2)2 + x22 /2 ≥ 0, with equality in the last line if and only if x1 = x2 = 0. Thus h·, ·i is inner product on R2 . 6. Now define hx, yi = x1 y1 + 2x1 y2 + 2x2 y1 + x2 y2 on R2 . Again it is easy to see that properties (i) and (ii) of an inner product are satisfied. However, hx, xi = x21 + 4x1 x2 + x22 = (x1 + 2x2 )2 − 3x22 < 0, provided x2 6= 0 and x1 = −2x2 . Thus h·, ·i is not an inner product on R2 in this case. 24 7. Fix a < b and, for p, q ∈ Pn , define Z b hp, qi = p(t)q(t) dt. a Then h·, ·i is an inner product on Pn . Properties (i) and (ii) follow from the linearity of the integral, whilst (iii) follows from the fact that, if ϕ : [a, b] → R is continuous, ϕ(t) ≥ 0 Rb on [a, b] and a ϕ(t) dt = 0, then ϕ ≡ 0 on [a, b]. Thus, with this inner product, Pn is a real inner product space. 8. Let V = CC [a, b] be the vector space of all continuous complex valued functions defined on [a, b], where a < b. (Continuity for complex valued functions is defined as the natural extension of the definition for real valued functions, using the modulus in C to measure ‘nearness’). For f, g ∈ V , define Z b hf, gi = f (t)g(t) dt. a (For a complex valued function ϕ on [a, b], the integral Rb Rb Reϕ(t) dt + i a Imϕ(t) dt.) a Rb a ϕ(t) dt is defined to be Rb It is straightforward to check that h·, ·i satisfies (i) and (ii). Also, hf, f i = a |f (t)|2 dt ≥ 0, with equality if and only if f (t) ≡ 0 on [a, b] (using the continuity of the non-negative function |f (t)|2 ). Hence h·, ·i also satisfies (iii) and so defines an inner product on the complex vector space CC [a, b] Lecture 14 3. Inner product spaces contd. Norms Suppose that V, h·, ·i is a real or complex inner product space. Then hx, xi ≥ 0 for all x ∈ V . Definition The norm of x ∈ V , kxk, is defined as kxk = hx, xi1/2 , where 1/2 denotes the non-negative square root. (3.1) Theorem (The Cauchy-Schwarz inequality) We have |hx, yi| ≤ kxkkyk for all x, y ∈ V . Applications of the Cauchy-Schwarz inequality 1. Let a1 , . . . , an , b1 , . . . , bn ∈ R. Then |a1 b1 + · · · + an bn | ≤ (a21 + · · · + a2n )1/2 (b21 + · · · + b2n )1/2 . 25 2. Let p, q be two real polynomials (thought of as in Pn for some n) and let a < b. Then, Rb applying the Cauchy-Schwarz inequality with the inner product hp, qi = a p(t)q(t) dt, we get ¯ ½Z b ¯Z b ¾1/2 ½Z b ¾1/2 ¯ ¯ 2 2 ¯ p(t)q(t) dt¯¯ ≤ |p(t)| dt |q(t)| dt . 
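A quick numerical illustration (added here as a sketch, not part of the original notes) of the first application above: for randomly chosen real n-tuples a and b, the quantity |a1 b1 + · · · + an bn| never exceeds the product of the two square roots.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

for _ in range(1000):
    a = rng.normal(size=7)
    b = rng.normal(size=7)
    lhs = abs(np.dot(a, b))                      # |<a, b>| for the standard inner product
    rhs = np.linalg.norm(a) * np.linalg.norm(b)  # ||a|| ||b||
    assert lhs <= rhs + 1e-12                    # Cauchy-Schwarz (small slack for rounding)
print("Cauchy-Schwarz held in all 1000 random trials")
```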
¯ a a a (3.2) Theorem The norm k · k in an inner product space V satisfies (i) kxk ≥ 0 with equality if and only if x = 0; (ii) kλxk = |λ|kxk; (iii) kx + yk ≤ kxk + kyk [the triangle inequality] for all x, y, z ∈ V and all λ ∈ F. Orthogonal and orthonormal sets Definitions (i) Two vectors x, y in an inner product space are orthogonal if hx, yi = 0. We write this as x⊥y (ii) A set of vectors {u1 , . . . , uk } in an inner product space is (a) orthogonal if hui , uj i = 0 when i 6= j; ½ 1 if i = j (b) orthonormal if hui , uj i = 0 if i 6= j . Comments 1. If {u1 , . . . , uk } is orthonormal, then kuj k = 1 and so uj 6= 0 for all j. If {u1 , . . . , uk } is u orthogonal and uj 6= 0 for all j, then the set { j : j = 1, . . . , k} is orthonormal. We say kuj k that {u1 , . . . , uk } has been normalized. 2. The term ‘orthonormal’ is often abbreviated to ‘o.n.’. (3.3) Theorem An orthogonal set {u1 , . . . , uk } of non-zero vectors is linearly independent. In particular, an o.n. set {u1 , . . . , uk } is linearly independent. Suppose now that V is an inner product space and dim V = n. Let {u1 , . . . , un } be an orthogonal set of non-zero vectors in V . Then {u1 , . . . , un } is a basis of V and so each vector x ∈ V can be expressed uniquely as x = λ1 u1 +· · ·+λn un . In this situation, it easy to compute the coordinates λj of x ∈ V . (3.4) Theorem Let V be an n-dimensional inner product space. (i) Let {u1 , . . . , un } be an orthogonal set of non-zero vectors in V , and hence a basis of V . Then n X hx, uj i x= uj 2 ku k j j=1 for all x ∈ V . 26 (ii) Let {e1 , . . . , en } be an orthonormal set of vectors in V , and hence an o.n. basis of V . Then n X x= hx, ej iej j=1 and 2 ||xk = n X |hx, ej i|2 (Parseval0 s equality) j=1 for all x ∈ V . Thus, if x = Pn j=1 λj ej , then 2 kxk = n X |λj |2 . j=1 Lecture 15 3. Inner product spaces contd. The Gram-Schmidt process (3.5) Theorem Let V be a finite dimensional inner product space. Then V has an orthogonal, and hence an o.n basis. The proof of this result gives a procedure, called the Gram-Schmidt process, for constructing from a given basis of V an orthogonal basis. Normalization will then yield an o.n. basis. Example Let V be the subspace of R4 with basis {v1 , v2 , v3 }, where v1 = (1, 1, 0, 0), v2 = (1, 0, 1, 1), v3 = (1, 0, 0, 1). (It is easy to check that these three vectors are linearly independent; for instance, v2 and v3 are linearly independent since they are not multiples of each other and, by considering the second coordinate, v1 ∈ / span{v2 , v3 }.) An orthogonal basis {u1 , u2 , u3 } of V can be constructed from {v1 , v2 , v3 } as follows. Let u1 = v1 = (1, 1, 0, 0). Now put u2 = v2 − λu1 = (1, 0, 1, 1) − λ(1, 1, 0, 0) = (1 − λ, −λ, 1, 1) and choose λ so that u2 ⊥ u1 . For this, we need hu2 , u1 i = 1 − λ − λ = 0 or λ = 1/2, giving u2 = ( 12 , − 12 , 1, 1). Now put 1 1 u3 = v3 − λ1 u1 − λ2 u2 = (1, 0, 0, 1) − λ1 (1, 1, 0, 0) − λ2 ( , − , 1, 1) 2 2 and choose λ1 , λ2 so that u3 ⊥ u1 , u2 . For u3 ⊥ u1 , we need hu3 , u1 i = h(1, 0, 0, 1), (1, 1, 0, 0)i − λ1 h(1, 1, 0, 0), (1, 1, 0, 0)i = 0 27 i.e. 1 − 2λ1 = 0 or λ1 = 1/2. (Note that we have used the fact that u1 ⊥ u2 when calculating hu3 , u1 i here; the effect is that we get an equation in λ1 alone.) For u3 ⊥ u2 , we need 1 1 1 1 1 1 hu3 , u2 i = h(1, 0, 0, 1), ( , − , 1, 1)i − λ2 h( , − , 1, 1), ( , − , 1, 1)i = 0, 2 2 2 2 2 2 giving 1 1 1 + 1 − λ2 ( + + 1 + 1) = 0 2 4 4 i.e. λ2 = 3/2 = 3/5. 
Hence 5/2 1 3 1 1 u3 = (1, 0, 0, 1) − (1, 1, 0, 0) − ( , − , 1, 1) 2 5 2 2 1 1 3 2 = ( ,− ,− , ) 5 5 5 5 1 (1, −1, −3, 2). = 5 We have thus constructed an orthogonal basis {u1 , u2 , u3 } of V given by 1 1 1 1 u1 = (1, 1, 0, 0), u2 = u2 = ( , − , 1, 1) = (1, −1, 2, 2), u3 = (1, −1, −3, 2). 2 2 2 5 Stripping out the fractions, we get (1, 1, 0, 0), (1, −1, 2, 2), (1, −1, −3, 2). It useful to check that these three vectors are indeed orthogonal in pairs. Note also that, to avoid quite so many fractions, we could have replaced u2 = ( 12 , − 12 , 1, 1) by (1, −1, 2, 2) earlier when calculating u3 . To obtain an o.n. basis {e1 , e2 , e3 }, we normalize to get e1 = u1 1 u2 1 u3 1 = √ (1, 1, 0, 0), e2 = = √ (1, −1, 2, 2), e3 = = √ (1, −1, −3, 2). ku1 k ku2 k ku3 k 2 10 15 Consider v = v1 + v2 − v3 = (1, 1, 1, 0) ∈ V . We can write v as λ1 e1 + λ2 e2 + λ3 e3 , the coefficients here being given by √ √ √ 2 2 10 −3 15 λ1 = hv, e1 i = √ = 2, λ2 = hv, e2 i = √ = , λ3 = hv, e3 i = √ = − . 5 5 2 10 15 Notice that λ21 + λ22 + λ23 = 2 + 10 15 + = 3 = kvk2 , 25 25 as expected by Parseval’s equality. Another example R1 Find a basis of P2 that is orthogonal w.r.t. the inner product given by hp, qi = 0 p(t)q(t) dt. To do this, start with the basis {v1 , v2 , v3 }, where v1 (t) = 1, v2 (t) = t, v3 (t) = t2 , and apply the Gram-Schmidt process. 28 Put u1 = v1 ; then u2 (t) = v2 (t) − λu1 (t) = t − λ. For u2 ⊥ u1 , we need Z 1 (t − λ).1 dt = 0, i.e. λ = 1/2. 0 This gives u2 (t) = t − 1/2. Now put u3 = v3 − λ1 u1 − λ2 u2 . For u3 ⊥ u1 , we need Z 1 Z 1 Z 1 2 (v3 (t) − λ1 u1 (t))u1 (t) dt = 0, i.e. t dt − λ1 1 dt = 0. 0 0 0 So λ1 = 1/3. For u3 ⊥ u2 , we need Z 1 Z (v3 (t) − λ2 u2 (t))u2 (t) dt = 0, i.e. 0 This leads to 1 Z 1 2 t (t − 1/2) dt − λ2 0 (t − 1/2)2 dt = 0. 0 R1 t2 (t − 1/2) dt 1/12 λ2 = R0 1 = = 1. 2 dt 1/12 (t − 1/2) 0 Hence u3 (t) = t2 − 1/3 − (t − 1/2) = t2 − t + 1/6 and we obtain the orthogonal basis 1, t − 1/2, t2 − t + 1/6. Lecture 16 3. Inner product spaces contd. Orthogonal complements and orthogonal projections Let U be a subspace of an inner product space V . Definition The orthogonal complement of U is the set U ⊥ defined by U ⊥ = {x ∈ V : hx, ui = 0 for all u ∈ U }. (3.6) Theorem Let U be a subspace of a finite dimensional inner product space V . Then each x ∈ V can be written uniquely as x = y + z, where y ∈ U and z ∈ U ⊥ . Thus V = U ⊕ U ⊥ . Terminology Let V and U be as in Theorem (3.6). Given x ∈ V , define the orthogonal projection of x onto U to be the unique y ∈ U with x = y + z, where z ∈ U ⊥ , and write y = PU x. We can then think of PU as a mapping V → V . This mapping is called the orthogonal projection of V onto U . The following are its main properties. (3.7) Theorem With V, U and PU as above, the following hold. (i) PU is a linear mapping with im PU = U and ker PU = U ⊥ . 29 (ii) If {e1 , . . . , ek } is an o.n. basis of U , then PU is given by the formula k X PU x = hx, ej i ej (x ∈ V ). j=1 If {u1 , . . . , uk } is an orthogonal basis of U , then PU is given by the formula PU x = k X j=1 1 hx, uj i uj kuj k2 (x ∈ V ). (iii) PU2 = PU . (iv) x ∈ U if and only if x = PU x. (v) For x ∈ V , kxk2 = kPU xk2 + kx − PU xk2 . [Strictly, we should consider the case when U = {0} separately. Then U ⊥ = V and PU = 0.] Nearest point to a subspace. In an inner product space, the norm is used to measure distance, with kx − yk being thought of as the distance between x and y. (3.8) Theorem Let U be a subspace of a finite dimensional inner product space V and let x ∈ V . 
Then kx − uk ≥ kx − PU xk for all u ∈ U , with equality if and only if u = PU x. Thus, with distance measured using the norm, PU x is the unique nearest point of U to x. Connection with Fourier series Let V denote the space of 2π-periodic continuous functions f : R → C and give V the inner product defined by Z π 1 hf, gi = f (t)g(t) dt. 2π −π For n ∈ Z, let en ∈ V be the function en (t) = eint . Then {en : n ∈ Z} is an orthonormal set and, for f ∈ V , hf, en i is the nth (complex) Fourier coefficient Z π 1 f (t)e−int dt cn = 2π −π of f . Thus the Fourier series of f can be written as ∞ X hf, en i en . n=−∞ This suggests that, in some sense, the Fourier series of f is its expansion in terms of the o.n. ‘basis’ {en : n ∈ Z}. This is, indeed, a reasonable interpretation but, to make it more rigorous, one has to decide how to make sense of the convergence of the Fourier series of f ∈ V and also 30 whether V is the right space to consider these matters. What is true is that, for f ∈ V with ∞ P Fourier series cn e n , n=−∞ kf − N X ( hf, en ien k = n=−N 1 2π Z π |f (t) − −π )1/2 N X cn en (t)|2 dt →0 n=−N as N → ∞. Further, fixing N ∈ N, amongst all the functions g ∈ span{en : n = −N, . . . , N }, N P the N th partial sum cn en of the Fourier series of f is the unique function g that minimizes n=−N ½ kf − gk = 1 2π Z ¾1/2 π 2 |f (t) − g(t)| dt . −π In fact, V is not quite the correct space to discuss these issues − this will be discussed in more detail in later years. Lecture 17 4. Adjoints and self-adjointness Assume throughout this chapter that V is a finite dimensional inner product space. The aim is to show that, given a linear mapping T : V → V , there is a (unique) linear mapping T ∗ : V → V such that hT ∗ x, yi = hx, T yi for all x, y ∈ V and then to study those mappings for which T = T ∗ . (4.1) Theorem Given a linear mapping T : V → V , there is a unique linear mapping T ∗ : V → V such that hT ∗ x, yi = hx, T yi for all x, y ∈ V . Furthermore, if {e1 , . . . , en } is an o.n. basis of V and T has matrix A = (aij ) with respect to {e1 , . . . , en }, then T ∗ has matrix C = (cij ) with respect to {e1 , . . . , en }, where (when F = R) aji cij = aji (when F = C) . Notation and terminology 1. The mapping T ∗ is called the adjoint of T . 2. Given a matrix A = (aij ), its transpose is the matrix At = (aji ) and, when the entries are complex, its conjugate transpose is A∗ = (aji ) = (A)t . Thus, for a given linear mapping T : V → V with matrix A with respect to some o.n. basis of V , (a) if V is a real inner product space, T ∗ has matrix At , (b) if V is a complex inner product space, T ∗ has matrix A∗ with respect to the same o.n. basis. Definition A linear mapping T : V → V is said to be self-adjoint if T ∗ = T , i.e. if hT x, yi = hx, T yi for all x, y ∈ V . Thus T is self-adjoint if and only if its matrix A w.r.t. any (and hence every) o.n. basis satisfies 31 (a) At = A if V is a real inner product space; (b) A∗ = A if V is a complex inner product space. Definitions A matrix A for which At = A is called symmetric whilst (in the case of complex entries) it is called hermitian when A∗ = A. Although a linear mapping on a complex (finite dimensional) vector space always has an eigenvalue, it is possible for a linear mapping on a real vector space not to have any eigenvalues. However, this phenomenon cannot occur for a self-adjoint mapping. Indeed, more can be said about eigenvalues and eigenvectors of self-adjoint linear mappings. (4.2) Theorem Let T : V → V be self-adjoint. 
(i) Every eigenvalue of T is real (even when V is a complex inner product space). (ii) Let λ, µ be distinct eigenvalues of T with corresponding eigenvectors u, v. Then hu, vi = 0. (iii) T has an eigenvalue (even when V is a real inner product space). Lecture 18 4. Adjoints and self-adjointness contd. Our aim now is to show that, given a self-adjoint T on V , V has an o.n. basis consisting of eigenvectors of T (and so, in particular, T is diagonalizable). The following result will be useful in doing this, as well as being of interest in its own right. (4.3) Theorem Let T : V → V be self-adjoint and let U be a subspace of V such that T (U ) ⊆ U . Then T (U ⊥ ) ⊆ U ⊥ and the restrictions of T to U and U ⊥ are self-adjoint. We now have the main result of this chapter. (4.4) Theorem (The finite dimensional spectral theorem) Let V be a finite dimensional inner product space and let T : V → V be a self-adjoint linear mapping. Then V has an o.n. basis consisting of eigenvectors of T . An illustration of the use of the spectral theorem Suppose that T : V → V is self-adjoint and that every eigenvalue λ of T satisfies λ ≥ 1. Then hT x, xi ≥ kxk2 for all x ∈ V. Proof Let {e1 , . . . , en } be an o.n. basis of V with T ej = λj ej for j = 1, . . . , n, where λj ≥ 1 for all j. Let x ∈ V . We have x = x1 e1 + · · · + xn en , where xj = hx, ej i, and T x = x1 λ1 e1 + · · · + xn λn en . 32 Then hT x, xi = h = ≥ n X xj λj ej , j=1 n X |xj |2 xk e k i k=1 λj |xj |2 j=1 n X n X since hej , ek i = δij since each λj ≥ 1 j=1 = kxk2 by Parseval0 s equality. Here δij is the Kronecker delta symbol, defined as 0 if i 6= j δij = 1 if i = j . Lecture 19 4 Adjoints and self-adjointness contd. Real symmetric matrices Let A be a real n × n matrix. This has an associated mapping TA : Rn → Rn given by TA : x → Ax, where we are thinking of x as a column vector. Then A is the matrix of TA and At is the matrix of (TA )∗ w.r.t. the standard basis of Rn . Suppose now that A is symmetric (i.e. At = A). Then TA is self-adjoint since in this case (TA )∗ = TAt = TA . Aside: This can also be seen as follows. Note that, writing vectors in Rn as columns and thinking x1 of these as n × 1 matrices, we can express the standard inner product of x = ... and xn y1 y1 .. .. t [= (x1 , . . . , xn ) . ]. We then have y = . as hx, yi = x y yn yn hAx, yi = (Ax)t y = (xt At )y = (xt A)y = xt (Ay) = hx, Ayi, which shows that TA is self-adjoint. Applying the spectral theorem to TA , we conclude that there is an o.n. basis {u1 , . . . , un } of Rn 33 consisting of eigenvectors of TA , say TA uj = λj uj for j = 1, . . . , n. Consider the n × n matrix R = (u1 . . . un ) (i.e. uj is the j th column of R). Then Auj is the j th column of AR and so AR = (Au1 . . . Aun ) = (λ1 u1 . . . λn un ) λ1 0 0 · · · 0 0 λ2 0 · · · 0 = (u1 . . . un ) .. .. . . . ··· . . 0 0 · · · 0 λn = RD, λ1 0 0 · · · 0 0 λ2 0 · · · 0 where D is the diagonal matrix .. .. . . . . ··· . . 0 0 · · · 0 λn Now Rt is the n × n matrix with j th row utj (where uj is a column and so utj is the vector uj written as a row). Thus ut1 Rt R = ... (u1 . . . un ), utn so the ij th entry of Rt R is uti uj = hui , uj i = δij . Thus Rt R = In , the n × n identity matrix. Thus R is invertible and R−1 = Rt . (It could have been noted earlier that R is invertible since its columns are linearly independent. However, the above calculation shows that its inverse can be written down without any further work.) 
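Before the formal definition, here is a short numerical sketch (an addition, not part of the original notes) of the relation just derived: for a real symmetric matrix A, numpy's eigh routine returns the eigenvalues together with an orthonormal set of eigenvectors, whose matrix R satisfies R^t R = I_n and R^t A R = D. The particular matrix below is an arbitrary example.

```python
import numpy as np

# An arbitrary real symmetric matrix (M + M^t is always symmetric).
M = np.random.default_rng(1).normal(size=(4, 4))
A = M + M.T

# eigh is intended for symmetric (or hermitian) matrices; it returns the
# eigenvalues and an orthonormal basis of eigenvectors as the columns of R.
eigvals, R = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(R.T @ R, np.eye(4)))   # True: R is orthogonal, R^t R = I_n
print(np.allclose(R.T @ A @ R, D))       # True: R^t A R = D, diagonal of eigenvalues
```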
Definition An orthogonal matrix is an n × n real matrix R (for some n) such that R^t R = I_n [equivalently, R is invertible and R^{−1} = R^t, or the columns (or rows) of R form an o.n. basis of R^n].

Returning to the above equation AR = RD, premultiplication by R^t gives R^t A R = D. It is easy to show (by considering the characteristic polynomial of A, which equals the characteristic polynomial of R^{−1}AR = R^t A R = D) that the diagonal elements of D are precisely the eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial.

(4.5) Theorem (The spectral theorem for real symmetric matrices) Let A be a real n × n symmetric matrix. Then there is an n × n orthogonal matrix R such that

R^t A R = R^{−1} A R = diag(λ_1, λ_2, . . . , λ_n),

where λ_1, λ_2, . . . , λ_n are the (necessarily real) eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial of A.

Example

Let

A =
[ 2  1  1 ]
[ 1  2  1 ]
[ 1  1  2 ].

Find an orthogonal matrix R such that R^t A R is diagonal. There are three steps.

1. Find the eigenvalues of A. We have (after some computation) det(A − λI_3) = (1 − λ)²(4 − λ). Thus the eigenvalues, which are the roots of det(A − λI_3) = 0, are 1 (with multiplicity 2) and 4. (As a check, their sum counting multiplicities is 1 + 1 + 4 = 6 and this equals the trace of A, the sum of the diagonal elements of A. This will always be the case.)

2. Find o.n. bases of the corresponding eigenspaces.

Eigenspace corresponding to λ = 1: We need to solve (A − 1·I_3)x = 0, i.e. x_1 + x_2 + x_3 = 0, leading to x = (x_1, x_2, −x_1 − x_2) = x_1(1, 0, −1) + x_2(0, 1, −1). Thus {v_1, v_2} is a basis of this eigenspace, where v_1 = (1, 0, −1) and v_2 = (0, 1, −1). An application of the Gram-Schmidt process to {v_1, v_2} gives the orthogonal basis {(1, 0, −1), (−1/2, 1, −1/2)} or (clearing the fractions) {(1, 0, −1), (−1, 2, −1)}. Normalizing, we get the o.n. basis {e_1, e_2}, where

e_1 = (1/√2)(1, 0, −1);   e_2 = (1/√6)(−1, 2, −1).

Eigenspace corresponding to λ = 4: For this we need to solve (A − 4I_3)x = 0, leading to the three equations

−2x_1 + x_2 + x_3 = 0
x_1 − 2x_2 + x_3 = 0
x_1 + x_2 − 2x_3 = 0.

The third of these equations is just minus the sum of the first two and so is redundant. Eliminating x_3 from the first two gives x_1 = x_2, and then x_3 = x_1 from the first, leading to the general solution x = x_1(1, 1, 1). Thus an o.n. basis for this eigenspace consists of the single vector

e_3 = (1/√3)(1, 1, 1).

3. Obtain the required orthogonal matrix. Form the matrix R = (e_1 e_2 e_3), with the e_j's written as column vectors; i.e. let

R =
[  1/√2   −1/√6   1/√3 ]
[   0      2/√6   1/√3 ]
[ −1/√2   −1/√6   1/√3 ].

This matrix is orthogonal and

R^t A R =
[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  4 ].
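The example can be verified numerically; the following short sketch (Python with NumPy, not part of the original worked solution) builds A and R exactly as above and confirms that R is orthogonal, that R^t A R = diag(1, 1, 4), and that the trace of A equals the sum of the eigenvalues.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# Columns are the orthonormal eigenvectors e1, e2, e3 found above
R = np.column_stack([
    np.array([1.0, 0.0, -1.0]) / np.sqrt(2),
    np.array([-1.0, 2.0, -1.0]) / np.sqrt(6),
    np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
])

print(np.allclose(R.T @ R, np.eye(3)))                     # R is orthogonal
print(np.allclose(R.T @ A @ R, np.diag([1.0, 1.0, 4.0])))  # R^t A R = diag(1, 1, 4)
print(np.isclose(np.trace(A), 1 + 1 + 4))                  # trace equals the eigenvalue sum
```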
Lecture 20

4. Adjoints and self-adjointness contd.

Hermitian matrices

Suppose that A = (a_ij) ∈ M_n(C) is hermitian, i.e. A* = A, or \overline{a_ji} = a_ij for all i, j. In this case the associated mapping T_A : C^n → C^n given by T_A : z → Az, where, as in the discussion of real symmetric matrices, we are thinking of z as a column vector, is self-adjoint. We have already seen this, but a direct proof is as follows.

Write a typical vector of C^n as a column vector z = (z_1, . . . , z_n)^t, thought of as an n × 1 matrix. Then z* = (\overline{z_1} . . . \overline{z_n}) and hence

⟨w, z⟩ = w_1\overline{z_1} + · · · + w_n\overline{z_n} = z*w.

We then have

⟨Aw, z⟩ = z*(Aw) = (z*A)w = (z*A*)w = (Az)*w = ⟨w, Az⟩,

showing that T_A is self-adjoint.

Again, we can apply the spectral theorem to T_A and conclude that there is an o.n. basis {u_1, . . . , u_n} of C^n consisting of eigenvectors of T_A, say T_A u_j = λ_j u_j for j = 1, . . . , n, where each λ_j ∈ R since the eigenvalues of self-adjoint mappings are always real. Guided by the discussion of real symmetric matrices, consider the n × n matrix U = (u_1 . . . u_n) (i.e. u_j is the jth column of U). Then Au_j is the jth column of AU and so

AU = (Au_1 . . . Au_n) = (λ_1 u_1 . . . λ_n u_n) = (u_1 . . . u_n) D = UD,

where D = diag(λ_1, λ_2, . . . , λ_n) is the diagonal matrix with λ_1, . . . , λ_n down the diagonal. Here the ijth entry of U*U is u_i* u_j = ⟨u_j, u_i⟩ = δ_{ij}, so U*U = I_n, the n × n identity matrix, and U^{−1} = U*.

Definition A unitary matrix is an n × n complex matrix U (for some n) such that U*U = I_n [equivalently, U is invertible and U^{−1} = U*, or the columns (or rows) of U form an o.n. basis of C^n].

The spectral theorem then takes on the following form.

(4.6) Theorem (The spectral theorem for hermitian matrices) Let A be an n × n (complex) hermitian matrix. Then there is an n × n unitary matrix U such that

U* A U = U^{−1} A U = diag(λ_1, λ_2, . . . , λ_n),

where λ_1, λ_2, . . . , λ_n are the (necessarily real) eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial of A.

Comments about orthogonal and unitary matrices

1. Let R be an n × n orthogonal matrix, so that R^t R = I_n. Then

⟨Rx, Ry⟩ = (Rx)^t (Ry) = x^t R^t R y = x^t y = ⟨x, y⟩ for all x, y ∈ R^n.

Thus the mapping x → Rx from R^n to R^n preserves the inner product. In particular, ‖Rx‖ = ‖x‖ for all x ∈ R^n since ‖Rx‖² = ⟨Rx, Rx⟩ = ⟨x, x⟩ = ‖x‖².

Conversely, suppose that A is an n × n real matrix and ⟨Ax, Ay⟩ = ⟨x, y⟩ for all x, y ∈ R^n. Then ⟨A^t Ax, y⟩ = ⟨Ax, Ay⟩ = ⟨x, y⟩ for all x, y, from which it follows that A^t A = I_n, i.e. A is orthogonal. In fact, it suffices to assume that ‖Ax‖ = ‖x‖ for all x ∈ R^n to conclude that A is orthogonal. To prove this, note the expression for an inner product in terms of the associated norm given in Tutorial Qn. 7.7.

2. In much the same way, if U is a unitary n × n matrix, then ⟨Uw, Uz⟩ = ⟨w, z⟩ and, in particular, ‖Uw‖ = ‖w‖ for all w, z ∈ C^n. Conversely, if A is an n × n complex matrix such that ⟨Aw, Az⟩ = ⟨w, z⟩ for all w, z ∈ C^n (or just ‖Aw‖ = ‖w‖ for all w ∈ C^n), then A is unitary.
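As a final numerical illustration (a sketch in Python with NumPy, not part of the lecture; the hermitian matrix is generated at random and numpy.linalg.eigh is used to produce the unitary U), one can check the hermitian spectral theorem and the inner-product-preserving property of unitary matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random hermitian matrix: B + B* is always hermitian
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B + B.conj().T

# For hermitian input, numpy.linalg.eigh returns real eigenvalues and an
# orthonormal set of eigenvectors as the columns of U
eigvals, U = np.linalg.eigh(A)

print(np.allclose(U.conj().T @ U, np.eye(n)))             # U is unitary: U* U = I_n
print(np.allclose(U.conj().T @ A @ U, np.diag(eigvals)))  # U* A U = diag of (real) eigenvalues

# A unitary matrix preserves the inner product <w, z> = sum_i w_i * conj(z_i)
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.isclose(np.sum((U @ w) * np.conj(U @ z)), np.sum(w * np.conj(z))))  # <Uw, Uz> = <w, z>
```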