MATH10212 Linear Algebra: Lecture Notes

Textbook
Students are strongly advised to acquire a copy of the Textbook:
D. C. Lay. Linear Algebra and its
Applications. Pearson, 2006. ISBN
0-521-28713-4.
Other editions can be used as well; the
book is easily available on Amazon (this
includes some very cheap used copies).
These lecture notes should be treated only as an indication of the content of the course; the textbook contains detailed explanations, worked-out examples, etc.
Normally, homework assignments will
consist of some odd numbered exercises
from the sections of the Textbook covered in the lectures on the previous week.
The Textbook contains answers to most
odd numbered exercises.
About Homework
The Student Handbook 2.10 (f) says:

“As a rough guide you should be spending approximately twice the number of instruction hours in private study, mainly working through the examples sheets and reading your lecture notes and the recommended text books.”
In respect of this course, MATH10212
Linear Algebra B, this means that students are expected to spend 8 (eight!)
hours a week in private study of Linear
Algebra.
What is Linear Algebra?
It is well-known that the total cost of a purchase of amounts g1, g2, g3 of some goods at prices p1, p2, p3 is the expression
\[
p_1 g_1 + p_2 g_2 + p_3 g_3 = \sum_{i=1}^{3} p_i g_i.
\]
Expressions of this kind,
\[
a_1 x_1 + \cdots + a_n x_n,
\]
are called linear forms in variables x1, ..., xn with coefficients a1, ..., an.

Linear Algebra studies the mathematics of linear forms.
Lecture 1 Systems of linear equations [Lay 1.1]
A linear equation in the variables
x1 , . . . , xn is an equation that can be
written in the form
a1x1 + a2x2 + · · · + anxn = b
where b and the coefficients a1 , . . . , an
are real numbers. The subscript n can
be any natural number.
A system of simultaneous linear
equations is a collection of one or more
linear equations involving the same variables, say x1 , . . . , xn . For example,
x1 + x2 = 3
x1 − x2 = 1
We shall abbreviate the words “a system of simultaneous linear equations” just to “a linear system”.

A solution of the system is a list (s1, ..., sn) of numbers that makes each equation a true identity when the values s1, ..., sn are substituted for x1, ..., xn, respectively. For example, in the system above (2, 1) is a solution.

The set of all possible solutions is called the solution set of the linear system. Two linear systems are equivalent if they have the same solution set.

We shall prove later in the course that a system of linear equations has either

• no solution, or
• exactly one solution, or
• infinitely many solutions,

under the assumption that the coefficients and solutions of the systems are real numbers.

A system of linear equations is said to be consistent if it has solutions (either one or infinitely many), and a system is inconsistent if it has no solution.

Solving a linear system

The basic strategy is to replace one system with an equivalent system (that is, with the same solution set) which is easier to solve.

Existence and uniqueness questions

• Is the system consistent?
• If a solution exists, is it unique?

Equivalence of linear systems

• When are two linear systems equivalent?
Lecture 2 Row reduction and echelon forms [Lay 1.2]
Matrix notation
It is convenient to write coefficients of a
linear system in the form of a matrix, a
rectangular table. For example, the system
\[
\begin{aligned}
x_1 - 2x_2 + 3x_3 &= 1 \\
x_1 + x_2 \phantom{{}+3x_3} &= 2 \\
x_2 + x_3 &= 3
\end{aligned}
\]
has the matrix of coefficients
\[
\begin{pmatrix} 1 & -2 & 3 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\]
and the augmented matrix
\[
\begin{pmatrix} 1 & -2 & 3 & 1 \\ 1 & 1 & 0 & 2 \\ 0 & 1 & 1 & 3 \end{pmatrix};
\]
notice how the coefficients are aligned in columns, and how missing coefficients are replaced by 0.
The augmented matrix in the example
above has 3 rows and 4 columns; we say
that it is a 3×4 matrix. Generally, a matrix with m rows and n columns is called
an m × n matrix.
Elementary row operations
Replacement Replace one row by the
sum of itself and a multiple of another row.
Interchange Interchange two rows.
Scaling Multiply all entries in a row by
a nonzero constant.
Two matrices are row equivalent if there is a sequence of elementary row operations that transforms one matrix into the other.
Note:
• The row operations are reversible.
• Row equivalence of matrices is an
equivalence relation on the set of
matrices.
Theorem: Row Equivalence.
If the augmented matrices of two linear
systems are row equivalent, then the two
systems have the same solution set.
A nonzero row or column of a matrix is
a row or column which contains at least
one nonzero entry.
We can now formulate a theorem (to be
proven later).
Theorem: Equivalence of linear systems.

Two linear systems are equivalent if and only if the augmented matrix of one of them can be obtained from the augmented matrix of the other system by row operations and insertion / deletion of zero rows.
Lecture 3 Solution of Linear Systems [Lay 1.2]
A nonzero row or column of a matrix is
a row or column which contains at least
one nonzero entry. A leading entry of
a row is the leftmost nonzero entry (in a
non-zero row).
Definition. A matrix is in echelon form (or row echelon form) if it has the following three properties:

1. All nonzero rows are above any row of zeroes.
2. Each leading entry of a row is in a column to the right of the leading entry of the row above it.
3. All entries in a column below a leading entry are zeroes.

If, in addition, the following two conditions are satisfied,

4. All leading entries are equal to 1.
5. Each leading 1 is the only non-zero entry in its column,

then the matrix is in reduced echelon form.

An echelon matrix is a matrix in echelon form.

Any non-zero matrix can be row reduced (that is, transformed by elementary row operations) into a matrix in echelon form (but the same matrix can give rise to different echelon forms).

Examples. The following is a schematic presentation of an echelon matrix, with ■ marking the leading entries and ∗ arbitrary entries:
\[
\begin{pmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & 0 & \blacksquare \end{pmatrix}
\]
and this is a reduced echelon matrix:
\[
\begin{pmatrix} 1 & 0 & * & 0 & * \\ 0 & 1 & * & 0 & * \\ 0 & 0 & 0 & 1 & * \end{pmatrix}
\]

Theorem 1.2.1: Uniqueness of the reduced echelon form. Each matrix is row equivalent to one and only one matrix in reduced echelon form.

Definition. A pivot position in a matrix A is a location in A that corresponds to a leading 1 in the reduced echelon form of A. A pivot column is a column of A that contains a pivot position.

A pivot is a nonzero number in a pivot position which is used to create zeroes in the column below it.

A rule for row reduction:

1. Pick the leftmost non-zero column and, in it, the topmost nonzero entry; it is a pivot.
2. Using scaling, make the pivot equal to 1.
3. Using replacement row operations, kill all non-zero entries in the column below the pivot.
4. Mark the row and column containing the pivot as pivoted.
5. Repeat the same with the matrix made of the rows and columns not yet pivoted.
6. When this is over, interchange the rows, making sure that the resulting matrix is in echelon form.
7. Using replacement row operations, kill all non-zero entries in the columns above the pivot entries.

Example for solving in the lecture (The Row Reduction Algorithm):
\[
\begin{pmatrix} 0 & 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 3 & 3 \\ 1 & 1 & 1 & 2 & 2 \end{pmatrix}
\]
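The rule above is easy to express in code. The following Python sketch (my illustration, not part of the course text) row reduces a matrix to reduced echelon form over exact fractions; the function name rref is an assumption.

```python
from fractions import Fraction

def rref(M):
    """Row reduce a matrix (list of rows) to reduced echelon form,
    following the rule for row reduction described above."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Step 1: find a nonzero entry in this column at or below pivot_row.
        rows = [r for r in range(pivot_row, m) if A[r][col] != 0]
        if not rows:
            continue
        A[pivot_row], A[rows[0]] = A[rows[0]], A[pivot_row]  # interchange
        # Step 2: scale the pivot row so the pivot equals 1.
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]
        # Steps 3 and 7: kill all other entries in the pivot column.
        for i in range(m):
            if i != pivot_row and A[i][col] != 0:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
    return A

# The example matrix from the lecture:
M = [[0, 2, 2, 2, 2], [1, 1, 1, 1, 1], [1, 1, 1, 3, 3], [1, 1, 1, 2, 2]]
for row in rref(M):
    print([str(x) for x in row])
```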
Solution of Linear Systems
Once we have converted the augmented matrix of a linear system into its reduced row echelon form, we can write out the entire solution set of the system.
Example. Let
\[
\begin{pmatrix} 1 & 0 & -5 & 1 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
be the augmented matrix of a linear system; then the system is equivalent to
\[
\begin{aligned}
x_1 \phantom{{}+x_2} - 5x_3 &= 1 \\
x_2 + x_3 &= 4 \\
0 &= 0
\end{aligned}
\]
The variables x1 and x2 correspond to pivot columns in the matrix and are called basic variables (also leading or pivot variables). The other variable, x3, is a free variable.

Free variables can be assigned arbitrary values, and then the leading variables can be expressed in terms of the free variables:
\[
\begin{aligned}
x_1 &= 1 + 5x_3 \\
x_2 &= 4 - x_3 \\
x_3 &\ \text{is free}
\end{aligned}
\]
Theorem 1.2.2: Existence and Uniqueness.

A linear system is consistent if and only if the rightmost column of the augmented matrix is not a pivot column; that is, if and only if an echelon form of the augmented matrix has no row of the form
\[
\begin{pmatrix} 0 & \cdots & 0 & b \end{pmatrix} \quad \text{with } b \text{ nonzero.}
\]
If a linear system is consistent, then the
solution set contains either
(i) a unique solution, when there are
no free variables, or
(ii) infinitely many solutions, when
there is at least one free variable.
Using row reduction to solve a linear system
1. Write the augmented matrix of the
system.
2. Use the row reduction algorithm
to obtain an equivalent augmented
matrix in echelon form. Decide
whether the system is consistent.
3. If the system is consistent, get the reduced echelon form.
4. Write the system of equations corresponding to the matrix obtained
in Step 3.
5. Express each basic variable in
terms of any free variables appearing in the equation.
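As an illustration (again mine, not from the notes), the rref sketch from the previous lecture carries out steps 1–3 at once on the augmented matrix of the example system from Lecture 2:

```python
# Reusing the rref function from the Lecture 2 sketch:
# solve  x1 - 2x2 + 3x3 = 1,  x1 + x2 = 2,  x2 + x3 = 3.
aug = [[1, -2, 3, 1], [1, 1, 0, 2], [0, 1, 1, 3]]
for row in rref(aug):
    print([str(x) for x in row])
# Every column of the coefficient part turns out to be a pivot column,
# so there are no free variables and the solution is unique.
```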
Lectures 5-6 Vector equations [Lay 1.3]
A matrix with only one column is called a column vector, or simply a vector. Rn is the set of all column vectors with n entries.

A row vector is a matrix with one row.

Two vectors are equal if and only if they have

• the same shape,
• the same number of rows,
• and their corresponding entries are equal.

The set of all vectors with n entries is denoted Rn.

The sum u + v of two vectors u and v in Rn is obtained by adding corresponding entries in u and v. For example, in R2
\[
\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} -1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]

The scalar multiple cv of a vector v and a real number (“scalar”) c is the vector obtained by multiplying each entry in v by c. For example, in R3,
\[
1.5 \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix} = \begin{pmatrix} 1.5 \\ 0 \\ -3 \end{pmatrix}.
\]

The vector whose entries are all zeroes is called the zero vector and denoted 0:
\[
0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]

Operations with row vectors are defined in a similar way.

Algebraic properties of Rn

For all u, v, w ∈ Rn and all scalars c and d:

1. u + v = v + u
2. (u + v) + w = u + (v + w)
3. u + 0 = 0 + u = u
4. u + (−u) = −u + u = 0
5. c(u + v) = cu + cv
6. (c + d)u = cu + du
7. c(du) = (cd)u
8. 1u = u

(Here −u denotes (−1)u.)
Linear combinations

Given vectors v1, v2, ..., vp in Rn and scalars c1, c2, ..., cp, the vector
\[
y = c_1 v_1 + \cdots + c_p v_p
\]
is called a linear combination of v1, v2, ..., vp with weights c1, c2, ..., cp.
Rewriting a linear system as
a vector equation
Consider an example: the linear system
x2 + x3 = 2
x 1 + x2 + x3 = 3
x1 + x2 − x3 = 2
can be written as an equality of two vectors:
\[
\begin{pmatrix} x_2 + x_3 \\ x_1 + x_2 + x_3 \\ x_1 + x_2 - x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix},
\]
which is the same as
\[
x_1 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + x_2 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + x_3 \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix}.
\]
Let us write the matrix
\[
\begin{pmatrix} 0 & 1 & 1 & 2 \\ 1 & 1 & 1 & 3 \\ 1 & 1 & -1 & 2 \end{pmatrix}
\]
in a way that calls attention to its columns:
\[
\begin{pmatrix} a_1 & a_2 & a_3 & b \end{pmatrix}.
\]
Denote
\[
a_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}
\]
and
\[
b = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix};
\]
then the vector equation can be written as
\[
x_1 a_1 + x_2 a_2 + x_3 a_3 = b.
\]
Notice that to solve this equation is the same as to express b as a linear combination of a1, a2, a3, and find all such expressions.

Therefore solving a linear system is the same as finding an expression of the vector of the right part of the system as a linear combination of the columns of its matrix of coefficients.

A vector equation
\[
x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b
\]
has the same solution set as the linear system whose augmented matrix is
\[
\begin{pmatrix} a_1 & a_2 & \cdots & a_n & b \end{pmatrix}.
\]
In particular, b can be generated by a linear combination of a1, a2, ..., an if and only if there is a solution of the corresponding linear system.

Definition. If v1, ..., vp are in Rn, then the set of all linear combinations of v1, ..., vp is denoted by Span{v1, ..., vp} and is called the subset of Rn spanned (or generated) by v1, ..., vp.

That is, Span{v1, ..., vp} is the collection of all vectors which can be written in the form
\[
c_1 v_1 + c_2 v_2 + \cdots + c_p v_p
\]
with c1, ..., cp scalars.

We say that vectors v1, ..., vp span Rn if
\[
\operatorname{Span}\{v_1, \ldots, v_p\} = \mathbb{R}^n.
\]
The matrix equation Ax = b [Lay 1.4]
Definition. If A is an m × n matrix, with columns a1, ..., an, and if x is in Rn, then the product of A and x, denoted Ax, is the linear combination of the columns of A using the corresponding entries in x as weights:
\[
Ax = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n.
\]

Example. The system
\[
\begin{aligned}
x_2 + x_3 &= 2 \\
x_1 + x_2 + x_3 &= 3 \\
x_1 + x_2 - x_3 &= 2
\end{aligned}
\]
was written as
\[
x_1 a_1 + x_2 a_2 + x_3 a_3 = b,
\]
where
\[
a_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix}.
\]
In the matrix product notation it becomes
\[
\begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix},
\]
or
\[
Ax = b,
\]
where
\[
A = \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
\]

Theorem 1.4.3: If A is an m × n matrix, with columns a1, ..., an, and if x is in Rn, the matrix equation
\[
Ax = b
\]
has the same solution set as the vector equation
\[
x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b,
\]
which has the same solution set as the system of linear equations whose augmented matrix is
\[
\begin{pmatrix} a_1 & a_2 & \cdots & a_n & b \end{pmatrix}.
\]

Existence of solutions. The equation Ax = b has a solution if and only if b is a linear combination of the columns of A.

Theorem 1.4.4: Let A be an m × n matrix. Then the following statements are equivalent.

(a) For each b ∈ Rm, the equation Ax = b has a solution.
(b) Each b ∈ Rm is a linear combination of the columns of A.
(c) The columns of A span Rm.
(d) A has a pivot position in every row.

Row-vector rule for computing Ax. If the product Ax is defined, then the ith entry of Ax is the sum of the products of corresponding entries from row i of A and from the vector x.

Theorem 1.4.5: Properties of the matrix-vector product Ax. If A is an m × n matrix, u, v ∈ Rn, and c is a scalar, then

(a) A(u + v) = Au + Av;
(b) A(cu) = c(Au).
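A small numerical check (my sketch using numpy, which the notes do not mention) that the column definition of Ax, the row-vector rule, and the built-in product all agree:

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 1, 1], [1, 1, -1]])
x = np.array([1, 2, 3])

by_columns = sum(x[j] * A[:, j] for j in range(A.shape[1]))   # x1*a1 + ... + xn*an
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])  # row-vector rule

assert (by_columns == by_rows).all() and (by_columns == A @ x).all()
print(A @ x)   # [5 6 0]
```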
Solution sets of linear equations [Lay 1.5]
Homogeneous linear systems
A linear system is homogeneous if it
can be written as
Ax = 0.
A homogeneous system always has at
least one solution x = 0 (trivial solution).
Therefore, for homogeneous systems, an important question is the existence of a nontrivial solution, that is, of a nonzero vector x which satisfies Ax = 0:

The homogeneous system Ax = 0 has a nontrivial solution if and only if the equation has at least one free variable.

Example.
\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 0 \\
x_1 + 3x_2 + x_3 &= 0
\end{aligned}
\]
Nonhomogeneous systems
When a nonhomogeneous system has many solutions, the general solution can be written in parametric vector form as one vector plus an arbitrary linear combination of vectors that satisfy the corresponding homogeneous system.

Example.
\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 0 \\
x_1 + 3x_2 + x_3 &= 5
\end{aligned}
\]
Theorem 1.5.6: Suppose the equation
Ax = b
is consistent for some given b, and let p be a solution. Then the solution set of
Ax = b is the set of all vectors of the
form
w = p + vh ,
where vh is any solution of the homogeneous equation
Ax = 0.
Lecture 7: Linear independence [Lay 1.7]
Definition. An indexed set of vectors
\[
\{ v_1, \ldots, v_p \}
\]
in Rn is linearly independent if the vector equation
\[
x_1 v_1 + \cdots + x_p v_p = 0
\]
has only the trivial solution.

The set
\[
\{ v_1, \ldots, v_p \}
\]
in Rn is linearly dependent if there exist weights c1, ..., cp, not all zero, such that
\[
c_1 v_1 + \cdots + c_p v_p = 0.
\]

Linear independence of matrix columns. The matrix equation
\[
Ax = 0,
\]
where A is made of columns
\[
A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix},
\]
can be written as
\[
x_1 a_1 + \cdots + x_n a_n = 0.
\]
Therefore the columns of a matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.

A set of one vector {v1} is linearly dependent if and only if v1 = 0.

A set of two vectors {v1, v2} is linearly dependent if at least one of the vectors is a multiple of the other.

Theorem 1.7.7: Characterisation of linearly dependent sets. An indexed set
\[
S = \{ v_1, \ldots, v_p \}
\]
of two or more vectors is linearly dependent if and only if at least one of the vectors in S is a linear combination of the others.

Theorem 1.7.8: Linear dependence of “big” sets. If a set contains more vectors than entries in each vector, then the set is linearly dependent. Thus, any set
\[
\{ v_1, \ldots, v_p \}
\]
in Rn is linearly dependent if p > n.
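In numerical work the trivial-solution test is usually phrased via the rank; a short numpy sketch (mine, not part of the notes):

```python
import numpy as np

# Columns of A are linearly independent iff Ax = 0 has only the trivial
# solution, i.e. iff the rank of A equals the number of columns.
A = np.array([[1, 2], [1, 3], [0, 1]])           # 3 x 2, two columns
print(np.linalg.matrix_rank(A) == A.shape[1])    # True: independent

# Theorem 1.7.8 in action: any 3 vectors in R^2 are linearly dependent.
B = np.array([[1, 0, 2], [0, 1, 3]])             # 2 x 3, p = 3 > n = 2
print(np.linalg.matrix_rank(B) == B.shape[1])    # False: dependent
```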
Lecture 6: Introduction to linear transformations [1.8]
Transformation. A transformation T from Rn to Rm is a rule that assigns to each vector x in Rn a vector T(x) in Rm.

The set Rn is called the domain of T, the set Rm is the codomain of T.

Matrix transformations. With every m × n matrix A we can associate the transformation
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m, \qquad x \mapsto Ax.
\]
In short:
\[
T(x) = Ax.
\]

The range of a matrix transformation. The range of T is the set of all linear combinations of the columns of A. Indeed, this can be immediately seen from the fact that each image T(x) has the form
\[
T(x) = Ax = x_1 a_1 + \cdots + x_n a_n.
\]

Definition: Linear transformations. A transformation
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m
\]
is linear if:

• T(u + v) = T(u) + T(v) for all vectors u, v ∈ Rn;
• T(cu) = cT(u) for all vectors u and all scalars c.

Properties of linear transformations. If T is a linear transformation then
\[
T(0) = 0
\]
and
\[
T(cu + dv) = cT(u) + dT(v).
\]

The identity matrix. An n × n matrix with 1’s on the diagonal and 0’s elsewhere is called the identity matrix In. For example,
\[
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
I_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

The columns of the identity matrix In will be denoted
\[
e_1, e_2, \ldots, e_n.
\]
For example, in R3
\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

Observe that if x is an arbitrary vector in Rn,
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]
then
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ x_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
= x_1 e_1 + x_2 e_2 + \cdots + x_n e_n.
\]

The identity transformation. It is easy to check that
\[
I_n x = x \quad \text{for all } x \in \mathbb{R}^n.
\]
Therefore the linear transformation associated with the identity matrix is the identity transformation of Rn:
\[
\mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad x \mapsto x.
\]
The matrix of a linear transformation [Lay 1.9]
Theorem 1.9.10: The matrix of a linear transformation. Let
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m
\]
be a linear transformation. Then there exists a unique matrix A such that
\[
T(x) = Ax \quad \text{for all } x \in \mathbb{R}^n.
\]
In fact, A is the m × n matrix whose jth column is the vector T(ej), where ej is the jth column of the identity matrix in Rn:
\[
A = \begin{pmatrix} T(e_1) & \cdots & T(e_n) \end{pmatrix}.
\]

Proof. First we express x in terms of e1, ..., en:
\[
x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n
\]
and compute, using the definition of a linear transformation,
\[
T(x) = T(x_1 e_1 + x_2 e_2 + \cdots + x_n e_n) = T(x_1 e_1) + \cdots + T(x_n e_n) = x_1 T(e_1) + \cdots + x_n T(e_n),
\]
and then switch to matrix notation:
\[
= \begin{pmatrix} T(e_1) & \cdots & T(e_n) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = Ax.
\]

The matrix A is called the standard matrix for the linear transformation T.

Definition. A transformation
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m
\]
is onto Rm if each b ∈ Rm is the image of at least one x ∈ Rn.

A transformation T is one-to-one if each b ∈ Rm is the image of at most one x ∈ Rn.

Theorem: One-to-one transformations: a criterion. A linear transformation
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m
\]
is one-to-one if and only if the equation
\[
T(x) = 0
\]
has only the trivial solution.

Theorem: “One-to-one” and “onto” in terms of matrices. Let
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m
\]
be a linear transformation and let A be the standard matrix for T. Then:

• T maps Rn onto Rm if and only if the columns of A span Rm.
• T is one-to-one if and only if the columns of A are linearly independent.
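A short numpy sketch (mine) of Theorem 1.9.10: the standard matrix is recovered column by column as T(ej). The rotation T below is a made-up example:

```python
import numpy as np

def T(x):
    """Rotation of R^2 by 90 degrees: a concrete linear transformation."""
    return np.array([-x[1], x[0]])

n = 2
E = np.eye(n)
A = np.column_stack([T(E[:, j]) for j in range(n)])   # columns are T(e_j)
print(A)                                              # [[ 0. -1.]
                                                      #  [ 1.  0.]]
x = np.array([3.0, 4.0])
assert np.allclose(A @ x, T(x))                       # A x = T(x)
```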
Lectures 7–9: Matrix operations [Lay 2.1]
Labeling of matrix entries. Let A be an m × n matrix.
\[
A = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
= \begin{pmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}
\]
Notice that in aij the first subscript i denotes the row number and the second subscript j the column number of the entry aij. In particular, the column aj is
\[
a_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{ij} \\ \vdots \\ a_{mj} \end{pmatrix}.
\]
Diagonal matrices, zero matrices. The diagonal entries in A are
\[
a_{11}, a_{22}, a_{33}, \ldots
\]
For example, the diagonal entries of the matrix
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\]
are 1, 5, and 9.

A square matrix is a matrix with equal numbers of rows and columns.

A diagonal matrix is a square matrix whose non-diagonal entries are zeroes. The matrices
\[
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad
\begin{pmatrix} \pi & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 3 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\]
are all diagonal. The identity matrices
\[
\begin{pmatrix} 1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \ \ldots
\]
are diagonal.

Zero matrix. By definition, 0 is an m × n matrix whose entries are all zero. For example, the matrices
\[
\begin{pmatrix} 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
are zero matrices. Notice that zero square matrices, like
\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
are diagonal!

Sums. If
\[
A = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}
\]
are m × n matrices then we define the sum A + B as
\[
A + B = \begin{pmatrix} a_1 + b_1 & a_2 + b_2 & \cdots & a_n + b_n \end{pmatrix}
= \begin{pmatrix}
a_{11} + b_{11} & \cdots & a_{1j} + b_{1j} & \cdots & a_{1n} + b_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{i1} + b_{i1} & \cdots & a_{ij} + b_{ij} & \cdots & a_{in} + b_{in} \\
\vdots & & \vdots & & \vdots \\
a_{m1} + b_{m1} & \cdots & a_{mj} + b_{mj} & \cdots & a_{mn} + b_{mn}
\end{pmatrix}
\]
14
MATH10212 • Linear Algebra B • Lectures 7–9 • Matrix operations
Scalar multiple. If c is a scalar then we define
\[
cA = \begin{pmatrix} ca_1 & ca_2 & \cdots & ca_n \end{pmatrix}
= \begin{pmatrix}
ca_{11} & \cdots & ca_{1j} & \cdots & ca_{1n} \\
\vdots & & \vdots & & \vdots \\
ca_{i1} & \cdots & ca_{ij} & \cdots & ca_{in} \\
\vdots & & \vdots & & \vdots \\
ca_{m1} & \cdots & ca_{mj} & \cdots & ca_{mn}
\end{pmatrix}
\]
Theorem 2.1.1: Properties of matrix addition. Let A, B, and C be matrices of the same size and r and s be scalars.
1. A + B = B + A
2. (A + B) + C = A + (B + C)
3. A + 0 = A
4. r(A + B) = rA + rB
5. (r + s)A = rA + sA
6. r(sA) = (rs)A.
Lectures 8–9: Matrix multiplication [Lay 2.1]
Composition of linear transformations. Let B be an m × n matrix and A a p × m matrix. They define linear transformations
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^m, \qquad x \mapsto Bx
\]
and
\[
S : \mathbb{R}^m \longrightarrow \mathbb{R}^p, \qquad y \mapsto Ay.
\]
Their composition
\[
(S \circ T)(x) = S(T(x))
\]
is a linear transformation
\[
S \circ T : \mathbb{R}^n \longrightarrow \mathbb{R}^p.
\]
From the previous lecture, we know that linear transformations are given by matrices. What is the matrix of S ◦ T?

Multiplication of matrices. To answer the above question, we need to compute A(Bx) in matrix form. Write x as
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]
and observe
\[
Bx = x_1 b_1 + \cdots + x_n b_n.
\]
Hence
\[
A(Bx) = A(x_1 b_1 + \cdots + x_n b_n) = A(x_1 b_1) + \cdots + A(x_n b_n) = x_1 A(b_1) + \cdots + x_n A(b_n)
= \begin{pmatrix} Ab_1 & Ab_2 & \cdots & Ab_n \end{pmatrix} x.
\]
Therefore multiplication by the matrix
\[
C = \begin{pmatrix} Ab_1 & Ab_2 & \cdots & Ab_n \end{pmatrix}
\]
transforms x into A(Bx). Hence C is the matrix of the linear transformation S ◦ T, and it will be natural to call C the product of A and B and denote
\[
C = A \cdot B
\]
(but the multiplication symbol “·” is frequently skipped).

Definition: Matrix multiplication. If A is a p × m matrix and B is an m × n matrix with columns
\[
b_1, \ldots, b_n,
\]
then the product AB is the p × n matrix whose columns are Ab1, ..., Abn:
\[
AB = A \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} Ab_1 & \cdots & Ab_n \end{pmatrix}.
\]

Columns of AB. Each column Abj of AB is a linear combination of the columns of A with weights taken from the jth column of B:
\[
Ab_j = \begin{pmatrix} a_1 & \cdots & a_m \end{pmatrix} \begin{pmatrix} b_{1j} \\ \vdots \\ b_{mj} \end{pmatrix} = b_{1j} a_1 + \cdots + b_{mj} a_m.
\]

Mnemonic rules
\[
[m \times n \text{ matrix}] \cdot [n \times p \text{ matrix}] = [m \times p \text{ matrix}]
\]
\[
\operatorname{column}_j(AB) = A \cdot \operatorname{column}_j(B)
\]
\[
\operatorname{row}_i(AB) = \operatorname{row}_i(A) \cdot B
\]

Theorem 2.1.2: Properties of matrix multiplication. Let A be an m × n matrix and let B and C be matrices for which the indicated sums and products are defined. Then the following identities hold:

1. A(BC) = (AB)C
2. A(B + C) = AB + AC
3. (B + C)A = BA + CA
4. r(AB) = (rA)B = A(rB) for any scalar r
5. Im A = A = A In
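A quick numpy check (my illustration, not from the notes) of the mnemonic rules: the columns of AB are the Abj, and the rows are rowi(A) · B:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])      # 3 x 2
B = np.array([[1, 0, 2], [0, 1, 3]])        # 2 x 3

AB = A @ B                                   # 3 x 3
col_rule = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
row_rule = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

assert (AB == col_rule).all() and (AB == row_rule).all()
print(AB.shape)   # (3, 3): [3 x 2] . [2 x 3] = [3 x 3]
```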
Lecture 10: The inverse of a matrix [Lay 2.2]
Powers of a matrix. As is usual in algebra, we define, for a square matrix A,
\[
A^k = A \cdots A \quad (k \text{ times}).
\]
If A ≠ 0 then we set
\[
A^0 = I.
\]
The transpose of a matrix. The transpose A^T of an m × n matrix A is the n × m matrix whose rows are formed from the corresponding columns of A:
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}
\]
Theorem 2.1.3: Properties of the transpose. Let A and B denote matrices whose sizes are appropriate for the following sums and products. Then we have:

1. (A^T)^T = A
2. (A + B)^T = A^T + B^T
3. (rA)^T = r(A^T) for any scalar r
4. (AB)^T = B^T A^T
Invertible matrices
An n × n matrix A is invertible if there
is an n × n matrix C such that
CA = I and AC = I
C is called the inverse of A.
The inverse of A, if it exists, is unique (!) and is denoted A^{-1}:
\[
A^{-1} A = I \quad \text{and} \quad A A^{-1} = I.
\]
Singular matrices. A non-invertible
matrix is called a singular matrix.
An invertible matrix is nonsingular.
Theorem 2.2.4: Inverse of a 2 × 2 matrix. Let
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
If ad − bc ≠ 0 then A is invertible and
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
The quantity ad − bc is called the determinant of A:
\[
\det A = ad - bc.
\]
Theorem 2.2.5: Solving matrix equations. If A is an invertible n × n matrix, then for each b ∈ Rn, the equation Ax = b has the unique solution
\[
x = A^{-1} b.
\]
Theorem 2.2.6: Properties of invertible matrices.
(a) If A is an invertible matrix, then
A−1 is also invertible and
(A−1 )−1 = A
(b) If A and B are n×n invertible matrices, then so is AB, and
(AB)−1 = B −1 A−1
(c) If A is an invertible matrix, then
so is AT , and
(AT )−1 = (A−1 )T
Definition: Elementary matrices.
An elementary matrix is a matrix obtained by performing a single elementary
row operation on an identity matrix.
If an elementary row operation is performed on an m × n matrix A, the resulting matrix can be written as EA, where the m × m matrix E is made by performing the same row operation on Im.
Each elementary matrix E is invertible.
The inverse of E is the elementary matrix of the same type that transforms E
back into I.
Theorem 2.2.7: Invertible matrices. An n × n matrix A is invertible if and only if A is row equivalent to In, and in this case, any sequence of elementary row operations that reduces A to In also transforms In into A^{-1}.
Computation of inverses.

• Form the augmented matrix (A I) and row reduce it.
• If A is row equivalent to I, then (A I) is row equivalent to (I A^{-1}).
• Otherwise A has no inverse.
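A sketch (mine) of this recipe, reusing the rref function from the Lecture 2 sketch; the name inverse is an assumption:

```python
from fractions import Fraction

def inverse(A):
    """Row reduce (A | I); if the left block becomes I, the right block
    is A^{-1}, otherwise A is singular."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    R = rref(aug)
    if any(R[i][i] != 1 for i in range(n)):   # left block is not I
        raise ValueError("matrix is singular")
    return [row[n:] for row in R]

A = [[2, 0, 7], [0, 1, 0], [1, 0, 4]]
for row in inverse(A):
    print([str(x) for x in row])   # (4 0 -7; 0 1 0; -1 0 2)
```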
Lectures 11–12: Characterizations of invertible matrices. Partitioned matrices [Lay 2.4]
The Invertible Matrix Theorem 2.3.8: For an n × n matrix A, the following are equivalent:

(a) A is invertible.
(b) A is row equivalent to In.
(c) A has n pivot positions.
(d) Ax = 0 has only the trivial solution.
(e) The columns of A are linearly independent.
(f) The linear transformation x ↦ Ax is one-to-one.
(g) Ax = b has at least one solution for each b ∈ Rn.
(h) The columns of A span Rn.
(i) x ↦ Ax maps Rn onto Rn.
(j) There is an n × n matrix C such that CA = I.
(k) There is an n × n matrix D such that AD = I.
(l) A^T is invertible.

One-sided inverse is the inverse. Let A and B be square matrices. If AB = I then both A and B are invertible and
\[
B = A^{-1} \quad \text{and} \quad A = B^{-1}.
\]

Theorem 2.3.9: Invertible linear transformations. Let
\[
T : \mathbb{R}^n \longrightarrow \mathbb{R}^n
\]
be a linear transformation and A its standard matrix. Then T is invertible if and only if A is an invertible matrix. In that case, the linear transformation
\[
S(x) = A^{-1} x
\]
is the only transformation satisfying
\[
S(T(x)) = x \quad \text{for all } x \in \mathbb{R}^n,
\]
\[
T(S(x)) = x \quad \text{for all } x \in \mathbb{R}^n.
\]
Partitioned matrices [Lay 2.4]
Example: Partitioned matrix
\[
A = \begin{pmatrix} 1 & 2 & a \\ 3 & 4 & b \\ p & q & z \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\]
A is a 3 × 3 matrix which can be viewed as a 2 × 2 partitioned (or block) matrix with blocks
\[
A_{11} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad A_{12} = \begin{pmatrix} a \\ b \end{pmatrix}, \quad A_{21} = \begin{pmatrix} p & q \end{pmatrix}, \quad A_{22} = \begin{pmatrix} z \end{pmatrix}.
\]

Addition of partitioned matrices. If matrices A and B are of the same size and partitioned the same way, they can be added block-by-block. Similarly, partitioned matrices can be multiplied by a scalar blockwise.

Multiplication of partitioned matrices. If the column partition of A matches the row partition of B then AB can be computed by the usual row-column rule, with blocks treated as matrix entries.

Example
\[
A = \begin{pmatrix} 1 & 2 & a \\ 3 & 4 & b \\ p & q & z \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \\ \Gamma & \Delta \end{pmatrix} = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}
\]
\[
AB = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}
= \begin{pmatrix} A_{11} B_1 + A_{12} B_2 \\ A_{21} B_1 + A_{22} B_2 \end{pmatrix}
\]
\[
= \begin{pmatrix}
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix} \begin{pmatrix} \Gamma & \Delta \end{pmatrix} \\[2ex]
\begin{pmatrix} p & q \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} + z \begin{pmatrix} \Gamma & \Delta \end{pmatrix}
\end{pmatrix}
\]
Theorem 2.4.10: Column-Row Expansion of AB. If A is m × n and B is n × p then
\[
AB = \begin{pmatrix} \operatorname{col}_1(A) & \operatorname{col}_2(A) & \cdots & \operatorname{col}_n(A) \end{pmatrix}
\begin{pmatrix} \operatorname{row}_1(B) \\ \operatorname{row}_2(B) \\ \vdots \\ \operatorname{row}_n(B) \end{pmatrix}
= \operatorname{col}_1(A)\operatorname{row}_1(B) + \cdots + \operatorname{col}_n(A)\operatorname{row}_n(B)
\]
Lecture 12: Subspaces of Rn [Lay 2.8]
Subspace. A subspace of Rn is any set H that has the properties

(a) 0 ∈ H.
(b) For all vectors u, v ∈ H, the sum u + v ∈ H.
(c) For each vector u ∈ H and each scalar c, cu ∈ H.

Rn is a subspace of itself.

{0} is a subspace, called the zero subspace.

Span. Span{v1, ..., vp} is the subspace spanned (or generated) by v1, ..., vp.

The column space of a matrix A is the set Col A of all linear combinations of the columns of A.

The null space of a matrix A is the set Nul A of all solutions to the homogeneous system
\[
Ax = 0.
\]

Theorem 2.8.12: The null space. The null space of an m × n matrix A is a subspace of Rn. Equivalently, the set of all solutions to a system
\[
Ax = 0
\]
of m homogeneous linear equations in n unknowns is a subspace of Rn.

Basis of a subspace. A basis of a subspace H of Rn is a linearly independent set in H that spans H.

Theorem 2.8.13: The pivot columns of a matrix A form a basis of Col A.
Dimension and rank [Lay 2.9]
Theorem: Given a basis b1, ..., bp of a subspace H of Rn, every vector x ∈ H is uniquely expressed as a linear combination of b1, ..., bp.

Coordinate system. Let
\[
B = \{ b_1, \ldots, b_p \}
\]
be a basis for a subspace H. For each x ∈ H, the coordinates of x relative to the basis B are the weights c1, ..., cp such that
\[
x = c_1 b_1 + \cdots + c_p b_p.
\]
The vector
\[
[x]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_p \end{pmatrix}
\]
is the coordinate vector of x.

Solving homogeneous systems. Finding a parametric solution of a system of homogeneous linear equations
\[
Ax = 0
\]
means finding a basis of Nul A.

Dimension. The dimension of a nonzero subspace H, denoted by dim H, is the number of vectors in any basis of H. The dimension of the zero subspace {0} is defined to be 0.
Rank of a matrix. The rank of a matrix A, denoted by rank A, is the dimension of the column space Col A of A:
\[
\operatorname{rank} A = \dim \operatorname{Col} A.
\]

The Rank Theorem 2.9.14: If a matrix A has n columns, then
\[
\operatorname{rank} A + \dim \operatorname{Nul} A = n.
\]
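A numerical illustration (mine, not from the notes) of the Rank Theorem; note it assumes scipy for the null space:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]])   # n = 4 columns
rank = np.linalg.matrix_rank(A)
dim_nul = null_space(A).shape[1]            # number of basis vectors of Nul A
print(rank, dim_nul, rank + dim_nul)        # 2 2 4 = n
```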
The Basis Theorem 2.9.15: Let H be a p-dimensional subspace of Rn.

• Any linearly independent set of exactly p elements in H is automatically a basis for H.
• Also, any set of p elements of H that spans H is automatically a basis of H.

For example, this means that the vectors
\[
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}
\]
form a basis of R3: there are three of them and they are linearly independent (the latter should be obvious to students at this stage).
The Invertible Matrix Theorem
(continued) Let A be an n × n matrix.
Then the following statements are each
equivalent to the statement that A is an
invertible matrix.
(m) The columns of A form a basis of
Rn .
(n) Col A = Rn .
(o) dim Col A = n.
(p) rank A = n.
(q) Nul A = {0}.
(r) dim Nul A = 0.
Lecture 14: Introduction to determinants [Lay 3.1]
The determinant det A of a square matrix A is a certain number assigned to the matrix; it is defined recursively, that is, we define first determinants of matrices of sizes 1 × 1, 2 × 2, and 3 × 3, and then supply a formula which expresses determinants of n × n matrices in terms of determinants of (n − 1) × (n − 1) matrices.

The determinant of a 1 × 1 matrix A = (a11) is defined simply as being equal to its only entry a11:
\[
\det \begin{pmatrix} a_{11} \end{pmatrix} = a_{11}.
\]

The determinant of a 2 × 2 matrix is defined by the formula
\[
\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11} a_{22} - a_{12} a_{21}.
\]

The determinant of a 3 × 3 matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
is defined by the formula
\[
\det A = a_{11} \cdot \det \begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix}
- a_{12} \cdot \det \begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix}
+ a_{13} \cdot \det \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}.
\]
Example. The determinant det A of the matrix
\[
A = \begin{pmatrix} 2 & 0 & 7 \\ 0 & 1 & 0 \\ 1 & 0 & 4 \end{pmatrix}
\]
equals
\[
2 \cdot \det \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} + 7 \cdot \det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\]
which further simplifies as
\[
2 \cdot 4 + 7 \cdot (-1) = 8 - 7 = 1.
\]

Submatrices. By definition, the submatrix Aij is obtained from the matrix A by crossing out row i and column j. For example, if
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix},
\]
then
\[
A_{22} = \begin{pmatrix} 1 & 3 \\ 7 & 9 \end{pmatrix}, \qquad A_{31} = \begin{pmatrix} 2 & 3 \\ 5 & 6 \end{pmatrix}.
\]
Recursive definition of determinant. For n > 3, the determinant of an n × n matrix A is defined as the expression
\[
\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + \cdots + (-1)^{1+n} a_{1n} \det A_{1n}
= \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j},
\]
which involves the determinants of the smaller (n − 1) × (n − 1) submatrices A1j; these, in their turn, can be evaluated by a similar formula which reduces the calculations to the (n − 2) × (n − 2) case, and so on, all the way down to determinants of size 2 × 2.
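The recursive definition translates directly into code. A Python sketch (mine; suitable for small matrices only, as the recursion has factorial cost):

```python
def det(A):
    """Determinant by cofactor expansion across the first row,
    with the 1 x 1 determinant as the base case."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # The submatrix A1j: cross out row 1 and column j+1.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 0, 7], [0, 1, 0], [1, 0, 4]]))   # 1, as computed above
```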
Cofactors. The (i, j)-cofactor of A is
\[
C_{ij} = (-1)^{i+j} \det A_{ij}.
\]
Then
\[
\det A = a_{11} C_{11} + a_{12} C_{12} + \cdots + a_{1n} C_{1n}
\]
is the cofactor expansion across the first row of A.

Theorem 3.1.1: Cofactor expansion. For any row i, the result of the cofactor expansion across the row i is the same:
\[
\det A = a_{i1} C_{i1} + a_{i2} C_{i2} + \cdots + a_{in} C_{in}.
\]

The chess board pattern of signs in cofactor expansions. The signs
\[
(-1)^{i+j},
\]
which appear in the formula for cofactors, form the easy-to-recognise and easy-to-remember “chess board” pattern:
\[
\begin{pmatrix}
+ & - & + & \cdots \\
- & + & - & \\
+ & - & + & \\
\vdots & & & \ddots
\end{pmatrix}
\]

Theorem 3.1.2: The determinant of a triangular matrix. If A is a triangular n × n matrix then det A is the product of the diagonal entries of A.

Proof: This proof contains more details than the one given in the textbook. It is based on induction on n, which, to avoid the use of “general” notation, is illustrated by a simple example. Let
\[
A = \begin{pmatrix}
a_{11} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & 0 & 0 & 0 \\
a_{31} & a_{32} & a_{33} & 0 & 0 \\
a_{41} & a_{42} & a_{43} & a_{44} & 0 \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{pmatrix}.
\]
All entries in the first row of A, with the possible exception of a11, are zeroes:
\[
a_{12} = \cdots = a_{15} = 0;
\]
therefore in the formula
\[
\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + \cdots + a_{15} \det A_{15}
\]
all summands, with the possible exception of the first one, a11 det A11, are zeroes, and therefore
\[
\det A = a_{11} \det A_{11} = a_{11} \det \begin{pmatrix}
a_{22} & 0 & 0 & 0 \\
a_{32} & a_{33} & 0 & 0 \\
a_{42} & a_{43} & a_{44} & 0 \\
a_{52} & a_{53} & a_{54} & a_{55}
\end{pmatrix}.
\]
But the smaller matrix A11 is also lower triangular, and therefore we can conclude by induction that
\[
\det \begin{pmatrix}
a_{22} & 0 & 0 & 0 \\
a_{32} & a_{33} & 0 & 0 \\
a_{42} & a_{43} & a_{44} & 0 \\
a_{52} & a_{53} & a_{54} & a_{55}
\end{pmatrix} = a_{22} \cdots a_{55},
\]
hence
\[
\det A = a_{11} \cdot (a_{22} \cdots a_{55}) = a_{11} a_{22} \cdots a_{55}
\]
is the product of the diagonal entries of A.

The basis of induction is the case n = 2; of course, in that case
\[
\det \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} = a_{11} a_{22} - 0 \cdot a_{21} = a_{11} a_{22}
\]
is the product of the diagonal entries of A.

Corollary. The determinant of a diagonal matrix equals the product of its diagonal elements:
\[
\det \begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix} = d_1 d_2 \cdots d_n.
\]

Corollary. The determinant of the identity matrix equals 1:
\[
\det I_n = 1.
\]
Lectures 15–16: Properties of determinants [Lay 3.2]
Theorem 3.2.3: Row Operations. Let A be a square matrix.

(a) If a multiple of one row of A is added to another row to produce a matrix B, then
det B = det A.
(b) If two rows of A are swapped to produce B, then
det B = − det A.
(c) If one row of A is multiplied by k to produce B, then
det B = k det A.

Theorem 3.2.5: The determinant of the transposed matrix. If A is an n × n matrix then
\[
\det A^T = \det A.
\]

Example. Let
\[
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 4 \end{pmatrix},
\]
then
\[
A^T = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 3 \\ 0 & 0 & 4 \end{pmatrix}
\]
and
\[
\det A = 1 \cdot \det \begin{pmatrix} 1 & 0 \\ 3 & 4 \end{pmatrix}
- 2 \cdot \det \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}
+ 0 \cdot \det \begin{pmatrix} 0 & 1 \\ 0 & 3 \end{pmatrix}
= 1 \cdot 4 - 2 \cdot 0 + 0 \cdot 0 = 4.
\]
Similarly
\[
\det A^T = 1 \cdot \det \begin{pmatrix} 1 & 3 \\ 0 & 4 \end{pmatrix} + 0 + 0 = 1 \cdot 4 = 4;
\]
we got the same value.

Corollary: Cofactor expansion across a column. For any column j,
\[
\det A = a_{1j} C_{1j} + a_{2j} C_{2j} + \cdots + a_{nj} C_{nj}.
\]
Proof: Columns of A are rows of A^T, which has the same determinant as A.

Example. We have already computed the determinant of the matrix
\[
A = \begin{pmatrix} 2 & 0 & 7 \\ 0 & 1 & 0 \\ 1 & 0 & 4 \end{pmatrix}
\]
by expansion across the first row. Now we do it by expansion across the third column:
\[
\det A = 7 \cdot (-1)^{1+3} \cdot \det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
+ 0 \cdot (-1)^{2+3} \cdot \det \begin{pmatrix} 2 & 0 \\ 1 & 0 \end{pmatrix}
+ 4 \cdot (-1)^{3+3} \cdot \det \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
= -7 - 0 + 8 = 1.
\]

Corollary: Column operations. Let A be a square matrix.

(a) If a multiple of one column of A is added to another column to produce a matrix B, then
det B = det A.
(b) If two columns of A are swapped to produce B, then
det B = − det A.
(c) If one column of A is multiplied by k to produce B, then
det B = k det A.

Proof: Column operations on A are row operations on A^T, which has the same determinant as A.

Computing determinants. In computations by hand, the quickest method of computing determinants is to work with both columns and rows:

• If convenient, swap two rows or two columns, but do not forget to change the sign of the determinant.
• If a row or a column has a convenient scalar factor, take it out of the determinant.
• Adding to a row (column) scalar multiples of other rows (columns), simplify the matrix to create a row (column) with just one nonzero entry.
• Expand across this row (column).
• Repeat until you get the value of the determinant.
Example.
\[
\det \begin{pmatrix} 1 & 0 & 3 \\ 1 & 3 & 1 \\ 1 & 1 & -1 \end{pmatrix}
\overset{C_3 - 3C_1}{=} \det \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & -2 \\ 1 & 1 & -4 \end{pmatrix}
\overset{2 \text{ out of } C_3}{=} 2 \det \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & -1 \\ 1 & 1 & -2 \end{pmatrix}
\]
\[
\overset{-1 \text{ out of } C_3}{=} -2 \det \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\overset{\text{expand}}{=} -2 \cdot 1 \cdot (-1)^{1+1} \det \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}
\overset{R_1 - 3R_2}{=} -2 \det \begin{pmatrix} 0 & -5 \\ 1 & 2 \end{pmatrix}
\]
\[
\overset{R_1 \leftrightarrow R_2}{=} -2 \cdot (-1) \det \begin{pmatrix} 1 & 2 \\ 0 & -5 \end{pmatrix}
= 2 \cdot (-5) = -10.
\]

Example. Compute the determinant
\[
\Delta = \det \begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1
\end{pmatrix}.
\]

Solution: Subtracting Row 1 from Rows 2, 3, and 4, we rearrange the determinant as
\[
\Delta = \det \begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & -1 & 1 \\
0 & 0 & -1 & 0 & 1 \\
0 & -1 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 1
\end{pmatrix}.
\]
After expanding the determinant across the first column, we get
\[
\Delta = 1 \cdot (-1)^{1+1} \cdot \det \begin{pmatrix}
0 & 0 & -1 & 1 \\
0 & -1 & 0 & 1 \\
-1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}
\overset{R_4 + R_3}{=} \det \begin{pmatrix}
0 & 0 & -1 & 1 \\
0 & -1 & 0 & 1 \\
-1 & 0 & 0 & 1 \\
0 & 1 & 1 & 2
\end{pmatrix}.
\]
After expansion across the 1st column,
\[
\Delta = (-1) \cdot (-1)^{3+1} \cdot \det \begin{pmatrix} 0 & -1 & 1 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix}
= - \det \begin{pmatrix} 0 & -1 & 1 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\overset{R_3 + R_2}{=} - \det \begin{pmatrix} 0 & -1 & 1 \\ -1 & 0 & 1 \\ 0 & 1 & 3 \end{pmatrix}.
\]
After expanding the determinant across the first column we have
\[
\Delta = -(-1) \cdot (-1)^{2+1} \cdot \det \begin{pmatrix} -1 & 1 \\ 1 & 3 \end{pmatrix}
= - \det \begin{pmatrix} -1 & 1 \\ 1 & 3 \end{pmatrix}
\overset{R_2 + R_1}{=} - \det \begin{pmatrix} -1 & 1 \\ 0 & 4 \end{pmatrix}
= -(-4) = 4.
\]

As an exercise, I leave you with this question: wouldn’t you agree that the determinant of the similarly constructed matrix of size n × n should be ±(n − 1)? But how can one determine the correct sign?

And one more exercise: you should now be able to instantly compute
\[
\det \begin{pmatrix}
1 & 2 & 3 & 4 & 0 \\
1 & 2 & 3 & 0 & 5 \\
1 & 2 & 0 & 4 & 5 \\
1 & 0 & 3 & 4 & 5 \\
0 & 2 & 3 & 4 & 5
\end{pmatrix}.
\]

Theorem 3.2.4: Invertible matrices. A square matrix A is invertible if and only if
\[
\det A \ne 0.
\]
Recall that this theorem was already known to us in the case of 2 × 2 matrices A.

Example. The matrix
\[
A = \begin{pmatrix} 2 & 0 & 7 \\ 0 & 1 & 0 \\ 1 & 0 & 4 \end{pmatrix}
\]
has the determinant
\[
\det A = 2 \cdot \det \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} + 7 \cdot \det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= 2 \cdot 4 + 7 \cdot (-1) = 8 - 7 = 1,
\]
hence A is invertible. Indeed,
\[
A^{-1} = \begin{pmatrix} 4 & 0 & -7 \\ 0 & 1 & 0 \\ -1 & 0 & 2 \end{pmatrix}
\]
(check!).

Theorem 3.2.6: Multiplicativity Property. If A and B are n × n matrices then
\[
\det AB = (\det A) \cdot (\det B).
\]

Example. Take
\[
A = \begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix};
\]
then
\[
\det A = 2 \quad \text{and} \quad \det B = 1.
\]
But
\[
AB = \begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 3 & 17 \end{pmatrix}
\]
and
\[
\det(AB) = 1 \cdot 17 - 5 \cdot 3 = 2,
\]
which is exactly the value of (det A) · (det B).

Corollary. If A is invertible, then
\[
\det A^{-1} = (\det A)^{-1}.
\]

Proof. Set B = A^{-1}; then
\[
AB = I
\]
and Theorem 3.2.6 yields
\[
\det A \cdot \det B = \det I = 1.
\]
Hence
\[
\det B = (\det A)^{-1}.
\]
Cramer’s Rule [Lay 3.3]
Cramer’s Rule is an explicit (closed) formula for solving a system of n linear equations in n unknowns with a nonsingular (invertible) matrix of coefficients. It has important theoretical value, but is unsuitable for practical application.

For any n × n matrix A and any b ∈ Rn, denote by Ai(b) the matrix obtained from A by replacing column i by the vector b:
\[
A_i(b) = \begin{pmatrix} a_1 & \cdots & a_{i-1} & b & a_{i+1} & \cdots & a_n \end{pmatrix}.
\]

Theorem 3.3.7: Cramer’s Rule. Let A be an invertible n × n matrix. For any b ∈ Rn, the unique solution x of the linear system
\[
Ax = b
\]
is given by
\[
x_i = \frac{\det A_i(b)}{\det A}, \qquad i = 1, 2, \ldots, n.
\]
For example, for the system
\[
\begin{aligned}
x_1 + x_2 &= 3 \\
x_1 - x_2 &= 1
\end{aligned}
\]
Cramer’s rule gives the answer
\[
x_1 = \frac{\det \begin{pmatrix} 3 & 1 \\ 1 & -1 \end{pmatrix}}{\det \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}} = \frac{-4}{-2} = 2,
\qquad
x_2 = \frac{\det \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}}{\det \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}} = \frac{-2}{-2} = 1.
\]
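A Python sketch (mine) of Cramer’s rule, which also illustrates why it is unsuitable in practice: it computes n + 1 determinants where np.linalg.solve does a single elimination.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (illustration only)."""
    d = np.linalg.det(A)
    if np.isclose(d, 0):
        raise ValueError("matrix of coefficients is singular")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b            # A_i(b): replace column i by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.array([3.0, 1.0])
print(cramer(A, b))             # [2. 1.], matching the example above
```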
Theorem 3.3.8: An Inverse Formula. Let A be an invertible n × n matrix. Then
\[
A^{-1} = \frac{1}{\det A} \begin{pmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{pmatrix},
\]
where the cofactors Cji are given by
\[
C_{ji} = (-1)^{j+i} \det A_{ji}
\]
and Aji is the matrix obtained from A by deleting row j and column i.

Notice that for a 2 × 2 matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
the cofactors are
\[
\begin{aligned}
C_{11} &= (-1)^{1+1} \cdot d = d \\
C_{21} &= (-1)^{2+1} \cdot b = -b \\
C_{12} &= (-1)^{1+2} \cdot c = -c \\
C_{22} &= (-1)^{2+2} \cdot a = a,
\end{aligned}
\]
and we get the familiar formula
\[
A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
Lecture 17: Eigenvalues and eigenvectors
Quiz. What is the value of this determinant?
\[
\Delta = \det \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 3 \\ 1 & 2 & 6 \end{pmatrix}
\]

Quiz. Is Γ positive, negative, or zero?
\[
\Gamma = \det \begin{pmatrix} 0 & 0 & 71 \\ 0 & 93 & 0 \\ 87 & 0 & 0 \end{pmatrix}
\]

Eigenvectors. An eigenvector of an n × n matrix A is a nonzero vector x such that
\[
Ax = \lambda x
\]
for some scalar λ.

Eigenvalues. A scalar λ is called an eigenvalue of A if there is a non-trivial solution x of
\[
Ax = \lambda x;
\]
x is called an eigenvector corresponding to λ.

Eigenvalue. A scalar λ is an eigenvalue of A iff the equation
\[
(A - \lambda I)x = 0
\]
has a non-trivial solution.

Eigenspace. The set of all solutions of
\[
(A - \lambda I)x = 0
\]
is a subspace of Rn called the eigenspace of A corresponding to the eigenvalue λ.

Theorem: Linear independence of eigenvectors. If v1, ..., vp are eigenvectors that correspond to distinct eigenvalues λ1, ..., λp of an n × n matrix A, then the set
\[
\{ v_1, \ldots, v_p \}
\]
is linearly independent.
Lecture 17: The characteristic equation
Characteristic polynomial

The polynomial
\[
\det(A - \lambda I)
\]
in the variable λ is the characteristic polynomial of A. For example, in the 2 × 2 case,
\[
\det \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc).
\]

The equation
\[
\det(A - \lambda I) = 0
\]
is the characteristic equation for A.

Characterisation of eigenvalues

Theorem. A scalar λ is an eigenvalue of an n × n matrix A if and only if λ satisfies the characteristic equation
\[
\det(A - \lambda I) = 0.
\]

Zero as an eigenvalue. λ = 0 is an eigenvalue of A if and only if det A = 0.

The Invertible Matrix Theorem (continued). An n × n matrix A is invertible iff:

(s) The number 0 is not an eigenvalue of A.
(t) The determinant of A is not zero.

Theorem: Eigenvalues of a triangular matrix. The eigenvalues of a triangular matrix are the entries on its main diagonal.

Proof. This immediately follows from the fact that the determinant of a triangular matrix is the product of its diagonal elements. Therefore, for example, for the 3 × 3 triangular matrix
\[
A = \begin{pmatrix} d_1 & a & b \\ 0 & d_2 & c \\ 0 & 0 & d_3 \end{pmatrix}
\]
its characteristic polynomial det(A − λI) equals
\[
\det \begin{pmatrix} d_1 - \lambda & a & b \\ 0 & d_2 - \lambda & c \\ 0 & 0 & d_3 - \lambda \end{pmatrix},
\]
and therefore
\[
\det(A - \lambda I) = (d_1 - \lambda)(d_2 - \lambda)(d_3 - \lambda)
\]
has roots (eigenvalues of A)
\[
\lambda = d_1, d_2, d_3.
\]
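A numpy check (my illustration) of the last two statements: for a triangular matrix the eigenvalues are the diagonal entries, and np.poly returns the coefficients of the monic polynomial whose roots are the eigenvalues of A:

```python
import numpy as np

A = np.array([[2.0, 1.0, 5.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])
print(np.linalg.eigvals(A))   # the diagonal entries 2, 3, 7

# Coefficients of the monic polynomial with the eigenvalues as roots:
print(np.poly(A))             # [1. -12. 41. -42.], i.e. (l-2)(l-3)(l-7)
```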
Lectures 18-21: Diagonalisation
Similar matrices. A is similar to B if there exists an invertible matrix P such that
\[
A = P B P^{-1}.
\]
Similarity is an equivalence relation.

Conjugation. The operation
\[
A^P = P^{-1} A P
\]
is called conjugation of A by P.

Properties of conjugation

• I^P = I
• (AB)^P = A^P B^P; as a corollary:
• (A + B)^P = A^P + B^P
• (c · A)^P = c · A^P
• (A^k)^P = (A^P)^k
• (A^{-1})^P = (A^P)^{-1}
• (A^P)^Q = A^{PQ}

Theorem: Similar matrices have the same characteristic polynomial. If matrices A and B are similar then they have the same characteristic polynomials and eigenvalues.

Diagonalisable matrices. A square matrix A is diagonalisable if it is similar to a diagonal matrix, that is,
\[
A = P D P^{-1}
\]
for some diagonal matrix D and some invertible matrix P. Of course, the diagonal entries of D are eigenvalues of A.

The Diagonalisation Theorem. An n × n matrix A is diagonalisable iff A has n linearly independent eigenvectors. In fact,
\[
A = P D P^{-1}
\]
where D is diagonal iff the columns of P are n linearly independent eigenvectors of A.

Theorem: Matrices with all eigenvalues distinct. An n × n matrix with n distinct eigenvalues is diagonalisable.

Example: A non-diagonalisable matrix.
\[
A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
\]
has repeated eigenvalues λ1 = λ2 = 2. A is not diagonalisable. Why?
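A numpy sketch (mine, not from the notes) contrasting the two situations: a matrix with distinct eigenvalues is diagonalised by the matrix P of its eigenvectors, while for the matrix above the computed eigenvectors are parallel:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])    # eigenvalues 2 and 5: distinct
eigenvalues, P = np.linalg.eig(A)         # columns of P are eigenvectors
D = np.diag(eigenvalues)
assert np.allclose(A, P @ D @ np.linalg.inv(P))   # A = P D P^{-1}

B = np.array([[2.0, 1.0], [0.0, 2.0]])    # the example above
vals, Q = np.linalg.eig(B)
print(vals)    # [2. 2.]
print(Q)       # both columns are multiples of (1, 0): no eigenvector basis
```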
Lectures 22 and 24: Vector spaces
A vector space is a non-empty space
of elements (vectors), with operations of
addition and multiplication by a scalar.
By definition, the following holds for all
u, v, w in a vector space V and for all
scalars c and d.
1. The sum of u and v is in V .
2. u + v = v + u.
3. (u + v) + w = u + (v + w).
4. There is a zero vector 0 such that u + 0 = u.
5. For each u there exists −u such
that u + (−u) = 0.
6. The scalar multiple cu is in V .
7. c(u + v) = cu + cv.
8. (c + d)u = cu + du.
9. c(du) = (cd)u.
10. 1u = u.
Definitions
Most concepts introduced in the previous lectures for the vector spaces Rn can
be transferred to arbitrary vector spaces.
Let V be a vector space.
Given vectors v1, v2, ..., vp in V and scalars c1, c2, ..., cp, the vector
\[
y = c_1 v_1 + \cdots + c_p v_p
\]
is called a linear combination of v1, v2, ..., vp with weights c1, c2, ..., cp.
If v1, ..., vp are in V, then the set of all linear combinations of v1, ..., vp is denoted by Span{v1, ..., vp} and is called
That is, Span{v1 , . . . , vp } is the collection of all vectors which can be written
in the form
c1 v1 + c2 v2 + · · · + cp vp
with c1 , . . . , cp scalars.
Examples. Of course, the most important example of vector spaces is provided
by standard spaces Rn of column vectors
with n components, with the usual operations of component-wise addition and
multiplication by scalar.
Another example is the set Pn of polynomials of degree at most n in the variable x,
\[
P_n = \{ a_0 + a_1 x + \cdots + a_n x^n \},
\]
with the usual operations of addition of polynomials and multiplication by constants.

And here is another example: the set C[0, 1] of real-valued continuous functions on the segment [0, 1], with the usual operations of addition of functions and multiplication by constants.
Span{v1 , . . . , vp } is a vector subspace
of V , that is, it is closed with respect to
addition of vectors and multiplying vectors by scalars.
V is a subspace of itself.
{ 0 } is a subspace, called the zero subspace.
An indexed set of vectors
{ v1 , . . . , vp }
in V is linearly independent if the vector equation
\[
x_1 v_1 + \cdots + x_p v_p = 0
\]
has only the trivial solution.
The set
\[
\{ v_1, \ldots, v_p \}
\]
in V is linearly dependent if there exist weights c1, ..., cp, not all zero, such that
\[
c_1 v_1 + \cdots + c_p v_p = 0.
\]

Theorem: Characterisation of linearly dependent sets. An indexed set
\[
S = \{ v_1, \ldots, v_p \}
\]
of two or more vectors is linearly dependent if and only if at least one of the vectors in S is a linear combination of the others.

A basis of V is a linearly independent set which spans V. A vector space V is called finite dimensional if it has a finite basis. Any two bases in a finite dimensional vector space have the same number of vectors; this number is called the dimension of V and is denoted dim V.

If B = (b1, ..., bn) is a basis of V, every vector x ∈ V can be written, and in a unique way, as a linear combination
\[
x = x_1 b_1 + \cdots + x_n b_n;
\]
the scalars x1, ..., xn are called the coordinates of x with respect to the basis B.

A transformation T from a vector space V to a vector space W is a rule that assigns to each vector x in V a vector T(x) in W.

The set V is called the domain of T, the set W is the codomain of T.

A transformation
\[
T : V \longrightarrow W
\]
is linear if:

• T(u + v) = T(u) + T(v) for all vectors u, v ∈ V;
• T(cu) = cT(u) for all vectors u and all scalars c.

Properties of linear transformations. If T is a linear transformation then
\[
T(0) = 0
\]
and
\[
T(cu + dv) = cT(u) + dT(v).
\]

The identity transformation of V is
\[
V \longrightarrow V, \qquad x \mapsto x.
\]

The zero transformation of V is
\[
V \longrightarrow V, \qquad x \mapsto 0.
\]

If
\[
T : V \longrightarrow W
\]
is a linear transformation, its kernel
\[
\operatorname{Ker} T = \{ x \in V : T(x) = 0 \}
\]
is a subspace of V, while its image
\[
\operatorname{Im} T = \{ y \in W : T(x) = y \text{ for some } x \in V \}
\]
is a subspace of W.
Lectures 25–27: Symmetric matrices and inner product
Symmetric matrices. A matrix A is symmetric if A^T = A. An example:
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{pmatrix}.
\]
Notice that symmetric matrices are necessarily square.

Inner (or dot) product. For vectors u, v ∈ Rn their inner product (also called scalar product or dot product) u · v is defined as
\[
u \cdot v = u^T v = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.
\]
Properties of dot product

Theorem 6.1.1. Let u, v, w ∈ Rn, and c be a scalar. Then

(a) u · v = v · u
(b) (u + v) · w = u · w + v · w
(c) (cu) · v = c(u · v)
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.

Properties (b) and (c) can be combined as
\[
(c_1 u_1 + \cdots + c_p u_p) \cdot w = c_1 (u_1 \cdot w) + \cdots + c_p (u_p \cdot w).
\]
The length of a vector v is defined as
\[
\|v\| = \sqrt{v \cdot v} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.
\]
In particular,
\[
\|v\|^2 = v \cdot v.
\]
We call v a unit vector if ‖v‖ = 1.
Orthogonal vectors. Vectors u and v are orthogonal to each other if
\[
u \cdot v = 0.
\]

Theorem. Vectors u and v are orthogonal if and only if
\[
\|u\|^2 + \|v\|^2 = \|u + v\|^2.
\]
That is, vectors u and v are orthogonal if and only if the Pythagoras Theorem holds for the triangle formed by u and v as two sides (so that its third side is u + v).

Proof is a straightforward computation:
\[
\begin{aligned}
\|u + v\|^2 &= (u + v) \cdot (u + v) \\
&= u \cdot u + u \cdot v + v \cdot u + v \cdot v \\
&= u \cdot u + u \cdot v + u \cdot v + v \cdot v \\
&= u \cdot u + 2 u \cdot v + v \cdot v \\
&= \|u\|^2 + 2 u \cdot v + \|v\|^2.
\end{aligned}
\]
Hence
\[
\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2 u \cdot v,
\]
and
\[
\|u + v\|^2 = \|u\|^2 + \|v\|^2 \quad \text{iff} \quad u \cdot v = 0.
\]

Orthogonal sets. A set of vectors
\[
\{ u_1, \ldots, u_p \}
\]
in Rn is orthogonal if
\[
u_i \cdot u_j = 0 \quad \text{whenever } i \ne j.
\]

Theorem 6.2.4. If
\[
S = \{ u_1, \ldots, u_p \}
\]
is an orthogonal set of non-zero vectors in Rn then S is linearly independent.

Proof. Assume the contrary, that S is linearly dependent. This means that there exist scalars c1, ..., cp, not all of them zeroes, such that
\[
c_1 u_1 + \cdots + c_p u_p = 0.
\]
Forming the dot product of the left hand side and the right hand side of this equation with ui, we get
\[
(c_1 u_1 + \cdots + c_p u_p) \cdot u_i = 0 \cdot u_i = 0.
\]
After opening the brackets we have
\[
c_1 u_1 \cdot u_i + \cdots + c_{i-1} u_{i-1} \cdot u_i + c_i u_i \cdot u_i + c_{i+1} u_{i+1} \cdot u_i + \cdots + c_p u_p \cdot u_i = 0.
\]
In view of orthogonality of the set S, all dot products on the left hand side, with the exception of ui · ui, equal 0. Hence what remains from the equality is
\[
c_i u_i \cdot u_i = 0.
\]
Since ui is non-zero,
\[
u_i \cdot u_i \ne 0
\]
and therefore ci = 0. This argument works for every index i = 1, ..., p. We get a contradiction with our assumption that one of the ci is non-zero.

An orthogonal basis for Rn is a basis which is also an orthogonal set. For example, the two vectors
\[
\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
form an orthogonal basis of R2 (check!).

Theorem: Eigenvectors of symmetric matrices. Let
\[
v_1, \ldots, v_p
\]
be eigenvectors of a symmetric matrix A for pairwise distinct eigenvalues
\[
\lambda_1, \ldots, \lambda_p.
\]
Then
\[
\{ v_1, \ldots, v_p \}
\]
is an orthogonal set.

Proof. We need to prove the following:

If u and v are eigenvectors of A for eigenvalues λ ≠ µ, then u · v = 0.

For a proof, consider the following sequence of equalities:
\[
\begin{aligned}
\lambda u \cdot v &= (\lambda u) \cdot v \\
&= (Au) \cdot v \\
&= (Au)^T v \\
&= (u^T A^T) v \\
&= (u^T A) v \\
&= u^T (Av) \\
&= u^T (\mu v) \\
&= \mu u^T v \\
&= \mu u \cdot v.
\end{aligned}
\]
Hence
\[
\lambda u \cdot v = \mu u \cdot v
\]
and
\[
(\lambda - \mu) u \cdot v = 0.
\]
Since λ − µ ≠ 0, we have u · v = 0.

Corollary. If A is an n × n symmetric matrix with n distinct eigenvalues
\[
\lambda_1, \ldots, \lambda_n,
\]
then the corresponding eigenvectors
\[
v_1, \ldots, v_n
\]
form an orthogonal basis of Rn.

An orthonormal basis in Rn is an orthogonal basis
\[
u_1, \ldots, u_n
\]
made of unit vectors:
\[
\|u_i\| = 1 \quad \text{for all } i = 1, 2, \ldots, n.
\]
Notice that this definition can be reformulated as
\[
u_i \cdot u_j = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \ne j \end{cases}.
\]

Theorem: Coordinates with respect to an orthonormal basis. If
\[
\{ u_1, \ldots, u_n \}
\]
is an orthonormal basis and v a vector in Rn, then
\[
v = x_1 u_1 + \cdots + x_n u_n,
\]
where
\[
x_i = v \cdot u_i \quad \text{for all } i = 1, 2, \ldots, n.
\]

Proof. Let
\[
v = x_1 u_1 + \cdots + x_n u_n;
\]
then, for i = 1, 2, ..., n,
\[
v \cdot u_i = x_1 u_1 \cdot u_i + \cdots + x_n u_n \cdot u_i = x_i u_i \cdot u_i = x_i.
\]

Theorem: Eigenvectors of symmetric matrices. Let A be a symmetric n × n matrix with n distinct eigenvalues
\[
\lambda_1, \ldots, \lambda_n.
\]
Then Rn has an orthonormal basis made of eigenvectors of A.
Lectures 28 and 29: Inner product spaces
Inner (or dot) product is a function which associates, with each pair of vectors u and v in a vector space V, a real number denoted
\[
u \cdot v,
\]
subject to the following axioms:

1. u · v = v · u
2. (u + v) · w = u · w + v · w
3. (cu) · v = c(u · v)
4. u · u ≥ 0, and u · u = 0 if and only if u = 0.

Inner product space. A vector space with an inner product is called an inner product space.

Example: School Geometry. The ordinary Euclidean plane of school geometry is an inner product space:

• Vectors: directed segments starting at the origin O.
• Addition: the parallelogram rule.
• Product-by-scalar: stretching the vector by a factor of c.
• Dot product:
\[
u \cdot v = \|u\| \|v\| \cos\theta,
\]
where θ is the angle between vectors u and v.

Length. The length of the vector u is defined as
\[
\|u\| = \sqrt{u \cdot u}.
\]
Notice that
\[
\|u\|^2 = u \cdot u,
\]
and
\[
\|u\| = 0 \ \Leftrightarrow\ u = 0.
\]

Orthogonality. We say that vectors u and v are orthogonal if
\[
u \cdot v = 0.
\]

Example: C[0, 1]. The vector space C[0, 1] of real-valued continuous functions on the segment [0, 1] becomes an inner product space if we define the inner product by the formula
\[
f \cdot g = \int_0^1 f(x) g(x)\, dx.
\]
In this example, the tricky bit is to show that the dot product on C[0, 1] satisfies the axiom
\[
u \cdot u \ge 0, \text{ and } u \cdot u = 0 \text{ if and only if } u = 0;
\]
that is, that if f is a continuous function on [0, 1] and
\[
\int_0^1 f(x)^2\, dx = 0,
\]
then f(x) = 0 for all x ∈ [0, 1]. This requires the use of properties of continuous functions; since we study linear algebra, not analysis, I am leaving it to the readers as an exercise.
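A numerical sketch (mine; it assumes scipy’s quad) of this inner product for two concrete functions:

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product on C[0, 1], evaluated by numerical quadrature."""
    value, _ = quad(lambda x: f(x) * g(x), 0.0, 1.0)
    return value

f = lambda x: x
g = lambda x: x ** 2
print(inner(f, g))              # 0.25: the integral of x^3 over [0, 1]
print(np.sqrt(inner(f, f)))     # length of f: sqrt(1/3) ~ 0.577
```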
The Pythagoras Theorem. Vectors u and v are orthogonal if and only if
\[
\|u\|^2 + \|v\|^2 = \|u + v\|^2.
\]
Proof is a straightforward computation, exactly the same as in Lectures 25–27:
\[
\begin{aligned}
\|u + v\|^2 &= (u + v) \cdot (u + v) \\
&= u \cdot u + u \cdot v + v \cdot u + v \cdot v \\
&= u \cdot u + u \cdot v + u \cdot v + v \cdot v \\
&= u \cdot u + 2 u \cdot v + v \cdot v \\
&= \|u\|^2 + 2 u \cdot v + \|v\|^2.
\end{aligned}
\]
Hence
\[
\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2 u \cdot v,
\]
and
\[
\|u + v\|^2 = \|u\|^2 + \|v\|^2 \quad \text{if and only if} \quad u \cdot v = 0.
\]

The Cauchy-Schwarz Inequality. In the Euclidean plane,
\[
u \cdot v = \|u\| \|v\| \cos\theta,
\]
hence
\[
|u \cdot v| \le \|u\| \|v\|.
\]
The following theorem shows that this is a general property of inner product spaces.

Theorem: The Cauchy-Schwarz Inequality. In any inner product space,
\[
|u \cdot v| \le \|u\| \|v\|.
\]

The Cauchy-Schwarz Inequality: an example. In the vector space Rn with the dot product of
\[
u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
\]
defined as
\[
u \cdot v = u^T v = u_1 v_1 + \cdots + u_n v_n,
\]
the Cauchy-Schwarz Inequality becomes
\[
|u_1 v_1 + \cdots + u_n v_n| \le \sqrt{u_1^2 + \cdots + u_n^2} \cdot \sqrt{v_1^2 + \cdots + v_n^2},
\]
or, which is its equivalent form,
\[
(u_1 v_1 + \cdots + u_n v_n)^2 \le (u_1^2 + \cdots + u_n^2) \cdot (v_1^2 + \cdots + v_n^2).
\]
The Cauchy-Schwarz Inequality: one more example. In the inner product space C[0, 1] the Cauchy-Schwarz Inequality takes the form
\[
\int_0^1 f(x) g(x)\, dx \le \sqrt{\int_0^1 f(x)^2\, dx} \cdot \sqrt{\int_0^1 g(x)^2\, dx},
\]
or, equivalently,
\[
\left( \int_0^1 f(x) g(x)\, dx \right)^2 \le \left( \int_0^1 f(x)^2\, dx \right) \cdot \left( \int_0^1 g(x)^2\, dx \right).
\]

Proof of the Cauchy-Schwarz Inequality given here is different from the one in the textbook by Lay. Our proof is based on the following simple fact from school algebra:

Let
\[
q(t) = at^2 + bt + c
\]
be a quadratic function in the variable t with the property that
\[
q(t) \ge 0
\]
for all t. Then
\[
b^2 - 4ac \le 0.
\]

Now consider the function q(t) in the variable t defined as
\[
q(t) = (u + tv) \cdot (u + tv) = u \cdot u + 2t\, u \cdot v + t^2\, v \cdot v = \|u\|^2 + (2 u \cdot v)\, t + (\|v\|^2)\, t^2.
\]
Notice that q(t) is a quadratic function in t and
\[
q(t) \ge 0.
\]
Hence
\[
(2 u \cdot v)^2 - 4 \|u\|^2 \|v\|^2 \le 0,
\]
which can be simplified as
\[
|u \cdot v| \le \|u\| \|v\|.
\]

Distance. We define the distance between vectors u and v as
\[
d(u, v) = \|u - v\|.
\]
It satisfies the axioms of a metric:

1. d(u, v) = d(v, u).
2. d(u, v) ≥ 0.
3. d(u, v) = 0 iff u = v.
4. The Triangle Inequality:
\[
d(u, v) + d(v, w) \ge d(u, w).
\]

Axioms 1–3 immediately follow from the axioms for the dot product, but the Triangle Inequality requires some attention.

Proof of the Triangle Inequality. It suffices to prove
\[
\|u + v\| \le \|u\| + \|v\|,
\]
or
\[
\|u + v\|^2 \le (\|u\| + \|v\|)^2,
\]
or
\[
(u + v) \cdot (u + v) \le \|u\|^2 + 2 \|u\| \|v\| + \|v\|^2,
\]
or
\[
u \cdot u + 2 u \cdot v + v \cdot v \le u \cdot u + 2 \|u\| \|v\| + v \cdot v,
\]
or
\[
2 u \cdot v \le 2 \|u\| \|v\|.
\]
But this is the Cauchy-Schwarz Inequality. Now observe that all rearrangements are reversible, hence the Triangle Inequality holds.
Lecture 30: Orthogonalisation and the Gram-Schmidt Process
Orthogonal basis. A basis v1, ..., vn in the inner product space V is called orthogonal if vi · vj = 0 for i ≠ j.

Coordinates with respect to an orthogonal basis:
\[
u = c_1 v_1 + \cdots + c_n v_n
\quad \text{iff} \quad
c_i = \frac{u \cdot v_i}{v_i \cdot v_i}.
\]

Orthonormal basis. A basis v1, ..., vn in the inner product space V is called orthonormal if it is orthogonal and ‖vi‖ = 1 for all i. Equivalently,
\[
v_i \cdot v_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}.
\]

Coordinates with respect to an orthonormal basis:
\[
u = c_1 v_1 + \cdots + c_n v_n
\quad \text{iff} \quad
c_i = u \cdot v_i.
\]
The Gram-Schmidt Orthogonalisation Process makes an orthogonal basis v1, ..., vp from a basis x1, ..., xp:
\[
\begin{aligned}
v_1 &= x_1 \\
v_2 &= x_2 - \frac{x_2 \cdot v_1}{v_1 \cdot v_1}\, v_1 \\
v_3 &= x_3 - \frac{x_3 \cdot v_1}{v_1 \cdot v_1}\, v_1 - \frac{x_3 \cdot v_2}{v_2 \cdot v_2}\, v_2 \\
&\ \ \vdots \\
v_p &= x_p - \frac{x_p \cdot v_1}{v_1 \cdot v_1}\, v_1 - \frac{x_p \cdot v_2}{v_2 \cdot v_2}\, v_2 - \cdots - \frac{x_p \cdot v_{p-1}}{v_{p-1} \cdot v_{p-1}}\, v_{p-1}
\end{aligned}
\]
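A Python sketch (mine, not from the notes) of the process for column vectors in Rn with the standard dot product; the name gram_schmidt is an assumption:

```python
import numpy as np

def gram_schmidt(X):
    """X: list of linearly independent 1-D arrays.
    Returns the orthogonal vectors v_1, ..., v_p of the process above."""
    vs = []
    for x in X:
        v = x.astype(float)
        for w in vs:                        # subtract the projections on the
            v = v - (x @ w) / (w @ w) * w   # previously constructed v's
        vs.append(v)
    return vs

X = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
V = gram_schmidt(X)
for v in V:
    print(v)
# All pairwise dot products vanish:
print([round(V[i] @ V[j], 10) for i in range(3) for j in range(i + 1, 3)])
```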
Main Theorem about inner product spaces. Every finite dimensional inner product space has an orthonormal basis and is isomorphic to one of the spaces Rn with the standard dot product
\[
u \cdot v = u^T v = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}.
\]