March 5, 2012 MATH 408 FINAL EXAM SAMPLE

Transcription

March 5, 2012 MATH 408 FINAL EXAM SAMPLE
MATH 408
FINAL EXAM
March 5, 2012
SAMPLE
Partial Solutions to Sample Questions (in progress)
See the sample questions for the midterm exam, but also consider the following questions. Obviously, a final
exam would not contain questions of this depth and scope for each question. That is, the final will contain
some easy and some hard questions. The questions listed below are all on the hard side with some very hard.
2
1. (a) Is the function f (x1 , x2 ) = e(x1 +x2 ) coercive? Is it convex? Justify your answers.
Solution: It is not coercive, just consider the line {(t, −) | t ∈ IR }. It is convex since the function
2
2
φ(u) = eu is convex (φ00 (u) = 2eu (1 + 2u2 ) > 0), and f is just the composition of φ with the
linear function (x1 , x2 ) 7→ x1 + x2 .
(b) For what values of the parameters α, β ∈ IR is the function
f (x1 , x2 ) = x21 + 2αx1 x2 + βx22
a convex function. For what values is it coercive? Justify your answers.
2 2α
2
= 4(β − α2 ).
Solution: det ∇ f (x) = det
2α 2β
Therefore, ∇2 f (x) is positive definite if and only if β > α2 and, by continuity, is positive semidefinite if β = α2 ; otherwise, ∇2 f (x) is indefinite. Consequently, f is convex if and only if β ≥ α2
and is strictly convex if and only if β > α2 . Finally, a quadratic function is coercive if and only if
∇2 f (x) is positive definite which in tis case occurs when β > α2 .
(c) Compute and classify all critical points of the function
f (x1 , x2 , x3 ) = 2(ax1 − 1)2 + (x1 − x22 )2 + (ax2 + x3 )2 ,
where a > 0 is a given positive real number.
Solution: Critical points are
  



 
x1
1/a
2a/(2a2 + 1)
1/a
√
√
x2  = 
 , 1/ a , and −1/ a .
0
√
√
− a
a
x3
0
By considering ∇2 f (x) at each of these points we find that the first one is a saddle point and
the last two are local solutions. By plugging the last two into f we see that they yield the same
function value zero. Since f is everywhere non-negative, these last two points are global solutions.
2. Consider the function
where Q ∈ IRn×n
1
f (x) = xT Qx + cT x,
2
m
is symmetric and c ∈ IR .
(a) Under what condition on the matrix Q ∈ IRn×n is f convex? Justify your answer.
Solution: There are many ways to analyze this question. Below we establish the convexity condition from first principals. First observe that since α2 = α − α(1 − α) we have
((1 − λ)x + λy)T Q((1 − λ)x + λy) = (1 − λ)2 xT Qx + 2λ(1 − λ)xT Qy + λ2 y T Qy
= ((1 − λ) − (1 − λ)λ)xT Qx + 2λ(1 − λ)xT Qy + (λ − λ(1 − λ))y T Qy
= (1 − λ)xT Qx + λy T Qy − λ(1 − λ) xT Qx − 2xT Qy + y T Qy
= (1 − λ)xT Qx + λy T Qy − λ(1 − λ)(x − y)T Q(x − y) .
Therefore, for all 0 ≤ λ ≤ 1 and x, y ∈ IRn ,
1
((1 − λ)x + λy)T Q((1 − λ)x + λy) + cT ((1 − λ)x + λy)
2
1 T
1 T
λ(1 − λ)
T
T
= (1 − λ) x Qx + c x + λ y Qy + c y −
(x − y)T Q(x − y)
2
2
2
λ(1 − λ)
= (1 − λ)f (x) + λf (y) −
(x − y)T Q(x − y) .
2
f ((1 − λ)x + λy) =
Consequently, f is convex if and only if (x − y)T Q(x − y) ≥ 0 for all x, y ∈ IRn , or equivalently, Q
is positive semi-definite.
(b) If Q is symmetric and positive definite, show that there is a nonsingular matrix L such that
Q = LLT .
Solution: This is easily shown either by an eigenvalue decomposition of a Cholesky factorization.
(c) With Q and L as defined in the part (b), show that
1
1
f (x) = kLT x + L−1 ck22 − cT Q−1 c.
2
2
Solution: Just multiply it out:
1
1 T
kL x + L−1 ck22 − cT Q−1 c =
2
2
=
=
1
1 T T T
(L x) (L x) + 2(L−1 c)T (LT x) + (L−1 c)T (L−1 c) − cT Q−1 c
2
2
1 T −1
1 T
T
T
T −T −1
x LL x + 2c x + c L L c − c Q c
2
2
1 T
x Qx + cT x .
2
(d) If f is convex, under what conditions is minx∈IRn f (x) = −∞?
Solution: If Q has a negative eigenvalue, then the minimum value is obviously −∞ so we can
assume that Q is positive semi-definite. If Q is positive definite, then f is coercive so the min
value cannot be −∞, so we can assume that Q has a non-trivial null-space. Since we are assuming
that Q is positive semi-definite, f is convex, so a necessary and sufficient condition for f to have a
global solution is that there exist an x such that 0 = ∇f (x) = Qx + c, or equivalently, c ∈ RanQ.
Therefore, min f = −∞ can only possibly occur is c ∈
/ RanQ. In this case, we may write c = c1 + c2
with c1 ∈ RanQ and c2 ∈ (RanQ)⊥ = NulQ with c2 6= 0. Note that f (tc2 ) = t kc2 k2 and letting
t ↓ −∞ sends f to −∞.
Putting all of this together we get that min f = −∞ if and only if either Q has a negative eigenvalue
or Q is positive semi-definite and c ∈
/ NulQ.
(e) Assume that Q is symmetric and positive definite and let S be a subspace of IRn . Show that x
¯
solves the problem
min f (x)
x∈S
if and only if ∇f (¯
x) ⊥ S.
Solution: In lecture this was shown to be true for an arbitary convex function.
3. Let A ∈ IRm×n and b ∈ IRm and consider the function
1
f (x) = kAx − bk22 .
2
(a) Show that Ran(AT A) = Ran(AT ).
Solution: Done in class a couple of times.
(b) Show that for every b ∈ IRm there exists a solution to the equation AT Ax = AT b.
Solution: Obvious from (i).
(c) Show that there always exists a solution to the problem
LS
minn
x∈IR
1
kAx − bk22 ,
2
and describe the set of solutions to this problem.
Solution: Obvious from (ii).
(d) Under what conditions on A is the solution to the problem LS unique.
Solution: Nul(A) = {0}.
(e) Using the condition from part (iv) derive a formula for the unique solution to LS.
Solution: d = (AT A)−1 AT b.
(f) If AT A is invertible, show that the matrix P = A(AT A)−1 AT is the orthogonal projection onto
Ran(A), that is, show that
i. P 2 = P
ii. P T = P
iii. Ran(P ) = Ran(A).
Solution: A and B are obvious. For C , clearly Ran(P ) ⊂ Ran(A). On the other hand, if y ∈
Ran(A) then there is an x such that Ax = y. Consequently, P y = P Ax = A(AT A)−1 AT Ax =
Ax = y so Ran(A) ⊂ Ran(P ).
4. (a) Matrix secant questions.
i. Let u, v ∈ IRn with u 6= 0 6= v. Show that the matrix uv T ∈ IRn×n is rank 1.
Solution: Since Ran(uv T ) = Span[u], uv T ha rank 1.
ii. Let B ∈ IRn×n and s, y ∈ IRn such that y 6= Bs. For what values of u, v ∈ IRn is it true that
the matrix B+ = B + uv T satisfies B+ s = y? Justify your answer.
Solution: y = B+ s = Bs + uv T s so v T s 6= 0 since y 6= Bs. But then u = y−Bs
.
vT s
n×n
n
iii. Let H ∈ IR
be symmetric and s, y ∈ IR . Under what conditions do there exist vectors
u, v ∈ IRn such that the matrix H+ = H + uv T is also symmetric and satisfies H+ s = y?
Justify your answer.
Solution: From the previous problem we must have v T s 6= 0 and for symmetry we must
have u is a multiple of v. So v = y − Hs and we must have (y − Hs)T s 6= 0 or equivalently
y T s 6= sT Hs.
iv. Let M ∈ IRn×n be symmetric and positive definite and let s, y ∈ IRn be such that sT y > 0.
The BFGS applied to M is
yy T
M ssT M
M+ = M + T − T
.
s y
s Ms
Show that M+ can be rewritten in the form
M+ = M + U D−1 U T ,
where U ∈ IRn×2 and D ∈ IR2×2 is an invertible diagonal matrix, by deriving formulas for both
U and D.
Solution: U = [y, M s] and D = diag(sT y, sT M s).
(b) (This problem is over the top. But parts might be trimmed down for a suitable final exam question.)
Let F : IRn → IRm be continuously differentiable, and let k · k be any norm on IRm . In this
problem we consider the function f (x) = kF (x)k and properties of the the associated GaussNewton direction for minimizing f . Recall that x
¯ ∈ IRn is a first-order stationary point for f if
0 ≤ f 0 (¯
x; d) for all d ∈ IRn .
i. Given x, d ∈ IRn and t > 0 show that
| kF (x + td)k − kF (x) + tF 0 (x)dk | ≤ kF (x + td) − (F (x) + tF 0 (x)d)k .
ii. Why is it true that
kF (x + td) − (F (x) + tF 0 (x)d)k
=0?
t→0
t
lim
Hint: What is the definition of F 0 (x)?
iii. Use parts i and ii to show that
f 0 (x; d) = lim
t↓0
kF (x) + tF 0 (x)dk − kF (x)k
.
t
iv. Use part iii and the convexity of the norm to show that
f 0 (x; d) ≤ kF (x) + F 0 (x)dk − kF (x)k .
Hint: F (x) + tF 0 (x)d = (1 − t)F (x) + t(F (x) + F 0 (x)d)
v. Use parts iii and iv to show that 0 ≤ f 0 (x; d) for all d if and only if kF (x)k ≤ kF (x) + F 0 (x)dk
for all d.
vi. If x is not a first-order stationary point for f and d¯ solves
GN
min kF (x) + F 0 (x)dk ,
d∈IRn
show that d is a descent direction for f at x by showning that
¯ ≤ kF (x) + F 0 (x)dk − kF (x)k < 0 .
f 0 (x; d)
vii. Show that the problem
min kF (x) + F 0 (x)dk1
d∈IRn
can be written as a linear program.
viii. Suppose at a given point x ∈ IRn one solves the problem GN in Part vi above to obtain a
¯ < kF (x)k. Show that the following backtracking line
direction d¯ for which kF (x) + F 0 (x)dk
search procedure is finitely terminating in the sense that the solution t¯ is not zero.
Line Search: Let 0 < c < 1 and 0 < γ < 1 and set
t¯ := min γ s
s.t. s ∈ {0, 1, 2, 3, ....} and
¯ ≤ (1 − γ s c)kF (x)k + cγ s kF (x) + F 0 (x)dk.
¯
kF (x + γ s d)k
¯ = kF (x)k + cγ s [kF (x) + F 0 (x)dk
¯ − kF (x)k]. Then
Hint: (1 − γ s c)kF (x)k + cγ s kF (x) + F 0 (x)dk
use Part vi.
5. (a) Locate all of the KKT points for the following problem. Are these points local solutions? Are they
global solutions?
minimize x21 + x22 − 4x1 − 4x2
subject to x21 ≤ x2
x1 + x2 ≤ 2
(b) Let Q ∈ IRn×n be symmetric and positive definite, g ∈ IRn , b ∈ IRm , and A ∈ IRm×n with
Nul(AT ) = {0}.
i. Show that the matrix AQ−1 AT is nonsingular.
Solution: Since Q is positive definite wT Qw = 0 iff w = 0, so y T AQ−1 AT y = 0 iff AT y = 0
iff y = 0 since Nul(AT ) = {0}. Therefore, AQ−1 AT is symmetric and positive definite and so
nonsingular.
ii. Show that
x
¯ = Q−1 AT (AQ−1 AT )−1 b − [I − Q−1 AT (AQ−1 AT )−1 A]Q−1 c
solves the problem
minimize 21 xT Qx + cT x
.
subject to Ax = b
Hint: Use the Lagrangian L(x, y) = f (x) + y T (b − Ax) and show that the Lagrange multiplier
y¯ must satisfy x
¯ = Q−1 (−c + AT y¯). Then multiply this expression through by A and solve for
y¯.
Solution: First note that 0 = Nul(AT ) = Ran(A)⊥ so Ran(A) = IRm . In particular, this
implies that this optimization problem is feasible for possible values of b. The KKT conditions
are 0 = ∇L(¯
x, y¯) = Q¯
x + c − AT y¯, or equivalently, x
¯ = Q−1 (AT y¯ − c). Plugging this into
Ax = b, we get
b = A¯
x = AQ−1 (AT y¯ − c) = AQ−1 AT y¯ − AQ−1 c .
Multiplying through by (AQ−1 AT )−1 and re-arranging gives
y¯ = (AQ−1 AT )−1 (AQ−1 c + b).
Plugging this back into x
¯ = Q−1 (AT y¯ − c) gives the result.
(c) Let Q ∈ IRn×n be symmetric and positive definite, and let c ∈ IRn . Consider the optimization
problem
1
min xT Qx + cT x .
0≤x 2
i. What is the Lagrangian function for this problem?
ii. Show that the Lagrangian dual is the problem
1
max − y T Q−1 y
y≤c
2
=
− min
y≤c
1 T −1
y Q y.
2
iii. Show that if x
¯, y¯ ∈ IRn satisfy y¯ = −Q¯
x, then x
¯ solves the primal problem if and only if y¯
solves the dual problem, and the optimal values in the primal and dual coincide.
(d) (A harder duality problem) Let Q ∈ IRn×n be symmetric and positive definite. Consider the
optimization problem
P minimize 21 xT Qx + cT x
subject to kxk∞ ≤ 1 .
i. Show that this problem is equivalent to the problem
minimize 12 xT Qx
Pˆ
subject to −e ≤ x ≤ e ,
where e is the vector of all ones.
ˆ
ii. What is the Lagrangian for P?
iii. Show that the Lagrangian dual for Pˆ is the problem
D
1
max − (y − c)T Q−1 (y − c) − kyk1
2
=
1
− min (y − c)T Q−1 (y − c) + kyk1 .
2
This is also the Lagrangian dual for cP .
iv. Show that if x
¯, y¯ ∈ IRn satisfy y¯ = Q¯
x + c, then x
¯ solves P if and only if y¯ solves D, and the
optimal values in P and D coincide.