$L^p$ Spaces and Convexity

These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real &
Complex Analysis.
1. Convex functions
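Let $I \subset \mathbb{R}$ be an interval. For $I$ open, we say a function $\varphi : I \to \mathbb{R}$ is convex if for every $a, b \in I$ and every $\lambda \in (0, 1)$, we have
$$\varphi(\lambda b + (1 - \lambda)a) \le \lambda\varphi(b) + (1 - \lambda)\varphi(a). \tag{1}$$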
(Note that we do not assume $\varphi$ is differentiable; for example, $\varphi(x) = |x|$ is convex.) If $I$ is not open, then we say $\varphi : I \to \mathbb{R}$ is convex if (1) is satisfied and $\varphi$ is continuous at each endpoint of $I$ that belongs to $I$. Geometrically, $\varphi$ is convex if every secant line segment lies on or above the graph of $\varphi$. A convex function $\varphi$ is said to be strictly convex if, whenever equality in (1) holds for some $\lambda \in (0, 1)$ and $a, b \in I$, then $a = b$. In other words, $\varphi$ is strictly convex if for every $a \ne b$ in $I$ and every $\lambda \in (0, 1)$,
$$\varphi(\lambda b + (1 - \lambda)a) < \lambda\varphi(b) + (1 - \lambda)\varphi(a).$$
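As an informal numerical illustration of (1) and of strict convexity, the sketch below tests the inequality on a finite grid of pairs $a \ne b$ and weights $\lambda$. The helper names `convexity_gap` and `check_on_grid` are hypothetical, and a finite sample can only give evidence, never a proof of convexity.

```python
import itertools

def convexity_gap(phi, a, b, lam):
    """Right side minus left side of (1); convexity of phi means this is >= 0."""
    return lam * phi(b) + (1 - lam) * phi(a) - phi(lam * b + (1 - lam) * a)

def check_on_grid(phi, points, lams, tol=1e-12):
    """Test (1) and its strict version on all pairs a != b drawn from `points`.

    Returns (convex, strictly_convex) for this finite sample only.
    """
    convex, strict = True, True
    for a, b in itertools.permutations(points, 2):
        for lam in lams:
            gap = convexity_gap(phi, a, b, lam)
            if gap < -tol:   # inequality (1) fails
                convex = False
            if gap <= tol:   # equality (up to tol) although a != b
                strict = False
    return convex, strict

points = [k / 4 for k in range(-8, 9)]   # sample points in [-2, 2]
lams = [0.1, 0.25, 0.5, 0.75, 0.9]

print(check_on_grid(abs, points, lams))               # (True, False)
print(check_on_grid(lambda x: x * x, points, lams))   # (True, True)
print(check_on_grid(lambda x: x ** 3, points, lams))  # (False, False)
```

The function $|x|$ passes (1) but not the strict version, since its graph contains line segments (compare Lemma 2).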
Here are a few lemmas about convex functions, whose proofs will be left as exercises.
Lemma 1. Let φ be a convex function, and let a, b ∈ I with a < b. Assume there is a
λ ∈ (0, 1) so that
φ(λb + (1 − λ)a) = λφ(b) + (1 − λ)φ(a).
Then the restriction of φ to [a, b] is linear.
Lemma 2. A convex function φ is strictly convex if and only if its graph contains no line
segments.
Lemma 3. Each tangent line to the graph of a differentiable strictly convex function φ
intersects the graph of φ only at the point of tangency.
Lemma 4. Any convex function is continuous.
If $\varphi : I \to \mathbb{R}$ is convex and $x_0 \in I$, then the line given by the graph of $\ell(x) = \varphi(x_0) + m(x - x_0)$ is a supporting line of $\varphi$ at $x_0$ if $\varphi(x) \ge \ell(x)$ for all $x \in I$.
Proposition 5. Let $\varphi : I \to \mathbb{R}$ be convex, and let $x_0 \in I^\circ$, the interior of $I$. Then there is a supporting line for $\varphi$ at $x_0$.
Proof. For $x \in I \setminus \{x_0\}$, let
$$m(x) = \frac{\varphi(x) - \varphi(x_0)}{x - x_0}.$$
We claim $m$ is an increasing function of $x$. To prove the claim, first let $x_0 < x' < x$ be points in $I$, and define $\lambda \in (0, 1)$ so that $x' = \lambda x + (1 - \lambda)x_0$. Consider the secant line from $x_0$ to $x$. Then the convexity of $\varphi$ implies
$$\varphi(x') = \varphi(\lambda x + (1 - \lambda)x_0) \le \lambda\varphi(x) + (1 - \lambda)\varphi(x_0).$$
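Compute
$$\lambda = \frac{x' - x_0}{x - x_0}, \qquad 1 - \lambda = \frac{x - x'}{x - x_0},$$
so
$$\begin{aligned}
\varphi(x') &\le \frac{x' - x_0}{x - x_0}\varphi(x) + \frac{x - x'}{x - x_0}\varphi(x_0) \\
&= \frac{x' - x_0}{x - x_0}\varphi(x) - \frac{x' - x_0}{x - x_0}\varphi(x_0) + \frac{x - x_0}{x - x_0}\varphi(x_0) \\
&= \frac{x' - x_0}{x - x_0}\big[\varphi(x) - \varphi(x_0)\big] + \varphi(x_0) \\
&= (x' - x_0)\,m(x) + \varphi(x_0).
\end{aligned}$$
Dividing by $x' - x_0 > 0$ gives
$$\frac{\varphi(x') - \varphi(x_0)}{x' - x_0} \le m(x), \qquad \text{that is,} \qquad m(x') \le m(x).$$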
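The other cases, $x' < x_0 < x$ and $x' < x < x_0$, are similar.
Now since $m$ is increasing on $I \setminus \{x_0\}$, the one-sided limits
$$m_+ = \lim_{x \to x_0^+} m(x), \qquad m_- = \lim_{x \to x_0^-} m(x)$$
exist and satisfy $m_- \le m_+$. They are finite because $x_0$ is an interior point: $m$ is defined on both sides of $x_0$, and its values on each side bound its values on the other. We claim that if $m_- \le m \le m_+$, then $\ell(x) = \varphi(x_0) + m(x - x_0)$ is a supporting line of $\varphi$ at $x_0$. Since $m \le m_+ \le m(x)$ for all $x > x_0$,
$$\ell(x) = \varphi(x_0) + m(x - x_0) \le \varphi(x_0) + m(x)(x - x_0) = \varphi(x) \qquad (x > x_0).$$
Similarly, since $m \ge m_- \ge m(x)$ and $x - x_0 < 0$ for all $x < x_0$, we also see $\ell(x) \le \varphi(x)$ for all $x < x_0$. As $\ell(x_0) = \varphi(x_0)$, the graph of $\ell$ is a supporting line for $\varphi$ at $x_0$.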
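The slope function $m(x)$ from this proof can be explored numerically. The sketch below, for the illustrative choice $\varphi(x) = |x|$ and $x_0 = 0$, checks that $m$ is increasing on a sample grid and that slopes in $[m_-, m_+] = [-1, 1]$ give supporting lines while a slope outside does not. The names `slope`, `one_sided_slopes`, and `is_supporting` are hypothetical; evaluating $m$ at $x_0 \pm h$ only approximates the one-sided limits in general (it happens to be exact for $|x|$, which is piecewise linear).

```python
def slope(phi, x0, x):
    """Difference quotient m(x) = (phi(x) - phi(x0)) / (x - x0) for x != x0."""
    return (phi(x) - phi(x0)) / (x - x0)

def one_sided_slopes(phi, x0, h=1e-6):
    """Crude stand-ins for m_- and m_+: the quotient at x0 - h and at x0 + h."""
    return slope(phi, x0, x0 - h), slope(phi, x0, x0 + h)

def is_supporting(phi, x0, m, xs, tol=1e-12):
    """Check phi(x) >= phi(x0) + m (x - x0) at every sample point x."""
    return all(phi(x) >= phi(x0) + m * (x - x0) - tol for x in xs)

phi, x0 = abs, 0.0
xs = [k / 10 for k in range(-20, 21) if k != 0]

# m is increasing in x, as claimed in the proof of Proposition 5.
ms = [slope(phi, x0, x) for x in xs]
assert all(m1 <= m2 for m1, m2 in zip(ms, ms[1:]))

print(one_sided_slopes(phi, x0))  # (-1.0, 1.0)
# Slopes in [m_-, m_+] = [-1, 1] give supporting lines; 1.2 does not.
print([is_supporting(phi, x0, m, xs) for m in (-1.0, -0.3, 0.0, 0.7, 1.0, 1.2)])
# [True, True, True, True, True, False]
```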
Corollary 6. Let $\varphi$ be a differentiable convex function, and let $a, b \in I$. Then
$$\varphi(b) \ge \varphi(a) + \varphi'(a)(b - a).$$
In other words, the graph of $\varphi$ lies above the graph of each tangent line. For the proof, just recognize that $m_+ = m_- = \varphi'(a)$ in this case.
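A quick numerical spot-check of Corollary 6 for the illustrative choice $\varphi = \exp$ (so $\varphi' = \exp$): the hypothetical helper `tangent_gap` measures how far the graph lies above the tangent line at $a$. Since $\exp$ is strictly convex, the gap should be strictly positive away from the point of tangency, consistent with Lemma 3.

```python
import math

def tangent_gap(phi, dphi, a, b):
    """phi(b) - [phi(a) + phi'(a)(b - a)]; Corollary 6 says this is >= 0."""
    return phi(b) - (phi(a) + dphi(a) * (b - a))

grid = [k / 4 for k in range(-12, 13)]          # points in [-3, 3]
pairs = [(a, b) for a in grid for b in grid if a != b]

print(all(tangent_gap(math.exp, math.exp, a, b) > 0 for a, b in pairs))  # True
print(tangent_gap(math.exp, math.exp, 0.5, 0.5))                          # 0.0
```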
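Proposition 7. If $\varphi : I \to \mathbb{R}$ is strictly convex, and the graph of $\ell$ is a supporting line of $\varphi$ at $x_0 \in I^\circ$, then $\varphi(x) > \ell(x)$ for all $x \in I \setminus \{x_0\}$.
Proof. If $\varphi(x_1) = \ell(x_1)$ for some $x_1 \ne x_0$, then convexity gives $\varphi \le \ell$ on the segment between $x_0$ and $x_1$ (the chord there lies on the graph of $\ell$), while $\varphi \ge \ell$ everywhere. So the graph of $\varphi$ contains a line segment, contradicting Lemma 2.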
Proposition 8. Let $\varphi : I \to \mathbb{R}$ be continuous, and assume $\varphi'' > 0$ on the interior $I^\circ$ of $I$. Then $\varphi$ is strictly convex on $I$.
Proof. Since $\varphi'' > 0$, we see that $\varphi'$ is strictly increasing on $I^\circ$. Let $a < b$ in $I$. Define
$$\psi(t) = \varphi(tb + (1 - t)a) - t\varphi(b) - (1 - t)\varphi(a).$$
Then we want to show $\psi < 0$ on $(0, 1)$. Note $\psi(0) = \psi(1) = 0$. Now
$$\psi'(t) = \varphi'(tb + (1 - t)a)(b - a) - \varphi(b) + \varphi(a)$$
is strictly increasing on $(0, 1)$. Since $\psi(0) = \psi(1) = 0$, there is a $T \in (0, 1)$ where $\psi'(T) = 0$ (either $\psi$ has a local extremum in $(0, 1)$ or $\psi$ is constant; this is Rolle's Theorem). Since $\psi'$ is strictly increasing, we have $\psi'(t) < 0$ for $t \in (0, T)$ and $\psi'(t) > 0$ for $t \in (T, 1)$. Therefore, $\psi$ is strictly decreasing on $[0, T]$ and strictly increasing on $[T, 1]$. Since $\psi(0) = \psi(1) = 0$, we find $\psi(t) < 0$ for $t \in (0, 1)$.
Corollary 9. For $p \in (1, \infty)$, the function $x \mapsto x^p$ is strictly convex on $[0, \infty)$. The exponential function $\exp x = e^x$ is strictly convex on $(-\infty, \infty)$.
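The function $\psi$ from the proof of Proposition 8 gives a direct way to spot-check Corollary 9: strict convexity predicts $\psi < 0$ on $(0, 1)$ whenever $a < b$. The sketch below evaluates $\psi$ on a grid of interior $t$ values for a few illustrative intervals; the helper names `psi` and `max_psi` and the chosen intervals are assumptions for the example only.

```python
import math

def psi(phi, a, b, t):
    """psi(t) = phi(tb + (1-t)a) - t phi(b) - (1-t) phi(a), as in Proposition 8."""
    return phi(t * b + (1 - t) * a) - t * phi(b) - (1 - t) * phi(a)

def max_psi(phi, a, b, steps=100):
    """Largest value of psi at t = 1/steps, ..., (steps-1)/steps, all in (0, 1)."""
    return max(psi(phi, a, b, k / steps) for k in range(1, steps))

print(max_psi(lambda x: x ** 3.0, 0.0, 2.0) < 0)   # x^p with p = 3   -> True
print(max_psi(lambda x: x ** 1.5, 0.5, 4.0) < 0)   # x^p with p = 1.5 -> True
print(max_psi(math.exp, -3.0, 1.0) < 0)            # exp              -> True
```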
2. The Banach space $L^p$
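Let $(X, \mathcal{M}, \mu)$ be a measure space, and let $p \in [1, \infty)$. For a measurable function $f$, define
$$\|f\|_{L^p} = \left( \int_X |f|^p \, d\mu \right)^{1/p}.$$
Then we define
$$L^p(X) = \{ f : X \to \mathbb{R} \text{ (or } \mathbb{C}\text{)} : \|f\|_{L^p} < \infty \} / \sim,$$
where as usual $f \sim g$ if $f = g$ almost everywhere.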
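On a finite measure space $X = \{x_1, \dots, x_n\}$ with point masses $w_i = \mu(\{x_i\})$, the integral becomes a weighted sum, so the norm can be computed directly. The sketch below assumes that setting; the helper `lp_norm` is hypothetical, and its `p = math.inf` branch uses the usual essential supremum (the largest $|f(x_i)|$ over points of positive measure), which is not defined in this section. Changing $f$ on a null set leaves the norm unchanged, matching the quotient by $\sim$.

```python
import math

def lp_norm(f, w, p):
    """||f||_{L^p} on a finite measure space.

    f[i] is the value of f at x_i and w[i] = mu({x_i}) >= 0. For p = math.inf
    this is the essential supremum: the max of |f[i]| over points with w[i] > 0.
    """
    if p == math.inf:
        return max((abs(v) for v, wi in zip(f, w) if wi > 0), default=0.0)
    return sum(abs(v) ** p * wi for v, wi in zip(f, w)) ** (1.0 / p)

f = [3.0, -4.0, 100.0]
w = [1.0, 1.0, 0.0]             # the third point is a null set
print(lp_norm(f, w, 2))         # 5.0
print(lp_norm(f, w, 1))         # 7.0
print(lp_norm(f, w, math.inf))  # 4.0
```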
Proposition 10. $\|\cdot\|_{L^p}$ is a norm on $L^p(X)$.
Proof.
• Clearly $\|f\|_{L^p} \ge 0$ always, and if $\|f\|_{L^p} = 0$, then $\int_X |f|^p \, d\mu = 0$, which implies $|f|^p = 0$ a.e. This is equivalent to $f = 0$ a.e., that is, $f = 0$ in $L^p(X)$.
• Clearly, if $\alpha$ is a constant, then $\|\alpha f\|_{L^p} = |\alpha| \cdot \|f\|_{L^p}$.
• The Triangle Inequality is harder, and we prove it in Minkowski's Theorem below.
Theorem 1 (Minkowski's Theorem). Let $p \in [1, \infty]$. If $f, g \in L^p(X)$, then
$$\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}. \tag{2}$$
If $p \in (1, \infty)$, then equality can hold only if there are nonnegative constants $\alpha, \beta$, not both zero, so that $\beta f = \alpha g$.
Moreover, if $f, g \ge 0$ are measurable (but not necessarily in $L^p(X)$), then (2) holds.
Proof. We have already addressed the cases $p = 1, \infty$, so we may assume $p \in (1, \infty)$. Also, if $\|f\|_{L^p} = 0$, then $f = 0$ a.e., and the conclusion is valid; the same holds if $\|g\|_{L^p} = 0$.
So now assume $p \in (1, \infty)$, $\alpha = \|f\|_{L^p} > 0$, and $\beta = \|g\|_{L^p} > 0$. Define $f_0 = \alpha^{-1}|f|$ and $g_0 = \beta^{-1}|g|$, so that $\|f_0\|_{L^p} = \|g_0\|_{L^p} = 1$. Let $\lambda = \alpha/(\alpha + \beta)$, so that $1 - \lambda = \beta/(\alpha + \beta)$. Compute
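$$\begin{aligned}
|f(x) + g(x)|^p &\le (|f(x)| + |g(x)|)^p \\
&= [\alpha f_0(x) + \beta g_0(x)]^p \\
&= (\alpha + \beta)^p [\lambda f_0(x) + (1 - \lambda) g_0(x)]^p \\
&\le (\alpha + \beta)^p [\lambda f_0(x)^p + (1 - \lambda) g_0(x)^p]
\end{aligned}$$
by the convexity of $\varphi(t) = t^p$. Since $p \in (1, \infty)$, this function is strictly convex (Corollary 9), so the last inequality is strict unless $f_0(x) = g_0(x)$.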
For $z \in \mathbb{C}$, define $\operatorname{sgn} 0 = 0$ and $\operatorname{sgn} z = z/|z|$ otherwise. Also define $\operatorname{sgn}(\infty) = \operatorname{sgn}(-\infty) = 0$. For $f(x), g(x)$ finite and nonzero, we see $|f(x) + g(x)|^p = (|f(x)| + |g(x)|)^p$ if and only if $\operatorname{sgn} f(x) = \operatorname{sgn} g(x)$. Thus, when $f(x)$ and $g(x)$ are finite, by considering various cases, we find
$$|f(x) + g(x)|^p \le (\alpha + \beta)^p [\lambda f_0(x)^p + (1 - \lambda) g_0(x)^p], \tag{3}$$
with equality if and only if $\alpha^{-1} f(x) = \beta^{-1} g(x)$.
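Integrating both sides of (3) gives
$$\begin{aligned}
\|f + g\|_{L^p}^p &\le (\alpha + \beta)^p [\lambda \|f_0\|_{L^p}^p + (1 - \lambda) \|g_0\|_{L^p}^p] \\
&= (\alpha + \beta)^p \\
&= (\|f\|_{L^p} + \|g\|_{L^p})^p.
\end{aligned}$$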
Therefore $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$ for $f, g \in L^p$. Moreover, if there is equality, then
$$\int_X \Big( (\alpha + \beta)^p [\lambda f_0(x)^p + (1 - \lambda) g_0(x)^p] - |f(x) + g(x)|^p \Big) \, d\mu = 0,$$
and by (3) the integrand is nonnegative almost everywhere. Therefore, the integrand must vanish almost everywhere, and thus $\alpha^{-1} f(x) = \beta^{-1} g(x)$ for almost every $x \in X$.
Finally, the remaining case, in which $f, g$ are nonnegative and $\|f\|_{L^p}$ or $\|g\|_{L^p}$ is infinite, is trivial.
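Minkowski's inequality can be spot-checked on a random finite measure space, where $\|f\|_{L^p}^p = \sum_i |f(x_i)|^p w_i$. The sketch below, with a hypothetical `lp_norm` helper and randomly generated weights and functions, counts violations of (2) (allowing for floating-point rounding) and then checks the equality case $g = 2.5 f$, an instance of $\beta f = \alpha g$ with nonnegative constants.

```python
import random

def lp_norm(f, w, p):
    """||f||_{L^p} on a finite measure space with point masses w, for 1 <= p < inf."""
    return sum(abs(v) ** p * wi for v, wi in zip(f, w)) ** (1.0 / p)

random.seed(0)
n = 50
w = [random.random() for _ in range(n)]   # mu({x_i}) = w[i]

violations = 0
for _ in range(200):
    p = random.uniform(1.0, 6.0)
    f = [random.gauss(0, 1) for _ in range(n)]
    g = [random.gauss(0, 1) for _ in range(n)]
    lhs = lp_norm([a + b for a, b in zip(f, g)], w, p)
    rhs = lp_norm(f, w, p) + lp_norm(g, w, p)
    if lhs > rhs * (1 + 1e-12):             # tolerance for rounding
        violations += 1
print(violations)  # 0

# Equality case: g = 2.5 f.
f = [random.gauss(0, 1) for _ in range(n)]
g = [2.5 * v for v in f]
lhs = lp_norm([a + b for a, b in zip(f, g)], w, 3)
rhs = lp_norm(f, w, 3) + lp_norm(g, w, 3)
print(abs(lhs - rhs) <= 1e-9 * rhs)  # True
```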
Let $(V, \|\cdot\|)$ be a normed linear space. In other words, $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ equipped with a norm $\|\cdot\|$. A series $\sum_{n=1}^\infty v_n$ with $v_n \in V$ is convergent if the partial sums converge to a limit in $V$. The series is said to be absolutely convergent if $\sum_{n=1}^\infty \|v_n\| < \infty$.
Proposition 11. Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$ equipped with a norm $\|\cdot\|$, and consider $V$ as a metric space with distance function $d(x, y) = \|x - y\|$. Then $V$ is complete if and only if every absolutely convergent series in $V$ is convergent.
Proof. First assume $V$ is complete. Let $\sum_{n=1}^\infty v_n$ be an absolutely convergent series, and let $s_n = \sum_{j=1}^n v_j$ be its partial sums. If $m > n$, then $s_m - s_n = \sum_{j=n+1}^m v_j$ and
$$\|s_m - s_n\| = \Big\| \sum_{j=n+1}^m v_j \Big\| \le \sum_{j=n+1}^m \|v_j\| \le \sum_{j=n+1}^\infty \|v_j\|. \tag{4}$$
But since $\sum_{n=1}^\infty v_n$ is absolutely convergent, the sum $\sum_{n=1}^\infty \|v_n\|$ converges, and so the tail $\sum_{j=n+1}^\infty \|v_j\| \to 0$ as $n \to \infty$. In other words, for every $\epsilon > 0$, there is an $N$ so that if $n \ge N$, then $\sum_{j=n+1}^\infty \|v_j\| \le \epsilon$. Then (4) shows that the sequence of partial sums $s_n$ is a Cauchy sequence. Since $V$ is complete, it has a limit $s \in V$, which is the sum of the series.
Conversely, assume every absolutely convergent series in $V$ is convergent. Let $w_n$ be a Cauchy sequence. Define a subsequence $w_{n_k}$ as follows. For $\epsilon = \frac12$, there is an $N_1$ so that if $n, m \ge N_1$, then $\|w_n - w_m\| \le \frac12$. Let $n_1 = N_1$. Then define $n_k$ recursively as $n_k = \max\{n_{k-1} + 1, N_k\}$, where $N_k$ is chosen so that if $n, m \ge N_k$, then $\|w_n - w_m\| \le \frac{1}{2^k}$. By induction, $w_{n_k}$ is a subsequence of $w_n$ with $\|w_{n_k} - w_{n_{k+1}}\| \le \frac{1}{2^k}$ for all $k$. Now let $v_1 = w_{n_1}$ and $v_k = w_{n_k} - w_{n_{k-1}}$ for $k \ge 2$, so that $\|v_k\| \le \frac{1}{2^{k-1}}$ for $k \ge 2$. By construction,
$$\sum_{k=1}^\infty \|v_k\| \le \|w_{n_1}\| + \sum_{k=2}^\infty \frac{1}{2^{k-1}} = \|w_{n_1}\| + 1 < \infty.$$
Therefore, $\sum_{k=1}^\infty v_k$ is absolutely convergent, and thus converges to a sum $s$ by our assumption.
Now we show $w_n \to s$. Let $\epsilon > 0$. Note the partial sum $\sum_{j=1}^k v_j = w_{n_k}$, and so $w_{n_k} \to s$ as $k \to \infty$. So there is a $K$ so that if $k \ge K$, then $\|w_{n_k} - s\| \le \frac{\epsilon}{2}$. Since $w_n$ is Cauchy, there is an $N$ so that if $n, m \ge N$, then $\|w_n - w_m\| \le \frac{\epsilon}{2}$. So choose $L \ge K$ so that $n_L \ge N$; then for $n \ge n_L$, we have
$$\|w_n - s\| \le \|w_n - w_{n_L}\| + \|w_{n_L} - s\| \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
So $w_n \to s$.
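The subsequence construction in the second half of the proof can be traced on a concrete Cauchy sequence in $\mathbb{R}$. As an illustrative assumption, take $w_n = \sum_{j=1}^n 1/j^2$; since $\sum_{j > N} 1/j^2 < 1/N$, the choice $N(\epsilon) = \lceil 1/\epsilon \rceil$ works as a Cauchy modulus. The sketch (helper names `w`, `modulus`, `fast_subsequence` are hypothetical) builds $n_k$, forms $v_1 = w_{n_1}$ and $v_k = w_{n_k} - w_{n_{k-1}}$, and checks the bound $|v_k| \le 2^{-(k-1)}$ and the telescoping identity $\sum_{j \le k} v_j = w_{n_k}$.

```python
import math

def w(n):
    """n-th term of an illustrative Cauchy sequence in R: partial sums of 1/j^2."""
    return math.fsum(1.0 / (j * j) for j in range(1, n + 1))

def modulus(eps):
    """An N with |w_n - w_m| <= eps for n, m >= N, using sum_{j>N} 1/j^2 < 1/N."""
    return math.ceil(1.0 / eps)

def fast_subsequence(K):
    """Indices n_1 < ... < n_K chosen as in the proof of Proposition 11."""
    ns = [modulus(0.5)]
    for k in range(2, K + 1):
        ns.append(max(ns[-1] + 1, modulus(2.0 ** -k)))
    return ns

K = 10
ns = fast_subsequence(K)
vs = [w(ns[0])] + [w(ns[k]) - w(ns[k - 1]) for k in range(1, K)]  # vs[k-1] = v_k

print(ns)  # [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
print(all(abs(vs[k - 1]) <= 2.0 ** -(k - 1) for k in range(2, K + 1)))  # True

# Partial sums of the v_k telescope back to w_{n_k}.
partial = 0.0
for k, v in enumerate(vs):
    partial += v
    assert abs(partial - w(ns[k])) < 1e-12
```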
Theorem 2. For $p \in [1, \infty]$, $L^p(X)$ is a Banach space.
Proof. We have already addressed the case $p = \infty$, so we may assume $p \in [1, \infty)$. We have also proved above that $\|\cdot\|_{L^p}$ is a norm. Thus we only need to prove that $L^p(X)$ is complete. We use the previous proposition to show that absolutely convergent series in $L^p(X)$ are convergent.
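Let $\sum_{n=1}^\infty f_n$ be an absolutely convergent series with $f_n \in L^p(X)$, so that $\sum_{n=1}^\infty \|f_n\|_{L^p} = M < \infty$. Define $g_n(x) = \sum_{k=1}^n |f_k(x)|$. By Minkowski's Inequality,
$$\|g_n\|_{L^p} \le \sum_{k=1}^n \|f_k\|_{L^p} \le \sum_{k=1}^\infty \|f_k\|_{L^p} = M.$$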
Since $g_n$ is increasing pointwise, $g_n(x) \to g(x)$ as $n \to \infty$ for some $g$ (which may take the value $\infty$). Moreover, $g_n(x)^p \to g(x)^p$. By Fatou's Lemma,
$$\int_X g^p \, d\mu \le \liminf_{n \to \infty} \int_X g_n^p \, d\mu = \liminf_{n \to \infty} \|g_n\|_{L^p}^p \le M^p.$$
So $g^p$ is integrable, and $g(x)$ is finite for almost every $x$.
For $x$ with $g(x) < \infty$, the series $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent, and thus is convergent in $\mathbb{R}$ (or $\mathbb{C}$). So for almost every $x$, we define $s(x) = \sum_{n=1}^\infty f_n(x)$, and let $s_n(x)$ be the corresponding partial sum. Note $|s_n(x)| \le g(x)$ implies $|s(x)| \le g(x)$, and thus $s \in L^p(X)$. This implies
$$|s_n(x) - s(x)|^p \le 2^p g(x)^p,$$
and $2^p g^p$ is integrable. Therefore, the Dominated Convergence Theorem applies, and since $|s_n(x) - s(x)|^p \to 0$ almost everywhere,
$$\|s_n - s\|_{L^p}^p = \int_X |s_n - s|^p \, d\mu \to \int_X 0 \, d\mu = 0.$$
Thus the sum $\sum_{n=1}^\infty f_n$ converges to $s$ in $L^p(X)$.
Theorem 3. Let $p \in [1, \infty)$, and consider $\mathbb{R}^d$ with Lebesgue measure. Then the following sets of functions are dense in $L^p(\mathbb{R}^d)$:
• Simple functions.
• Step functions.
• Continuous functions with compact support.
The proof is very similar to the case p = 1.
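As a concrete illustration of density of step functions (not a proof of Theorem 3), take the hypothetical example $f(x) = x$ on $[0, 1]$ and the step function $s_n = k/n$ on $[k/n, (k+1)/n)$. On each piece $f - s_n = t \in [0, 1/n)$, so $\|f - s_n\|_{L^p}^p = n \int_0^{1/n} t^p \, dt = n^{-p}/(p+1)$, and the error tends to $0$ like $1/n$. The sketch compares this formula with a midpoint-rule estimate; the function names and the sample count are assumptions.

```python
import math

def lp_error_step(n, p, samples=2 ** 16):
    """Midpoint-rule estimate of ||f - s_n||_{L^p([0, 1])} for f(x) = x.

    s_n = k/n on [k/n, (k+1)/n). Choosing n to divide `samples` keeps every
    quadrature cell inside one step.
    """
    h = 1.0 / samples
    total = 0.0
    for i in range(samples):
        x = (i + 0.5) * h
        s = math.floor(n * x) / n          # value of the step function at x
        total += abs(x - s) ** p * h
    return total ** (1.0 / p)

def exact_error(n, p):
    """(n^{-p} / (p + 1))^{1/p}, from the computation above."""
    return (n ** -p / (p + 1)) ** (1.0 / p)

for p in (1.0, 2.0):
    for n in (1, 4, 16, 64):
        print(p, n, lp_error_step(n, p), exact_error(n, p))
```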
3. Hölder's Inequality
For $p \in [1, \infty]$, the conjugate exponent is defined to be the $q$ with $\frac1p + \frac1q = 1$. We consider $1$ and $\infty$ to be conjugate exponents.
Theorem 4 (Hölder's Inequality). Let $p, q$ be conjugate exponents. Let $f \in L^p(X)$ and $g \in L^q(X)$. Then
$$\|fg\|_{L^1} = \int_X |fg| \, d\mu \le \|f\|_{L^p} \cdot \|g\|_{L^q}. \tag{5}$$
Moreover, if $p \in (1, \infty)$, equality holds in (5) if and only if there are constants $\alpha, \beta$, not both zero, so that $\alpha|f|^p = \beta|g|^q$ almost everywhere.
More generally, if $f, g$ are nonnegative measurable functions, then (5) holds.
Proof. First of all, if $\|f\|_{L^p} = 0$, then $f = 0$ a.e. and the result is trivial. The same is true if $\|g\|_{L^q} = 0$.
If $p = 1$ and $q = \infty$, then $|g(x)| \le \|g\|_{L^\infty}$ for almost all $x$. Therefore,
$$\int_X |fg| \, d\mu \le \|g\|_{L^\infty} \int_X |f| \, d\mu = \|f\|_{L^1} \cdot \|g\|_{L^\infty}.$$
The same is true if $p = \infty$ and $q = 1$. Thus we assume $p, q \in (1, \infty)$.
We may assume $\alpha = \|f\|_{L^p}$ and $\beta = \|g\|_{L^q}$ are positive. Let $f_0 = \alpha^{-1}|f|$ and $g_0 = \beta^{-1}|g|$. Since $\frac1p + \frac1q = 1$, the convexity of the exponential function implies that
$$e^{\frac{s}{p} + \frac{t}{q}} \le p^{-1}e^s + q^{-1}e^t.$$
Now for $x$ such that $f_0(x), g_0(x) \in (0, \infty)$, define $s, t$ by $f_0(x) = \exp(\frac{s}{p})$ and $g_0(x) = \exp(\frac{t}{q})$. Therefore,
$$f_0(x)g_0(x) \le p^{-1}f_0(x)^p + q^{-1}g_0(x)^q \tag{6}$$
for every $x \in X$. (The cases where $f_0(x)$ or $g_0(x)$ is $0$ or $\infty$ are easy to analyze.) Moreover, the strict convexity of the exponential function implies that if there is equality in (6), then $s = t$, which implies $f_0(x)^p = g_0(x)^q$, at least when $f_0(x), g_0(x)$ are both finite.
Now integrate (6) to see
$$\int_X f_0 g_0 \, d\mu \le p^{-1} \int_X f_0^p \, d\mu + q^{-1} \int_X g_0^q \, d\mu = p^{-1} + q^{-1} = 1.$$
Then the definitions of $f_0, g_0$ imply (5). If Hölder's Inequality is an equality, then
$$\int_X \left( p^{-1}f_0^p + q^{-1}g_0^q - f_0 g_0 \right) d\mu = 0,$$
while the integrand is nonnegative by (6). Thus $f_0(x)g_0(x) = p^{-1}f_0(x)^p + q^{-1}g_0(x)^q$ for almost all $x \in X$, which implies $f_0(x)^p = g_0(x)^q$ for almost all $x$, that is, $\beta^q |f|^p = \alpha^p |g|^q$ almost everywhere.
One remaining case is that of $f, g \ge 0$ but $\|f\|_{L^p} = \infty$. The inequality is trivially true here. The last remaining case, $\|g\|_{L^q} = \infty$, is handled the same way.
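Both (6) (Young's inequality) and (5) can be spot-checked numerically on a finite measure space with random point masses. In the sketch below, the helpers `conjugate`, `young_gap`, `lp_norm`, and `holder_sides` are hypothetical names, the data are random, and the `p = math.inf` branch of `lp_norm` uses the usual essential supremum over points of positive mass. The last check uses the equality case: for $p = 3$, $q = 3/2$, taking $|g| = |f|^2$ gives $|g|^q = |f|^p$.

```python
import math
import random

def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1; 1 and infinity are paired."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)

def young_gap(u, v, p):
    """p^{-1} u^p + q^{-1} v^q - u v, which (6) says is >= 0 for u, v >= 0."""
    q = conjugate(p)
    return u ** p / p + v ** q / q - u * v

def lp_norm(f, w, p):
    """||f||_{L^p} with point masses w; for p = inf, max |f| over w > 0."""
    if p == math.inf:
        return max((abs(v) for v, wi in zip(f, w) if wi > 0), default=0.0)
    return sum(abs(v) ** p * wi for v, wi in zip(f, w)) ** (1.0 / p)

def holder_sides(f, g, w, p):
    """Return (||fg||_{L^1}, ||f||_{L^p} ||g||_{L^q}), the two sides of (5)."""
    lhs = sum(abs(a * b) * wi for a, b, wi in zip(f, g, w))
    return lhs, lp_norm(f, w, p) * lp_norm(g, w, conjugate(p))

random.seed(2)

# Young's inequality (6) on random samples, and its equality case u^p = v^q.
triples = [(random.uniform(0, 3), random.uniform(0, 3), random.uniform(1.1, 5))
           for _ in range(1000)]
print(min(young_gap(u, v, p) for u, v, p in triples) >= -1e-12)  # True
u, p = 1.7, 3.0
print(abs(young_gap(u, u ** (p - 1), p)) < 1e-12)  # True, since (u^(p-1))^q = u^p

# Hoelder's inequality (5), including the exponents p = 1 and p = inf.
n = 40
w = [random.random() for _ in range(n)]
holds = True
for p in (1.0, 1.5, 2.0, 3.0, math.inf):
    f = [random.gauss(0, 1) for _ in range(n)]
    g = [random.gauss(0, 1) for _ in range(n)]
    lhs, rhs = holder_sides(f, g, w, p)
    holds = holds and lhs <= rhs * (1 + 1e-12)
print(holds)  # True

# Equality case for p = 3: |g| = |f|^2.
f = [random.gauss(0, 1) for _ in range(n)]
g = [abs(v) ** 2 for v in f]
lhs, rhs = holder_sides(f, g, w, 3.0)
print(abs(lhs - rhs) <= 1e-9 * rhs)  # True
```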
4. Jensen’s Inequality
A measure µ on a σ-algebra M on a set X is called a probability measure if µ(X) = 1.
Proposition 12. Let $(X, \mathcal{M}, \mu)$ be a measure space. Let $f$ be a nonnegative measurable function on $X$. For every $E \in \mathcal{M}$, define $\nu(E) = \int_E f \, d\mu$. Then $\nu$ is a measure on $\mathcal{M}$.
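Proof. Clearly $\nu \ge 0$ and $\nu(\emptyset) = 0$, so we need only check countable additivity. Let $E_j$ be a countable disjoint collection of measurable sets. Then
$$\nu\Big( \bigcup_{j=1}^\infty E_j \Big) = \int_X f \cdot \chi_{\cup_{j=1}^\infty E_j} \, d\mu = \int_X \sum_{j=1}^\infty f \cdot \chi_{E_j} \, d\mu = \sum_{j=1}^\infty \int_X f \cdot \chi_{E_j} \, d\mu = \sum_{j=1}^\infty \nu(E_j).$$
Here the second equality holds because the $E_j$ are disjoint, while the third follows from the Monotone Convergence Theorem, since $f \cdot \chi_{E_j} \ge 0$.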
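On a finite set $X$ with point masses $\mu(\{x\})$, the integral $\int_E f \, d\mu$ is just $\sum_{x \in E} f(x)\mu(\{x\})$, and countable additivity reduces to finite additivity. The sketch below assumes that setting, with a hypothetical helper `make_nu` and made-up values chosen so that $\int_X f \, d\mu = 1$; it checks additivity on disjoint sets and that $\nu$ is then a probability measure.

```python
def make_nu(f, mu):
    """Return E -> nu(E) = sum over x in E of f(x) mu({x}), the integral of f over E."""
    def nu(E):
        return sum(f[x] * mu[x] for x in E)
    return nu

mu = {"a": 0.5, "b": 2.0, "c": 1.0, "d": 0.25}   # point masses of mu
f = {"a": 1.0, "b": 0.125, "c": 0.0, "d": 1.0}   # a nonnegative density
nu = make_nu(f, mu)

E1, E2, E3 = {"a"}, {"b", "c"}, {"d"}             # pairwise disjoint sets
print(nu(E1 | E2 | E3) == nu(E1) + nu(E2) + nu(E3))  # True
print(nu(set(mu)))  # 1.0, so nu is a probability measure
```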
This proposition shows how to produce a probability measure from any measure space
together with a measurable nonnegative function with integral 1.
Theorem 5 (Jensen's Inequality). Let $(X, \mathcal{M}, \mu)$ be a probability measure space. Let $g$ be an integrable function on $X$ with range in an interval $I \subset \mathbb{R}$. Let $\varphi : I \to \mathbb{R}$ be convex. Then
$$\int_X \varphi \circ g \, d\mu \ge \varphi\Big( \int_X g \, d\mu \Big).$$
Proof. Let $\alpha = \int_X g \, d\mu$. We claim $\alpha \in I$. To prove the claim, consider $b = \sup I$. If $b = \infty$, then $\alpha < b$ since $g$ is integrable. On the other hand, if $b$ is finite, then
$$\alpha = \int_X g \, d\mu \le \int_X b \, d\mu = b\,\mu(X) = b.$$
A similar analysis applies to $\inf I$, and this implies $\alpha \in \bar{I}$, the closure of $I$. Moreover, if $b$ is an endpoint of $I$, then $\alpha \ne b$ unless $g(x) = b$ for almost every $x \in X$. (Why?)
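Thus there are two cases. In the trivial case, $g(x) = b$ for almost every $x$, for some endpoint $b$ of $I$ (which then lies in $I$, since $g$ takes values in $I$). In this case,
$$\int_X \varphi \circ g \, d\mu = \int_X \varphi(b) \, d\mu = \varphi(b) = \varphi\Big( \int_X g \, d\mu \Big).$$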
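Otherwise, $\alpha \in I^\circ$. By Proposition 5, we may choose $\ell(x) = \varphi(\alpha) + m(x - \alpha) \le \varphi(x)$ for all $x \in I$. Therefore, since $\mu(X) = 1$ and $\int_X (g - \alpha) \, d\mu = 0$,
$$\int_X \varphi \circ g \, d\mu \ge \int_X \big[\varphi(\alpha) + m(g - \alpha)\big] \, d\mu = \varphi(\alpha) = \varphi\Big( \int_X g \, d\mu \Big).$$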
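Jensen's Inequality can be spot-checked on a finite probability space, where integrals are weighted sums. In the sketch below the point masses `probs` are random weights normalized to total 1 (so they also play the role of the density $f$ in Corollary 13, with $\mu$ the counting measure), the values of $g$ are random, and $I = \mathbb{R}$. The helper name `integrate` and all data are assumptions made for the example.

```python
import math
import random

def integrate(values, probs):
    """Integral of a function over a finite probability space with point masses probs."""
    return sum(v * pr for v, pr in zip(values, probs))

random.seed(3)
n = 30
raw = [random.random() for _ in range(n)]
total = sum(raw)
probs = [r / total for r in raw]                   # mu({x_i}), summing to 1
g = [random.uniform(-2.0, 2.0) for _ in range(n)]  # values g(x_i)

convex_phis = {"exp": math.exp, "square": lambda x: x * x, "abs": abs}
for name, phi in convex_phis.items():
    lhs = integrate([phi(v) for v in g], probs)   # integral of phi o g
    rhs = phi(integrate(g, probs))                # phi of the integral of g
    print(name, lhs >= rhs)                       # True for each convex phi
```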
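Corollary 13. Let $f$ be a positive measurable function on a measure space $(X, \mathcal{M}, \mu)$ with $\int_X f \, d\mu = 1$, and let $g$ and $\varphi$ satisfy the hypotheses of Jensen's Inequality with respect to the probability measure $\nu(E) = \int_E f \, d\mu$. Then
$$\int_X (\varphi \circ g) f \, d\mu \ge \varphi\Big( \int_X g f \, d\mu \Big).$$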
