1. Prerequisites in Measure Theory, Probability, and Ergodic Theory

Transcription

Last modified on March 26, 2015
We start by briefly recalling basic notions and facts which will be used in subsequent chapters.
1.1. Notation
The sets of natural, integer, non-negative integer, and real numbers will be denoted by $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Z}_+$, $\mathbb{R}$, respectively. The set of extended real numbers $\overline{\mathbb{R}}$ is $\mathbb{R} \cup \{\pm\infty\}$. If $\Lambda$ and $V$ are some non-empty sets, by $V^\Lambda$ we will denote the space of $V$-valued functions defined on $\Lambda$:
$$V^\Lambda = \{f : \Lambda \to V\}.$$
If $\Pi \subseteq \Lambda$ and $f \in V^\Lambda$, by $f_\Pi$ we will denote the restriction of $f$ to $\Pi$, i.e., $f_\Pi \in V^\Pi$. If $\Pi_1$ and $\Pi_2$ are disjoint, $f \in V^{\Pi_1}$, and $g \in V^{\Pi_2}$, by $f_{\Pi_1} g_{\Pi_2}$ we will denote the unique element $h \in V^{\Pi_1 \cup \Pi_2}$ such that $h_{\Pi_1} = f$ and $h_{\Pi_2} = g$.
1.2. Measure theory
1.2.1. Suppose $\Omega$ is some non-empty set. A collection $\mathcal{A}$ of subsets of $\Omega$ is called a $\sigma$-algebra if it satisfies the following properties:
(i) $\Omega \in \mathcal{A}$;
(ii) if $A \in \mathcal{A}$, then $A^c = \Omega \setminus A \in \mathcal{A}$;
(iii) for any sequence $\{A_n\}_{n \in \mathbb{N}}$ with $A_n \in \mathcal{A}$ for every $n \in \mathbb{N}$, one has $\bigcup_n A_n \in \mathcal{A}$.
Equivalently, a $\sigma$-algebra $\mathcal{A}$ is a collection of subsets of $\Omega$ closed under countable operations such as intersection, union, complement, and symmetric difference. An element of $\mathcal{A}$ is called a measurable subset of $\Omega$. The intersection of two $\sigma$-algebras is again a $\sigma$-algebra. This property allows us, for a given collection $\mathcal{C}$ of subsets of $\Omega$, to define uniquely the minimal $\sigma$-algebra $\sigma(\mathcal{C})$ containing $\mathcal{C}$, as the intersection of all $\sigma$-algebras containing $\mathcal{C}$.
A pair $(\Omega, \mathcal{A})$, where $\mathcal{A}$ is some $\sigma$-algebra of subsets of $\Omega$, is called a measurable space. If the space $\Omega$ is metrizable (e.g., $\Omega = [0, 1]$ or $\mathbb{R}$), one can define the Borel $\sigma$-algebra of $\Omega$, denoted by $\mathcal{B}(\Omega)$, as the minimal $\sigma$-algebra containing all open subsets of $\Omega$.
1.2.2. Suppose $(\Omega_1, \mathcal{A}_1)$ and $(\Omega_2, \mathcal{A}_2)$ are two measurable spaces. The map $T : \Omega_1 \to \Omega_2$ is called $(\mathcal{A}_1, \mathcal{A}_2)$-measurable if for any $A_2 \in \mathcal{A}_2$, the full preimage of $A_2$ is a measurable subset of $\Omega_1$, i.e.,
$$T^{-1} A_2 = \{\omega \in \Omega_1 : T(\omega) \in A_2\} \in \mathcal{A}_1.$$
By definition, a random variable is a measurable map from $(\Omega_1, \mathcal{A}_1)$ into $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If $(\Omega, \mathcal{A})$ is a measurable space, and $T : \Omega \to \Omega$ is a measurable map, then $T$ is called a measurable transformation of $(\Omega, \mathcal{A})$ or an endomorphism of $(\Omega, \mathcal{A})$. If, furthermore, $T$ is invertible and $T^{-1} : \Omega \to \Omega$ is also measurable, then $T$ is called a measurable isomorphism of $(\Omega, \mathcal{A})$.
1.2.3. Suppose $(\Omega, \mathcal{A})$ is a measurable space. The function $\mu : \mathcal{A} \to [0, +\infty]$ is called a measure if
• $\mu(\varnothing) = 0$;
• for any countable collection of pairwise disjoint sets $\{A_n\}_n$, $A_n \in \mathcal{A}$, one has
$$\mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n). \tag{1.2.1}$$
The triple $(\Omega, \mathcal{A}, \mu)$ is called a measure space. If $\mu(\Omega) = 1$, then $\mu$ is called a probability measure, and $(\Omega, \mathcal{A}, \mu)$ a probability (measure) space.
1.2.4. Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space. Then for any $A \in \mathcal{A}$, the indicator function of $A$ is
$$I_A(\omega) = \begin{cases} 1, & \omega \in A, \\ 0, & \omega \notin A. \end{cases}$$
The Lebesgue integral of $f = I_A$ is defined as
$$\int I_A \, d\mu = \mu(A).$$
A function $f$ is called simple if there exist $K \in \mathbb{N}$, measurable sets $\{A_k\}_{k=1}^K$ with $A_k \in \mathcal{A}$, and non-negative numbers $\{\alpha_k\}_{k=1}^K$ such that
$$f(\omega) = \sum_{k=1}^K \alpha_k I_{A_k}(\omega).$$
The Lebesgue integral of the simple function $f$ is defined as
$$\int f \, d\mu = \sum_{k=1}^K \alpha_k \mu(A_k).$$
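As a small numerical sketch of this construction (the choice of $f(x) = x^2$ on $[0,1]$ with Lebesgue measure, the grid used to estimate the measures of the level sets, and the levels $n$ are all arbitrary illustration choices):

```python
import numpy as np

# Approximate the Lebesgue integral of f(x) = x**2 on [0, 1] (Lebesgue measure)
# by the increasing sequence of simple functions built from dyadic level sets:
#   f_n = sum_k (k / 2**n) * I_{A_{n,k}},  A_{n,k} = {x : k/2**n <= f(x) < (k+1)/2**n}.
# The measures mu(A_{n,k}) are estimated on a fine grid of [0, 1].

f = lambda x: x**2
grid = np.linspace(0.0, 1.0, 1_000_001)
dx = 1.0 / (len(grid) - 1)                       # Lebesgue measure carried by one grid point

for n in (2, 4, 8, 12):
    values = np.floor(f(grid) * 2**n) / 2**n     # value of the simple function f_n at each point
    levels, counts = np.unique(values, return_counts=True)
    integral = np.sum(levels * counts * dx)      # sum_k alpha_k * mu(A_{n,k})
    print(f"n = {n:2d}:  integral of f_n ≈ {integral:.6f}")

print("exact value of the integral:", 1 / 3)
```

As $n$ grows, the integrals of the simple functions $f_n$ increase towards the exact value $1/3$.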
Suppose $\{f_n\}$ is a monotonically increasing sequence of simple functions and $f = \lim_n f_n$. Then the Lebesgue integral of $f$ is defined as
$$\int f \, d\mu = \lim_n \int f_n \, d\mu.$$
It turns out that every non-negative measurable function $f : \Omega \to \mathbb{R}_+$ can be represented as the limit of an increasing sequence of simple functions.
For a measurable function $f : \Omega \to \mathbb{R}$, let $f_+$ and $f_-$ be the positive and the negative part of $f$, respectively, i.e., $f = f_+ - f_-$. The function $f$ is called Lebesgue integrable if
$$\int f_+ \, d\mu, \ \int f_- \, d\mu < +\infty,$$
and the Lebesgue integral of $f$ is then defined as
$$\int f \, d\mu = \int f_+ \, d\mu - \int f_- \, d\mu.$$
The set of all Lebesgue integrable functions on $(\Omega, \mathcal{A}, \mu)$ will be denoted by $L^1(\Omega, \mathcal{A}, \mu)$, or $L^1(\Omega)$ and $L^1(\mu)$, if the latter cases do not lead to confusion.
If $(\Omega, \mathcal{A}, \mu)$ is a probability space, we will sometimes denote the Lebesgue integral $\int f \, d\mu$ by $\mathbb{E}_\mu f$ or $\mathbb{E} f$.
1.2.5. Conditional expectation with respect to a sub-$\sigma$-algebra. Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space, and $\mathcal{F}$ is some sub-$\sigma$-algebra of $\mathcal{A}$. Suppose $f \in L^1(\Omega, \mathcal{A}, \mu)$ is an $\mathcal{A}$-measurable Lebesgue integrable function. The conditional expectation of $f$ given $\mathcal{F}$ will be denoted by $\mathbb{E}(f \mid \mathcal{F})$, and is by definition an $\mathcal{F}$-measurable function on $\Omega$ such that
$$\int_C \mathbb{E}(f \mid \mathcal{F}) \, d\mu = \int_C f \, d\mu$$
for every $C \in \mathcal{F}$.
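When $\mathcal{F}$ is generated by a finite partition, $\mathbb{E}(f \mid \mathcal{F})$ is simply the $\mu$-average of $f$ over each partition cell. A minimal sketch checking the defining identity in such a case (the fair-die space, the observable, and the even/odd partition are arbitrary choices made for the illustration):

```python
import numpy as np

# Toy example: Omega = {0,...,5} (a fair die), f(omega) = omega + 1,
# F = sigma-algebra generated by the partition {even, odd} of Omega.
# For a partition-generated F, E(f | F) is constant on each cell and equals
# the mu-average of f over that cell.

omega = np.arange(6)
mu = np.full(6, 1 / 6)               # uniform probability measure
f = (omega + 1).astype(float)        # the random variable "face value"

cells = [omega[omega % 2 == 0], omega[omega % 2 == 1]]   # the partition generating F

cond_exp = np.empty(6)
for cell in cells:
    cell_avg = np.sum(f[cell] * mu[cell]) / np.sum(mu[cell])
    cond_exp[cell] = cell_avg        # E(f | F) is constant on each cell

# Defining property: the integrals over every C in F agree.
for cell in cells:
    lhs = np.sum(cond_exp[cell] * mu[cell])
    rhs = np.sum(f[cell] * mu[cell])
    print(f"cell {cell}:  ∫_C E(f|F) dµ = {lhs:.4f},  ∫_C f dµ = {rhs:.4f}")
```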
1.2.6. Martingale convergence theorem.
Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space, and $\{\mathcal{A}_n\}$ is a sequence of sub-$\sigma$-algebras of $\mathcal{A}$ such that $\mathcal{A}_n \subseteq \mathcal{A}_{n+1}$ for all $n$. Denote by $\mathcal{A}_\infty$ the minimal $\sigma$-algebra containing all $\mathcal{A}_n$. Then for any $f \in L^1(\Omega, \mathcal{A}, \mu)$,
$$\mathbb{E}_\mu(f \mid \mathcal{A}_n) \to \mathbb{E}_\mu(f \mid \mathcal{A}_\infty)$$
$\mu$-almost surely and in $L^1(\Omega)$.
1.2.7. Absolutely continuous measures and the Radon-Nikodym Theorem. Suppose $\nu$ and $\mu$ are two measures on the measurable space $(\Omega, \mathcal{A})$. The measure $\nu$ is absolutely continuous with respect to $\mu$ (denoted by $\nu \ll \mu$) if
$$\nu(A) = 0 \quad \text{for all } A \in \mathcal{A} \text{ such that } \mu(A) = 0.$$
The Radon-Nikodym theorem states that if $\mu$ and $\nu$ are two $\sigma$-finite measures on $(\Omega, \mathcal{A})$, and $\nu \ll \mu$, then there exists a non-negative measurable function $f$, called the (Radon-Nikodym) density of $\nu$ with respect to $\mu$, such that
$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{A}.$$
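On a finite set the density is just the pointwise ratio of the weights wherever $\mu$ is positive. A minimal sketch, with two made-up measures on a four-point set, verifying $\nu(A) = \int_A f \, d\mu$:

```python
import numpy as np

# Two measures on the finite set {0, 1, 2, 3}; nu << mu because nu vanishes
# wherever mu does.  The Radon-Nikodym density is f = nu / mu on {mu > 0}.

mu = np.array([0.2, 0.3, 0.5, 0.0])
nu = np.array([0.1, 0.6, 0.3, 0.0])

f = np.zeros_like(mu)
pos = mu > 0
f[pos] = nu[pos] / mu[pos]           # the density d(nu)/d(mu)

# Check nu(A) = integral of f over A with respect to mu, for a few sets A.
for A in ([0], [1, 2], [0, 1, 2, 3]):
    print(f"A = {A}:  nu(A) = {nu[A].sum():.3f},  ∫_A f dµ = {(f[A] * mu[A]).sum():.3f}")
```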
1.3. Stochastic processes
Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space. A stochastic process $\{X_n\}$ is a collection of random variables $X_n : \Omega \to \mathbb{R}$, indexed by $n \in T$, where the time index set is
$$T = \mathbb{Z}_+ \text{ or } \mathbb{Z}.$$
A stochastic process can be described by its finite-dimensional distributions (marginals): for every $(n_1, \ldots, n_k) \in T^k$,
$$\mu(X_{n_1} \in \cdot\,, \ldots, X_{n_k} \in \cdot\,)$$
is a probability measure on $\mathbb{R}^k$. In the opposite direction, a consistent family of finite-dimensional distributions can be used to define a stochastic process.
The process $\{X_n\}$ is called stationary if for all $(n_1, \ldots, n_k) \in T^k$ and all $t$,
$$\mu(X_{n_1} \in \cdot\,, \ldots, X_{n_k} \in \cdot\,) = \mu(X_{n_1+t} \in \cdot\,, \ldots, X_{n_k+t} \in \cdot\,),$$
i.e., the time shift does not affect the finite-dimensional marginal distributions.
1.4. Ergodic theory
Ergodic Theory originates from the Boltzmann-Maxwell ergodic hypothesis and is the study of measure-preserving dynamical systems
$$(\Omega, \mathcal{A}, \mu, T)$$
where
• $(\Omega, \mathcal{A}, \mu)$ is a (Lebesgue) probability space,
• $T : \Omega \to \Omega$ is measure preserving: for all $A \in \mathcal{A}$,
$$\mu(T^{-1}A) = \mu(\{\omega \in \Omega : T(\omega) \in A\}) = \mu(A).$$
Examples: $\Omega = \mathbb{T}^1 = \mathbb{R}/\mathbb{Z} \cong [0, 1)$, $\mu$ = Lebesgue measure,
• Circle rotation: $T_\alpha(x) = x + \alpha \mod 1$,
• Doubling map: $T(x) = 2x \mod 1$.
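Measure preservation means that if $X$ is distributed according to $\mu$, then $T(X)$ is again distributed according to $\mu$. A quick Monte Carlo sketch of this fact for the two examples above (the sample size, the random seed, and the choice $\alpha = \sqrt{2} \bmod 1$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(200_000)                        # points sampled from Lebesgue measure on [0, 1)

rotation = lambda x: (x + np.sqrt(2.0)) % 1.0  # circle rotation with alpha = sqrt(2) mod 1
doubling = lambda x: (2.0 * x) % 1.0           # doubling map

# If T preserves Lebesgue measure, T(X) should again look uniform on [0, 1):
for name, T in [("rotation", rotation), ("doubling", doubling)]:
    counts, _ = np.histogram(T(x), bins=10, range=(0.0, 1.0))
    print(name, "bin frequencies:", np.round(counts / len(x), 3))
# Each bin frequency should be close to 0.1.
```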
Example: Bernoulli shifts. Finite set (alphabet) $A = \{1, \ldots, N\}$,
$$\Omega = A^{\mathbb{Z}_+} = \bigl\{\omega = (\omega_n)_{n \ge 0} : \omega_n \in A\bigr\}.$$
A measure $p = (p(1), \ldots, p(N))$ on $A$ can be extended to the measure $\mu = p^{\mathbb{Z}_+}$ on $A^{\mathbb{Z}_+}$:
$$\mu\bigl(\omega : \omega_{i_1} = a_{i_1}, \ldots, \omega_{i_n} = a_{i_n}\bigr) = p(a_{i_1}) \cdots p(a_{i_n}).$$
The measure $\mu$ is clearly preserved by the left shift $\sigma : \Omega \to \Omega$,
$$(\sigma\omega)_n = \omega_{n+1} \quad \text{for all } n.$$
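A short sketch of this example: sample finite prefixes of points of $A^{\mathbb{Z}_+}$ under a Bernoulli measure and check one cylinder probability against the product formula, as well as shift invariance (the alphabet, the vector $p$, the prefix length, and the chosen cylinder are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

A = [1, 2, 3]                        # alphabet
p = np.array([0.5, 0.3, 0.2])        # p(1), p(2), p(3)

# Sample finite prefixes (omega_0, ..., omega_9) of points of A^{Z_+} under mu = p^{Z_+}.
n_samples, prefix_len = 200_000, 10
omega = rng.choice(A, size=(n_samples, prefix_len), p=p)

# Cylinder {omega : omega_0 = 1, omega_3 = 2}: its mu-probability is p(1) * p(2).
empirical = np.mean((omega[:, 0] == 1) & (omega[:, 3] == 2))
print("empirical:", round(empirical, 4), " product formula:", p[0] * p[1])

# The left shift acts on prefixes by dropping the first coordinate;
# the shifted sample has the same distribution (the shift preserves mu).
shifted = omega[:, 1:]
print("P[omega_0 = 1] after the shift:", round(np.mean(shifted[:, 0] == 1), 4))
```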
Definition 1.1. A measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$ is ergodic if every invariant set is trivial:
$$A = T^{-1}A \quad \Longrightarrow \quad \mu(A) = 0 \text{ or } 1,$$
equivalently, if every invariant function is constant:
$$f(\omega) = f(T(\omega)) \ (\mu\text{-a.e.}) \quad \Longrightarrow \quad f(\omega) = \text{const} \ (\mu\text{-a.e.}).$$
Theorem 1.2 (Birkhoff's Pointwise Ergodic Theorem). Suppose $(\Omega, \mathcal{A}, \mu, T)$ is an ergodic measure-preserving dynamical system. Then for all $f \in L^1(\Omega, \mu)$,
$$\frac{1}{n} \sum_{t=0}^{n-1} f(T^t(x)) \to \int f(x) \, \mu(dx) \quad \text{as } n \to \infty,$$
$\mu$-almost surely and in $L^1$.
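A numerical illustration of the theorem, using the circle rotation from Section 1.4 with $f(x) = x$, whose space average is $\int_0^1 x\,dx = 1/2$. The starting point and the angle $\alpha = \sqrt{2} \bmod 1$ are arbitrary; the rotation is used rather than the doubling map because iterating $2x \bmod 1$ in floating-point arithmetic collapses to $0$ after roughly 53 steps.

```python
import numpy as np

alpha = np.sqrt(2.0) % 1.0      # irrational angle: T(x) = x + alpha mod 1 is ergodic
f = lambda x: x                 # observable; its space average is ∫_0^1 x dx = 0.5

x = 0.123                       # arbitrary starting point
total = 0.0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for t in range(1, 100_001):
    total += f(x)               # accumulate f(x), f(Tx), f(T^2 x), ...
    x = (x + alpha) % 1.0       # one step of the rotation
    if t in checkpoints:
        print(f"n = {t:6d}:  Birkhoff average = {total / t:.5f}")
# The time averages approach 0.5, the integral of f against Lebesgue measure.
```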
1.5. Entropy
Suppose $p = (p_1, \ldots, p_N)$ is a probability vector, i.e.,
$$p_i \ge 0, \qquad \sum_{i=1}^N p_i = 1.$$
The entropy of $p$ is
$$H(p) = -\sum_{i=1}^N p_i \log_2 p_i.$$
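A small helper computing $H(p)$, using the standard convention $0 \log 0 = 0$ (the example vectors are arbitrary):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

print(entropy([0.5, 0.5]))           # 1.0 bit
print(entropy([0.25] * 4))           # 2.0 bits
print(entropy([1.0, 0.0, 0.0]))      # 0.0 bits (deterministic)
```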
1.5.1. Shannon’s entropy rate per symbol
Definition 1.3. Suppose $Y = \{Y_k\}$ is a stationary process with values in a finite alphabet $A$, and $\mu$ is the corresponding translation-invariant measure. Fix $n \in \mathbb{N}$, and consider the distribution of $n$-tuples $(Y_0, \ldots, Y_{n-1}) \in A^n$. Denote by $H_n$ the entropy of this distribution:
$$H_n = H(Y_0^{n-1}) = -\sum_{(a_0, \ldots, a_{n-1}) \in A^n} \mathbb{P}[Y_0^{n-1} = a_0^{n-1}] \log_2 \mathbb{P}[Y_0^{n-1} = a_0^{n-1}].$$
The entropy (rate) of the process $Y = \{Y_k\}$, equivalently, of the measure $\mathbb{P}$, denoted by $h(Y)$, $h(\mathbb{P})$, or $h_\sigma(\mathbb{P})$, is defined as
$$h(Y) = h(\mathbb{P}) = h_\sigma(\mathbb{P}) = \lim_{n \to \infty} \frac{1}{n} H_n$$
(the limit exists!).
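For an i.i.d. process with marginal $p$ one has $H_n = nH(p)$, so the entropy rate equals $H(p)$; this gives an easy sanity check on the definition. The sketch below estimates $H_n/n$ from empirical frequencies of non-overlapping $n$-blocks of a simulated i.i.d. sample path (the alphabet, $p$, the sample length, and the plug-in estimator are choices made for the illustration; the plug-in estimate is slightly biased downward when the number of possible blocks becomes comparable to the number of observed blocks).

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

p = np.array([0.7, 0.3])                       # marginal of an i.i.d. process on {0, 1}
true_rate = float(-np.sum(p * np.log2(p)))     # for an i.i.d. process, h(Y) = H(p)

y = rng.choice(len(p), size=200_000, p=p)      # one long sample path Y_0, Y_1, ...

for n in (1, 2, 4, 8):
    # empirical distribution of non-overlapping n-blocks
    blocks = Counter(tuple(y[t:t + n]) for t in range(0, len(y) - n + 1, n))
    freqs = np.array(list(blocks.values()), dtype=float)
    freqs /= freqs.sum()
    H_n = float(-np.sum(freqs * np.log2(freqs)))
    print(f"n = {n}:  H_n / n ≈ {H_n / n:.4f}")

print(f"true entropy rate H(p) = {true_rate:.4f}")
```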
Similarly, the entropy can be defined for stationary random fields $\{Y_n\}_{n \in \mathbb{Z}^d}$, $Y_n \in A$. Let $\Lambda_n = [0, n-1]^d \cap \mathbb{Z}^d$. Then
$$h(Y) = -\lim_{n \to \infty} \frac{1}{|\Lambda_n|} \sum_{a_{\Lambda_n} \in A^{\Lambda_n}} \mathbb{P}[Y_{\Lambda_n} = a_{\Lambda_n}] \log_2 \mathbb{P}[Y_{\Lambda_n} = a_{\Lambda_n}];$$
again one can easily show that the limit exists.
1.5.2. Kolmogorov-Sinai entropy of measure-preserving systems
Suppose $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving dynamical system. Suppose
$$\mathcal{C} = \{C_1, \ldots, C_N\}$$
is a finite measurable partition of $\Omega$, i.e., a partition of $\Omega$ into measurable sets $C_k \in \mathcal{A}$. For every $\omega \in \Omega$ and $n \in \mathbb{Z}_+$ (or $\mathbb{Z}$), put
$$Y_n(\omega) = Y_n^{\mathcal{C}}(\omega) = j \in \{1, \ldots, N\} \quad \Longleftrightarrow \quad T^n(\omega) \in C_j.$$
Proposition: For any $\mathcal{C}$, the corresponding process $Y^{\mathcal{C}} = \{Y_n\}$, $Y_n : \Omega \to \{1, \ldots, N\}$, is a stationary process with
$$\mathbb{P}[Y_0 = j_0, \ldots, Y_n = j_n] = \mu\bigl(\omega \in \Omega : \omega \in C_{j_0}, \ldots, T^n(\omega) \in C_{j_n}\bigr).$$
Definition 1.4. If $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving dynamical system, and $\mathcal{C} = \{C_1, \ldots, C_N\}$ is a finite measurable partition of $\Omega$, then the entropy of $(\Omega, \mathcal{A}, \mu, T)$ with respect to $\mathcal{C}$ is defined as the Shannon entropy of the corresponding symbolic process $Y^{\mathcal{C}}$:
$$h_\mu(T, \mathcal{C}) = h(Y^{\mathcal{C}}).$$
Finally, the measure-theoretic or the Kolmogorov-Sinai entropy of $(\Omega, \mathcal{A}, \mu, T)$ is defined as
$$h_\mu(T) = \sup_{\mathcal{C} \text{ finite}} h_\mu(T, \mathcal{C}).$$
The following theorem of Sinai eliminates the need to consider all finite partitions.
Definition 1.5. A partition $\mathcal{C}$ is called a generating partition (or generator) of the dynamical system $(\Omega, \mathcal{A}, \mu, T)$ if the smallest $\sigma$-algebra containing all $T^{-n}(C_j)$, $j = 1, \ldots, N$, $n \in \mathbb{Z}$, is $\mathcal{A}$.
Theorem 1.6 (Ya. Sinai). If $\mathcal{C}$ is a generating partition, then
$$h_\mu(T) = h_\mu(T, \mathcal{C}).$$
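To tie Definitions 1.3 and 1.4 together: for the doubling map with Lebesgue measure, the partition $\mathcal{C} = \{[0, 1/2), [1/2, 1)\}$ is generating, the symbolic process $Y^{\mathcal{C}}$ is exactly the sequence of binary digits of the starting point, i.e., i.i.d. fair bits, and hence $h_\mu(T) = h_\mu(T, \mathcal{C}) = 1$ bit. The sketch below estimates $H_n/n$ for this symbolic process; representing a Lebesgue-random point directly by its binary digit sequence (on which the doubling map acts as the left shift) avoids the floating-point collapse of iterating $2x \bmod 1$. The sample length and block lengths are arbitrary.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

# Doubling map T(x) = 2x mod 1 with Lebesgue measure and partition
# C = {[0, 1/2), [1/2, 1)}.  Writing x in binary, T shifts the digits, and the
# partition label of T^n(x) is the n-th binary digit of x.  For a Lebesgue-random x
# these digits are i.i.d. fair bits, so Y^C has entropy rate 1 bit per symbol.

bits = rng.integers(0, 2, size=200_000)        # binary digits of a "Lebesgue-random" point

for n in (1, 2, 4, 8):
    # empirical distribution of non-overlapping n-blocks of the symbolic sequence
    blocks = Counter(tuple(bits[t:t + n]) for t in range(0, len(bits) - n + 1, n))
    freqs = np.array(list(blocks.values()), dtype=float)
    freqs /= freqs.sum()
    H_n = float(-np.sum(freqs * np.log2(freqs)))
    print(f"n = {n}:  H_n / n ≈ {H_n / n:.4f}   (expected 1.0)")

# Hence h_mu(T, C) ≈ 1 bit; since C is generating, h_mu(T) = 1 bit as well.
```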