Fast-Slow Skew Product Systems and Convergence to Stochastic Differential Equations
Ian Melbourne
17 April 2015
Abstract
These are notes for the LMS-CMI Research School Statistical Properties of
Dynamical Systems at Loughborough, 13–17 April 2015.
1 Lecture 1: The central limit theorem

1.1 Statement of the result
Let Λ = [0, 1] with Lebesgue measure µ. Consider the doubling map f : [0, 1] → [0, 1]
given by
$$f y = 2y \bmod 1 = \begin{cases} 2y, & 0 \le y < \tfrac12, \\ 2y - 1, & \tfrac12 \le y \le 1. \end{cases}$$
It is easy to verify that µ is invariant: µ(f −1 E) = µ(E) for all measurable E. Although slightly harder to verify, µ is ergodic: if E is measurable and f E ⊂ E, then
µ(E) = 0 or µ(E) = 1. (See Remark 1.8(b) below for a proof of ergodicity.)
Suppose that v : Λ → R is integrable (eg if v is continuous). We consider the
sequence of observations
{v ◦ f n : n ≥ 0}.
Since µ is invariant, it follows immediately that v and v ◦ f are identically distributed,
denoted by v =d v ◦ f . This means that
µ{y ∈ Λ : v(y) < b} = µ{y ∈ Λ : v(f y) < b}
for all b ∈ R.
Inductively v ◦ f m =d v ◦ f n for all m, n and hence {v ◦ f n : n ≥ 0} is a sequence
of identically distributed (but usually not independent) random variables. Hence it
makes sense to ask for statistical properties such as the asymptotic behaviour of
$$v_n = \sum_{j=0}^{n-1} v \circ f^j,$$
as n → ∞.
We mention without proof the simplest such property: the Strong Law of Large
Numbers (SLLN) which here is a consequence of Birkhoff’s pointwise ergodic theorem:
Since µ is an ergodic invariant probability measure and v is integrable,
$$\lim_{n\to\infty} \frac{1}{n}\, v_n = \int_\Lambda v \, d\mu \quad \text{a.e.}$$
This means that
$$\mu\Big\{ y \in \Lambda : \frac{1}{n}\, v_n(y) \to \int_\Lambda v \, d\mu \Big\} = 1.$$
In words, this says that time averages converge to the space average with probability 1.
(In everyday life, this is called the law of averages, whereby bad luck can be shown to even out in the long run; simply overlook the factor of 1/n.)
Now suppose for simplicity that ∫_Λ v dµ = 0. Then (1/n)v_n → 0 a.e. and it makes sense to look at other normalising factors. In the Central Limit Theorem (CLT) one considers n^{−1/2} v_n.¹

¹ If ∫_Λ v dµ ≠ 0, then we would consider n^{−1/2}(v_n − n ∫_Λ v dµ).
This is the first main theorem that we will prove in these lectures.
Theorem 1.1 (Central Limit Theorem) Continue to assume that f is the doubling map and that µ is Lebesgue measure. Suppose that v : Λ → R is Lipschitz and that ∫_Λ v dµ = 0. Then
(a) The limit
$$\sigma^2 = \lim_{n\to\infty} \int_\Lambda \Big( \frac{1}{\sqrt{n}}\, v_n \Big)^2 d\mu$$
exists.
(b) n^{−1/2} v_n →d N(0, σ²) as n → ∞. This means that
$$\lim_{n\to\infty} \mu\Big\{ y\in\Lambda : \frac{1}{\sqrt n}\, v_n(y) < b \Big\} = P(Y < b) = \int_{-\infty}^{b} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}\, dx \tag{1.1}$$
for all b ∈ R, where Y ∼ N(0, σ²) is a normally distributed random variable with mean zero and variance σ².

(c) Typically (in a very strong sense that will be made precise later) σ² > 0.
Remark 1.2 (a) Note that in (1.1) the randomness on the LHS only occurs through
the initial condition y. Once the initial condition is specified, everything is deterministic. The RHS on the other hand is a genuinely random object.
(b) In the IID (independent, identically distributed) setting, it would suffice that ∫_Λ v² dµ < ∞. However, in the current setting, the CLT fails for typical square integrable (or even continuous) v. Some regularity is required (it is not necessary that v is Lipschitz; Hölder continuity or bounded variation would suffice).
(c) This result belongs to the topic known as smooth ergodic theory where f , µ and v
are required to be well-behaved. In a purely measure-theoretic set up, such a result
is not possible.
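Theorem 1.1 is easy to explore numerically. The following sketch is not part of the original notes; the observable v(y) = cos 2πy and the simulation method are choices made here. Under Lebesgue measure the binary digits of y are iid fair coins and f acts as the left shift, so orbits are simulated by shifting random bit strings (iterating 2y mod 1 in floating point collapses to 0 after about 53 steps). For this particular v, all correlations ∫_Λ v · v∘f^n dµ with n ≥ 1 vanish, so σ² = ∫_Λ v² dµ = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

def doubling_orbits(num_samples, n, depth=60):
    # Orbit segments y, fy, ..., f^{n-1}y for Lebesgue-random y, computed by
    # shifting random binary digits (f is the shift on binary expansions).
    bits = rng.integers(0, 2, size=(num_samples, n + depth)).astype(float)
    weights = 0.5 ** np.arange(1, depth + 1)
    return np.stack([bits[:, j:j + depth] @ weights for j in range(n)], axis=1)

v = lambda y: np.cos(2 * np.pi * y)   # Lipschitz observable with mean zero

n, samples = 400, 20000
vn = v(doubling_orbits(samples, n)).sum(axis=1)   # Birkhoff sums v_n

# Theorem 1.1(b): v_n / sqrt(n) is approximately N(0, 1/2) for this v
print(np.var(vn / np.sqrt(n)))   # ~ 0.5
```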
1.2 Koopman and transfer operators
Before getting to the proof of Theorem 1.1, we need some background material on
Koopman and transfer operators. In this section, we suppose that f : Λ → Λ is a
general transformation of a probability space (Λ, µ) with µ invariant under f .
Define the Koopman operator U : L¹(Λ) → L¹(Λ), Uv = v∘f.
Some elementary properties of U are the following:
(U1) U1 = 1.

(U2) ∫_Λ Uv dµ = ∫_Λ v dµ. In particular, if ∫_Λ v dµ = 0, then ∫_Λ Uv dµ = 0.

(U3) ‖Uv‖_p = ‖v‖_p for all v ∈ L^p(Λ), 1 ≤ p ≤ ∞.

(U4) ∫_Λ Uv Uw dµ = ∫_Λ vw dµ, for all v ∈ L¹(Λ), w ∈ L^∞(Λ).
(U1) is immediate and the others follow from invariance of the measure. For example, if p ∈ [1, ∞), then by definition ‖v‖_p = (∫_Λ |v|^p dµ)^{1/p}, and
$$\int_\Lambda |Uv|^p \, d\mu = \int_\Lambda |v|^p \circ f \, d\mu = \int_\Lambda |v|^p \, d\mu,$$
proving (U3). The proof of (U4) is left as an exercise.
Remark 1.3 (U3) and (U4) were presented as nice properties, but actually they say that U does not contract, which is not so pleasant given that we are interested in the asymptotics of v_n = Σ_{j=0}^{n−1} v∘f^j = Σ_{j=0}^{n−1} U^j v. Moreover, when passing to regular v (which is necessary by Remark 1.2(b)) the situation is even worse. For example, if f is the doubling map and v is differentiable, then |(Uv)′| = 2|v′| (where defined). Iterating, |(U^n v)′| = 2^n |v′|.
To remedy the situation in Remark 1.3, we pass to the dual or adjoint of U in the
hope that this will have the opposite behaviour to U and hence have good contraction
properties.
In the L2 situation, the transfer operator (or Perron-Frobenius operator) P is
defined to be the adjoint of U , so
P = U ∗ : L2 (Λ) → L2 (Λ).
This means that if v ∈ L2 (Λ), then P v is defined to be the unique element of L2 (Λ)
satisfying
$$\int_\Lambda Pv\, w \, d\mu = \int_\Lambda v\, Uw \, d\mu \quad \text{for all } w \in L^2(\Lambda).$$
More generally, P : L1 (Λ) → L1 (Λ) is defined by requiring that
$$\int_\Lambda Pv\, w \, d\mu = \int_\Lambda v\, Uw \, d\mu \quad \text{for all } w \in L^\infty(\Lambda).$$
We have
(P1) P1 = 1.

(P2) ∫_Λ Pv dµ = ∫_Λ v dµ. In particular, if ∫_Λ v dµ = 0, then ∫_Λ Pv dµ = 0.

(P3) ‖Pv‖_p ≤ ‖v‖_p for all v ∈ L^p, 1 ≤ p ≤ ∞.

(P4) PU = I.

Eg for (P1), note that ∫_Λ P1 w dµ = ∫_Λ 1 Uw dµ = ∫_Λ Uw dµ = ∫_Λ w dµ = ∫_Λ 1 w dµ.

For (P3) with p = 1, take w = sgn Pv. Then ‖Pv‖₁ = ∫_Λ |Pv| dµ = ∫_Λ Pv w dµ = ∫_Λ v Uw dµ ≤ ‖v‖₁ ‖Uw‖_∞ = ‖v‖₁ ‖w‖_∞ = ‖v‖₁.
(P4) is left as an exercise.
1.3 Decay of correlations for the doubling map
First, we write down the explicit formula for the transfer operator for the doubling
map.
Proposition 1.4 $(Pv)(y) = \frac12 \Big\{ v\Big(\frac{y}{2}\Big) + v\Big(\frac{y+1}{2}\Big) \Big\}$.
Proof By change of variables,
$$\int_\Lambda Pv\, w \, d\mu = \int_\Lambda v\, Uw \, d\mu = \int_0^1 v(y)\, w(fy) \, dy = \int_0^{1/2} v(y)\, w(2y) \, dy + \int_{1/2}^1 v(y)\, w(2y-1) \, dy$$
$$= \frac12 \int_0^1 v\Big(\frac{y}{2}\Big) w(y) \, dy + \frac12 \int_0^1 v\Big(\frac{y+1}{2}\Big) w(y) \, dy = \int_0^1 \frac12 \Big\{ v\Big(\frac{y}{2}\Big) + v\Big(\frac{y+1}{2}\Big) \Big\} w(y) \, dy.$$
Recall that Lip v = sup_{x≠y} |v(x) − v(y)|/|x − y|. We can now prove that P contracts Lipschitz constants.

Lemma 1.5 Lip Pv ≤ ½ Lip v.
Proof By Proposition 1.4,
$$(Pv)(x) - (Pv)(y) = \frac12\Big\{ v\Big(\frac{x}{2}\Big) - v\Big(\frac{y}{2}\Big) \Big\} + \frac12\Big\{ v\Big(\frac{x+1}{2}\Big) - v\Big(\frac{y+1}{2}\Big) \Big\},$$
so
$$|(Pv)(x) - (Pv)(y)| \le \frac12 \Big| v\Big(\frac{x}{2}\Big) - v\Big(\frac{y}{2}\Big) \Big| + \frac12 \Big| v\Big(\frac{x+1}{2}\Big) - v\Big(\frac{y+1}{2}\Big) \Big|$$
$$\le \frac12 \operatorname{Lip} v\, \Big|\frac{x}{2} - \frac{y}{2}\Big| + \frac12 \operatorname{Lip} v\, \Big|\frac{x+1}{2} - \frac{y+1}{2}\Big| = \operatorname{Lip} v\, \Big|\frac{x}{2} - \frac{y}{2}\Big| = \frac12 \operatorname{Lip} v\, |x-y|,$$
and the result follows.
Corollary 1.6 |P^n v − ∫_Λ v dµ|_∞ ≤ (1/2^n) Lip v.

Proof Note that if w : Λ → R is Lipschitz, then |w − ∫_Λ w dµ|_∞ ≤ Lip w · diam Λ. In our case, diam Λ = 1. Hence using (P2) and Lemma 1.5,
$$\Big| P^n v - \int_\Lambda v \, d\mu \Big|_\infty = \Big| P^n v - \int_\Lambda P^n v \, d\mu \Big|_\infty \le \operatorname{Lip} P^n v \le \frac{1}{2^n} \operatorname{Lip} v.$$
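Proposition 1.4 and Lemma 1.5 are easy to check numerically. The sketch below is an illustration added here (the observable v(y) = y² − 1/3 is an arbitrary mean-zero Lipschitz choice): both the Lipschitz constant of P^n v and |P^n v|_∞ decay by at least a factor 2 per application of P, as Lemma 1.5 and Corollary 1.6 predict.

```python
import numpy as np

def P(v):
    # Transfer operator of the doubling map (Proposition 1.4):
    # (Pv)(y) = (v(y/2) + v((y+1)/2)) / 2
    return lambda y: 0.5 * (v(y / 2) + v((y + 1) / 2))

def lip(v, grid):
    # Crude grid estimate of the Lipschitz constant of v
    vals = v(grid)
    return np.max(np.abs(np.diff(vals)) / np.diff(grid))

v = lambda y: y**2 - 1.0 / 3.0        # Lipschitz, with int v dmu = 0
grid = np.linspace(0.0, 1.0, 2001)

Pnv = v
for n in range(1, 8):
    Pnv = P(Pnv)
    # Lip P^n v <= 2^{-n} Lip v and |P^n v|_inf <= 2^{-n} Lip v
    print(n, lip(Pnv, grid), np.max(np.abs(Pnv(grid))))
```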
Theorem 1.7 (Decay of correlations) For all v : Λ → R Lipschitz, w ∈ L1 (Λ),
n ≥ 1,
$$\Big| \int_\Lambda v\, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu \Big| \le \frac{1}{2^n} \operatorname{Lip} v \, |w|_1.$$
Proof Compute that
$$\int_\Lambda v\, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu = \int_\Lambda P^n v\, w \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu = \int_\Lambda \Big( P^n v - \int_\Lambda v \, d\mu \Big) w \, d\mu.$$
Hence,
$$\Big| \int_\Lambda v\, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu \Big| \le \Big| P^n v - \int_\Lambda v \, d\mu \Big|_\infty |w|_1,$$
and the result follows from Corollary 1.6.
Remark 1.8 (a) ∫_Λ v w∘f^n dµ − ∫_Λ v dµ ∫_Λ w dµ is called the correlation function for v and w. In probabilistic notation, it is the same as Cov(v, w∘f^n) = E(v · w∘f^n) − Ev Ew, where E here denotes expectation with respect to µ.
(b) Set v = 1A and w = 1B to be the indicator functions of measurable subsets
A, B ⊂ Λ (so 1A is equal to one on A and zero elsewhere). Then the correlation
function becomes µ(A ∩ f −n B) − µ(A)µ(B). It is a standard fact that this can decay
arbitrarily slowly, so Theorem 1.7 does not extend to general L∞ functions. However
using an approximation argument, it can be shown that Theorem 1.7 implies that
limn→∞ µ(A ∩ f −n B) = µ(A)µ(B). In other words f is mixing.
Since f is mixing, we can prove that f is ergodic as stated in Subsection 1.1. Indeed,
suppose f E ⊂ E. Then f n E ⊂ E for all n ≥ 1 and so E ∩ f −n (Λ \ E) = ∅. By mixing
it follows that 0 = µ(E ∩ f −n (Λ \ E)) → µ(E)µ(Λ \ E) and it follows that µ(E) = 0
or µ(Λ \ E) = 0.
(c) Let α ∈ (0, 1]. Define
$$|v|_\alpha = \sup_{x\neq y} \frac{|v(x)-v(y)|}{|x-y|^\alpha}.$$
A function v : Λ → R is α-Hölder if |v|_α < ∞. In particular, 1-Hölder is the same as Lipschitz. It is easy to check that everything proved so far for Lipschitz observables goes over to the Hölder case (with different constants).
1.4 A more general framework
Theorem 1.7 establishes exponential decay of correlations for the doubling map. The statement and proof are somewhat misleading. Suppose that the definition of f is changed slightly as follows. We still take Λ = [0, 1] and f : Λ → Λ has two branches mapping [0, ½] and [½, 1] monotonically onto [0, 1]. Suppose moreover that f is C² on (0, ½) and (½, 1) and extends continuously to [0, ½] and [½, 1]. It is easy to obtain a formula for P analogous to the one in Proposition 1.4 but the proof of Lemma 1.5 breaks down. Nevertheless, using various functional-analytic tools (which will not be discussed further here) it is possible to prove a result like Theorem 1.7 with the RHS replaced by
$$C\gamma^n \|v\|_{\mathrm{Lip}}\, |w|_1,$$
where C > 0, γ ∈ (0, 1) are constants and ‖v‖_Lip = |v|_∞ + Lip v. Obtaining useful estimates on C and γ turns out to be an intractable problem, but this will not matter for the purposes of these lectures.
Let f : Λ → Λ be a transformation of a probability space (Λ, µ) where µ is assumed to be f-invariant (ie µ(f⁻¹E) = µ(E)) and ergodic (ie fE ⊂ E implies that µ(E) = 0 or 1). We suppose that v : Λ → R lies in L^∞(Λ) and ∫_Λ v dµ = 0. Moreover, we require that for this fixed observable v, there are constants C > 0 and β > 1 such that
$$\Big| \int_\Lambda v\, w\circ f^n \, d\mu \Big| \le \frac{C}{n^\beta}\, |w|_1 \quad \text{for all } w \in L^1(\Lambda) \text{ and all } n \ge 1. \tag{1.2}$$
Remark 1.9 This is an assumption on the dynamics f , the measure µ and the
observable v. Again, such a condition cannot hold for general v ∈ L∞ (Λ), so this is
still smooth ergodic theory even though smoothness is not mentioned anywhere.²
Proposition 1.10 (Martingale-coboundary decomposition) Assume that v satisfies (1.2). Then there exist m, χ ∈ L^∞(Λ) such that
$$v = m + \chi\circ f - \chi, \qquad \int_\Lambda m \, d\mu = 0, \quad m \in \ker P. \tag{1.3}$$
Proof It follows by duality (ie by making judicious choices of w ∈ L¹(Λ)) that |P^n v|_∞ ≤ Cn^{−β}. Since β > 1, it follows that |P^n v|_∞ is summable, and hence following [8] we can define L^∞ functions m, χ : Λ → R by setting
$$\chi = \sum_{n=1}^{\infty} P^n v, \qquad m = v - \chi\circ f + \chi = v - U\chi + \chi.$$
Notice that ∫_Λ m dµ = 0 (since ∫_Λ m dµ = ∫_Λ v dµ − ∫_Λ χ∘f dµ + ∫_Λ χ dµ = −∫_Λ χ dµ + ∫_Λ χ dµ = 0).
By (P4),
$$Pm = Pv - PU\chi + P\chi = Pv - \chi + P\chi = Pv - \sum_{n=1}^{\infty} P^n v + \sum_{n=2}^{\infty} P^n v = 0.$$
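For the doubling map this decomposition can be carried out numerically. The sketch below is an illustration added here: functions are stored as values on a uniform grid, P is applied via the formula of Proposition 1.4 with linear interpolation, and χ is truncated after 40 terms (the tail is below 2^{−40} Lip v by Corollary 1.6). Since m is only an L^∞ function (χ∘f jumps at y = 1/2), the kernel check is done in the mean rather than in the sup-norm.

```python
import numpy as np

N = 2**15
grid = np.linspace(0.0, 1.0, N + 1)

def P(vals):
    # Transfer operator of the doubling map on grid values:
    # (Pv)(y) = (v(y/2) + v((y+1)/2)) / 2, via linear interpolation
    return 0.5 * (np.interp(grid / 2, grid, vals)
                  + np.interp((grid + 1) / 2, grid, vals))

v = grid**2 - 1.0 / 3.0               # Lipschitz observable, mean zero

chi, Pnv = np.zeros_like(v), v.copy()
for _ in range(40):                   # chi = sum_{n=1}^{40} P^n v
    Pnv = P(Pnv)
    chi += Pnv

fgrid = np.where(grid < 0.5, 2 * grid, 2 * grid - 1)   # the doubling map
m = v - np.interp(fgrid, grid, chi) + chi              # m = v - chi o f + chi

print(np.abs(P(m)).mean())   # ~ 0 : m lies in ker P (a.e.)
print(m.mean())              # ~ 0 : int m dmu = 0
print((m**2).mean())         # estimate of sigma^2 = int m^2 dmu (Prop. 1.14)
```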
In the remainder of this subsection, we explain why this decomposition might be
useful. The key points are that (i) the CLT for v is equivalent to the CLT for m,
and (ii) the sequence {m ◦ f j } has certain orthogonality properties that {v ◦ f j } does
not.
For point (i), recall that v_n = Σ_{j=0}^{n−1} v∘f^j. Similarly, define m_n = Σ_{j=0}^{n−1} m∘f^j. Hence v_n = m_n + χ∘f^n − χ. Moreover |χ∘f^n − χ|_∞ ≤ 2|χ|_∞, so n^{−1/2} v_n − n^{−1/2} m_n → 0 a.e. Hence we have:
² By now it should be clear that "smooth" does not necessarily mean differentiable. Assumptions such as Lipschitz or Hölder count as smooth.
Corollary 1.11 Suppose that Y is a random variable. Then n−1/2 vn →d Y if and
only if n−1/2 mn →d Y .
Remark 1.12 The conclusion of Corollary 1.11 remains true if χ ∈ L1 (Λ), see the
exercises.
Concerning point (ii), we have the following orthogonality property.
Corollary 1.13 The sequence {m∘f^j : j ≥ 0} is orthogonal: ∫_Λ m∘f^j m∘f^k dµ = 0 for all 0 ≤ j < k. More generally,
$$\int_\Lambda m\circ f^{j_1}\; m\circ f^{j_2} \cdots m\circ f^{j_r} \, d\mu = 0,$$
for all 0 ≤ j₁ < ⋯ < j_r, r ≥ 1.
Proof The case r = 1 is immediate by invariance of µ since ∫_Λ m dµ = 0. For r ≥ 2,
$$\int_\Lambda m\circ f^{j_1}\, m\circ f^{j_2} \cdots m\circ f^{j_r}\, d\mu = \int_\Lambda \{ m\; m\circ f^{j_2-j_1} \cdots m\circ f^{j_r-j_1} \}\circ f^{j_1}\, d\mu = \int_\Lambda m\; m\circ f^{j_2-j_1} \cdots m\circ f^{j_r-j_1}\, d\mu$$
$$= \int_\Lambda m\, \{ m\circ f^{j_2-j_1-1} \cdots m\circ f^{j_r-j_1-1} \}\circ f \, d\mu = \int_\Lambda Pm\; m\circ f^{j_2-j_1-1} \cdots m\circ f^{j_r-j_1-1}\, d\mu = 0$$
by Proposition 1.10.
The next result proves Theorem 1.1(a), and in the process identifies σ² as ∫_Λ m² dµ.

Proposition 1.14 lim_{n→∞} (1/n) ∫_Λ v_n² dµ = ∫_Λ m² dµ.

Proof It follows from Corollary 1.13 that ∫_Λ m_n² dµ = Σ_{j=0}^{n−1} ∫_Λ (m∘f^j)² dµ (since the cross-terms vanish). Hence n^{−1} ∫_Λ m_n² dµ = ∫_Λ m² dµ.
Since v_n = m_n + χ∘f^n − χ,
$$\big| |v_n|_2 - |m_n|_2 \big| \le |v_n - m_n|_2 = |\chi\circ f^n - \chi|_2 \le |\chi\circ f^n|_2 + |\chi|_2 = 2|\chi|_2.$$
Hence n^{−1/2}|v_n|_2 − n^{−1/2}|m_n|_2 → 0. Taking squares,
$$\lim_{n\to\infty} \frac{1}{n} \int_\Lambda v_n^2 \, d\mu = \lim_{n\to\infty} \frac{1}{n} \int_\Lambda m_n^2 \, d\mu = \int_\Lambda m^2 \, d\mu.$$
Lemma 1.15 (Bounded convergence theorem) Suppose that f_n ∈ L^∞(Λ), n ≥ 1. If |f_n|_∞ is bounded and f_n → f a.e., then ∫_Λ f_n dµ → ∫_Λ f dµ.
Lemma 1.16 (Lévy continuity theorem) Suppose that Y_n, n = 1, 2, ... and Y are random variables with values in R^k. Then Y_n →d Y in R^k if and only if lim_{n→∞} E(e^{it·Y_n}) = E(e^{it·Y}) for all t ∈ R^k.
Proof of Theorem 1.1(b) By Corollary 1.11, it suffices to show that n^{−1/2} m_n →d N(0, σ²) where σ² = ∫_Λ m² dµ. By the Lévy continuity theorem, it suffices to show that lim_{n→∞} ∫_Λ e^{itn^{−1/2} m_n} dµ = e^{−t²σ²/2} for each fixed t ∈ R.
We follow McLeish [13]. Starting from log(1 + ix) = ix + ½x² + O(|x|³), we obtain e^{ix} = (1 + ix) exp{−½x² + O(|x|³)}. Hence,
$$\exp\{itn^{-1/2} m_n\} = \prod_{j=0}^{n-1} \exp\{itn^{-1/2}\, m\circ f^j\} = T_n e^{-U_n},$$
where
$$T_n = \prod_{j=0}^{n-1}\big(1 + itn^{-1/2}\, m\circ f^j\big), \qquad U_n = \frac{t^2}{2n}\sum_{j=0}^{n-1} m^2\circ f^j + O\Big(\frac{1}{n^{3/2}}\sum_{j=0}^{n-1} |m|^3\circ f^j\Big) = \frac{t^2}{2n}\sum_{j=0}^{n-1} m^2\circ f^j + O(n^{-1/2}).$$
Now T_n = 1 + ⋯ where the remaining 2^n − 1 terms are of the form in Corollary 1.13. It follows from Corollary 1.13 that ∫_Λ T_n dµ = 1. Also, by the pointwise ergodic theorem,
$$U_n \to \tfrac12 t^2 \int_\Lambda m^2 \, d\mu = \tfrac12 t^2\sigma^2 \quad \text{a.e.}$$
Hence
$$\int_\Lambda e^{itn^{-1/2} m_n}\, d\mu - e^{-\frac12 t^2\sigma^2} = \int_\Lambda T_n e^{-U_n}\, d\mu - e^{-\frac12 t^2\sigma^2} = \int_\Lambda T_n\big(e^{-U_n} - e^{-\frac12 t^2\sigma^2}\big)\, d\mu,$$
where the final integrand converges to 0 a.e. We claim that the integrand is uniformly bounded. The result then follows from the bounded convergence theorem.
Now,
$$|T_n| = \prod_{j=0}^{n-1}\big(1 + n^{-1}t^2 |m\circ f^j|^2\big)^{1/2}.$$
In particular, |T_n| ≥ 1. Since e^{−U_n} = e^{itn^{−1/2} m_n}/T_n, it follows that |e^{−U_n}|_∞ ≤ 1. Finally, |T_n|_∞ ≤ d_n where
$$d_n = \prod_{j=0}^{n-1}\big(1 + n^{-1}t^2 |m|_\infty^2\big)^{1/2} = \big(1 + n^{-1}t^2 |m|_\infty^2\big)^{n/2} \to \exp\{\tfrac12 t^2 |m|_\infty^2\}.$$
In particular, d_n is bounded, so T_n is uniformly bounded.
Corollary 1.17 σ² = 0 if and only if v = h∘f − h for some h ∈ L^∞(Λ).

Proof Since ∫_Λ m² dµ = σ², it follows that if σ² = 0, then m = 0 a.e. and so the decomposition (1.3) gives v = h∘f − h with h = χ. Conversely, if v = h∘f − h, then v_n = h∘f^n − h so that |v_n|_∞ ≤ 2|h|_∞ is bounded and hence σ² = 0.
Remark 1.18 In the case of the doubling map, it follows from Lemma 1.5 and Corollary 1.6 that Σ_{n=1}^∞ P^n v converges in the space of Lipschitz functions, so χ is Lipschitz. Hence in the situation of Corollary 1.17, it is possible to evaluate χ at specific points. Suppose that f^p y = y for some y ∈ Λ. Then Σ_{j=0}^{p−1} v(f^j y) = χ(f^p y) − χ(y) = 0. Hence degeneracy in the CLT (σ² = 0) imposes infinitely many degeneracy conditions on the observable v. This completes the proof of Theorem 1.1(c).
In the generality of Subsection 1.4, we have only that χ ∈ L∞ (Λ), so χ(y) is not
defined for specific points y. However it is often possible to show that if v is regular,
and v = h ◦ f − h for some measurable h, then h is also regular. Such a result is called
a Livšic regularity theorem. In this case, periodic points again place constraints on v.
Remark 1.19 The results in this lecture relied on the assumption that |P n v|∞ is
summable. This suffices for the uniformly expanding setting, and results for large
classes of nonuniformly expanding systems can be reduced to this situation by inducing. As indicated in later lectures, there then exist standard ways to pass to invertible
maps and flows.
On the other hand, there are numerous ways to relax the assumption Σ_{n=1}^∞ |P^n v|_∞ < ∞. In particular, Tyran-Kamińska [19] used ideas of [12] to show that
$$\sum_{n=1}^{\infty} n^{-3/2} \Big| \sum_{k=0}^{n-1} P^k v \Big|_2 < \infty$$
is a sufficient condition for the CLT.
Remark 1.20 Suppose that f : Λ → Λ is a transformation of a measure space Λ with ergodic invariant probability measure µ. Suppose that µ₀ is a probability measure on Λ (not necessarily invariant) and that µ₀ is absolutely continuous with respect to µ. Let v : Λ → R be measurable with v_n = Σ_{j=0}^{n−1} v∘f^j. By [7] (see also [21, Corollary 1]), n^{−1/2} v_n →d Y on (Λ, µ) if and only if n^{−1/2} v_n →d Y on (Λ, µ₀). In particular, we have strong distributional convergence in Theorem 1.1(b), namely that n^{−1/2} v_n →d N(0, σ²) on (Λ, µ₀) for any probability measure µ₀ absolutely continuous with respect to µ.
2 Lecture 2: Multidimensional limit laws
So far, we considered convergence in distribution for R-valued random variables. Now
we consider random vectors with values in Rd and random elements with values in
the Banach space C([0, 1], Rd ).
We assume throughout that f : Λ → Λ is a transformation and that µ is an
ergodic, invariant probability measure.
2.1 Multidimensional CLT
Consider vector-valued observables v : Λ → R^d lying in L^∞(Λ, R^d), so v = (v¹, ..., v^d) where each v^i is in L^∞(Λ). As usual, we assume that ∫_Λ v dµ = 0.
Moreover, we suppose that each component v^i satisfies condition (1.2), so there are constants C > 0, β > 1 such that
$$\Big| \int_\Lambda v^i\, w\circ f^n \, d\mu \Big| \le \frac{C}{n^\beta}\, |w|_1 \quad \text{for all } w \in L^1(\Lambda),\ n \ge 1,\ i = 1, \dots, d.$$
The transfer operator P acts naturally on vector-valued observables with Pv = (Pv¹, ..., Pv^d). Working coordinatewise, it follows that Σ_{n=1}^∞ P^n v converges in L^∞(Λ, R^d) and hence that v = m + χ∘f − χ where m, χ ∈ L^∞(Λ, R^d), ∫_Λ m dµ = 0, and m ∈ ker P.
Define the covariance matrix Σ = ∫_Λ m m^T dµ ∈ R^{d×d}. Note that Σ^T = Σ.
Proposition 2.1 lim_{n→∞} (1/n) ∫_Λ v_n v_n^T dµ = Σ.

Proof In coordinates, this reduces to showing that lim_{n→∞} (1/n) ∫_Λ v_n^i v_n^j dµ = ∫_Λ m^i m^j dµ. (Notice that (v_n)^i = (v^i)_n, which is hence abbreviated to v_n^i.)
The diagonal terms i = j can be treated as in the one-dimensional case, and the off-diagonal terms can be treated using polarization. The details are left to the exercises.
The following elementary idea reduces convergence in distribution for random
vectors in Rd to the 1-dimensional setting.
Proposition 2.2 (Cramer-Wold device) Suppose that Yn , n = 1, 2, . . . and Y are
random vectors in Rd . Then Yn →d Y in Rd if and only if c · Yn →d c · Y in R for all
c ∈ Rd .
Proof This is left as an exercise.
Theorem 2.3 (d-dimensional CLT) n^{−1/2} v_n →d N(0, Σ). That is,
$$\lim_{n\to\infty} \mu\Big\{ x\in\Lambda : \frac{1}{\sqrt n}\, v_n(x) \in I \Big\} = \int_I \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}} \exp\big\{-\tfrac12 y^T\Sigma^{-1} y\big\}\, dy,$$
for all I = (a₁, b₁) × ⋯ × (a_d, b_d) ⊂ R^d.
Moreover, det Σ = 0 if and only if there exists nonzero c ∈ R^d such that c^T v = h∘f − h for some h ∈ L^∞(Λ).
Proof We apply Proposition 2.2 with Y_n = n^{−1/2} v_n and Y ∼ N(0, Σ).
It is elementary that AY ∼ N(0, AΣA^T) for all A ∈ R^{e×d}. In particular, if c ∈ R^d, then c·Y = c^T Y ∼ N(0, σ_c²) where σ_c² = c^T Σc. Hence E(e^{it(c·Y)}) = e^{−t²σ_c²/2}.
On the other hand, c·Y_n = n^{−1/2}(c·v)_n where (c·v)_n is the one-dimensional Birkhoff sum starting from the observable c·v : Λ → R. Note that c·v = c·m + h∘f − h where h = c·χ. It follows from Theorem 1.1(b) that
$$n^{-1/2}(c\cdot v)_n \to_d N(0, \sigma_c^2),$$
where σ_c² = ∫_Λ (c·m)² dµ.
To prove the CLT, it remains to verify that σ_c² = c^T Σc for all c. But
$$c^T\Sigma c = c^T\Big(\int_\Lambda m\, m^T\, d\mu\Big) c = \int_\Lambda (c\cdot m)(m\cdot c)\, d\mu = \int_\Lambda (c\cdot m)^2\, d\mu = \sigma_c^2$$
as required.
For nondegeneracy, note that det Σ = 0 if and only if there exists c ≠ 0 such that Σc = 0. But c^T Σc = ∫_Λ (c·m)² dµ. Hence Σc = 0 if and only if c·m = 0, in which case c·v = h∘f − h with h = c·χ.
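As an illustration (added here, reusing the bit-shift simulation of the doubling map from Lecture 1 with a two-dimensional observable chosen for the sketch), the covariance matrix Σ can be estimated via Proposition 2.1:

```python
import numpy as np

rng = np.random.default_rng(1)

def doubling_orbits(num_samples, n, depth=60):
    # Bit-shift simulation of the doubling map, as in Lecture 1
    bits = rng.integers(0, 2, size=(num_samples, n + depth)).astype(float)
    weights = 0.5 ** np.arange(1, depth + 1)
    return np.stack([bits[:, j:j + depth] @ weights for j in range(n)], axis=1)

def v(y):
    # A mean-zero Lipschitz observable with values in R^2
    return np.stack([np.cos(2 * np.pi * y), y - 0.5], axis=-1)

n, samples = 400, 20000
vn = v(doubling_orbits(samples, n)).sum(axis=1)        # shape (samples, 2)

# Proposition 2.1: (1/n) E[v_n v_n^T] -> Sigma (a symmetric 2x2 matrix)
Sigma = (vn[:, :, None] * vn[:, None, :]).mean(axis=0) / n
print(Sigma)
```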
2.2 Statement of the weak invariance principle (WIP)
To begin with, we revert to the case of real-valued observables v : Λ → R. Assume the framework of Subsection 1.4. By Theorem 1.1, n^{−1/2} v_n →d N(0, σ²). Define a sequence of continuous time processes W_n : [0, 1] → R as follows. Set
$$W_n(t) = \frac{1}{\sqrt n}\, v_{nt} \quad \text{for } t = \frac{j}{n},\ j = 0, 1, \dots, n.$$
Then linearly interpolate to obtain a continuous function W_n ∈ C[0, 1]. (So on the interval [(j−1)/n, j/n], the graph of W_n(t) is a straight line joining the points ((j−1)/n, n^{−1/2} v_{j−1}) and (j/n, n^{−1/2} v_j).)
Note that W_n(t) = n^{−1/2} v_{[nt]} + O(n^{−1/2}) uniformly (the error is at most n^{−1/2}|v|_∞). Also for each t ≥ 0, it holds that W_n(t) →d t^{1/2} N(0, σ²) =d N(0, σ²t).
The function W_n is a random element in the Banach space C[0, 1] with the sup-norm. (A random element is like a random variable but with values in an infinite-dimensional Banach space.)
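A sample path of W_n is cheap to generate; the sketch below (an illustration added here, again using the bit-shift simulation of the doubling map) builds the piecewise linear function W_n from the Birkhoff sums:

```python
import numpy as np

rng = np.random.default_rng(2)

def doubling_orbit(n, depth=60):
    # One Lebesgue-random orbit of the doubling map via the bit-shift trick
    bits = rng.integers(0, 2, size=n + depth).astype(float)
    weights = 0.5 ** np.arange(1, depth + 1)
    return np.array([bits[j:j + depth] @ weights for j in range(n)])

def W_n(n, v=lambda y: y**2 - 1.0 / 3.0):
    # W_n takes the value v_j / sqrt(n) at t = j/n and is linearly
    # interpolated in between (np.interp supplies the interpolation)
    y = doubling_orbit(n)
    partial = np.concatenate([[0.0], np.cumsum(v(y))]) / np.sqrt(n)
    return lambda t: np.interp(t, np.arange(n + 1) / n, partial)

path = W_n(1000)
print(path(np.linspace(0.0, 1.0, 5)))   # one sample path, approximately Brownian
```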
The next result is called the WIP (also known as the Functional CLT).
Theorem 2.4 (WIP) Let W be Brownian motion with variance σ 2 , where σ 2 is as
in Theorem 1.1. Then Wn →w W in C[0, 1].
Remark 2.5 (a) Recall that a Brownian motion with variance σ 2 is a continuous
time stochastic process {W (t); t ≥ 0} with W (0) = 0 such that
• independent increments The increments W (t1 ) − W (t0 ), . . . , W (tk ) − W (tk−1 )
are independent for all 0 = t0 ≤ t1 ≤ · · · ≤ tk .
• stationarity W (t) − W (s) =d W (t − s) for all t ≥ s ≥ 0.
• normality W (t) =d N (0, σ 2 t) for all t ≥ 0.
• continuous sample paths P(t 7→ W (t) is continuous) = 1.
The last condition implies that we can regard W as a random element in C[0, 1].
(b) More concisely, W is the unique random element in C[0, 1] such that for all
0 = t0 ≤ t1 ≤ · · · ≤ tk ≤ 1, k ≥ 1,
W (t1 ) − W (t0 ), . . . , W (tk ) − W (tk−1 ) ∼d N (0, Σ),
where Σ = σ 2 diag{t1 − t0 , . . . , tk − tk−1 }.
Note that it is not a priori clear that such a process exists or is unique. The first
mathematical proof was by Wiener (1923). Continuity of sample paths can be viewed
as a property of Brownian motion, but actually such a hypothesis is required in order
to obtain uniqueness.
(c) Weak convergence of random elements Wn →w W is analogous to convergence in distribution but now in the infinite-dimensional space C[0, 1]. That is,
limn→∞ µ(Wn ∈ I) = P(W ∈ I) for all “nice” subsets of C[0, 1]. Here “nice” means
that I is an open set with P(∂I) = 0.
Another characterisation of weak convergence is the continuous mapping theorem.
This was the original formulation of the WIP by Donsker, 1951.
Theorem 2.6 (Continuous Mapping Theorem) Suppose that Yn , Y are random
elements in C[0, 1]. The following are equivalent:
(a) Yn →w Y in C[0, 1].
(b) χ(Yn ) →d χ(Y ) in R for all continuous maps χ : C[0, 1] → R.
Example 2.7 (a) Taking χ(g) = g(1) in Theorem 2.6 yields the CLT since χ(Wn ) =
Wn (1) = n−1/2 vn and χ(W ) = W (1) =d N (0, σ 2 ). Hence the WIP implies the CLT
and much more. This explains the terminology Functional CLT.
(b) Theorem 2.6 holds also with R replaced by R^k for all k. (See the exercises.) Let t₁, ..., t_k ∈ [0, 1] and define χ : C[0, 1] → R^k by setting χ(g) = (g(t₁), ..., g(t_k)). Then χ is continuous and it follows that if Y_n →w Y, then (Y_n(t₁), ..., Y_n(t_k)) →d (Y(t₁), ..., Y(t_k)). This is called convergence of finite-dimensional distributions.
2.3 Prokhorov's Theorem
Consider the sequence of random variables {Yn } in R where Yn = n (ie P(Yn = n) = 1)
for each n. It is clear that Yn does not converge in distribution. The notion of tightness
rules out this kind of example.
Definition 2.8 A family A of real-valued random variables is tight if for any ε > 0, there exists L > 0 such that P(|Y| ≤ L) > 1 − ε for all Y ∈ A.
Tightness is a necessary condition for convergence in distribution.
Proposition 2.9 (a) Any real-valued random variable is tight (ie a family consisting
of one random variable is tight).
(b) If Yn →d Y in R, then {Yn } is tight.
Proof See the exercises.
The converse of part (b) is not true. For example, the sequence Yn = (−1)n
is tight and does not converge in distribution, though it has plenty of convergent
subsequences. We state without proof:
Theorem 2.10 Suppose that {Yn } is a sequence of real-valued random variables.
If {Yn } is tight then there exists a subsequence {Ynk : k ≥ 1} that converges in
distribution.
A standard method for showing that Yn →d Y in R is to (i) prove tightness
and (ii) show that if a subsequence Ynk converges in distribution then the limit must
be Y. The fact that this suffices is proved as follows. Suppose for contradiction that Y_n ↛d Y. Then there exists b > 0 such that P(Y_n < b) ↛ P(Y < b). Hence there is
a subsequence Zk = Ynk such that P(Zk < b) is bounded away from P(Y < b). But
tightness of {Yn } immediately implies tightness of {Zk }, so there is a subsequence
{Zk` } that converges in distribution. Note that Zk` is also a subsequence of {Yn } so
by (ii), Zk` →d Y . In particular P(Zk` < b) → P(Y < b) which contradicts the fact
that P(Zk < b) is bounded away from P(Y < b).
Tightness in R^k is very similar. A sequence {Y_n} of R^k-valued random variables is tight if for any ε > 0 there exists L > 0 such that P(Y_n ∈ [−L, L] × ⋯ × [−L, L]) > 1 − ε for all n ≥ 1. Equivalently, there exists a compact subset K ⊂ R^k such that P(Y_n ∈ K) > 1 − ε.
The analogues of Proposition 2.9 and Theorem 2.10 remain true.
Definition 2.11 A sequence of random elements {Y_n} in C[0, 1] is tight if for any ε > 0 there is a compact subset K ⊂ C[0, 1] such that P(Y_n ∈ K) > 1 − ε for all n ≥ 1.
Prokhorov’s Theorem implies the analogues of Proposition 2.9 and Theorem 2.10.
Theorem 2.12 (Prokhorov) Let Y_n, Y ∈ C[0, 1] be random elements. Then Y_n →w Y if and only if

(i) (Convergence of finite-dimensional distributions) (Y_n(t₁), ..., Y_n(t_k)) →d (Y(t₁), ..., Y(t_k)) as n → ∞, for all t₁, ..., t_k ∈ [0, 1], k ≥ 1.

(ii) (Tightness) The sequence {Y_n} is tight: for any ε > 0 there is a compact set K ⊂ C[0, 1] such that P(Y_n ∈ K) > 1 − ε for all n ≥ 1.
Using tightness in C[0, 1] is much harder than in R^k since the characterisation of compact subsets is more complicated. Recall by the Arzelà-Ascoli Theorem that K ⊂ C[0, 1] is compact if and only if K is closed, bounded (in the sup-norm) and equicontinuous: for any ε > 0, there exists δ > 0 such that |g(s) − g(t)| < ε for all s, t ∈ [0, 1] with |s − t| < δ and all g ∈ K. (So this is uniform continuity with the additional property that the same δ works for all g ∈ K.)
One class of compact subsets of C[0, 1] is the following. Let γ ∈ (0, 1], R > 0. Define
$$B_R^\gamma = \Big\{ g \in C[0,1] : g(0) = 0 \text{ and } \sup_{s\neq t} \frac{|g(t)-g(s)|}{|t-s|^\gamma} \le R \Big\}.$$
It follows easily from the Arzelà-Ascoli theorem that B_R^γ is compact in C[0, 1].
2.4 Proof of the WIP
To prove Theorem 2.4, it suffices to verify the two conditions in Theorem 2.12. As
usual, it suffices to prove that Mn →w W where Mn ∈ C[0, 1] is defined in the same
way as Wn but starting from m instead of v. In particular, Mn (t) = n−1/2 m[nt] +
O(n−1/2 ).
First we verify convergence of finite-dimensional distributions. By Cramer-Wold, it suffices to prove that
$$E\exp\{it\, c\cdot(M_n(t_1), \dots, M_n(t_k))\} \to E\exp\{it\, c\cdot(W(t_1), \dots, W(t_k))\},$$
for all t ∈ R, c ∈ R^k. Equivalently, setting t₀ = 0, it suffices to prove that E e^{itY_n} → E e^{itY} for all t ∈ R, c ∈ R^k, where
$$Y_n = c\cdot(M_n(t_1) - M_n(t_0), \dots, M_n(t_k) - M_n(t_{k-1})), \qquad Y = c\cdot(W(t_1) - W(t_0), \dots, W(t_k) - W(t_{k-1})).$$
Now Y = c^T N(0, Σ) = N(0, σ_c²), where
$$\Sigma = \sigma^2\operatorname{diag}\{t_1 - t_0, \dots, t_k - t_{k-1}\}, \qquad \sigma_c^2 = c^T\Sigma c = \sigma^2\big(c_1^2(t_1 - t_0) + \dots + c_k^2(t_k - t_{k-1})\big).$$
Also,
$$Y_n = \frac{1}{\sqrt n}\Big\{c_1\big(m_{[nt_1]} - m_{[nt_0]}\big) + \dots + c_k\big(m_{[nt_k]} - m_{[nt_{k-1}]}\big)\Big\} + O(n^{-1/2}) = \frac{1}{\sqrt n}\sum_{\ell=1}^{k} c_\ell \sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1} m\circ f^j + O(n^{-1/2}) = \frac{1}{\sqrt n}\sum_{j=0}^{[nt_k]-1} d_{j,n}\, m\circ f^j + O(n^{-1/2}),$$
where d_{j,n} ∈ {c₁, ..., c_k}.
Now we follow the proof of Theorem 1.1(b). Write e^{itY_n} = T_n e^{−U_n} where
$$T_n = \prod_{j=0}^{[nt_k]-1}\big(1 + itn^{-1/2} d_{j,n}\, m\circ f^j\big), \qquad U_n = \frac{t^2}{2n}\sum_{j=0}^{[nt_k]-1} d_{j,n}^2\, m^2\circ f^j + O(n^{-1/2}).$$
Then ∫_Λ T_n dµ = 1 and |T_n|_∞ is bounded as before. Also |e^{−U_n}|_∞ ≤ 1 and
$$U_n = \frac{t^2}{2}\,\frac{1}{n}\sum_{\ell=1}^{k} c_\ell^2 \sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1} m^2\circ f^j + O(n^{-1/2}).$$
Note that
$$\frac{1}{n}\sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1} m^2\circ f^j = \frac{1}{n}\sum_{j=0}^{[nt_\ell]-1} m^2\circ f^j - \frac{1}{n}\sum_{j=0}^{[nt_{\ell-1}]-1} m^2\circ f^j = t_\ell\,\frac{1}{nt_\ell}\sum_{j=0}^{[nt_\ell]-1} m^2\circ f^j - t_{\ell-1}\,\frac{1}{nt_{\ell-1}}\sum_{j=0}^{[nt_{\ell-1}]-1} m^2\circ f^j \to (t_\ell - t_{\ell-1})\,\sigma^2 \quad \text{a.e.}$$
by the ergodic theorem. Hence
$$U_n \to \tfrac12 t^2\big(c_1^2(t_1 - t_0) + \dots + c_k^2(t_k - t_{k-1})\big)\sigma^2 = \tfrac12 t^2\sigma_c^2 \quad \text{a.e.}$$
and it follows as in the proof of Theorem 1.1 that ∫_Λ T_n e^{−U_n} dµ → e^{−t²σ_c²/2} as required.
Next we check tightness. A useful criterion for tightness of M_n is given by [1, Lemma, p. 88].
Proposition 2.13 Suppose that
$$\lim_{\lambda\to\infty}\, \limsup_{n\to\infty}\, \lambda^2\, \mu\big(\max_{k\le n} |m_k| \ge \lambda\sqrt n\big) = 0.$$
Then {M_n} is tight.³
Corollary 2.14 The sequence {M_n} is tight in C[0, 1].

Proof Let p > 2. By Corollary 3.12 (which is postponed until after the definition of martingale), there is a constant C > 0 such that ∫_Λ max_{j≤n} |m_j|^p dµ ≤ Cn^{p/2} for all n ≥ 1. By Markov's inequality⁴,
$$\mu\big(\max_{j\le n} |m_j| \ge \lambda n^{1/2}\big) = \mu\big(\max_{j\le n} |m_j|^p \ge \lambda^p n^{p/2}\big) \le \int_\Lambda \max_{j\le n} |m_j|^p\, d\mu \big/ (\lambda^p n^{p/2}) \le C/\lambda^p.$$
Hence limsup_{n→∞} µ(max_{j≤n} |m_j| ≥ λn^{1/2}) ≤ C/λ^p and
$$\lambda^2 \limsup_{n\to\infty} \mu\big(\max_{j\le n} |m_j| \ge \lambda n^{1/2}\big) \le C/\lambda^{p-2} \to 0$$
as λ → ∞. The result follows from Proposition 2.13.
2.5 Multidimensional WIP
Obviously we can now generalise to vector-valued observables v : Λ → R^d as in Subsection 2.1. The sequence of random elements W_n ∈ C([0, 1], R^d) is defined exactly as in the one-dimensional case. Let Σ = ∫_Λ m m^T dµ (corresponding to the decomposition v = m + χ∘f − χ). Define W ∈ C([0, 1], R^d) to be d-dimensional Brownian motion with covariance Σ (the definition is the same as in the 1-dimensional case, except that N(0, σ²t) is replaced by N(0, Σt)).
To prove the multi-dimensional WIP, it again suffices to check convergence of finite-dimensional distributions (which follows from Cramer-Wold; of course the formulas are even messier) and tightness (which follows componentwise from the 1-dimensional result).
Remark 2.15 As in Remark 1.20, we have strong distributional convergence in the
limit laws in this section, so they hold for all probability measures µ0 absolutely
continuous with respect to µ. Moreover, it again suffices that the limit law holds for
at least one µ0 . For the WIP, this is [21, Corollary 3].
³ The result in [1] is for processes arising from stationary sequences, not necessarily martingales. A sequence {v∘f^n, n ≥ 0} is stationary precisely when f is measure preserving.
⁴ Recall that Markov's inequality states that if X is a nonnegative random variable, then P(X > λ) ≤ λ^{−1} EX for all λ > 0.
3 Lecture 3: Extensions

3.1 Martingales; the CLT and WIP revisited
Definition 3.1 A sequence of L¹ random variables {S_n; n ≥ 1} is a martingale if E(S_{n+1}|S₁, ..., S_n) = S_n for all n ≥ 1.
More generally, the sequence Sn is a martingale with respect to an increasing
sequence of σ-algebras F1 ⊂ F2 ⊂ . . . (called a filtration) if for all n ≥ 1
(i) Sn is Fn -measurable.
(ii) E(Sn+1 |Fn ) = Sn .
Remark 3.2 The meaning of E(S_{n+1}|S₁, ..., S_n) = S_n is familiar from elementary probability. Conditional expectation operators of the type E(S_{n+1}|F_n) are defined as follows. Let (Λ, M, µ) be the underlying probability space and let A ⊂ M be a σ-algebra. If Y ∈ L¹(Λ), then the conditional expectation E(Y|A) is defined to be the unique A-measurable function satisfying the relation
$$\int_A E(Y|\mathcal A)\, d\mu = \int_A Y\, d\mu, \quad \text{for all } A \in \mathcal A.$$
It can be shown that E(Y|Y₁, ..., Y_n) = E(Y|A) where A is the σ-algebra generated by Y₁, ..., Y_n. Hence E(Y|A) generalises the elementary definition of conditional expectation.
Proposition 3.3 (a) If Y ∈ L¹(Λ) is A-measurable, then E(Y|A) = Y.
(b) Suppose that (Λ₁, M₁, µ₁), (Λ₂, M₂, µ₂) are probability spaces and π : Λ₁ → Λ₂ is measure-preserving. Suppose that Y ∈ L¹(Λ₂) and that A ⊂ M₂ is a σ-algebra. Then E(Y∘π | π⁻¹A) = E(Y|A)∘π.
Proof Part (a) is immediate from the definitions. For part (b), it is required to show that the random variable E(Y|A)∘π satisfies the required properties on Λ₁. Certainly it is π⁻¹A-measurable since E(Y|A) is measurable, and for A ∈ A
$$\int_{\pi^{-1}A} E(Y|\mathcal A)\circ\pi\, d\mu_1 = \int_A E(Y|\mathcal A)\, d\mu_2 = \int_A Y\, d\mu_2 = \int_{\pi^{-1}A} Y\circ\pi\, d\mu_1.$$
Proposition 3.4 Suppose that {F_n; n ≥ 1} is an increasing sequence of σ-algebras and that S_n = Σ_{k=1}^n d_k. Then {S_n; n ≥ 1} is a martingale wrt {F_n; n ≥ 1} if and only if for all k ≥ 1, (i) E|d_k| < ∞, (ii) d_k is F_k-measurable, (iii) E(d_{k+1}|F_k) = 0.
Proof This is an easy exercise.
We now state without proof the martingale CLT/WIP.
Theorem 3.5 (Brown [3]) Let f : Λ → Λ be an ergodic measure-preserving transformation, and let Y ∈ L²(Λ) with EY = 0. Suppose that S_n = Σ_{j=0}^{n−1} Y∘f^j is a martingale. Then the CLT and WIP are valid. That is, n^{−1/2} S_n →d N(0, σ²) where σ² = EY², and if we define Q_n ∈ C[0, 1], Q_n(t) = n^{−1/2} S_{[nt]} + O(n^{−1/2}), then Q_n →w W where W is Brownian motion with variance σ².
We referred to (1.3) as a martingale-coboundary decomposition, suggesting that
m ∈ ker P means that m is a martingale in some sense. The next lemma points
towards this – though time goes in the wrong direction!
Lemma 3.6 Let (Λ, B, µ) be the underlying probability space.
(a) B ⊃ f −1 B ⊃ f −2 B ⊃ . . . is a decreasing sequence of σ-algebras, and m ◦ f n is
f −n B-measurable for all m ∈ L1 (Λ), n ≥ 0.
(b) U P = E( · |f −1 B).
Proof Part (a) is obvious. Part (b) is left as an exercise.
If f : Λ → Λ were invertible, then we could define the (increasing) filtration F_n = f^n B and consider the backward Birkhoff sums m_n^− = Σ_{j=−n}^{−1} m∘f^j. To make sense of the backward sums we pass to the natural extension. We state without proof:

Proposition 3.7 Suppose that f : Λ → Λ is a surjective transformation and that µ is an f-invariant probability measure. There exists an invertible transformation f̃ : Λ̃ → Λ̃ with an f̃-invariant probability measure µ̃, as well as a measure-preserving projection π : Λ̃ → Λ such that π∘f̃ = f∘π. If µ is ergodic, then the construction can be chosen so that µ̃ is ergodic.
Remark 3.8 The natural extension is a minimal extension of the type in Proposition 3.7 and is unique up to isomorphism. These facts are not required here.
Let F₀ = π⁻¹B⁵ and define F_n = f̃ⁿ F₀. Notice that
$$\mathcal F_{-1} = \tilde f^{-1}\mathcal F_0 = \tilde f^{-1}\pi^{-1}\mathcal B = \pi^{-1} f^{-1}\mathcal B.$$
Since f⁻¹B ⊂ B it follows that F₋₁ ⊂ F₀. Hence F_n ⊂ F_{n+1} for all n ∈ Z. In particular, the sequence of σ-algebras {F_n, n ≥ 1} defines a filtration.

⁵ F₀ is a proper σ-subalgebra of the underlying σ-algebra on Λ̃.
Next, let m : Λ → R, m ∈ L²(Λ), ∫_Λ m dµ = 0. Define the lifted observable m̃ = m∘π : Λ̃ → R and the forward and backward Birkhoff sums
$$\tilde m_n = \sum_{j=0}^{n-1} \tilde m\circ\tilde f^j, \qquad \tilde m_n^- = \sum_{j=-n}^{-1} \tilde m\circ\tilde f^j.$$
Since π is measure-preserving, it is immediate that ∫_Λ̃ m̃ dµ̃ = 0 and ∫_Λ̃ m̃² dµ̃ = ∫_Λ m² dµ.
Proposition 3.9 If m ∈ ker P, then m̃_n^− is a martingale.
Proof This is left as an exercise.
Proof of the CLT: martingale proof It follows from Theorem 3.5 and Proposition 3.9 that n^{−1/2} m̃_n^− →d N(0, σ²) where σ² = ∫_Λ̃ m̃² dµ̃ = ∫_Λ m² dµ. But m̃_n = m_n∘π =d m_n and m̃_n^− = m̃_n∘f̃^{−n} =d m̃_n, so n^{−1/2} m_n =d n^{−1/2} m̃_n^− →d N(0, σ²). Hence n^{−1/2} v_n →d N(0, σ²).
Proof of the WIP: martingale proof Define W_n, M_n in C[0, 1] as in Lecture 2. These are defined on the probability space (Λ, µ). On (Λ̃, µ̃), we define the corresponding forward and backward processes M̃_n, M̃_n^− in C[0, 1] such that
$$\tilde M_n(t) = n^{-1/2}\,\tilde m_{[nt]} + O(n^{-1/2}), \qquad \tilde M_n^-(t) = n^{-1/2}\,\tilde m_{[nt]}^- + O(n^{-1/2}).$$
It follows from Theorem 3.5 and Proposition 3.9 that M̃_n^− →w W.
We claim that there is a continuous map χ : C[0, 1] → C[0, 1] such that
$$\tilde M_n\circ\tilde f^{-n} = \chi(\tilde M_n^-), \qquad \chi(W) =_d W. \tag{3.1}$$
By the infinite-dimensional version of the continuous mapping theorem [1, p. 26], χ(M̃_n^−) →w χ(W) =d W.⁶ Equation (3.1) states that M̃_n =d χ(M̃_n^−), so M̃_n →w W. Moreover M̃_n = M_n∘π =d M_n, so M_n →w W and hence W_n →w W.
It remains to verify the claim. Consider the continuous function χ : C[0, 1] → C[0, 1] given by χ(g)(t) = g(1) − g(1 − t). A calculation (in the exercises) shows that M̃_n∘f̃^{−n} = χ(M̃_n^−), and it follows easily from the definition of Brownian motion that χ(W) =d W.
Remark 3.10 (a) For the CLT and WIP, it suffices that the martingale-coboundary decomposition holds in L² (ie m, χ ∈ L²(Λ)). Indeed the argument above shows that M_n →w W requires only that m lies in L². Moreover, sup_{t∈[0,1]} |W_n(t) − M_n(t)| → 0 a.e. provided that χ lies in L². The last statement is left as an exercise.
(b) The multi-dimensional versions of the CLT and WIP again follow from the
Cramer-Wold device.
⁶ Here =d signifies equality in distribution as elements of C[0, 1].
Another standard result about martingales is the following:

Theorem 3.11 (Burkholder's inequality [4]) Let p ≥ 2. Suppose that S_n = Σ_{k=1}^n d_k is a martingale with d_k ∈ L^p for k ≥ 1. There is a universal constant C_p such that |max_{1≤j≤n} |S_j||_p ≤ C_p n^{1/2} max_{k≤n} |d_k|_p for all n ≥ 1.

Corollary 3.12 Let p > 2. There exists C > 0 such that |max_{j≤n} |m_j||_p ≤ Cn^{1/2} for all n ≥ 1.
Proof This is left as an exercise.
3.2 Invertible systems
First of all, a negative result:
Proposition 3.13 Suppose that f : Λ → Λ is invertible with invariant measure µ.
Then P v = v ◦ f −1 . In particular P = U −1 .
Proof ∫_Λ Pv w dµ = ∫_Λ v Uw dµ = ∫_Λ v w∘f dµ = ∫_Λ v∘f⁻¹ w dµ.

This means that the set up in Subsection 1.4 is never satisfied for invertible dynamical systems since ker P = {0}.⁷
However in many situations it is possible to use a method of Sinai [17] and Bowen [2] to reduce to the noninvertible case. Roughly speaking, v = v̂ + χ₁∘f − χ₁ where v̂ is constant along stable manifolds and projects to an observable on a noninvertible dynamical system.⁸ We then obtain a martingale-coboundary decomposition at the level of the noninvertible system and the coboundary there can be combined with the coboundary χ₁∘f − χ₁.
This can be formalised as follows: Let f : Λ → Λ be our invertible transformation with ergodic invariant measure µ. Suppose that there is an associated noninvertible transformation f̄ : Λ̄ → Λ̄ with ergodic invariant measure µ̄. Moreover, suppose that there is a measure-preserving semiconjugacy π : Λ → Λ̄, satisfying π∘f = f̄∘π. Let P denote the transfer operator for f̄.

Definition 3.14 Let v ∈ L^∞(Λ, R^d) with ∫_Λ v dµ = 0. We say that v admits a martingale-coboundary decomposition if there exist m ∈ L^∞(Λ̄, R^d) and χ ∈ L^∞(Λ, R^d) such that
$$v = m\circ\pi + \chi\circ f - \chi, \qquad m \in \ker P. \tag{3.2}$$
Given the decomposition (3.2), the situation is roughly as follows.

(1) Statistical limit laws for v are equivalent to statistical limit laws for m∘π.

(2) Since π_*µ = µ̄, statistical limit laws for m∘π on (Λ, µ) are equivalent to statistical limit laws for m on (Λ̄, µ̄).

(3) Since m ∈ ker P, statistical limit laws for m follow as in the previous sections.

⁷ There still exist results on decay of correlations, but these require regularity for w as well as v.
⁸ Even if v is C^∞, usually v̂ is only Hölder.
Here are more details. Let m̂ = m∘π, so v = m̂ + χ∘f − χ.

• Since π_*µ = µ̄ and m̂ = m∘π, it is immediate that ∫_Λ̄ m dµ̄ = ∫_Λ m̂ dµ = ∫_Λ v dµ = 0.

• Define Σ = ∫_Λ̄ m m^T dµ̄. Then
$$\Sigma = \int_\Lambda (m\, m^T)\circ\pi\, d\mu = \int_\Lambda \hat m\, \hat m^T\, d\mu.$$

• Define m_n = Σ_{j=0}^{n−1} m∘f̄^j and m̂_n = Σ_{j=0}^{n−1} m̂∘f^j. Since π∘f^j = f̄^j∘π, it follows that m̂_n = m_n∘π. Since m ∈ ker P, it follows easily that
$$\Sigma = n^{-1} \int_{\bar\Lambda} m_n m_n^T\, d\bar\mu = n^{-1} \int_\Lambda \hat m_n \hat m_n^T\, d\mu.$$

• As in the noninvertible case, n^{−1} ∫_Λ v_n v_n^T dµ − n^{−1} ∫_Λ m̂_n m̂_n^T dµ → 0, so we obtain that lim_{n→∞} n^{−1} ∫_Λ v_n v_n^T dµ = Σ.

• Since m̂_n = m_n∘π, the CLT for m implies the CLT for m̂ and hence v; similarly for the WIP.
Remark 3.15 Again, it suffices that m, χ lie in L2 .
3.3 Flows
Suppose that ẏ = g(y) is an ODE on R^m with flow φ_t : R^m → R^m. Let Y ⊂ R^m be a (local) Poincaré cross-section to the flow (dim Y = m − 1). Define the first hit time τ : Y → R⁺ and Poincaré map f : Y → Y given by
$$\tau(y) = \min\{t > 0 : \phi_t(y) \in Y\}, \qquad f(y) = \phi_{\tau(y)}(y).$$
For convenience, we assume that there are constants C₂ ≥ C₁ > 0 such that C₁ ≤ τ(y) ≤ C₂ for all y ∈ Y.
Invariant sets and probability measures If Λ is an invariant set for f, then Ω = ∪_{t≥0} φ_t(Λ) is an invariant set for φ_t. Moreover, if µ_Λ is an ergodic invariant probability measure for f : Λ → Λ, then there is a natural method to construct an ergodic invariant probability measure µ_Ω for φ_t : Ω → Ω. (Here, invariant means that µ_Ω(φ_t E) = µ_Ω(E) for all t and all measurable sets E. Ergodic means that if E ⊂ Ω is invariant, then µ_Ω(E) = 0 or 1.)
To construct µ_Ω, consider the suspension
$$\Lambda^\tau = \{(y, u) \in \Lambda\times\mathbb R : 0 \le u \le \tau(y)\}/\sim, \quad \text{where } (y, \tau(y)) \sim (fy, 0),$$
and the suspension flow f_t : Λ^τ → Λ^τ where f_t(y, u) = (y, u + t) computed modulo identifications. Let τ̄ = ∫_Λ τ dµ_Λ. Then µ_{Λ^τ} = µ_Λ × Lebesgue/τ̄ is an ergodic invariant probability measure⁹ for the suspension flow.
Finally, it is clear that p : Λ^τ → Ω, p(y, u) = φ_u(y), defines a semiconjugacy (φ_t∘p = p∘f_t) and so µ_Ω = p_*µ_{Λ^τ} is the desired measure.
Observables and sequences of random elements Given v : Ω → R^d lying in L^∞ with ∫_Ω v dµ_Ω = 0, define the induced observable v̂ : Λ → R^d,
$$\hat v(y) = \int_0^{\tau(y)} v(\phi_t y)\, dt.$$
Then v̂ ∈ L^∞(Λ, R^d) and ∫_Λ v̂ dµ_Λ = 0.
For the map f : Λ → Λ, define v̂_n and Ŵ_n ∈ C([0, 1], R^d) as before, so
$$\hat v_n = \sum_{j=0}^{n-1} \hat v\circ f^j, \qquad \hat W_n(t) = n^{-1/2}\,\hat v_{[nt]} + O(n^{-1/2}).$$
For the flow φ_t : Ω → Ω, define v_t and W_n ∈ C([0, 1], R^d) by setting
$$v_t = \int_0^t v\circ\phi_s\, ds, \qquad W_n(t) = n^{-1/2}\, v_{nt}.$$
The following purely probabilistic result requires no hypotheses of a smooth ergodic theoretic nature. We require that µ_Λ (and hence µ_Ω) is ergodic and we continue to assume (for simplicity) that v ∈ L^∞(Ω, R^d) and τ, τ⁻¹ ∈ L^∞(Λ).

Theorem 3.16 Suppose that Ŵ_n →w Ŵ in C([0, 1], R^d) on (Λ, µ_Λ) where Ŵ is a d-dimensional Brownian motion with covariance Σ̂.
Then W_n →w W in C([0, 1], R^d) on (Ω, µ_Ω) where W is a d-dimensional Brownian motion with covariance Σ = Σ̂/τ̄.
Proof This proof follows [10] (which is based on [16]). First, since p : Λ^τ → Ω is measure preserving, we can replace (Ω, µ_Ω) throughout with (Λ^τ, µ_{Λ^τ}).
The proof divides into two steps:
1. Working with Ŵ_n, pass from convergence on (Λ, µ_Λ) to convergence on (Λ^τ, µ_{Λ^τ}).
2. Working on (Λ^τ, µ_{Λ^τ}), pass from convergence of Ŵ_n to W_n.

Step 1. Passing from (Λ, µ_Λ) to (Λ^τ, µ_{Λ^τ}).

⁹ This notation means that ∫_{Λ^τ} v dµ_{Λ^τ} = (1/τ̄) ∫_Λ ∫_0^{τ(y)} v(y, u) du dµ_Λ.
Extend Ŵ_n to Λ^τ by setting Ŵ_n(x, u) = Ŵ_n(x). Recall that τ ≥ C₁. Form the probability space (Λ^τ, µ_{C₁}) where µ_{C₁} = (µ × Lebesgue|_{[0,C₁]})/C₁. Then it is immediate from the hypothesis on Ŵ_n that Ŵ_n →w Ŵ on (Λ^τ, µ_{C₁}). Since µ_{C₁} is absolutely continuous with respect to µ_{Λ^τ}, it follows by strong distributional convergence (Remark 2.15) that Ŵ_n →w Ŵ on (Λ^τ, µ_{Λ^τ}).
Step 2: Passing from Ŵ_n to W_n.
For technical reasons, rather than working in C([0, 1], R^d), it is convenient to work in the space D([0, 1], R^d) of cadlag (continuous on the right, limits on the left) functions. (The acronym comes from the French: "continue à droite, limite à gauche".) Most of the time, we can work in the sup-norm topology on D([0, 1], R^d).
Recall the notation v_t = ∫_0^t v∘φ_s ds, v̂_n = Σ_{j=0}^{n−1} v̂∘f^j, τ_n = Σ_{j=0}^{n−1} τ∘f^j. Also, W_n(t) = n^{−1/2} v_{nt} and we redefine Ŵ_n(t) = n^{−1/2} v̂_{[nt]}.
For (y, u) ∈ Λ^τ and t > 0, we define the lap number N(t) = N(y, u, t) ∈ N:
$$N(t) = \max\{n \ge 0 : \tau_n(y) \le u + t\}.$$
Then
$$W_n(t) = n^{-1/2}\, v_{nt} = n^{-1/2}\, \hat v_{N(nt)} + O(n^{-1/2}) = \hat W_n(N(nt)/n) + O(n^{-1/2}) = \hat W_n\circ g_n(t) + O(n^{-1/2}),$$
where g_n : Λ^τ → D[0, 1] is a sequence of random elements given by g_n(t) = N(nt)/n. Hence it remains to prove that
$$\hat W_n\circ g_n \to_w (\bar\tau)^{-1/2}\,\hat W \quad \text{in } D([0,1], \mathbb R^d) \text{ on } (\Lambda^\tau, \mu_{\Lambda^\tau}). \tag{3.3}$$
Define ḡ(t) = t/τ̄. By the exercises, sup_{[0,1]} |g_n − ḡ| → 0 a.e. on (Λ^τ, µ_{Λ^τ}). By the previous step, Ŵ_n →w Ŵ on (Λ^τ, µ_{Λ^τ}). Since ḡ is not random, (Ŵ_n, g_n) →w (Ŵ, ḡ) on (Λ^τ, µ_{Λ^τ}).¹⁰ It follows from the continuous mapping theorem that Ŵ_n∘g_n →w Ŵ∘ḡ. But Ŵ∘ḡ(t) = Ŵ(t/τ̄) and {Ŵ(t/τ̄), t ≥ 0} =d {(τ̄)^{−1/2} Ŵ(t), t ≥ 0}, completing the proof of (3.3).
Corollary 3.17 If the Poincaré map f : Λ → Λ and induced observable v̂ : Λ → R^d lie within the set up in Subsection 3.2, so in particular v̂ admits a martingale-coboundary decomposition (3.2), then W_n →w W in C([0, 1], R^d) where W is a d-dimensional Brownian motion with covariance Σ = (τ̄)^{−1} ∫_Λ m m^T dµ_Λ.
Note that in the situation of this corollary, there are no mixing assumptions on the flow. So for example, the WIP holds for any nontrivial uniformly hyperbolic (Axiom A) attractor even though such attractors need not be mixing. (Even when they are mixing, it is usually hard to tell, and even when they are known to be mixing it may be hard to prove it. The corollary shows that it does not matter.)

¹⁰ Since D([0, 1], R^d) is not separable, there is a technical issue here. But since the limit processes Ŵ and ḡ are continuous, convergence in the sup-norm topology is equivalent to convergence in the standard Skorokhod topology, which is separable.
4 Lecture 4: Fast-slow systems
Let ẏ = g(y) be an ODE on R^m generating a flow φ_t : R^m → R^m with invariant set Ω and ergodic invariant probability measure µ supported on Ω.
We consider the fast-slow system of ODEs
$$\dot x_\epsilon = a(x_\epsilon, y_\epsilon) + \epsilon^{-1} b(x_\epsilon)\, v(y_\epsilon), \quad x_\epsilon(0) = \xi \in \mathbb R^d, \quad \text{(slow equation)}$$
$$\dot y_\epsilon = \epsilon^{-2} g(y_\epsilon), \quad y_\epsilon(0) = \eta \in \Omega, \quad \text{(fast equation)}$$
where a : R^d × R^m → R^d, b : R^d → R^{d×e} are C³ (with globally bounded derivatives¹¹) and v : Ω → R^e satisfies the conditions from Subsection 3.3. In particular, v ∈ L^∞(Ω, R^e) and ∫_Ω v dµ = 0. The initial condition ξ ∈ R^d is fixed throughout (but of course η ∈ (Ω, µ) is the sole source of randomness in the fast-slow system).¹²
For the moment, the WIP suffices. So define the family of random elements W_ε ∈ C([0, 1], R^e),
$$W_\epsilon(t) = \epsilon\, v_{t\epsilon^{-2}}, \qquad v_t = \int_0^t v\circ\phi_s\, ds.$$
Then we assume that W_ε →w W in C([0, 1], R^e) where W ∈ C([0, 1], R^e) is e-dimensional Brownian motion with covariance Σ ∈ R^{e×e}.
Proposition 4.1 The slow ODE can be rewritten as
$$\dot x_\epsilon = a(x_\epsilon, y_\epsilon) + b(x_\epsilon)\dot W_\epsilon, \qquad x_\epsilon(0) = \xi. \tag{4.1}$$
Proof This is left as an exercise.
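Fast-slow systems of this type are straightforward to simulate. The following sketch is an illustration added here, and all concrete choices (Lorenz fast dynamics, a(x) = −x so that ā = a, b ≡ 1, v(y) = y₁, the Euler scheme and its step sizes) are assumptions of the sketch; v(y) = y₁ has mean zero with respect to the physical measure by the symmetry of the Lorenz equations. For small ε and d = e = 1, Theorem 4.2 below gives x_ε(1) ≈ X(1) in distribution, where dX = −X dt + dW.

```python
import numpy as np

def lorenz(y, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Classical Lorenz vector field, used here as the fast dynamics g
    return np.array([sigma * (y[1] - y[0]),
                     y[0] * (rho - y[2]) - y[1],
                     y[0] * y[1] - beta * y[2]])

def fast_slow(eps, T=1.0, dt=1e-5, xi=0.0, eta=(1.0, 1.0, 27.0)):
    # Euler scheme for dx = (a(x) + b(x) v(y)/eps) dt, dy = g(y)/eps^2 dt
    # with the illustrative choices a(x) = -x, b = 1, v(y) = y1
    x, y = xi, np.array(eta)
    for _ in range(int(T / dt)):
        x += dt * (-x + y[0] / eps)        # slow equation
        y += (dt / eps**2) * lorenz(y)     # fast equation
    return x

print(fast_slow(eps=0.05))   # one sample of x_eps(1)
```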
More suggestively, we can write
$$dx_\epsilon = a(x_\epsilon, y_\epsilon)\, dt + b(x_\epsilon)\, dW_\epsilon, \qquad x_\epsilon(0) = \xi.$$
Since W_ε →w W in C([0, 1], R^e), a natural guess is that x_ε →w X in C([0, 1], R^d) where
$$dX = \bar a(X)\, dt + b(X) \star dW.$$
Here ā(x) = ∫_Ω a(x, y) dµ(y) and b(X) ⋆ dW is some kind of stochastic integral. The aim is to prove such a result, and in the process to determine the nature of the stochastic integral.
¹¹ Boundedness of the derivatives is not required but simplifies the arguments (and guarantees global existence of solutions).
¹² Such systems are called skew products because the fast equation does not depend on the slow variable. Introducing such dependence raises numerous difficulties and we refer to [5, 6] for results in this direction.
4.1 Diversion: Wong-Zakai approximation
Consider the simpler case where a is a function only of x and ā(x) = a(x). The Wong-Zakai approximation problem seeks the weak limit of the solution of the ODE dx_ε = a(x_ε) dt + b(x_ε) dW_ε given that W_ε →w W. Since W_ε and x_ε are C¹ functions of t, this is a question about smooth approximation of stochastic processes.
In a probabilistic set up, Wong-Zakai [20] gave sufficient conditions under which x_ε →w X where X satisfies the Stratonovich SDE
$$dX = a(X)\, dt + b(X)\circ dW.$$
These conditions are automatically satisfied in one dimension, but not necessarily in higher dimensions. McShane [14] gave counterexamples in two dimensions, and Sussmann [18] showed that there are lots of possible interpretations for the stochastic integral ∫ b(X) ⋆ dW that arise in two dimensions.
The problem arises from the arbitrariness of how to interpolate (how to join the dots) when defining W_n(t) for tn not an integer. But for flows, there is a canonical choice. So a byproduct of solving the fast-slow problem is a definitive solution [10] to the Wong-Zakai approximation problem.
4.2 No multiplicative noise
The first result deals with the case where d = e and b is the d × d identity matrix (so there is no stochastic integral to worry about).

Theorem 4.2 (Melbourne-Stuart [15]) Consider the case d = e, b = I. If W_ε →w W in C([0, 1], R^d), then x_ε →w X in C([0, 1], R^d), where
$$dX = \bar a(X)\, dt + dW, \qquad X(0) = \xi.$$
Proof Case 1: a ≡ 0. The slow equation (4.1) is just ẋ_ε = Ẇ_ε, so x_ε = ξ + W_ε →w X where dX = dW, X(0) = ξ.
Case 2: a(x, y) ≡ ā(x). The slow equation (4.1) is ẋ_ε = ā(x_ε) + Ẇ_ε, so x_ε(t) = ξ + W_ε(t) + ∫_0^t ā(x_ε(s)) ds.
Consider the map G : C([0, 1], R^d) → C([0, 1], R^d) given by G(f) = u where
$$u(t) = \xi + f(t) + \int_0^t \bar a(u(s))\, ds.$$
Then we have shown that x_ε = G(W_ε). It follows just as in standard ODE theory (existence, uniqueness, continuous dependence on initial data) that G : C([0, 1], R^d) → C([0, 1], R^d) is continuous. Moreover, an infinite-dimensional version of the continuous mapping theorem shows that G preserves weak convergence. Hence G(W_ε) →w G(W) and so x_ε →w X where X = G(W).
Finally, X = G(W) means that X(t) = ξ + W(t) + ∫_0^t ā(X(s)) ds. In other words, dX = ā(X) dt + dW, X(0) = ξ, as required.
General case: Let ã(x, y) = a(x, y) − ā(x) and define Z_ε(t) = ∫_0^t ã(x_ε(s), y_ε(s)) ds. Then
$$x_\epsilon(t) = \xi + W_\epsilon(t) + Z_\epsilon(t) + \int_0^t \bar a(x_\epsilon(s))\, ds.$$
Hence x_ε = G(W_ε + Z_ε) where G is the continuous map defined in Case 2. We claim that Z_ε → 0 in probability in C([0, 1], R^d) and hence that W_ε + Z_ε →w W in C([0, 1], R^d). By the continuous mapping theorem, x_ε →w X = G(W), completing the proof.
It remains to verify the claim. First we consider the case where there exists Q₀ > 0 such that |x_ε(t)| ≤ Q₀ for all ε > 0 and all t ∈ [0, 1]. (This proof also applies to the case where a is periodic and the slow equations lie on T^d rather than R^d.)
Note that |x_ε(s) − x_ε(t)| ≤ (|a|_∞ + ε^{−1}|v|_∞)|s − t|. Hence
$$Z_\epsilon(t) = Z_\epsilon([t\epsilon^{-3/2}]\epsilon^{3/2}) + O(\epsilon^{3/2}) = \sum_{n=0}^{[t\epsilon^{-3/2}]-1} \int_{n\epsilon^{3/2}}^{(n+1)\epsilon^{3/2}} \tilde a(x_\epsilon(s), y_\epsilon(s))\, ds + O(\epsilon^{3/2})$$
$$= \sum_{n=0}^{[t\epsilon^{-3/2}]-1} \int_{n\epsilon^{3/2}}^{(n+1)\epsilon^{3/2}} \tilde a(x_\epsilon(n\epsilon^{3/2}), y_\epsilon(s))\, ds + O(\epsilon^{1/2}) = \sum_{n=0}^{[t\epsilon^{-3/2}]-1} \epsilon^2 \int_{n\epsilon^{-1/2}}^{(n+1)\epsilon^{-1/2}} \tilde a(x_\epsilon(n\epsilon^{3/2}), y_1(s))\, ds + O(\epsilon^{1/2})$$
$$= \sum_{n=0}^{[t\epsilon^{-3/2}]-1} \epsilon^{3/2} J_\epsilon(n) + O(\epsilon^{1/2}),$$
where J_ε(n) = ε^{1/2} ∫_{nε^{−1/2}}^{(n+1)ε^{−1/2}} ã(x_ε(nε^{3/2}), y₁(s)) ds. Hence
$$\max_{[0,1]} |Z_\epsilon| \le \sum_{n=0}^{[\epsilon^{-3/2}]-1} \epsilon^{3/2} |J_\epsilon(n)| + O(\epsilon^{1/2}). \tag{4.2}$$
For u ∈ R^d fixed, we define
$$\tilde J_\epsilon(n, u) = \epsilon^{1/2} \int_{n\epsilon^{-1/2}}^{(n+1)\epsilon^{-1/2}} A_u\circ\phi_s\, ds, \qquad A_u(y) = \tilde a(u, y).$$
Note that J̃_ε(n, u) = J̃_ε(0, u)∘φ_{nε^{−1/2}}, and so ∫_Ω |J̃_ε(n, u)| dµ = ∫_Ω |J̃_ε(0, u)| dµ. By the ergodic theorem, ∫_Ω |J̃_ε(0, u)| dµ → 0 as ε → 0 for each n and u.
For any r > 0, there exists a finite subset S ⊂ R^d such that dist(x, S) ≤ r/(2 Lip a) for any x with |x| ≤ Q₀. Then for all n ≥ 0, ε > 0,
$$|J_\epsilon(n)| \le \sum_{u\in S} |\tilde J_\epsilon(n, u)| + r.$$
Hence by (4.2),
$$\int_\Omega \max_{[0,1]} |Z_\epsilon|\, d\mu \le \sum_{n=0}^{[\epsilon^{-3/2}]-1} \epsilon^{3/2} \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(n, u)|\, d\mu + r + O(\epsilon^{1/2}) = \sum_{n=0}^{[\epsilon^{-3/2}]-1} \epsilon^{3/2} \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(0, u)|\, d\mu + r + O(\epsilon^{1/2}) \le \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(0, u)|\, d\mu + r + O(\epsilon^{1/2}).$$
Since r > 0 is arbitrary, we obtain that max_{[0,1]} |Z_ε| → 0 in L¹(Ω, R^d), and hence in probability, as ε → 0.
Finally, we drop the assumption that x_ε is bounded. Let Q > 0 and write Z_ε = Z_{Q,1} + Z_{Q,2} where
$$Z_{Q,1}(t) = Z_\epsilon(t)\, 1_{B_\epsilon(Q)}, \qquad Z_{Q,2}(t) = Z_\epsilon(t)\, 1_{B_\epsilon(Q)^c}, \qquad B_\epsilon(Q) = \big\{\max_{[0,1]} |x_\epsilon| \le Q\big\}.$$
Then for all n ≥ 0, ε > 0,
$$1_{B_\epsilon(Q)}\, |J_\epsilon(n)| \le \sum_{u\in S} |\tilde J_\epsilon(n, u)| + r.$$
Hence by (4.2),
$$\int_\Omega \max_{[0,1]} |Z_{Q,1}|\, d\mu \le \sum_{n=0}^{[\epsilon^{-3/2}]-1} \epsilon^{3/2} \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(n, u)|\, d\mu + r + O(\epsilon^{1/2}) = \sum_{n=0}^{[\epsilon^{-3/2}]-1} \epsilon^{3/2} \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(0, u)|\, d\mu + r + O(\epsilon^{1/2}) \le \sum_{u\in S} \int_\Omega |\tilde J_\epsilon(0, u)|\, d\mu + r + O(\epsilon^{1/2}).$$
Since r > 0 is arbitrary, we obtain for each fixed Q that max_{[0,1]} |Z_{Q,1}| → 0 in L¹(Ω, R^d), and hence in probability, as ε → 0.
Next, since x_ε − W_ε is bounded on [0, 1], for Q sufficiently large
$$\mu\big(\max_{[0,1]} |Z_{Q,2}| > 0\big) \le \mu\big(\max_{[0,1]} |x_\epsilon| \ge Q\big) \le \mu\big(\max_{[0,1]} |W_\epsilon| \ge Q/2\big).$$
Fix c > 0. Increasing Q if necessary, we can arrange that P{max_{[0,1]} |W| ≥ Q/2} < c/4. By the continuous mapping theorem, max_{[0,1]} |W_ε| →d max_{[0,1]} |W|. Hence there exists ε₀ > 0 such that µ{max_{[0,1]} |W_ε| ≥ Q/2} < c/2 for all ε ∈ (0, ε₀). For such ε,
$$\mu\big(\max_{[0,1]} |Z_{Q,2}| > 0\big) < c/2.$$
Shrinking ε₀ if necessary, we also have that µ{max_{[0,1]} |Z_{Q,1}| > c/2} < c/2. Hence µ{max_{[0,1]} |Z_ε| > c} < c as required.
4.3 One-dimensional multiplicative noise
The next result deals with the case of one-dimensional multiplicative noise. Recall that ∫ b(X)∘dW denotes the Stratonovich integral. The only fact that we require about this integral is that it transforms according to the usual laws of calculus.

Theorem 4.3 (Gottwald-Melbourne [9]) Consider the case d = e = 1. If W_ε →w W in C([0, 1], R), then x_ε →w X in C([0, 1], R), where
$$dX = \bar a(X)\, dt + b(X)\circ dW, \qquad X(0) = \xi.$$
Proof Write z_ε = h(x_ε) where h′ = 1/b. Then
$$\dot z_\epsilon = h'(x_\epsilon)\dot x_\epsilon = b(x_\epsilon)^{-1} a(x_\epsilon, y_\epsilon) + \dot W_\epsilon = A(z_\epsilon, y_\epsilon) + \dot W_\epsilon, \qquad z_\epsilon(0) = h(\xi),$$
where A(z, y) = h′(h⁻¹(z)) a(h⁻¹(z), y). By Theorem 4.2, z_ε →w Z where dZ = Ā(Z) dt + dW. Here
$$\bar A(z) = \int_\Omega A(z, y)\, d\mu(y) = h'(h^{-1}(z))\, \bar a(h^{-1}(z)).$$
By the continuous mapping theorem, x_ε = h⁻¹(z_ε) →w h⁻¹(Z), so it remains to determine X = h⁻¹(Z). Since the Stratonovich integral transforms according to the standard laws of calculus,
$$dX = (h^{-1})'(Z)\circ dZ = h'(X)^{-1}\big[\bar A(Z)\, dt + {\circ}\,dW\big] = h'(X)^{-1}\big[h'(X)\bar a(X)\, dt + {\circ}\,dW\big] = \bar a(X)\, dt + b(X)\circ dW,$$
as required.
Remark 4.4 (a) This result extends classical Wong-Zakai approximation [20] from the probabilistic setting to the deterministic setting. Clearly the same argument works in higher dimensions provided that b⁻¹ = dh for some h : R^d → R^d.
Of course, it is usually not the case that b⁻¹ has this form in higher dimensions. Indeed, [20] gave sufficient conditions for convergence with b(X)∘dW Stratonovich, and these conditions were shown in the probabilistic setting to be automatic in one dimension but not in higher dimensions. Moreover, McShane [14] gave counterexamples in two dimensions, and Sussmann [18] showed that numerous different interpretations for the stochastic integral could arise in two dimensions.
(b) The proof of Theorem 4.3 works for general stochastic processes W_ε provided that there is a stochastic integral ∫ W∘dW that transforms according to the standard laws of calculus. In the case of a one-dimensional Lévy process, the appropriate integral is the "Marcus" integral. This case is treated in [9] (the scalings in the fast-slow ODE are modified accordingly).
4.4 Two examples
Before considering the general slow equation (4.1), it is instructive to consider two special cases. Take a ≡ 0 and ξ = 0. Let d = e = 2 and consider the examples defined by
$$b(x^1, x^2) = \begin{pmatrix} 1 & 0 \\ x^1 & 0 \end{pmatrix}, \qquad b(x^1, x^2) = \begin{pmatrix} 1 & 0 \\ 0 & x^1 \end{pmatrix}.$$
In the first case, the slow equation (4.1) becomes
$$\dot x_\epsilon^1 = \dot W_\epsilon^1, \qquad \dot x_\epsilon^2 = x_\epsilon^1 \dot W_\epsilon^1,$$
so
$$x_\epsilon^1 = W_\epsilon^1, \qquad x_\epsilon^2(t) = \int_0^t W_\epsilon^1(s)\,\dot W_\epsilon^1(s)\, ds = \tfrac12 [W_\epsilon^1(t)]^2.$$
The mapping χ : C([0, 1], R) → C([0, 1], R²) given by χ(g) = (g, ½g²) is continuous, so by the continuous mapping theorem x_ε = χ(W_ε¹) →w χ(W¹) = (W¹, ½(W¹)²). Finally, X = χ(W¹) = (W¹, ½(W¹)²) satisfies the SDE
$$dX^1 = dW^1, \qquad dX^2 = W^1\circ dW^1 = X^1\circ dW^1,$$
or simply dX = b(X)∘dW.
In the second case, the slow equation (4.1) becomes
$$\dot x_\epsilon^1 = \dot W_\epsilon^1, \qquad \dot x_\epsilon^2 = x_\epsilon^1 \dot W_\epsilon^2,$$
so
$$x_\epsilon^1 = W_\epsilon^1, \qquad x_\epsilon^2(t) = \int_0^t W_\epsilon^1(s)\,\dot W_\epsilon^2(s)\, ds = \int_0^t W_\epsilon^1\, dW_\epsilon^2.$$
This time, the mapping χ : C([0, 1], R²) → C([0, 1], R²) given by χ(g)(t) = (g¹, ∫_0^t g¹ dg²) is not continuous. The second coordinate is not even well-defined but that is easily fixed; it is the lack of continuity that is a problem. Indeed, the starting point for rough path theory is that the WIP W_ε →w W does not uniquely pin down the weak limit of the process t ↦ ∫_0^t W_ε^1 dW_ε^2.
The second example demonstrates that solving the following "iterated weak invariance principle" is an unavoidable step in understanding homogenization of fast-slow systems. (Fortunately, rough path theory tells us that if we solve this, then we are almost done.)
Given v : Ω → R^d, we define (W_n, 𝕎_n) ∈ C([0, 1], R^d × R^{d×d}) by setting W_n(t) = n^{−1/2} v_{nt} = n^{−1/2} ∫_0^{nt} v∘φ_s ds as before and setting
$$\mathbb W_n^{ij}(t) = \int_0^t W_n^i\, dW_n^j = n^{-1} \int_0^{nt} \Big\{\int_0^s v^i\circ\phi_r\, dr\Big\}\, v^j\circ\phi_s\, ds.$$
The aim of the iterated WIP is to determine the process 𝕎 ∈ C([0, 1], R^{d×d}) such that (W_n, 𝕎_n) →w (W, 𝕎) in C([0, 1], R^d × R^{d×d}).
5 Lecture 5: The iterated WIP
The previous lecture ended with a statement of the iterated WIP problem that needs
to be resolved. As in the case of the WIP, it is convenient to first consider the discrete
time setting.
5.1 The iterated WIP; discrete time
Assume first the noninvertible setting from Section 1.4, so f : Λ → Λ is a transformation of a probability space (Λ, µ) where µ is f-invariant and ergodic. Suppose as in Section 2.1 that v : Λ → R^d lies in L^∞(Λ, R^d) with ∫_Λ v dµ = 0. Moreover, we suppose as before that each component of v satisfies condition (1.2). In particular, we have a martingale-coboundary decomposition
$$v = m + \chi\circ f - \chi, \qquad m, \chi \in L^\infty(\Lambda, \mathbb R^d), \quad \int_\Lambda m\, d\mu = 0, \quad m \in \ker P.$$
It is again convenient to work in the space D([0,1], ℝ^d) of cadlag functions with (most of the time) the sup-norm topology. It is now no longer necessary to linearly interpolate, so we have the simpler definition
\[
W_n(t) = n^{-1/2} v_{[nt]} = n^{-1/2} \sum_{k=0}^{[nt]-1} v \circ f^k. \tag{5.1}
\]
The WIP holds by Theorem 2.4, so W_n →_w W in D([0,1], ℝ^d) where W is d-dimensional Brownian motion with covariance matrix Σ = ∫_Λ m m^T dµ.
Now define the iterated sum 𝕎_n ∈ D([0,1], ℝ^{d×d}) where
\[
\mathbb{W}_n^{ij}(t) = \int_0^t W_n^i\,dW_n^j
= n^{-1} \sum_{0 \le k < \ell \le [nt]-1} v^i \circ f^k\; v^j \circ f^\ell. \tag{5.2}
\]
(The double sum can be taken as a definition of the double integral.)
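For concreteness, (5.1) and (5.2) are cheap to simulate for the doubling map of Lecture 1; a sketch follows (illustrative only, not part of the notes). Naive floating-point iteration of y ↦ 2y mod 1 collapses to the fixed point 0 after about 53 steps, so the orbit is generated through its binary coding, under which f acts as the shift on i.i.d. bits.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
bits = rng.integers(0, 2, size=n + 53)

# Doubling-map orbit via its coding: y_k = sum_{j>=1} bits[k+j-1] 2^{-j},
# truncated to 53 bits (ample for a Lipschitz observable); f shifts the bits.
pows = 0.5 ** np.arange(1, 54)
y = np.array([bits[k:k + 53] @ pows for k in range(n)])

v = np.cos(2 * np.pi * y)        # Lipschitz, mean-zero observable (d = 1)
Wn = np.cumsum(v) / np.sqrt(n)   # W_n(k/n) as in (5.1)

# Iterated sum (5.2) at t = 1: n^{-1} sum_{0 <= k < l <= n-1} v(f^k y) v(f^l y).
S = np.cumsum(v)
WWn = np.sum(v[1:] * S[:-1]) / n

print(Wn[-1], WWn)
```

Repeating this over many independent draws of the bits gives the empirical joint law of (W_n(1), 𝕎_n(1)), which can be compared with the limit identified in Theorem 5.1 below.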
Our immediate aim is to prove:
Theorem 5.1 (Kelly-Melbourne [10]) Assume the above martingale-coboundary decomposition for v and suppose moreover that f is mixing. Then (W_n, 𝕎_n) →_w (W, 𝕎) in D([0,1], ℝ^d × ℝ^{d×d}) where
\[
\mathbb{W}^{ij}(t) = \int_0^t W^i\,dW^j + tE^{ij}, \qquad
E^{ij} = \sum_{n=1}^\infty \int_\Lambda v^i\; v^j \circ f^n\,d\mu.
\]
(Here ∫₀ᵗ W^i dW^j is the Itô integral.)
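Since E^{ij} is a sum of autocorrelations, it is straightforward to estimate by Monte Carlo over Lebesgue-distributed initial conditions. Here is a sketch (illustrative only; the observable is chosen so that, by orthogonality of the Fourier modes cos(2πky) under the doubling map, only the k = 1 term survives and E = 1/2 exactly).

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 500_000, 20            # sample size and number of lags

def v(y):                     # mean zero; Fourier modes 1 and 2
    return np.cos(2 * np.pi * y) + np.cos(4 * np.pi * y)

y = rng.random(M)             # Lebesgue-distributed initial points
v0, fy, E = v(y), y.copy(), 0.0
for k in range(1, K + 1):
    fy = (2.0 * fy) % 1.0     # k-th iterate of the doubling map
    E += np.mean(v0 * v(fy))  # Monte Carlo estimate of int v . v o f^k dmu
print(E)                      # ~ 0.5; only the k = 1 autocorrelation is nonzero
```

For this v the series terminates by inspection; in general one relies on decay of correlations to control the error from truncating at K lags.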
5.2 A cohomological invariant
To exploit the martingale-coboundary decomposition v = m+χ◦f −χ, it is necessary
to figure out how the iterated WIP varies under cohomology. (This was not an issue
for the ordinary WIP since the coboundary telescoped and hence was negligible.)
The following result just requires that f is mixing.
Lemma 5.2 Let (Λ, µ) be a probability space and suppose that f : Λ → Λ is a mixing measure-preserving transformation (ie ∫_Λ φ ψ∘f^n dµ → ∫_Λ φ dµ ∫_Λ ψ dµ for all φ, ψ ∈ L²(Λ)).
Let v, v̂, χ ∈ L²(Λ, ℝ^d) and suppose that v = v̂ + χ∘f − χ. Define (W_n, 𝕎_n) ∈ D([0,1], ℝ^d × ℝ^{d×d}) as in (5.1) and (5.2), and define (Ŵ_n, 𝕎̂_n) using v̂ instead of v. Then
(a) \(\sum_{k=1}^\infty \int_\Lambda (v^i\,v^j\circ f^k - \hat v^i\,\hat v^j\circ f^k)\,d\mu = \int_\Lambda (\chi^i v^j - \hat v^i\,\chi^j\circ f)\,d\mu\).
(b) (W_n, 𝕎_n) − (Ŵ_n, 𝕎̂_n) →_w (0, A) in D([0,1], ℝ^d × ℝ^{d×d}), where
\[
A^{ij}(t) = t \sum_{k=1}^\infty \int_\Lambda (v^i\,v^j\circ f^k - \hat v^i\,\hat v^j\circ f^k)\,d\mu.
\]
Proof This is left as an exercise.
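As a hedged hint for (a) (one possible route under the stated hypotheses, not necessarily the intended argument): substituting v^i = v̂^i + χ^i∘f − χ^i and v^j∘f^k = v̂^j∘f^k + χ^j∘f^{k+1} − χ^j∘f^k gives
\[
v^i\,v^j\circ f^k - \hat v^i\,\hat v^j\circ f^k
= (\chi^i\circ f - \chi^i)\,\hat v^j\circ f^k
+ \hat v^i\,(\chi^j\circ f^{k+1} - \chi^j\circ f^k)
+ (\chi^i\circ f - \chi^i)(\chi^j\circ f^{k+1} - \chi^j\circ f^k).
\]
Integrate, use invariance of µ to shift iterates (eg ∫_Λ χ^i∘f · v̂^j∘f^k dµ = ∫_Λ χ^i v̂^j∘f^{k−1} dµ), and sum over k: each of the three terms telescopes, and mixing kills the boundary terms at k = ∞ since each of them contains a factor of mean zero (note that ∫_Λ v̂ dµ = ∫_Λ v dµ = 0 and ∫_Λ (χ∘f − χ) dµ = 0). What remains is the right-hand side of (a).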
Now we apply this result in the case where v̂ = m ∈ ker P. Accordingly, define (M_n, 𝕄_n) ∈ D([0,1], ℝ^d × ℝ^{d×d}) as in (5.1) and (5.2), but using m instead of v. We already know that M_n →_w W and W_n →_w W.
Corollary 5.3 Assume the hypotheses of Theorem 5.1, and define E ∈ ℝ^{d×d} and A ∈ D([0,1], ℝ^{d×d}) by
\[
E^{ij} = \sum_{k=1}^\infty \int_\Lambda v^i\; v^j \circ f^k\,d\mu, \qquad A(t) = tE.
\]
Suppose that (M_n, 𝕄_n) →_w (W, 𝕄) in D([0,1], ℝ^d × ℝ^{d×d}).
Then (W_n, 𝕎_n) →_w (W, 𝕄 + A) in D([0,1], ℝ^d × ℝ^{d×d}).
Proof This is left as an exercise.
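A hedged hint (again one possible route, not necessarily the intended one): apply Lemma 5.2 with v̂ = m. Since Pm = 0, duality of the transfer operator gives
\[
\int_\Lambda m^i\, m^j \circ f^k\,d\mu = \int_\Lambda P^k m^i \cdot m^j\,d\mu = 0
\qquad \text{for all } k \ge 1,
\]
so the drift in Lemma 5.2(b) reduces to A^{ij}(t) = t Σ_{k=1}^∞ ∫_Λ v^i v^j∘f^k dµ = tE^{ij}. Since A is deterministic, (W_n, 𝕎_n) − (M_n, 𝕄_n) →_w (0, A) can be combined with (M_n, 𝕄_n) →_w (W, 𝕄) to give (W_n, 𝕎_n) →_w (W, 𝕄 + A).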
5.3 Proof of Theorem 5.1
Define (M_n, 𝕄_n), (M̃_n, 𝕄̃_n), (M̃_n⁻, 𝕄̃_n⁻) ∈ D([0,1], ℝ^d × ℝ^{d×d}) as follows. Here (M_n, 𝕄_n) is defined as in Section 5.2:
\[
M_n(t) = n^{-1/2} \sum_{k=0}^{[nt]-1} m \circ f^k, \qquad
\mathbb{M}_n^{ij}(t) = n^{-1} \sum_{0 \le k < \ell \le [nt]-1} m^i \circ f^k\; m^j \circ f^\ell.
\]
Similarly, at the level of the natural extension f̃ : Λ̃ → Λ̃ as in Section 3.1, we define the forward processes
\[
\widetilde M_n(t) = n^{-1/2} \sum_{k=0}^{[nt]-1} \tilde m \circ \tilde f^k, \qquad
\widetilde{\mathbb{M}}{}_n^{ij}(t) = n^{-1} \sum_{0 \le k < \ell \le [nt]-1} \tilde m^i \circ \tilde f^k\; \tilde m^j \circ \tilde f^\ell,
\]
and the backward processes
\[
\widetilde M_n^-(t) = n^{-1/2} \sum_{k=-[nt]}^{-1} \tilde m \circ \tilde f^k, \qquad
\widetilde{\mathbb{M}}{}_n^{ij,-}(t) = n^{-1} \sum_{-[nt] \le \ell < k \le -1} \tilde m^i \circ \tilde f^k\; \tilde m^j \circ \tilde f^\ell.
\]
We saw in Section 3.1 that the backward process {M̃_n⁻} is a sequence of continuous time martingales, and hence we can appeal to results from stochastic analysis. In particular, it follows from [11, Theorem 2.2] that (M̃_n⁻, 𝕄̃_n⁻) →_w (W, 𝕄) where 𝕄^{ij}(t) is the Itô integral ∫₀ᵗ W^i dW^j. Some complicated but straightforward calculations similar to those in Section 3.1 (see [10, Section 4.2]) show that (M̃_n, 𝕄̃_n) →_w (W, 𝕄). Finally, (M̃_n, 𝕄̃_n) = (M_n, 𝕄_n) ∘ π and so (M_n, 𝕄_n) →_w (W, 𝕄). This combined with Corollary 5.3 completes the proof of Theorem 5.1.
5.4 Iterated WIP for flows
In this section, we state the analogue of Theorem 3.16 for the iterated WIP. Let (Ω, µ_Ω), (Λ, µ_Λ), v : Ω → ℝ^d, v̂ : Λ → ℝ^d be as in Section 3.3. On (Ω, µ_Ω), define
\[
W_n(t) = n^{-1/2} \int_0^{nt} v \circ \phi_s\,ds, \qquad
\mathbb{W}_n^{ij}(t) = \int_0^t W_n^i\,dW_n^j,
\]
and on (Λ, µ_Λ), define
\[
\widehat W_n(t) = n^{-1/2} \sum_{k=0}^{[nt]-1} \hat v \circ f^k, \qquad
\widehat{\mathbb{W}}{}_n^{ij}(t) = \int_0^t \widehat W_n^i\,d\widehat W_n^j.
\]
Theorem 5.4 Suppose that (Ŵ_n, 𝕎̂_n) →_w (Ŵ, 𝕎̂) in D([0,1], ℝ^d × ℝ^{d×d}) on (Λ, µ_Λ), where Ŵ is a d-dimensional Brownian motion and 𝕎̂^{ij}(t) = ∫₀ᵗ Ŵ^i dŴ^j + tÊ^{ij} for some Ê ∈ ℝ^{d×d}. Let H(y, u) = ∫₀^u v(y, s) ds. Assume that
\[
\max_{1 \le k \le n} |\hat v_k|^2 = o(n).
\]
Then (W_n, 𝕎_n) →_w (W, 𝕎) in D([0,1], ℝ^d × ℝ^{d×d}) on (Ω, µ_Ω), where W = τ̄^{-1/2} Ŵ and
\[
\mathbb{W}^{ij}(t) = \int_0^t W^i\,dW^j + tE^{ij}, \qquad
E^{ij} = \bar\tau^{-1} \widehat E^{ij} + \int_\Omega H^i v^j\,d\mu_\Omega.
\]
UNDER CONSTRUCTION!
5.5 Back to fast-slow systems
References
[1] P. Billingsley, Convergence of Probability Measures, second ed., Wiley Series in Probability and Statistics, John Wiley & Sons Inc., New York, 1999.
[2] R. Bowen, Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms, Lecture Notes in Math. 470, Springer, Berlin, 1975.
[3] B. M. Brown, Martingale central limit theorems, Ann. Math. Statist. 42 (1971), 59–66.
[4] D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probab. 1 (1973), 19–42.
[5] D. Dolgopyat, Limit theorems for partially hyperbolic systems, Trans. Amer. Math. Soc. 356 (2004), 1637–1689.
[6] D. Dolgopyat, Averaging and invariant measures, Mosc. Math. J. 5 (2005), 537–576, 742.
[7] G. K. Eagleson, Some simple conditions for limit theorems to be mixing, Teor. Verojatnost. i Primenen. 21 (1976), 653–660.
[8] M. I. Gordin, The central limit theorem for stationary processes, Soviet Math. Dokl. 10 (1969), 1174–1176.
[9] G. A. Gottwald and I. Melbourne, Homogenization for deterministic maps and multiplicative noise, Proc. R. Soc. London A (2013), 20130201.
[10] D. Kelly and I. Melbourne, Smooth approximation of stochastic differential equations, Ann. Probab. (2014), to appear.
[11] T. G. Kurtz and P. Protter, Weak limit theorems for stochastic integrals and stochastic differential equations, Ann. Probab. 19 (1991), 1035–1070.
[12] M. Maxwell and M. Woodroofe, Central limit theorems for additive functionals of Markov chains, Ann. Probab. 28 (2000), 713–724.
[13] D. L. McLeish, Dependent central limit theorems and invariance principles, Ann. Probab. 2 (1974), 620–628.
[14] E. J. McShane, Stochastic differential equations and models of random processes, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. III: Probability Theory, Univ. California Press, Berkeley, Calif., 1972, pp. 263–294.
[15] I. Melbourne and A. Stuart, A note on diffusion limits of chaotic skew product flows, Nonlinearity (2011), 1361–1367.
[16] I. Melbourne and R. Zweimüller, Weak convergence to stable Lévy processes for nonuniformly hyperbolic dynamical systems, Ann. Inst. H. Poincaré (B) Probab. Statist., to appear.
[17] Y. G. Sinaĭ, Gibbs measures in ergodic theory, Russ. Math. Surv. 27 (1972), 21–70.
[18] H. J. Sussmann, Limits of the Wong-Zakai type with a modified drift term, Stochastic Analysis, Academic Press, Boston, MA, 1991, pp. 475–493.
[19] M. Tyran-Kamińska, An invariance principle for maps with polynomial decay of correlations, Comm. Math. Phys. 260 (2005), 1–15.
[20] E. Wong and M. Zakai, On the convergence of ordinary integrals to stochastic integrals, Ann. Math. Statist. 36 (1965), 1560–1564.
[21] R. Zweimüller, Mixing limit theorems for ergodic transformations, J. Theoret. Probab. 20 (2007), 1059–1071.