Liverani's complete notes and problems/solutions

LMS-CMI RESEARCH SCHOOL, LOUGHBOROUGH,
13-17 APRIL 2015
NOTES
CARLANGELO LIVERANI
Abstract. These are informal notes for a five-hour mini course. Each one-hour
lecture contains a selection of problems suitable for at least a 45 minute
problem session. These notes are not original, as they are largely based on the
paper [4], of which they reproduce a lot of material almost verbatim. They are
intended for the exclusive use of the students of the Loughborough LMS-CMI
2015 Research School Statistical Properties of Dynamical Systems.
Contents
1. Lecture One: the transfer operator
1.1. Introduction
1.2. Invariant measures
1.3. Expanding maps of the circle
2. Lecture Two: standard pairs
2.1. Standard Pairs (expanding maps)
2.2. Coupling
2.3. Expanding maps revisited
3. Lecture three: Fast-Slow partially hyperbolic systems
3.1. The unperturbed system: ε = 0
3.2. Trajectories and path space
3.3. What we would like to prove
3.4. Standard Pairs (partially hyperbolic systems)
3.5. Conditioning
4. Lecture Four: Averaging (the Law of Large Numbers)
4.1. Differentiating with respect to time
4.2. The Martingale Problem at work
4.3. A recap of what we have done so far
5. Lecture five: A brief survey of further results
5.1. Fluctuations around the limit
5.2. Local Limit Theorem
5.3. Wentzel-Friedlin processes
5.4. Statistical propeties of Fε
6. Supplementary material: Proof of Theorem 5.1
6.1. Tightness
6.2. Differentiating with respect to time (poor man's Itô's formula)
6.3. Uniqueness of the Martingale Problem
Appendix A. Geometry
Appendix B. Shadowing
Appendix C. Martingales, operators and Itô's calculus
Appendix D. Solutions to the Problems
References
Date: April 16, 2015.
1. Lecture One: the transfer operator
1.1. Introduction.
1.1.1. Dynamical Systems (a few general facts). The concept of Dynamical System
is a very general one: it appears in many branches of mathematics, from discrete mathematics, number theory and probability to geometry and analysis, and has wide
applications in physics, chemistry, biology, economics and the social sciences.
Probably the most general formulation of such a concept is the action of a group
over an algebra. Given a group G and a vector space V over a field K, a (left) action of G on V is simply a map f : G × V → V such that
(1) f (gh, a) = f (g, f (h, a)) for each g, h ∈ G and a ∈ V;
(2) f (e, a) = a for every a ∈ V, where e is the identity element of G;
(3) f (g, αa + βb) = αf (g, a) + βf (g, b) for each g ∈ G and a, b ∈ V, α, β ∈ K.
In our discussion we will be mainly motivated by physics. In fact, we will consider
mainly the case G = N, as the simplest representative of the most common physical
situations, in which G ∈ {N, Z, R_+, R}¹ is interpreted as time and V is given by the
space of functions over some set X, interpreted as the observables of the system.² In
addition, we will restrict ourselves to situations where the action over the algebra
is induced by an action over the set X (this is a map f : G × X → X that satisfies
conditions 1, 2 above).³ Indeed, given an action f of G on X and the vector space
V, over the field C, of functions on X, one can define f̃(g, a)(x) := a(f (g, x)) for all
g ∈ G, a ∈ V and x ∈ X. It is then easy to verify that f̃ satisfies conditions 1-3
above.
Dynamical Systems for which G ∈ {N, Z} are said to be in discrete time,
while Dynamical Systems in which G ∈ {R_+, R} are in continuous time. Note
that, in the first case, f (n, x) = f (n − 1 + 1, x) = f (1, f (n − 1, x)); thus, defining
T : X → X as T (x) = f (1, x), it holds that f (n, x) = T^n(x).⁴ Thus in such a case
we can (and will) specify the Dynamical System by writing only (X, T ). In the
case of continuous Dynamical Systems we will write φ_t(x) := f (t, x) and call φ_t a
flow (if the group is R) or a semi-flow (if the group is R_+), and will specify the
Dynamical System by writing (X, φ_t). In fact, in these notes we will be interested
only in Dynamical Systems with more structure, i.e. topological, measurable or
smooth Dynamical Systems. By a topological Dynamical System we mean a triplet
(X, T , T ), where T is a topology and T is continuous (if B ∈ T , then T^{−1}B ∈ T ).
By smooth we mean the case in which X has a differentiable structure and T is
r-times differentiable for some r ∈ N. Finally, a measurable Dynamical System is
a quadruple (X, Σ, T, µ) where Σ is a σ-algebra, T is measurable (if B ∈ Σ, then
T^{−1}B ∈ Σ) and µ is an invariant measure (for all B ∈ Σ, µ(T^{−1}B) = µ(B)).⁵
1 Although even in physics other possibilities are very relevant, e.g. in the case of Statistical
Mechanics it is natural to consider the action of the space translations, i.e. the groups {Z^d, R^d}
for some d ∈ N, d > 1.
2 Again other possibilities are relevant, e.g. the case of Quantum Mechanics (in the so called
Heisenberg picture) where the vector space is given by the non-abelian algebra of the observables
and consists of the bounded operators over some Hilbert space.
3 Again relevant cases are not included, for example all Markov Processes, where the evolution is
given by the action of some semigroup.
4 Obviously T^2(x) = T ◦ T (x) = T (T (x)), T^3(x) = T ◦ T ◦ T (x) = T (T (T (x))) and so on.
5 The definitions for continuous Dynamical Systems are the same with {φ_t} taking the place of T.
1.1.2. Strong dependence on initial conditions. One of the great discoveries of the
last century, starting with the seminal work of Poincaré, is that it may be essentially impossible to predict the long term behaviour of a dynamical system. This
might seem strange, as such a behaviour is, in principle, exactly computable. To
understand better what is going on, let us consider one of the simplest possible
cases: X = T and T (x) = 2x mod 1. Let us say that you start from x = 1/√2 and
you want to compute T^{10^6} x. If you do it on a computer, then it is better to write
x in binary digits: x = Σ_{n=1}^∞ a_n 2^{−n}. Then

T x = Σ_{n=1}^∞ a_n 2^{−n+1} mod 1 = Σ_{n=1}^∞ a_{n+1} 2^{−n}.

Hence

T^{10^6} x = Σ_{n=1}^∞ a_{n+10^6} 2^{−n}.

This means that whether T^{10^6} x ∈ [0, 1/2] or not depends on the (10^6 + 1)-th digit of x. This
of course can be computed, but it is not super easy. And this is a really simple
example.
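The loss of information can be seen directly on a computer. The following short Python sketch (an illustration added here, not part of the original notes; the base point 0.3 and the offset 10^{-9} are arbitrary choices) tracks two nearby initial conditions under T(x) = 2x mod 1: their distance doubles at every step, so an uncertainty in the 30th binary digit becomes an order-one uncertainty after about 30 iterations.

```python
# Two nearby initial conditions under the doubling map T(x) = 2x mod 1.
# Doubling shifts the binary expansion, so the gap between the two orbits
# doubles at each step until it reaches order one.

def T(x):
    return (2.0 * x) % 1.0

x, y = 0.3, 0.3 + 1e-9      # arbitrary nearby starting points
gaps = []
for _ in range(20):
    gaps.append(abs(x - y))
    x, y = T(x), T(y)

# multiplying by 2 is exact in binary floating point, so as long as the two
# orbits stay on the same side of the wrap-around the gap doubles exactly
```

For this particular pair no iterate straddles the discontinuity within the first 20 steps, so the doubling is exact; in general the gap doubles only until it saturates at order one.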
A slightly more sophisticated example is the so called Arnold cat map (which
is the simplest possible example of a, so called, hyperbolic system): let F : T^2 → T^2
be defined by

F (x, y) = A (x, y) mod 1 = (x + y, x + 2y) mod 1,  where A = [1 1; 1 2].

Problem 1.1. Compute the eigenvalues of A and verify that they are of the form
λ, λ^{−1} with λ > 1. Deduce that, for almost all small vectors h ∈ R^2, the distance
between the orbits of (x, y) and (x, y) + h grows exponentially both in the past and
in the future.
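The eigenvalue computation in Problem 1.1 can be checked numerically. The sketch below (an addition for illustration; the test vector (1, 0) is an arbitrary choice) solves the characteristic equation t^2 − 3t + 1 = 0 of A and iterates a vector to observe the exponential growth.

```python
import math

# A = [[1, 1], [1, 2]] has trace 3 and determinant 1, so its eigenvalues
# solve t^2 - 3t + 1 = 0 and come in a pair (lam, 1/lam) with lam > 1.
tr, det = 3.0, 1.0
disc = math.sqrt(tr * tr - 4.0 * det)
lam, lam_inv = (tr + disc) / 2.0, (tr - disc) / 2.0

# iterate an arbitrary vector under A; its norm grows like lam^n
h = (1.0, 0.0)
for _ in range(10):
    h = (h[0] + h[1], h[0] + 2.0 * h[1])
growth = math.hypot(*h) ** 0.1     # empirical per-step growth rate
```

The empirical rate comes out slightly below λ = (3 + √5)/2 ≈ 2.618 because the starting vector is not aligned with the unstable eigendirection.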
1.2. Invariant measures. The above seems to cast a shadow of despair on the
possibility to use the theory of dynamical systems to make long term predictions.
In such a bleak scenario another fundamental fact was discovered in the second
half of the last century (starting with the work of Kolmogorov and other prominent
members of the so called Russian School): even though it may be impossible to
make predictions for a single orbit, sometimes it is possible to make statistical
predictions. That is, instead of looking at the evolution of points, look at the
evolution of measures: let M_1(X) be the set of Borel probability measures on X
and consider the map T∗ : M_1(X) → M_1(X) defined by T∗µ(A) = µ(T^{−1}(A))
for each µ ∈ M_1(X) and measurable set A. Note that (M_1(X), T∗) is itself a
topological dynamical system.⁶ Moreover the fixed points of (M_1(X), T∗), i.e. the
probability measures such that T∗µ = µ, correspond to the invariant measures of
(X, T ).
Problem 1.2. Show that, for all f ∈ C 0 , and measure µ on X, T∗ µ(f ) = µ(f ◦ T ).
6 The natural topology on M_1(X) being the weak topology.
Problem* 1.3 (Krylov–Bogolyubov theorem).⁷ Let (X, T ) be a topological dynamical system, with X compact. Let Σ be the Borel σ-algebra. Show that there
exists at least one measure µ such that (X, Σ, T, µ) is a measurable dynamical system.⁸
Remark 1.1. The idea of studying the dynamical system (M_1(X), T∗) instead of
(X, T ) may seem totally crazy. Indeed, first of all, if δ_x ∈ M_1(X) is defined as
δ_x(f ) = f (x), then the reader can easily check that T∗δ_x = δ_{T x}, hence (X, T )
can be seen as a minuscule part of (M_1(X), T∗). Second, (M_1(X), T∗) is infinite
dimensional even if (X, T ) is one dimensional. The only redeeming property is that
T∗ is a linear operator on the space of measures. We will see that this has deep
implications.
Due to the above considerations it is not surprising that the study of (M_1(X), T∗)
might be profitable if one restricts oneself to a space smaller than M_1(X). This can
be done in many ways and the actual choice will depend on the questions one
is interested in. In these notes we will consider the simplest possible case: we are
interested in measures that have some connection with Lebesgue. What this means
in general is not so clear, yet we can easily explain it in a very simple class of
examples.
1.2.1. A few basic definitions. Let us remind the reader of a few basic definitions that
will be used throughout these notes. For many more details see [11].
Definition 1. A topological dynamical system (X, T ) is called transitive if it has
a dense orbit.9
Definition 2. A measurable dynamical system (X, Σ, T, µ), µ ∈ M_1, is called
ergodic if µ is the only invariant measure in the set M(µ) = {ν ∈ M_1 : ν ≪ µ}.
Definition 3. A measurable dynamical system (X, Σ, T, µ), µ ∈ M1 , is called
mixing if for each ν ∈ M(µ) one has limn→∞ T∗n ν = µ, where the convergence is
in the strong topology.
The last thing we need is a quantitative form of mixing (which often goes under
the name of decay of correlations). This can be done in several different ways. We
will adopt the following.
Definition 4. Consider a measurable dynamical system (X, Σ, T, µ), µ ∈ M_1, and
sets of functions D_+, D_− ⊂ L^1(X, µ) such that for each f ∈ D_− and g ∈ D_+ we
have gf ∈ L^1(X, µ); if f ∈ D_+, then f ◦ T ∈ D_+, and if dν = f dµ with f ∈ D_−,
then d(T∗ν) = f̄ dµ with f̄ ∈ D_−. Then we say that for such sets of functions we
have decay of correlations with speed {c_n} if for each f ∈ D_+, g ∈ D_− there exists a
constant C > 0 such that, for all n ∈ N, we have

|∫_X g · f ◦ T^n dµ − ∫_X g dµ · ∫_X f dµ| ≤ C c_n.
7 Starred problems are harder. If you do not have any idea in 15 minutes, skip to the next one.
8 Hint: choose any initial probability measure µ_0 and let the dynamics work for you by considering the sequence µ_n = (1/n) Σ_{k=0}^{n−1} T∗^k µ_0.
9 Actually, the usual definition is: for every pair of non-empty open U, V ⊂ X, there exists n ∈ N such that T^n(U ) ∩ V ≠ ∅. This is not equivalent to our definition in general,
but it is equivalent when X has no isolated points and is separable and second category (see
http://www.scholarpedia.org/article/Topological_transitivity for more information).
1.3. Expanding maps of the circle. An expanding map of the circle is a map
T ∈ C^2(T, T) such that T′(x) ≥ λ > 1 for all x ∈ T. In this case we will be interested
in measures absolutely continuous with respect to Lebesgue. A first remarkable fact
is that such a class of measures is invariant under the action of T∗.
Problem 1.4. Show that if dµ = h(x)dx, then d(T∗µ) = (Lh)(x)dx, where

(Lh)(x) = Σ_{y ∈ T^{−1}(x)} h(y)/T′(y).

The operator L describes the dynamics on the densities and is commonly called the
Ruelle operator or Ruelle-Perron-Frobenius operator. It is naturally defined on L^1
but it preserves also smaller spaces.
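For the doubling map T(x) = 2x mod 1 the two preimages of x are x/2 and (x + 1)/2 and T′ ≡ 2, so L can be written down explicitly. The following sketch (an added illustration; the initial density h(x) = 2x is an arbitrary choice) iterates L and watches the density flatten towards the invariant one, the constant function 1.

```python
# Transfer operator of the doubling map T(x) = 2x mod 1:
#   (Lh)(x) = ( h(x/2) + h((x+1)/2) ) / 2.

def L(h):
    return lambda x: 0.5 * (h(0.5 * x) + h(0.5 * (x + 1.0)))

h = lambda x: 2.0 * x          # a density on [0, 1) with integral 1
for _ in range(8):
    h = L(h)

# after n steps the density is within O(2^{-n}) of the constant 1
grid = [i / 200.0 for i in range(200)]
dev = max(abs(h(x) - 1.0) for x in grid)
```

In this linear example one can check by hand that L^n(2x) = 2^{1−n} x + 1 − 2^{−n}, in line with the λ^{−n} contraction of derivatives in Problem 1.5 below.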
Problem 1.5. Let W^{1,1} be the Sobolev space of the functions such that f, f′ ∈ L^1,
equipped with the norm ‖f‖_{W^{1,1}} = ∫|f| + ∫|f′| = ‖f‖_{L^1} + ‖f′‖_{L^1}. Prove that, for
all h ∈ W^{1,1},

‖Lh‖_{L^1} ≤ ‖h‖_{L^1}
‖(d/dx)(L^n h)‖_{L^1} ≤ λ^{−n}‖h′‖_{L^1} + D‖h‖_{L^1}

where D = sup_x |T″(x)|/|T′(x)|^2.
Prove that there exists C > 0 such that for all n ∈ N and h ∈ C^1,

‖L^n h‖_{C^0} ≤ C‖h‖_{C^0}
‖L^n h‖_{C^1} ≤ Cλ^{−n}‖h‖_{C^1} + C‖h‖_{C^0}.

Prove that for each σ ∈ (λ^{−1}, 1) and a ≥ D(σ − λ^{−1})^{−1}, setting

D_a = {f ∈ C^1(T, R_{≥0}) : f(x)/f(y) ≤ e^{a|x−y|} for all x, y ∈ T},

it holds true that

L(D_a) ⊂ D_{σa}.
The first two sets of inequalities above are commonly called Lasota-Yorke or
Doeblin-Fortet inequalities (in L^1, W^{1,1} or C^0, C^1, respectively). The last one is a different manifestation of the same phenomenon: the dynamics tends to make densities smoother,
up to a point.
From the above Lasota-Yorke inequalities and an abstract functional analytic
theorem it follows that the essential spectrum of L is contained in the disk of
radius λ^{−1}. We will not develop such a theory (see [1] for details) but let us consider
the simple case in which D < 1 (that is, the map is not so far from being linear).
Lemma 1.2. If D < 1, then T has a unique invariant measure µ∗ absolutely
continuous with respect to Lebesgue. In addition, every measure with density in
W 1,1 converges to it exponentially fast.
Proof. First of all note that if ∫ h = 0, then ∫ Lh = ∫ h = 0, thus V = {h ∈ W^{1,1} :
∫ h = 0} is an invariant space for L. Also note that ‖h‖_∗ = ‖h′‖_{L^1} is a norm on
V. Moreover, if h ∈ V,¹⁰ then there must exist a ∈ T such that h(a) = 0 and then
|h(x)| = |∫_a^x h′| ≤ ‖h‖_∗. Thus, for each h ∈ V,

‖L^n h‖_∗ ≤ λ^{−n}‖h‖_∗ + D‖h‖_{L^1} ≤ (λ^{−n} + D)‖h‖_∗.
10 Recall that W^{1,1} ⊂ C^0.
By hypothesis we can choose n∗ so that λ^{−n∗} + D = σ < 1, which shows that the
spectral radius of L on V is smaller than σ^{1/n∗} < 1. On the other hand, setting

h_n = (1/n) Σ_{k=0}^{n−1} L^k 1,

it follows that ‖h_n‖_{W^{1,1}} ≤ D + 1, hence {h_n} is a compact sequence in L^1.¹¹ This means
that there exists a subsequence h_{n_j} converging in L^1 to some h∗ ∈ W^{1,1}. Moreover
Lh∗ = h∗, h∗ ≥ 0 and ∫ h∗ = 1. We have thus an invariant probability measure.
On the other hand, for each h ∈ W^{1,1} there exists a ∈ R such that h − ah∗ ∈ V,
thus

‖L^n h − ah∗‖_{W^{1,1}} = ‖L^n(h − ah∗)‖_{W^{1,1}} ≤ σ^n ‖h − ah∗‖_{W^{1,1}},

which implies the Lemma.
We can now make clear what we mean by statistical predictions.
Problem 1.6. Let T (x) = 3x + 10^{−2} sin(2πx) mod 1. Consider a situation in
which x ∈ [0, 10^{−3}] with uniform probability. Give an estimate of the probability
that T^{30} x ∈ [1/4, 1/2].
Problem 1.7. Show that Lemma 1.2 implies that an invariant measure with density
in W 1,1 is mixing.
2. Lecture Two: standard pairs
2.1. Standard Pairs (expanding maps). Let us revisit what we have learned
about the smooth expanding maps of the circle using a different technique: standard
pairs. This tool is less powerful than the transfer operator one, but much more
flexible; it will then be instrumental in the study of less trivial systems. We start
by presenting it in a very simplified manner; such a simplification is possible only
because we treat very simple systems: smooth expanding maps. A standard pair is
a couple ℓ = (I, ρ) where I = [α, β] ⊂ T and ρ ∈ C^1(I, R_{≥0}) such that ρ(x)/ρ(y) ≤ e^{a|x−y|}
for all x, y ∈ I, and ∫_I ρ = 1. We fix some δ < 1/2 and call L_a the set of all possible
such objects with δ ≤ |I| ≤ 2δ. To a standard pair is uniquely associated
the probability measure

µ_ℓ(ϕ) = ∫_I ϕ(x)ρ(x)dx.
Remark 2.1. In this particular case we could have considered only the case I = T,
but this would not have illustrated the flexibility of the method nor prepared us for
future developments.
11 Indeed, if, for each m ∈ N, we define Π_m h = Σ_{k=0}^{m−1} 1_{[k/m,(k+1)/m]} · m ∫_{k/m}^{(k+1)/m} h, then

‖Π_m h − h‖_{L^1} ≤ Σ_{k=0}^{m−1} ∫_{k/m}^{(k+1)/m} |h − m ∫_{k/m}^{(k+1)/m} h| ≤ (1/m) Σ_{k=0}^{m−1} ∫_{k/m}^{(k+1)/m} |h′| ≤ (1/m) ‖h‖_{W^{1,1}}.

Thus the set A_C = {‖h‖_{W^{1,1}} ≤ C} can be uniformly approximated by the finite dimensional set
Π_m A_C, hence the claim.
Lemma 2.2. There exists a_0 > 0 such that, for all a ≥ a_0 and ℓ ∈ L_a, there exist
N ∈ N and {ℓ_i}_{i=1}^N ⊂ L_a such that

T∗µ_ℓ = Σ_{i=1}^N p_i µ_{ℓ_i},

where Σ_i p_i = 1.

Proof. Note that, if we choose 2δ small enough, then T is invertible on each interval
I of length smaller than 2δ. Hence, calling φ the inverse of T|_I,

T∗µ_ℓ(ϕ) = ∫_{T I} ρ ◦ φ · |φ′| · ϕ.

Note that by hypothesis T I is longer than λ times I. If it is longer than 2δ, then
we can divide it in sub-intervals of length between δ and 2δ. Let {I_i} be the collection
of such a partition of T I. Also, letting p_i = ∫_{I_i} ρ ◦ φ · |φ′| and ρ_i = p_i^{−1} ρ ◦ φ · |φ′|, we
have

T∗µ_ℓ(ϕ) = Σ_i p_i ∫_{I_i} ρ_i · ϕ.

Note that ∫ ρ_i = 1 by construction and that ρ_i ∈ D_a follows by the same computations done in Problem 1.5.
We have thus seen that convex combinations of standard pairs (called standard families) are invariant under the dynamics. This is a different way to restrict
the action of the transfer operator to a suitable class of measures. In fact, it is
not so different from the previous one, as (finite) standard families yield measures
absolutely continuous with respect to Lebesgue and with densities that are piecewise
C^1.
2.2. Coupling. Given two measures µ, ν on two probability spaces X, Y, respectively, a coupling of µ, ν is a probability measure α on X × Y such that, for all
f ∈ C^0(X, R) and g ∈ C^0(Y, R), we have

∫_{X×Y} f(x) α(dx, dy) = µ(f)
∫_{X×Y} g(y) α(dx, dy) = ν(g).

That is, the marginals of α are exactly µ and ν.
Problem* 2.1. Let X be a compact Polish space,¹² let d be the distance and
consider the Borel σ-algebra. For each couple of probability measures µ, ν let G(µ, ν)
be the set of couplings of µ and ν.
(1) Show that G(µ, ν) ≠ ∅.
(2) Show that

d(µ, ν) = inf_{α ∈ G(µ,ν)} ∫_{X^2} d(x, y) α(dx, dy)

is a distance in the space of measures.
(3) Show that the topology induced by d on the set of probability measures is
the weak topology.
12 That is, a complete, separable metric space.
(4) Discuss the cases X = [0, 1], d(x, y) = |x − y| and X = [0, 1], d_0(x, y) = 0
iff x = y and d_0(x, y) = 1 otherwise.
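For X = [0, 1] with d(x, y) = |x − y| and two empirical measures with the same number of atoms, the infimum in (2) is attained by the monotone pairing of sorted points (a classical fact on the line). A small sketch, added here for illustration:

```python
def wasserstein1(xs, ys):
    # optimal coupling of two uniform empirical measures on the line:
    # pair the points in increasing order
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

d1 = wasserstein1([0.1, 0.2], [0.4, 0.5])    # each atom moves by 0.3

# with the discrete metric d0 instead, any coupling pays 1 for all mass
# that is not left in place, so the d0-distance is of total-variation type
```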
2.3. Expanding maps revisited. In this section we provide an alternative approach to exponential mixing for expanding maps, based on the above mentioned
ideas.
Let us consider any two standard pairs ℓ, ℓ̃ ∈ L_a. First of all note that there
exists n_0 ∈ N such that, for each I of length δ, T^{n_0} I = T^1. We then consider the
standard pairs {ℓ_i = (I_i, ρ_i)} and {ℓ̃_i = (Ĩ_i, ρ̃_i)} in which, according to Lemma
2.2, we can decompose the measures T∗^{n_0}µ_ℓ and T∗^{n_0}µ_ℓ̃, respectively. Choose any of
the intervals {I_i}, say I_0. There are, at most, three intervals Ĩ, let us call them
Ĩ_0, Ĩ_1, Ĩ_2, whose union covers I_0. Then there must exist an interval Ĩ_i, say Ĩ_0, such
that J̃ = Ĩ_0 ∩ I_0 is an interval of length at least δ/3. Let J be the central third
of J̃. We can then subdivide I_0 and Ĩ_0, Ĩ_1, Ĩ_2 in subintervals of size at least δ/9 so
that J appears in both subdivisions.
Arguing like in Lemma 2.2 we can write

T∗^n µ_ℓ(ϕ) = p_J ∫ ρ_J ϕ + Σ_i p_i ∫_{I_i} ρ_i ϕ
T∗^n µ_ℓ̃(ϕ) = p̃_J ∫ ρ̃_J ϕ + Σ_i p̃_i ∫_{Ĩ_i} ρ̃_i ϕ.

Note however that the above decomposition is not in standard pairs, as the intervals may be too short (but always longer than δ/9). Note that there exists a
fixed constant c_0 > 0 such that min{p_J, p̃_J} ≥ 4c_0. In addition, by definition
inf{ρ_J, ρ̃_J} ≥ e^{−aδ} ≥ 1/2, provided δ has been chosen small enough. Accordingly,
setting ρ̄ = 1/|J|, we can write

T∗^n µ_ℓ(ϕ) = c_0 ∫_J ρ̄ ϕ + (p_J − 2c_0) ∫_J ρ_J ϕ + c_0 ∫_J (2ρ_J − ρ̄) ϕ + Σ_i p_i ∫_{I_i} ρ_i ϕ
            = c_0 ν(ϕ) + (1 − c_0) ν_R(ϕ)
T∗^n µ_ℓ̃(ϕ) = c_0 ∫_J ρ̄ ϕ + (p̃_J − 2c_0) ∫_J ρ̃_J ϕ + c_0 ∫_J (2ρ̃_J − ρ̄) ϕ + Σ_i p̃_i ∫_{Ĩ_i} ρ̃_i ϕ
            = c_0 ν(ϕ) + (1 − c_0) ν̃_R(ϕ).

We can then consider the coupling

α_1(g) = c_0 ∫ ρ̄(x) g(x, x) dx + (1 − c_0) ν_R × ν̃_R(g).
Problem 2.2. For each n ∈ N, calling ρ̄_n the density of T∗^n ν, show that

(T∗ × T∗)^n α_1(g) = c_0 ∫ ρ̄_n(x) g(x, x) dx + (1 − c_0) T∗^n ν_R × T∗^n ν̃_R(g).
Problem 2.3. Show that there exists n_1 ∈ N such that both T∗^{n_1} ν_R and T∗^{n_1} ν̃_R
admit a decomposition in standard pairs {ℓ^1_i} and {ℓ̃^1_i}, respectively.
The above implies that, at time n̄ = n_0 + n_1, we can take any two standard pairs
ℓ^1_i and ℓ̃^1_j, apply to them the same argument and obtain

T∗^{n_0} µ_{ℓ^1_i} = c_0 ν_{i,j} + (1 − c_0) ν_R^{1,i,j}
T∗^{n_0} µ_{ℓ̃^1_j} = c_0 ν_{i,j} + (1 − c_0) ν̃_R^{1,i,j}.
We can thus write T∗^{n_0} ν_R = Σ_{i,j} p_i p̃_j T∗^{n_0} µ_{ℓ^1_i} and T∗^{n_0} ν̃_R = Σ_{i,j} p_i p̃_j T∗^{n_0} µ_{ℓ̃^1_j} and,
letting ρ̄_{i,j} be the density of the measure ν_{i,j}, consider the coupling

α_2(g) = Σ_{i,j} p_i p̃_j [ c_0 ∫_{J_{i,j}} ρ̄_{i,j}(x) g(x, x) dx + (1 − c_0) ν_R^{1,i,j} × ν̃_R^{1,i,j}(g) ].

Collecting the above considerations, it follows that there exists a probability density
ρ̄_1 such that the measures T∗^{2n̄} µ_ℓ and T∗^{2n̄} µ_ℓ̃ admit a coupling of the form

T∗^{n_1} × T∗^{n_1} α(g) = [c_0 + (1 − c_0)c_0] ∫ ρ̄_1(x) g(x, x) dx + (1 − c_0)^2 T∗^{n_1} ν_R^{1,i,j} × T∗^{n_1} ν̃_R^{1,i,j}(g).

But the above implies, using the discrete distance d_0 of Problem 2.1,

d(T∗^{2n̄} µ_ℓ, T∗^{2n̄} µ_ℓ̃) ≤ (1 − c_0)^2.

By induction it then follows

d(T∗^{kn̄} µ_ℓ, T∗^{kn̄} µ_ℓ̃) ≤ (1 − c_0)^k.
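The bound above is the deterministic analogue of a classical probabilistic fact: a Markov chain whose kernel contains a common component of mass c_0 (Doeblin's condition) contracts the distance between any two initial distributions by a factor (1 − c_0) per step. A small added sketch on a three-point space (the transition matrix is an arbitrary choice with all entries ≥ 0.2, so one can take c_0 = 3 · 0.2 = 0.6):

```python
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]  # row-stochastic; every column has minimum 0.2

def step(p):
    # one step of the chain: p -> pP
    return [sum(p[i] * P[i][j] for i in range(3)) for j in range(3)]

def tv(p, q):
    # total variation distance (the coupling distance for the discrete metric)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p, q = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
dists = [tv(p, q)]
for _ in range(5):
    p, q = step(p), step(q)
    dists.append(tv(p, q))
# every step contracts the distance at least by the factor 1 - c0 = 0.4
```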
Remark 2.3. Note that if µ = Σ_{i=1}^N p_i µ_{ℓ_i} is the measure associated to the standard
family {p_i, ℓ_i}_{i=1}^N, then it is absolutely continuous with respect to Lebesgue, with
density ρ given by

ρ(x) = Σ_i p_i ρ_i(x) 1_{I_i}(x) ≤ Σ_i p_i e^a ≤ e^a.
Remark 2.4. Note that this implies that we can choose D_− as the standard families
and D_+ = L^∞ in Definition 4 and obtain, for each µ, ν ∈ D_−,¹³

|T∗^n µ(f) − T∗^n ν(f)| ≤ C e^{−cn} ‖f‖_{L^∞}.

Thus, if we have a measure µ determined by a standard family, it follows that
{T∗^n µ} is a sequence of measures determined by standard families, hence it must be
a Cauchy sequence (just apply the previous remark to µ and T∗^m µ). It follows that
there exists a unique measure ν, with density ρ ∈ L^∞, such that, for each standard
family measure µ and measurable set A,

lim_{n→∞} µ(T^{−n} A) = ν(A) = ν(T^{−1} A).

Moreover it is easy to see that we can approximate any measure µ that is absolutely
continuous with respect to Lebesgue by a sequence of measures {µ_k} that arise from
standard families. It thus follows that the dynamical system (T, T, ν) is mixing,
i.e., for each measure µ absolutely continuous with respect to Lebesgue (hence with
respect to ν) we have

lim_{n→∞} µ(T^{−n} A) = ν(A).

In particular, ν is the unique absolutely continuous invariant measure.
13 Indeed, let G be any coupling of T∗^n µ and T∗^n ν, and let g(x, y) = f(x) − f(y); then

|T∗^n µ(f) − T∗^n ν(f)| = |G(g)| ≤ G(d · |g|) ≤ ‖g‖_∞ G(d) ≤ 2‖f‖_∞ G(d),

and the claim follows by taking the infimum over the couplings.
3. Lecture three: Fast-Slow partially hyperbolic systems
Now that we have reviewed some basic results on expanding maps we are ready
to tackle less trivial systems. At this point we could take several interesting directions to illustrate in which sense a deterministic system can behave similarly to a
stochastic one. I choose to study fast-slow systems in which the fast variable undergoes a strongly chaotic motion. Namely, let M, S be two compact Riemannian
manifolds, let X = M × S be the configuration space of our system and let m_Leb
be the Riemannian measure on M. For simplicity, we consider only the case in
which S = M = T^1. We consider a map F_0 ∈ C^r(T^2, T^2), r ≥ 3, defined by

F_0(x, θ) = (f(x, θ), θ),

where the maps f(·, θ) are uniformly expanding for every θ, i.e.¹⁴

∂_x f(x, θ) ≥ λ > 2.
If we consider a small perturbation of F0 we note that the perturbation of f still
yields a uniformly hyperbolic system, by structural stability. Thus such a perturbation can be subsumed in the original maps. Hence, it suffices to study families
of maps of the form
Fε (x, θ) = (f (x, θ), θ + εω(x, θ))
with ε ∈ (0, ε0 ), for some ε0 small enough, and ω ∈ C r .
Such systems are called fast-slow, since the variable θ, the slow variable, needs
a time at least ε−1 to change substantially.
Problem 3.1. Show that there exist invariant cone fields for F_ε.
3.1. The unperturbed system: ε = 0.
The statistical properties of the system are well understood in the case ε = 0. In
such a case θ is an invariant of motion, while for every θ the map f (·, θ) has strong
statistical properties. We will need such properties in the following discussion which
will be predicated on the idea that, for times long but much shorter than ε−1 , on the
one hand θ remains almost constant, while, on the other hand, its change depends
essentially on the behavior of an ergodic sum with respect to a fixed dynamics
f (·, θ).
It follows that the dynamical system (T^2, F_0) has uncountably many SRB measures: all the measures of the form µ(ϕ) = ∫ ϕ(x, θ) m_θ(dx) ν(dθ) for an arbitrary
measure ν, where m_θ is the unique absolutely continuous invariant measure of f(·, θ). The ergodic measures are the ones in which ν is a point mass. The
system is partially hyperbolic and has a central foliation. Indeed, the f(·, θ) are all
topologically conjugate, by structural stability of expanding maps [11]. Let h(·, θ) be
the map conjugating f(·, 0) with f(·, θ), that is h(f(x, 0), θ) = f(h(x, θ), θ). Thus
the foliation W^c_x = {(h(x, θ), θ)}_{θ∈S} is invariant under F_0 and consists of points
that stay, more or less, always at the same distance, hence it is a center foliation.
Note however that, since in general h is only a Hölder continuous function (see [11]),
the foliation is very irregular and, typically, not absolutely continuous.
In conclusion, the map F0 has rather poor statistical properties and a not very
intuitive description as a partially hyperbolic system. It is then not surprising that
its perturbations form a very rich universe to explore and already the study of the
behavior of the dynamics for times of order ε−1 (a time long enough so that the
14 In fact, λ > 1 would suffice; we put 2 to simplify a bit the later computations.
variable θ has a non trivial evolution, but far too short to investigate the statistical
properties of Fε ) is interesting and non trivial.
3.2. Trajectories and path space. To continue, a more detailed discussion concerning the initial conditions is called for. Note that not all measures are reasonable
as initial conditions. Just think of the possibility to start with initial conditions
given by a point mass, hence killing any trace of randomness. The best one can reasonably do is to fix the slow variable and leave the randomness only in the fast one.
Thus we will consider measures µ_0 of the following type: for each ϕ ∈ C^0(T^2, R),
µ_0(ϕ) = ∫ ϕ(x, θ_0) h(x) dx for some θ_0 ∈ S and h ∈ C^1(M, R_+).
Let µ_0 be a probability measure on T^2. Let us define (x_n, θ_n) = F^n_ε(x, θ); then
(x_n, θ_n) are random variables¹⁵ if (x_0, θ_0) are distributed according to µ_0.¹⁶ It is
natural to define the polygonalization¹⁷

(3.1)  Θ_ε(t) = θ_{⌊ε^{−1}t⌋} + (ε^{−1}t − ⌊ε^{−1}t⌋)(θ_{⌊ε^{−1}t⌋+1} − θ_{⌊ε^{−1}t⌋}),  t ∈ [0, T].

Note that Θ_ε is a random variable on T^2 with values in C^0([0, T], S). Also, note
the time rescaling, done so that one expects non trivial paths.
It is often convenient to consider random variables defined directly on the space
C 0 ([0, T ], S) rather than T2 . Let us discuss the set up from such a point of view.
The space C 0 ([0, T ], S) endowed with the uniform topology is a separable metric
space. We can then view C 0 ([0, T ], S) as a probability space equipped with the
Borel σ-algebra.
Problem 3.2. Show that such a σ-algebra is the minimal σ-algebra containing the
sets ∩_{i=1}^n {ϑ ∈ C^0([0, T], S) | ϑ(t_i) ∈ U_i} for each {t_i} ⊂ [0, T] and open sets
U_i ⊂ S.
Since Θε can be viewed as a continuous map from T2 to C 0 ([0, T ], S), the measure
µ0 induces naturally a measure Pε on C 0 ([0, T ], S): Pε = (Θε )∗ µ0 .18 Also, for
each t ∈ [0, T ] let Θ(t) ∈ C 0 (C 0 ([0, T ], S), S) be the random variable defined by
Θ(t, ϑ) = ϑ(t), for each ϑ ∈ C 0 ([0, T ], S). Next, for each A ∈ C 0 (C 0 ([0, T ], S), R),
we will write Eε (A) for the expectation with respect to Pε . For A ∈ C 0 (S, R)
and t ∈ [0, T ], Eε (A ◦ Θ(t)) = Eε (A(Θ(t))) is the expectation of the function
A(ϑ) = A(ϑ(t)), ϑ ∈ C 0 ([0, T ], S).
3.3. What we would like to prove. Our first problem is to understand lim_{ε→0} P_ε.
After some necessary preliminaries, in Lecture 4 we will prove the following result.

Theorem 3.1. The measures {P_ε} have a weak limit P; moreover P is a measure
supported on the trajectory determined by the O.D.E.

(3.2)  Θ̇ = ω̄(Θ),  Θ(0) = θ_0,

where ω̄(θ) = ∫_M ω(x, θ) m_θ(dx).
15 Recall that a random variable is a measurable function from a probability space to a measurable space.
16 That is, the probability space is T^2 equipped with the Borel σ-algebra, µ_0 is the probability
measure and (x_n, θ_n) are functions of (x_0, θ_0) ∈ T^2.
17 Since we interpolate between nearby points the procedure is uniquely defined in T.
18 Given a measurable map T : X → Y between measurable spaces and a measure P on X,
T∗ P is a measure on Y defined by T∗ P (A) = P (T −1 (A)) for each measurable set A ⊂ Y .
The above theorem specifies in which sense the random variable Θε converges to
the average dynamics described by equation (3.2).
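A numerical illustration of Theorem 3.1 with a hypothetical concrete choice (added here; none of these particular formulas appear in the notes): take f(x, θ) = 3x mod 1, so that m_θ is Lebesgue, and ω(x, θ) = sin(2πθ) + cos(2πx), so that ω̄(θ) = sin(2πθ). The averaged ODE Θ̇ = sin(2πΘ) has the closed-form solution tan(πΘ(t)) = tan(πΘ(0)) e^{2πt}.

```python
import math

n = 10_000
eps = 1.0 / n                              # time horizon T = n * eps = 1
x, theta = 0.5 ** 0.5, 0.25                # theta0 = 1/4; x0 arbitrary
for _ in range(n):
    theta += eps * (math.sin(2 * math.pi * theta) + math.cos(2 * math.pi * x))
    x = (3.0 * x) % 1.0                    # fast expanding dynamics

# closed-form solution of the averaged ODE at time t = 1
ode = math.atan(math.tan(math.pi * 0.25) * math.exp(2 * math.pi)) / math.pi
gap = abs(theta - ode)                     # expected to be of order sqrt(eps)
```

The fast ergodic sums of cos(2πx_n) average out, so the slow variable tracks the averaged flow up to fluctuations of order √ε, in line with Theorem 5.1 on the fluctuations around the limit.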
Having stated our goal, let us begin with a first, very simple, result.
Problem 3.3. Show that the measures {Pε } are tight.
Proof. By (3.1) it follows that the path Θ_ε is made of segments of length ε and
maximal slope ‖ω‖_{L^∞}; thus, for all h > 0,¹⁹

‖Θ_ε(t + h) − Θ_ε(t)‖ ≤ C_# h + ε Σ_{k=⌈ε^{−1}t⌉}^{⌊ε^{−1}(t+h)⌋−1} ‖ω(x_k, θ_k)‖ ≤ C_# h.

Thus the measures P_ε are all supported on a set of uniformly Lipschitz functions,
that is a compact set.
The above means that there exist converging subsequences {P_{ε_j}}. Our next step
is to identify the set of accumulation points.
An obstacle that we face immediately is the impossibility of using some typical
probabilistic tools, in particular conditioning with respect to the past and Itô's
formula. In fact, even if the initial condition is random, the dynamics is still
deterministic, hence conditioning with respect to the past seems hopeless, as it
might kill all the randomness at later times.
To solve the first problem it is therefore necessary to devise a systematic way to
use the strong dependence on the initial condition (typical of hyperbolic systems)
to show that the dynamics, in some sense, forgets the past. One way of doing this
effectively is to use standard pairs, introduced in the next section, thereby slightly
enlarging our allowed initial conditions. Exactly how this solves the conditioning
problem will be explained in Section 3.5. The lack of Itô's formula will be overcome
by taking the point of view of the Martingale problem to define the solution of an
SDE. To explain what this means in the present context is the goal of the present
notes, but see Appendix C for a brief comment on this issue in the simple case of
an SDE. We will come back to the problem of studying the accumulation points of
{P_ε} after having settled the issue of conditioning.
3.4. Standard Pairs (partially hyperbolic systems). Let us fix δ > 0 small
enough, and D > 0 large enough, to be specified later; for c_1 > 0 consider the set
of functions

Σ_{c_1} = {G ∈ C^2([a, b], S) | a, b ∈ T^1, b − a ∈ [δ/2, δ], ‖G′‖_{C^0} ≤ εc_1, ‖G″‖_{C^0} ≤ εDc_1}.

Let us associate to each G ∈ Σ_{c_1} the map G ∈ C^2([a, b], T^2) defined by G(x) =
(x, G(x)), whose image is a curve (the graph of G) which will be called a standard
curve. For c_2 > 0 large enough, let us define the set of c_2-standard probability
densities on the standard curve as

D_{c_2}(G) = {ρ ∈ C^1([a, b], R_+) | ∫_a^b ρ(x)dx = 1, ‖ρ′/ρ‖_{C^0} ≤ c_2}.
19 The reader should be aware that we use the notation C_# to designate a generic constant
(depending only on f and ω) whose numerical value can change from one occurrence to the next,
even in the same line.
A standard pair ` is given by ` = (G, ρ) where G ∈ Σc1 and ρ ∈ Dc2 (G). Let Lε
be the collection of all standard pairs for a given ε > 0. A standard pair ` = (G, ρ)
induces a probability measure µ` on T2 defined as follows: for any continuous
function g on T2 let
Z b
µ` (g) :=
g(x, G(x))ρ(x)dx.
a
We define20 a standard family L = (A, ν, {`j }j∈A ), where A ⊂ N and ν is a probability measureP
on A; i.e. we associate to each standard pair `j a positive weight
ν({j}) so that j∈A ν({j}) = 1. For the following we will use also the notation
ν`j = ν({j}) for each j ∈ A and we will write ` ∈ L if ` = `j for some j ∈ A.
A standard family L naturally induces a probability measure µL on T2 defined as
follows: for any measurable function g on T2 let
µL(g) := Σ_{`∈L} ν` µ`(g).
Let us denote by ∼ the equivalence relation induced by the above correspondence
i.e. we let L ∼ L0 if and only if µL = µL0 .
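The definitions above are easy to experiment with numerically. The sketch below uses illustrative, hypothetical choices of curves G, densities ρ and weights ν (none of them prescribed by the notes) to build two standard pairs and a standard family, and evaluates µL(g) by quadrature; since each µ` is a probability measure, µL(1) = 1.

```python
import numpy as np

def mu_pair(G, rho, a, b, g, n=4001):
    # mu_l(g) = int_a^b g(x, G(x)) rho(x) dx  (trapezoidal rule on a uniform grid)
    x = np.linspace(a, b, n)
    y = g(x, G(x)) * rho(x)
    return np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0])

def mu_family(pairs, weights, g):
    # mu_L(g) = sum_l nu_l * mu_l(g)
    return sum(w * mu_pair(G, rho, a, b, g) for w, (G, rho, a, b) in zip(weights, pairs))

eps, c1, delta = 1e-3, 1.0, 0.1
# two illustrative standard pairs: C^2 graphs with |G'| <= eps*c1, normalized densities
pairs = [
    (lambda x: 0.30 + eps * c1 * np.sin(x), lambda x: np.full_like(x, 1.0 / delta), 0.0, delta),
    (lambda x: 0.70 - eps * c1 * np.cos(x), lambda x: np.full_like(x, 1.0 / delta), 0.5, 0.5 + delta),
]
weights = [0.4, 0.6]            # nu({1}), nu({2}): a probability on the index set A

total = mu_family(pairs, weights, lambda x, th: np.ones_like(x))
print(total)                     # mu_L(1) = 1, since each mu_l is a probability measure
```

Note that any g can be plugged in; the flat densities and nearly flat graphs here satisfy the Σc1 and Dc2 constraints by construction.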
Proposition 3.2 (Invariance). There exist δ and D such that, for any c1 , c2 sufficiently large, and ε sufficiently small, for any standard family L, the measure Fε∗ µL
can be decomposed in standard pairs, i.e. there exists a standard family L0 such that
Fε∗ µL = µL0 . We say that L0 is a standard decomposition of Fε∗ µL .
Proof. For simplicity, let us assume that L is given by a single standard pair `; the
general case does not require any additional ideas and it is left to the reader. By
definition, for any measurable function g:
Fε∗µ`(g) = µ`(g ◦ Fε) = ∫_a^b g( f(x, G(x)), G(x) + εω(x, G(x)) ) ρ(x) dx.
It is then natural to introduce the map fG : [a, b] → T^1 defined by fG(x) = f ◦ G(x). Note that f′G ≥ λ − εc1‖∂θf‖_{C^0} > 3/2 provided that ε is small enough (depending on how large c1 is). Hence all the fG are expanding maps; moreover, they are invertible if δ has been chosen small enough. In addition, for any sufficiently smooth function A on T^2, it is straightforward to check that, by the definition of standard curve, if ε is small enough (once again depending on c1),21
(3.3a)
‖(A ◦ G)′‖_{C^0} ≤ ‖dA‖_{C^0} + ε‖dA‖_{C^0} c1 ,
‖(A ◦ G)″‖_{C^0} ≤ 2‖dA‖_{C^1} + ε‖dA‖_{C^0} Dc1 .
Then, fix a partition (mod 0) [fG(a), fG(b)] = ∪_{j=1}^m [aj, bj], with bj − aj ∈ [δ/2, δ] and bj = aj+1; moreover let ϕj(x) = fG^{−1}(x) for x ∈ [aj, bj] and define
(3.3b)
Gj(x) = G ◦ ϕj(x) + εω(ϕj(x), G ◦ ϕj(x)),   ρ̃j(x) = ρ ◦ ϕj(x) ϕ′j(x).
20 This is not the most general definition of standard family, yet it suffices for our purposes.
21 Given a function A, by dA we mean its differential.
By a change of variables we can thus write:
(3.4)
Fε∗µ`(g) = Σ_{j=1}^m ∫_{aj}^{bj} ρ̃j(x) g(x, Gj(x)) dx.
Observe that, by immediate differentiation, we obtain, for ϕj:
(3.5)
ϕ′j = (1/f′G) ◦ ϕj ,   ϕ″j = −( f″G/(f′G)^3 ) ◦ ϕj .
Let ωG = ω ◦ G and Ḡ = G + εωG. Differentiating the definitions of Gj and ρ̃j and using (3.5) yields
(3.6)
G′j = ( Ḡ′/f′G ) ◦ ϕj ,   G″j = ( Ḡ″/(f′G)^2 ) ◦ ϕj − G′j · ( f″G/(f′G)^2 ) ◦ ϕj ,
and similarly
(3.7)
ρ̃′j/ρ̃j = ( ρ′/(ρ f′G) ) ◦ ϕj − ( f″G/(f′G)^2 ) ◦ ϕj .
Using the above equations it is possible to conclude our proof: first of all, using (3.6), the definition of Ḡ and equations (3.3) we obtain, for small enough ε:
‖G′j‖ ≤ ‖ (G′ + εω′G)/f′G ‖ ≤ (2/3)(1 + C#ε)εc1 + C#ε ≤ (3/4)εc1 + C#ε ≤ εc1 ,
provided that c1 is large enough; then:
‖G″j‖ ≤ ‖ (G″ + εω″G)/(f′G)^2 ‖ + C#(1 + εDc1)εc1 ≤ (3/4)εDc1 + εC#c1 + εC# ≤ εDc1 ,
provided c1 and D are sufficiently large. Likewise, using (3.3) together with (3.7) we obtain
‖ρ̃′j/ρ̃j‖ ≤ (2/3)c2 + C#(1 + Dc1) ≤ c2 ,
provided that c2 is large enough. This concludes our proof: it suffices to define the family L′ given by (A, ν, {`j}_{j∈A}), where A = {1, . . . , m}, ν({j}) = ∫_{aj}^{bj} ρ̃j, ρj = ν({j})^{−1} ρ̃j and `j = (Gj, ρj). Our previous estimates imply that the (Gj, ρj) are standard pairs; note moreover that (3.4) implies Σ_{˜`∈L′} ν˜` = 1, thus L′ is a standard family. Then we can rewrite (3.4) as follows:
Fε∗µ`(g) = Σ_{˜`∈L′} ν˜` µ˜`(g) = µL′(g).
Remark 3.3. Given a standard pair ` = (G, ρ), we will interpret (xk , θk ) as random
variables defined as (xk , θk ) = Fεk (x, G(x)), where x is distributed according to ρ.
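Remark 3.3 can be made concrete with a small simulation. The sketch below uses an illustrative fast-slow pair, not taken from the notes: a perturbed doubling map f and a simple coupling ω. A cloud of points x distributed according to ρ on a flat standard curve is pushed forward; the slow variable θ drifts coherently while keeping a small spread.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-3

def f(x, th):
    # illustrative uniformly expanding fast map (theta-independent for simplicity)
    return (2.0 * x + 0.01 * np.sin(2 * np.pi * x)) % 1.0

def omega(x, th):
    return np.cos(2 * np.pi * th) + np.sin(2 * np.pi * x)

# flat standard pair: G = const = 0.1, rho uniform on [a, b]
a, b, theta0 = 0.0, 0.05, 0.1
x = rng.uniform(a, b, size=10_000)
th = np.full_like(x, theta0)

for _ in range(2_000):
    # one step of F_eps: both coordinates updated from the current point
    x, th = f(x, th), th + eps * omega(x, th)

print(th.mean(), th.std())   # the slow variable has drifted, with a small spread
```

With these (assumed) coefficients the averaged drift is roughly cos(2πθ), whose stable zero is at θ = 1/4, so the cloud drifts from 0.1 toward 1/4.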
3.5. Conditioning. In probability, conditioning is one of the most basic techniques, and one would like to use it freely when dealing with random variables. Yet, as already mentioned, conditioning seems unnatural when dealing with deterministic systems. The use of standard pairs provides a very efficient solution to this conundrum. The basic idea is that one can apply Proposition 3.2 repeatedly to obtain at each time a family of standard pairs, and then “condition” by specifying to which standard pair the random variable belongs at a given time.22
Note that if ` is a standard pair with G′ = 0, then it belongs to Lε for all ε > 0. In the following, abusing notation, we will use ` also to designate a family {`ε}, `ε ∈ Lε, that weakly converges to a standard pair ` ∈ ∩_{ε>0} Lε. For every standard pair ` we let Pε` be the induced measure on path space and Eε` the associated expectation.
Before continuing, let us fix a bit of notation: for each t ∈ [0, T], the random variable Θ(t) ∈ C^0(C^0([0, T], S), S) is defined by Θ(t, ϑ) = ϑ(t), for all ϑ ∈ C^0([0, T], S). We will also need the filtration of σ-algebras Ft
defined as the smallest σ-algebra for which all the functions {Θ(s) : s ≤ t} are
measurable. Last, we consider the shift τs : C 0 ([0, T ], S) → C 0 ([0, T − s], S) defined
by τs (ϑ)(t) = ϑ(t + s). Note that Θ(t) ◦ τs = Θ(t + s). Also, it is helpful to keep in
mind that, for all A ∈ C 0 (S, R), we have23
Eε` (A(Θ(t + kε))) = µ` (A(Θε (t + kε))) = µ` (A(Θε (t) ◦ Fεk )).
Our goal is to compute, in some reasonable way, expectations of a function of Θ(t + s) conditioned on Ft, notwithstanding the above mentioned problems due to the fact that the dynamics is deterministic. Obviously, we can hope to obtain a
result only in the limit ε → 0. Note that we can always reduce to the case in which
the conditional expectation is zero by subtracting an appropriate function, thus it
suffices to analyze such a case.
The basic fact that we will use is the following.
Lemma 3.4. Let t′ ∈ [0, T] and A be a continuous bounded random variable on C^0([0, t′], S) with values in R. If we have
lim_{ε→0} sup_{`∈Lε} |Eε`(A)| = 0,
then, for each s ∈ [0, T − t′], standard pair `, uniformly bounded continuous functions {Bi}_{i=1}^m, Bi : S → R, and times {t1 < · · · < tm} ⊂ [0, s),
lim_{ε→0} Eε`( ∏_{i=1}^m Bi(Θ(ti)) · A ◦ τs ) = 0.
Proof. The quantity we want to study can be written as
µ`( ∏_{i=1}^m Bi(Θε(ti)) · A(τs(Θε)) ).
22 Note that the set of standard pairs does not form a σ-algebra, so to turn the above into
a precise statement would be a bit cumbersome. We thus prefer to follow a slightly different
strategy, although the substance is unchanged.
23 To be really precise, one should perhaps write, e.g., Eε`(A ◦ Θ(t + kε)), but we conform to the above more intuitive notation.
To simplify our notation, let ki = ⌊ti ε^{−1}⌋ and k_{m+1} = ⌊sε^{−1}⌋. Also, for every standard pair ˜`, let L_{i,˜`} denote an arbitrary standard decomposition of (Fε^{k_{i+1}−k_i})∗µ˜`, and define θ∗` = µ`(θ) = ∫_{a`}^{b`} ρ`(x) G`(x) dx. Then, by Proposition 3.2,
µ`( ∏_{i=1}^m Bi(Θε(ti)) · A(τs(Θε)) ) = µ`( ∏_{i=1}^m Bi(Θε(ti)) · A( τ_{s−εk_{m+1}}(Θε ◦ Fε^{k_{m+1}}) ) )
= Σ_{`1∈L_{1,`}} · · · Σ_{`_{m+1}∈L_{m,`m}} [ ∏_{i=1}^m ν`i Bi(θ∗`i) ] ν_{m+1} µ`_{m+1}( A(Θε) ) + o(1)
= Σ_{`1∈L_{1,`}} · · · Σ_{`_{m+1}∈L_{m,`m}} [ ∏_{i=1}^m ν`i Bi(θ∗`i) ] ν_{m+1} Eε`_{m+1}(A) + o(1),
where lim_{ε→0} o(1) = 0. The lemma readily follows.
Lemma 3.4 implies that, calling P an accumulation point of Pε`, we have24
(3.8)
E( ∏_{i=1}^m Bi(Θ(ti)) · A ◦ τs ) = 0.
This solves the conditioning problem thanks to the following
Lemma 3.5. Property (3.8) is equivalent to
E (A ◦ τs | Fs1 ) = 0,
for all s1 ≤ s.
Proof. Note that the statement of the lemma immediately implies (3.8); we thus worry only about the other direction. If the lemma were not true, then there would exist a positive measure set of the form
K = ∩_{i=0}^∞ {ϑ(ti) ∈ Ki},
where {Ki} is a collection of compact sets in S and ti < s, on which the conditional expectation is strictly positive (or strictly negative, which can be treated in exactly the same way). For some arbitrary δ > 0, consider open sets Ui ⊃ Ki such that P({ϑ(ti) ∈ Ui \ Ki}) ≤ δ2^{−i}. Also, let Bδ,i be a continuous function such that Bδ,i(ϑ) = 1 for ϑ ∈ Ki and Bδ,i(ϑ) = 0 for ϑ ∉ Ui. Then
0 < E(1_K A ◦ τs) = lim_{n→∞} E( ∏_{i=1}^n Bδ,i(Θ(ti)) · A ◦ τs ) + C#δ = C#δ,
which yields a contradiction by the arbitrariness of δ.
In other words, we have recovered the possibility of conditioning with respect to
the past after the limit ε → 0.
24 By E we mean the expectation with respect to P.
4. Lecture Four: Averaging (the Law of Large Numbers)
We are now ready to provide the proof of Theorem 3.1. The proof consists of
several steps; we first illustrate the global strategy while momentarily postponing
the proof of the single steps.
Proof of Theorem 3.1. As already mentioned, we will prove the theorem for a larger class of initial conditions: any initial condition determined by a sequence of standard pairs. Note that for any fixed flat standard pair `, i.e. G`(x) = θ, we obtain an initial condition of the type assumed in the statement of the Theorem. Given a standard pair `, let {Pε`} be the associated measures in path space (the latter measures being determined, as explained at the beginning of Section ??, by the standard pair ` and (3.1)). We have already seen in Lemma 3.3 that the set {Pε`} is tight.
Next we will prove in Lemma 4.1 that, for each A ∈ C^2(S, R), we have
(4.1)
lim_{ε→0} sup_{`∈Lε} | Eε`( A(Θ(t)) − A(Θ(0)) − ∫_0^t ⟨ω̄(Θ(τ)), ∇A(Θ(τ))⟩ dτ ) | = 0.
Accordingly, it is natural to consider the random variables A(t) defined by
A(t, ϑ) = A(ϑ(t)) − A(ϑ(0)) − ∫_0^t ⟨ω̄(ϑ(τ)), ∇A(ϑ(τ))⟩ dτ,
for each t ∈ [0, T] and ϑ ∈ C^0([0, T], S), and the first order differential operator
(4.2)
LA = ⟨ω̄, ∇A⟩.
Then equation (4.1), together with Lemmata 3.4 and 3.5, means that each accumulation point P` of {Pε`} satisfies, for all s ∈ [0, T] and t ∈ [0, T − s],
(4.3)
E`( A ◦ τs | Fs ) = E`( A(Θ(t + s)) − A(Θ(s)) − ∫_s^{t+s} LA(Θ(τ)) dτ | Fs ) = 0;
this is the simplest possible version of the Martingale Problem. Indeed, it implies that, for all θ, A and standard pairs ` such that G`(x) = θ,
M(t) = A(Θ(t)) − A(Θ(0)) − ∫_0^t LA(Θ(s)) ds
is a martingale with respect to the measure Pθ and the filtration Ft (i.e., for each 0 ≤ s ≤ t ≤ T, Eθ(M(t) | Fs) = M(s)).25 Finally, we will show in Lemma 4.2 that there is a unique measure with this property: the measure supported on the unique solution of equation (3.2). This concludes the proof of the theorem.
In the rest of this section we provide the missing proofs.
4.1. Differentiating with respect to time.
Before starting with the proof of (4.1) we need to compare the dynamics of (xn, θn) = Fε^n(x0, θ0) with that of f∗(x) = f(x, θ0). This is a type of shadowing, but here we need it in a quantitative form.
Problem 4.1. Show that, for each n ≤ ε^{−1/2}, there exists a map Yn ∈ C^1(T, T) such that, setting x∗k = f∗^k(Yn(x0)), we have x∗n = xn and, for all k ≤ n,
|xk − x∗k| ≤ C1 εk.
25 We use Pθ to designate any measure P` with G`(x) = θ.
Problem 4.2. Show that there exists C > 0 such that, for each n ≤ Cε^{−1/2},
(4.4)
e^{−c# εn^2} ≤ |Y′n| ≤ e^{c# εn^2}.
In particular, Yn is invertible with uniformly bounded derivative.
Lemma 4.1. For each A ∈ C^2(S, R) we have26
lim_{ε→0} sup_{`∈Lε} | Eε`( A(Θ(t)) − A(Θ(0)) − ∫_0^t ⟨ω̄(Θ(s)), ∇A(Θ(s))⟩ ds ) | = 0,
where (we recall) ω̄(θ) = mθ(ω(·, θ)) and mθ is the unique SRB measure of f(·, θ).
Proof. We will use the notation of Appendix B. Given a standard pair `, let ρ` = ρ, θ∗` = µ`(θ) and f∗(x) = f(x, θ∗`). Then, by Lemmata B.1 and B.2, we can write, for n ≤ Cε^{−1/2},27
µ`(A(θn)) = ∫_a^b ρ(x) A( θ0 + ε Σ_{k=0}^{n−1} ω(xk, θk) ) dx
= ∫_a^b ρ(x) A( θ∗` + ε Σ_{k=0}^{n−1} ω(xk, θ∗`) ) dx + O(ε^2 n^2 + ε)
= ∫_a^b ρ(x) A(θ∗`) dx + ε Σ_{k=0}^{n−1} ∫_a^b ρ(x) ⟨∇A(θ∗`), ω(xk, θ∗`)⟩ dx + O(ε)
= ∫_a^b ρ(x) A(G`(x)) dx + ε Σ_{k=0}^{n−1} ∫_a^b ρ(x) ⟨∇A(θ∗`), ω(f∗^k ◦ Yn(x), θ∗`)⟩ dx + O(ε)
= µ`(A(θ0)) + ε Σ_{k=0}^{n−1} ∫_{T^1} ρ̃n(x) ⟨∇A(θ∗`), ω(f∗^k(x), θ∗`)⟩ dx + O(ε),
where ρ̃n(x) = [ χ_{[a,b]} ρ / Y′n ] ◦ Yn^{−1}(x). Note that ∫_{T^1} ρ̃n = 1 but, unfortunately, ‖ρ̃n‖_{BV} may be enormous. Thus, we cannot estimate the integral in the above expression by naively using decay of correlations. Yet, equation (B.3) implies |Y′n − 1| ≤ C# εn^2. Moreover, ρ̄ = (χ_{[a,b]} ρ) ◦ Y^{−1} has uniformly bounded variation.28 Accordingly, by the decay of correlations and the C^1 dependence of the invariant measure on θ (see Section 3.1) we have
∫_{T^1} ρ̃n(x) ⟨∇A(θ∗`), ω(f∗^k(x), θ∗`)⟩ dx = ∫_{T^1} ρ̄n(x) ⟨∇A(θ∗`), ω(f∗^k(x), θ∗`)⟩ dx + O(εn^2)
= mLeb(ρ̃n) mθ∗`( ⟨∇A(θ∗`), ω(·, θ∗`)⟩ ) + O(εn^2 + e^{−c# k})
= µ`( ⟨∇A(θ0), ω̄(θ0)⟩ ) + O(εn^2 + e^{−c# k}).
Hence,
(4.5)
µ`(A(θn)) = µ`( A(θ0) + εn ⟨∇A(θ0), ω̄(θ0)⟩ ) + O(n^3 ε^2 + ε).
26 See Lemma C.1 for the relation between the differentiability mentioned in the title of the subsection and the present integral formula.
27 By O(ε^a n^b) we mean a quantity bounded by C# ε^a n^b, where C# does not depend on `.
28 Indeed, for all ϕ ∈ C^1 with |ϕ|_∞ ≤ 1, ∫ ρ̄ ϕ′ = ∫_a^b ρ · ϕ′ ◦ Y · Y′ = ∫_a^b ρ (ϕ ◦ Y)′ ≤ ‖ρ‖_{BV}.
Finally, we choose n = ⌈ε^{−1/3}⌉ and set h = εn. We define inductively standard families such that L`0 = {`} and, for each standard pair `i+1 ∈ L`i, the family L`i+1 is a standard decomposition of the measure (Fε^n)∗µ`i+1. Thus, setting m = ⌈tε^{−2/3}⌉ − 1, recalling equation (4.5) and using repeatedly Proposition 3.2,
Eε`(A(Θ(t))) = µ`(A(θ_{tε^{−1}})) = µ`(A(θ0)) + Σ_{k=0}^{m−1} [ µ`(A(θ_{ε^{−1}(k+1)h})) − µ`(A(θ_{ε^{−1}kh})) ]
= µ`(A(θ0)) + Σ_{k=0}^{m−1} Σ_{`1∈L`0} · · · Σ_{`k−1∈L`k−2} [ ∏_{j=1}^{k−1} ν`j ] µ`k−1( ε^{2/3} ⟨∇A(θ0), ω̄(θ0)⟩ ) + O(ε)
= Eε`( A(Θ(0)) + Σ_{k=0}^{m−1} ⟨∇A(Θ(kh)), ω̄(Θ(kh))⟩ h ) + O(ε^{1/3} t)
= Eε`( A(Θ(0)) + ∫_0^t ⟨∇A(Θ(s)), ω̄(Θ(s))⟩ ds ) + O(ε^{1/3} t).
The lemma follows by taking the limit ε → 0.
4.2. The Martingale Problem at work.
First of all, let us finally specify precisely what we mean by the martingale problem.
Definition 5 (Martingale Problem). Given a Riemannian manifold S, a linear operator L : D(L) ⊂ C^0(S, R^d) → C^0(S, R^d), a set of measures Py, y ∈ S, on C^0([0, T], S) and a filtration Ft, we say that {Py} satisfies the martingale problem if, for each function A ∈ D(L),
Py({z(0) = y}) = 1, and
M(t, z) := A(z(t)) − A(z(0)) − ∫_0^t LA(z(s)) ds is an Ft-martingale under all Py.
We can now prove the last announced result.
Lemma 4.2. If ω̄ is Lipschitz, then the martingale problem determined by (4.2) has a unique solution, consisting of the measures supported on the solutions of the ODE
(4.6)
Θ̇ = ω̄(Θ),   Θ(0) = y.
Proof. Let Θ̄ be the solution of (4.6) with initial condition y ∈ T^d and Py the probability measure in the martingale problem. The idea is to compute
(d/dt) Ey(‖Θ(t) − Θ̄(t)‖^2) = (d/dt) Ey(⟨Θ(t), Θ(t)⟩) − 2⟨ω̄(Θ̄(t)), Ey(Θ(t))⟩ − 2⟨Θ̄(t), (d/dt) Ey(Θ(t))⟩ + 2⟨ω̄(Θ̄(t)), Θ̄(t)⟩.
To continue we use Lemma C.1 where, in the first term, A(θ) = ‖θ‖^2, in the third, A(θ) = θi, and the generator in (4.3) is given by LA(θ) = ⟨∇A(θ), ω̄(θ)⟩:
(d/dt) Ey(‖Θ(t) − Θ̄(t)‖^2) = 2Ey(⟨Θ(t), ω̄(Θ(t))⟩) − 2⟨ω̄(Θ̄(t)), Ey(Θ(t))⟩ − 2⟨Θ̄(t), Ey(ω̄(Θ(t)))⟩ + 2⟨ω̄(Θ̄(t)), Θ̄(t)⟩
= 2Ey(⟨Θ(t) − Θ̄(t), ω̄(Θ(t)) − ω̄(Θ̄(t))⟩).
By the Lipschitz property of ω̄ (let CL be the Lipschitz constant), using the Schwarz inequality and integrating, we have
Ey(‖Θ(t) − Θ̄(t)‖^2) ≤ 2CL ∫_0^t Ey(‖Θ(s) − Θ̄(s)‖^2) ds,
which, by Gronwall’s inequality, implies that
Py({Θ̄}) = 1.
4.3. A recap of what we have done so far. We have just seen that the martingale method (in Dolgopyat’s version) consists of four steps:
(1) Identify a suitable class of measures on path space which allow one to handle
the conditioning problem (in our case: the one coming from standard pairs)
(2) Prove tightness for such measures (in our case: they are supported on
uniformly Lipschitz functions)
(3) Identify an equation characterizing the accumulation points (in our case:
an ODE)
(4) Prove uniqueness of the limit equation in the martingale sense.
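The four steps can be probed numerically. The sketch below uses illustrative choices, not those of the notes: a perturbed doubling map as the fast dynamics, a simple coupling ω, and ω̄ computed under the assumption that the SRB measure is close to Lebesgue. It compares the empirical mean of an ensemble of slow paths with the solution of the averaged ODE of step (3).

```python
import numpy as np

eps, T = 1e-3, 2.0
rng = np.random.default_rng(1)

f = lambda x: (2.0 * x + 0.01 * np.sin(2 * np.pi * x)) % 1.0       # fast expanding map
omega = lambda x, th: np.cos(2 * np.pi * th) + np.sin(2 * np.pi * x)
omega_bar = lambda th: np.cos(2 * np.pi * th)   # x-average of omega (Lebesgue approximation)

n = int(T / eps)
x = rng.random(2_000)                  # fast variable, roughly SRB-distributed
th = np.full_like(x, 0.1)
mean_path = np.empty(n)
for k in range(n):
    mean_path[k] = th.mean()           # empirical mean of the slow paths at time k*eps
    x, th = f(x), th + eps * omega(x, th)

# averaged ODE dTheta/dt = omega_bar(Theta), Theta(0) = 0.1 (Euler on the same grid)
Theta = np.empty(n)
Theta[0] = 0.1
for k in range(1, n):
    Theta[k] = Theta[k - 1] + eps * omega_bar(Theta[k - 1])

err = np.max(np.abs(mean_path - Theta))
print(err)    # small: the slow paths track the averaged ODE
```

The deviation of individual paths from Θ is of order √ε, in line with the fluctuation analysis of the next lecture.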
The beauty of the previous scheme is that it can be easily adapted to a variety of
problems. To convince the reader of this fact we proceed further and apply it to
obtain more refined information on the behavior of the system.
5. Lecture five: A brief survey of further results
In this lecture we will describe more precise, and more interesting, results that can be obtained by the techniques discussed in the previous lectures (plus quite a bit of extra work).
Problem 5.1. To complete the understanding of the previous sections, read and meditate on Appendix C.
5.1. Fluctuations around the limit. The next natural question is how fast the convergence takes place. To this end it is natural to consider, for each t ∈ [0, T],
ζε(t) = ε^{−1/2}( Θε(t) − Θ(t) ).
Note that ζε is a random variable on T^2 with values in C^0([0, T], R^d) which describes the fluctuations around the average.29 Let P̃ε be the path measure describing ζε when (x0, θ0) are distributed according to the measure µ0; that is, P̃ε = (ζε)∗µ0. So it is natural to ask what the limit behavior of P̃ε is, hence of the fluctuations around the average. The following result holds (the proof is provided, for the reader’s convenience, in the supplementary material presented in Section 6).
Theorem 5.1. The measures {P̃ε} have a weak limit P̃. Moreover, P̃ is the measure of the zero average Gaussian process defined by the Stochastic Differential Equation (SDE)
(5.1)
dζ = Dω̄(Θ) ζ dt + σ(Θ) dB,   ζ(0) = 0,
29 Here we are using that S = Td can be lifted to its universal cover Rd .
where B is the standard d-dimensional Brownian motion and the diffusion coefficient σ is given by30
(5.2)
σ(θ)^2 = mθ( ω̂(·, θ) ⊗ ω̂(·, θ) ) + Σ_{m=1}^∞ mθ( ω̂(fθ^m(·), θ) ⊗ ω̂(·, θ) ) + Σ_{m=1}^∞ mθ( ω̂(·, θ) ⊗ ω̂(fθ^m(·), θ) ),
where ω̂ = ω − ω̄ and we have used the notation fθ(x) = f(x, θ). In addition, σ^2 is symmetric and non-negative, hence σ is uniquely defined as a symmetric non-negative definite matrix. Finally, σ(θ) is strictly positive, unless ω̂(·, θ) is a coboundary for fθ.
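The series (5.2) can be evaluated in a simple case. The sketch below uses illustrative choices, not from the notes: d = 1, fθ the doubling map for every θ (so mθ is Lebesgue), and ω̂(x, θ) = x − 1/2, which has zero mean. A Fourier computation gives the lag-m autocorrelation 2^{−m}/12, so the Green–Kubo sum is σ^2 = 1/12 + 2 Σ_{m≥1} 2^{−m}/12 = 1/4; the code checks this by quadrature.

```python
import numpy as np

# midpoint grid on [0,1); fine enough to resolve frac(2^m x) for m <= M
N, M = 1 << 20, 12
x = (np.arange(N) + 0.5) / N
hat = lambda y: y - 0.5                      # zero-mean observable (w.r.t. Lebesgue)

def corr(m):
    # m_theta( hat(f^m .) * hat(.) ) for the doubling map f(x) = 2x mod 1
    return np.mean(hat((2.0 ** m * x) % 1.0) * hat(x))

sigma2 = corr(0) + 2.0 * sum(corr(m) for m in range(1, M + 1))
print(sigma2)   # close to the exact value 1/4
```

Truncating at M = 12 loses only O(2^{−M}), consistent with the exponential decay of correlations used throughout the notes.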
Problem 5.2. Show that, setting ψ(λ, t) = E(e^{i⟨λ, ζ(t)⟩}), equation (5.1) implies that
∂t ψ = ⟨λ, Dω̄ ∂λ ψ⟩ − (1/2) ⟨λ, σ^2 λ⟩ ψ,   ψ(λ, 0) = 1,
which implies that ψ is the characteristic function of a zero mean Gaussian.
Remark 5.2. It is interesting to notice that equation (5.1) with σ ≡ 0 is just the
equation for the evolution of an infinitesimal displacement of the initial condition,
that is the linearised equation along an orbit of the averaged deterministic system.
This is rather natural, since in the time scale we are considering, the fluctuations
around the deterministic trajectory are very small.
Remark 5.3. Note that the condition ensuring that the diffusion coefficient σ is non-zero can be checked constructively, by finding periodic orbits with different averages.
5.2. Local Limit Theorem. In fact, with much more work, it is possible to prove a much sharper result.
Theorem 5.4 ([4]). For any T > 0, there exists ε0 > 0 so that the following holds. For any compact interval I ⊂ R, real numbers κ > 0, ε ∈ (0, ε0) and t ∈ [ε^{1/2000}, T], any fixed θ0 ∈ T^1, if x0 is distributed according to a smooth distribution, we have:
(5.3)
P( ζε(t) ∈ ε^{1/2} I + κ ) = |I| · ( ε^{1/2} e^{−κ^2/(2σ̄_t^2(θ0))} ) / ( σ̄_t(θ0) √(2π) ) (1 + o(1)),
where the variance σ̄_t^2(θ0) is given by
(5.4)
σ̄_t^2(θ0) = ∫_0^t e^{ 2 ∫_s^t ω̄′(θ̄(r, θ0)) dr } σ^2(θ̄(s, θ0)) ds,
θ̄(·, θ0) denoting the solution of the averaged equation with initial condition θ0.
5.3. Wentzel-Friedlin processes. Wentzel and Friedlin in [21] have developed a rather detailed theory to study small perturbations of a deterministic dynamical system. The simplest possible example of the type of situation that they discuss is given by
(5.5)
dη(t) = ω̄(η(t)) dt + √ε σ(η(t)) dB,   η(0) = θ0,
30 In our notation, for any measure µ and vectors v, w, µ(v ⊗ w) is a matrix with entries
µ(vi wj ).
where B is the standard Brownian motion. In fact, using Theorem 5.4 and [21, Theorem 2.2], it is possible to show that the processes θ_{ε^{−1}t} and η(t) are ε-close with very high probability for times of order C# ln ε^{−1}. For such times the evolution of η(t) is such that the process is C#√ε-close to a sink of ω̄ with very high probability, thus the same happens to our process.
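Equation (5.5) is straightforward to simulate. Here is a minimal Euler–Maruyama sketch with assumed, illustrative coefficients (ω̄(θ) = cos(2πθ), which has a sink at θ = 1/4, and a constant σ; neither is prescribed by the notes). It exhibits the concentration near the sink described above: the paths settle at distance O(√ε) from θ = 1/4.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, dt, T = 1e-3, 1e-3, 5.0
omega_bar = lambda th: np.cos(2 * np.pi * th)   # illustrative averaged drift, sink at 1/4
sigma = 0.5                                     # illustrative constant diffusion

eta = np.full(5_000, 0.1)                       # 5000 independent paths, eta(0) = 0.1
for _ in range(int(T / dt)):
    dB = rng.normal(0.0, np.sqrt(dt), eta.size)              # Brownian increments
    eta += omega_bar(eta) * dt + np.sqrt(eps) * sigma * dB   # Euler-Maruyama step for (5.5)

print(eta.mean(), eta.std())   # mean near 1/4, spread of order sqrt(eps)
```

Near the sink the linearization gives a stationary standard deviation of about √(εσ^2/(4π)), which the simulation reproduces.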
We have thus obtained a very good and simple description of our process θn for
times of order n ∼ ε−1 ln ε−1 . Yet, a natural question now appears: is it possible
to understand the process for longer times? Possibly infinitely long?
5.4. Statistical properties of Fε. I believe that it is possible to answer the above question positively for a generic ω. However, at the moment the only available results hold in the case in which the center foliation has negative Lyapunov exponent. Loosely stated, the result reads:
Theorem 5.5 ([5]). Assume that the central foliation has negative Lyapunov exponent and let nZ be the number of sinks of ω̄. Then, for a generic ω with the above property, the following holds. For all ε > 0 sufficiently small, the map Fε has a finite number of SRB measures; in fact, Fε has a unique SRB (and physical) measure µε, which enjoys exponential decay of correlations with rate cε > 0. In other words, there exist C1, C2, C3, C4 > 0 such that, for any two functions A, B ∈ C^2,
|m(A · B ◦ Fε^n) − m(A) µε(B)| ≤ C1 sup_θ ‖A(·, θ)‖_{C^2} sup_x ‖B(x, ·)‖_{C^2} exp(−cε n).
Moreover, the rate of decay of correlations satisfies the following bounds:
(5.6)
cε = C2 ε/log ε^{−1} if nZ = 1,   cε = C3 exp(−C4 ε^{−1}) otherwise.
The above shows that it is possible to understand quite well what happens in a
particular case. The study of the general case stands as a challenge.
6. Supplementary material: Proof of Theorem 5.1
It is possible to study the limit behavior of ζε using the strategy summarized in Section 4.3, even though now the story becomes technically more involved. Let us discuss the situation a bit more in detail. Let P̃ε` be the path measure describing ζε when (x0, θ0) are distributed according to the standard pair `.31 Note that P̃ε` = (ζε)∗µ`. Again, we provide a proof of the claimed results based on some facts that will be proven in later sections.
Proof of Theorem 5.1. First of all, the sequence of measures P̃ε` is tight, which will be proven in Proposition 6.1. Next, by Proposition 6.4, we have that
(6.1)
lim_{ε→0} sup_{`∈Lε} | Ẽε`( A(ζ(t)) − A(ζ(0)) − ∫_0^t Ls A(ζ(s)) ds ) | = 0,
where
(6.2)
(Ls A)(ζ) = ⟨∇A(ζ), Dω̄(Θ(s)) ζ⟩ + (1/2) Σ_{i,j=1}^d [σ^2(Θ(s))]_{i,j} ∂ζi ∂ζj A(ζ),
31 As already explained, here we allow ` to stand also for a family {`ε} which weakly converges to `. In particular, this means that Θ is also a random variable, as it depends on the initial condition θ0.
with diffusion coefficient σ^2 given by (5.2). In the following we will often write, slightly abusing notation, σ(t) for σ(Θ(t)).
We can then use equation (6.1) and Lemma 3.4, followed by Lemma 3.5, to obtain that
A(ζ(t)) − A(ζ(0)) − ∫_0^t Ls A(ζ(s)) ds
is a martingale under any accumulation point P̃ of the measures P̃ε`, with respect to the filtration Ft, with P̃({ζ(0) = 0}) = 1. In Proposition 6.6 we will prove that such a problem has a unique solution, thereby showing that lim_{ε→0} P̃ε` = P̃.
Note that the time dependent operator Ls is a second order operator; this means that the accumulation points of ζε do not satisfy a deterministic equation, but rather a stochastic one. Indeed, our last task is to show that P̃ is equal in law to the measure determined by the stochastic differential equation
(6.3)
dζ = Dω̄ ◦ Θ(t) ζ dt + σ dB,   ζ(0) = 0,
where B is a standard d-dimensional Brownian motion. Note that the above equation is well defined as a consequence of Lemma 6.5, which shows that the matrix σ^2 is symmetric and non-negative, hence σ = σ^T is well defined, and strictly positive if ω̂ is not a coboundary (see Lemma 6.5). To conclude, it suffices to show that the probability measure describing the solution of (6.3) satisfies the martingale problem.32 This follows from Itô’s calculus: indeed, if ζ is the solution of (6.3) and A ∈ C^r, then Itô’s formula reads
dA(ζ) = Σ_i ∂ζi A(ζ) dζi + (1/2) Σ_{i,j,k} ∂ζi ∂ζj A(ζ) σik σjk dt.
Integrating it from s to t and taking the conditional expectation we have
E( A(ζ(t)) − A(ζ(s)) − ∫_s^t Lτ A(ζ(τ)) dτ | Fs ) = 0.
See Appendix C for more details on the relation between the Martingale problem and the theory of SDEs, and on how this allows one to dispense with Itô’s formula altogether. We have thus seen that the measure determined by (6.3) satisfies the martingale problem, hence it must agree with P̃, since P̃ is the unique solution of the martingale problem. The proof of the Theorem is concluded by noticing that (6.3) defines a zero mean Gaussian process (see the end of the proof of Proposition 6.6).
6.1. Tightness.
Proposition 6.1. For every standard pair `, the measures {P̃ε`}_{ε>0} are tight.
Proof. Now the proof of tightness is less obvious, since the paths have a Lipschitz constant that explodes. Luckily, there exists a convenient criterion for tightness: the Kolmogorov criterion [19, Remark A.5].
Theorem 6.2 (Kolmogorov). Given a sequence of measures Pε on C^0([0, T], R), if there exist α, β, C > 0 such that
Eε(|z(t) − z(s)|^β) ≤ C|t − s|^{1+α}
32 We do not prove that such a solution exists as this is a standard result in probability, [19].
for all t, s ∈ [0, T ] and the distribution of z(0) is tight, then {Pε } is tight.
Note that ζε(0) = 0. Of course, it is easier to apply the above criterion with β ∈ N. It is reasonable to expect that the fluctuations behave like a Brownian motion, so the variance should be finite. To verify this, let us first compute the case β = 2. Note that, setting ω̂(x, θ) = ω(x, θ) − ω̄(θ),
(6.4)
ζε(t) = √ε Σ_{k=0}^{⌈ε^{−1}t⌉−1} [ ω(xk, θk) − ω̄(Θ(εk)) ] + O(√ε)
= √ε Σ_{k=0}^{⌈ε^{−1}t⌉−1} [ ω̂(xk, θk) + ω̄(θk) − ω̄(Θ(εk)) ] + O(√ε)
= √ε Σ_{k=0}^{⌈ε^{−1}t⌉−1} [ ω̂(xk, θk) + √ε Dω̄(Θ(εk)) ζε(kε) ] + O(√ε) + Σ_{k=0}^{⌈ε^{−1}t⌉−1} O( ε^{3/2} ‖ζε(εk)‖^2 ).
We start with a basic result.
Lemma 6.3. For each standard pair ` and k, l ∈ {0, . . . , ε^{−1}}, k ≥ l, we have
µ`( | Σ_{j=l}^k ω̂(xj, θj) |^2 ) ≤ C#(k − l).
The proof of the above lemma is postponed to the end of the section. Let us first see how it can be profitably used. Note that, for t = εk, s = εl,
(6.5)
Ẽε`(‖ζ(t) − ζ(s)‖^2) ≤ C#|t − s| + C#|t − s| ε Σ_{j=l}^k µ`(‖ζε(εj)‖^2) + C#ε,
where we have used Lemma 6.3 and the trivial estimate ‖ζε‖ ≤ C# ε^{−1/2}. If we use the above with s = 0 and define Mk = sup_{j≤k} µ`(|ζε(εj)|^2), we have
Mk ≤ C# εk + C# k^2 ε^2 Mk.
Thus there exists C > 0 such that, if k ≤ Cε^{−1}, we have Mk ≤ C# εk. Hence, we can substitute such an estimate in (6.5) and obtain
(6.6)
Ẽε`(‖ζ(t) − ζ(s)‖^2) ≤ C#|t − s| + C#ε.
Since the estimate for |t − s| ≤ C#ε is trivial, we have the bound
Ẽε`(‖ζ(t) − ζ(s)‖^2) ≤ C#|t − s|.
This is interesting but, unfortunately, it does not suffice to apply the Kolmogorov criterion. The next step could be to compute for β = 3. This has the well known disadvantage of being an odd function of the path, and hence one has to deal with the absolute value. Due to this, it turns out to be more convenient to consider directly the case β = 4. This can be done in complete analogy with the above computation, by first generalizing the result of Lemma 6.3 to higher moments. Doing so we obtain
(6.7)
Ẽε`(‖ζ(t) − ζ(s)‖^4) ≤ C#|t − s|^2,
which concludes the proof of the proposition. Indeed, the proof of Lemma 6.3 explains how to treat correlations. Multiple correlations can be treated similarly, and one can thus show that they do not contribute to the leading term. Thus the computation becomes similar (although much more involved) to the case of a sum of independent zero mean random variables Xi (where no correlations are present), that is
E( [ Σ_{i=l}^k Xi ]^4 ) = Σ_{i1,...,i4=l}^k E(Xi1 Xi2 Xi3 Xi4) ≤ 3 Σ_{i,j=l}^k E(Xi^2) E(Xj^2) = O((k − l)^2).
For future use let us record that, by equation (6.7) and the Young inequality,
(6.8)
Ẽε`(‖ζ(t) − ζ(s)‖^3) ≤ C#|t − s|^{3/2}.
We still owe the reader the
Proof of Lemma 6.3. The proof starts with a direct computation:33
µ`( [ Σ_{j=l}^k ω̂(xj, θj) ]^2 ) ≤ Σ_{j=l}^k µ`( ω̂(xj, θj)^2 ) + 2 Σ_{j=l}^k Σ_{r=j+1}^k | µ`( ω̂(xj, θj) ω̂(xr, θr) ) |
≤ C#|k − l| + 2 Σ_{j=l}^k Σ_{r=j+1}^k | µ`( ω̂(xj, θj) ω̂(xr, θr) ) |.
To compute the last correlation, remember Proposition 3.2. We can thus call Lj the standard family associated to (Fε^j)∗µ` and, for r ≥ j, write
µ`( ω̂(xj, θj) ω̂(xr, θr) ) = Σ_{`1∈Lj} ν`1 µ`1( ω̂(x0, θ0) ω̂(x_{r−j}, θ_{r−j}) )
= Σ_{`1∈Lj} ν`1 ∫_{a`1}^{b`1} ρ`1(x) ω̂(x, G`1(x)) ω̂(x_{r−j}, θ_{r−j}).
We would like to argue as in the proof of Lemma 4.1 and try to reduce the problem to
∫_{a`1}^{b`1} ρ`1(x) ω̂(x, θ∗`1) ω̂(x_{r−j}, θ∗`1) = ∫_{a`1}^{b`1} ρ`1(x) ω̂(x, θ∗`1) ω̂( f∗^{r−j}(Y_{r−j}(x)), θ∗`1 )
= ∫_{T^1} ρ̃(x) ω̂( Y_{r−j}^{−1}(x), θ∗`1 ) ω̂( f∗^{r−j}(x), θ∗`1 ),
but then the mistake that we would make substituting ρ̃ with ρ̄ is too big for our current purposes. It is thus necessary to be more subtle. The idea is to write
33 To simplify notation we do the computation in the case d = 1, the general case is identical.
ρ`1(x) ω̂(x, G`1(x)) = α1 ρ̂1(x) + α2 ρ̂2(x), where ρ̂1, ρ̂2 are standard densities.34 Note that α1, α2 are uniformly bounded. Next, let us fix L > 0, to be chosen later, and assume r − j ≥ L. Since the `1,i = (G, ρ̂i) are standard pairs, by construction, calling L`1,i a standard decomposition of (Fε^{r−j−L})∗µ`1,i, we have
∫_{a`1}^{b`1} ρ̂i(x) ω̂(x_{r−j}, θ_{r−j}) = Σ_{`2∈L`1,i} ν`2 ∫_{a`2}^{b`2} ρ`2(x) ω̂( f∗^L(Y_L(x)), θ∗`2 ) + O(εL)
= Σ_{`2∈L`1,i} ν`2 ∫_{T^1} ρ̃(x) ω̂( f∗^L(x), θ∗`2 ) + O(εL)
= Σ_{`2∈L`1,i} ν`2 ∫_{T^1} ρ̄(x) ω̂( f∗^L(x), θ∗`2 ) + O(εL^2)
= O( e^{−c#L} + εL^2 ),
due to the decay of correlations for the map f∗ and the fact that ω̂(·, θ∗`2) is a zero average function for the invariant measure of f∗. By the above we have
µ`( [ Σ_{j=l}^k ω̂(xj, θj) ]^2 ) ≤ C# Σ_{j=l}^k { [ e^{−c#L} + εL^2 ](k − j) + 1 + εL^3 },
which yields the result by choosing L = c log(k − j) for c large enough.
6.2. Differentiating with respect to time (poor man’s Itô’s formula).
Proposition 6.4. For every standard pair ` and A ∈ C^3(S, R) we have
lim_{ε→0} sup_{`∈Lε} | Ẽε`( A(ζ(t)) − A(ζ(0)) − ∫_0^t Ls A(ζ(s)) ds ) | = 0.
Proof. As in Lemma 4.1, the idea is to fix h ∈ (0, 1), to be chosen later, and compute
(6.9)
Ẽε`( A(ζ(t + h)) − A(ζ(t)) ) = Ẽε`( ⟨∇A(ζ(t)), ζ(t + h) − ζ(t)⟩ )
+ (1/2) Ẽε`( ⟨(D^2A)(ζ(t)) (ζ(t + h) − ζ(t)), ζ(t + h) − ζ(t)⟩ ) + O(h^{3/2}),
where we have used (6.8). Unfortunately, this time the computation is a bit lengthy and rather boring; yet it contains essentially no new ideas, being just a brute force computation.
34 In fact, it would be more convenient to define standard pairs with signed (actually even complex) measures, but let us keep it simple.
Let us start by computing the last term of (6.9). Setting ζ^h(t) = ζ(t + h) − ζ(t) and Ω^h = Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} ω̂(xk, θk), by equation (6.4) and using the trivial estimate ‖ζε(t)‖ ≤ C# ε^{−1/2}, we have
Ẽε`( ⟨(D^2A)(ζ(t)) ζ^h(t), ζ^h(t)⟩ ) = ε Σ_{k,j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ⟨(D^2A)(ζε(t)) ω̂(xk, θk), ω̂(xj, θj)⟩ )
+ O( ε^{3/2} Σ_{j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖Ω^h‖ ‖ζε(jε)‖ ) ) + O( ε µ`(‖Ω^h‖) )
+ O( ε^2 Σ_{j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖Ω^h‖ ‖ζε(jε)‖^2 ) + ε^2 Σ_{k,j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖ζε(kε)‖ ‖ζε(jε)‖ ) )
+ O( ε^{3/2} Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖ζε(kε)‖ ) + ε^{5/2} Σ_{k,j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖ζε(kε)‖ ‖ζε(jε)‖^2 ) )
+ O( ε^2 Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖ζε(kε)‖^2 ) + ε^3 Σ_{k,j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ‖ζε(kε)‖^2 ‖ζε(jε)‖^2 ) ).
Observe that (6.6), (6.8) and (6.7) yield
µ`( ‖ζε(kε)‖^m ) = µ`( ‖ζε(kε) − ζε(0)‖^m ) ≤ C# (εk)^{m/2} ≤ C#
for m ∈ {1, 2, 3, 4} and k ≤ C# ε^{−1}. We can now use Lemma 6.3 together with the Schwarz inequality to obtain
(6.10)
Ẽε`( ⟨(D^2A)(ζ(t)) ζ^h(t), ζ^h(t)⟩ ) = ε Σ_{k,j=tε^{−1}}^{(t+h)ε^{−1}} µ`( ⟨(D^2A)(ζε(t)) ω̂(xk, θk), ω̂(xj, θj)⟩ ) + O( √ε h + h^2 + ε ).
Next, we must perform a similar analysis on the first term of equation (6.9):
(6.11)
Ẽε`( ⟨∇A(ζ(t)), ζ^h(t)⟩ ) = √ε Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} µ`( ⟨∇A(ζε(t)), ω̂(xk, θk)⟩ )
+ ε Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} µ`( ⟨∇A(ζε(t)), Dω̄(Θ(εk)) ζε(εk)⟩ ) + O(√ε).
To estimate the term in the second line of (6.11) we have to use (6.4) again:
Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} µ`( ⟨∇A(ζε(t)), Dω̄(Θ(εk)) ζε(εk)⟩ ) = hε^{−1} µ`( ⟨∇A(ζε(t)), Dω̄(Θ(t)) ζε(t)⟩ )
+ O( ε^{−1}h^2 + ε^{−1/2}h ) + √ε Σ_{k=tε^{−1}}^{(t+h)ε^{−1}} Σ_{j=tε^{−1}}^{k} µ`( ⟨∇A(ζε(t)), Dω̄(Θ(t)) ω̂(xj, θj)⟩ ).
To compute the last term in the above equation, let L` be the standard family generated by ` at time ε^{−1}t; then, setting αε(θ, t) = ∇A(ε^{−1/2}(θ − Θ(t))) and ĵ = j − tε^{−1}, we can write
µ`( ⟨∇A(ζε(t)), Dω̄(Θ(t)) ω̂(xj, θj)⟩ ) = Σ_{`1∈L`} Σ_{r,s=1}^d ν`1 µ`1( αε(θ0, t)_r Dω̄(Θ(t))_{r,s} ω̂(x_ĵ, θ_ĵ)_s ).
Next, notice that for every r the signed measure µ_{ℓ₁,r}(φ) = µ_{ℓ₁}(α_ε(θ₀, t)_r φ) has density ρ_{ℓ₁} α_ε(G_{ℓ₁}(x), t)_r, whose derivative is uniformly bounded in ε, t. We can then write µ_{ℓ₁,r} as a linear combination of two standard pairs ℓ_{1,i}. Finally, given L ∈ ℕ, if ĵ ≥ L we can consider the standard families L_{ℓ_{1,i}} generated by ℓ_{1,i} at time ĵ − L and write, arguing as in the proof of Lemma 6.3,

µ_{ℓ_{1,i}}(ω̂(x_ĵ, θ_ĵ)_s) = Σ_{ℓ₂∈L_{ℓ_{1,i}}} ν_{ℓ₂} µ_{ℓ₂}(ω̂(x_L, θ_L)_s)
  = Σ_{ℓ₂∈L_{ℓ_{1,i}}} ν_{ℓ₂} ∫_{a_{ℓ₂}}^{b_{ℓ₂}} ρ_{ℓ₂}(x) ω̂(f_{θ*_{ℓ₂}}^{L}(x), θ*_{ℓ₂})_s + O(εL²) = O(e^{−C_# L} + εL²).
Collecting all the above estimates yields

(6.12)  ε Σ_{k=tε⁻¹}^{(t+h)ε⁻¹} µ_ℓ(⟨∇A(ζ_ε(t)), Dω̄(Θ(εk))ζ_ε(εk)⟩) = h µ_ℓ(⟨∇A(ζ_ε(t)), Dω̄(Θ(t))ζ_ε(t)⟩)
  + O(h² + ε^{1/2}h) + O(h²ε^{1/2}L² + h²ε^{−1/2}e^{−C_# L} + ε^{1/2}Lh).
To deal with the second term in the first line of equation (6.11) we argue as before:

Σ_{k=tε⁻¹}^{(t+h)ε⁻¹} µ_ℓ(⟨∇A(ζ_ε(t)), ω̂(x_k, θ_k)⟩) = Σ_{k=tε⁻¹}^{tε⁻¹+L} µ_ℓ(⟨∇A(ζ_ε(t)), ω̂(x_k, θ_k)⟩) + O(hL² + ε⁻¹h e^{−C_# L})
  = O(L + hL² + ε⁻¹h e^{−C_# L}).
Collecting the above computations and remembering (6.4) we obtain

(6.13)  Ẽ^ε_ℓ(⟨∇A(ζ(t)), ζ^h(t)⟩) = h Ẽ^ε_ℓ(⟨∇A(ζ(t)), Dω̄(Θ(t))ζ(t)⟩) + O(h² + L√ε + h√ε L²),

provided L is chosen in the interval [C_* ln ε⁻¹, ε^{−1/4}] with C_* > 0 large enough.
To conclude we must compute the term on the right hand side of the first line of equation (6.10). Consider first the case |j − k| > L. Suppose k > j, the other case being equal; then, letting L_ℓ be the standard family generated by ℓ at time ε⁻¹t, and setting k̂ = k − ε⁻¹t, ĵ = j − ε⁻¹t and B(x, θ, t) = (D²A)(ε^{−1/2}(θ − Θ(t))),

µ_ℓ(⟨(D²A)(ζ_ε(t))ω̂(x_k, θ_k), ω̂(x_j, θ_j)⟩) = Σ_{ℓ₁∈L_ℓ} ν_{ℓ₁} µ_{ℓ₁}(⟨B(x₀, θ₀, t)ω̂(x_k̂, θ_k̂), ω̂(x_ĵ, θ_ĵ)⟩).
Note that the signed measure µ̂_{ℓ₁,r,s}(g) = µ_{ℓ₁}(B_{r,s} g) has a density with uniformly bounded derivative given by ρ̂_{ℓ₁,r,s} = ρ_{ℓ₁} B(x, G_{ℓ₁}(x), t)_{r,s}. Such a density can then be written as a linear combination of standard densities, ρ̂_{ℓ₁,r,s} = α_{1,ℓ₁,r,s} ρ_{1,ℓ₁,r,s} + α_{2,ℓ₁,r,s} ρ_{2,ℓ₁,r,s}, with uniformly bounded coefficients α_{i,ℓ₁,r,s}. We can then use the same trick at time j and then at time k − L and obtain that the quantity we are interested in can be written as a linear combination of quantities of the type

µ_{ℓ₃,r,s}(ω̂_s(x_L, θ_L)) = µ_{ℓ₃,r,s}(ω̂_s(x_L, θ*_{ℓ₃})) + O(Lε) = ∫_a^b ρ̃_{r,s} ω̂_s(f_{θ*_{ℓ₃}}^{L}(x), θ*_{ℓ₃}) + O(L²ε) = O(e^{−C_# L} + L²ε),

where we argued as in the proof of Lemma 6.3. Thus the total contribution of all such terms is of order L²h² + ε⁻¹e^{−C_# L}h². Next, the terms such that |k − j| ≤ L but j ≤ ε⁻¹t + L give a total contribution of order L²ε, while to estimate the other terms it is convenient to proceed as before but stop at the time j − L. Setting k̃ = k − j + L we obtain terms of the form

µ_{ℓ₂,r,s}(ω̂_s(x_k̃, θ_k̃) ω̂_r(x_L, θ_L)) = Γ_{k−j}(θ*_{ℓ₂})_{r,s} + O(e^{−C_# L} + L²ε),

where

Γ_k(θ) = ∫_S ω̂(f_θ^k(x), θ) ⊗ ω̂(x, θ) m_θ(dx).
The case j > k yields the same results but with Γ*_j. Remembering the smooth dependence of the covariance on the parameter θ (see [13]), substituting the result of the above computation in (6.10), and then (6.10) and (6.13) in (6.9), we finally have

Ẽ^ε_ℓ(A(ζ(t+h)) − A(ζ(t))) = h Ẽ^ε_ℓ(⟨∇A(ζ(t)), Dω̄(Θ(t))ζ(t)⟩)
  + h Ẽ^ε_ℓ(Tr(σ²(Θ(t))D²A(ζ(t)))) + O(L√ε + hL²√ε + h²L²)
  = ∫_t^{t+h} [Ẽ^ε_ℓ(⟨∇A(ζ(s)), Dω̄(Θ(s))ζ(s)⟩) + Ẽ^ε_ℓ(Tr(σ²(Θ(s))D²A(ζ(s))))] ds
  + O(L√ε + hL²√ε + h^{3/2} + h²L²).

The proposition follows by summing the h⁻¹t terms in the interval [0, t] and by choosing L = ε^{−1/100} and h = ε^{1/3}. □
In the previous proposition the expression σ² just stands for a specified matrix: we did not prove that such a matrix is positive definite, and hence that it has a well defined square root σ, nor do we have much understanding of the properties of such a σ (provided it exists). To clarify this is our next task.
Lemma 6.5. The matrices σ²(s), s ∈ [0, T], are symmetric and non-negative, hence they have a unique real symmetric square root σ(s). In addition, if, for each v ∈ ℝ^d, ⟨v, ω̄⟩ is not a smooth coboundary, then there exists c > 0 such that σ(s) ≥ c𝟙.
Proof. For each v ∈ ℝ^d a direct computation shows that

lim_{n→∞} (1/n) m_θ([Σ_{k=0}^{n−1} ⟨v, ω(f_θ^k(·), θ)⟩]²) = lim_{n→∞} (1/n) Σ_{k,j=0}^{n−1} m_θ(⟨v, ω(f_θ^k(·), θ)⟩⟨v, ω(f_θ^j(·), θ)⟩)
  = m_θ(⟨v, ω(·, θ)⟩²) + lim_{n→∞} (2/n) Σ_{k=1}^{n−1} (n − k) m_θ(⟨ω(·, θ), v⟩⟨v, ω(f_θ^k(·), θ)⟩)
  = m_θ(⟨v, ω(·, θ)⟩²) + 2 Σ_{k=1}^{∞} m_θ(⟨ω(·, θ), v⟩⟨v, ω(f_θ^k(·), θ)⟩) = ⟨v, σ(θ)²v⟩.
This implies that σ(θ)² ≥ 0 and, since it is symmetric, there exists a unique symmetric and non-negative σ(θ). On the other hand, if ⟨v, σ²(θ)v⟩ = 0 then, by the decay of correlations and the above equation, we have

m_θ([Σ_{k=1}^{n−1} ⟨v, ω(f_θ^k(·), θ)⟩]²) = n m_θ(⟨v, ω(·, θ)⟩²) + 2n Σ_{k=1}^{n−1} m_θ(⟨v, ω(·, θ)⟩⟨v, ω(f_θ^k(·), θ)⟩) + O(1)
  = −2n Σ_{k=n}^{∞} m_θ(⟨v, ω(·, θ)⟩⟨v, ω(f_θ^k(·), θ)⟩) + O(1) = O(1).

Thus the L² norm of φ_n = Σ_{k=1}^{n−1} ⟨v, ω(f_θ^k(·), θ)⟩ is uniformly bounded. Hence there exists a weakly convergent subsequence. Let φ ∈ L² be an accumulation point; then for each ϕ ∈ C¹ we have

m_θ((φ ∘ f_θ) ϕ) = lim_{k→∞} m_θ((φ_{n_k} ∘ f_θ) ϕ) = m_θ(φϕ) − m_θ(⟨v, ω(·, θ)⟩ϕ).

That is, ⟨v, ω(x, θ)⟩ = φ(x) − φ ∘ f_θ(x). In other words, ⟨v, ω(x, θ)⟩ is an L² coboundary. Since the Livšic Theorem [14] states that the solution of the cohomological equation must be smooth, we have φ ∈ C¹. □
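The Green–Kubo formula just established can be tested numerically in the simplest smooth expanding case. The sketch below is an illustration, not part of the notes: it takes f_θ to be the doubling map and ω(x) = cos(2πx), for which m_θ(ω²) = 1/2 while all correlation terms m_θ(ω · ω∘f_θ^k), k ≥ 1, vanish, so ⟨v, σ²v⟩ = 1/2. Since the doubling map shifts binary digits, orbits are generated exactly from a random bit string, and the variance of the normalized Birkhoff sums should be close to 1/2.

```python
import math, random

random.seed(0)

def birkhoff_sum(n, n_bits=64):
    """Sum of cos(2*pi*x_k) along an exact doubling-map orbit.

    The doubling map shifts binary digits, so we draw the digits of x_0 in
    advance and read x_k off as a 64-bit window starting at position k.
    """
    digits = "".join(str(random.getrandbits(1)) for _ in range(n + n_bits))
    s = 0.0
    for k in range(n):
        x_k = int(digits[k:k + n_bits], 2) / 2.0 ** n_bits
        s += math.cos(2.0 * math.pi * x_k)  # observable omega(x) = cos(2 pi x)
    return s

n, trials = 64, 2000
samples = [birkhoff_sum(n) for _ in range(trials)]
m = sum(samples) / trials
sigma2 = sum((s - m) ** 2 for s in samples) / (trials * n)
print(round(sigma2, 2))  # typically close to 0.5
```

Here the correlation sum truncates for free; for a generic expanding map the correlations decay exponentially and the series must be summed numerically.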
6.3. Uniqueness of the Martingale Problem.
We are left with the task of proving the uniqueness of the martingale problem. Note that in the present case the operator depends explicitly on time. Thus, if we want to set the initial condition at a time s ≠ 0, we need to slightly generalise the definition of the martingale problem. To avoid this, for simplicity, here we consider only initial conditions at time zero, which suffice for our purposes. In fact, we will consider only the initial condition ζ(0) = 0, since it is the only one we are interested in. We then have the same definition of the martingale problem as in Definition 5, apart from the fact that L is replaced by L_s and y = 0.
Since the operators L_s are second order operators, we could use well known results. Indeed, there exists a deep theory, due to Stroock and Varadhan, that establishes the uniqueness of the martingale problem for a wide class of second order operators [17]. Yet, our case is especially simple because the coefficients of the higher order part of the differential operator depend only on time and not on ζ. In this case it is possible to modify a simple proof of uniqueness that works when all the coefficients depend only on time [17, Lemma 6.1.4]. We provide the argument here for the reader's convenience.

Proposition 6.6. The martingale problem associated to the operators L_s in Proposition 6.4 has a unique solution.
Proof. As already noticed, L_t, defined in (6.2), depends on ζ only via the coefficient of the first order part. It is then natural to try to change measure so that such a dependence is eliminated and we obtain a martingale problem with respect to an operator with all coefficients depending only on time; then one can conclude arguing as in [17, Lemma 6.1.4]. Such a reduction is routinely done in probability via the Cameron–Martin–Girsanov formula. Yet, given the simple situation at hand, one can proceed in a much more naive manner. Let S : [0, T] → M_d, M_d being the space of d × d matrices, be the solution generated by the differential equation

Ṡ(t) = −Dω̄(Θ(t))S(t),   S(0) = 𝟙.

Note that, setting ς(t) = det S(t) and B(t) = Dω̄(Θ(t)), we have

ς̇(t) = −tr(B(t)) ς(t),   ς(0) = 1.

The above implies that S(t) is invertible.
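The invertibility claim rests on Liouville's formula: the scalar equation for ς gives ς(t) = exp(−∫₀ᵗ tr B(τ) dτ) > 0. A quick numerical sketch, with an arbitrary constant matrix standing in for Dω̄(Θ(t)) (an illustrative assumption, not the actual coefficient of the notes):

```python
import math

# Integrate S' = -B S with classical RK4 for a sample constant 2x2 matrix B
# (an illustrative stand-in for D omega-bar(Theta(t))) and compare det S(1)
# with exp(-tr(B)), as Liouville's formula predicts; det > 0 gives invertibility.
B = [[0.3, 1.0], [-2.0, 0.1]]  # tr(B) = 0.4

def rhs(S):  # right hand side of S' = -B S
    return [[-sum(B[i][k] * S[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def step(S, h):  # one RK4 step
    k1 = rhs(S)
    k2 = rhs([[S[i][j] + h / 2 * k1[i][j] for j in range(2)] for i in range(2)])
    k3 = rhs([[S[i][j] + h / 2 * k2[i][j] for j in range(2)] for i in range(2)])
    k4 = rhs([[S[i][j] + h * k3[i][j] for j in range(2)] for i in range(2)])
    return [[S[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
             for j in range(2)] for i in range(2)]

S = [[1.0, 0.0], [0.0, 1.0]]  # S(0) = identity
for _ in range(1000):
    S = step(S, 1e-3)

det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
print(det_S, math.exp(-0.4))  # the two values agree to high accuracy
```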
Define the map S ∈ C⁰(C⁰([0, T], ℝ^d), C⁰([0, T], ℝ^d)) by [Sζ](t) = S(t)ζ(t) and set P = S_*P̃. Note that the map S is invertible. Finally, we define the operator

L̂_t = (1/2) Σ_{i,j} [Σ̂(t)²]_{i,j} ∂_{ζ_i}∂_{ζ_j},

where Σ̂(t)² = S(t)σ(t)²S(t)*, σ(t) = σ(Θ(t)) as mentioned after (6.2). Let us verify that P satisfies the martingale problem with respect to the operators L̂_t. By Lemma C.1 we have

(d/dt) E(A(ζ(t)) | F_s) = (d/dt) Ẽ(A(S(t)ζ(t)) | F_s)
  = Ẽ(⟨Ṡ(t)ζ(t), ∇A(S(t)ζ(t))⟩ + L_t A(S(t)ζ(t)) | F_s)
  = Ẽ((1/2) Σ_{i,j,k,l} σ²(t)_{i,j} (∂_{ζ_k}∂_{ζ_l}A)(S(t)ζ(t)) S(t)_{k,i} S(t)_{l,j} | F_s)
  = E(L̂_t A(ζ(t)) | F_s).

Thus the claim follows by Lemma C.1 again. Accordingly, if we prove that the above martingale problem has a unique solution, then P is uniquely determined, which, in turn, uniquely determines P̃, concluding the proof.
Let us define the function B ∈ C¹(ℝ^{2d+1}, ℝ) by

B(t, ζ, λ) = e^{⟨λ,ζ⟩ − (1/2)∫_s^t ⟨λ, Σ̂(τ)²λ⟩ dτ};

then Lemma C.1 implies

(d/dt) E(B(t, ζ(t), λ) | F_s) = E(−(1/2)⟨λ, Σ̂(t)²λ⟩ B(t, ζ(t), λ) + L̂_t B(t, ζ(t), λ) | F_s) = 0.

Hence

E(e^{⟨λ,ζ(t)⟩} | F_s) = e^{⟨λ,ζ(s)⟩ + (1/2)∫_s^t ⟨λ, Σ̂(τ)²λ⟩ dτ}.
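The displayed identity can be sanity-checked by simulation in the simplest setting (d = 1, Σ̂ ≡ 1, s = 0, ζ(0) = 0, so that ζ is standard Brownian motion; these are illustrative choices, not the Σ̂ of the proof): estimate E(e^{λζ(t)}) by Monte Carlo and compare it with e^{tλ²/2}.

```python
import math, random

random.seed(1)

# Monte Carlo estimate of E(exp(lambda * z(t))) for Brownian motion z built
# from independent Gaussian increments; the martingale computation above
# predicts the value exp(t * lambda^2 / 2).
lam, t, n_paths, n_steps = 0.7, 1.0, 20000, 16
dt = t / n_steps
acc = 0.0
for _ in range(n_paths):
    z = 0.0
    for _ in range(n_steps):
        z += math.sqrt(dt) * random.gauss(0.0, 1.0)
    acc += math.exp(lam * z)

estimate = acc / n_paths
target = math.exp(t * lam * lam / 2)
print(estimate, target)  # the two values agree up to Monte Carlo error
```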
From this it follows that the finite dimensional distributions are uniquely determined. Indeed, for each n ∈ ℕ, {λ_i}_{i=1}^{n} and 0 ≤ t₁ < ··· < t_n, we have

E(e^{Σ_{i=1}^{n} ⟨λ_i, ζ(t_i)⟩}) = E(e^{Σ_{i=1}^{n−1} ⟨λ_i, ζ(t_i)⟩} E(e^{⟨λ_n, ζ(t_n)⟩} | F_{t_{n−1}}))
  = E(e^{Σ_{i=1}^{n−2} ⟨λ_i, ζ(t_i)⟩ + ⟨λ_{n−1}+λ_n, ζ(t_{n−1})⟩}) e^{(1/2)∫_{t_{n−1}}^{t_n} ⟨λ_n, Σ̂(τ)²λ_n⟩ dτ}
  = e^{(1/2)∫_0^{t_n} ⟨Σ_{i=n(τ)}^{n} λ_i, Σ̂(τ)² Σ_{i=n(τ)}^{n} λ_i⟩ dτ},

where n(τ) = inf{m | t_m ≥ τ}. This concludes the Lemma, since it implies that the measure is uniquely determined on the sets that generate the σ-algebra.³⁵ Note that we have also proven that the process is a zero mean Gaussian process; this, after translating back to the original measure, generalises Remark 5.2. □
Appendix A. Geometry

For c > 0, consider the cones C_c = {(ξ, η) ∈ ℝ² : |η| ≤ εc|ξ|}. Note that

dF_ε = ( ∂_x f        ∂_θ f
         ε∂_x ω   1 + ε∂_θ ω ).

Thus, if (1, εu) ∈ C_c,

d_pF_ε(1, εu) = (∂_x f(p) + εu ∂_θ f(p), ε∂_x ω(p) + εu + ε²u ∂_θ ω(p))
  = ∂_x f(p)(1 + ε (∂_θ f(p)/∂_x f(p)) u) · (1, εΞ_p(u)),

where

(A.1)  Ξ_p(u) = (∂_x ω(p) + (1 + ε∂_θ ω(p))u) / (∂_x f(p) + ε∂_θ f(p)u).

Thus the vector (1, εu) is mapped to the vector (1, εΞ_p(u)). Thus, letting K = max{‖∂_x ω‖_∞, ‖∂_θ ω‖_∞, ‖∂_θ f‖_∞}, we have, for |u| ≤ c and assuming Kεc ≤ 1,

|Ξ_p(u)| ≤ (K + (1 + εK)c)/(λ − εKc) ≤ (K + 1 + c)/(λ − 1).

Thus, if we choose c ∈ [(K+1)/(λ−2), (εK)⁻¹], we have d_pF_ε(C_c) ⊂ C_c. Since this implies d_pF_ε⁻¹ ∁C_c ⊂ ∁C_c, we have that the complementary cone ∁C_{(εK)⁻¹} is invariant under dF_ε⁻¹. From now on we fix c = (K+1)/(λ−2).
Hence, for any p ∈ 𝕋^{1+d} and n ∈ ℕ, we can define the quantities v_n, u_n, s_n, r_n as follows:

(A.2)  d_pF_ε^n(1, 0) = v_n(1, εu_n),   d_pF_ε^n(s_n, 1) = r_n(0, 1),

with |u_n| ≤ c and |s_n| ≤ K. For each n the slope field s_n is smooth, therefore integrable; given any small ∆ > 0 and p = (x, θ) ∈ 𝕋^{1+d}, define W^c_n(p, ∆), the local n-step central manifold of size ∆, as the connected component containing p of the intersection with the strip {|θ′ − θ| < ∆} of the integral curve of (s_n, 1) passing through p.

Notice that, by definition, d_pF_ε(s_n(p), 1) = (r_n/r_{n−1})(s_{n−1}(F_ε p), 1); thus, by definition, there exists a constant b such that

(A.3)  exp(−bε) ≤ r_n/r_{n−1} ≤ exp(bε).

Furthermore, define Γ_n = Π_{k=0}^{n−1} ∂_x f ∘ F_ε^k, and let

(A.4)  a = c ‖∂_θ f / ∂_x f‖_∞.

Clearly,

(A.5)  Γ_n exp(−aεn) ≤ v_n ≤ Γ_n exp(aεn).

³⁵ See Problem 3.2.
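The cone computation above can be checked numerically on a concrete map. In the sketch below, f(x, θ) = 3x + (1/20) sin(2π(x+θ)) mod 1 and ω(x, θ) = cos(2πx) are sample choices (not the system of the notes), for which λ = 3 − π/10 ≈ 2.69, K = 2π and c = (K+1)/(λ−2); we verify |Ξ_p(u)| ≤ c on random samples, i.e. d_pF_ε(C_c) ⊂ C_c.

```python
import math, random

random.seed(2)

# Numerical check of (A.1) for a sample fast-slow map (illustrative choices):
#   f(x, theta) = 3x + a*sin(2*pi*(x+theta)) mod 1,  omega(x, theta) = cos(2*pi*x).
a, eps = 0.05, 0.01
lam = 3.0 - 2 * math.pi * a              # lower bound for d_x f
K = 2 * math.pi                          # bounds d_x omega, d_theta omega, d_theta f
c = (K + 1) / (lam - 2)                  # cone parameter; requires lam > 2
assert K * eps * c <= 1.0                # standing assumption K*eps*c <= 1

def Xi(x, theta, u):
    dxf = 3.0 + 2 * math.pi * a * math.cos(2 * math.pi * (x + theta))
    dtf = 2 * math.pi * a * math.cos(2 * math.pi * (x + theta))
    dxw = -2 * math.pi * math.sin(2 * math.pi * x)
    dtw = 0.0
    return (dxw + (1 + eps * dtw) * u) / (dxf + eps * dtf * u)

# dF_eps sends (1, eps*u) to a multiple of (1, eps*Xi(u)); invariance of the
# cone C_c means that |u| <= c implies |Xi(u)| <= c.
for _ in range(10000):
    x, theta = random.random(), random.random()
    u = random.uniform(-c, c)
    assert abs(Xi(x, theta, u)) <= c
print("cone C_c with c =", round(c, 3), "is dF-invariant on all samples")
```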
Appendix B. Shadowing

In this section we provide a simple quantitative version of shadowing that is needed in the argument. Let (x_k, θ_k) = F_ε^k(x, θ) with k ∈ {0, ..., n}. We assume that θ belongs to the range of a standard pair ℓ (i.e., θ = G_ℓ(x) for some x ∈ [a, b]). Let θ* ∈ S be such that ‖θ* − θ‖ ≤ ε, and set f_*(x) = f(x, θ*). Let us denote by π_x : 𝕋² → S the canonical projection on the x coordinate; then, for any s ∈ [0, 1], let

H_n(x, z, s) = π_x F_{sε}^n(x, θ* + s(G_ℓ(x) − θ*)) − f_*^n(z).

Note that H_n(x, x, 0) = 0; in addition, for any x, z and s ∈ [0, 1],

∂_z H_n(x, z, s) = −(f_*^n)′(z).

Accordingly, by the Implicit Function Theorem, for any n ∈ ℕ and s ∈ [0, 1] there exists Y_n(x, s) such that³⁶ H_n(x, Y_n(x, s), s) = 0; from now on Y_n(x) stands for Y_n(x, 1). Note that, setting x*_k = f_*^k(Y_n(x)), by construction x*_n = x_n. Observe moreover that

(B.1)  ∂_x Y_n = (f_*^n)′(z)⁻¹ d(π_x F_ε^n) = (1 − G′_ℓ s_n) v_n / ((f_*^n)′ ∘ Y_n),

where we have used the notation introduced in equation (A.2). Recalling (A.5) and by the cone condition we have

(B.2)  e^{−c_# εn} Π_{k=0}^{n−1} (∂_x f(x_k, θ_k) / f′_*(x*_k)) ≤ (1 − G′_ℓ s_n) v_n / (f_*^n)′ ≤ e^{c_# εn} Π_{k=0}^{n−1} (∂_x f(x_k, θ_k) / f′_*(x*_k)).
Next, we want to estimate to which degree x*_k shadows the true trajectory.

Lemma B.1. There exists C > 0 such that, for each k ≤ n < Cε^{−1/2}, we have

‖θ_k − θ*‖ ≤ C_# εk,   |x_k − x*_k| ≤ C_# εk.

Proof. Observe that

θ_k = ε Σ_{j=0}^{k−1} ω(x_j, θ_j) + θ₀,

thus ‖θ_k − θ*‖ ≤ C_# εk. Accordingly, let us set³⁷ ξ_k = x*_k − x_k; then, by the mean value theorem,

|ξ_{k+1}| = |∂_x f · ξ_k + ∂_θ f · (θ_k − θ*)| ≥ λ|ξ_k| − C_# εk.

Since, by definition, ξ_n = 0, we can proceed by backward induction, which yields

|ξ_k| ≤ Σ_{j=k}^{n−1} λ^{−j+k} C_# εj ≤ C_# ε Σ_{j=0}^{∞} λ^{−j}(j + k) ≤ C_# εk. □
³⁶ The Implicit Function Theorem allows us to define Y_n(x, s) in a neighborhood of s = 0; in fact, we claim that this neighborhood necessarily contains [0, 1]. Otherwise, there would exist s̄ ∈ (0, 1) and x̄ such that Y_n is defined at (x̄, s̄) but not at (x̄, s) with s > s̄. We could then apply the Implicit Function Theorem at the point (x̄, Y_n(x̄, s̄), s̄) and obtain, by uniqueness, an extension of the previous function Y_n to a larger neighborhood of s = 0, which contradicts our assumption.

³⁷ Here, as we have already done before, we are using the fact that we can lift 𝕋¹ to the universal covering ℝ.
Lemma B.2. There exists C > 0 such that, for each n ≤ Cε^{−1/2},

(B.3)  e^{−c_# εn²} ≤ |Y′_n| ≤ e^{c_# εn²}.

In particular, Y_n is invertible with uniformly bounded derivative.

Proof. Let us prove the upper bound, the lower bound being similar. By equations (B.1), (B.2) and Lemma B.1 we have

|Y′_n| ≤ e^{c_# εn} e^{Σ_{k=0}^{n−1} [ln ∂_x f(x_k, θ_k) − ln f′_*(x*_k)]} ≤ e^{c_# εn} e^{c_# Σ_{k=0}^{n−1} εk}. □
Appendix C. Martingales, operators and Itô's calculus

Suppose that L_t ∈ L(C^r(ℝ^d, ℝ), C⁰(ℝ^d, ℝ)), t ∈ ℝ, is a one parameter family of bounded linear operators that depends continuously on t.³⁸ Also suppose that P is a measure on C⁰([0, T], ℝ^d), and let F_t be the σ-algebra generated by the variables {z(s)}_{s≤t}.³⁹

Lemma C.1. The two properties below are equivalent:
(1) For all A ∈ C¹(ℝ^{d+1}, ℝ) such that, for all t ∈ ℝ, A(t, ·) ∈ C^r(ℝ^d, ℝ), and for all times s, t ∈ [0, T], s < t, the function g(t) = E(A(t, z(t)) | F_s) is differentiable and g′(t) = E(∂_t A(t, z(t)) + L_t A(t, z(t)) | F_s).
(2) For all A ∈ C^r(ℝ^d, ℝ), M(t) = A(z(t)) − A(z(0)) − ∫_0^t L_s A(z(s)) ds is a martingale with respect to F_t.
Proof. Let us start with (1) ⇒ (2). Fix t ∈ [0, T]; then, for each s ∈ [0, t], define the random variables B(s) by

B(s, z) = A(z(t)) − A(z(s)) − ∫_s^t L_τ A(z(τ)) dτ.

Clearly, for each z ∈ C⁰, B(s, z) is continuous in s, and B(t, z) = 0. Hence, for all τ ∈ (s, t], by Fubini we have⁴⁰

(d/dτ) E(B(τ) | F_s) = −(d/dτ) E(A(z(τ)) | F_s) − (d/dτ) ∫_τ^t E(L_r A(z(r)) | F_s) dr
  = E(−L_τ A(z(τ)) + L_τ A(z(τ)) | F_s) = 0.

Thus, since B is bounded, by the Lebesgue dominated convergence theorem we have

0 = E(B(t) | F_s) = lim_{τ→s} E(B(τ) | F_s) = E(B(s) | F_s).

This implies

E(M(t) | F_s) = E(B(s) | F_s) + M(s) = M(s),

as required.

Next, let us check (2) ⇒ (1). For each h > 0 we have

E(A(t+h, z(t+h)) − A(t, z(t)) | F_s) = E((∂_t A)(t, z(t+h)) | F_s) h + o(h)
  + E(M(t+h) − M(t) + ∫_t^{t+h} L_τ A(t, z(τ)) dτ | F_s).
Since M is a martingale, E(M(t+h) − M(t) | F_s) = 0. The lemma follows by the Lebesgue dominated convergence theorem. □

³⁸ Here the spaces C^r are thought of as Banach spaces, hence consist of bounded functions. A more general setting can be discussed by introducing the concept of a local martingale.

³⁹ At this point the reader is supposed to be familiar with the intended meaning: for all ϑ ∈ C⁰([0, T], ℝ^d), [z(s)](ϑ) = z(ϑ, s) = ϑ(s).

⁴⁰ If uncomfortable about applying Fubini to conditional expectations, have a look at [19, Theorem 4.7].
The above is rather general; to say more it is necessary to specify further properties of the family of operators L_s. A case of particular interest arises for second order differential operators like (6.2). Namely, suppose that

(L_s A)(z) = Σ_i a(z, s)_i ∂_{z_i} A(z) + (1/2) Σ_{i,j=1}^{d} [σ²(z, s)]_{i,j} ∂_{z_i}∂_{z_j} A(z),

where, for simplicity, we assume a, σ to be smooth and bounded, and σ_{ij} = σ_{ji}. Clearly, (6.2) is a special case of the above. In such a case it turns out that a strict connection can be established between L_s and the Stochastic Differential Equation

(C.1)  dz = a dt + σ dB,

where B is the standard Brownian motion. The solution of (C.1) can be defined in various ways. One possibility is to define it as the solution of the Martingale problem [17]; another is to use stochastic integrals [19, Theorem 6.1]. The latter, more traditional, approach leads to Itô's formula, which reads, for each bounded continuous function A of t and z [19, page 91],

A(z(t), t) − A(z(0), 0) = ∫_0^t ∂_s A(z(s), s) ds + ∫_0^t Σ_i a(z(s), s)_i ∂_{z_i} A(z(s), s) ds
  + (1/2) ∫_0^t Σ_{i,j} σ²(z(s), s)_{i,j} ∂_{z_i}∂_{z_j} A(z(s), s) ds + Σ_{i,j} ∫_0^t σ_{i,j}(z(s), s) ∂_{z_j} A(z(s), s) dB_i(s),

where the last is a stochastic integral [19, Theorem 5.3]. This formula is often written in the more impressionistic form

dA = ∂_t A dt + a ∂_z A dt + σ ∂_z A dB + (1/2) σ² ∂_z² A dt = ∂_t A dt + σ ∂_z A dB + L_t A dt.

Taking the expectation with respect to E(· | F_s) we obtain exactly condition (1) of Lemma C.1; hence the solution satisfies the Martingale problem.
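The connection just described is easy to illustrate by simulation, under illustrative assumptions (d = 1, a ≡ 0, σ ≡ 1, so (C.1) is plain Brownian motion). With A(z) = z² one has L_s A = (1/2)σ²A″ = 1, so condition (2) of Lemma C.1 says that M(t) = z(t)² − z(0)² − t has zero expectation:

```python
import math, random

random.seed(3)

# Euler-Maruyama discretization of dz = a dt + sigma dB with a = 0, sigma = 1,
# and test function A(z) = z^2, for which L_s A = (1/2) * sigma^2 * A'' = 1.
# Then M(T) = z(T)^2 - z(0)^2 - T should average to zero over many paths.
T, n_steps, n_paths = 1.0, 50, 20000
dt = T / n_steps
total = 0.0
for _ in range(n_paths):
    z = 0.0
    for _ in range(n_steps):
        z += math.sqrt(dt) * random.gauss(0.0, 1.0)  # dz = sigma dB
    total += z * z - T                               # M(T), with z(0) = 0

mean_M = total / n_paths
print(mean_M)  # close to 0, up to Monte Carlo error
```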
Remark C.2. Note that, if one defines the solution of (C.1) as the solution of the associated Martingale problem, then one can dispense with Itô's calculus altogether. This is an important observation in our present context, in which the fluctuations come from a deterministic problem rather than from a Brownian motion, and hence a direct application of Itô's formula is not possible.
Appendix D. Solutions to the Problems

Here we provide the solutions of the problems given in the text.

Problem 1.1. Just compute.

Problem 1.2. Approximate f uniformly with step functions.
Problem 1.3. Consider any Borel probability measure ν and define the following sequence of measures {ν_n}_{n∈ℕ}:⁴¹ for each Borel set A,

ν_n(A) = ν(T⁻ⁿA).

The reader can easily see that ν_n ∈ M₁(X), the set of probability measures. Indeed, since T⁻¹X = X, ν_n(X) = 1 for each n ∈ ℕ. Next, define

µ_n = (1/n) Σ_{i=0}^{n−1} ν_i.

Again µ_n(X) = 1, so the sequence {µ_i}_{i=1}^{∞} is contained in a weakly compact set (the unit ball) and therefore admits a weakly convergent subsequence {µ_{n_i}}_{i=1}^{∞}; let µ be the weak limit.⁴² We claim that µ is T invariant. Since µ is a Borel measure, it suffices to verify that µ(f ∘ T) = µ(f) for each f ∈ C⁰(X). Let f be a continuous function; then by the weak convergence we have⁴³

µ(f ∘ T) = lim_{j→∞} (1/n_j) Σ_{i=0}^{n_j−1} ν_i(f ∘ T) = lim_{j→∞} (1/n_j) Σ_{i=0}^{n_j−1} ν(f ∘ T^{i+1})
  = lim_{j→∞} (1/n_j) {Σ_{i=0}^{n_j−1} ν_i(f) + ν(f ∘ T^{n_j}) − ν(f)} = µ(f).
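The telescoping step above is easy to see numerically: for ν = δ_{x₀} the invariance defect of µ_n equals (ν(f∘Tⁿ) − ν(f))/n, hence is bounded by 2‖f‖_∞/n. A sketch for a circle rotation (an illustrative choice of T, not tied to the notes):

```python
import math

# Krylov-Bogolyubov in action for T(x) = x + alpha mod 1 and nu = delta_{x0}:
# the invariance defect of the Cesaro averages mu_n telescopes to
# (nu(f o T^n) - nu(f))/n, hence decays like 1/n.
alpha = (math.sqrt(5) - 1) / 2            # sample rotation number
T = lambda x: (x + alpha) % 1.0
f = lambda x: math.cos(2 * math.pi * x)   # a continuous observable, sup|f| = 1

def mu_n(g, x0, n):
    # mu_n(g) = (1/n) * sum_{i < n} nu(g o T^i) with nu = delta_{x0}
    x, s = x0, 0.0
    for _ in range(n):
        s += g(x)
        x = T(x)
    return s / n

x0, n = 0.1, 5000
defect = abs(mu_n(lambda x: f(T(x)), x0, n) - mu_n(f, x0, n))
print(defect)  # bounded by 2*sup|f|/n = 2/n
```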
Problem 1.4. For any continuous function f,

T_*µ(f) = ∫_𝕋 f ∘ T(x) h(x) dx.

Divide the integral into monotonicity intervals for T and change variable in each interval.
Problem 1.5. For any function h ∈ C¹ compute

(d/dx) Lh = L(h′/T′) − L(h T″/(T′)²).

Integrating and iterating this yields the first inequalities. Since (Lⁿh)′ is bounded in L¹, then also Lⁿh must be bounded in C⁰. The second inequality follows. The third inequality is more of the same.
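The identity in this solution can be verified numerically for a concrete smooth expanding map (the map, observable and tolerances below are illustrative choices, not from the notes): compare a finite-difference derivative of Lh with L(h′/T′) − L(hT″/(T′)²).

```python
import math

# Check (L h)' = L(h'/T') - L(h T''/(T')^2) for the expanding circle map
#   T(x) = 2x + (a/(2*pi)) * sin(2*pi*x)  mod 1,   with a = 0.5 (so T' >= 1.5).
a = 0.5
Tlift = lambda y: 2 * y + a / (2 * math.pi) * math.sin(2 * math.pi * y)
dT  = lambda y: 2 + a * math.cos(2 * math.pi * y)
d2T = lambda y: -2 * math.pi * a * math.sin(2 * math.pi * y)
h  = lambda y: 1 + 0.5 * math.cos(2 * math.pi * y)
dh = lambda y: -math.pi * math.sin(2 * math.pi * y)

def branch(target):
    # invert the increasing lift on [0, 1] by bisection
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if Tlift(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def L(g, x):
    # transfer operator: sum of g(y)/T'(y) over the two preimages y of x
    return sum(g(y) / dT(y) for y in (branch(x), branch(x + 1.0)))

x0, delta = 0.37, 1e-5
lhs = (L(h, x0 + delta) - L(h, x0 - delta)) / (2 * delta)  # (L h)'(x0) by finite differences
rhs = L(lambda y: dh(y) / dT(y), x0) - L(lambda y: h(y) * d2T(y) / dT(y) ** 2, x0)
print(abs(lhs - rhs))  # small: the two sides agree
```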
Problem 1.6. Use the last fact of Problem 1.5 and Lemma 1.2.
⁴¹ Intuitively, if we choose a point x ∈ X at random, according to the measure ν, and we ask what is the probability that Tⁿx ∈ A, this is exactly ν(T⁻ⁿA). Hence, our procedure to produce the point Tⁿx is equivalent to picking a point at random according to the evolved measure ν_n.

⁴² This depends on the Riesz–Markov Representation Theorem [16], which states that M(X) is exactly the dual of the Banach space C⁰(X). Since the weak convergence of measures in this case corresponds exactly to the weak-* topology [16], the result follows from the Banach–Alaoglu theorem stating that the unit ball of the dual of a Banach space is compact in the weak-* topology.

⁴³ Note that it is essential that we need check invariance only on continuous functions: if we had to check it with respect to all bounded measurable functions we would need that µ_n converges in a stronger sense (strong convergence), and this may not be true. Note as well that this is the only point where the continuity of T is used: to ensure that f ∘ T is continuous and hence that µ_{n_j}(f ∘ T) → µ(f ∘ T).
Problem 1.7. Recall that a dynamical system (X, T, µ) is mixing if for all probability measures ν absolutely continuous with respect to µ one has lim_{n→∞} T_*ⁿν = µ, where the convergence takes place in the weak topology. Also note that the third fact of Problem 1.5 implies that the unique invariant measure µ absolutely continuous with respect to Lebesgue is, in fact, equivalent to Lebesgue. Let dν = h dx, h ∈ L¹, be a probability measure. Then we can approximate it, in L¹, with a sequence of probability densities {h_k} ⊂ W^{1,1}. Let µ be the invariant measure provided by Lemma 1.2. Then for each f ∈ C⁰,

lim_{n→∞} T_*ⁿν(f) = lim_{n→∞} lim_{k→∞} ∫ Lⁿh_k f = lim_{k→∞} lim_{n→∞} ∫ Lⁿh_k f = µ(f).
Problem 2.1.
(1) Consider α(f) = ∫ f(x, y) µ(dx)ν(dy). Obviously α ∈ G(µ, ν).
(2) First of all, d(µ, µ) = 0, since we can consider the coupling

α(f) = ∫ f(x, x) µ(dx).

Next, note that G(µ, ν) is weakly closed and is a subset of the probability measures on X², which is weakly compact. Hence G(µ, ν) is weakly compact, thus the inf is attained. Accordingly, if d(ν, µ) = 0, there exists α ∈ G(ν, µ) such that α(d) = 0. Thus α is supported on the set D = {(x, y) ∈ X² | x = y}. Thus, for each ϕ ∈ C⁰(X), we have, setting π₁(x, y) = x and π₂(x, y) = y,

µ(ϕ) = α(ϕ ∘ π₁) = α(𝟙_D ϕ ∘ π₁) = α(𝟙_D ϕ ∘ π₂) = α(ϕ ∘ π₂) = ν(ϕ).

The fact that d(ν, µ) = d(µ, ν) is obvious from the definition. It remains to prove the triangle inequality. It is possible to obtain a fast argument by using the disintegration of the couplings, but here is an elementary proof. Let us start with some preparation. Since X is compact, for each ε we can construct a finite partition {p_i}_{i=1}^{M} of X such that each p_i has diameter less than ε (do it). Given two probability measures µ, ν and α ∈ G(µ, ν), note that if µ(p_i) = 0, then

Σ_j α(p_i × p_j) = α(p_i × X) = µ(p_i) = 0.

Hence α(p_i × p_j) = 0 for all j. Thus we can define

α_ε(f) = Σ_{i,j} ∫_{X²} f(x, y) 𝟙_{p_i}(x) 𝟙_{p_j}(y) (α(p_i × p_j) / (µ(p_i)ν(p_j))) µ(dx)ν(dy),

where the sum runs only over the indices for which the denominator is strictly positive. It is easy to check that α_ε ∈ G(µ, ν) and that the weak limit of α_ε is α. Finally, let ν, µ, ζ be three probability measures, and let α ∈ G(µ, ζ), β ∈ G(ζ, ν) be such that d(µ, ζ) = α(d) and d(ν, ζ) = β(d). For each ε > 0 there exists δ > 0 such that |α(d) − α_δ(d)| ≤ ε and, likewise, |β(d) − β_δ(d)| ≤ ε. We can then define the following measure on X³:

γ_δ(f) = Σ_{i,j,k} ∫_{X³} f(x, z, y) 𝟙_{p_i}(x) 𝟙_{p_j}(z) 𝟙_{p_k}(y) (α(p_i × p_j)β(p_j × p_k) / (µ(p_i)ζ(p_j)²ν(p_k))) µ(dx)ν(dy)ζ(dz),

where again the sum is restricted to the indices for which the denominator is strictly positive. The reader can check that the marginal on (x, z) is α_δ, and the marginal on (z, y) is β_δ. It follows that the marginal on (x, y) belongs to G(µ, ν), thus

d(µ, ν) ≤ ∫_{X³} d(x, y) γ_δ(dx, dz, dy) ≤ ∫_{X²} d(x, z) α_δ(dx, dz) + ∫_{X²} d(z, y) β_δ(dz, dy)
  ≤ α(d) + β(d) + 2ε = d(µ, ζ) + d(ζ, ν) + 2ε.

The result follows by the arbitrariness of ε.
(3) It suffices to prove that µ_n converges weakly to µ if and only if lim_{n→∞} d(µ_n, µ) = 0.

If it converges in the metric then, letting α_n ∈ G(µ_n, µ), for each Lipschitz function f we have

|µ_n(f) − µ(f)| ≤ ∫_{X²} |f(x) − f(y)| α_n(dx, dy) ≤ L_f α_n(d),

where L_f is the Lipschitz constant. Taking the inf over α_n we obtain

|µ_n(f) − µ(f)| ≤ L_f d(µ_n, µ).

We thus have lim_{n→∞} µ_n(f) = µ(f) for each Lipschitz function. The claim follows since the Lipschitz functions are dense by the Stone–Weierstrass Theorem.

If it converges weakly then, to prove convergence in the metric, we need slightly more sophisticated partitions P_δ: partitions with the property that µ(∂p_i) = 0. Note that this implies lim_{n→∞} µ_n(p_i) = µ(p_i), [18]. Let us construct such partitions explicitly. For each x ∈ X consider the balls B_r(x) = {z ∈ X : d(x, z) < r}. Given δ, let S_{1,1} = B_δ(x) \ B_{3δ/4}(x) and S_{1,2} = B_{δ/2}(x) \ B_{δ/4}(x). These two spherical shells are disjoint. Let σ(1) ∈ {1, 2} be such that µ(S_{1,σ(1)}) = min{µ(S_{1,1}), µ(S_{1,2})}. Divide again the spherical shell S_{1,σ(1)} into three, throw away the middle part, and let S_{2,σ(2)} be the one with the smaller measure. Continue in this way to obtain a sequence S_{n,σ(n)}. Note that µ(B_δ(x)) ≥ 2ⁿ µ(S_{n,σ(n)}) and S_{n+1,σ(n+1)} ⊂ S_{n,σ(n)}; thus there exists r(x) ∈ [δ/4, δ] such that ∂B_{r(x)}(x) = ∩_{n=1}^{∞} S_{n,σ(n)} and µ(∂B_{r(x)}(x)) = 0. Since X is compact, we can extract a finite subcover {B_i}_{i=1}^{N} from the cover {B_{r(x)}(x)}_{x∈X}. If we consider all the (open) sets B_i ∩ B_j, they form a mod-0 partition of X. To get a partition {p_i}, just attribute the boundary points in any (measurable) way you like.⁴⁴ Also, for each partition element p_i choose a reference point x_i ∈ p_i.

Having constructed the wanted partition, we can discretize any measure ν by associating to it the measure

ν_δ(f) = Σ_i f(x_i) ν(p_i).

Define also

α(f) = Σ_i ∫_{p_i} f(x, x_i) ν(dx)

⁴⁴ E.g., if C_i are the elements of the partition, you can set p₁ = C̄₁, p₂ = C̄₂ \ p₁, and so on.
and check that α ∈ G(ν, ν_δ); hence

d(ν, ν_δ) ≤ α(d) ≤ 2δ.

For each n ∈ ℕ and δ > 0 we can then write

d(µ, µ_n) ≤ d(µ_δ, µ_{n,δ}) + 4δ.

Next, let z_{n,i} = min{µ(p_i), µ_n(p_i)}, Z_n = Σ_i z_{n,i}, and define

β_n(f) = Σ_i f(x_i, x_i) z_{n,i} + (1 − Z_n)⁻¹ Σ_{i,j} f(x_i, x_j) [µ(p_i) − z_{n,i}] · [µ_n(p_j) − z_{n,j}],

and verify that β_n ∈ G(µ_δ, µ_{n,δ}). In addition, for each δ > 0, lim_{n→∞} z_{n,i} = µ(p_i); hence lim_{n→∞} Z_n = 1. Collecting the above facts, and calling K the diameter of X, yields

lim_{n→∞} d(µ, µ_n) ≤ lim_{n→∞} β_n(d) + 4δ ≤ K lim_{n→∞} (1 − Z_n) + 4δ = 4δ.

The result follows by the arbitrariness of δ.

Comment: In the field of optimal transport one would usually prove the above facts via the duality relation

d(µ, ν) = sup_{φ∈L₁} {µ(φ) − ν(φ)},

where L₁ is the set of Lipschitz functions with Lipschitz constant equal to 1. We refrain from this point of view because, in spite of its efficiency, it requires the development of a little bit of technology outside the scope of these notes. The interested reader can see [20, Chapter 1] for details.
(4) The first metric gives rise to the usual topology; hence convergence in d is equivalent to the usual weak convergence of measures. The metric d₀ instead gives rise to the discrete topology, hence each function in L^∞ is continuous. Hence convergence in the corresponding d is equivalent to the usual strong convergence of measures.
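The metric d of this problem is the Wasserstein–Kantorovich distance. For finitely supported measures on the line the optimal coupling cost has the closed form ∫|F_µ − F_ν| between the cumulative distribution functions, which makes the properties checked in point (2) easy to test on small examples (a sketch; the specific measures below are arbitrary):

```python
# Wasserstein-1 distance between finitely supported measures on the line,
# via the one-dimensional closed form: W1(mu, nu) = integral of |F_mu - F_nu|.
def w1(mu, nu):
    pts = sorted(set(mu) | set(nu))
    dist = f_mu = f_nu = 0.0
    for left, right in zip(pts, pts[1:]):
        f_mu += mu.get(left, 0.0)   # CDF of mu just to the right of `left`
        f_nu += nu.get(left, 0.0)
        dist += abs(f_mu - f_nu) * (right - left)
    return dist

mu   = {0.0: 0.5, 1.0: 0.5}   # half the mass at 0, half at 1
nu   = {0.0: 1.0}             # Dirac mass at 0
zeta = {0.5: 1.0}             # Dirac mass at 1/2

print(w1(mu, nu))  # half the mass travels distance 1, so the cost is 0.5
```

On these examples one can verify directly that w1(mu, mu) == 0 and that the triangle inequality w1(mu, nu) ≤ w1(mu, zeta) + w1(zeta, nu) holds, mirroring the abstract proof of point (2).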
Problem 2.2. Just compute.
Problem 2.3. Use the third part of Problem 1.5.
Problem 3.1. See Appendix A.
Problem 3.2. See [17, Section 1.3].
Problem 3.3. By (3.1) it follows that the path Θ_ε is made of segments of length ε and maximal slope ‖ω‖_{L∞}; thus, for all h > 0,⁴⁵

‖Θ_ε(t+h) − Θ_ε(t)‖ ≤ C_# h + ε Σ_{k=⌈ε⁻¹t⌉}^{⌊ε⁻¹(t+h)⌋−1} ‖ω(x_k, θ_k)‖ ≤ C_# h.

Thus the measures P_ε are all supported on a set of uniformly Lipschitz functions, that is, a compact set.

⁴⁵ The reader should be aware that we use the notation C_# to designate a generic constant (depending only on f and ω) whose numerical value can change from one occurrence to the next, even in the same line.
Problem 4.2. See Lemma B.2.
Problem 5.1. Do it.
Problem 5.2. Compute using Itô's formula. In turn, this implies that ζ is a zero mean Gaussian process; see the proof of Proposition 6.6 for more details.
References
[1] Baladi, Viviane. Positive transfer operators and decay of correlations. Advanced Series in
Nonlinear Dynamics, 16. World Scientific Publishing Co., Inc., River Edge, NJ, 2000.
[2] Baladi, Viviane; Tsujii, Masato. Anisotropic Hölder and Sobolev spaces for hyperbolic diffeomorphisms. Ann. Inst. Fourier (Grenoble) 57 (2007), no. 1, 127–154.
[3] Jacopo De Simoi, Carlangelo Liverani. The Martingale approach after Varadhan and Dolgopyat. In "Hyperbolic Dynamics, Fluctuations and Large Deviations", Dolgopyat, Pesin, Pollicott, Stoyanov editors, Proceedings of Symposia in Pure Mathematics, 89, AMS (2015). Preprint arXiv:1402.0090.
[4] Jacopo De Simoi, Carlangelo Liverani, Fast-slow partially hyperbolic systems: Beyond averaging. Part I (Limit Theorems). Preprint. arXiv:1408.5453.
[5] Jacopo De Simoi, Carlangelo Liverani, Fast-slow partially hyperbolic systems: Beyond averaging. Part II (statistical properties). Preprint. arXiv:1408.5454.
[6] Dolgopyat, Dmitry. Averaging and invariant measures. Mosc. Math. J. 5 (2005), no. 3,
537–576, 742.
[7] Donsker, M. D.; Varadhan, S. R. S. Large deviations for noninteracting infinite-particle
systems. J. Statist. Phys. 46 (1987), no. 5-6, 1195–1232
[8] Gouëzel, Sébastien; Liverani, Carlangelo. Banach spaces adapted to Anosov systems. Ergodic Theory Dynam. Systems 26 (2006), no. 1, 189–217.
[9] Gouëzel, Sébastien; Liverani, Carlangelo. Compact locally maximal hyperbolic sets for smooth maps: fine statistical properties. J. Differential Geom. 79 (2008), no. 3, 433–477.
[10] Guo, M. Z.; Papanicolaou, G. C.; Varadhan, S. R. S. Nonlinear diffusion limit for a system
with nearest neighbor interactions. Comm. Math. Phys. 118 (1988), no. 1, 31–59.
[11] Katok, Anatole; Hasselblatt, Boris. Introduction to the modern theory of dynamical systems.
With a supplementary chapter by Anatole Katok and Leonardo Mendoza. Encyclopedia of
Mathematics and its Applications, 54. Cambridge University Press, Cambridge, 1995.
[12] R.Z. Khasminskii. On the averaging principle for Itô stochastic differential equations, Kybernetika 4 (1968) 260–279 (in Russian).
[13] Liverani, Carlangelo. Invariant measures and their properties. A functional analytic point
of view, Dynamical Systems. Part II: Topological Geometrical and Ergodic Properties of
Dynamics. Pubblicazioni della Classe di Scienze, Scuola Normale Superiore, Pisa. Centro
di Ricerca Matematica ”Ennio De Giorgi” : Proceedings. Published by the Scuola Normale
Superiore in Pisa (2004).
[14] Livšic, A.N. Cohomology of dynamical systems, Math. USSR Izv. 6 (1972) 1278–1301.
[15] Pugh, Charles; Shub, Michael Stably ergodic dynamical systems and partial hyperbolicity.
J. Complexity 13 (1997), no. 1, 125–179.
[16] Reed, Michael; Simon, Barry Methods of modern mathematical physics. I. Functional analysis. Second edition. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New
York, 1980. xv+400 pp.
[17] Stroock, Daniel W.; Varadhan, S. R. Srinivasa. Multidimensional diffusion processes.
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 233. Springer-Verlag, Berlin-New York, 1979. xii+338 pp.
[18] Varadhan, S. R. S. Probability theory. Courant Lecture Notes in Mathematics, 7. New York
University, Courant Institute of Mathematical Sciences, New York; American Mathematical
Society, Providence, RI, 2001. viii+167 pp.
[19] Varadhan, S. R. S. Stochastic processes. Courant Lecture Notes in Mathematics, 16. Courant
Institute of Mathematical Sciences, New York; American Mathematical Society, Providence,
RI, 2007. x+126 pp.
[20] Villani, C´
edric, Topics in optimal transportation. Graduate Studies in Mathematics, 58.
American Mathematical Society, Providence, RI, 2003. xvi+370 pp.
[21] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag.
` di Roma (Tor VerCarlangelo Liverani, Dipartimento di Matematica, II Universita
gata), Via della Ricerca Scientifica, 00133 Roma, Italy.
E-mail address: [email protected]