Fast-Slow Skew Product Systems and Convergence to Stochastic Differential Equations

Ian Melbourne

17 April 2015

Abstract. These are notes for the LMS-CMI Research School "Statistical Properties of Dynamical Systems" at Loughborough, 13-17 April 2015.

1 Lecture 1: The central limit theorem

1.1 Statement of the result

Let $\Lambda = [0,1]$ with Lebesgue measure $\mu$. Consider the doubling map $f : [0,1] \to [0,1]$ given by
$$ fy = 2y \bmod 1 = \begin{cases} 2y, & 0 \le y < \tfrac12, \\ 2y - 1, & \tfrac12 \le y \le 1. \end{cases} $$
It is easy to verify that $\mu$ is invariant: $\mu(f^{-1}E) = \mu(E)$ for all measurable $E$. Although slightly harder to verify, $\mu$ is ergodic: if $E$ is measurable and $fE \subset E$, then $\mu(E) = 0$ or $\mu(E) = 1$. (See Remark 1.8(b) below for a proof of ergodicity.)

Suppose that $v : \Lambda \to \mathbb{R}$ is integrable (e.g. if $v$ is continuous). We consider the sequence of observations $\{v \circ f^n : n \ge 0\}$. Since $\mu$ is invariant, it follows immediately that $v$ and $v \circ f$ are identically distributed, denoted by $v =_d v \circ f$. This means that
$$ \mu\{y \in \Lambda : v(y) < b\} = \mu\{y \in \Lambda : v(fy) < b\} \quad \text{for all } b \in \mathbb{R}. $$
Inductively $v \circ f^m =_d v \circ f^n$ for all $m, n$ and hence $\{v \circ f^n : n \ge 0\}$ is a sequence of identically distributed (but usually not independent) random variables. Hence it makes sense to ask for statistical properties such as the asymptotic behaviour of
$$ v_n = \sum_{j=0}^{n-1} v \circ f^j \quad \text{as } n \to \infty. $$

We mention without proof the simplest such property, the Strong Law of Large Numbers (SLLN), which here is a consequence of Birkhoff's pointwise ergodic theorem: since $\mu$ is an ergodic invariant probability measure and $v$ is integrable,
$$ \lim_{n\to\infty} \frac{1}{n} v_n = \int_\Lambda v \, d\mu \quad \text{a.e.} $$
This means that
$$ \mu\Big\{ y \in \Lambda : \frac{1}{n} v_n(y) \to \int_\Lambda v \, d\mu \Big\} = 1. $$
In words, this says that time averages converge to the space average with probability 1. (In everyday life, this is called the law of averages, whereby bad luck can be shown to even out in the long run; simply overlook the factor of $\frac1n$.)

Now suppose for simplicity that $\int_\Lambda v \, d\mu = 0$. Then $\frac1n v_n \to 0$ a.e. and it makes sense to look at other normalising factors. In the Central Limit Theorem (CLT) one considers $\frac{1}{\sqrt n} v_n$. (If $\int_\Lambda v \, d\mu \ne 0$, then we would consider $\frac{1}{\sqrt n}(v_n - n\int_\Lambda v \, d\mu)$.) This is the first main theorem that we will prove in these lectures.

Theorem 1.1 (Central Limit Theorem) Continue to assume that $f$ is the doubling map and that $\mu$ is Lebesgue measure. Suppose that $v : \Lambda \to \mathbb{R}$ is Lipschitz and that $\int_\Lambda v \, d\mu = 0$. Then

(a) The limit $\sigma^2 = \lim_{n\to\infty} \int_\Lambda \big(\frac{1}{\sqrt n} v_n\big)^2 \, d\mu$ exists.

(b) $\frac{1}{\sqrt n} v_n \to_d N(0,\sigma^2)$ as $n \to \infty$. This means that
$$ \lim_{n\to\infty} \mu\Big\{ y \in \Lambda : \frac{1}{\sqrt n} v_n(y) < b \Big\} = \mathbb{P}(Y < b) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^b e^{-x^2/2\sigma^2} \, dx \tag{1.1} $$
for all $b \in \mathbb{R}$, where $Y \sim N(0,\sigma^2)$ is a normally distributed random variable with mean zero and variance $\sigma^2$.

(c) Typically (in a very strong sense that will be made precise later) $\sigma^2 > 0$.

Remark 1.2 (a) Note that in (1.1) the randomness on the LHS only occurs through the initial condition $y$. Once the initial condition is specified, everything is deterministic. The RHS on the other hand is a genuinely random object.

(b) In the IID (independent, identically distributed) setting, it would suffice that $\int_\Lambda v^2 \, d\mu < \infty$. However, in the current setting, the CLT fails for typical square integrable (or even continuous) $v$. Some regularity is required (it is not necessary that $v$ is Lipschitz; Hölder continuity or bounded variation would suffice).

(c) This result belongs to the topic known as smooth ergodic theory, where $f$, $\mu$ and $v$ are required to be well-behaved. In a purely measure-theoretic set up, such a result is not possible.
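The following numerical sketch is not part of the notes; it is a quick empirical check of Theorem 1.1 under assumptions stated in the comments. One pitfall is worth flagging: iterating $2y \bmod 1$ in floating point loses one binary digit of the initial condition per step, so after about 50 iterates the orbit is pure rounding noise. Since the doubling map acts as the left shift on binary expansions, the sketch instead draws i.i.d. random bits (which is exactly Lebesgue measure on initial conditions) and reads $f^j y$ off the shifted expansion. The observable $v(y) = \cos 2\pi y$ is an illustrative choice: Lipschitz, mean zero, and with $\sigma^2 = \int v^2 \, d\mu = 1/2$ since its autocorrelations under the doubling map vanish.

```python
import numpy as np

# Sketch only (not from the notes): empirical CLT for the doubling map.
# f is the left shift on binary digits, so we draw i.i.d. bits and read
# f^j y off the shifted expansion instead of iterating 2y mod 1 in floats.
rng = np.random.default_rng(0)
n, n_samples, n_bits = 512, 5000, 53

bits = rng.integers(0, 2, size=(n_samples, n + n_bits))
weights = 0.5 ** np.arange(1, n_bits + 1)

v_n = np.zeros(n_samples)
for j in range(n):
    y = bits[:, j:j + n_bits] @ weights     # f^j y to 53-bit accuracy
    v_n += np.cos(2 * np.pi * y)            # v(y) = cos(2 pi y), mean zero

S = v_n / np.sqrt(n)                        # n^{-1/2} v_n
print("mean", S.mean())                     # ~ 0
print("var ", S.var())                      # ~ sigma^2 = 1/2 for this v
# A histogram of S is close to the N(0, 1/2) density, as in (1.1).
```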
1.2 Koopman and transfer operators

Before getting to the proof of Theorem 1.1, we need some background material on Koopman and transfer operators. In this section, we suppose that $f : \Lambda \to \Lambda$ is a general transformation of a probability space $(\Lambda,\mu)$ with $\mu$ invariant under $f$.

Define the Koopman operator
$$ U : L^1(\Lambda) \to L^1(\Lambda), \quad Uv = v \circ f. $$
Some elementary properties of $U$ are the following:

(U1) $U1 = 1$.

(U2) $\int_\Lambda Uv \, d\mu = \int_\Lambda v \, d\mu$. In particular, if $\int_\Lambda v \, d\mu = 0$, then $\int_\Lambda Uv \, d\mu = 0$.

(U3) $\|Uv\|_p = \|v\|_p$ for all $v \in L^p(\Lambda)$, $1 \le p \le \infty$.

(U4) $\int_\Lambda Uv \, Uw \, d\mu = \int_\Lambda vw \, d\mu$ for all $v \in L^1(\Lambda)$, $w \in L^\infty(\Lambda)$.

(U1) is immediate and the others follow from invariance of the measure. For example, if $p \in [1,\infty)$, then by definition $\|v\|_p = (\int_\Lambda |v|^p \, d\mu)^{1/p}$, and
$$ \int_\Lambda |Uv|^p \, d\mu = \int_\Lambda |v|^p \circ f \, d\mu = \int_\Lambda |v|^p \, d\mu, $$
proving (U3). The proof of (U4) is left as an exercise.

Remark 1.3 (U3) and (U4) were presented as nice properties, but actually they say that $U$ does not contract, which is not so pleasant given that we are interested in the asymptotics of $v_n = \sum_{j=0}^{n-1} v\circ f^j = \sum_{j=0}^{n-1} U^j v$. Moreover, when passing to regular $v$ (which is necessary by Remark 1.2(b)) the situation is even worse. For example, if $f$ is the doubling map and $v$ is differentiable, then $|(Uv)'| = 2|v'|$ (where defined). Iterating, $|(U^n v)'| = 2^n|v'|$.

To remedy the situation in Remark 1.3, we pass to the dual or adjoint of $U$ in the hope that this will have the opposite behaviour to $U$ and hence have good contraction properties.

In the $L^2$ situation, the transfer operator (or Perron-Frobenius operator) $P$ is defined to be the adjoint of $U$, so $P = U^* : L^2(\Lambda) \to L^2(\Lambda)$. This means that if $v \in L^2(\Lambda)$, then $Pv$ is defined to be the unique element of $L^2(\Lambda)$ satisfying
$$ \int_\Lambda Pv \, w \, d\mu = \int_\Lambda v \, Uw \, d\mu \quad \text{for all } w \in L^2(\Lambda). $$
More generally, $P : L^1(\Lambda) \to L^1(\Lambda)$ is defined by requiring that
$$ \int_\Lambda Pv \, w \, d\mu = \int_\Lambda v \, Uw \, d\mu \quad \text{for all } w \in L^\infty(\Lambda). $$
We have

(P1) $P1 = 1$.

(P2) $\int_\Lambda Pv \, d\mu = \int_\Lambda v \, d\mu$. In particular, if $\int_\Lambda v \, d\mu = 0$, then $\int_\Lambda Pv \, d\mu = 0$.

(P3) $\|Pv\|_p \le \|v\|_p$ for all $v \in L^p$, $1 \le p \le \infty$.

(P4) $PU = I$.

E.g. for (P1), note that $\int_\Lambda P1 \, w \, d\mu = \int_\Lambda 1 \, Uw \, d\mu = \int_\Lambda Uw \, d\mu = \int_\Lambda w \, d\mu = \int_\Lambda 1 \, w \, d\mu$. For (P3) with $p = 1$, take $w = \operatorname{sgn} Pv$. Then $\|Pv\|_1 = \int_\Lambda |Pv| \, d\mu = \int_\Lambda Pv \, w \, d\mu = \int_\Lambda v \, Uw \, d\mu \le \|v\|_1 \|Uw\|_\infty = \|v\|_1 \|w\|_\infty = \|v\|_1$. (P4) is left as an exercise.

1.3 Decay of correlations for the doubling map

First, we write down the explicit formula for the transfer operator for the doubling map.

Proposition 1.4 $(Pv)(y) = \frac12\Big\{ v\Big(\frac{y}{2}\Big) + v\Big(\frac{y+1}{2}\Big) \Big\}$.

Proof By change of variables,
$$ \int_\Lambda Pv \, w \, d\mu = \int_\Lambda v \, Uw \, d\mu = \int_0^1 v(y) w(fy) \, dy = \int_0^{1/2} v(y) w(2y) \, dy + \int_{1/2}^1 v(y) w(2y-1) \, dy $$
$$ = \frac12\int_0^1 v\Big(\frac{y}{2}\Big) w(y) \, dy + \frac12\int_0^1 v\Big(\frac{y+1}{2}\Big) w(y) \, dy = \int_0^1 \frac12\Big\{ v\Big(\frac{y}{2}\Big) + v\Big(\frac{y+1}{2}\Big) \Big\} w(y) \, dy. $$

Recall that $\operatorname{Lip} v = \sup_{x\ne y} \frac{|v(x)-v(y)|}{|x-y|}$. We can now prove that $P$ contracts Lipschitz constants.

Lemma 1.5 $\operatorname{Lip} Pv \le \frac12 \operatorname{Lip} v$.

Proof By Proposition 1.4,
$$ (Pv)(x) - (Pv)(y) = \frac12\Big\{ v\Big(\frac{x}{2}\Big) - v\Big(\frac{y}{2}\Big) \Big\} + \frac12\Big\{ v\Big(\frac{x+1}{2}\Big) - v\Big(\frac{y+1}{2}\Big) \Big\}, $$
so
$$ |(Pv)(x) - (Pv)(y)| \le \frac12\Big| v\Big(\frac{x}{2}\Big) - v\Big(\frac{y}{2}\Big) \Big| + \frac12\Big| v\Big(\frac{x+1}{2}\Big) - v\Big(\frac{y+1}{2}\Big) \Big| \le \frac12\operatorname{Lip} v \, \Big|\frac{x}{2} - \frac{y}{2}\Big| + \frac12\operatorname{Lip} v \, \Big|\frac{x+1}{2} - \frac{y+1}{2}\Big| = \frac12\operatorname{Lip} v \, |x-y|, $$
and the result follows.

Corollary 1.6 $\big|P^n v - \int_\Lambda v \, d\mu\big|_\infty \le \frac{1}{2^n}\operatorname{Lip} v$.

Proof Note that if $w : \Lambda \to \mathbb{R}$ is Lipschitz, then $|w - \int_\Lambda w \, d\mu|_\infty \le \operatorname{Lip} w \operatorname{diam}\Lambda$. In our case, $\operatorname{diam}\Lambda = 1$. Hence using (P2) and Lemma 1.5,
$$ \Big|P^n v - \int_\Lambda v \, d\mu\Big|_\infty = \Big|P^n v - \int_\Lambda P^n v \, d\mu\Big|_\infty \le \operatorname{Lip} P^n v \le \frac{1}{2^n}\operatorname{Lip} v. $$
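As a numerical aside (not in the notes; the grid and interpolation are assumptions of the sketch), Proposition 1.4 and Corollary 1.6 are easy to see in action: discretise $v$ on a grid, apply the two-branch formula for $P$, and watch the sup-norm of $P^n v - \int_\Lambda v \, d\mu$ halve at each step. The observable $v(y) = y - \frac12$ is chosen because $Pv = \frac12 v$ exactly, so the decay rate $2^{-n}$ of Corollary 1.6 is realised.

```python
import numpy as np

# Sketch (not from the notes): the transfer operator of the doubling map on
# a grid, illustrating Lemma 1.5 and Corollary 1.6. Linear interpolation on
# the grid is an assumption of the sketch (exact here: v stays piecewise linear).
y = np.linspace(0.0, 1.0, 2001)

def P(v):
    # (Pv)(y) = (v(y/2) + v((y+1)/2)) / 2, Proposition 1.4
    return 0.5 * (np.interp(y / 2, y, v) + np.interp((y + 1) / 2, y, v))

v = y - 0.5                        # Lipschitz, integral zero, Lip v = 1
for n in range(1, 9):
    v = P(v)
    print(n, np.abs(v).max())      # halves each step, matching 2^{-n} Lip v
```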
Theorem 1.7 (Decay of correlations) For all $v : \Lambda \to \mathbb{R}$ Lipschitz, $w \in L^1(\Lambda)$, $n \ge 1$,
$$ \Big| \int_\Lambda v \, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu \Big| \le \frac{1}{2^n}\operatorname{Lip} v \, |w|_1. $$

Proof Compute that
$$ \int_\Lambda v \, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu = \int_\Lambda P^n v \, w \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu = \int_\Lambda \Big( P^n v - \int_\Lambda v \, d\mu \Big) w \, d\mu. $$
Hence
$$ \Big| \int_\Lambda v \, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu \Big| \le \Big| P^n v - \int_\Lambda v \, d\mu \Big|_\infty |w|_1, $$
and the result follows from Corollary 1.6.

Remark 1.8 (a) $\int_\Lambda v \, w\circ f^n \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu$ is called the correlation function for $v$ and $w$. In probabilistic notation, it is the same as $\operatorname{Cov}(v, w\circ f^n) = \mathbb{E}(v \, w\circ f^n) - \mathbb{E}v \, \mathbb{E}w$, where $\mathbb{E}$ here denotes expectation with respect to $\mu$.

(b) Set $v = 1_A$ and $w = 1_B$ to be the indicator functions of measurable subsets $A, B \subset \Lambda$ (so $1_A$ is equal to one on $A$ and zero elsewhere). Then the correlation function becomes $\mu(A \cap f^{-n}B) - \mu(A)\mu(B)$. It is a standard fact that this can decay arbitrarily slowly, so Theorem 1.7 does not extend to general $L^\infty$ functions. However, using an approximation argument, it can be shown that Theorem 1.7 implies that $\lim_{n\to\infty}\mu(A \cap f^{-n}B) = \mu(A)\mu(B)$. In other words, $f$ is mixing.

Since $f$ is mixing, we can prove that $f$ is ergodic as claimed in Subsection 1.1. Indeed, suppose $fE \subset E$. Then $f^n E \subset E$ for all $n \ge 1$ and so $E \cap f^{-n}(\Lambda\setminus E) = \emptyset$. By mixing it follows that $0 = \mu(E \cap f^{-n}(\Lambda\setminus E)) \to \mu(E)\mu(\Lambda\setminus E)$, and it follows that $\mu(E) = 0$ or $\mu(\Lambda\setminus E) = 0$.

(c) Let $\alpha \in (0,1]$. Define
$$ |v|_\alpha = \sup_{x\ne y} \frac{|v(x)-v(y)|}{|x-y|^\alpha}. $$
A function $v : \Lambda \to \mathbb{R}$ is $\alpha$-Hölder if $|v|_\alpha < \infty$. In particular, 1-Hölder is the same as Lipschitz. It is easy to check that everything proved so far for Lipschitz observables goes over to the Hölder case (with different constants).

1.4 A more general framework

Theorem 1.7 establishes exponential decay of correlations for the doubling map. The statement and proof are somewhat misleading. Suppose that the definition of $f$ is changed slightly as follows. We still take $\Lambda = [0,1]$ and $f : \Lambda \to \Lambda$ has two branches mapping $[0,\frac12]$ and $[\frac12,1]$ monotonically onto $[0,1]$. Suppose moreover that $f$ is $C^2$ on $(0,\frac12)$ and $(\frac12,1)$ and extends continuously to $[0,\frac12]$ and $[\frac12,1]$. It is easy to obtain a formula for $P$ analogous to the one in Proposition 1.4, but the proof of Lemma 1.5 breaks down. Nevertheless, using various functional-analytic tools (which will not be discussed further here) it is possible to prove a result like Theorem 1.7 with the RHS replaced by $C\gamma^n\|v\|_{\operatorname{Lip}}|w|_1$, where $C > 0$, $\gamma \in (0,1)$ are constants and $\|v\|_{\operatorname{Lip}} = |v|_\infty + \operatorname{Lip} v$. Obtaining useful estimates on $C$ and $\gamma$ turns out to be an intractable problem, but this will not matter for the purposes of these lectures.

Let $f : \Lambda \to \Lambda$ be a transformation of a probability space $(\Lambda,\mu)$ where $\mu$ is assumed to be $f$-invariant (i.e. $\mu(f^{-1}E) = \mu(E)$) and ergodic (i.e. $fE \subset E$ implies that $\mu(E) = 0$ or $1$). We suppose that $v : \Lambda \to \mathbb{R}$ lies in $L^\infty(\Lambda)$ and $\int_\Lambda v \, d\mu = 0$. Moreover, we require that for this fixed observable $v$, there are constants $C > 0$ and $\beta > 1$ such that
$$ \Big| \int_\Lambda v \, w\circ f^n \, d\mu \Big| \le \frac{C}{n^\beta}|w|_1 \quad \text{for all } w \in L^1(\Lambda) \text{ and all } n \ge 1. \tag{1.2} $$

Remark 1.9 This is an assumption on the dynamics $f$, the measure $\mu$ and the observable $v$. Again, such a condition cannot hold for general $v \in L^\infty(\Lambda)$, so this is still smooth ergodic theory even though smoothness is not mentioned anywhere. (By now it should be clear that "smooth" does not necessarily mean differentiable. Assumptions such as Lipschitz or Hölder count as smooth.)

Proposition 1.10 (Martingale-coboundary decomposition) Assume that $v$ satisfies (1.2). Then there exist $m, \chi \in L^\infty(\Lambda)$ such that
$$ v = m + \chi\circ f - \chi, \quad \int_\Lambda m \, d\mu = 0, \quad m \in \ker P. \tag{1.3} $$

Proof It follows by duality (i.e. by making judicious choices of $w \in L^1(\Lambda)$) that $|P^n v|_\infty \le Cn^{-\beta}$. Since $\beta > 1$, it follows that $|P^n v|_\infty$ is summable, and hence following [8] we can define $L^\infty$ functions $m, \chi : \Lambda \to \mathbb{R}$ by setting
$$ \chi = \sum_{n=1}^\infty P^n v, \quad m = v - \chi\circ f + \chi = v - U\chi + \chi. $$
Notice that $\int_\Lambda m \, d\mu = 0$ (since $\int_\Lambda m \, d\mu = \int_\Lambda v \, d\mu - \int_\Lambda \chi\circ f \, d\mu + \int_\Lambda \chi \, d\mu = -\int_\Lambda \chi \, d\mu + \int_\Lambda \chi \, d\mu = 0$). By (P4),
$$ Pm = Pv - PU\chi + P\chi = Pv - \chi + P\chi = Pv - \sum_{n=1}^\infty P^n v + \sum_{n=2}^\infty P^n v = 0. $$
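The decomposition (1.3) is concrete enough to compute. The sketch below (not part of the notes; the grid, the interpolation and the truncation of the series defining $\chi$ are assumptions) builds $\chi = \sum_{n\ge1}P^n v$ for the doubling map, forms $m = v - \chi\circ f + \chi$, and checks numerically that $Pm \approx 0$ and $\int_\Lambda m \, d\mu \approx 0$. Anticipating Proposition 1.14 below, it also estimates $\sigma^2 = \int_\Lambda m^2 \, d\mu$; for this particular $v$ one can compute $\sigma^2 = 1/108$ by hand.

```python
import numpy as np

# Sketch (not from the notes): the martingale-coboundary decomposition (1.3)
# computed numerically for the doubling map. Grid discretisation and
# truncation of chi = sum_{n>=1} P^n v are assumptions of the sketch.
y = np.linspace(0.0, 1.0, 4001)

def P(v):
    return 0.5 * (np.interp(y / 2, y, v) + np.interp((y + 1) / 2, y, v))

v = y * (1.0 - y) - 1.0 / 6.0                 # Lipschitz, mean zero; P v = v/4
chi, Pnv = np.zeros_like(y), v.copy()
for _ in range(40):                           # geometric convergence (Cor 1.6)
    Pnv = P(Pnv)
    chi += Pnv

fy = np.where(y < 0.5, 2 * y, 2 * y - 1.0)    # the doubling map on the grid
m = v - np.interp(fy, y, chi) + chi           # m = v - chi o f + chi
print("sup |Pm| ", np.abs(P(m)).max())        # ~ 0, i.e. m in ker P
print("int m    ", m.mean())                  # ~ 0
print("sigma^2  ", (m * m).mean())            # ~ 1/108 = 0.00926 (Prop 1.14)
```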
In the remainder of this subsection, we explain why this decomposition might be useful. The key points are that (i) the CLT for $v$ is equivalent to the CLT for $m$, and (ii) the sequence $\{m\circ f^j\}$ has certain orthogonality properties that $\{v\circ f^j\}$ does not.

For point (i), recall that $v_n = \sum_{j=0}^{n-1} v\circ f^j$. Similarly, define $m_n = \sum_{j=0}^{n-1} m\circ f^j$. Hence $v_n = m_n + \chi\circ f^n - \chi$. Moreover $|\chi\circ f^n - \chi|_\infty \le 2|\chi|_\infty$, so $n^{-1/2}v_n - n^{-1/2}m_n \to 0$ a.e. Hence we have:

Corollary 1.11 Suppose that $Y$ is a random variable. Then $n^{-1/2}v_n \to_d Y$ if and only if $n^{-1/2}m_n \to_d Y$.

Remark 1.12 The conclusion of Corollary 1.11 remains true if $\chi \in L^1(\Lambda)$; see the exercises.

Concerning point (ii), we have the following orthogonality property.

Corollary 1.13 The sequence $\{m\circ f^j : j \ge 0\}$ is orthogonal: $\int_\Lambda m\circ f^j \, m\circ f^k \, d\mu = 0$ for all $0 \le j < k$. More generally,
$$ \int_\Lambda m\circ f^{j_1} \, m\circ f^{j_2} \cdots m\circ f^{j_r} \, d\mu = 0, $$
for all $0 \le j_1 < \cdots < j_r$, $r \ge 1$.

Proof The case $r = 1$ is immediate by invariance of $\mu$ since $\int_\Lambda m \, d\mu = 0$. For $r \ge 2$,
$$ \int_\Lambda m\circ f^{j_1} \, m\circ f^{j_2} \cdots m\circ f^{j_r} \, d\mu = \int_\Lambda \{m \, m\circ f^{j_2-j_1} \cdots m\circ f^{j_r-j_1}\}\circ f^{j_1} \, d\mu = \int_\Lambda m \, m\circ f^{j_2-j_1} \cdots m\circ f^{j_r-j_1} \, d\mu $$
$$ = \int_\Lambda m \, \{m\circ f^{j_2-j_1-1} \cdots m\circ f^{j_r-j_1-1}\}\circ f \, d\mu = \int_\Lambda Pm \, m\circ f^{j_2-j_1-1} \cdots m\circ f^{j_r-j_1-1} \, d\mu = 0 $$
by Proposition 1.10.

The next result proves Theorem 1.1(a), and in the process identifies $\sigma^2$ as $\int_\Lambda m^2 \, d\mu$.

Proposition 1.14 $\displaystyle\lim_{n\to\infty}\frac1n\int_\Lambda v_n^2 \, d\mu = \int_\Lambda m^2 \, d\mu$.

Proof It follows from Corollary 1.13 that $\int_\Lambda m_n^2 \, d\mu = \sum_{j=0}^{n-1}\int_\Lambda (m\circ f^j)^2 \, d\mu$ (since the cross-terms vanish). Hence $n^{-1}\int_\Lambda m_n^2 \, d\mu = \int_\Lambda m^2 \, d\mu$. Since $v_n = m_n + \chi\circ f^n - \chi$,
$$ \big| |v_n|_2 - |m_n|_2 \big| \le |v_n - m_n|_2 = |\chi\circ f^n - \chi|_2 \le |\chi\circ f^n|_2 + |\chi|_2 = 2|\chi|_2. $$
Hence $n^{-1/2}|v_n|_2 - n^{-1/2}|m_n|_2 \to 0$. Taking squares,
$$ \lim_{n\to\infty}\frac1n\int_\Lambda v_n^2 \, d\mu = \lim_{n\to\infty}\frac1n\int_\Lambda m_n^2 \, d\mu = \int_\Lambda m^2 \, d\mu. $$

Lemma 1.15 (Bounded convergence theorem) Suppose that $f_n \in L^\infty(\Lambda)$, $n \ge 1$. If $|f_n|_\infty$ is bounded and $f_n \to f$ a.e., then $\int_\Lambda f_n \, d\mu \to \int_\Lambda f \, d\mu$.

Lemma 1.16 (Lévy continuity theorem) Suppose that $Y_n$, $n = 1, 2, \ldots$ and $Y$ are random variables with values in $\mathbb{R}^k$. Then $Y_n \to_d Y$ in $\mathbb{R}^k$ if and only if $\lim_{n\to\infty}\mathbb{E}(e^{it\cdot Y_n}) = \mathbb{E}(e^{it\cdot Y})$ for all $t \in \mathbb{R}^k$.

Proof of Theorem 1.1(b) By Corollary 1.11, it suffices to show that $n^{-1/2}m_n \to_d N(0,\sigma^2)$ where $\sigma^2 = \int_\Lambda m^2 \, d\mu$. By the Lévy continuity theorem, it suffices to show that $\lim_{n\to\infty}\int_\Lambda e^{itn^{-1/2}m_n} \, d\mu = e^{-\frac12 t^2\sigma^2}$ for each fixed $t \in \mathbb{R}$.

We follow McLeish [13]. Starting from $\log(1+ix) = ix + \frac12 x^2 + O(|x|^3)$, we obtain $e^{ix} = (1+ix)\exp\{-\frac12 x^2 + O(|x|^3)\}$. Hence
$$ \exp\{itn^{-1/2}m_n\} = \prod_{j=0}^{n-1}\exp\{itn^{-1/2}m\circ f^j\} = T_n e^{-U_n}, $$
where
$$ T_n = \prod_{j=0}^{n-1}(1 + itn^{-1/2}m\circ f^j), \quad U_n = \frac12 t^2\frac1n\sum_{j=0}^{n-1}m^2\circ f^j + O\Big(\frac{1}{n^{3/2}}\sum_{j=0}^{n-1}|m|^3\circ f^j\Big) = \frac12 t^2\frac1n\sum_{j=0}^{n-1}m^2\circ f^j + O(n^{-1/2}). $$
Now $T_n = 1 + \cdots$ where the remaining $2^n - 1$ terms are of the form in Corollary 1.13. It follows from Corollary 1.13 that $\int_\Lambda T_n \, d\mu = 1$. Also, by the pointwise ergodic theorem,
$$ U_n \to \frac12 t^2\int_\Lambda m^2 \, d\mu = \frac12 t^2\sigma^2 \quad \text{a.e.} $$
Hence
$$ \int_\Lambda e^{itn^{-1/2}m_n} \, d\mu - e^{-\frac12 t^2\sigma^2} = \int_\Lambda T_n e^{-U_n} \, d\mu - e^{-\frac12 t^2\sigma^2} = \int_\Lambda T_n\big(e^{-U_n} - e^{-\frac12 t^2\sigma^2}\big) \, d\mu, $$
where the final integrand converges to 0 a.e. We claim that the integrand is uniformly bounded. The result then follows from the bounded convergence theorem.

Now,
$$ |T_n| = \prod_{j=0}^{n-1}\big(1 + n^{-1}t^2|m\circ f^j|^2\big)^{1/2}. $$
In particular, $|T_n| \ge 1$. Since $e^{-U_n} = e^{itn^{-1/2}m_n}/T_n$, it follows that $|e^{-U_n}|_\infty \le 1$. Finally, $|T_n|_\infty \le d_n$ where
$$ d_n = \prod_{j=0}^{n-1}\big(1 + n^{-1}t^2|m|_\infty^2\big)^{1/2} = \big(1 + n^{-1}t^2|m|_\infty^2\big)^{n/2} \to \exp\{\tfrac12 t^2|m|_\infty^2\}. $$
In particular, $d_n$ is bounded, so $T_n$ is uniformly bounded.

Corollary 1.17 $\sigma^2 = 0$ if and only if $v = h\circ f - h$ for some $h \in L^\infty(\Lambda)$.

Proof Since $\int_\Lambda m^2 \, d\mu = \sigma^2$, it follows that if $\sigma^2 = 0$, then $m = 0$ a.e. and so the decomposition (1.3) gives $v = h\circ f - h$ with $h = \chi$. Conversely, if $v = h\circ f - h$, then $v_n = h\circ f^n - h$, so that $|v_n|_\infty \le 2|h|_\infty$ is bounded and hence $\sigma^2 = 0$.

Remark 1.18 In the case of the doubling map, it follows from Lemma 1.5 and Corollary 1.6 that $\sum_{n=1}^\infty P^n v$ converges in the space of Lipschitz functions, so $\chi$ is Lipschitz. Hence in the situation of Corollary 1.17, it is possible to evaluate $\chi$ at specific points. Suppose that $f^p y = y$ for some $y \in \Lambda$. Then $\sum_{j=0}^{p-1}v(f^j y) = \chi(f^p y) - \chi(y) = 0$. Hence degeneracy in the CLT ($\sigma^2 = 0$) imposes infinitely many degeneracy conditions on the observable $v$. This completes the proof of Theorem 1.1(c).

In the generality of Subsection 1.4, we have only that $\chi \in L^\infty(\Lambda)$, so $\chi(y)$ is not defined for specific points $y$. However it is often possible to show that if $v$ is regular, and $v = h\circ f - h$ for some measurable $h$, then $h$ is also regular. Such a result is called a Livšic regularity theorem. In this case, periodic points again place constraints on $v$.

Remark 1.19 The results in this lecture relied on the assumption that $|P^n v|_\infty$ is summable. This suffices for the uniformly expanding setting, and results for large classes of nonuniformly expanding systems can be reduced to this situation by inducing. As indicated in later lectures, there then exist standard ways to pass to invertible maps and flows. On the other hand, there are numerous ways to relax the assumption $\sum_{n=1}^\infty|P^n v|_\infty < \infty$. In particular, Tyran-Kamińska [19] used ideas of [12] to show that $\sum_{n=1}^\infty n^{-3/2}\big|\sum_{k=0}^{n-1}P^k v\big|_2 < \infty$ is a sufficient condition for the CLT.

Remark 1.20 Suppose that $f : \Lambda \to \Lambda$ is a transformation of a measure space $\Lambda$ with ergodic invariant probability measure $\mu$. Suppose that $\mu_0$ is a probability measure on $\Lambda$ (not necessarily invariant) and that $\mu_0$ is absolutely continuous with respect to $\mu$. Let $v : \Lambda \to \mathbb{R}$ be measurable with $v_n = \sum_{j=0}^{n-1}v\circ f^j$. By [7] (see also [21, Corollary 1]), $n^{-1/2}v_n \to_d Y$ on $(\Lambda,\mu)$ if and only if $n^{-1/2}v_n \to_d Y$ on $(\Lambda,\mu_0)$. In particular, we have strong distributional convergence in Theorem 1.1(b), namely that $n^{-1/2}v_n \to_d N(0,\sigma^2)$ on $(\Lambda,\mu_0)$ for any probability measure $\mu_0$ absolutely continuous with respect to $\mu$.

2 Lecture 2: Multidimensional limit laws

So far, we considered convergence in distribution for $\mathbb{R}$-valued random variables. Now we consider random vectors with values in $\mathbb{R}^d$ and random elements with values in the Banach space $C([0,1],\mathbb{R}^d)$. We assume throughout that $f : \Lambda \to \Lambda$ is a transformation and that $\mu$ is an ergodic, invariant probability measure.

2.1 Multidimensional CLT

Consider vector-valued observables $v : \Lambda \to \mathbb{R}^d$ lying in $L^\infty(\Lambda,\mathbb{R}^d)$, so $v = (v^1,\ldots,v^d)$ where each $v^i$ is in $L^\infty(\Lambda)$. As usual, we assume that $\int_\Lambda v \, d\mu = 0$.
Moreover, we suppose that each component $v^i$ satisfies condition (1.2), so there are constants $C > 0$, $\beta > 1$ such that
$$ \Big|\int_\Lambda v^i \, w\circ f^n \, d\mu\Big| \le \frac{C}{n^\beta}|w|_1 \quad \text{for all } w \in L^1(\Lambda), \ n \ge 1, \ i = 1,\ldots,d. $$
The transfer operator $P$ acts naturally on vector-valued observables with $Pv = (Pv^1,\ldots,Pv^d)$. Working coordinatewise, it follows that $\sum_{n=1}^\infty P^n v$ converges in $L^\infty(\Lambda,\mathbb{R}^d)$ and hence that $v = m + \chi\circ f - \chi$ where $m, \chi \in L^\infty(\Lambda,\mathbb{R}^d)$, $\int_\Lambda m \, d\mu = 0$, and $m \in \ker P$.

Define the covariance matrix $\Sigma = \int_\Lambda m\,m^T \, d\mu \in \mathbb{R}^{d\times d}$. Note that $\Sigma^T = \Sigma$.

Proposition 2.1 $\displaystyle\lim_{n\to\infty}\frac1n\int_\Lambda v_n v_n^T \, d\mu = \Sigma$.

Proof In coordinates, this reduces to showing that $\lim_{n\to\infty}\frac1n\int_\Lambda v_n^i v_n^j \, d\mu = \int_\Lambda m^i m^j \, d\mu$. (Notice that $(v_n)^i = (v^i)_n$, which is hence abbreviated to $v_n^i$.) The diagonal terms $i = j$ can be treated as in the one-dimensional case, and the off-diagonal terms can be treated using polarization. The details are left to the exercises.

The following elementary idea reduces convergence in distribution for random vectors in $\mathbb{R}^d$ to the one-dimensional setting.

Proposition 2.2 (Cramér-Wold device) Suppose that $Y_n$, $n = 1, 2, \ldots$ and $Y$ are random vectors in $\mathbb{R}^d$. Then $Y_n \to_d Y$ in $\mathbb{R}^d$ if and only if $c\cdot Y_n \to_d c\cdot Y$ in $\mathbb{R}$ for all $c \in \mathbb{R}^d$.

Proof This is left as an exercise.

Theorem 2.3 (d-dimensional CLT) $\frac{1}{\sqrt n}v_n \to_d N(0,\Sigma)$. That is,
$$ \lim_{n\to\infty}\mu\Big\{x \in \Lambda : \frac{1}{\sqrt n}v_n(x) \in I\Big\} = \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}}\int_I \exp\{-\tfrac12 y^T\Sigma^{-1}y\} \, dy, $$
for all $I = (a_1,b_1)\times\cdots\times(a_d,b_d) \subset \mathbb{R}^d$. Moreover, $\det\Sigma = 0$ if and only if there exists $c \in \mathbb{R}^d$ nonzero such that $c^T v = h\circ f - h$ for some $h \in L^\infty(\Lambda)$.

Proof We apply Proposition 2.2 with $Y_n = n^{-1/2}v_n$ and $Y \sim N(0,\Sigma)$. It is elementary that $AY \sim N(0,A\Sigma A^T)$ for all $A \in \mathbb{R}^{e\times d}$. In particular, if $c \in \mathbb{R}^d$, then $c\cdot Y = c^T Y \sim N(0,\sigma_c^2)$ where $\sigma_c^2 = c^T\Sigma c$. Hence $\mathbb{E}(e^{it(c\cdot Y)}) = e^{-\frac12 t^2\sigma_c^2}$.

On the other hand, $c\cdot Y_n = n^{-1/2}(c\cdot v)_n$ where $(c\cdot v)_n$ is the one-dimensional Birkhoff sum starting from the observable $c\cdot v : \Lambda \to \mathbb{R}$. Note that $c\cdot v = c\cdot m + h\circ f - h$ where $h = c\cdot\chi$. It follows from Theorem 1.1(b) that $n^{-1/2}(c\cdot v)_n \to_d N(0,\sigma_c^2)$, where $\sigma_c^2 = \int_\Lambda(c\cdot m)^2 \, d\mu$. To prove the CLT, it remains to verify that $\sigma_c^2 = c^T\Sigma c$ for all $c$. But
$$ c^T\Sigma c = c^T\Big(\int_\Lambda m\,m^T \, d\mu\Big)c = \int_\Lambda(c\cdot m)(m\cdot c) \, d\mu = \int_\Lambda(c\cdot m)^2 \, d\mu = \sigma_c^2 $$
as required.

For nondegeneracy, note that $\det\Sigma = 0$ if and only if there exists $c \ne 0$ such that $\Sigma c = 0$. But $c^T\Sigma c = \int_\Lambda(c\cdot m)^2 \, d\mu$. Hence $\Sigma c = 0$ if and only if $c\cdot m = 0$, in which case $c\cdot v = h\circ f - h$ with $h = c\cdot\chi$.

2.2 Statement of the weak invariance principle (WIP)

To begin with, we revert to the case of real-valued observables $v : \Lambda \to \mathbb{R}$. Assume the framework of Subsection 1.4. By Theorem 1.1, $n^{-1/2}v_n \to_d N(0,\sigma^2)$.

Define a sequence of continuous time processes $W_n : [0,1] \to \mathbb{R}$ as follows. Set
$$ W_n(t) = \frac{1}{\sqrt n}v_{nt} \quad \text{for } t = \frac jn, \ j = 0, 1, \ldots, n. $$
Then linearly interpolate to obtain a continuous function $W_n \in C[0,1]$. (So on the interval $[\frac{j-1}{n},\frac{j}{n}]$, the graph of $W_n(t)$ is a straight line joining the points $(\frac{j-1}{n},\frac{1}{\sqrt n}v_{j-1})$ and $(\frac{j}{n},\frac{1}{\sqrt n}v_j)$.)

Note that $W_n(t) = \frac{1}{\sqrt n}v_{[nt]} + O(\frac{1}{\sqrt n})$ uniformly (the error is at most $\frac{1}{\sqrt n}|v|_\infty$). Also for each $t \ge 0$, it holds that $W_n(t) \to_d t^{1/2}N(0,\sigma^2) =_d N(0,\sigma^2 t)$.

The function $W_n$ is a random element in the Banach space $C[0,1]$ with the sup-norm. (A random element is like a random variable but with values in an infinite-dimensional Banach space.)
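It may help to see what the random elements $W_n$ look like. The following sketch (not from the notes; the random-bit orbit generation and the choice of observable repeat the assumptions of the earlier sketches) computes the vertices of one sample path of $W_n$.

```python
import numpy as np

# Sketch (not from the notes): one sample path of the random element
# W_n in C[0,1], built from a single doubling-map orbit with v(y) = y - 1/2.
rng = np.random.default_rng(4)
n, n_bits = 1000, 53
bits = rng.integers(0, 2, size=n + n_bits)
w = 0.5 ** np.arange(1, n_bits + 1)

orbit = np.array([bits[j:j + n_bits] @ w for j in range(n)])   # y, fy, f^2 y, ...
v_sums = np.concatenate([[0.0], np.cumsum(orbit - 0.5)])       # v_0, v_1, ..., v_n

t = np.arange(n + 1) / n
W_path = v_sums / np.sqrt(n)
# Linear interpolation of the points (t[j], W_path[j]) is W_n(t); as n grows,
# the law of this path approaches Brownian motion with variance sigma^2.
print(W_path[-1])     # W_n(1) = n^{-1/2} v_n, the CLT-normalised Birkhoff sum
```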
The next result is called the WIP (also known as the Functional CLT).

Theorem 2.4 (WIP) Let $W$ be Brownian motion with variance $\sigma^2$, where $\sigma^2$ is as in Theorem 1.1. Then $W_n \to_w W$ in $C[0,1]$.

Remark 2.5 (a) Recall that a Brownian motion with variance $\sigma^2$ is a continuous time stochastic process $\{W(t); t \ge 0\}$ with $W(0) = 0$ such that

- (independent increments) The increments $W(t_1)-W(t_0),\ldots,W(t_k)-W(t_{k-1})$ are independent for all $0 = t_0 \le t_1 \le \cdots \le t_k$.
- (stationarity) $W(t) - W(s) =_d W(t-s)$ for all $t \ge s \ge 0$.
- (normality) $W(t) =_d N(0,\sigma^2 t)$ for all $t \ge 0$.
- (continuous sample paths) $\mathbb{P}(t \mapsto W(t) \text{ is continuous}) = 1$.

The last condition implies that we can regard $W$ as a random element in $C[0,1]$.

(b) More concisely, $W$ is the unique random element in $C[0,1]$ such that for all $0 = t_0 \le t_1 \le \cdots \le t_k \le 1$, $k \ge 1$,
$$ \big(W(t_1)-W(t_0),\ldots,W(t_k)-W(t_{k-1})\big) \sim N(0,\Sigma), \quad \text{where } \Sigma = \sigma^2\operatorname{diag}\{t_1-t_0,\ldots,t_k-t_{k-1}\}. $$
Note that it is not a priori clear that such a process exists or is unique. The first mathematical proof was by Wiener (1923). Continuity of sample paths can be viewed as a property of Brownian motion, but actually such a hypothesis is required in order to obtain uniqueness.

(c) Weak convergence of random elements $W_n \to_w W$ is analogous to convergence in distribution but now in the infinite-dimensional space $C[0,1]$. That is, $\lim_{n\to\infty}\mu(W_n \in I) = \mathbb{P}(W \in I)$ for all "nice" subsets of $C[0,1]$. Here "nice" means that $I$ is an open set with $\mathbb{P}(W \in \partial I) = 0$.

Another characterisation of weak convergence is the continuous mapping theorem. This was the original formulation of the WIP by Donsker, 1951.

Theorem 2.6 (Continuous Mapping Theorem) Suppose that $Y_n, Y$ are random elements in $C[0,1]$. The following are equivalent:

(a) $Y_n \to_w Y$ in $C[0,1]$.

(b) $\chi(Y_n) \to_d \chi(Y)$ in $\mathbb{R}$ for all continuous maps $\chi : C[0,1] \to \mathbb{R}$.

Example 2.7 (a) Taking $\chi(g) = g(1)$ in Theorem 2.6 yields the CLT, since $\chi(W_n) = W_n(1) = n^{-1/2}v_n$ and $\chi(W) = W(1) =_d N(0,\sigma^2)$. Hence the WIP implies the CLT and much more. This explains the terminology Functional CLT.

(b) Theorem 2.6 holds also with $\mathbb{R}$ replaced by $\mathbb{R}^k$ for all $k$. (See the exercises.) Let $t_1,\ldots,t_k \in [0,1]$ and define $\chi : C[0,1] \to \mathbb{R}^k$ by setting $\chi(g) = (g(t_1),\ldots,g(t_k))$. Then $\chi$ is continuous and it follows that if $Y_n \to_w Y$, then $(Y_n(t_1),\ldots,Y_n(t_k)) \to_d (Y(t_1),\ldots,Y(t_k))$. This is called convergence of finite-dimensional distributions.

2.3 Prokhorov's Theorem

Consider the sequence of random variables $\{Y_n\}$ in $\mathbb{R}$ where $Y_n \equiv n$ (i.e. $\mathbb{P}(Y_n = n) = 1$) for each $n$. It is clear that $Y_n$ does not converge in distribution. The notion of tightness rules out this kind of example.

Definition 2.8 A family $\mathcal{A}$ of real-valued random variables is tight if for any $\epsilon > 0$, there exists $L > 0$ such that $\mathbb{P}(|Y| \le L) > 1 - \epsilon$ for all $Y \in \mathcal{A}$.

Tightness is a necessary condition for convergence in distribution.

Proposition 2.9 (a) Any real-valued random variable is tight (i.e. a family consisting of one random variable is tight). (b) If $Y_n \to_d Y$ in $\mathbb{R}$, then $\{Y_n\}$ is tight.

Proof See the exercises.

The converse of part (b) is not true. For example, the sequence $Y_n = (-1)^n$ is tight and does not converge in distribution, though it has plenty of convergent subsequences. We state without proof:

Theorem 2.10 Suppose that $\{Y_n\}$ is a sequence of real-valued random variables. If $\{Y_n\}$ is tight, then there exists a subsequence $\{Y_{n_k} : k \ge 1\}$ that converges in distribution.

A standard method for showing that $Y_n \to_d Y$ in $\mathbb{R}$ is to (i) prove tightness, and (ii) show that if a subsequence $Y_{n_k}$ converges in distribution then the limit must be $Y$. The fact that this suffices is proved as follows. Suppose for contradiction that $Y_n \not\to_d Y$.
Then there exists $b \in \mathbb{R}$ such that $\mathbb{P}(Y_n < b) \not\to \mathbb{P}(Y < b)$. Hence there is a subsequence $Z_k = Y_{n_k}$ such that $\mathbb{P}(Z_k < b)$ is bounded away from $\mathbb{P}(Y < b)$. But tightness of $\{Y_n\}$ immediately implies tightness of $\{Z_k\}$, so there is a subsequence $\{Z_{k_\ell}\}$ that converges in distribution. Note that $Z_{k_\ell}$ is also a subsequence of $\{Y_n\}$, so by (ii), $Z_{k_\ell} \to_d Y$. In particular $\mathbb{P}(Z_{k_\ell} < b) \to \mathbb{P}(Y < b)$, which contradicts the fact that $\mathbb{P}(Z_k < b)$ is bounded away from $\mathbb{P}(Y < b)$.

Tightness in $\mathbb{R}^k$ is very similar. A sequence $\{Y_n\}$ of $\mathbb{R}^k$-valued random variables is tight if for any $\epsilon > 0$ there exists $L > 0$ such that $\mathbb{P}(Y_n \in [-L,L]\times\cdots\times[-L,L]) > 1-\epsilon$ for all $n \ge 1$. Equivalently, there exists a compact subset $K \subset \mathbb{R}^k$ such that $\mathbb{P}(Y_n \in K) > 1-\epsilon$. The analogues of Proposition 2.9 and Theorem 2.10 remain true.

Definition 2.11 A sequence of random elements $\{Y_n\}$ in $C[0,1]$ is tight if for any $\epsilon > 0$ there is a compact subset $K \subset C[0,1]$ such that $\mathbb{P}(Y_n \in K) > 1-\epsilon$ for all $n \ge 1$.

Prokhorov's Theorem implies the analogues of Proposition 2.9 and Theorem 2.10.

Theorem 2.12 (Prokhorov) Let $Y_n, Y \in C[0,1]$ be random elements. Then $Y_n \to_w Y$ if and only if

(i) (Convergence of finite-dimensional distributions) $(Y_n(t_1),\ldots,Y_n(t_k)) \to_d (Y(t_1),\ldots,Y(t_k))$ as $n \to \infty$, for all $t_1,\ldots,t_k \in [0,1]$, $k \ge 1$.

(ii) (Tightness) The sequence $\{Y_n\}$ is tight: for any $\epsilon > 0$ there is a compact set $K \subset C[0,1]$ such that $\mathbb{P}(Y_n \in K) > 1-\epsilon$ for all $n \ge 1$.

Using tightness in $C[0,1]$ is much harder than in $\mathbb{R}^k$, since the characterisation of compact subsets is more complicated. Recall by the Arzelà-Ascoli Theorem that $K \subset C[0,1]$ is compact if and only if $K$ is closed, bounded (in the sup-norm) and equicontinuous: for any $\epsilon > 0$, there exists $\delta > 0$ such that $|g(s)-g(t)| < \epsilon$ for all $s,t \in [0,1]$ with $|s-t| < \delta$ and all $g \in K$. (So this is uniform continuity with the additional property that the same $\delta$ works for all $g \in K$.)

One class of compact subsets of $C[0,1]$ is the following. Let $\gamma \in (0,1]$, $R > 0$. Define
$$ B_R^\gamma = \Big\{ g \in C[0,1] : g(0) = 0 \text{ and } \sup_{s\ne t}\frac{|g(t)-g(s)|}{|t-s|^\gamma} \le R \Big\}. $$
It follows easily from the Arzelà-Ascoli theorem that $B_R^\gamma$ is compact in $C[0,1]$.

2.4 Proof of the WIP

To prove Theorem 2.4, it suffices to verify the two conditions in Theorem 2.12. As usual, it suffices to prove that $M_n \to_w W$ where $M_n \in C[0,1]$ is defined in the same way as $W_n$ but starting from $m$ instead of $v$. In particular, $M_n(t) = n^{-1/2}m_{[nt]} + O(n^{-1/2})$.

First we verify convergence of finite-dimensional distributions. By Cramér-Wold, it suffices to prove that
$$ \mathbb{E}\exp\{it\,c\cdot(M_n(t_1),\ldots,M_n(t_k))\} \to \mathbb{E}\exp\{it\,c\cdot(W(t_1),\ldots,W(t_k))\} $$
for all $t \in \mathbb{R}$, $c \in \mathbb{R}^k$. Equivalently, setting $t_0 = 0$, it suffices to prove that $\mathbb{E}e^{itY_n} \to \mathbb{E}e^{itY}$ for all $t \in \mathbb{R}$, $c \in \mathbb{R}^k$, where
$$ Y_n = c\cdot(M_n(t_1)-M_n(t_0),\ldots,M_n(t_k)-M_n(t_{k-1})), \quad Y = c\cdot(W(t_1)-W(t_0),\ldots,W(t_k)-W(t_{k-1})). $$
Now $Y = c^T N(0,\Sigma) = N(0,\sigma_c^2)$, where
$$ \Sigma = \sigma^2\operatorname{diag}\{t_1-t_0,\ldots,t_k-t_{k-1}\}, \quad \sigma_c^2 = c^T\Sigma c = \sigma^2\big(c_1^2(t_1-t_0) + \cdots + c_k^2(t_k-t_{k-1})\big). $$
Also,
$$ Y_n = \frac{1}{\sqrt n}\Big\{c_1(m_{[nt_1]}-m_{[nt_0]}) + \cdots + c_k(m_{[nt_k]}-m_{[nt_{k-1}]})\Big\} + O(n^{-1/2}) = \frac{1}{\sqrt n}\sum_{\ell=1}^k c_\ell\sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1}m\circ f^j + O(n^{-1/2}) = \frac{1}{\sqrt n}\sum_{j=0}^{[nt_k]-1}d_{j,n}\,m\circ f^j + O(n^{-1/2}), $$
where $d_{j,n} \in \{c_1,\ldots,c_k\}$.

Now we follow the proof of Theorem 1.1(b). Write $e^{itY_n} = T_n e^{-U_n}$ where
$$ T_n = \prod_{j=0}^{[nt_k]-1}(1 + itn^{-1/2}d_{j,n}\,m\circ f^j), \quad U_n = \frac12 t^2\frac1n\sum_{j=0}^{[nt_k]-1}d_{j,n}^2\,m^2\circ f^j + O(n^{-1/2}). $$
Then $\int_\Lambda T_n \, d\mu = 1$ and $|T_n|_\infty$ is bounded as before. Also $|e^{-U_n}|_\infty \le 1$ and
$$ U_n = \frac12 t^2\frac1n\sum_{\ell=1}^k c_\ell^2\sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1}m^2\circ f^j + O(n^{-1/2}). $$
Note that
$$ \frac1n\sum_{j=[nt_{\ell-1}]}^{[nt_\ell]-1}m^2\circ f^j = \frac1n\sum_{j=0}^{[nt_\ell]-1}m^2\circ f^j - \frac1n\sum_{j=0}^{[nt_{\ell-1}]-1}m^2\circ f^j = t_\ell\,\frac{1}{nt_\ell}\sum_{j=0}^{[nt_\ell]-1}m^2\circ f^j - t_{\ell-1}\,\frac{1}{nt_{\ell-1}}\sum_{j=0}^{[nt_{\ell-1}]-1}m^2\circ f^j \to (t_\ell - t_{\ell-1})\sigma^2 \quad \text{a.e.} $$
by the ergodic theorem. Hence
$$ U_n \to \frac12 t^2\big(c_1^2(t_1-t_0) + \cdots + c_k^2(t_k-t_{k-1})\big)\sigma^2 = \frac12 t^2\sigma_c^2 \quad \text{a.e.,} $$
and it follows as in the proof of Theorem 1.1 that $\int_\Lambda T_n e^{-U_n} \, d\mu \to e^{-\frac12 t^2\sigma_c^2}$ as required.

Next we check tightness. A useful criterion for tightness of $M_n$ is given by [1, Lemma, p. 88]. (The result in [1] is for processes arising from stationary sequences, not necessarily martingales. A sequence $\{v\circ f^n, n \ge 0\}$ is stationary precisely when $f$ is measure preserving.)

Proposition 2.13 Suppose that
$$ \lim_{\lambda\to\infty}\limsup_{n\to\infty}\lambda^2\,\mu\big(\max_{k\le n}|m_k| \ge \lambda\sqrt n\big) = 0. $$
Then $\{M_n\}$ is tight.

Corollary 2.14 The sequence $\{M_n\}$ is tight in $C[0,1]$.

Proof Let $p > 2$. By Corollary 3.12 (which is postponed until after the definition of martingale), there is a constant $C > 0$ such that $\int_\Lambda\max_{j\le n}|m_j|^p \, d\mu \le Cn^{p/2}$ for all $n \ge 1$. By Markov's inequality (recall that Markov's inequality states that if $X$ is a nonnegative random variable, then $\mathbb{P}(X > \lambda) \le \lambda^{-1}\mathbb{E}X$ for all $\lambda > 0$),
$$ \mu\big(\max_{j\le n}|m_j| \ge \lambda n^{1/2}\big) = \mu\big(\max_{j\le n}|m_j|^p \ge \lambda^p n^{p/2}\big) \le \int_\Lambda\max_{j\le n}|m_j|^p \, d\mu\Big/(\lambda^p n^{p/2}) \le C/\lambda^p. $$
Hence $\limsup_{n\to\infty}\mu(\max_{j\le n}|m_j| \ge \lambda n^{1/2}) \le C/\lambda^p$ and
$$ \lambda^2\limsup_{n\to\infty}\mu\big(\max_{j\le n}|m_j| \ge \lambda n^{1/2}\big) \le C/\lambda^{p-2} \to 0 $$
as $\lambda \to \infty$. The result follows from Proposition 2.13.

2.5 Multidimensional WIP

Obviously we can now generalise to vector-valued observables $v : \Lambda \to \mathbb{R}^d$ as in Subsection 2.1. The sequence of random elements $W_n \in C([0,1],\mathbb{R}^d)$ is defined exactly as in the one-dimensional case. Let $\Sigma = \int_\Lambda m\,m^T \, d\mu$ (corresponding to the decomposition $v = m + \chi\circ f - \chi$). Define $W \in C([0,1],\mathbb{R}^d)$ to be $d$-dimensional Brownian motion with covariance $\Sigma$ (the definition is the same as in the one-dimensional case, except that $N(0,\sigma^2 t)$ is replaced by $N(0,\Sigma t)$).

To prove the multidimensional WIP, it again suffices to check convergence of finite-dimensional distributions (which follows from Cramér-Wold; of course the formulas are even messier) and tightness (which follows componentwise from the one-dimensional result).

Remark 2.15 As in Remark 1.20, we have strong distributional convergence in the limit laws in this section, so they hold for all probability measures $\mu_0$ absolutely continuous with respect to $\mu$. Moreover, it again suffices that the limit law holds for at least one $\mu_0$. For the WIP, this is [21, Corollary 3].

3 Lecture 3: Extensions

3.1 Martingales; the CLT and WIP revisited

Definition 3.1 A sequence of $L^1$ random variables $\{S_n; n \ge 1\}$ is a martingale if $\mathbb{E}(S_{n+1}|S_1,\ldots,S_n) = S_n$ for all $n \ge 1$. More generally, the sequence $S_n$ is a martingale with respect to an increasing sequence of $\sigma$-algebras $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ (called a filtration) if for all $n \ge 1$

(i) $S_n$ is $\mathcal{F}_n$-measurable.

(ii) $\mathbb{E}(S_{n+1}|\mathcal{F}_n) = S_n$.

Remark 3.2 The meaning of $\mathbb{E}(S_{n+1}|S_1,\ldots,S_n) = S_n$ is familiar from elementary probability. Conditional expectation operators of the type $\mathbb{E}(S_{n+1}|\mathcal{F}_n)$ are defined as follows. Let $(\Lambda,\mathcal{M},\mu)$ be the underlying probability space and let $\mathcal{A} \subset \mathcal{M}$ be a $\sigma$-algebra. If $Y \in L^1(\Lambda)$, then the conditional expectation $\mathbb{E}(Y|\mathcal{A})$ is defined to be the unique $\mathcal{A}$-measurable function satisfying the relation
$$ \int_A \mathbb{E}(Y|\mathcal{A}) \, d\mu = \int_A Y \, d\mu, \quad \text{for all } A \in \mathcal{A}. $$
It can be shown that $\mathbb{E}(Y|Y_1,\ldots,Y_n) = \mathbb{E}(Y|\mathcal{A})$ where $\mathcal{A}$ is the $\sigma$-algebra generated by $Y_1,\ldots,Y_n$. Hence $\mathbb{E}(Y|\mathcal{A})$ generalises the elementary definition of conditional expectation.
Proposition 3.3 (a) If $Y \in L^1(\Lambda)$ is $\mathcal{A}$-measurable, then $\mathbb{E}(Y|\mathcal{A}) = Y$.

(b) Suppose that $(\Lambda_1,\mathcal{M}_1,\mu_1)$, $(\Lambda_2,\mathcal{M}_2,\mu_2)$ are probability spaces and $\pi : \Lambda_1 \to \Lambda_2$ is measure-preserving. Suppose that $Y \in L^1(\Lambda_2)$ and that $\mathcal{A} \subset \mathcal{M}_2$ is a $\sigma$-algebra. Then $\mathbb{E}(Y\circ\pi|\pi^{-1}\mathcal{A}) = \mathbb{E}(Y|\mathcal{A})\circ\pi$.

Proof Part (a) is immediate from the definitions. For part (b), it is required to show that the random variable $\mathbb{E}(Y|\mathcal{A})\circ\pi$ satisfies the required properties on $\Lambda_1$. Certainly it is $\pi^{-1}\mathcal{A}$-measurable since $\mathbb{E}(Y|\mathcal{A})$ is measurable, and for $A \in \mathcal{A}$,
$$ \int_{\pi^{-1}A}\mathbb{E}(Y|\mathcal{A})\circ\pi \, d\mu_1 = \int_A\mathbb{E}(Y|\mathcal{A}) \, d\mu_2 = \int_A Y \, d\mu_2 = \int_{\pi^{-1}A}Y\circ\pi \, d\mu_1. $$

Proposition 3.4 Suppose that $\{\mathcal{F}_n; n \ge 1\}$ is an increasing sequence of $\sigma$-algebras and that $S_n = \sum_{k=1}^n d_k$. Then $\{S_n; n \ge 1\}$ is a martingale with respect to $\{\mathcal{F}_n; n \ge 1\}$ if and only if, for all $k \ge 1$,

(i) $\mathbb{E}|d_k| < \infty$, (ii) $d_k$ is $\mathcal{F}_k$-measurable, (iii) $\mathbb{E}(d_{k+1}|\mathcal{F}_k) = 0$.

Proof This is an easy exercise.

We now state without proof the martingale CLT/WIP.

Theorem 3.5 (Brown [3]) Let $f : \Lambda \to \Lambda$ be an ergodic measure-preserving transformation, and let $Y \in L^2(\Lambda)$ with $\mathbb{E}Y = 0$. Suppose that $S_n = \sum_{j=0}^{n-1}Y\circ f^j$ is a martingale. Then the CLT and WIP are valid. That is, $n^{-1/2}S_n \to_d N(0,\sigma^2)$ where $\sigma^2 = \mathbb{E}Y^2$, and if we define $Q_n \in C[0,1]$, $Q_n(t) = n^{-1/2}S_{[nt]} + O(n^{-1/2})$, then $Q_n \to_w W$ where $W$ is Brownian motion with variance $\sigma^2$.

We referred to (1.3) as a martingale-coboundary decomposition, suggesting that $m \in \ker P$ means that $m$ is a martingale in some sense. The next lemma points towards this, though time goes in the wrong direction!

Lemma 3.6 Let $(\Lambda,\mathcal{B},\mu)$ be the underlying probability space.

(a) $\mathcal{B} \supset f^{-1}\mathcal{B} \supset f^{-2}\mathcal{B} \supset \cdots$ is a decreasing sequence of $\sigma$-algebras, and $m\circ f^n$ is $f^{-n}\mathcal{B}$-measurable for all $m \in L^1(\Lambda)$, $n \ge 0$.

(b) $UP = \mathbb{E}(\,\cdot\,|f^{-1}\mathcal{B})$.

Proof Part (a) is obvious. Part (b) is left as an exercise.

If $f : \Lambda \to \Lambda$ were invertible, then we could define the (increasing) filtration $\mathcal{F}_n = f^n\mathcal{B}$ and consider the backward Birkhoff sums $m_n^- = \sum_{j=-n}^{-1}m\circ f^j$. To make sense of the backward sums we pass to the natural extension. We state without proof:

Proposition 3.7 Suppose that $f : \Lambda \to \Lambda$ is a surjective transformation and that $\mu$ is an $f$-invariant probability measure. There exists an invertible transformation $\tilde f : \tilde\Lambda \to \tilde\Lambda$ with an $\tilde f$-invariant probability measure $\tilde\mu$, as well as a measure-preserving projection $\pi : \tilde\Lambda \to \Lambda$ such that $\pi\circ\tilde f = f\circ\pi$. If $\mu$ is ergodic, then the construction can be chosen so that $\tilde\mu$ is ergodic.

Remark 3.8 The natural extension is a minimal extension of the type in Proposition 3.7 and is unique up to isomorphism. These facts are not required here.

Let $\mathcal{F}_0 = \pi^{-1}\mathcal{B}$ (a proper $\sigma$-subalgebra of the underlying $\sigma$-algebra on $\tilde\Lambda$) and define $\mathcal{F}_n = \tilde f^n\mathcal{F}_0$. Notice that $\mathcal{F}_{-1} = \tilde f^{-1}\mathcal{F}_0 = \tilde f^{-1}\pi^{-1}\mathcal{B} = \pi^{-1}f^{-1}\mathcal{B}$. Since $f^{-1}\mathcal{B} \subset \mathcal{B}$ it follows that $\mathcal{F}_{-1} \subset \mathcal{F}_0$. Hence $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ for all $n \in \mathbb{Z}$. In particular, the sequence of $\sigma$-algebras $\{\mathcal{F}_n, n \ge 1\}$ defines a filtration.

Next, let $m : \Lambda \to \mathbb{R}$, $m \in L^2(\Lambda)$, $\int_\Lambda m \, d\mu = 0$. Define the lifted observable $\tilde m = m\circ\pi : \tilde\Lambda \to \mathbb{R}$ and the forward and backward Birkhoff sums
$$ \tilde m_n = \sum_{j=0}^{n-1}\tilde m\circ\tilde f^j, \quad \tilde m_n^- = \sum_{j=-n}^{-1}\tilde m\circ\tilde f^j. $$
Since $\pi$ is measure-preserving, it is immediate that $\int_{\tilde\Lambda}\tilde m \, d\tilde\mu = 0$ and $\int_{\tilde\Lambda}\tilde m^2 \, d\tilde\mu = \int_\Lambda m^2 \, d\mu$.

Proposition 3.9 If $m \in \ker P$, then $\tilde m_n^-$ is a martingale.

Proof This is left as an exercise.

Proof of the CLT: martingale proof It follows from Theorem 3.5 and Proposition 3.9 that $n^{-1/2}\tilde m_n^- \to_d N(0,\sigma^2)$ where $\sigma^2 = \int_{\tilde\Lambda}\tilde m^2 \, d\tilde\mu = \int_\Lambda m^2 \, d\mu$.
But $\tilde m_n = m_n\circ\pi =_d m_n$ and $\tilde m_n = \tilde m_n^-\circ\tilde f^n =_d \tilde m_n^-$, so $n^{-1/2}m_n =_d n^{-1/2}\tilde m_n^- \to_d N(0,\sigma^2)$. Hence $n^{-1/2}v_n \to_d N(0,\sigma^2)$.

Proof of the WIP: martingale proof Define $W_n, M_n$ in $C[0,1]$ as in Lecture 2. These are defined on the probability space $(\Lambda,\mu)$. On $(\tilde\Lambda,\tilde\mu)$, we define the corresponding forward and backward processes $\tilde M_n, \tilde M_n^-$ in $C[0,1]$ such that
$$ \tilde M_n(t) = n^{-1/2}\tilde m_{[nt]} + O(n^{-1/2}), \quad \tilde M_n^-(t) = n^{-1/2}\tilde m_{[nt]}^- + O(n^{-1/2}). $$
It follows from Theorem 3.5 and Proposition 3.9 that $\tilde M_n^- \to_w W$.

We claim that there is a continuous map $\chi : C[0,1] \to C[0,1]$ such that
$$ \tilde M_n\circ\tilde f^{-n} = \chi(\tilde M_n^-), \quad \chi(W) =_d W. \tag{3.1} $$
(Here $=_d$ signifies equality in distribution as elements of $C[0,1]$.) By the infinite-dimensional version of the continuous mapping theorem [1, p. 26], $\chi(\tilde M_n^-) \to_w \chi(W) =_d W$. Equation (3.1) states that $\tilde M_n =_d \chi(\tilde M_n^-)$, so $\tilde M_n \to_w W$. Moreover $\tilde M_n = M_n\circ\pi =_d M_n$, so $M_n \to_w W$ and hence $W_n \to_w W$.

It remains to verify the claim. Consider the continuous function $\chi : C[0,1] \to C[0,1]$ given by $\chi(g)(t) = g(1) - g(1-t)$. A calculation (in the exercises) shows that $\tilde M_n\circ\tilde f^{-n} = \chi(\tilde M_n^-)$, and it follows easily from the definition of Brownian motion that $\chi(W) =_d W$.

Remark 3.10 (a) For the CLT and WIP, it suffices that the martingale-coboundary decomposition holds in $L^2$ (i.e. $m, \chi \in L^2(\Lambda)$). Indeed the argument above shows that $M_n \to_w W$ requires only that $m$ lies in $L^2$. Moreover, $\sup_{t\in[0,1]}|W_n(t) - M_n(t)| \to 0$ a.e. provided that $\chi$ lies in $L^2$. The last statement is left as an exercise.

(b) The multidimensional versions of the CLT and WIP again follow from the Cramér-Wold device.

Another standard result about martingales is the following:

Theorem 3.11 (Burkholder's inequality [4]) Let $p \ge 2$. Suppose that $S_n = \sum_{k=1}^n d_k$ is a martingale with $d_k \in L^p$ for $k \ge 1$. There is a universal constant $C_p$ such that $\big|\max_{1\le j\le n}|S_j|\big|_p \le C_p\,n^{1/2}\max_{k\le n}|d_k|_p$ for all $n \ge 1$.

Corollary 3.12 Let $p > 2$. There exists $C > 0$ such that $\big|\max_{j\le n}|m_j|\big|_p \le Cn^{1/2}$ for all $n \ge 1$.

Proof This is left as an exercise.

3.2 Invertible systems

First of all, a negative result:

Proposition 3.13 Suppose that $f : \Lambda \to \Lambda$ is invertible with invariant measure $\mu$. Then $Pv = v\circ f^{-1}$. In particular $P = U^{-1}$.

Proof $\int_\Lambda Pv\,w \, d\mu = \int_\Lambda v\,Uw \, d\mu = \int_\Lambda v\,w\circ f \, d\mu = \int_\Lambda v\circ f^{-1}\,w \, d\mu$.

This means that the set up in Subsection 1.4 is never satisfied for invertible dynamical systems since $\ker P = \{0\}$. (There still exist results on decay of correlations, but these require regularity for $w$ as well as $v$.) However, in many situations it is possible to use a method of Sinai [17] and Bowen [2] to reduce to the noninvertible case. Roughly speaking, $v = \hat v + \chi_1\circ f - \chi_1$ where $\hat v$ is constant along stable manifolds and projects to an observable on a noninvertible dynamical system. (Even if $v$ is $C^\infty$, usually $\hat v$ is only Hölder.) We then obtain a martingale-coboundary decomposition at the level of the noninvertible system, and the coboundary there can be combined with the coboundary $\chi_1\circ f - \chi_1$.

This can be formalised as follows. Let $f : \Lambda \to \Lambda$ be our invertible transformation with invariant ergodic measure $\mu$. Suppose that there is an associated noninvertible transformation $\bar f : \bar\Lambda \to \bar\Lambda$ with invariant ergodic measure $\bar\mu$. Moreover, suppose that there is a measure-preserving semiconjugacy $\pi : \Lambda \to \bar\Lambda$, satisfying $\pi\circ f = \bar f\circ\pi$. Let $P$ denote the transfer operator for $\bar f$.

Definition 3.14 Let $v \in L^\infty(\Lambda,\mathbb{R}^d)$ with $\int_\Lambda v \, d\mu = 0$. We say that $v$ admits a martingale-coboundary decomposition if there exist $m \in L^\infty(\bar\Lambda,\mathbb{R}^d)$ and $\chi \in L^\infty(\Lambda,\mathbb{R}^d)$ such that
$$ v = m\circ\pi + \chi\circ f - \chi, \quad m \in \ker P. \tag{3.2} $$

Given the decomposition (3.2), the situation is roughly as follows.

(1) Statistical limit laws for $v$ are equivalent to statistical limit laws for $m\circ\pi$.

(2) Since $\pi_*\mu = \bar\mu$, statistical limit laws for $m\circ\pi$ on $(\Lambda,\mu)$ are equivalent to statistical limit laws for $m$ on $(\bar\Lambda,\bar\mu)$.

(3) Since $m \in \ker P$, statistical limit laws for $m$ follow as in the previous sections.

Here are more details. Let $\hat m = m\circ\pi$, so $v = \hat m + \chi\circ f - \chi$.

- Since $\pi_*\mu = \bar\mu$ and $\hat m = m\circ\pi$, it is immediate that $\int_{\bar\Lambda}m \, d\bar\mu = \int_\Lambda\hat m \, d\mu = \int_\Lambda v \, d\mu = 0$.
- Define $\Sigma = \int_{\bar\Lambda}m\,m^T \, d\bar\mu$. Then
$$ \Sigma = \int_\Lambda(m\,m^T)\circ\pi \, d\mu = \int_\Lambda\hat m\,\hat m^T \, d\mu. $$
- Define $m_n = \sum_{j=0}^{n-1}m\circ\bar f^j$ and $\hat m_n = \sum_{j=0}^{n-1}\hat m\circ f^j$. Since $\pi\circ f^j = \bar f^j\circ\pi$, it follows that $\hat m_n = m_n\circ\pi$. Since $m \in \ker P$, it follows easily that
$$ \Sigma = \frac1n\int_{\bar\Lambda}m_n m_n^T \, d\bar\mu = \frac1n\int_\Lambda\hat m_n\hat m_n^T \, d\mu. $$
- As in the noninvertible case, $n^{-1}\int_\Lambda v_n v_n^T \, d\mu - n^{-1}\int_\Lambda\hat m_n\hat m_n^T \, d\mu \to 0$, so we obtain that $\lim_{n\to\infty}n^{-1}\int_\Lambda v_n v_n^T \, d\mu = \Sigma$.
- Since $\hat m_n = m_n\circ\pi$, the CLT for $m$ implies the CLT for $\hat m$ and hence $v$; similarly for the WIP.

Remark 3.15 Again, it suffices that $m, \chi$ lie in $L^2$.

3.3 Flows

Suppose that $\dot y = g(y)$ is an ODE on $\mathbb{R}^m$ with flow $\phi_t : \mathbb{R}^m \to \mathbb{R}^m$. Let $Y \subset \mathbb{R}^m$ be a (local) Poincaré cross-section to the flow ($\dim Y = m-1$). Define the first hit time $\tau : Y \to \mathbb{R}^+$ and Poincaré map $f : Y \to Y$ given by
$$ \tau(y) = \min\{t > 0 : \phi_t(y) \in Y\}, \quad f(y) = \phi_{\tau(y)}(y). $$
For convenience, we assume that there are constants $C_2 \ge C_1 > 0$ such that $C_1 \le \tau(y) \le C_2$ for all $y \in Y$.

Invariant sets and probability measures If $\Lambda$ is an invariant set for $f$, then $\Omega = \bigcup_{t\ge0}\phi_t(\Lambda)$ is an invariant set for $\phi_t$. Moreover, if $\mu_\Lambda$ is an ergodic invariant probability measure for $f : \Lambda \to \Lambda$, then there is a natural method to construct an ergodic invariant probability measure $\mu_\Omega$ for $\phi_t : \Omega \to \Omega$. (Here, invariant means that $\mu_\Omega(\phi_t E) = \mu_\Omega(E)$ for all $t$ and all measurable sets $E$. Ergodic means that if $E \subset \Omega$ is invariant, then $\mu_\Omega(E) = 0$ or $1$.)

To construct $\mu_\Omega$, consider the suspension
$$ \Lambda^\tau = \{(y,u) \in \Lambda\times\mathbb{R} : 0 \le u \le \tau(y)\}/\sim, \quad \text{where } (y,\tau(y)) \sim (fy,0), $$
and the suspension flow $f_t : \Lambda^\tau \to \Lambda^\tau$ where $f_t(y,u) = (y,u+t)$ computed modulo identifications. Let $\bar\tau = \int_\Lambda\tau \, d\mu_\Lambda$. Then $\mu_{\Lambda^\tau} = \mu_\Lambda\times\text{Lebesgue}/\bar\tau$ is an ergodic invariant probability measure for the suspension flow. (This notation means that $\int_{\Lambda^\tau}v \, d\mu_{\Lambda^\tau} = (1/\bar\tau)\int_\Lambda\int_0^{\tau(y)}v(y,u) \, du \, d\mu_\Lambda$.) Finally, it is clear that $p : \Lambda^\tau \to \Omega$, $p(y,u) = \phi_u(y)$, defines a semiconjugacy ($\phi_t\circ p = p\circ f_t$) and so $\mu_\Omega = p_*\mu_{\Lambda^\tau}$ is the desired measure.

Observables and sequences of random elements Given $v : \Omega \to \mathbb{R}^d$ lying in $L^\infty$ with $\int_\Omega v \, d\mu_\Omega = 0$, define the induced observable $\hat v : \Lambda \to \mathbb{R}^d$,
$$ \hat v(y) = \int_0^{\tau(y)}v(\phi_t y) \, dt. $$
Then $\hat v \in L^\infty(\Lambda,\mathbb{R}^d)$ and $\int_\Lambda\hat v \, d\mu_\Lambda = 0$.

For the map $f : \Lambda \to \Lambda$, define $\hat v_n$ and $\hat W_n \in C([0,1],\mathbb{R}^d)$ as before, so
$$ \hat v_n = \sum_{j=0}^{n-1}\hat v\circ f^j, \quad \hat W_n(t) = n^{-1/2}\hat v_{[nt]} + O(n^{-1/2}). $$
For the flow $\phi_t : \Omega \to \Omega$, define $v_t$ and $W_n \in C([0,1],\mathbb{R}^d)$ by setting
$$ v_t = \int_0^t v\circ\phi_s \, ds, \quad W_n(t) = n^{-1/2}v_{nt}. $$

The following purely probabilistic result requires no hypotheses of a smooth ergodic theoretic nature. We require that $\mu_\Lambda$ (and hence $\mu_\Omega$) is ergodic and we continue to assume (for simplicity) that $v \in L^\infty(\Omega,\mathbb{R}^d)$ and $\tau, \tau^{-1} \in L^\infty(\Lambda)$.

Theorem 3.16 Suppose that $\hat W_n \to_w \hat W$ in $C([0,1],\mathbb{R}^d)$ on $(\Lambda,\mu_\Lambda)$ where $\hat W$ is a $d$-dimensional Brownian motion with covariance $\hat\Sigma$.
Then $W_n \to_w W$ in $C([0,1],\mathbb{R}^d)$ on $(\Omega,\mu_\Omega)$ where $W$ is a $d$-dimensional Brownian motion with covariance $\Sigma = \hat\Sigma/\bar\tau$.

Proof This proof follows [10] (which is based on [16]). First, since $p : \Lambda^\tau \to \Omega$ is measure preserving, we can replace $(\Omega,\mu_\Omega)$ throughout with $(\Lambda^\tau,\mu_{\Lambda^\tau})$. The proof divides into two steps:

1. Working with $\hat W_n$, pass from convergence on $(\Lambda,\mu_\Lambda)$ to convergence on $(\Lambda^\tau,\mu_{\Lambda^\tau})$.

2. Working on $(\Lambda^\tau,\mu_{\Lambda^\tau})$, pass from convergence of $\hat W_n$ to $W_n$.

Step 1. Passing from $(\Lambda,\mu_\Lambda)$ to $(\Lambda^\tau,\mu_{\Lambda^\tau})$. Extend $\hat W_n$ to $\Lambda^\tau$ by setting $\hat W_n(y,u) = \hat W_n(y)$. Recall that $\tau \ge C_1$. Form the probability space $(\Lambda^\tau,\mu_{C_1})$ where $\mu_{C_1} = (\mu\times\text{Lebesgue}|_{[0,C_1]})/C_1$. Then it is immediate from the hypothesis on $\hat W_n$ that $\hat W_n \to_w \hat W$ on $(\Lambda^\tau,\mu_{C_1})$. Since $\mu_{C_1}$ is absolutely continuous with respect to $\mu_{\Lambda^\tau}$, it follows by strong distributional convergence (Remark 2.15) that $\hat W_n \to_w \hat W$ on $(\Lambda^\tau,\mu_{\Lambda^\tau})$.

Step 2. Passing from $\hat W_n$ to $W_n$. For technical reasons, rather than working in $C([0,1],\mathbb{R}^d)$, it is convenient to work in the space $D([0,1],\mathbb{R}^d)$ of càdlàg (continuous on the right, limits on the left) functions. (The acronym comes from the French: "continue à droite, limite à gauche".) Most of the time, we can work in the sup-norm topology on $D([0,1],\mathbb{R}^d)$.

Recall the notation $v_t = \int_0^t v\circ\phi_s \, ds$, $\hat v_n = \sum_{j=0}^{n-1}\hat v\circ f^j$, $\tau_n = \sum_{j=0}^{n-1}\tau\circ f^j$. Also $W_n(t) = n^{-1/2}v_{nt}$, and we redefine $\hat W_n(t) = n^{-1/2}\hat v_{[nt]}$.

For $(y,u) \in \Lambda^\tau$ and $t > 0$, we define the lap number $N(t) = N(y,u,t) \in \mathbb{N}$:
$$ N(t) = \max\{n \ge 0 : \tau_n(y) \le u + t\}. $$
Then
$$ W_n(t) = n^{-1/2}v_{nt} = n^{-1/2}\hat v_{N(nt)} + O(n^{-1/2}) = \hat W_n(N(nt)/n) + O(n^{-1/2}) = \hat W_n\circ g_n(t) + O(n^{-1/2}), $$
where $g_n : \Lambda^\tau \to D[0,1]$ is a sequence of random elements given by $g_n(t) = N(nt)/n$. Hence it remains to prove that
$$ \hat W_n\circ g_n \to_w (\bar\tau)^{-1/2}\hat W \quad \text{in } D([0,1],\mathbb{R}^d) \text{ on } (\Lambda^\tau,\mu_{\Lambda^\tau}). \tag{3.3} $$
Define $\bar g(t) = t/\bar\tau$. By the exercises, $\sup_{[0,1]}|g_n - \bar g| \to 0$ a.e. on $(\Lambda^\tau,\mu_{\Lambda^\tau})$. By the previous step, $\hat W_n \to_w \hat W$ on $(\Lambda^\tau,\mu_{\Lambda^\tau})$. Since $\bar g$ is not random, $(\hat W_n,g_n) \to_w (\hat W,\bar g)$ on $(\Lambda^\tau,\mu_{\Lambda^\tau})$. (Since $D([0,1],\mathbb{R}^d)$ is not separable, there is a technical issue here. But since the limit processes $\hat W$ and $\bar g$ are continuous, convergence in the sup-norm topology is equivalent to convergence in the standard Skorokhod topology, which is separable.) It follows from the continuous mapping theorem that $\hat W_n\circ g_n \to_w \hat W\circ\bar g$. But $\hat W\circ\bar g(t) = \hat W(t/\bar\tau)$ and $\{\hat W(t/\bar\tau), t \ge 0\} =_d \{(\bar\tau)^{-1/2}\hat W(t), t \ge 0\}$, completing the proof of (3.3).

Corollary 3.17 If the Poincaré map $f : \Lambda \to \Lambda$ and induced observable $\hat v : \Lambda \to \mathbb{R}^d$ lie within the set up in Subsection 3.2, so in particular $\hat v$ admits a martingale-coboundary decomposition (3.2), then $W_n \to_w W$ in $C([0,1],\mathbb{R}^d)$ where $W$ is a $d$-dimensional Brownian motion with covariance $\Sigma = (\bar\tau)^{-1}\int_\Lambda m\,m^T \, d\mu_\Lambda$.

Note that in the situation of this corollary, there are no mixing assumptions on the flow. So for example, the WIP holds for any nontrivial uniformly hyperbolic (Axiom A) attractor even though such attractors need not be mixing. (Even when they are mixing, it is usually hard to tell, and even when they are known to be mixing it may be hard to prove it. The corollary shows that it does not matter.)

4 Lecture 4: Fast-slow systems

Let $\dot y = g(y)$ be an ODE on $\mathbb{R}^m$ generating a flow $\phi_t : \mathbb{R}^m \to \mathbb{R}^m$ with invariant set $\Omega$ and ergodic invariant probability measure $\mu$ supported on $\Omega$.
We consider the fast-slow system of ODEs
$$ \dot x_\epsilon = a(x_\epsilon,y_\epsilon) + \epsilon^{-1}b(x_\epsilon)v(y_\epsilon), \quad x_\epsilon(0) = \xi \in \mathbb{R}^d, \quad \text{(slow equation)} $$
$$ \dot y_\epsilon = \epsilon^{-2}g(y_\epsilon), \quad y_\epsilon(0) = \eta \in \Omega, \quad \text{(fast equation)} $$
where $a : \mathbb{R}^d\times\mathbb{R}^m \to \mathbb{R}^d$, $b : \mathbb{R}^d \to \mathbb{R}^{d\times e}$ are $C^3$ (with globally bounded derivatives; boundedness of the derivatives is not required but simplifies the arguments and guarantees global existence of solutions) and $v : \Omega \to \mathbb{R}^e$ satisfies the conditions from Subsection 3.3. In particular, $v \in L^\infty(\Omega,\mathbb{R}^e)$ and $\int_\Omega v \, d\mu = 0$. The initial condition $\xi \in \mathbb{R}^d$ is fixed throughout (but of course $\eta \in (\Omega,\mu)$ is the sole source of randomness in the fast-slow system). (Such systems are called skew products because the fast equation does not depend on the slow variable. Introducing such dependence raises numerous difficulties and we refer to [5, 6] for results in this direction.)

For the moment, the WIP suffices. So define the family of random elements $W_\epsilon \in C([0,1],\mathbb{R}^e)$,
$$ W_\epsilon(t) = \epsilon\,v_{t\epsilon^{-2}}, \quad v_t = \int_0^t v\circ\phi_s \, ds. $$
Then we assume that $W_\epsilon \to_w W$ in $C([0,1],\mathbb{R}^e)$ where $W \in C([0,1],\mathbb{R}^e)$ is $e$-dimensional Brownian motion with covariance $\Sigma \in \mathbb{R}^{e\times e}$.

Proposition 4.1 The slow ODE can be rewritten as
$$ \dot x_\epsilon = a(x_\epsilon,y_\epsilon) + b(x_\epsilon)\dot W_\epsilon, \quad x_\epsilon(0) = \xi. \tag{4.1} $$

Proof This is left as an exercise.

More suggestively, we can write
$$ dx_\epsilon = a(x_\epsilon,y_\epsilon) \, dt + b(x_\epsilon) \, dW_\epsilon, \quad x_\epsilon(0) = \xi. $$
Since $W_\epsilon \to_w W$ in $C([0,1],\mathbb{R}^e)$, a natural guess is that $x_\epsilon \to_w X$ in $C([0,1],\mathbb{R}^d)$ where
$$ dX = \bar a(X) \, dt + b(X) \star dW. $$
Here $\bar a(x) = \int_\Omega a(x,y) \, d\mu(y)$ and $b(X)\star dW$ is some kind of stochastic integral. The aim is to prove such a result, and in the process to determine the nature of the stochastic integral.

4.1 Diversion: Wong-Zakai approximation

Consider the simpler case where $a$ is a function only of $x$, so that $\bar a(x) = a(x)$. The Wong-Zakai approximation problem seeks the weak limit of the solution of the ODE $dx_\epsilon = a(x_\epsilon) \, dt + b(x_\epsilon) \, dW_\epsilon$ given that $W_\epsilon \to_w W$. Since $W_\epsilon$ and $x_\epsilon$ are $C^1$ functions of $t$, this is a question about smooth approximation of stochastic processes.

In a probabilistic set up, Wong-Zakai [20] gave sufficient conditions under which $x_\epsilon \to_w X$ where $X$ satisfies the Stratonovich SDE
$$ dX = a(X) \, dt + b(X)\circ dW. $$
These conditions are automatically satisfied in one dimension, but not necessarily in higher dimensions. McShane [14] gave counterexamples in two dimensions, and Sussmann [18] showed that there are lots of possible interpretations for the stochastic integral $\int b(X)\star dW$ that arise in two dimensions. The problem arises from the arbitrariness of how to interpolate (how to join the dots) when defining $W_n(t)$ for $tn$ not an integer. But for flows, there is a canonical choice. So a byproduct of solving the fast-slow problem is a definitive solution [10] to the Wong-Zakai approximation problem.

4.2 No multiplicative noise

The first result deals with the case where $d = e$ and $b$ is the $d\times d$ identity matrix (so there is no stochastic integral to worry about).

Theorem 4.2 (Melbourne-Stuart [15]) Consider the case $d = e$, $b = I$. If $W_\epsilon \to_w W$ in $C([0,1],\mathbb{R}^d)$, then $x_\epsilon \to_w X$ in $C([0,1],\mathbb{R}^d)$, where
$$ dX = \bar a(X) \, dt + dW, \quad X(0) = \xi. $$

Proof Case 1: $a \equiv 0$. The slow equation (4.1) is just $\dot x_\epsilon = \dot W_\epsilon$, so $x_\epsilon = \xi + W_\epsilon \to_w X$ where $dX = dW$, $X(0) = \xi$.

Case 2: $a(x,y) \equiv \bar a(x)$. The slow equation (4.1) is $\dot x_\epsilon = \bar a(x_\epsilon) + \dot W_\epsilon$, so $x_\epsilon(t) = \xi + W_\epsilon(t) + \int_0^t\bar a(x_\epsilon(s)) \, ds$. Consider the map $G : C([0,1],\mathbb{R}^d) \to C([0,1],\mathbb{R}^d)$ given by $G(f) = u$ where
$$ u(t) = \xi + f(t) + \int_0^t\bar a(u(s)) \, ds. $$
Then we have shown that $x_\epsilon = G(W_\epsilon)$.
It follows just as in standard ODE theory (existence, uniqueness, continuous dependence on initial data) that $G : C([0,1],\mathbb{R}^d) \to C([0,1],\mathbb{R}^d)$ is continuous. Moreover, an infinite-dimensional version of the continuous mapping theorem shows that $G$ preserves weak convergence. Hence $G(W_\epsilon) \to_w G(W)$ and so $x_\epsilon \to_w X$ where $X = G(W)$. Finally, $X = G(W)$ means that $X(t) = \xi + W(t) + \int_0^t\bar a(X(s)) \, ds$. In other words, $dX = \bar a(X) \, dt + dW$, $X(0) = \xi$, as required.

General case: Let $\tilde a(x,y) = a(x,y) - \bar a(x)$ and define $Z_\epsilon(t) = \int_0^t\tilde a(x_\epsilon(s),y_\epsilon(s)) \, ds$. Then
$$ x_\epsilon(t) = \xi + W_\epsilon(t) + Z_\epsilon(t) + \int_0^t\bar a(x_\epsilon(s)) \, ds. $$
Hence $x_\epsilon = G(W_\epsilon + Z_\epsilon)$ where $G$ is the continuous map defined in Case 2. We claim that $Z_\epsilon \to 0$ in probability in $C([0,1],\mathbb{R}^d)$ and hence that $W_\epsilon + Z_\epsilon \to_w W$ in $C([0,1],\mathbb{R}^d)$. By the continuous mapping theorem, $x_\epsilon \to_w X = G(W)$, completing the proof.

It remains to verify the claim. First we consider the case where there exists $Q_0 > 0$ such that $|x_\epsilon(t)| \le Q_0$ for all $\epsilon > 0$ and all $t \in [0,1]$. (This proof also applies to the case where $a$ is periodic and the slow equations lie on $\mathbb{T}^d$ rather than $\mathbb{R}^d$.)

Note that $|x_\epsilon(s) - x_\epsilon(t)| \le (|a|_\infty + \epsilon^{-1}|v|_\infty)|s-t|$. Hence
$$ Z_\epsilon(t) = Z_\epsilon([t\epsilon^{-3/2}]\epsilon^{3/2}) + O(\epsilon^{3/2}) = \sum_{n=0}^{[t\epsilon^{-3/2}]-1}\int_{n\epsilon^{3/2}}^{(n+1)\epsilon^{3/2}}\tilde a(x_\epsilon(s),y_\epsilon(s)) \, ds + O(\epsilon^{3/2}) $$
$$ = \sum_{n=0}^{[t\epsilon^{-3/2}]-1}\int_{n\epsilon^{3/2}}^{(n+1)\epsilon^{3/2}}\tilde a(x_\epsilon(n\epsilon^{3/2}),y_\epsilon(s)) \, ds + O(\epsilon^{1/2}) = \sum_{n=0}^{[t\epsilon^{-3/2}]-1}\epsilon^2\int_{n\epsilon^{-1/2}}^{(n+1)\epsilon^{-1/2}}\tilde a(x_\epsilon(n\epsilon^{3/2}),y_1(s)) \, ds + O(\epsilon^{1/2}) $$
$$ = \sum_{n=0}^{[t\epsilon^{-3/2}]-1}\epsilon^{3/2}J_\epsilon(n) + O(\epsilon^{1/2}), $$
where $J_\epsilon(n) = \epsilon^{1/2}\int_{n\epsilon^{-1/2}}^{(n+1)\epsilon^{-1/2}}\tilde a(x_\epsilon(n\epsilon^{3/2}),y_1(s)) \, ds$. Hence
$$ \max_{[0,1]}|Z_\epsilon| \le \sum_{n=0}^{[\epsilon^{-3/2}]-1}\epsilon^{3/2}|J_\epsilon(n)| + O(\epsilon^{1/2}). \tag{4.2} $$
For $u \in \mathbb{R}^d$ fixed, we define
$$ \tilde J_\epsilon(n,u) = \epsilon^{1/2}\int_{n\epsilon^{-1/2}}^{(n+1)\epsilon^{-1/2}}A_u\circ\phi_s \, ds, \quad A_u(y) = \tilde a(u,y). $$
Note that $\tilde J_\epsilon(n,u) = \tilde J_\epsilon(0,u)\circ\phi_{n\epsilon^{-1/2}}$, and so $\int_\Omega|\tilde J_\epsilon(n,u)| \, d\mu = \int_\Omega|\tilde J_\epsilon(0,u)| \, d\mu$. By the ergodic theorem, $\int_\Omega|\tilde J_\epsilon(0,u)| \, d\mu \to 0$ as $\epsilon \to 0$ for each $n$ and $u$.

For any $r > 0$, there exists a finite subset $S \subset \mathbb{R}^d$ such that $\operatorname{dist}(x,S) \le r/(2\operatorname{Lip} a)$ for any $x$ with $|x| \le Q_0$. Then for all $n \ge 0$, $\epsilon > 0$,
$$ |J_\epsilon(n)| \le \sum_{u\in S}|\tilde J_\epsilon(n,u)| + r. $$
Hence by (4.2),
$$ \int_\Omega\max_{[0,1]}|Z_\epsilon| \, d\mu \le \sum_{n=0}^{[\epsilon^{-3/2}]-1}\epsilon^{3/2}\sum_{u\in S}\int_\Omega|\tilde J_\epsilon(n,u)| \, d\mu + r + O(\epsilon^{1/2}) = \sum_{n=0}^{[\epsilon^{-3/2}]-1}\epsilon^{3/2}\sum_{u\in S}\int_\Omega|\tilde J_\epsilon(0,u)| \, d\mu + r + O(\epsilon^{1/2}) \le \sum_{u\in S}\int_\Omega|\tilde J_\epsilon(0,u)| \, d\mu + r + O(\epsilon^{1/2}). $$
Since $r > 0$ is arbitrary, we obtain that $\max_{[0,1]}|Z_\epsilon| \to 0$ in $L^1(\Omega,\mathbb{R}^d)$, and hence in probability, as $\epsilon \to 0$.

Finally, we drop the assumption that $x_\epsilon$ is bounded. Let $Q > 0$ and write $Z_\epsilon = Z_\epsilon^{Q,1} + Z_\epsilon^{Q,2}$ where
$$ Z_\epsilon^{Q,1}(t) = Z_\epsilon(t)1_{B_\epsilon(Q)}, \quad Z_\epsilon^{Q,2}(t) = Z_\epsilon(t)1_{B_\epsilon(Q)^c}, \quad B_\epsilon(Q) = \Big\{\max_{[0,1]}|x_\epsilon| \le Q\Big\}. $$
Then for all $n \ge 0$, $\epsilon > 0$,
$$ 1_{B_\epsilon(Q)}|J_\epsilon(n)| \le \sum_{u\in S}|\tilde J_\epsilon(n,u)| + r. $$
Hence by (4.2),
$$ \int_\Omega\max_{[0,1]}|Z_\epsilon^{Q,1}| \, d\mu \le \sum_{u\in S}\int_\Omega|\tilde J_\epsilon(0,u)| \, d\mu + r + O(\epsilon^{1/2}). $$
Since $r > 0$ is arbitrary, we obtain for each fixed $Q$ that $\max_{[0,1]}|Z_\epsilon^{Q,1}| \to 0$ in $L^1(\Omega,\mathbb{R}^d)$, and hence in probability, as $\epsilon \to 0$.

Next, since $x_\epsilon - W_\epsilon$ is bounded on $[0,1]$, for $Q$ sufficiently large
$$ \mu\Big\{\max_{[0,1]}|Z_\epsilon^{Q,2}| > 0\Big\} \le \mu\Big\{\max_{[0,1]}|x_\epsilon| \ge Q\Big\} \le \mu\Big\{\max_{[0,1]}|W_\epsilon| \ge Q/2\Big\}. $$
Fix $c > 0$. Increasing $Q$ if necessary, we can arrange that $\mathbb{P}\{\max_{[0,1]}|W| \ge Q/2\} < c/4$. By the continuous mapping theorem, $\max_{[0,1]}|W_\epsilon| \to_d \max_{[0,1]}|W|$. Hence there exists $\epsilon_0 > 0$ such that $\mu\{\max_{[0,1]}|W_\epsilon| \ge Q/2\} < c/2$ for all $\epsilon \in (0,\epsilon_0)$. For such $\epsilon$,
$$ \mu\Big\{\max_{[0,1]}|Z_\epsilon^{Q,2}| > 0\Big\} < c/2. $$
Shrinking $\epsilon_0$ if necessary, we also have that $\mu\{\max_{[0,1]}|Z_\epsilon^{Q,1}| > c/2\} < c/2$. Hence $\mu\{\max_{[0,1]}|Z_\epsilon| > c\} < c$ as required.
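A crude numerical experiment complements Theorem 4.2. Everything concrete below is an assumption made for illustration and is not from the notes: the fast flow is the classical Lorenz system, the observable is $v(y) = y_2$ (which has mean zero by the symmetry $(y_1,y_2,y_3) \mapsto (-y_1,-y_2,y_3)$), and $a(x,y) = -x$, $b = 1$, $d = e = 1$, so the predicted limit is an Ornstein-Uhlenbeck SDE $dX = -X \, dt + dW$. Working in the fast time $s = t/\epsilon^2$ turns the slow equation into $dx/ds = \epsilon^2 a(x) + \epsilon v(y(s))$ and keeps the integration step size sensible.

```python
import numpy as np

# Sketch (assumptions throughout; see the lead-in): Euler experiment for
# Theorem 4.2 with Lorenz fast dynamics, v(y) = y2, a(x) = -x, b = 1.
rng = np.random.default_rng(5)

def g(y):                                    # Lorenz vector field; y has shape (N, 3)
    return np.stack([10.0 * (y[:, 1] - y[:, 0]),
                     28.0 * y[:, 0] - y[:, 1] - y[:, 0] * y[:, 2],
                     y[:, 0] * y[:, 1] - (8.0 / 3.0) * y[:, 2]], axis=1)

eps, ds, N = 0.2, 1e-3, 400                  # eps only modest; smaller = slower
y = np.array([1.0, 1.0, 25.0]) + rng.normal(0.0, 1.0, (N, 3))   # random eta
for _ in range(20000):                       # relax onto the attractor
    y = y + ds * g(y)

x = np.zeros(N)                              # xi = 0
for _ in range(int(round(1.0 / (eps**2 * ds)))):    # integrate to t = 1
    x = x + ds * (eps**2 * (-x) + eps * y[:, 1])    # slow variable in fast time
    y = y + ds * g(y)

# x samples x_eps(1); as eps -> 0 its law should approach that of X(1) for
# the Ornstein-Uhlenbeck limit dX = -X dt + dW (variance set by v and mu).
print("mean", x.mean(), "var", x.var())
```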
4.3 One-dimensional multiplicative noise

The next result deals with the case of one-dimensional multiplicative noise. Recall that $\int b(X)\circ dW$ denotes the Stratonovich integral. The only fact that we require about this integral is that it transforms according to the usual laws of calculus.

Theorem 4.3 (Gottwald-Melbourne [9]) Consider the case $d = e = 1$. If $W_\epsilon \to_w W$ in $C([0,1],\mathbb{R})$, then $x_\epsilon \to_w X$ in $C([0,1],\mathbb{R})$, where
$$ dX = \bar a(X) \, dt + b(X)\circ dW, \quad X(0) = \xi. $$

Proof Write $z_\epsilon = h(x_\epsilon)$ where $h' = 1/b$. Then
$$ \dot z_\epsilon = h'(x_\epsilon)\dot x_\epsilon = b(x_\epsilon)^{-1}a(x_\epsilon,y_\epsilon) + \dot W_\epsilon = A(z_\epsilon,y_\epsilon) + \dot W_\epsilon, \quad z_\epsilon(0) = h(\xi), $$
where $A(z,y) = h'(h^{-1}(z))a(h^{-1}(z),y)$. By Theorem 4.2, $z_\epsilon \to_w Z$ where $dZ = \bar A(Z) \, dt + dW$. Here
$$ \bar A(z) = \int_\Omega A(z,y) \, d\mu(y) = h'(h^{-1}(z))\bar a(h^{-1}(z)). $$
By the continuous mapping theorem, $x_\epsilon = h^{-1}(z_\epsilon) \to_w h^{-1}(Z)$, so it remains to determine $X = h^{-1}(Z)$. Since the Stratonovich integral transforms according to the standard laws of calculus,
$$ dX = (h^{-1})'(Z)\circ dZ = h'(X)^{-1}\big[\bar A(Z) \, dt + {}\circ dW\big] = h'(X)^{-1}\big[h'(X)\bar a(X) \, dt + {}\circ dW\big] = \bar a(X) \, dt + b(X)\circ dW, $$
as required.

Remark 4.4 (a) This result extends classical Wong-Zakai approximation [20] from the probabilistic setting to the deterministic setting. Clearly the same argument works in higher dimensions provided that $b^{-1} = dh$ for some $h : \mathbb{R}^d \to \mathbb{R}^d$. Of course, it is usually not the case that $b^{-1}$ has this form in higher dimensions. Indeed, [20] gave sufficient conditions for convergence with $b(X)\circ dW$ Stratonovich, and these conditions were shown in the probabilistic setting to be automatic in one dimension but not in higher dimensions. Moreover, McShane [14] gave counterexamples in two dimensions, and Sussmann [18] showed that numerous different interpretations for the stochastic integral could arise in two dimensions.

(b) The proof of Theorem 4.3 works for general stochastic processes $W_\epsilon$ provided that there is a stochastic integral $\int b(X)\circ dW$ that transforms according to the standard laws of calculus. In the case of a one-dimensional Lévy process, the appropriate integral is the "Marcus" integral. This case is treated in [9] (the scalings in the fast-slow ODE are modified accordingly).
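In the classical probabilistic setting the content of Theorem 4.3 can be seen in a few lines. The sketch below (assumptions mine, not from the notes) solves $\dot x = x\dot W_n$ where $W_n$ is a piecewise-linear interpolation of a Brownian path; since smooth approximations obey the ordinary chain rule, the result is close to the Stratonovich solution $e^{W(1)}$ rather than the Itô solution $e^{W(1)-1/2}$.

```python
import numpy as np

# Sketch (not from the notes): Wong-Zakai in one dimension. Solve
# dx/dt = x * dW_n/dt with W_n piecewise linear through a Brownian path;
# the limit is the Stratonovich solution exp(W(1)), not Ito's exp(W(1) - 1/2).
rng = np.random.default_rng(2)

n_coarse, n_fine = 2**8, 2**14
t_coarse = np.linspace(0.0, 1.0, n_coarse + 1)
W = np.concatenate([[0.0],
                    np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_coarse), n_coarse))])

t = np.linspace(0.0, 1.0, n_fine + 1)
W_n = np.interp(t, t_coarse, W)        # smooth (piecewise-linear) approximation
x = 1.0
for k in range(n_fine):                # Euler for the ODE driven by W_n
    x *= 1.0 + (W_n[k + 1] - W_n[k])
print(x, np.exp(W[-1]))                # agree to within a few percent
```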
4.4 Two examples

Before considering the general slow equation (4.1), it is instructive to consider two special cases. Take $a \equiv 0$ and $\xi = 0$. Let $d = e = 2$ and consider the examples defined by
$$ b(x^1,x^2) = \begin{pmatrix} 1 & 0 \\ x^1 & 0 \end{pmatrix}, \qquad b(x^1,x^2) = \begin{pmatrix} 1 & 0 \\ 0 & x^1 \end{pmatrix}. $$
In the first case, the slow equation (4.1) becomes $\dot x_\epsilon^1 = \dot W_\epsilon^1$, $\dot x_\epsilon^2 = x_\epsilon^1\dot W_\epsilon^1$, so
$$ x_\epsilon^1 = W_\epsilon^1, \quad x_\epsilon^2(t) = \int_0^t W_\epsilon^1(s)\dot W_\epsilon^1(s) \, ds = \tfrac12[W_\epsilon^1(t)]^2. $$
The mapping $\chi : C([0,1],\mathbb{R}) \to C([0,1],\mathbb{R}^2)$ given by $\chi(g) = (g,\frac12 g^2)$ is continuous, so by the continuous mapping theorem $x_\epsilon = \chi(W_\epsilon^1) \to_w \chi(W^1) = (W^1,\frac12(W^1)^2)$. Finally, $X = \chi(W^1) = (W^1,\frac12(W^1)^2)$ satisfies the SDE
$$ dX^1 = dW^1, \quad dX^2 = W^1\circ dW^1 = X^1\circ dW^1, $$
or simply $dX = b(X)\circ dW$.

In the second case, the slow equation (4.1) becomes $\dot x_\epsilon^1 = \dot W_\epsilon^1$, $\dot x_\epsilon^2 = x_\epsilon^1\dot W_\epsilon^2$, so
$$ x_\epsilon^1 = W_\epsilon^1, \quad x_\epsilon^2(t) = \int_0^t W_\epsilon^1(s)\dot W_\epsilon^2(s) \, ds = \int_0^t W_\epsilon^1 \, dW_\epsilon^2. $$
This time, the mapping $\chi : C([0,1],\mathbb{R}^2) \to C([0,1],\mathbb{R}^2)$ given by $\chi(g)(t) = (g^1,\int_0^t g^1 \, dg^2)$ is not continuous. (The second coordinate is not even well-defined, but that is easily fixed; it is the lack of continuity that is a problem.) Indeed, the starting point for rough path theory is that the WIP $W_\epsilon \to_w W$ does not uniquely pin down the weak limit of the process $t \mapsto \int_0^t W_\epsilon^1 \, dW_\epsilon^2$.

The second example demonstrates that solving the following "iterated weak invariance principle" is an unavoidable step in understanding homogenization of fast-slow systems. (Fortunately, rough path theory tells us that if we solve this, then we are almost done.)

Given $v : \Lambda \to \mathbb{R}^d$, we define $(W_n,\mathbb{W}_n) \in C([0,1],\mathbb{R}^d\times\mathbb{R}^{d\times d})$ by setting $W_n(t) = n^{-1/2}v_{nt} = n^{-1/2}\int_0^{nt}v\circ\phi_s \, ds$ as before and setting
$$ \mathbb{W}_n^{ij}(t) = \int_0^t W_n^i \, dW_n^j = n^{-1}\int_0^{nt}\Big\{\int_0^s v^i\circ\phi_r \, dr\Big\}\,v^j\circ\phi_s \, ds. $$
The aim of the iterated WIP is to determine the process $\mathbb{W} \in C([0,1],\mathbb{R}^{d\times d})$ such that $(W_n,\mathbb{W}_n) \to_w (W,\mathbb{W})$ in $C([0,1],\mathbb{R}^d\times\mathbb{R}^{d\times d})$.
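The discrete-time analogues of these iterated integrals, defined in (5.1) and (5.2) of the next lecture, are straightforward to compute from data. The sketch below (not from the notes; the random-bit orbit and the two-component observable are illustrative assumptions) evaluates $W_n(1)$ and the iterated sum $\mathbb{W}_n(1)$ for the doubling map.

```python
import numpy as np

# Sketch (not from the notes): the discrete iterated sums (5.1)-(5.2) for the
# doubling map, with the illustrative observable v = (cos 2 pi y, sin 4 pi y).
rng = np.random.default_rng(3)
n, n_bits = 4096, 53
bits = rng.integers(0, 2, size=n + n_bits)
w = 0.5 ** np.arange(1, n_bits + 1)
y = np.array([bits[j:j + n_bits] @ w for j in range(n)])      # the orbit

v = np.stack([np.cos(2 * np.pi * y), np.sin(4 * np.pi * y)])  # shape (2, n)
W1 = v.cumsum(axis=1) / np.sqrt(n)                            # W_n at times k/n

S = v.cumsum(axis=1)
# WW[i, j] = n^{-1} * sum_{0 <= k < l <= n-1} v_i(f^k y) v_j(f^l y)
WW = sum(np.outer(S[:, l - 1], v[:, l]) for l in range(1, n)) / n
print(W1[:, -1])      # W_n(1)
print(WW)             # the iterated sum W_n(1); its antisymmetric part is a
                      # discrete Levy area, the quantity the WIP alone misses
```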
The second example demonstrates that solving the following "iterated weak invariance principle" is an unavoidable step in understanding homogenization of fast-slow systems. (Fortunately, rough path theory tells us that if we solve this, then we are almost done.)

Given $v:\Lambda\to\mathbb R^d$, we define $(W_n,\mathbb W_n)\in C([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ by setting $W_n(t)=n^{-1/2}v_{nt}=n^{-1/2}\int_0^{nt}v\circ\phi_s\,ds$ as before and setting
\[
\mathbb W_n^{ij}(t)=\int_0^t W_n^i\,dW_n^j=n^{-1}\int_0^{nt}\Bigl(\int_0^s v^i\circ\phi_r\,dr\Bigr)\,v^j\circ\phi_s\,ds.
\]
The aim of the iterated WIP is to determine the process $\mathbb W\in C([0,1],\mathbb R^{d\times d})$ such that $(W_n,\mathbb W_n)\to_w(W,\mathbb W)$ in $C([0,1],\mathbb R^d\times\mathbb R^{d\times d})$.

5 Lecture 5: The iterated WIP

The previous lecture ended with a statement of the iterated WIP problem that needs to be resolved. As in the case of the WIP, it is convenient to first consider the discrete time setting.

5.1 The iterated WIP; discrete time

Assume first the noninvertible setting from Section 1.4, so $f:\Lambda\to\Lambda$ is a transformation of a probability space $(\Lambda,\mu)$ where $\mu$ is $f$-invariant and ergodic. Suppose as in Section 2.1 that $v:\Lambda\to\mathbb R^d$ lies in $L^\infty(\Lambda,\mathbb R^d)$ with $\int_\Lambda v\,d\mu=0$. Moreover, we suppose as before that each component of $v$ satisfies condition (1.2). In particular, we have a martingale-coboundary decomposition
\[
v=m+\chi\circ f-\chi,\qquad m,\chi\in L^\infty(\Lambda,\mathbb R^d),\quad \int_\Lambda m\,d\mu=0,\quad m\in\ker P.
\]
It is again convenient to work in the space $D([0,1],\mathbb R^d)$ of cadlag functions with (most of the time) the sup-norm topology. It is now no longer necessary to interpolate linearly, so we have the simpler definition
\[
W_n(t)=n^{-1/2}v_{[nt]}=n^{-1/2}\sum_{k=0}^{[nt]-1}v\circ f^k. \tag{5.1}
\]
The WIP holds by Theorem 2.4, so $W_n\to_w W$ in $D([0,1],\mathbb R^d)$ where $W$ is $d$-dimensional Brownian motion with covariance matrix $\Sigma=\int_\Lambda m\,m^T\,d\mu$. Now define the iterated sum $\mathbb W_n\in D([0,1],\mathbb R^{d\times d})$ where
\[
\mathbb W_n^{ij}(t)=\int_0^t W_n^i\,dW_n^j=n^{-1}\sum_{0\le k<\ell\le[nt]-1}v^i\circ f^k\,v^j\circ f^\ell. \tag{5.2}
\]
(The double sum can be taken as a definition of the double integral.) Our immediate aim is to prove:

Theorem 5.1 (Kelly–Melbourne [10]) Assume the above martingale-coboundary decomposition for $v$ and suppose moreover that $f$ is mixing. Then $(W_n,\mathbb W_n)\to_w(W,\mathbb W)$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ where
\[
\mathbb W^{ij}(t)=\int_0^t W^i\,dW^j+tE^{ij},\qquad E^{ij}=\sum_{n=1}^\infty\int_\Lambda v^i\,v^j\circ f^n\,d\mu.
\]
(Here $\int_0^t W^i\,dW^j$ is the Itô integral.)
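Before developing the proof, here is a minimal numerical sketch of (5.1), (5.2) and the drift term in Theorem 5.1 for the doubling map of Lecture 1 with $d=2$. It is ours, not from the notes. The observable $v(y)=(\cos 2\pi y+\cos 4\pi y,\ \sin 2\pi y)$ is an arbitrary mean-zero choice; a direct Fourier computation gives $\int_\Lambda v^1\,v^1\circ f\,d\mu=\tfrac12$ while all other autocorrelations vanish, so $E^{11}=\tfrac12$ and $E^{ij}=0$ otherwise. Since the Itô integral has mean zero, Theorem 5.1 predicts that the mean of $\mathbb W_n(1)$ converges to $E$. One pitfall: iterating $y\mapsto 2y\bmod 1$ in floating point loses a binary digit per step and collapses to $0$ after about 53 iterates, so the sketch instead reads the orbit off a stream of iid binary digits, which has the same distribution under Lebesgue measure.

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view

    rng = np.random.default_rng(3)
    pw = 2.0 ** -np.arange(1, 54)               # weights of 53 binary digits

    def doubling_orbit(n):
        # y, f(y), ..., f^{n-1}(y) for a Lebesgue-random y: the binary digits
        # of y are iid Bernoulli(1/2), and f^k(y) is read off from digits
        # k+1, k+2, ...; a 53-digit window suffices in double precision.
        bits = rng.integers(0, 2, n + 53)
        return sliding_window_view(bits, 53)[:n].astype(float) @ pw

    def v(y):                                   # mean-zero observable (our choice)
        return np.column_stack([np.cos(2 * np.pi * y) + np.cos(4 * np.pi * y),
                                np.sin(2 * np.pi * y)])

    def iterated_sums(n):
        vy = v(doubling_orbit(n))               # rows v(f^k y), k = 0..n-1
        Wn = vy.sum(axis=0) / np.sqrt(n)        # W_n(1), eq. (5.1)
        S = np.cumsum(vy, axis=0) - vy          # S[l] = sum_{k<l} v(f^k y)
        WWn = S.T @ vy / n                      # WW_n^{ij}(1), eq. (5.2)
        return Wn, WWn

    n, trials = 4000, 1000
    WW = np.mean([iterated_sums(n)[1] for _ in range(trials)], axis=0)
    print(WW)                                   # ~ [[0.5, 0], [0, 0]] = E

The agreement is only to Monte Carlo accuracy, but the nonzero drift in the $(1,1)$ entry, which is invisible at the level of $W_n$ alone, is clearly visible.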
5.2 A cohomological invariant

To exploit the martingale-coboundary decomposition $v=m+\chi\circ f-\chi$, it is necessary to figure out how the iterated WIP varies under cohomology. (This was not an issue for the ordinary WIP since the coboundary telescoped and hence was negligible.) The following result just requires that $f$ is mixing.

Lemma 5.2 Let $(\Lambda,\mu)$ be a probability space and suppose that $f:\Lambda\to\Lambda$ is a mixing measure-preserving transformation (ie $\int_\Lambda\phi\,\psi\circ f^n\,d\mu\to\int_\Lambda\phi\,d\mu\int_\Lambda\psi\,d\mu$ for all $\phi,\psi\in L^2(\Lambda)$). Let $v,\hat v,\chi\in L^2(\Lambda,\mathbb R^d)$ and suppose that $v=\hat v+\chi\circ f-\chi$. Define $(W_n,\mathbb W_n)\in D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ as in (5.1) and (5.2), and define $(\widehat W_n,\widehat{\mathbb W}_n)$ using $\hat v$ instead of $v$. Then

(a) $\displaystyle\sum_{k=1}^\infty\int_\Lambda(v^i\,v^j\circ f^k-\hat v^i\,\hat v^j\circ f^k)\,d\mu=\int_\Lambda(\chi^i v^j-\hat v^i\,\chi^j\circ f)\,d\mu$.

(b) $(W_n,\mathbb W_n)-(\widehat W_n,\widehat{\mathbb W}_n)\to_w(0,A)$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$, where
\[
A^{ij}(t)=t\sum_{k=1}^\infty\int_\Lambda(v^i\,v^j\circ f^k-\hat v^i\,\hat v^j\circ f^k)\,d\mu.
\]

Proof  This is left as an exercise.

Now we apply this result in the case where $\hat v=m\in\ker P$. Accordingly, define $(M_n,\mathbb M_n)\in D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ as in (5.1) and (5.2), but using $m$ instead of $v$. We already know that $M_n\to_w W$ and $W_n\to_w W$.

Corollary 5.3 Assume the hypotheses of Theorem 5.1, and define $E\in\mathbb R^{d\times d}$, $A\in D([0,1],\mathbb R^{d\times d})$,
\[
E^{ij}=\sum_{k=1}^\infty\int_\Lambda v^i\,v^j\circ f^k\,d\mu,\qquad A(t)=tE.
\]
Suppose that $(M_n,\mathbb M_n)\to_w(W,\mathbb M)$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$. Then $(W_n,\mathbb W_n)\to_w(W,\mathbb M+A)$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$.

Proof  This is left as an exercise.

5.3 Proof of Theorem 5.1

Define $(M_n,\mathbb M_n)$, $(\widetilde M_n,\widetilde{\mathbb M}_n)$, $(\widetilde M_n^-,\widetilde{\mathbb M}_n^-)\in D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$. Here $(M_n,\mathbb M_n)$ is defined as in Section 5.2:
\[
M_n(t)=n^{-1/2}\sum_{k=0}^{[nt]-1}m\circ f^k,\qquad
\mathbb M_n^{ij}(t)=n^{-1}\sum_{0\le k<\ell\le[nt]-1}m^i\circ f^k\,m^j\circ f^\ell.
\]
Similarly, at the level of the natural extension $\tilde f:\tilde\Lambda\to\tilde\Lambda$ as in Section 3.1, we define the forward processes
\[
\widetilde M_n(t)=n^{-1/2}\sum_{k=0}^{[nt]-1}\tilde m\circ\tilde f^k,\qquad
\widetilde{\mathbb M}_n^{ij}(t)=n^{-1}\sum_{0\le k<\ell\le[nt]-1}\tilde m^i\circ\tilde f^k\,\tilde m^j\circ\tilde f^\ell,
\]
and the backward processes
\[
\widetilde M_n^-(t)=n^{-1/2}\sum_{k=-[nt]}^{-1}\tilde m\circ\tilde f^k,\qquad
\widetilde{\mathbb M}_n^{ij,-}(t)=n^{-1}\sum_{-[nt]\le\ell<k\le-1}\tilde m^i\circ\tilde f^k\,\tilde m^j\circ\tilde f^\ell.
\]
In Section 3.1 we saw that the backward process $\{\widetilde M_n^-\}$ is a sequence of continuous time martingales, and hence we can appeal to results from stochastic analysis. In particular, it follows from [11, Theorem 2.2] that $(\widetilde M_n^-,\widetilde{\mathbb M}_n^-)\to_w(W,\mathbb M)$ where $\mathbb M^{ij}(t)$ is the Itô integral $\int_0^t W^i\,dW^j$. Some complicated but straightforward calculations similar to those in Section 3.1 (see [10, Section 4.2]) show that $(\widetilde M_n,\widetilde{\mathbb M}_n)\to_w(W,\mathbb M)$. Finally, $(\widetilde M_n,\widetilde{\mathbb M}_n)=(M_n,\mathbb M_n)\circ\pi$ and so $(M_n,\mathbb M_n)\to_w(W,\mathbb M)$. This combined with Corollary 5.3 completes the proof of Theorem 5.1.

5.4 Iterated WIP for flows

In this section, we state the analogue of Theorem 3.16 for the iterated WIP. Let $(\Omega,\mu_\Omega)$, $(\Lambda,\mu_\Lambda)$, $v:\Omega\to\mathbb R^d$, $\hat v:\Lambda\to\mathbb R^d$ be as in Section 3.3. On $(\Omega,\mu_\Omega)$, define
\[
W_n(t)=n^{-1/2}\int_0^{nt}v\circ\phi_s\,ds,\qquad
\mathbb W_n^{ij}(t)=\int_0^t W_n^i\,dW_n^j,
\]
and on $(\Lambda,\mu_\Lambda)$, define
\[
\widehat W_n(t)=n^{-1/2}\sum_{k=0}^{[nt]-1}\hat v\circ f^k,\qquad
\widehat{\mathbb W}_n^{ij}(t)=\int_0^t\widehat W_n^i\,d\widehat W_n^j.
\]

Theorem 5.4 Suppose that $(\widehat W_n,\widehat{\mathbb W}_n)\to_w(\widehat W,\widehat{\mathbb W})$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ on $(\Lambda,\mu_\Lambda)$, where $\widehat W$ is a $d$-dimensional Brownian motion and $\widehat{\mathbb W}^{ij}(t)=\int_0^t\widehat W^i\,d\widehat W^j+t\widehat E^{ij}$ for some $\widehat E\in\mathbb R^{d\times d}$. Let $H(y,u)=\int_0^u v(y,s)\,ds$. Assume that $\max_{1\le k\le n}|\hat v_k|=o(n^{1/2})$. Then $(W_n,\mathbb W_n)\to_w(W,\mathbb W)$ in $D([0,1],\mathbb R^d\times\mathbb R^{d\times d})$ on $(\Omega,\mu_\Omega)$ where $W=\bar\tau^{-1/2}\widehat W$ and
\[
\mathbb W^{ij}(t)=\int_0^t W^i\,dW^j+tE^{ij},\qquad
E^{ij}=\bar\tau^{-1}\widehat E^{ij}+\int_\Omega H^i v^j\,d\mu_\Omega.
\]

UNDER CONSTRUCTION!

5.5 Back to fast-slow systems

References

[1] P. Billingsley, Convergence of Probability Measures, second ed., Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1999.

[2] R. Bowen, Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms, Lecture Notes in Math. 470, Springer, Berlin, 1975.

[3] B. M. Brown, Martingale central limit theorems, Ann. Math. Statist. 42 (1971), 59–66.

[4] D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probability 1 (1973), 19–42.

[5] D. Dolgopyat, Limit theorems for partially hyperbolic systems, Trans. Amer. Math. Soc. 356 (2004), 1637–1689.

[6] D. Dolgopyat, Averaging and invariant measures, Mosc. Math. J. 5 (2005), 537–576, 742.

[7] G. K. Eagleson, Some simple conditions for limit theorems to be mixing, Teor. Verojatnost. i Primenen. 21 (1976), 653–660.

[8] M. I. Gordin, The central limit theorem for stationary processes, Soviet Math. Dokl. 10 (1969), 1174–1176.

[9] G. A. Gottwald and I. Melbourne, Homogenization for deterministic maps and multiplicative noise, Proc. R. Soc. London A (2013), 20130201.

[10] D. Kelly and I. Melbourne, Smooth approximation of stochastic differential equations, Ann. Probab., to appear.

[11] T. G. Kurtz and P. Protter, Weak limit theorems for stochastic integrals and stochastic differential equations, Ann. Probab. 19 (1991), 1035–1070.

[12] M. Maxwell and M. Woodroofe, Central limit theorems for additive functionals of Markov chains, Ann. Probab. 28 (2000), 713–724.

[13] D. L. McLeish, Dependent central limit theorems and invariance principles, Ann. Probability 2 (1974), 620–628.

[14] E. J. McShane, Stochastic differential equations and models of random processes, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. III: Probability Theory, Univ. California Press, Berkeley, 1972, pp. 263–294.

[15] I. Melbourne and A. Stuart, A note on diffusion limits of chaotic skew product flows, Nonlinearity 24 (2011), 1361–1367.

[16] I. Melbourne and R. Zweimüller, Weak convergence to stable Lévy processes for nonuniformly hyperbolic dynamical systems, Ann. Inst. H. Poincaré (B) Probab. Statist., to appear.

[17] Y. G. Sinaĭ, Gibbs measures in ergodic theory, Russ. Math. Surv. 27 (1972), 21–70.

[18] H. J. Sussmann, Limits of the Wong–Zakai type with a modified drift term, Stochastic Analysis, Academic Press, Boston, MA, 1991, pp. 475–493.

[19] M. Tyran-Kamińska, An invariance principle for maps with polynomial decay of correlations, Comm. Math. Phys. 260 (2005), 1–15.

[20] E. Wong and M. Zakai, On the convergence of ordinary integrals to stochastic integrals, Ann. Math. Statist. 36 (1965), 1560–1564.

[21] R. Zweimüller, Mixing limit theorems for ergodic transformations, J. Theoret. Probab. 20 (2007), 1059–1071.