MATS423 OPTIMAL MASS TRANSPORTATION FALL 2014 Foreword
These are the lecture notes for the course Optimal Mass Transportation given at the University of Jyväskylä in the Fall of 2014. The course aims at providing the basics of optimal mass transportation for students who are familiar with basic abstract measure theory.

In the course we study the Monge and Kantorovich formulations of optimal mass transportation, existence and uniqueness of optimal transport maps, the Wasserstein distance, a brief introduction to functionals and gradient flows in Wasserstein spaces, and Ricci curvature lower bounds in metric spaces using optimal mass transportation.

The lecture notes can be found (with a possible delay) on the course website http://users.jyu.fi/~tamaraja/MATS423/

Version: November 6, 2014.

0. Introduction

The study of optimal mass transportation has a long history, dating back to Gaspard Monge and his 1781 publication Mémoire sur la théorie des déblais et des remblais. The problem he addresses is the following: suppose you have a certain amount of soil taken from the ground at different locations, and you want to transport it to construction sites. Because transporting the soil takes a lot of resources, one wants to determine where it is optimal to send which part of the extracted soil. When faced with the problem, one wonders what the cost of transporting the soil should be. Monge considered the cost to be the distance times the mass of the soil transported.

Many years later Leonid Kantorovich tackled similar problems, this time arising in various areas of economics. In 1975 Kantorovich was awarded the Nobel Prize in economics, together with Tjalling Koopmans, for their contributions to the theory of optimum allocation of resources. Kantorovich also introduced a distance between measures coming from the optimal transport problem. This distance, which we shall study in Section 1, has many names.
It is called the Kantorovich-Rubinstein distance, the Wasserstein distance, the $L^p$ transportation distance, and so on. The problem of finding the optimal way to transport the mass (soil, goods, ...) is nowadays called the Monge-Kantorovich problem.

In the first part of the course we will formulate the Monge-Kantorovich problem in $\mathbb{R}^n$, introduce a useful dual formulation of the problem, and study the existence and uniqueness of solutions to the Monge-Kantorovich problem. After this we will study optimal mass transportation in the more general setting of metric spaces, define there the Kantorovich-Rubinstein distance, and, as time permits, study a bit of gradient flows and Ricci curvature in metric spaces.

1. Optimal mass transportation in $\mathbb{R}^n$

In optimal mass transportation the mass is usually understood as a Borel probability measure. Assuming the measures to be probability measures is just a normalization. However, the measures should have the same total mass in order to make our formulation of the problem reasonable. Let us recall some definitions in measure theory.

Definition 1.1. The Borel σ-algebra $\mathcal{B}(\mathbb{R}^n)$ is the σ-algebra generated by the open sets of $\mathbb{R}^n$. In other words, it is the smallest collection $\Sigma \subset 2^{\mathbb{R}^n}$ containing all open sets and having the properties
(1) $\Sigma \neq \emptyset$,
(2) $A \in \Sigma \Rightarrow \mathbb{R}^n \setminus A \in \Sigma$, and
(3) $(A_i)_{i\in\mathbb{N}} \subset \Sigma \Rightarrow \bigcup_{i\in\mathbb{N}} A_i \in \Sigma$.

Definition 1.2. A Borel probability measure $\mu$ is a function $\mu\colon \mathcal{B}(\mathbb{R}^n) \to [0,1]$ with the properties
(1) $\mu(\emptyset) = 0$,
(2) $\mu(\mathbb{R}^n) = 1$, and
(3) $\mu\big(\bigcup_{i\in\mathbb{N}} A_i\big) = \sum_{i\in\mathbb{N}} \mu(A_i)$ for all pairwise disjoint collections $\{A_i\} \subset \mathcal{B}(\mathbb{R}^n)$.

We denote the space of all Borel probability measures on $\mathbb{R}^n$ by $\mathcal{P}(\mathbb{R}^n)$. For any closed $C \subset \mathbb{R}^n$ we denote $\mathcal{P}(C) := \{\mu \in \mathcal{P}(\mathbb{R}^n) : \mu(C) = 1\}$.

Definition 1.3. A mapping $f\colon \mathbb{R}^n \to \mathbb{R}^m$ is Borel measurable if $f^{-1}(A) \in \mathcal{B}(\mathbb{R}^n)$ for all $A \in \mathcal{B}(\mathbb{R}^m)$.

Suppose $T\colon \mathbb{R}^n \to \mathbb{R}^m$ is Borel measurable and $\mu \in \mathcal{P}(\mathbb{R}^n)$.
Then the pushforward of $\mu$ through $T$ is the measure $T_\sharp\mu \in \mathcal{P}(\mathbb{R}^m)$ defined by
$$T_\sharp\mu(A) = \mu(T^{-1}(A)) \quad \text{for all } A \in \mathcal{B}(\mathbb{R}^m).$$
Notice that by the Borel measurability of $T$, the pushforward measure is also a Borel measure.

Let us now list some basic examples of Borel probability measures that will help us understand different aspects of optimal transportation.

Examples 1.4.
(i) Measures $\mu \in \mathcal{P}(\mathbb{R}^n)$ that are absolutely continuous with respect to the Lebesgue measure $\mathcal{L}^n$. In other words, $\mu = f\mathcal{L}^n$ with $f\colon \mathbb{R}^n \to [0,\infty)$ Borel measurable and $\|f\|_1 = \int_{\mathbb{R}^n} |f(x)|\,d\mathcal{L}^n(x) = 1$. The notation $\mu = f\mathcal{L}^n$ means
$$\mu(A) = \int_A f(x)\,d\mathcal{L}^n(x) \quad \text{for all } A \in \mathcal{B}(\mathbb{R}^n).$$
(ii) Dirac measures $\delta_x \in \mathcal{P}(\mathbb{R}^n)$ for $x \in \mathbb{R}^n$, defined by
$$\delta_x(A) = \begin{cases} 1, & \text{if } x \in A,\\ 0, & \text{if } x \notin A,\end{cases}$$
and their combinations $\mu = \sum_{i\in\mathbb{N}} a_i\delta_{x_i}$ with weights $a_i \in [0,1]$ satisfying $\sum_{i\in\mathbb{N}} a_i = 1$.
(iii) Hausdorff measures $\mathcal{H}^s$, defined as
$$\mathcal{H}^s(A) = \lim_{\delta \searrow 0} \inf\Big\{\sum_{i\in\mathbb{N}} \operatorname{diam}(A_i)^s : A \subset \bigcup_{i\in\mathbb{N}} A_i \text{ and } \operatorname{diam}(A_i) < \delta \text{ for all } i \in \mathbb{N}\Big\},$$
weighted with a Borel measurable function $f$ as in (i) such that $f\mathcal{H}^s \in \mathcal{P}(\mathbb{R}^n)$. (In particular $f \neq 0$ on a set of positive and finite $\mathcal{H}^s$-measure.)

We will use the weak topology on the space $\mathcal{P}(\mathbb{R}^n)$. Later on we will see that the $L^p$ transportation distances metrize the weak topology. Let us recall the definition of weak convergence.

Definition 1.5. Let $\mu_k, \mu \in \mathcal{P}(\mathbb{R}^n)$. We say that $\mu_k$ converges weakly to $\mu$ if
$$\int_{\mathbb{R}^n} \varphi\,d\mu_k \to \int_{\mathbb{R}^n} \varphi\,d\mu \quad \text{for all } \varphi \in C_b(\mathbb{R}^n) = \{\varphi\colon \mathbb{R}^n \to \mathbb{R} \text{ continuous and bounded}\}.$$

Recall that the weak convergence of $\mu_k$ to $\mu$ is equivalent to requiring that
$$\limsup_{k\to\infty} \mu_k(C) \le \mu(C) \quad \text{for all closed sets } C \subset \mathbb{R}^n,$$
as well as to requiring that
$$\liminf_{k\to\infty} \mu_k(U) \ge \mu(U) \quad \text{for all open sets } U \subset \mathbb{R}^n.$$

1.1. Monge and Kantorovich formulations of the problem.

Let $c\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be Borel measurable. We will call this function the cost function. In Monge's original work the cost function was $c(x,y) = \|x - y\|$.
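For finitely supported measures (Example 1.4 (ii) with finitely many atoms) the pushforward $T_\sharp\mu$ and the integral $\int c(x,T(x))\,d\mu(x)$ reduce to finite sums. The following Python sketch illustrates this under that assumption only; the dictionary representation of a measure and the names `pushforward`, `map_cost`, `shift` and `collapse` are hypothetical choices for illustration, not notation from the notes. It also shows that the pushforward of a Dirac mass is again a single atom.

```python
from collections import defaultdict
from math import dist

# Hypothetical representation: a finitely supported probability measure is a
# dict mapping points (tuples of floats) to their weights.


def pushforward(mu, T):
    """Return T#mu, i.e. the measure A -> mu(T^{-1}(A)), for a finitely supported mu."""
    nu = defaultdict(float)
    for x, w in mu.items():
        nu[T(x)] += w  # the mass at x is moved to T(x); coinciding images add up
    return dict(nu)


def map_cost(mu, T, c):
    """Return the integral of c(x, T(x)) with respect to mu (a finite sum here)."""
    return sum(w * c(x, T(x)) for x, w in mu.items())


def euclidean(x, y):
    """Monge's cost c(x, y) = ||x - y||."""
    return dist(x, y)


def shift(p):
    """Horizontal translation by one unit."""
    return (p[0] + 1.0, p[1])


def collapse(p):
    """Send every point to the origin."""
    return (0.0, 0.0)


mu = {(0.0, 0.0): 0.5, (1.0, 1.0): 0.5}
print(pushforward(mu, shift))          # {(1.0, 0.0): 0.5, (2.0, 1.0): 0.5}
print(map_cost(mu, shift, euclidean))  # 1.0
print(pushforward(mu, collapse))       # {(0.0, 0.0): 1.0}

# The pushforward of a Dirac mass is always a single atom.
delta = {(0.0, 0.0): 1.0}
print(len(pushforward(delta, shift)))  # 1
```

Using a dictionary keyed by points makes the merging of atoms in `collapse` automatic, which is exactly the effect of $T^{-1}$ in the definition of $T_\sharp\mu$.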
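Now we are ready to state Monge's formulation of the optimal transport problem.

Monge's formulation of the optimal transport problem
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$. Minimize
$$T \mapsto \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x)$$
over all transport maps $T$ from $\mu$ to $\nu$, i.e. over all maps $T$ such that $T_\sharp\mu = \nu$.

It is easy to see that in this generality Monge's formulation can be ill-posed. The simplest way this can happen is if $\mu = \delta_x$ and $\nu = \frac12(\delta_y + \delta_z)$ with $y \neq z$. Then for every map $T$ we have $T_\sharp\mu = \delta_{T(x)} \neq \nu$. In other words, there might be no transport maps to minimize over.

The next example, which is a slight modification of the previous example on Dirac measures, shows that even when transport maps exist, a minimizer might not exist. It also shows that the condition $T_\sharp\mu = \nu$ is not weakly sequentially closed (in any applicable weak topology).

Example 1.6. Define two measures on the plane as
$$\mu = \mathcal{H}^1|_{\{0\}\times[0,1]} \quad\text{and}\quad \nu = \frac12\big(\mathcal{H}^1|_{\{-1\}\times[0,1]} + \mathcal{H}^1|_{\{1\}\times[0,1]}\big)$$
and suppose that the cost function is $c(x,y) = \|x - y\|$. Now
$$T_1(0,x) = \begin{cases} (1, 2x), & \text{if } x \le \frac12,\\ (-1, 2x - 1), & \text{if } x > \frac12\end{cases}$$
transports $\mu$ to $\nu$, so there at least exist maps transporting $\mu$ to $\nu$. Moreover, for the transport maps defined for $\frac{k}{2^n} < x \le \frac{k+1}{2^n}$, $k = 0, 1, \ldots, 2^n - 1$, by
$$T_n(0,x) = \begin{cases} \big(1, 2x - \frac{k}{2^n}\big), & \text{if } k \text{ is even},\\ \big(-1, 2x - \frac{k+1}{2^n}\big), & \text{if } k \text{ is odd},\end{cases}$$
we have
$$\int_{\mathbb{R}^2} \|x - T_n(x)\|\,d\mu(x) \to 1.$$
[Figure: the transport maps $T_1$, $T_2$, ..., $T_4$.]
However, no transport map realizes the transport cost 1. Such a map $T$ would have to transport $\mu$-a.e. point horizontally; then the set of points sent to the right would have to contain exactly half of the length of every subinterval of $[0,1]$, which is impossible by the Lebesgue density theorem.

The ill-posedness of the optimal transport problem was removed by Kantorovich by considering more general transports.

Kantorovich's formulation of the optimal transport problem
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$. Minimize
$$\sigma \mapsto \int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y)$$
over all transport plans $\sigma$ from $\mu$ to $\nu$, i.e. over all measures $\sigma \in \mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$ for which
$$\sigma(A \times \mathbb{R}^n) = \mu(A) \quad\text{and}\quad \sigma(\mathbb{R}^n \times A) = \nu(A) \quad \text{for all } A \in \mathcal{B}(\mathbb{R}^n).$$
We denote the set of transport plans from $\mu$ to $\nu$ by $A(\mu,\nu)$.

Any transport map $T$ from $\mu$ to $\nu$ naturally induces a transport plan $\sigma = (\mathrm{id}\times T)_\sharp\mu$.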
Therefore the set of transport plans always includes (the plans induced by) all transport maps. Moreover, the product measure $\mu\times\nu$ belongs to $A(\mu,\nu)$, so $A(\mu,\nu) \neq \emptyset$.

Example 1.7. Let us revisit Example 1.6. The minimizing sequence $(T_n)$ induces a sequence of transport plans $(\sigma_n)$, $\sigma_n = (\mathrm{id}\times T_n)_\sharp\mu$. Define
$$\sigma = \frac12\big((\mathrm{id}\times R)_\sharp\mu + (\mathrm{id}\times L)_\sharp\mu\big)$$
with $R(x,y) = (x+1, y)$ and $L(x,y) = (x-1, y)$. Now $\sigma$ is a minimizer. Notice also that not only
$$\int_{\mathbb{R}^2\times\mathbb{R}^2} \|x-y\|\,d\sigma_n(x,y) \to \int_{\mathbb{R}^2\times\mathbb{R}^2} \|x-y\|\,d\sigma(x,y),$$
but also
$$\int_{\mathbb{R}^2\times\mathbb{R}^2} \varphi(x,y)\,d\sigma_n(x,y) \to \int_{\mathbb{R}^2\times\mathbb{R}^2} \varphi(x,y)\,d\sigma(x,y) \quad\text{for all } \varphi \in C_b(\mathbb{R}^2\times\mathbb{R}^2).$$
In other words, $\sigma_n$ converges weakly to $\sigma$.

One might wonder how generally Monge's and Kantorovich's formulations are the same, in the sense that the infima of the two problems agree. It can be shown that they agree, for example, when the starting measure $\mu$ has no atoms, i.e. $\mu(\{x\}) = 0$ for all $x \in \mathbb{R}^n$, and the cost function $c$ is continuous.

1.2. Existence of optimal transport plans.

The proof of the existence of optimal transport plans, i.e. transport plans minimizing Kantorovich's optimal transport problem, follows the basic scheme of variational problems. The ingredients are
(1) the lower semicontinuity of $\sigma \mapsto \int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y)$, and
(2) the compactness of $A(\mu,\nu)$.

Let us start with (1). It is clear that in general the lower semicontinuity cannot hold. In order to obtain lower semicontinuity of the transport cost we assume lower semicontinuity of the cost function $c$. Let us first recall what is meant by lower semicontinuity.

Definition 1.8. Let $X$ be a topological space and $f\colon X \to \mathbb{R}\cup\{-\infty,\infty\}$. The function $f$ is lower semicontinuous at $x_0 \in X$ if for every $\epsilon > 0$ there exists a neighbourhood $U$ of $x_0$ such that $f(x) \ge f(x_0) - \epsilon$ for every $x \in U$. The function $f$ is called lower semicontinuous if it is lower semicontinuous at every point $x_0 \in X$.

Lower semicontinuity means that the function cannot jump up at the limit when we converge towards a point.
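Notice that lower semicontinuous functions need not be continuous.
[Figure: graph of a lower semicontinuous function.]

Lemma 1.9. Let $(X,d)$ be a metric space and $f\colon X \to \mathbb{R}\cup\{\infty\}$ a lower semicontinuous function that is bounded from below by some constant $C$. Then $f$ can be written as the pointwise limit of a nondecreasing sequence $(f_n)_{n\in\mathbb{N}}$ of bounded continuous functions $f_n\colon X \to \mathbb{R}$.

Proof. We may assume that $f$ is not identically $+\infty$. Define
$$f_n(x) = \inf\{f(y) + n\,d(x,y) : y \in X\}.$$
Then we immediately have
$$|f_n(x) - f_n(y)| \le n\,d(x,y) \quad\text{and}\quad f_n(x) \le f_m(x) \le f(x)$$
for all $n \le m$ and $x, y \in X$. To see the pointwise convergence, fix $x_0 \in X$ with $f(x_0) < \infty$ and $\epsilon > 0$. By the lower semicontinuity of $f$ there exists $\delta > 0$ such that $f(x) \ge f(x_0) - \epsilon$ for all $x \in B(x_0,\delta)$. Let $n \in \mathbb{N}$ be such that $n\delta \ge f(x_0) - C$. Then for all $y \notin B(x_0,\delta)$ we have
$$f(y) + n\,d(x_0,y) \ge C + n\,d(x_0,y) \ge C + n\delta \ge f(x_0),$$
and for $y \in B(x_0,\delta)$
$$f(y) + n\,d(x_0,y) \ge f(y) \ge f(x_0) - \epsilon.$$
Hence $f(x_0) - \epsilon \le f_n(x_0) \le f(x_0)$ for all large $n$, and since $\epsilon > 0$ was arbitrary, $f_n(x_0) \to f(x_0)$. If $f(x_0) = +\infty$, the same argument with $f(x_0)$ replaced by an arbitrary $M \in \mathbb{R}$ shows that $f_n(x_0) \to +\infty$. Finally, in order to make the functions bounded, we can replace $f_n$ by $\min(f_n, n)$, which is still continuous, nondecreasing in $n$ and converges pointwise to $f$.

With Lemma 1.9 we can prove lower semicontinuity of the transport cost under very mild assumptions on the cost function $c$.

Lemma 1.10. Let $c\colon \mathbb{R}^n\times\mathbb{R}^n \to [0,+\infty]$ be a lower semicontinuous cost function. Suppose that a sequence $(\sigma_k)_k \subset \mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$ converges weakly to some $\sigma \in \mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$. Then
$$\int_{\mathbb{R}^n\times\mathbb{R}^n} c\,d\sigma \le \liminf_{k\to\infty} \int_{\mathbb{R}^n\times\mathbb{R}^n} c\,d\sigma_k.$$

Proof. By Lemma 1.9 the function $c$ can be written as the pointwise limit of a nondecreasing sequence $(c_m)_{m\in\mathbb{N}}$ of bounded continuous functions $c_m\colon \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$. By monotone convergence and the weak convergence of $(\sigma_k)$,
$$\int c\,d\sigma = \lim_{m\to\infty}\int c_m\,d\sigma = \lim_{m\to\infty}\lim_{k\to\infty}\int c_m\,d\sigma_k \le \liminf_{k\to\infty}\int c\,d\sigma_k,$$
where in the last inequality we just use the trivial estimate
$$\int c_m\,d\sigma_k \le \int c\,d\sigma_k.$$

Now that we have established the lower semicontinuity under suitable conditions, let us turn to compactness.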
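Before turning to compactness, the construction in the proof of Lemma 1.9 can be tried out numerically. The Python sketch below takes $X = \mathbb{R}$ with $d(x,y) = |x-y|$ and the lower semicontinuous step function $f = 0$ on $(-\infty,0]$, $f = 1$ on $(0,\infty)$. As a simplifying assumption that is not part of the lemma, the infimum over $X$ is replaced by a minimum over a finite grid; the names `GRID`, `f_n` and `f_n_bounded` are hypothetical. The computed values at $x_0 = 0.1$ increase with $n$ and stay below $f(x_0)$, as the proof predicts.

```python
# Numerical sketch of f_n(x) = inf_y ( f(y) + n |x - y| ) from the proof of Lemma 1.9.
# Assumption: X = R is replaced by a finite grid, so the infimum becomes a minimum.

GRID = [i / 1000 for i in range(-2000, 2001)]  # points of [-2, 2] with spacing 0.001


def f(x):
    """Lower semicontinuous step function: the jump point 0 takes the lower value."""
    return 0.0 if x <= 0 else 1.0


def f_n(x, n, grid=GRID):
    """n-Lipschitz approximation of f from below (grid version of the infimum)."""
    return min(f(y) + n * abs(x - y) for y in grid)


def f_n_bounded(x, n):
    """The bounded version min(f_n, n) used at the end of the proof."""
    return min(f_n(x, n), n)


x0 = 0.1
values = [f_n(x0, n) for n in (1, 2, 5, 10, 20)]
print(values)  # [0.1, 0.2, 0.5, 1.0, 1.0]

# Nondecreasing in n and bounded above by f(x0), as in the proof.
assert all(a <= b for a, b in zip(values, values[1:]))
assert all(v <= f(x0) for v in values)
```

Here $f_n(0.1) = \min(0.1\,n, 1)$: for small $n$ it is cheaper to take $y = 0$ (where $f = 0$) and pay the penalty $n|x_0 - y|$, while for large $n$ the infimum is attained at $y = x_0$ itself.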
Let us start with a general theorem giving us precompactness.¹ In the special case of Prokhorov's theorem stated below, where the underlying space is $\mathbb{R}^n$, precompactness of a collection of measures turns out to be the same as the intuitive condition that no mass is leaking to infinity.

Theorem 1.11 (Prokhorov's theorem). A set $P \subset \mathcal{P}(\mathbb{R}^n)$ is precompact in the weak topology if and only if it is tight, i.e. for any $\epsilon > 0$ there is a compact set $K_\epsilon \subset \mathbb{R}^n$ such that $\mu(\mathbb{R}^n \setminus K_\epsilon) \le \epsilon$ for all $\mu \in P$.

Next we will prove Prokhorov's theorem. The proof relies on the Riesz representation theorem, so let us recall a version of it which will be sufficient.

Theorem 1.12 ((a version of) Riesz representation theorem). For every positive linear functional $\phi$ on $C_c(\mathbb{R}^n)$ there exists a Borel measure $\mu$ on $\mathbb{R}^n$ such that
$$\phi(f) = \int_{\mathbb{R}^n} f(x)\,d\mu(x) \quad\text{for all } f \in C_c(\mathbb{R}^n).$$
Here $C_c(\mathbb{R}^n)$ is the space of continuous compactly supported functions on $\mathbb{R}^n$.

The measure in the Riesz representation is obtained by first setting
$$\mu(U) := \sup\{\phi(f) : f \in C_c(\mathbb{R}^n),\ 0 \le f \le 1,\ \operatorname{spt}(f) \subset U\}$$
for all open $U$, and then defining
$$\mu(A) := \inf\{\mu(U) : A \subset U \text{ open}\} \quad\text{for all } A \in \mathcal{B}(\mathbb{R}^n).$$
One then needs to check that this gives a measure with the desired properties.

The following basic theorem of functional analysis, the Banach-Alaoglu theorem, also comes in handy, although we will also sketch the proof of Prokhorov's theorem without directly using Banach-Alaoglu.

Theorem 1.13 (Banach-Alaoglu theorem). The closed unit ball of the dual of a normed space is weak* compact.

Remark 1.14. A few words on the different topologies are probably needed. Notice that we defined weak convergence using $C_b(\mathbb{R}^n)$. This is not the same topology as the one defined by using $C_c(\mathbb{R}^n)$ (or $C_0(\mathbb{R}^n)$, consisting of functions vanishing at infinity). To see this, take $\mu_n = \mathcal{L}^1|_{[n,n+1]}$. Then $\mu_n$ does not converge weakly to any measure, but still
$$\int_{\mathbb{R}} \varphi\,d\mu_n \to 0 \quad\text{for all } \varphi \in C_0(\mathbb{R}).$$
The Banach-Alaoglu theorem naturally applies also to $C_b(\mathbb{R})$.
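However, $(C_b(\mathbb{R}))'$ should be identified with measures on the Stone-Čech compactification of $\mathbb{R}$. Convergence defined using $C_b(\mathbb{R}^n)$ is also called narrow convergence.

Lemma 1.15. Let $K \subset \mathbb{R}^n$ be compact. Then $\mathcal{P}(K)$ is compact.

¹ Notice that $\mathbb{R}^n$ could be replaced by any complete and separable metric space in the statement.

Proof. Since $K$ is compact, any continuous function on $K$ is bounded and has compact support. Thus
$$C_b(K) = C_c(K) = C(K) = \{f\colon K \to \mathbb{R} \text{ continuous}\}.$$
Recall that $C(K)$ is a Banach space when equipped with the supremum norm
$$\|f\|_\infty = \sup_{x\in K}|f(x)|.$$
By the Banach-Alaoglu theorem the unit ball $B' = \{\varphi \in C(K)' : \|\varphi\| \le 1\}$ of the dual space $C(K)'$ is compact in the weak* topology. Now consider the weak* closed, and hence weak* compact, subset of $B'$ defined as
$$\Sigma := \{\varphi \in B' : \varphi(1) = 1 \text{ and } \varphi(f) \ge 0 \text{ for all } f \in C(K) \text{ with } f \ge 0\}.$$
By the Riesz representation theorem, the map $T\colon \mathcal{P}(K) \to \Sigma$, $\mu \mapsto \varphi_\mu$, with
$$\varphi_\mu(f) := \int_K f\,d\mu, \quad f \in C(K),$$
is a bijection. Since the weak topology on $\mathcal{P}(K)$ is given in duality with $C_b(K) = C(K)$, the map $T$ is a homeomorphism. Thus $\mathcal{P}(K)$ is compact.

Let us also give the same proof written without explicit use of the Banach-Alaoglu theorem.

Second proof of Lemma 1.15. Let us show that $\mathcal{P}(K)$ is sequentially compact. For this purpose take a sequence of measures $(\mu_k) \subset \mathcal{P}(K)$. We will extract a converging subsequence using a diagonal argument. Let $(\mu_{1,k})_{k\in\mathbb{N}}$ be defined as $\mu_{1,k} := \mu_k$. Suppose that a sequence $(\mu_{i,k})_{k\in\mathbb{N}}$ has been defined for some $i \ge 1$. Take a finite collection of $N_i$ balls $(B(x_{i,m}, \frac1i))_{m=1}^{N_i}$ covering $K$, write
$$A_{i,m} := B\big(x_{i,m}, \tfrac1i\big) \setminus \bigcup_{l=1}^{m-1} B\big(x_{i,l}, \tfrac1i\big), \quad m = 1,\ldots,N_i,$$
so that the sets $A_{i,m}$ are disjoint and cover $K$, and select a subsequence $(\mu_{i+1,k})_{k\in\mathbb{N}}$ of $(\mu_{i,k})_{k\in\mathbb{N}}$ such that
$$|\mu_{i+1,k}(A_{i,m}) - \mu_{i+1,l}(A_{i,m})| \le \frac{1}{iN_i}$$
for all $m$ and all $k, l \in \mathbb{N}$. (This is possible since the finitely many sequences $(\mu_{i,k}(A_{i,m}))_k$ take values in $[0,1]$.) Finally define a converging subsequence $(\nu_k)_k$ by taking the diagonal $\nu_k := \mu_{k,k}$.

Let us now check the weak convergence of $(\nu_k)_k$. Take $\varphi \in C_b(K)$ and $\epsilon > 0$. Since $K$ is compact, $\varphi$ is uniformly continuous. Let $\delta > 0$ be such that $|\varphi(x) - \varphi(y)| < \epsilon$ if $\|x - y\| < \delta$. Let $k \ge 2/\delta$. Then for any $j, l > k$ the measures $\nu_j$ and $\nu_l$ belong to the subsequence $(\mu_{k+1,i})_i$, and every $A_{k,m}$ has diameter at most $2/k \le \delta$. Hence
$$\Big|\int_K \varphi\,d\nu_j - \int_K \varphi\,d\nu_l\Big| \le \sum_{m=1}^{N_k}\Big|\int_{A_{k,m}} \varphi\,d\nu_j - \int_{A_{k,m}} \varphi\,d\nu_l\Big|$$
$$\le \sum_{m=1}^{N_k}\Big(\int_{A_{k,m}} \big(\varphi - \inf_{A_{k,m}}\varphi\big)\,d\nu_j + \int_{A_{k,m}} \big(\varphi - \inf_{A_{k,m}}\varphi\big)\,d\nu_l + \big|\inf_{A_{k,m}}\varphi\big|\,|\nu_j(A_{k,m}) - \nu_l(A_{k,m})|\Big)$$
$$\le \epsilon + \epsilon + \frac{\|\varphi\|_\infty}{k}.$$
Thus $(\int_K \varphi\,d\nu_k)_k$ is a Cauchy sequence, and we can define a functional $\phi(\varphi) = \lim_{k\to\infty}\int_K \varphi\,d\nu_k$, which is positive, linear and satisfies $\phi(1) = 1$. By the Riesz representation theorem it corresponds to a measure $\nu \in \mathcal{P}(K)$, towards which $(\nu_k)$ converges weakly.

Proof of Prokhorov's theorem in $\mathbb{R}^n$. "⇒" Suppose that the claim is not true. Then there exist $\epsilon > 0$ and a sequence of measures $(\mu_k)_{k\in\mathbb{N}} \subset P$ such that $\mu_k(\mathbb{R}^n \setminus B(0,k)) \ge \epsilon$ for all $k \in \mathbb{N}$. By the precompactness of $P$ there exists a subsequence, still denoted by $\mu_k$, converging weakly to some $\mu \in \mathcal{P}(\mathbb{R}^n)$. Since the balls $B(0,k)$ are open and $\mu_j(B(0,k)) \le 1 - \epsilon$ for $j \ge k$, we now get
$$1 = \mu(\mathbb{R}^n) = \mu\Big(\bigcup_{k=1}^\infty B(0,k)\Big) = \lim_{k\to\infty}\mu(B(0,k)) \le \lim_{k\to\infty}\liminf_{j\to\infty}\mu_j(B(0,k)) \le 1 - \epsilon,$$
which is a contradiction.

"⇐" Take a sequence $(\mu_k) \subset P$. We want to show that the sequence has a converging subsequence. Consider the sequence $(\nu_k) \subset \mathcal{P}(\bar B(0,1))$ defined as $\nu_k = f_\sharp\mu_k$ with a homeomorphism $f\colon \mathbb{R}^n \to B(0,1)$. Since $\bar B(0,1)$ is compact, by Lemma 1.15 there exists a subsequence $(\nu_{k_j})$ of $(\nu_k)$ weakly converging to a measure $\nu \in \mathcal{P}(\bar B(0,1))$. By the tightness, for every $\epsilon > 0$
$$\nu(S(0,1)) \le \liminf_{j\to\infty}\nu_{k_j}\big(\bar B(0,1) \setminus f(K_\epsilon)\big) \le \epsilon,$$
where we used that $\bar B(0,1)\setminus f(K_\epsilon)$ is relatively open in $\bar B(0,1)$. Hence $\nu(S(0,1)) = 0$, and $\mu_{k_j}$ converges weakly to the measure $(f^{-1})_\sharp\nu \in \mathcal{P}(\mathbb{R}^n)$.

Now we can prove the desired compactness of $A(\mu,\nu)$.

Lemma 1.16. Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$. Then $A(\mu,\nu)$ is compact in the weak topology.

Proof. Let us start with the tightness of $A(\mu,\nu)$. Because $\{\mu\}$ and $\{\nu\}$ are both tight, for every $\epsilon > 0$ there exists a compact set $K_\epsilon \subset \mathbb{R}^n$ such that $\mu(K_\epsilon) \ge 1 - \epsilon$ and $\nu(K_\epsilon) \ge 1 - \epsilon$. Let $\sigma \in A(\mu,\nu)$. Then
$$\sigma(K_\epsilon\times K_\epsilon) \ge 1 - \sigma((\mathbb{R}^n\setminus K_\epsilon)\times\mathbb{R}^n) - \sigma(\mathbb{R}^n\times(\mathbb{R}^n\setminus K_\epsilon)) = 1 - \mu(\mathbb{R}^n\setminus K_\epsilon) - \nu(\mathbb{R}^n\setminus K_\epsilon) \ge 1 - 2\epsilon.$$
Since $K_\epsilon\times K_\epsilon$ is compact, we have proven the tightness of $A(\mu,\nu)$. By Prokhorov's theorem $A(\mu,\nu)$ is then weakly precompact in $\mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$. Let us next prove the compactness of $A(\mu,\nu)$.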
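Let $(\sigma_k)_{k\in\mathbb{N}} \subset A(\mu,\nu)$ be a sequence converging weakly to $\sigma \in \mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$. We have to prove that the marginals of $\sigma$ are $\mu$ and $\nu$. Since $U\times\mathbb{R}^n$ is open whenever $U$ is,
$$\sigma(U\times\mathbb{R}^n) \le \liminf_{k\to\infty}\sigma_k(U\times\mathbb{R}^n) = \liminf_{k\to\infty}\mu(U) = \mu(U)$$
for all open $U \subset \mathbb{R}^n$, and similarly $\sigma(\mathbb{R}^n\times U) \le \nu(U)$ for all open $U \subset \mathbb{R}^n$. By outer regularity these inequalities extend to all Borel sets, and applying them to complements gives the reverse inequalities. Hence $\sigma(A\times\mathbb{R}^n) = \mu(A)$ and $\sigma(\mathbb{R}^n\times A) = \nu(A)$ for all $A \in \mathcal{B}(\mathbb{R}^n)$.

Theorem 1.17 (Existence of optimal plans). Assume that $c\colon \mathbb{R}^n\times\mathbb{R}^n \to [0,+\infty]$ is lower semicontinuous and let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$. Then there exists a minimizer to Kantorovich's formulation of the optimal mass transportation problem.

Proof. Let $(\sigma_k)_{k\in\mathbb{N}} \subset A(\mu,\nu)$ be such that
$$\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma_k(x,y) \le \inf_{\sigma\in A(\mu,\nu)}\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y) + \frac1k.$$
By Lemma 1.16 the set $A(\mu,\nu)$ is weakly compact, and hence there exists a subsequence $(\sigma_{k_j})_j$ of $(\sigma_k)_k$ converging weakly to some $\sigma_\infty \in A(\mu,\nu)$. Now by Lemma 1.10
$$\begin{aligned}\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma_\infty(x,y) &\le \liminf_{j\to\infty}\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma_{k_j}(x,y)\\ &\le \liminf_{j\to\infty}\Big(\inf_{\sigma\in A(\mu,\nu)}\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y) + \frac{1}{k_j}\Big)\\ &= \inf_{\sigma\in A(\mu,\nu)}\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y),\end{aligned}$$
and hence $\sigma_\infty$ is a minimizer for the problem.

Notice that it might well be the case that
$$\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma_\infty(x,y) = \infty,$$
namely when the transport cost is infinite for all $\sigma \in A(\mu,\nu)$.

We denote by $\mathrm{Opt}(\mu,\nu) \subset A(\mu,\nu)$ the set of $\sigma$ that minimize the optimal transportation problem. Notice that by the lower semicontinuity of the transportation cost (Lemma 1.10), the set $\mathrm{Opt}(\mu,\nu)$ is also weakly compact in the setting of Theorem 1.17.

Now that we have established the existence of minimizers, the next obvious question is whether the minimizer is unique. This is not always the case.

Example 1.18. Suppose $c(x,y) = h(\|x - y\|)$ for some function $h\colon \mathbb{R} \to \mathbb{R}\cup\{-\infty,\infty\}$ and all $x, y \in \mathbb{R}^2$. Let $\mu = \frac12(\delta_{(0,0)} + \delta_{(1,1)})$ and $\nu = \frac12(\delta_{(1,0)} + \delta_{(0,1)})$. Then
$$\mathrm{Opt}(\mu,\nu) = A(\mu,\nu) = \Big\{\frac t2\big(\delta_{((0,0),(1,0))} + \delta_{((1,1),(0,1))}\big) + \frac{1-t}{2}\big(\delta_{((0,0),(0,1))} + \delta_{((1,1),(1,0))}\big) : t \in [0,1]\Big\},$$
since the mass is always transported a distance one and, because of the form of the cost function, the transportation cost is then always $h(1)$.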
[Figure: the measures $\mu$ and $\nu$ of Example 1.18.]
Moreover, the transport plans $\frac12(\delta_{((0,0),(1,0))} + \delta_{((1,1),(0,1))})$ and $\frac12(\delta_{((0,0),(0,1))} + \delta_{((1,1),(1,0))})$ are induced by optimal transport maps, while all the other (optimal) transport plans are not.

The phenomena in Example 1.18 are quite general. Let us record some of them in the following remark.

Remark 1.19.
(i) Let $\sigma_1, \sigma_2 \in A(\mu,\nu)$. Then for any $t \in [0,1]$ also $t\sigma_1 + (1-t)\sigma_2 \in A(\mu,\nu)$. Similarly, if $\sigma_1, \sigma_2 \in \mathrm{Opt}(\mu,\nu)$, then for any $t \in [0,1]$ also $t\sigma_1 + (1-t)\sigma_2 \in \mathrm{Opt}(\mu,\nu)$. This is because
$$\int c\,d(t\sigma_1 + (1-t)\sigma_2) = t\int c\,d\sigma_1 + (1-t)\int c\,d\sigma_2.$$
(ii) Suppose that $\sigma_1, \sigma_2 \in A(\mu,\nu)$ are both induced by maps and that $\sigma_1 \neq \sigma_2$. Then for any $t \in (0,1)$ the measure $t\sigma_1 + (1-t)\sigma_2$ is not induced by a map. In order to see this, let $T_i$ be the maps satisfying $\sigma_i = (\mathrm{id}\times T_i)_\sharp\mu$, and suppose that $t\sigma_1 + (1-t)\sigma_2 = (\mathrm{id}\times T)_\sharp\mu$ for some Borel map $T$. The graph $\Gamma_T := \{(x,T(x)) : x \in \mathbb{R}^n\}$ is a Borel set with $(\mathrm{id}\times T)_\sharp\mu(\Gamma_T) = 1$, so
$$t\sigma_1(\Gamma_T) + (1-t)\sigma_2(\Gamma_T) = 1,$$
and since $t \in (0,1)$, we get $\sigma_1(\Gamma_T) = \sigma_2(\Gamma_T) = 1$. As $\sigma_i(\Gamma_T) = \mu(\{x : T_i(x) = T(x)\})$, we obtain $T_1(x) = T(x) = T_2(x)$ at $\mu$-a.e. $x \in \mathbb{R}^n$, and hence $\sigma_1 = \sigma_2$, a contradiction.
(iii) In many cases (ii) is the way one proves uniqueness of optimal transport plans: first one shows that any optimal transport plan is induced by a map. If there were two different optimal transport plans, their convex combination would be an optimal plan (by (i)) that is not induced by a map, which is a contradiction. Hence the uniqueness.

1.3. Cyclical monotonicity and subdifferentials.

Let us now study the structure of optimal transport plans in more detail. This will later lead to results showing the existence of optimal transport maps and the uniqueness of optimal transport plans with the approach mentioned in Remark 1.19 (iii). The idea is to characterize optimal transport plans $\sigma \in \mathrm{Opt}(\mu,\nu)$ as
• the $\sigma \in A(\mu,\nu)$ for which $\operatorname{spt}(\sigma)$ is c-cyclically monotone. c-cyclical monotonicity of a set means that one cannot decrease the cost of transport on any finite subset of the set by permuting the transport.
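(A more rigorous definition is given later.)
• and as the $\sigma \in A(\mu,\nu)$ for which there exists a convex and lower semicontinuous function $\varphi$ such that $\sigma$ is concentrated on the graph of the c-subdifferential of $\varphi$.

Recall the definition of the support of a measure:
$$\operatorname{spt}(\mu) := \{x \in \mathbb{R}^n : \mu(B(x,\epsilon)) > 0 \text{ for all } \epsilon > 0\}.$$
Equivalently, the support is the smallest closed set of full measure. Recall also the notion of the classical subdifferential of a function $\varphi\colon \mathbb{R} \to \mathbb{R}$ used in convex analysis,
$$\partial^-\varphi(x) := \{p \in \mathbb{R} : \varphi(z) \ge \varphi(x) + p(z - x) \text{ for all } z \in \mathbb{R}\}.$$
[Figure: a convex function $\varphi$ and its subdifferential $\partial^-\varphi$.]

Definition 1.20 (c-cyclical monotonicity). A set $\Gamma \subset \mathbb{R}^n\times\mathbb{R}^n$ is called c-cyclically monotone if for every $(x_i,y_i)_{i=1}^N \subset \Gamma$, $N \in \mathbb{N}$, we have
$$\sum_{i=1}^N c(x_i,y_i) \le \sum_{i=1}^N c(x_i,y_{p(i)}) \quad\text{for all permutations } p \text{ of } \{1,2,\ldots,N\}.$$

Definition 1.21 (c-transforms). For a function $\varphi\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ the $c_+$-transforms $\varphi^{c+}_l, \varphi^{c+}_r\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ are defined as
$$\varphi^{c+}_l(x) := \inf_{y\in\mathbb{R}^n}\big(c(x,y) - \varphi(y)\big)$$
and²
$$\varphi^{c+}_r(y) := \inf_{x\in\mathbb{R}^n}\big(c(x,y) - \varphi(x)\big).$$
The $c_-$-transforms of $\varphi$ are $\varphi^{c-}_l, \varphi^{c-}_r\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$, defined as
$$\varphi^{c-}_l(x) := \sup_{y\in\mathbb{R}^n}\big(-c(x,y) - \varphi(y)\big),$$
and $\varphi^{c-}_r$ similarly. If there is no risk of confusion, we drop the subscripts $r$ and $l$. Below are illustrations of the $c_+$- and $c_-$-transforms of a function on $\mathbb{R}$ for the cost function $c(x,y) = |x - y|$.
[Figure: a function $\varphi$ on $\mathbb{R}$ together with $-\varphi^{c+}$ and $-\varphi^{c-}$ for $c(x,y) = |x - y|$.]

Definition 1.22 (c-concavity and c-convexity). A function $\varphi\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ is c-concave if there exists $\psi\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ such that $\varphi = \psi^{c+}$, and $\varphi$ is called c-convex if there exists $\psi$ such that $\varphi = \psi^{c-}$.

Definition 1.23 (c-superdifferential and c-subdifferential). Let $\varphi\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ be a c-concave function. The c-superdifferential $\partial^{c+}\varphi \subset \mathbb{R}^n\times\mathbb{R}^n$ is defined as
$$\partial^{c+}\varphi := \{(x,y) \in \mathbb{R}^n\times\mathbb{R}^n : \varphi(x) + \varphi^{c+}(y) = c(x,y)\}.$$
Similarly, for a c-convex $\varphi$ the c-subdifferential $\partial^{c-}\varphi \subset \mathbb{R}^n\times\mathbb{R}^n$ is defined as
$$\partial^{c-}\varphi := \{(x,y) \in \mathbb{R}^n\times\mathbb{R}^n : \varphi(x) + \varphi^{c-}(y) = -c(x,y)\}.$$
We will also write $\partial^{c+}\varphi(x) = \{y \in \mathbb{R}^n : (x,y) \in \partial^{c+}\varphi\}$ and similarly for $\partial^{c-}\varphi$.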
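Definitions 1.20, 1.21 and 1.23 can be tested by brute force when everything is finite. The Python sketch below is an illustration under that assumption: the infimum defining $\varphi^{c+}_r$ is taken only over a finite list of points, the cost is $c(x,y) = (x-y)^2$ on $\mathbb{R}$, and the helper names are hypothetical. For a finite $\Gamma$ it suffices to test permutations of all of $\Gamma$, since a permutation of a subfamily extends by fixed points, which contribute equal terms to both sides (families with repeated pairs reduce to this case by splitting cycles). The last check reflects a general fact: any subset of a c-superdifferential is c-cyclically monotone, because $\varphi(x) + \varphi^{c+}(y) \le c(x,y)$ everywhere, with equality on $\partial^{c+}\varphi$.

```python
from itertools import permutations


def is_c_cyclically_monotone(gamma, c):
    """Brute-force check of Definition 1.20 for a finite list gamma of pairs (x, y).

    Runs over all len(gamma)! permutations, so only tiny sets are feasible.
    """
    xs = [x for x, _ in gamma]
    ys = [y for _, y in gamma]
    base = sum(c(x, y) for x, y in gamma)
    return all(
        base <= sum(c(xs[i], ys[p[i]]) for i in range(len(gamma)))
        for p in permutations(range(len(gamma)))
    )


def c_plus_transform_r(phi, xs, c):
    """phi^{c+}_r(y) = inf_x (c(x, y) - phi(x)), with x restricted to the finite list xs."""
    def transform(y):
        return min(c(x, y) - phi(x) for x in xs)
    return transform


def c_superdifferential(phi, xs, ys, c):
    """Pairs (x, y) in xs x ys with phi(x) + phi^{c+}(y) = c(x, y) (exact for integer data)."""
    phi_c = c_plus_transform_r(phi, xs, c)
    return [(x, y) for x in xs for y in ys if phi(x) + phi_c(y) == c(x, y)]


def quad(x, y):
    """Quadratic cost c(x, y) = (x - y)^2 on the real line."""
    return (x - y) ** 2


print(is_c_cyclically_monotone([(0, 0), (1, 1)], quad))  # True
print(is_c_cyclically_monotone([(0, 1), (1, 0)], quad))  # False

grid = [0, 1, 2, 3]
gamma = c_superdifferential(lambda x: x, grid, grid, quad)
print(gamma)  # [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
print(is_c_cyclically_monotone(gamma, quad))  # True
```

Integer data are used on purpose so that the equality test in `c_superdifferential` is exact; with floating point data one would compare up to a tolerance instead.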
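The following example shows where the above terminology originates.

Example 1.24. Let $c(x,y) = -\langle x,y\rangle$. Then c-cyclical monotonicity reads as
$$\sum_{i=1}^N \langle x_i,y_i\rangle \ge \sum_{i=1}^N \langle x_i,y_{p(i)}\rangle \quad\text{for all permutations } p \text{ of } \{1,2,\ldots,N\},$$
which is usually called cyclical monotonicity. (Notice that this monotonicity is also equivalent to c-cyclical monotonicity for $c(x,y) = \|x - y\|^2$, since $\|x - y\|^2 = \|x\|^2 - 2\langle x,y\rangle + \|y\|^2$.)

² Notice that since $c(x,y)$ is not always symmetric, $\varphi^{c+}_r(x)$ and $\varphi^{c+}_l(x)$ are not always the same.

Let us next see what c-concavity means in this case. Take $\psi\colon \mathbb{R}^n \to \mathbb{R}\cup\{-\infty,+\infty\}$ and define $\varphi = \psi^{c+}$. For $x_1, x_2 \in \mathbb{R}^n$ and $t \in [0,1]$ we notice that
$$\begin{aligned}\varphi(tx_1 + (1-t)x_2) &= \inf_{y\in\mathbb{R}^n}\big(-\langle tx_1 + (1-t)x_2, y\rangle - \psi(y)\big)\\ &= \inf_{y\in\mathbb{R}^n}\big(t(-\langle x_1,y\rangle - \psi(y)) + (1-t)(-\langle x_2,y\rangle - \psi(y))\big)\\ &\ge t\inf_{y\in\mathbb{R}^n}\big(-\langle x_1,y\rangle - \psi(y)\big) + (1-t)\inf_{y\in\mathbb{R}^n}\big(-\langle x_2,y\rangle - \psi(y)\big)\\ &= t\varphi(x_1) + (1-t)\varphi(x_2).\end{aligned}$$
In other words, $\varphi$ is concave. Moreover, for any $x \in \mathbb{R}^n$ and $(x_i) \subset \mathbb{R}^n$ converging to $x$, assuming $\varphi(x) > -\infty$, we have
$$\varphi(x) \ge -\langle x,y_\epsilon\rangle - \psi(y_\epsilon) - \epsilon = -\langle x_i,y_\epsilon\rangle - \psi(y_\epsilon) - \langle x - x_i, y_\epsilon\rangle - \epsilon \ge \varphi(x_i) - \langle x - x_i, y_\epsilon\rangle - \epsilon,$$
where $y_\epsilon \in \mathbb{R}^n$ is chosen such that $-\langle x,y_\epsilon\rangle - \psi(y_\epsilon) \le \varphi(x) + \epsilon$. Letting $i \to \infty$ gives $\varphi(x) \ge \limsup_{i\to\infty}\varphi(x_i) - \epsilon$ for every $\epsilon > 0$, so $\varphi$ is upper semicontinuous. (Exercise: show that $\varphi$ is actually c-concave if and only if it is concave and upper semicontinuous.) The transform $\varphi^{c-}(x) = \sup_{y\in\mathbb{R}^n}(\langle x,y\rangle - \varphi(y))$ is called the Legendre transform. Finally, the c-superdifferential is
$$\begin{aligned}\partial^{c+}\varphi(x) &= \Big\{y \in \mathbb{R}^n : \varphi(x) + \inf_{z\in\mathbb{R}^n}\big(-\langle y,z\rangle - \varphi(z)\big) = -\langle x,y\rangle\Big\}\\ &= \{y \in \mathbb{R}^n : \varphi(x) - \varphi(z) \ge \langle y, z - x\rangle \text{ for all } z \in \mathbb{R}^n\}.\end{aligned}$$
In other words, up to a sign it is the usual superdifferential: $y \in \partial^{c+}\varphi(x)$ if and only if $-y$ belongs to the superdifferential of $\varphi$ at $x$.

Theorem 1.25. Suppose $c\colon \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is continuous and bounded from below. Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ be such that
$$c(x,y) \le a(x) + b(y)$$
for some $a \in L^1(\mu)$ and $b \in L^1(\nu)$. Also, let $\sigma \in A(\mu,\nu)$.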
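Then the following are equivalent:
(1) the plan $\sigma$ is optimal,
(2) the set $\operatorname{spt}(\sigma)$ is c-cyclically monotone,
(3) there exists a c-concave function $\varphi$ such that $\max\{\varphi,0\} \in L^1(\mu)$ and $\operatorname{spt}(\sigma) \subset \partial^{c+}\varphi$.

Proof. (1) ⇒ (2): Assume that $\operatorname{spt}(\sigma)$ is not c-cyclically monotone. Then there exist $N \in \mathbb{N}$, $(x_i,y_i)_{i=1}^N \subset \operatorname{spt}(\sigma)$ and a permutation $p$ of $\{1,\ldots,N\}$ such that
$$\sum_{i=1}^N c(x_i,y_i) > \sum_{i=1}^N c(x_i,y_{p(i)}).$$
Since $c$ is continuous, there exists $\epsilon > 0$ such that
$$\sum_{i=1}^N c(a_i,b_i) > \sum_{i=1}^N c(a_i,b_{p(i)}) \quad\text{for all } (a_i,b_i) \in B(x_i,\epsilon)\times B(y_i,\epsilon).$$
Now the idea is to modify $\sigma$ so that a positive part of the transport from $B(x_i,\epsilon)$ to $B(y_i,\epsilon)$ is changed to a transport from $B(x_i,\epsilon)$ to $B(y_{p(i)},\epsilon)$ for all $i$. One way to avoid deciding where in $B(y_{p(i)},\epsilon)$ to send $a_i \in B(x_i,\epsilon)$ is to send it everywhere. In other words, define $P \in \mathcal{P}(\mathbb{R}^{2nN})$ as the product of the measures
$$\frac{1}{m_i}\,\sigma|_{B(x_i,\epsilon)\times B(y_i,\epsilon)}, \quad\text{with } m_i = \sigma(B(x_i,\epsilon)\times B(y_i,\epsilon)) > 0.$$
Let $\pi^{i,j}$ be the orthogonal projection from $\mathbb{R}^{2nN}$ to the $(2(i-1)+j)$:th copy of $\mathbb{R}^n$ in the product. Now, defining
$$\tilde\sigma = \sigma + \frac{\min_i m_i}{N}\sum_{i=1}^N\Big((\pi^{i,1},\pi^{p(i),2})_\sharp P - (\pi^{i,1},\pi^{i,2})_\sharp P\Big),$$
we obtain $\tilde\sigma \in A(\mu,\nu)$ (it is nonnegative because $(\pi^{i,1},\pi^{i,2})_\sharp P = \frac{1}{m_i}\sigma|_{B(x_i,\epsilon)\times B(y_i,\epsilon)}$, and its marginals agree with those of $\sigma$ because $p$ is a permutation) with
$$\int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\tilde\sigma(x,y) < \int_{\mathbb{R}^n\times\mathbb{R}^n} c(x,y)\,d\sigma(x,y),$$
contradicting the optimality of $\sigma$.

(2) ⇒ (3): Fix some $(\bar x,\bar y) \in \operatorname{spt}(\sigma)$ and define
$$\varphi(x) := \inf\big(c(x,y_1) - c(x_1,y_1) + c(x_1,y_2) - c(x_2,y_2) + \cdots + c(x_N,\bar y) - c(\bar x,\bar y)\big),$$
where the infimum is over all $N \in \mathbb{N}$ and $(x_i,y_i)_{i=1}^N \subset \operatorname{spt}(\sigma)$. First of all, choosing $N = 1$ and $(x_1,y_1) = (\bar x,\bar y)$,
$$\varphi(x) \le c(x,\bar y) - c(\bar x,\bar y) \le a(x) + b(\bar y) - c(\bar x,\bar y),$$
and since $a \in L^1(\mu)$, also $\max\{\varphi,0\} \in L^1(\mu)$. Secondly, we have
$$\varphi(x) = \inf_{y_1\in\mathbb{R}^n}\big(c(x,y_1) - \psi(y_1)\big)$$
with
$$\psi(y_1) = \sup\big(c(x_1,y_1) - c(x_1,y_2) + c(x_2,y_2) - \cdots - c(x_N,\bar y) + c(\bar x,\bar y)\big),$$
where again the supremum is over all $N \in \mathbb{N}$ and $(x_i,y_i)_{i=1}^N \subset \operatorname{spt}(\sigma)$ with the given $y_1$. (If $(x_1,y_1) \notin \operatorname{spt}(\sigma)$ for all $x_1 \in \mathbb{R}^n$, we are taking the supremum over an empty set, and by definition we then have $\psi(y_1) = -\infty$.) Thus $\varphi$ is c-concave.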
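Finally, for any $(\hat x,\hat y) \in \operatorname{spt}(\sigma)$ we have by the definition of $\varphi$ (taking $(x_1,y_1) = (\hat x,\hat y)$ in the infimum) that for every $x \in \mathbb{R}^n$
$$\varphi(x) \le c(x,\hat y) - c(\hat x,\hat y) + \inf\big(c(\hat x,y_2) - c(x_2,y_2) + \cdots + c(x_N,\bar y) - c(\bar x,\bar y)\big) = c(x,\hat y) - c(\hat x,\hat y) + \varphi(\hat x),$$
which is the same as
$$\sup_{x\in\mathbb{R}^n}\big(\varphi(x) - c(x,\hat y)\big) = -\varphi^{c+}(\hat y) = \varphi(\hat x) - c(\hat x,\hat y),$$
and so $\operatorname{spt}(\sigma) \subset \partial^{c+}\varphi$.

(3) ⇒ (1): In order to show the optimality of $\sigma$, take a competitor $\tilde\sigma \in A(\mu,\nu)$. Since $\operatorname{spt}(\sigma) \subset \partial^{c+}\varphi$, we have
$$\varphi(x) + \varphi^{c+}(y) = c(x,y) \quad\text{for all } (x,y) \in \operatorname{spt}(\sigma),$$
while the definition of the $c_+$-transform gives
$$\varphi(x) + \varphi^{c+}(y) \le c(x,y) \quad\text{for all } (x,y) \in \mathbb{R}^n\times\mathbb{R}^n.$$
Thus
$$\begin{aligned}\int c(x,y)\,d\sigma(x,y) &= \int\big(\varphi(x) + \varphi^{c+}(y)\big)\,d\sigma(x,y) = \int\varphi(x)\,d\mu(x) + \int\varphi^{c+}(y)\,d\nu(y)\\ &= \int\big(\varphi(x) + \varphi^{c+}(y)\big)\,d\tilde\sigma(x,y) \le \int c(x,y)\,d\tilde\sigma(x,y).\end{aligned}$$

A consequence of Theorem 1.25 is that the optimality of a transport plan depends only on its support.

Let us now give a uniqueness result in the simplest case, where $c(x,y) = \|x - y\|^2$ and the starting measure $\mu$ gives zero measure to every Lipschitz graph
$$G = \Big\{\sum_{i=1}^{n-1} x_ie_i + f(x_1,\ldots,x_{n-1})e_n : x_1,\ldots,x_{n-1} \in \mathbb{R}\Big\},$$
where $(e_i)$ is an orthonormal basis of $\mathbb{R}^n$ and $f\colon \mathbb{R}^{n-1} \to \mathbb{R}$ is Lipschitz. (This holds, for example, if $\mu$ is absolutely continuous with respect to the Lebesgue measure.) Later we will sharpen the assumption on the measure $\mu$.

Theorem 1.26. Let $c(x,y) = \|x - y\|^2$ and let $\mu \in \mathcal{P}(\mathbb{R}^n)$ be such that $\mu(G) = 0$ for all Lipschitz graphs $G$. Suppose that $\nu \in \mathcal{P}(\mathbb{R}^n)$ is such that there exists a transport from $\mu$ to $\nu$ with finite cost. Then there exists a unique optimal transport plan from $\mu$ to $\nu$, and it is induced by a map.

Let us start with a simple lemma.

Lemma 1.27. Let $\sigma \in A(\mu,\nu)$. Then $\sigma$ is induced by a map if and only if there exists a $\sigma$-measurable set $\Gamma \subset \mathbb{R}^n\times\mathbb{R}^n$ on which $\sigma$ is concentrated such that for $\mu$-a.e. $x \in \mathbb{R}^n$ there exists only one $y \in \mathbb{R}^n$ with $(x,y) \in \Gamma$.

Proof. Suppose that $\sigma$ is induced by some map $T$. Then $\sigma$ is concentrated on the graph $(\mathrm{id}\times T)(\mathbb{R}^n)$, as required. To prove the other direction, suppose that $\sigma$ is concentrated on a set $\Gamma$ with the property that for $\mu$-a.e. $x \in \mathbb{R}^n$ there exists only one $y \in \mathbb{R}^n$ such that $(x,y) \in \Gamma$. By assumption, outside a $\sigma$-negligible set $N\times\mathbb{R}^n$ (with $\mu(N) = 0$) the set $\Gamma$ is a graph, i.e. for all $x \in \mathbb{R}^n \setminus N$ there exists only one $y =: T(x)$ such that $(x,y) \in \Gamma$. Moreover, by the inner regularity of $\sigma$ we find a sequence of compact sets $\Gamma_i \subset \Gamma \setminus (N\times\mathbb{R}^n)$ such that
$$\sigma\Big(\Gamma \setminus \bigcup_{i=1}^\infty \Gamma_i\Big) = 0.$$
As a continuous image of a σ-compact set, the projection $\pi\big(\bigcup_{i=1}^\infty \Gamma_i\big)$ to the first component $\mathbb{R}^n$ is σ-compact. Moreover, $T|_{\pi(\Gamma_i)}$ is continuous from $\pi(\Gamma_i)$ to $\mathbb{R}^n$ for every $i$, since its graph $\Gamma_i$ is compact. Therefore $T$ is Borel on a set of full $\mu$-measure. Since for all $\varphi \in C_c(\mathbb{R}^n\times\mathbb{R}^n)$ we have
$$\int\varphi(x,y)\,d\sigma(x,y) = \int\varphi(x,T(x))\,d\sigma(x,y) = \int\varphi(x,T(x))\,d\mu(x),$$
the equality $\sigma = (\mathrm{id}\times T)_\sharp\mu$ holds.

We will also need the following geometric lemma.

Lemma 1.28. Let $\mu \in \mathcal{P}(\mathbb{R}^n)$ be such that $\mu(G) = 0$ for any graph $G$ of a Lipschitz function. Then for every $\epsilon > 0$ and $v \in S^{n-1}$ we have $C(x,v,\epsilon) \cap \operatorname{spt}(\mu) \neq \emptyset$ at $\mu$-a.e. point $x \in \operatorname{spt}(\mu)$, where
$$C(x,v,\epsilon) := \{x + tv + \epsilon tw : t \in (0,\infty) \text{ and } w \in B^n\}$$
and $B^n$ is the open unit ball of $\mathbb{R}^n$.
[Figure: the cone $C(x,v,\epsilon)$ with vertex $x$ and axis $v$, containing the balls $B(x + tv, \epsilon t)$.]

Proof. Suppose that the claim is not true. Then there exist $\epsilon > 0$, $v \in S^{n-1}$ and a set $A \subset \operatorname{spt}(\mu)$ with $\mu(A) > 0$ such that $C(x,v,\epsilon) \cap \operatorname{spt}(\mu) = \emptyset$ for all $x \in A$. But now the orthogonal projection $\pi_{v^\perp}\colon \mathbb{R}^n \to v^\perp$ is a bi-Lipschitz map between $A$ and $\pi_{v^\perp}(A)$. Hence $A$ is a subset of a Lipschitz graph (a real-valued Lipschitz function on $\pi_{v^\perp}(A)$ extends to a Lipschitz function on $v^\perp$), and by assumption $\mu(A) = 0$. This is a contradiction.

Proof of Theorem 1.26. By Theorem 1.17 there exists an optimal transport plan $\sigma$ from $\mu$ to $\nu$. Let us assume, for a contradiction, that $\sigma$ is not induced by a map. By Lemma 1.27 the measure $\sigma$ is not concentrated on any set $\Gamma \subset \mathbb{R}^n\times\mathbb{R}^n$ with the property that for $\mu$-a.e. $x \in \mathbb{R}^n$ there exists only one $y \in \mathbb{R}^n$ such that $(x,y) \in \Gamma$. In particular, the set
$$A := \{x \in \mathbb{R}^n : \text{the set } \{y \in \mathbb{R}^n : (x,y) \in \operatorname{spt}(\sigma)\} \text{ has more than one element}\}$$
has positive $\mu$-measure. Our aim is now to find a contradiction with the c-cyclical monotonicity of $\operatorname{spt}(\sigma)$ provided by Theorem 1.25.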
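We will arrive at the contradiction via a discretization and a final geometric argument at a density point. Let us start with the discretization. We can write
$$A = \bigcup_{i\in\mathbb{N}} A_i$$
with
$$A_i := \Big\{x \in \mathbb{R}^n : \text{there exist } (x,y_1), (x,y_2) \in \operatorname{spt}(\sigma) \text{ with } \|y_1 - y_2\| \ge \frac1i\Big\}.$$
Since $\mu(A) > 0$, there exists $i \in \mathbb{N}$ such that $\mu(A_i) > 0$. Let us fix such an $i$. Next we can cover $\mathbb{R}^n$ with balls $B(x_j, \frac{1}{10i})$, $j \in \mathbb{N}$, and write
$$A_i = \bigcup_{j,k\in\mathbb{N}} F_{j,k}$$
with
$$F_{j,k} := \Big\{x \in \mathbb{R}^n : \text{there exist } (x,y_1), (x,y_2) \in \operatorname{spt}(\sigma) \text{ with } y_1 \in B\big(x_j,\tfrac{1}{10i}\big),\ y_2 \in B\big(x_k,\tfrac{1}{10i}\big) \text{ and } \|y_1 - y_2\| \ge \frac1i\Big\}.$$
Again, we can fix $j, k \in \mathbb{N}$ such that $\mu(F_{j,k}) > 0$. Notice that then $\|x_j - x_k\| \ge \frac1i - \frac{2}{10i} = \frac{4}{5i}$.

Apply Lemma 1.28 to the normalized restriction of $\mu$ to $F_{j,k}$, with $v = (x_k - x_j)/\|x_k - x_j\|$ and $\epsilon = \frac{1}{20}$. This gives a point $z_1 \in F_{j,k}$ whose cone $C(z_1,v,\epsilon)$ meets the support of this restriction; since the cone is open, it then has positive measure and hence contains a point $z_2 \in F_{j,k}$. Since $\frac{1-\epsilon}{1+\epsilon} = \frac{19}{21} > \frac{9}{10}$, we have
$$\langle z_1 - z_2, x_j - x_k\rangle > \frac{9}{10}\|z_1 - z_2\|\,\|x_j - x_k\|.$$
Now there exist $y_1 \in B(x_k, \frac{1}{10i})$ and $y_2 \in B(x_j, \frac{1}{10i})$ such that $(z_1,y_1), (z_2,y_2) \in \operatorname{spt}(\sigma)$. Writing $y_1 - y_2 = x_k - x_j + e$ with $\|e\| < \frac{1}{5i}$, we get
$$\langle z_1,y_1\rangle + \langle z_2,y_2\rangle - \langle z_1,y_2\rangle - \langle z_2,y_1\rangle = \langle z_1 - z_2, y_1 - y_2\rangle < \|z_1 - z_2\|\Big(-\frac{9}{10}\cdot\frac{4}{5i} + \frac{1}{5i}\Big) < 0,$$
contradicting the c-cyclical monotonicity of $\operatorname{spt}(\sigma)$. (Recall the remark in Example 1.24.)
[Figure: the points $z_1$, $z_2$, $y_1$, $y_2$ and the ball centres $x_j$, $x_k$.]

1.4. Dual transportation problem.

Let us next connect the Kantorovich problem to a dual formulation.

Dual formulation of the optimal transport problem
Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$. Maximize
$$\int\varphi(x)\,d\mu(x) + \int\psi(y)\,d\nu(y)$$
among all functions $\varphi \in L^1(\mu)$, $\psi \in L^1(\nu)$ such that
$$\varphi(x) + \psi(y) \le c(x,y) \quad\text{for all } x, y \in \mathbb{R}^n.$$

Theorem 1.29 (duality). Let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ and let $c\colon \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ be continuous and bounded below such that $c(x,y) \le a(x) + b(y)$ for some $a \in L^1(\mu)$ and $b \in L^1(\nu)$. Then the minimum of the Kantorovich problem equals the supremum in the dual formulation, and this supremum is attained by some couple $(\varphi, \varphi^{c+})$ with $\varphi$ a c-concave function.

Proof. Let $\sigma \in A(\mu,\nu)$. For any pair $\varphi \in L^1(\mu)$ and $\psi \in L^1(\nu)$ satisfying
$$\varphi(x) + \psi(y) \le c(x,y) \quad\text{for all } x, y \in \mathbb{R}^n,$$
we have
$$\int c(x,y)\,d\sigma(x,y) \ge \int\big(\varphi(x) + \psi(y)\big)\,d\sigma(x,y) = \int\varphi(x)\,d\mu(x) + \int\psi(y)\,d\nu(y).$$
Thus the supremum in the dual problem never exceeds the minimum of the Kantorovich problem.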
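For the other direction, take $\sigma \in \operatorname{Opt}(\mu,\nu)$ and let $\varphi$ be the c-concave function given by Theorem 1.25: $\operatorname{spt}(\sigma) \subset \partial^{c+}\varphi$ and $\max\{\varphi, 0\} \in L^1(\mu)$. Notice that by the assumption $c(x,y) \le a(x) + b(y)$ and by the inequality $\varphi(x) + \varphi^{c+}(y) \le c(x,y)$, for any fixed $x$ with $\varphi(x) > -\infty$ we have
$$\int \varphi^{c+}(y)\,d\nu(y) \le \int (c(x,y) - \varphi(x))\,d\nu(y) \le \int (a(x) + b(y) - \varphi(x))\,d\nu(y) < \infty.$$
Hence also $\max\{\varphi^{c+}, 0\} \in L^1(\nu)$. Now, since $\varphi(x) + \varphi^{c+}(y) = c(x,y)$ on $\partial^{c+}\varphi \supset \operatorname{spt}(\sigma)$,
$$\int \varphi(x)\,d\mu(x) + \int \varphi^{c+}(y)\,d\nu(y) = \int (\varphi(x) + \varphi^{c+}(y))\,d\sigma(x,y) = \int c(x,y)\,d\sigma(x,y) < \infty,$$
and so $\varphi \in L^1(\mu)$ and $\varphi^{c+} \in L^1(\nu)$. Thus $(\varphi, \varphi^{c+})$ is an admissible pair of functions for the dual problem, and we have proven the claim.

Definition 1.30 (Kantorovich potential). A c-concave function $\varphi$ such that $(\varphi, \varphi^{c+})$ is a maximizing pair for the dual problem is called a Kantorovich potential for the couple $\mu$ and $\nu$.

Let us now have another look at the existence of optimal maps with the cost $c(x,y) = \frac{\|x-y\|^2}{2}$.

Proposition 1.31. Let $\varphi\colon \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$. Then $\varphi$ is c-concave if and only if $x \mapsto \bar\varphi(x) := \frac{\|x\|^2}{2} - \varphi(x)$ is convex and lower semicontinuous. In this case $\partial^{c+}\varphi = \partial^-\bar\varphi$.

Proof. Notice that if $\varphi = \psi^{c+}$, then
$$\varphi(x) = \inf_{y\in\mathbb{R}^n}\Big(\frac{\|x-y\|^2}{2} - \psi(y)\Big) = \inf_{y\in\mathbb{R}^n}\Big(\frac{\|x\|^2}{2} - \langle x, y\rangle + \frac{\|y\|^2}{2} - \psi(y)\Big),$$
or equivalently
$$\bar\varphi(x) = \frac{\|x\|^2}{2} - \varphi(x) = \sup_{y\in\mathbb{R}^n}\Big(\langle x, y\rangle - \frac{\|y\|^2}{2} + \psi(y)\Big),$$
which proves the first claim, since a supremum of affine functions is convex and lower semicontinuous, and conversely every convex lower semicontinuous function is such a supremum. For the second claim, observe that
$$y \in \partial^{c+}\varphi(x) \iff \begin{cases} \varphi(x) = \frac{\|x-y\|^2}{2} - \varphi^{c+}(y), \\ \varphi(z) \le \frac{\|z-y\|^2}{2} - \varphi^{c+}(y) \quad \forall z \in \mathbb{R}^n, \end{cases}$$
or equivalently
$$\begin{cases} \varphi(x) - \frac{\|x\|^2}{2} = \langle x, -y\rangle + \frac{\|y\|^2}{2} - \varphi^{c+}(y), \\ \varphi(z) - \frac{\|z\|^2}{2} \le \langle z, -y\rangle + \frac{\|y\|^2}{2} - \varphi^{c+}(y) \quad \forall z \in \mathbb{R}^n, \end{cases}$$
which is the same as
$$\varphi(z) - \frac{\|z\|^2}{2} \le \varphi(x) - \frac{\|x\|^2}{2} + \langle z - x, -y\rangle \quad \forall z \in \mathbb{R}^n,$$
that is, $-y \in \partial^+(-\bar\varphi)(x) \iff y \in \partial^-\bar\varphi(x)$.

Definition 1.32 (c−c hypersurface). A set $E \subset \mathbb{R}^n$ is called a c−c hypersurface if it is the graph of the difference of two real-valued convex functions in some coordinate system.

Notice that any c−c hypersurface is contained in a countable union of Lipschitz graphs, since a convex function is locally Lipschitz.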
Hence, if $\mu(L) = 0$ for every Lipschitz graph $L$, then $\mu(E) = 0$ for every c−c hypersurface $E$. We will need the following result on the differentiability of convex functions.

Theorem 1.33. Let $A \subset \mathbb{R}^n$. Then there exists a convex function $f\colon \mathbb{R}^n \to \mathbb{R}$ such that $A$ is contained in the set of points of non-differentiability of $f$ if and only if $A$ can be covered by countably many c−c hypersurfaces.

Proof. (Skipped.)

Theorem 1.34 (Brenier). Let $\mu \in \mathcal{P}(\mathbb{R}^n)$ be such that $\int \|x\|^2\,d\mu(x) < \infty$. Then the following two conditions are equivalent:
(i) for every $\nu \in \mathcal{P}(\mathbb{R}^n)$ with $\int \|x\|^2\,d\nu(x) < \infty$ there exists only one optimal transport plan from $\mu$ to $\nu$, and this plan is induced by a map,
(ii) for every c−c hypersurface $E \subset \mathbb{R}^n$ we have $\mu(E) = 0$.
Furthermore, when (i) and (ii) hold, the optimal map is the gradient of a convex function.

Proof. (ii) ⇒ (i): By taking $a(x) = b(x) = \|x\|^2$ we notice that
$$c(x,y) = \frac{\|x-y\|^2}{2} \le \|x\|^2 + \|y\|^2,$$
and $a \in L^1(\mu)$, $b \in L^1(\nu)$. Thus the assumptions of Theorem 1.25 are satisfied, and any optimal plan $\sigma \in \operatorname{Opt}(\mu,\nu)$ is concentrated on the c-superdifferential of a Kantorovich potential $\varphi$. By Proposition 1.31 the function $\bar\varphi(x) := \|x\|^2/2 - \varphi(x)$ is convex and $\partial^{c+}\varphi = \partial^-\bar\varphi$. Since $\bar\varphi$ is convex, by Theorem 1.33 the set of non-differentiability points of $\bar\varphi$ is contained in a union of countably many c−c hypersurfaces. By assumption, this set has zero $\mu$-measure. Thus $\bar\varphi$ is differentiable $\mu$-almost everywhere. In particular, $\partial^-\bar\varphi(x)$ is single-valued for $\mu$-almost every $x \in \mathbb{R}^n$, and by Lemma 1.27 the optimal plan is induced by a map, namely $\nabla\bar\varphi$. Consequently, the optimal plan is unique: if $\sigma_1 \neq \sigma_2$ were both optimal, then $\frac12(\sigma_1 + \sigma_2)$ would be an optimal plan that is not induced by a map.

(i) ⇒ (ii): Assume that (ii) does not hold, i.e. $\mu(E) > 0$ for some c−c hypersurface $E$. Then by Theorem 1.33 there exists a convex function $\bar\varphi\colon \mathbb{R}^n \to \mathbb{R}$ whose set of non-differentiability points contains $E$ and thus has positive $\mu$-measure. We may assume that $\bar\varphi$ has linear growth at infinity. At the non-differentiability points $x \in E$ the set $\partial^-\bar\varphi(x)$ has more than one point.
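Let us select measurably, for every $x \in E$, points $S(x), T(x) \in \partial^-\bar\varphi(x)$ such that $T(x) \neq S(x)$, and for $x \notin E$ let $S(x) = T(x) \in \partial^-\bar\varphi(x)$. Now define
$$\sigma := \frac12\big((\mathrm{id} \times T)_\sharp\mu + (\mathrm{id} \times S)_\sharp\mu\big).$$
Since $\bar\varphi$ has linear growth, $\nu := \pi^2_\sharp\sigma$ has compact support and thus $\int \|x\|^2\,d\nu(x) < \infty$. Since $\sigma \in A(\mu,\nu)$ and, by Proposition 1.31, the support of $\sigma$ is c-cyclically monotone, we have $\sigma \in \operatorname{Opt}(\mu,\nu)$ by Theorem 1.25. However, $\sigma$ is not induced by a map, since $T(x) \neq S(x)$ for all $x \in E$ and $E$ has positive $\mu$-measure. This contradicts (i).

1.5. A few applications of optimal transport. Recall that by the classical Helmholtz decomposition a sufficiently smooth vector field can be decomposed into a curl-free component and a divergence-free component. We can now prove a generalization of this decomposition.

Theorem 1.35 (Polar factorization). Suppose $\Omega \subset \mathbb{R}^n$ is a bounded domain and $\mu_\Omega$ is the normalized Lebesgue measure on $\Omega$. Let $S \in L^2(\mu_\Omega; \mathbb{R}^n)$ be such that $\nu := S_\sharp\mu_\Omega$ gives zero measure to c−c hypersurfaces. Then there exist a unique
$$s \in \mathcal{S}(\Omega) := \{s\colon \Omega \to \Omega \text{ Borel map with } s_\sharp\mu_\Omega = \mu_\Omega\}$$
and a unique $\nabla\varphi$ with $\varphi$ convex such that $S = (\nabla\varphi) \circ s$. Moreover, $s$ is the unique minimizer of
$$\int \|S - \tilde s\|^2\,d\mu_\Omega$$
among all $\tilde s \in \mathcal{S}(\Omega)$.

Before proving the polar factorization, let us formally see how it generalizes the Helmholtz decomposition. For this purpose, suppose that $\Omega$ (and everything else) is smooth. Let $u\colon \Omega \to \mathbb{R}^n$ be a vector field and consider the polar factorization of $S_\epsilon := \mathrm{id} + \epsilon u$ with $|\epsilon|$ small. Then we have the decomposition $S_\epsilon = (\nabla\varphi_\epsilon) \circ s_\epsilon$ with $\nabla\varphi_\epsilon = \mathrm{id} + \epsilon v + o(\epsilon)$ and $s_\epsilon = \mathrm{id} + \epsilon w + o(\epsilon)$. Comparing the first-order terms gives $u = v + w$. For the curl-free component, notice that since $\nabla \times (\nabla\varphi_\epsilon) = 0$, we have $\nabla \times v = 0$, and thus $v$ is the gradient of some function $p$. On the other hand, since $s_\epsilon$ is measure preserving, we have $\nabla \cdot (w\chi_\Omega) = 0$ in the sense of distributions, giving us the divergence-free component.

Proof of Theorem 1.35. By assumption $\int \|x\|^2\,d\mu_\Omega(x) < \infty$ and $\int \|x\|^2\,d\nu(x) = \int \|S\|^2\,d\mu_\Omega < \infty$. We claim that
$$\inf_{\tilde s \in \mathcal{S}(\Omega)} \int \|S(x) - \tilde s(x)\|^2\,d\mu_\Omega(x) = \min_{\sigma \in A(\mu_\Omega,\nu)} \int \|x - y\|^2\,d\sigma(x,y). \tag{1.1}$$
To see this, let $\sigma_{\tilde s} := (\tilde s, S)_\sharp\mu_\Omega \in A(\mu_\Omega,\nu)$ for all $\tilde s \in \mathcal{S}(\Omega)$. This already gives that the left-hand side is at least the right-hand side in (1.1). Now, take $\bar\sigma \in \operatorname{Opt}(\mu_\Omega,\nu)$, which by Theorem 1.34 is unique. Moreover, by Theorem 1.34 we have
$$\bar\sigma = (\mathrm{id}, \nabla\varphi)_\sharp\mu_\Omega = (\nabla\tilde\varphi, \mathrm{id})_\sharp\nu$$
for some convex functions $\varphi, \tilde\varphi$. Now $\nabla\tilde\varphi \circ \nabla\varphi(x) = x$ for $\mu_\Omega$-almost every $x$, and $\nabla\varphi \circ \nabla\tilde\varphi(y) = y$ for $\nu$-almost every $y$. Define $s := \nabla\tilde\varphi \circ S$. Then $s_\sharp\mu_\Omega = (\nabla\tilde\varphi)_\sharp\nu = \mu_\Omega$, and thus $s \in \mathcal{S}(\Omega)$. Also $S = \nabla\varphi \circ s$, giving the polar factorization. Furthermore,
$$\int \|x - y\|^2\,d\sigma_s(x,y) = \int \|s(x) - S(x)\|^2\,d\mu_\Omega(x) = \int \|\nabla\tilde\varphi \circ S(x) - S(x)\|^2\,d\mu_\Omega(x) = \int \|\nabla\tilde\varphi(y) - y\|^2\,d\nu(y) = \min_{\sigma \in A(\mu_\Omega,\nu)} \int \|x - y\|^2\,d\sigma(x,y),$$
giving the claimed equality in (1.1). Finally, in order to see the uniqueness of the factorization, assume that $S = (\nabla\bar\varphi) \circ \bar s$ is another polar factorization of $S$. Since $(\nabla\bar\varphi)_\sharp\mu_\Omega = ((\nabla\bar\varphi) \circ \bar s)_\sharp\mu_\Omega = \nu$, the map $\nabla\bar\varphi$ is a transport map from $\mu_\Omega$ to $\nu$. Moreover, since $\bar\varphi$ is convex, $\nabla\bar\varphi$ is optimal. By the uniqueness of the optimal map, $\nabla\bar\varphi = \nabla\varphi$ $\mu_\Omega$-almost everywhere, and hence $\bar s = \nabla\tilde\varphi \circ \nabla\varphi \circ \bar s = \nabla\tilde\varphi \circ \nabla\bar\varphi \circ \bar s = \nabla\tilde\varphi \circ S = s$ $\mu_\Omega$-almost everywhere.

As another application of optimal transport, we give a short proof of the isoperimetric inequality in $\mathbb{R}^n$.

Theorem 1.36 (Isoperimetric inequality). Let $E \subset \mathbb{R}^n$ be open. Then
$$\mathcal{L}^n(E)^{1-\frac1n} \le \frac{P(E)}{n\,\mathcal{L}^n(B)^{\frac1n}},$$
where $B$ is the unit ball in $\mathbb{R}^n$ and $P(E)$ is the perimeter of the set $E$.

Proof. We will give the proof without paying too much attention to smoothness issues. Let the cost function be $c(x,y) = \|x - y\|^2$. Define
$$\mu := \frac{1}{\mathcal{L}^n(E)}\mathcal{L}^n|_E \quad\text{and}\quad \nu := \frac{1}{\mathcal{L}^n(B)}\mathcal{L}^n|_B,$$
and let $T\colon \mathbb{R}^n \to \mathbb{R}^n$ be the optimal transport map given by Theorem 1.34. By the change of variables formula, we have
$$\frac{1}{\mathcal{L}^n(E)} = \frac{\det(\nabla T(x))}{\mathcal{L}^n(B)} \quad\text{for all } x \in E.$$
Since $T$ is the gradient of a convex function, $\nabla T(x)$ is a symmetric matrix with nonnegative eigenvalues for every $x \in E$. Thus by the inequality of arithmetic and geometric means we have
$$(\det \nabla T(x))^{\frac1n} \le \frac{\nabla \cdot T(x)}{n} \quad\text{for all } x \in E.$$
Combining the previous two observations we get
$$\frac{1}{\mathcal{L}^n(E)^{\frac1n}} = \frac{(\det \nabla T(x))^{\frac1n}}{\mathcal{L}^n(B)^{\frac1n}} \le \frac{\nabla \cdot T(x)}{n\,\mathcal{L}^n(B)^{\frac1n}} \quad\text{for all } x \in E.$$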
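The pointwise inequality $(\det\nabla T)^{1/n} \le \nabla\cdot T/n$ for symmetric matrices with nonnegative eigenvalues, used above and again in the Sobolev inequality below, can be sanity-checked numerically. The Python sketch below (standard library only) assumes random test matrices of the form $BB^T$, which are symmetric with nonnegative eigenvalues; the helper names are hypothetical. Since $\det M = \prod_i\lambda_i$ and $\operatorname{tr} M = \sum_i\lambda_i$, this is just the arithmetic-geometric mean inequality for the eigenvalues.

```python
import random


def gram(b):
    # B B^T: a symmetric matrix with nonnegative eigenvalues
    n = len(b)
    return [[sum(b[i][k] * b[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det(m):
    # determinant by Gaussian elimination with partial pivoting
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d


def amgm_holds(m, tol=1e-9):
    # (det M)^(1/n) <= tr(M) / n for symmetric M with nonnegative eigenvalues
    n = len(m)
    dt = max(det(m), 0.0)  # clip tiny negative round-off
    tr = sum(m[i][i] for i in range(n))
    return dt ** (1.0 / n) <= tr / n + tol


random.seed(0)
checks = []
for _ in range(200):
    n = random.randint(1, 4)
    b = [[random.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(n)]
    checks.append(amgm_holds(gram(b)))
print("AM-GM held in all", len(checks), "samples:", all(checks))
```

Equality holds exactly when all eigenvalues coincide, e.g. for the identity matrix.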
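Integrating over $E$ and using the divergence theorem we get
$$\mathcal{L}^n(E)^{1-\frac1n} \le \frac{1}{n\,\mathcal{L}^n(B)^{\frac1n}} \int_E \nabla \cdot T(x)\,dx = \frac{1}{n\,\mathcal{L}^n(B)^{\frac1n}} \int_{\partial E} \langle T(x), v(x)\rangle\,d\mathcal{H}^{n-1}(x),$$
where $v\colon \partial E \to \mathbb{R}^n$ is the outer unit normal vector. Since $T(x) \in \bar B$ for all $x \in \bar E$, we have $\langle T(x), v(x)\rangle \le 1$ for all $x \in \partial E$, and thus
$$\mathcal{L}^n(E)^{1-\frac1n} \le \frac{1}{n\,\mathcal{L}^n(B)^{\frac1n}} \int_{\partial E} \langle T(x), v(x)\rangle\,d\mathcal{H}^{n-1}(x) \le \frac{P(E)}{n\,\mathcal{L}^n(B)^{\frac1n}}.$$

As a third application of optimal transport we will prove the standard Sobolev inequality in $\mathbb{R}^n$.

Theorem 1.37 (Sobolev inequality). Let $1 \le p < n$ and define $p^* := \frac{np}{n-p}$. Then there exists a constant $C > 0$ depending only on $n$ and $p$ such that
$$\|f\|_{p^*} \le C\|\nabla f\|_p \quad\text{for all } f \in W^{1,p}(\mathbb{R}^n).$$

Proof. Let $n$ and $p$ be fixed. We may assume that $f \ge 0$ and $\|f\|_{p^*} = 1$. Our aim is then to prove that $\|\nabla f\|_p \ge 1/C$ for some constant $C$ independent of $f$. Let $g\colon \mathbb{R}^n \to \mathbb{R}$ be a smooth nonnegative function with compact support and $\|g\|_1 = 1$, fixed once and for all, and define
$$\mu := f^{p^*}\mathcal{L}^n \quad\text{and}\quad \nu := g\,\mathcal{L}^n.$$
Let $T$ be the optimal transport map from $\mu$ to $\nu$ (again with the cost given by the square of the distance). The change of variables formula gives
$$f^{p^*}(x) = \det(\nabla T(x))\,g(T(x)) \quad\text{for all } x \in \mathbb{R}^n.$$
Hence
$$\int g^{1-\frac1n} = \int g^{-\frac1n} g = \int (g \circ T)^{-\frac1n} f^{p^*} = \int \det(\nabla T)^{\frac1n} (f^{p^*})^{1-\frac1n}.$$
As in the previous proof, $T$ is the gradient of a convex function and thus $\nabla T(x)$ is a symmetric matrix with nonnegative eigenvalues, so by the inequality of arithmetic and geometric means we have
$$(\det \nabla T(x))^{\frac1n} \le \frac{\nabla \cdot T(x)}{n} \quad\text{for all } x \in \mathbb{R}^n.$$
Therefore, integrating by parts,
$$\int g^{1-\frac1n} \le \frac1n \int \nabla \cdot T\,(f^{p^*})^{1-\frac1n} = -\frac{p^*\big(1 - \frac1n\big)}{n} \int f^{\frac{p^*}{q}}\,T \cdot \nabla f,$$
where $\frac1p + \frac1q = 1$ (note that $p^*(1 - \frac1n) - 1 = \frac{p^*}{q}$). By Hölder's inequality we finally get
$$\int g^{1-\frac1n} \le \frac{p^*\big(1 - \frac1n\big)}{n} \Big(\int f^{p^*}|T|^q\Big)^{\frac1q} \Big(\int |\nabla f|^p\Big)^{\frac1p} = \frac{p^*\big(1 - \frac1n\big)}{n} \Big(\int g(y)|y|^q\,dy\Big)^{\frac1q} \|\nabla f\|_p,$$
giving the claim. (For $p = 1$, so that $q = \infty$, the factor $(\int g(y)|y|^q\,dy)^{1/q}$ is replaced by $\sup_{y \in \operatorname{spt} g}|y|$.)

2. $L^p$ transportation distances in metric spaces

Let us now turn to optimal mass transportation in more general metric spaces.

Definition 2.1. Let $(X,d)$ be a complete and separable metric space and $1 \le p < \infty$.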
We define the $L^p$ transportation distance (Wasserstein distance, Kantorovich-Rubinstein distance, ...) $W_p$ between two measures $\mu, \nu \in \mathcal{P}(X)$ as
$$W_p(\mu,\nu) := \Big(\inf_{\sigma \in A(\mu,\nu)} \int_{X \times X} d^p(x,y)\,d\sigma(x,y)\Big)^{\frac1p}.$$

Remarks 2.2. (i) The definition of the distance makes sense without knowing the existence of a minimizer in the definition of $W_p(\mu,\nu)$. However, the existence follows as in the Euclidean case.
(ii) The function $W_p\colon \mathcal{P}(X) \times \mathcal{P}(X) \to [0,\infty]$ is typically not a distance, because it may attain the value $+\infty$. For example, in $\mathbb{R}$, if $\mu = \delta_0$ and $\nu = \sum_{i=1}^\infty 2^{-i}\delta_{2^i}$, we have
$$W_1(\mu,\nu) = \sum_{i=1}^\infty 2^{-i}2^i = \infty.$$
In order to have a finite distance, one restricts the function $W_p$ to a subset $\mathcal{P}_{\mu,p}(X) \subset \mathcal{P}(X)$ defined as
$$\mathcal{P}_{\mu,p}(X) := \{\nu \in \mathcal{P}(X) : W_p(\mu,\nu) < \infty\}$$
for any $\mu \in \mathcal{P}(X)$. The most commonly used subset is $\mathcal{P}_p(X) := \mathcal{P}_{\delta_x,p}(X)$ for some $x \in X$. By the triangle inequality, this definition is independent of the point $x$. Since $\delta_x \times \nu$ is the only element of $A(\delta_x,\nu)$, we have $W_p^p(\delta_x,\nu) = \int d^p(x,y)\,d\nu(y)$, so $\mathcal{P}_p(X)$ consists of the measures with finite $p$-th moment.

Theorem 2.3. Let $(X,d)$ be a complete separable metric space and $1 \le p < \infty$. Then $(\mathcal{P}_p(X), W_p)$ is a metric space.

Proof. Obviously $W_p(\mu,\mu) = 0$ and $W_p(\mu,\nu) = W_p(\nu,\mu)$. Assume that $W_p(\mu,\nu) = 0$. Then there exists an optimal plan $\sigma \in \operatorname{Opt}(\mu,\nu)$ such that $\int d^p(x,y)\,d\sigma(x,y) = 0$, meaning that $x = y$ for $\sigma$-a.e. $(x,y) \in X \times X$. Thus $\mu = \pi^1_\sharp\sigma = \pi^2_\sharp\sigma = \nu$.

We still need to show that the triangle inequality is satisfied. The triangle inequality will also imply that $W_p(\mu,\nu) \le W_p(\mu,\delta_x) + W_p(\delta_x,\nu) < \infty$ for $\mu, \nu \in \mathcal{P}_p(X)$. Let $\mu_1, \mu_2, \mu_3 \in \mathcal{P}_p(X)$. Take $\sigma_{1,2} \in \operatorname{Opt}(\mu_1,\mu_2)$ and $\sigma_{2,3} \in \operatorname{Opt}(\mu_2,\mu_3)$, where the optimality is measured with the cost $d^p(x,y)$. We want to construct an admissible transport $\sigma_{1,3} \in A(\mu_1,\mu_3)$ by gluing together $\sigma_{1,2}$ and $\sigma_{2,3}$. The gluing can be done by using the disintegration theorem: write
$$d\sigma_{1,2}(x,y) = d\mu_2(y)\,d\sigma^{1,2}_y(x) \quad\text{and}\quad d\sigma_{2,3}(y,z) = d\mu_2(y)\,d\sigma^{2,3}_y(z),$$
and define
$$d\sigma_{1,3}(x,z) = \int d(\sigma^{1,2}_y \times \sigma^{2,3}_y)(x,z)\,d\mu_2(y)$$
by integrating over $y$.
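For finitely supported measures the disintegrations are elementary, $\sigma^{1,2}_y(x) = \sigma_{1,2}(x,y)/\mu_2(y)$, and the glued plan becomes $\sigma_{1,3}(x,z) = \sum_y \sigma_{1,2}(x,y)\,\sigma_{2,3}(y,z)/\mu_2(y)$. The Python sketch below assumes plans between points of $\mathbb{R}$ stored as dictionaries `{(a, b): mass}` (a hypothetical representation, with made-up example plans); it builds the glued plan, whose marginals should be $\mu_1$ and $\mu_3$, and compares $(\int|x-z|^p\,d\sigma_{1,3})^{1/p}$ with the sum of the corresponding quantities for $\sigma_{1,2}$ and $\sigma_{2,3}$, as in the estimate that follows.

```python
from collections import defaultdict


def marginals(plan):
    """First and second marginals of a plan given as {(a, b): mass}."""
    first, second = defaultdict(float), defaultdict(float)
    for (a, b), m in plan.items():
        first[a] += m
        second[b] += m
    return dict(first), dict(second)


def glue(s12, s23):
    """Glue s12 in A(mu1, mu2) and s23 in A(mu2, mu3) along mu2.

    Discrete form of d sigma_13(x, z) = int d(sigma^12_y x sigma^23_y)(x, z) d mu2(y):
    sigma_13(x, z) = sum_y s12(x, y) * s23(y, z) / mu2(y).
    """
    _, mu2 = marginals(s12)
    s13 = defaultdict(float)
    for (x, y), m12 in s12.items():
        for (y2, z), m23 in s23.items():
            if y2 == y:
                s13[(x, z)] += m12 * m23 / mu2[y]
    return dict(s13)


def plan_cost(plan, p):
    """(int |a - b|^p d plan)^(1/p) for points a, b on the real line."""
    return sum(m * abs(a - b) ** p for (a, b), m in plan.items()) ** (1.0 / p)


# example plans; both have mu2 = 0.25 delta_1 + 0.75 delta_2
s12 = {(0.0, 1.0): 0.25, (0.0, 2.0): 0.25, (3.0, 2.0): 0.5}
s23 = {(1.0, 5.0): 0.25, (2.0, 4.0): 0.5, (2.0, 0.0): 0.25}
s13 = glue(s12, s23)

for p in (1, 2, 3):
    print(p, plan_cost(s13, p), "<=", plan_cost(s12, p) + plan_cost(s23, p))
```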
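Now, by the triangle inequality for $d$ and Minkowski's inequality in $L^p(d(\sigma^{1,2}_y \times \sigma^{2,3}_y)(x,z)\,d\mu_2(y))$,
$$\begin{aligned}
W_p(\mu_1,\mu_3) &\le \Big(\int d^p(x,z)\,d\sigma_{1,3}(x,z)\Big)^{\frac1p} = \Big(\iint d^p(x,z)\,d(\sigma^{1,2}_y \times \sigma^{2,3}_y)(x,z)\,d\mu_2(y)\Big)^{\frac1p}\\
&\le \Big(\iint d^p(x,y)\,d(\sigma^{1,2}_y \times \sigma^{2,3}_y)(x,z)\,d\mu_2(y)\Big)^{\frac1p} + \Big(\iint d^p(y,z)\,d(\sigma^{1,2}_y \times \sigma^{2,3}_y)(x,z)\,d\mu_2(y)\Big)^{\frac1p}\\
&= \Big(\iint d^p(x,y)\,d\sigma^{1,2}_y(x)\,d\mu_2(y)\Big)^{\frac1p} + \Big(\iint d^p(y,z)\,d\sigma^{2,3}_y(z)\,d\mu_2(y)\Big)^{\frac1p}\\
&= \Big(\int d^p(x,y)\,d\sigma_{1,2}(x,y)\Big)^{\frac1p} + \Big(\int d^p(y,z)\,d\sigma_{2,3}(y,z)\Big)^{\frac1p}\\
&= W_p(\mu_1,\mu_2) + W_p(\mu_2,\mu_3).
\end{aligned}$$

Let us next look at the topology given by the $W_p$ distance.

Theorem 2.4. Let $p \in [1,\infty)$ and $(X,d)$ be complete and separable. Then for $\mu_i, \mu \in \mathcal{P}_p(X)$ we have $W_p(\mu_i,\mu) \to 0$ if and only if $\mu_i \to \mu$ weakly and
$$\int_X d^p(x,x_0)\,d\mu_i(x) \to \int_X d^p(x,x_0)\,d\mu(x) \quad\text{for some } x_0 \in X.$$

Proof. We will prove the claim only in the simple case where $(X,d)$ is proper (i.e. closed balls are compact). Assume first that $W_p(\mu_i,\mu) \to 0$. Then
$$\Big|\Big(\int d^p(x,x_0)\,d\mu_i(x)\Big)^{\frac1p} - \Big(\int d^p(x,x_0)\,d\mu(x)\Big)^{\frac1p}\Big| = |W_p(\mu_i,\delta_{x_0}) - W_p(\mu,\delta_{x_0})| \le W_p(\mu_i,\mu) \to 0.$$
Since
$$\mu_i(X \setminus B(x_0,R)) \le \int_{X \setminus B(x_0,R)} \frac{d^p(x,x_0)}{R^p}\,d\mu_i(x) \le \frac{1}{R^p} W_p^p(\mu_i,\delta_{x_0})$$
and $W_p(\mu_i,\delta_{x_0})$ is bounded, the set of measures $\{\mu_i\}$ is tight. Thus it suffices to check the weak* convergence. Since Lipschitz functions are dense in $C_c(X)$ with respect to uniform convergence, it suffices to check the convergence against Lipschitz functions $f$. Let $\sigma_i \in \operatorname{Opt}(\mu_i,\mu)$ for the cost $d^p(x,y)$, and estimate
$$\begin{aligned}
\Big|\int_X f(x)\,d\mu_i(x) - \int_X f(y)\,d\mu(y)\Big| &= \Big|\int_{X \times X} (f(x) - f(y))\,d\sigma_i(x,y)\Big| \le \int_{X \times X} |f(x) - f(y)|\,d\sigma_i(x,y)\\
&\le \int_{X \times X} \mathrm{Lip}(f)\,d(x,y)\,d\sigma_i(x,y) \le \mathrm{Lip}(f)\Big(\int_{X \times X} d^p(x,y)\,d\sigma_i(x,y)\Big)^{\frac1p}\\
&= \mathrm{Lip}(f)\,W_p(\mu_i,\mu) \to 0.
\end{aligned}$$

Let us then prove the converse direction. Suppose the claim is not true. Then there exist a subsequence of $(\mu_i)$, still denoted by $(\mu_i)$, and $\epsilon > 0$ such that
$$W_p^p(\mu_i,\mu) \ge \epsilon \quad\text{for all } i.$$
Now take $R > 0$ such that
$$\int_{X \setminus B(x_0,R)} d^p(x,x_0)\,d\mu_i(x) < \frac{\epsilon}{3 \cdot 2^{p+1}} \quad\text{for all } i,$$
and similarly for $\mu$; this is possible by the weak convergence together with the convergence of the $p$-th moments. Notice that $d^p(x,y) \le 2^p(d^p(x,x_0) + d^p(y,x_0))$. Let $\sigma_i \in \operatorname{Opt}(\mu_i,\mu)$. Since $\{\mu_i\}$ is tight, so is $\{\sigma_i\}$. Therefore there exists a subsequence converging weakly to some $\sigma$.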
Now along this subsequence
$$\epsilon \le W_p^p(\mu_i,\mu) = \int_{X \times X} d^p(x,y)\,d\sigma_i(x,y) \le \int_{B(x_0,R) \times B(x_0,R)} d^p(x,y)\,d\sigma_i(x,y) + \frac{2\epsilon}{3} \to \int_{B(x_0,R) \times B(x_0,R)} d^p(x,y)\,d\sigma(x,y) + \frac{2\epsilon}{3},$$
which is a contradiction provided that we can show $\sigma \in \operatorname{Opt}(\mu,\mu)$: then $\int d^p\,d\sigma = W_p^p(\mu,\mu) = 0$ and we would get $\epsilon \le \frac{2\epsilon}{3}$. Since clearly $\sigma \in A(\mu,\mu)$, we only need to show optimality. This is seen using cyclical monotonicity: since the $\sigma_i$ are optimal, their supports are cyclically monotone sets. Suppose $(x_j,y_j) \in \operatorname{spt}(\sigma)$ for $j = 1, \dots, N$. Since $\sigma_i \to \sigma$ weakly, there exist $(x^i_j, y^i_j) \in \operatorname{spt}(\sigma_i)$ such that $d(x^i_j, x_j), d(y^i_j, y_j) \to 0$ as $i \to \infty$. Thus by the continuity of the cost function $d^p$ and the cyclical monotonicity of $\operatorname{spt}(\sigma_i)$, no permutation of the pairs $(x_j,y_j)$ lowers the cost for $\sigma$. Thus $\operatorname{spt}(\sigma)$ is cyclically monotone and hence $\sigma$ is optimal.

Theorem 2.5. Let $p \in [1,\infty)$ and $(X,d)$ be complete and separable. Then $(\mathcal{P}_p(X), W_p)$ is complete and separable.

Proof. Let us again prove only the simple case with $(X,d)$ proper. Let us first prove completeness. Let $(\mu_i) \subset \mathcal{P}_p(X)$ be a Cauchy sequence. Since $W_p(\mu_i,\delta_{x_0}) \le W_p(\mu_i,\mu_1) + W_p(\mu_1,\delta_{x_0})$ is bounded, the sequence $(\mu_i)$ is tight. Hence there exists a subsequence converging weakly to a measure $\mu \in \mathcal{P}(X)$. For $n \in \mathbb{N}$ consider the sequence $(\sigma_i)$ of measures $\sigma_i \in \operatorname{Opt}(\mu_i,\mu_n)$, which is tight by the tightness of $(\mu_i)$. Hence there exists a subsequence weakly converging to some $\sigma \in A(\mu,\mu_n)$. Using this limit we get
$$W_p^p(\mu,\mu_n) \le \int_{X \times X} d^p(x,y)\,d\sigma(x,y) \le \liminf_{i\to\infty} \int_{X \times X} d^p(x,y)\,d\sigma_i(x,y) = \liminf_{i\to\infty} W_p^p(\mu_i,\mu_n).$$
Hence
$$\limsup_{n\to\infty} W_p(\mu,\mu_n) \le \limsup_{n\to\infty} \liminf_{i\to\infty} W_p(\mu_i,\mu_n) = 0.$$
Therefore $(\mathcal{P}_p(X), W_p)$ is complete.

Let $(x_i)_{i=1}^\infty \subset X$ be dense in $(X,d)$. Then
$$D := \Big\{\sum_{j=1}^N a_j\delta_{x_j} : N \in \mathbb{N},\ \sum_{j=1}^N a_j = 1,\ a_j \in [0,1] \cap \mathbb{Q}\Big\}$$
is dense in $(\mathcal{P}_p(X), W_p)$. To see this, take $\mu \in \mathcal{P}_p(X)$ and $\epsilon > 0$. Let $R > 0$ be such that
$$\int_{X \setminus B(x_1,R)} d^p(x,x_1)\,d\mu(x) < \epsilon^p.$$
Since $\bar B(x_1,R)$ is compact, there exists a finite set of points $\{y_j\}_{j=1}^N \subset \{x_i\}$ such that
$$\bar B(x_1,R) \subset \bigcup_{j=1}^N B(y_j,\epsilon).$$
Let $B_j := B(y_j,\epsilon)$ for $j = 1, \dots, N$ and $B_{N+1} := X$, and define inductively
$$A_j := B_j \setminus \bigcup_{k=1}^{j-1} B_k \quad\text{for } j = 1, \dots, N+1.$$
By perturbing the weights a tiny amount, we may assume that $\mu(A_j) \in \mathbb{Q}$. Using the sets $A_j$ we define a measure $\nu \in D$ as
$$\nu := \sum_{j=1}^{N+1} \mu(A_j)\delta_{y_j}, \quad\text{with } y_{N+1} := x_1.$$
Now, since $A_{N+1} \subset X \setminus \bar B(x_1,R)$,
$$W_p^p(\mu,\nu) \le \sum_{j=1}^{N+1} \int_{A_j} d^p(x,y_j)\,d\mu(x) \le \sum_{j=1}^N \int_{A_j} \epsilon^p\,d\mu(x) + \epsilon^p \le \int_X \epsilon^p\,d\mu(x) + \epsilon^p = 2\epsilon^p.$$
Thus $(\mathcal{P}_p(X), W_p)$ is separable.

Let us note that local compactness of $(X,d)$ does not imply that $(\mathcal{P}_p(X), W_p)$ is locally compact:

Example 2.6. Take $0 < \epsilon \le 1$ and define, for $n \ge 2$, a sequence of measures
$$\mu_n := \Big(1 - \frac{\epsilon^p}{(n-1)^p}\Big)\delta_1 + \frac{\epsilon^p}{(n-1)^p}\delta_n \in \mathcal{P}_p(\mathbb{N}).$$
Now
$$W_p^p(\mu_n,\delta_1) = (n-1)^p \frac{\epsilon^p}{(n-1)^p} = \epsilon^p,$$
showing on one hand that $(\mu_n)_n \subset \bar B(\delta_1,\epsilon)$, but also that $\bar B(\delta_1,\epsilon)$ is not a compact subset of $(\mathcal{P}_p(\mathbb{N}), W_p)$. This can be seen from the fact that $\mu_n$ converges weakly to $\delta_1$, but not with respect to $W_p$ (since $W_p(\mu_n,\delta_1) = \epsilon$ for all $n$), so no subsequence of $(\mu_n)$ converges in $W_p$.

2.1. Geodesic spaces. Let us now consider geodesic (complete, separable) metric spaces. Let us first recall some definitions.

Definition 2.7. Let $(X,d)$ be a metric space. A curve $\gamma\colon [0,1] \to X$ is called a (constant speed) geodesic if
$$d(\gamma_t,\gamma_s) = |t - s|\,d(\gamma_0,\gamma_1) \quad\text{for all } t, s \in [0,1],$$
where $\gamma_t := \gamma(t)$. We denote the set of all geodesics in $X$ by $\mathrm{Geo}(X)$. The space $(X,d)$ is called geodesic if for every $x, y \in X$ there exists $\gamma \in \mathrm{Geo}(X)$ with $\gamma_0 = x$ and $\gamma_1 = y$. We equip $\mathrm{Geo}(X)$ with the supremum distance:
$$d(\gamma,\gamma') = \sup_{t\in[0,1]} d(\gamma_t,\gamma'_t).$$
To ease the notation, we define the evaluation map $e_t\colon \mathrm{Geo}(X) \to X$, $\gamma \mapsto \gamma_t$.

Notice first that for any complete separable metric space $(X,d)$ the space $(\mathcal{P}_1(X), W_1)$ is geodesic: for any pair $\mu_0, \mu_1 \in \mathcal{P}_1(X)$ we have a geodesic $\mu_t := t\mu_1 + (1-t)\mu_0$. (This follows, for instance, from the Kantorovich-Rubinstein duality $W_1(\mu,\nu) = \sup\{\int f\,d(\mu - \nu) : \mathrm{Lip}(f) \le 1\}$, since $\mu_s - \mu_t = (s - t)(\mu_1 - \mu_0)$.) For $W_p$ with $p > 1$ the situation is different. We will prove the following.

Theorem 2.8. Let $(X,d)$ be a complete, separable and geodesic metric space. Then $(\mathcal{P}_p(X), W_p)$ is geodesic.
We will again only show the case with $(X,d)$ proper. In order to obtain a geodesic of Borel measures in $(\mathcal{P}_p(X), W_p)$ we need a measurable selection theorem.

Theorem 2.9 (Theorem 6.9.6 in [1]). Let $X$ and $Y$ be complete and separable metric spaces and $\Gamma \in \mathcal{B}(X \times Y)$. Suppose that $\Gamma_x := \{y \in Y : (x,y) \in \Gamma\}$ is nonempty and $\sigma$-compact for all $x \in X$. Then $\Gamma$ contains the graph of some Borel mapping $f\colon X \to Y$.

Let us use Theorem 2.9 to select the geodesics measurably.

Lemma 2.10. Let $(X,d)$ be a proper geodesic metric space. Then there exists a Borel map $S\colon X \times X \to \mathrm{Geo}(X)$ such that $S(x,y)_0 = x$ and $S(x,y)_1 = y$.

Proof. We need to show that for every $x, y \in X$ the set
$$\Gamma_{x,y} := \{\gamma \in \mathrm{Geo}(X) : \gamma_0 = x,\ \gamma_1 = y\}$$
is $\sigma$-compact (it is nonempty since $X$ is geodesic) and that the set
$$\Gamma = \bigcup_{x,y\in X} \{(x,y)\} \times \Gamma_{x,y}$$
is Borel.

Let $x, y \in X$. Since $(X,d)$ is proper, the set $\bar B(x, d(x,y))$ is compact. Now take a sequence $(\gamma^i) \subset \Gamma_{x,y}$. For $n \in \mathbb{N}$ the set $\bar B(x, d(x,y))$ is covered by finitely many balls of radius $d(x,y)/n$. On the other hand, for any $t, s \in [0,1]$ we have $d(\gamma^i_s, \gamma^i_t) \le d(x,y)/n$ if $|t - s| \le 1/n$. Thus the curves $\gamma^i$ are equicontinuous and take values in a compact set, so by the Arzelà-Ascoli theorem a subsequence of $(\gamma^i)$ converges uniformly to some $\gamma\colon [0,1] \to X$. It is easy to check that also $\gamma \in \Gamma_{x,y}$. Hence $\Gamma_{x,y}$ is compact. Similarly, if $(x,y,\gamma) \in \bar\Gamma$, it is easy to check that $(x,y,\gamma) \in \Gamma$. Hence $\Gamma$ is closed and thus Borel. Now we are in the position to use Theorem 2.9 to make the claimed Borel selection.

Proof of Theorem 2.8. Let $\mu_0, \mu_1 \in \mathcal{P}_p(X)$ and $\sigma \in \operatorname{Opt}(\mu_0,\mu_1)$. Let $S\colon X \times X \to \mathrm{Geo}(X)$ be the Borel map given by Lemma 2.10. Now define
$$\nu := S_\sharp\sigma \in \mathcal{P}_p(\mathrm{Geo}(X)) \quad\text{and}\quad \mu_t := (e_t)_\sharp\nu \quad\text{for all } t \in [0,1].$$
We claim that $(\mu_t)$ is a geodesic connecting $\mu_0$ to $\mu_1$. Since $(e_0,e_1) \circ S = \mathrm{id}$, the measures $\mu_0$ and $\mu_1$ are indeed given as claimed. Now, since $(e_s,e_t)_\sharp\nu \in A(\mu_s,\mu_t)$,
$$W_p^p(\mu_s,\mu_t) \le \int_{\mathrm{Geo}(X)} d^p(\gamma_s,\gamma_t)\,d\nu(\gamma) = |t - s|^p \int_{\mathrm{Geo}(X)} d^p(\gamma_0,\gamma_1)\,d\nu(\gamma) = |t - s|^p \int_{X \times X} d^p(x,y)\,d\sigma(x,y) = |t - s|^p W_p^p(\mu_0,\mu_1).$$
Combined with the triangle inequality $W_p(\mu_0,\mu_1) \le W_p(\mu_0,\mu_s) + W_p(\mu_s,\mu_t) + W_p(\mu_t,\mu_1)$ for $s \le t$, all these inequalities must be equalities. Hence $(\mu_t)$ is a constant speed geodesic.
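The difference between the linear interpolation $t\mu_1 + (1-t)\mu_0$, which is a $W_1$-geodesic, and an interpolation of the form $\mu_t = (e_t)_\sharp\nu$ as constructed above can be seen on the real line. The Python sketch below assumes finitely supported measures on $\mathbb{R}$, given as lists of (point, weight) pairs, and relies on the fact that on $\mathbb{R}$ the monotone (quantile) coupling is optimal for the cost $|x-y|^p$, $p \ge 1$; since geodesics in $\mathbb{R}$ are segments, the interpolation simply moves each atom of the monotone matching along a segment. The function names and data are hypothetical.

```python
def wasserstein_1d(mu, nu, p):
    """W_p between finitely supported probability measures on R.

    mu, nu: lists of (point, weight). On R the monotone (quantile) coupling is
    optimal for the cost |x - y|^p with p >= 1, so we couple sorted atoms greedily.
    """
    a, b = sorted(mu), sorted(nu)
    i = j = 0
    wa, wb = a[0][1], b[0][1]
    total = 0.0
    while i < len(a) and j < len(b):
        m = min(wa, wb)
        total += m * abs(a[i][0] - b[j][0]) ** p
        wa -= m
        wb -= m
        if wa <= 1e-15:
            i += 1
            wa = a[i][1] if i < len(a) else 0.0
        if wb <= 1e-15:
            j += 1
            wb = b[j][1] if j < len(b) else 0.0
    return total ** (1.0 / p)


def mixture(t):
    # linear interpolation t * delta_1 + (1 - t) * delta_0
    return [(0.0, 1.0 - t), (1.0, t)]


def displacement(xs, ys, t):
    # move each atom of the monotone matching along the segment [x_(i), y_(i)]
    n = len(xs)
    return [((1 - t) * x + t * y, 1.0 / n) for x, y in zip(sorted(xs), sorted(ys))]


# along the mixture W_1 grows like |t - s|, but W_2 like |t - s|^(1/2)
print(wasserstein_1d(mixture(0.2), mixture(0.7), 1))
print(wasserstein_1d(mixture(0.0), mixture(0.5), 2), "vs", 0.5)

xs, ys = [0.0, 1.0, 4.0], [2.0, -1.0, 3.0]
full = wasserstein_1d(displacement(xs, ys, 0.0), displacement(xs, ys, 1.0), 2)
part = wasserstein_1d(displacement(xs, ys, 0.25), displacement(xs, ys, 0.75), 2)
print(part, "=", 0.5 * full)
```

Here $W_2(\mu_0,\mu_{1/2}) = 2^{-1/2} \ne \frac12 W_2(\mu_0,\mu_1)$ for the mixture, so it is not a $W_2$-geodesic, while the displacement interpolation satisfies $W_2(\mu_s,\mu_t) = |t-s|\,W_2(\mu_0,\mu_1)$.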
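In the previous proof we noticed that we actually gave the geodesic $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_p(X))$ using a measure on the space of geodesics. In fact, we can always "lift" a given geodesic $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_p(X))$ to a measure on geodesics. This is stated in the following theorem.

Theorem 2.11. Let $p > 1$ and $(X,d)$ be separable, complete and geodesic. Then for any geodesic $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_p(X))$ there exists a measure $\pi \in \mathcal{P}_p(\mathrm{Geo}(X))$ such that $\mu_t = (e_t)_\sharp\pi$ for all $t \in [0,1]$.

Proof. The measure on geodesics is built inductively. First we find, similarly as in the proof of Theorem 2.8, measures $\nu_{0\to\frac12}, \nu_{\frac12\to1} \in \mathcal{P}_p(\mathrm{Geo}(X))$ such that
$$(e_0,e_1)_\sharp\nu_{0\to\frac12} \in \operatorname{Opt}(\mu_0,\mu_{\frac12}) \quad\text{and}\quad (e_0,e_1)_\sharp\nu_{\frac12\to1} \in \operatorname{Opt}(\mu_{\frac12},\mu_1).$$
Next we glue these measures together via the disintegration theorem: write
$$d\nu_{0\to\frac12}(\gamma) = d\mu_{\frac12}(x)\,d\nu^x_{0\to\frac12}(\gamma) \quad\text{and}\quad d\nu_{\frac12\to1}(\gamma) = d\mu_{\frac12}(x)\,d\nu^x_{\frac12\to1}(\gamma),$$
with $\nu^x_{0\to\frac12}$ concentrated on geodesics ending at $x$ and $\nu^x_{\frac12\to1}$ on geodesics starting from $x$. Now define $\nu^{\frac12} \in \mathcal{P}(C([0,1];X))$ as
$$d\nu^{\frac12}(\gamma) = d\mu_{\frac12}(x)\,d\nu^x(\gamma) \quad\text{with}\quad d\nu^x(\gamma) = d\nu^x_{0\to\frac12}(\mathrm{restr}_0^{\frac12}(\gamma)) \times d\nu^x_{\frac12\to1}(\mathrm{restr}_{\frac12}^1(\gamma)),$$
where $\mathrm{restr}_{t_1}^{t_2}(\gamma) = \gamma'$ with $\gamma'_s = \gamma_{(1-s)t_1 + st_2}$.

Now by the triangle inequality for $L^p$, we get
$$\begin{aligned}
W_p(\mu_0,\mu_1) &\le \Big(\int_{C([0,1];X)} d^p(\gamma_0,\gamma_1)\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p}\\
&\le \Big(\int_{C([0,1];X)} d^p(\gamma_0,\gamma_{\frac12})\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p} + \Big(\int_{C([0,1];X)} d^p(\gamma_{\frac12},\gamma_1)\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p}\\
&= \Big(\int_{\mathrm{Geo}(X)} d^p(\gamma_0,\gamma_1)\,d\nu_{0\to\frac12}(\gamma)\Big)^{\frac1p} + \Big(\int_{\mathrm{Geo}(X)} d^p(\gamma_0,\gamma_1)\,d\nu_{\frac12\to1}(\gamma)\Big)^{\frac1p}\\
&= \tfrac12 W_p(\mu_0,\mu_1) + \tfrac12 W_p(\mu_0,\mu_1) = W_p(\mu_0,\mu_1),
\end{aligned}$$
showing that the inequalities are actually equalities. Since also
$$\begin{aligned}
\tfrac12 W_p(\mu_0,\mu_1) &\le \tfrac12\Big(\int_{C([0,1];X)} d^p(\gamma_0,\gamma_1)\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p}\\
&\le \max\Big\{\Big(\int_{C([0,1];X)} d^p(\gamma_0,\gamma_{\frac12})\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p}, \Big(\int_{C([0,1];X)} d^p(\gamma_{\frac12},\gamma_1)\,d\nu^{\frac12}(\gamma)\Big)^{\frac1p}\Big\} = \tfrac12 W_p(\mu_0,\mu_1),
\end{aligned}$$
this implies (as $p > 1$) that for $\nu^{\frac12}$-a.e. $\gamma \in C([0,1];X)$ we have $d(\gamma_0,\gamma_{\frac12}) = \frac12 d(\gamma_0,\gamma_1)$ and $d(\gamma_{\frac12},\gamma_1) = \frac12 d(\gamma_0,\gamma_1)$. Thus $\nu^{\frac12}$ is concentrated on $\mathrm{Geo}(X)$.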
Now, using the above procedure we can define for every $n \in \mathbb{N}$ a measure $\nu^n \in \mathcal{P}(\mathrm{Geo}(X))$ with the property that $(e_{k2^{-n}})_\sharp\nu^n = \mu_{k2^{-n}}$ for all $k \in \{0, 1, \dots, 2^n\}$. What is left to show is that $(\nu^n)$ converges to the measure we were looking for. Since the space $(X,d)$ is assumed to be proper, the set $\mathrm{Geo}(\bar B(x_0,r))$ is compact. Hence the tightness of $(\nu^n)$ follows from the tightness of $\{\mu_0, \mu_1\}$. Thus there is a subsequence of $(\nu^n)$ converging weakly to a measure $\pi \in \mathcal{P}(\mathrm{Geo}(X))$. For $t = k2^{-n}$, $k, n \in \mathbb{N}$, the equality $(e_t)_\sharp\pi = \mu_t$ is obvious. For other $t \in [0,1]$ the equality holds since for all $n \in \mathbb{N}$
$$W_p((e_t)_\sharp\nu^m, \mu_{k2^{-n}}) \le 2^{-n} W_p(\mu_0,\mu_1) \quad\text{for all } m \ge n,$$
with suitably chosen $k \in \mathbb{N}$ depending on $n$ and $t$.

Also the converse of Theorem 2.8 holds.

Theorem 2.12. Suppose that $(X,d)$ is complete and separable, $p > 1$, and $(\mathcal{P}_p(X), W_p)$ is geodesic. Then also $(X,d)$ is geodesic.

Proof. Take $x, y \in X$ and $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_p(X))$ connecting $\delta_x$ to $\delta_y$. Since
$$W_p(\mu_{\frac12},\delta_x) = W_p(\mu_{\frac12},\delta_y) = \tfrac12 W_p(\delta_x,\delta_y) = \tfrac12 d(x,y),$$
we have, using $d(y,z) \ge |d(x,y) - d(x,z)|$ and the convexity of $r \mapsto r^p$,
$$\begin{aligned}
2^{1-p} d^p(x,y) &= W_p^p(\mu_{\frac12},\delta_x) + W_p^p(\mu_{\frac12},\delta_y) = \int_X d^p(x,z)\,d\mu_{\frac12}(z) + \int_X d^p(y,z)\,d\mu_{\frac12}(z)\\
&\ge \int_X \big(d^p(x,z) + |d(x,y) - d(x,z)|^p\big)\,d\mu_{\frac12}(z)\\
&\ge \int_X 2^{1-p} d^p(x,y)\,d\mu_{\frac12}(z) = 2^{1-p} d^p(x,y),
\end{aligned}$$
where the inequalities are thus equalities. Hence, since $p > 1$, $\mu_{\frac12}$ is concentrated on the set
$$\Big\{z \in X : d(x,z) = d(y,z) = \tfrac12 d(x,y)\Big\}.$$
In particular this set is nonempty. Hence there exists $x_{\frac12} \in X$ with
$$d(x, x_{\frac12}) = d(y, x_{\frac12}) = \tfrac12 d(x,y).$$
Taking inductively midpoints between $x$ and $x_{\frac12}$, $x_{\frac12}$ and $y$, and so on, we obtain a dense set of points on a "geodesic". By completeness this gives the geodesic between $x$ and $y$.

Notice that if $p = 1$, the conclusion from the chain of (in)equalities in the previous proof would only be that $\mu_{\frac12}$ is concentrated on the set
$$\{z \in X : d(x,z) + d(y,z) = d(x,y)\}.$$
In particular, it could be that $\mu_{\frac12} = \frac12(\delta_x + \delta_y)$.

Definition 2.13.
A geodesic space $(X,d)$ is called nonbranching if any $\gamma^1, \gamma^2 \in \mathrm{Geo}(X)$ with $\gamma^1_0 = \gamma^2_0$ and $\gamma^1_t = \gamma^2_t$ for some $t \in (0,1)$ satisfy $\gamma^1 = \gamma^2$.

Let us first observe that $(\mathcal{P}_p(X), W_p)$ inherits the nonbranching property of $(X,d)$ if $p > 1$. In the case $p = 1$, nontrivial transports can branch even in nonbranching geodesic spaces: the two curves $\gamma^1_t := \delta_t$ for all $t \in [0,1]$ and
$$\gamma^2_t := \begin{cases} \delta_t, & t \in [0,\frac12],\\ (2 - 2t)\delta_{\frac12} + (2t - 1)\delta_1, & t \in [\frac12, 1] \end{cases}$$
are both geodesics in $(\mathcal{P}_1(\mathbb{R}), W_1)$. They start as the same geodesic and then branch at $t = \frac12$. Thus $(\mathcal{P}_1(\mathbb{R}), W_1)$ is branching (i.e. not nonbranching).

Theorem 2.14. Let $(X,d)$ be a complete, separable, geodesic, nonbranching metric space and $1 < p < \infty$. Then $(\mathcal{P}_p(X), W_p)$ is also nonbranching.

Proof. Suppose that $(\mathcal{P}_p(X), W_p)$ is branching. Thus there exist $\Gamma, \Gamma' \in \mathrm{Geo}(\mathcal{P}_p(X))$ and $t \in (0,1)$ such that $\Gamma_0 = \Gamma'_0$ and $\Gamma_t = \Gamma'_t$, but still $\Gamma \neq \Gamma'$, meaning that there exists $s \in (0,1]$ such that $\Gamma_s \neq \Gamma'_s$. We may assume $t < s$. Let $\pi_1, \pi_2 \in \mathcal{P}(\mathrm{Geo}(X))$ be such that
$$(e_r)_\sharp\pi_1 = \Gamma_{(1-r)t + rs} \quad\text{and}\quad (e_r)_\sharp\pi_2 = \Gamma'_{(1-r)t + rs} \quad\text{for all } r \in [0,1].$$
Disintegrating both $\pi_1$ and $\pi_2$ with respect to $e_0$ (i.e. with respect to $\Gamma_t$), we get two families of measures
$$d\pi_i(\gamma) = d\Gamma_t(x)\,d\pi_i^x(\gamma), \quad i = 1, 2.$$
On a set of positive $\Gamma_t$-measure $\pi_1^x \neq \pi_2^x$ (otherwise $\pi_1 = \pi_2$ and $\Gamma_s = \Gamma'_s$). For the beginning of the geodesics, consider $\pi_3 \in \mathcal{P}(\mathrm{Geo}(X))$ such that $(e_r)_\sharp\pi_3 = \Gamma_{rt}$ for all $r \in [0,1]$. Now disintegrating $\pi_3$ with respect to $e_1$ gives
$$d\pi_3(\gamma) = d\Gamma_t(x)\,d\pi_3^x(\gamma).$$
For $\Gamma_t$-almost every $x \in X$ the gluing of $\pi_3^x$ and $\pi_1^x$ lives on geodesics. The same holds for the gluing of $\pi_3^x$ and $\pi_2^x$, since $\Gamma'_0 = \Gamma_0$ and $\Gamma'_t = \Gamma_t$. All in all, for $x$ in a set of positive $\Gamma_t$-measure we get branching geodesics going via $x$, contradicting the nonbranching assumption.

Let us next investigate how the nonbranching assumption is connected to the existence of optimal transport maps.
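Returning briefly to the example preceding Theorem 2.14, the two curves $\gamma^1$ and $\gamma^2$ in $(\mathcal{P}_1(\mathbb{R}), W_1)$ can be checked numerically. The Python sketch below assumes finitely supported measures on $\mathbb{R}$, given as lists of (point, weight) pairs (a hypothetical representation), and uses the one-dimensional formula $W_1(\mu,\nu) = \int_{\mathbb{R}} |F_\mu(x) - F_\nu(x)|\,dx$, where $F$ denotes the distribution function. It verifies $W_1(\gamma^i_s,\gamma^i_t) = |t - s|$ on a grid of times and shows that the curves agree on $[0,\frac12]$ but not on $(\frac12,1)$.

```python
def w1_line(mu, nu):
    """W_1 on R via W_1(mu, nu) = int |F_mu - F_nu| dx for finitely supported measures."""
    pts = sorted({x for x, _ in mu} | {x for x, _ in nu})
    total, f_mu, f_nu = 0.0, 0.0, 0.0
    for left, right in zip(pts, pts[1:]):
        # the distribution functions are constant on [left, right)
        f_mu += sum(w for x, w in mu if x == left)
        f_nu += sum(w for x, w in nu if x == left)
        total += abs(f_mu - f_nu) * (right - left)
    return total


def gamma1(t):
    return [(t, 1.0)]


def gamma2(t):
    if t <= 0.5:
        return [(t, 1.0)]
    return [(0.5, 2 - 2 * t), (1.0, 2 * t - 1)]


ts = [k / 8 for k in range(9)]
for curve in (gamma1, gamma2):
    worst = max(abs(w1_line(curve(s), curve(t)) - abs(t - s)) for s in ts for t in ts)
    print(curve.__name__, "max deviation from |t - s|:", worst)

# distance between the two curves at equal times: zero on [0, 1/2], positive on (1/2, 1)
print([w1_line(gamma1(t), gamma2(t)) for t in ts])
```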
We start with the observation that any inner point of a geodesic in $\mathcal{P}_p(X)$ in a nonbranching space $X$ has an optimal transport map to the endpoint of the geodesic.

Theorem 2.15. Let $(X,d)$ be a complete, separable, geodesic, nonbranching metric space and $1 < p < \infty$. Let $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_p(X))$. Then for every $t \in (0,1)$ there exists a unique $\sigma \in \operatorname{Opt}(\mu_t,\mu_1)$, and it is induced by a map.

Proof. Suppose this is not the case. Let $\sigma \in \operatorname{Opt}(\mu_t,\mu_1)$ be such that it is not induced by a map. Then there exist $x, y_1, y_2 \in X$ such that $y_1 \neq y_2$ and $(x,y_1), (x,y_2) \in \operatorname{spt}(\sigma)$. Let $\sigma' \in \operatorname{Opt}(\mu_0,\mu_t)$ and $z \in X$ be such that $(z,x) \in \operatorname{spt}(\sigma')$. Gluing these optimal transports together at $\mu_t$ gives an optimal plan $\sigma'' \in \operatorname{Opt}(\mu_0,\mu_1)$ with $(z,y_1), (z,y_2) \in \operatorname{spt}(\sigma'')$. Now $z, x, y_1$ lie on the same geodesic, as do $z, x, y_2$, as otherwise $\mu_t$ would not be on a geodesic connecting $\mu_0$ and $\mu_1$. Thus we have a contradiction with the nonbranching of $(X,d)$. Hence every $\sigma \in \operatorname{Opt}(\mu_t,\mu_1)$ is induced by a map, and uniqueness follows, since the average of two distinct optimal plans induced by maps is an optimal plan not induced by a map.

In the Euclidean case $X = \mathbb{R}^n$ with $p = 2$ we can say more about the intermediate transport maps of Theorem 2.15:

Theorem 2.16. Let $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_2(\mathbb{R}^n))$. Then for any $t \in (0,1)$ the unique $\sigma \in \operatorname{Opt}(\mu_t,\mu_1)$ is induced by a $\frac1t$-Lipschitz map.

Proof. For any $\mu_0, \mu_1 \in \mathcal{P}_2(\mathbb{R}^n)$ and $\sigma \in \operatorname{Opt}(\mu_0,\mu_1)$, by the uniqueness of geodesics in $\mathbb{R}^n$, there exists only one $\pi \in \mathcal{P}(\mathrm{Geo}(\mathbb{R}^n))$ such that $(e_0,e_1)_\sharp\pi = \sigma$. The corresponding geodesic in $\mathcal{P}_2(\mathbb{R}^n)$ is
$$\mu_t = ((1-t)e_0 + te_1)_\sharp\pi = ((1-t)P_1 + tP_2)_\sharp\sigma.$$
Thus the unique transport plan from $\mu_t$ to $\mu_1$ is given by $((1-t)P_1 + tP_2, P_2)_\sharp\sigma$. This plan is supported on a set
$$G := \{((1-t)x + ty, y) : y \in \partial^-\varphi(x)\}$$
with $\varphi$ convex. Recall that for a convex function $\varphi$ and any $(x_1,y_1), (x_2,y_2)$ with $y_i \in \partial^-\varphi(x_i)$ we have $\langle y_1 - y_2, x_1 - x_2\rangle \ge 0$. Therefore, for $((1-t)x_1 + ty_1, y_1), ((1-t)x_2 + ty_2, y_2) \in G$ we have
$$\begin{aligned}
|(1-t)x_1 + ty_1 - (1-t)x_2 - ty_2|^2 &= (1-t)^2|x_1 - x_2|^2 + t^2|y_1 - y_2|^2 + 2t(1-t)\langle x_1 - x_2, y_1 - y_2\rangle\\
&\ge t^2|y_1 - y_2|^2.
\end{aligned}$$
Hence $G$ is a subset of the graph of a $\frac1t$-Lipschitz map, giving the claim.
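The estimate at the end of the proof of Theorem 2.16 can be illustrated numerically. The Python sketch below assumes a specific convex function on $\mathbb{R}^2$ (a hypothetical choice), $\varphi(x) = \frac12\langle Ax, x\rangle + \frac14(x_1^4 + x_2^4)$ with $A$ symmetric positive definite, so that $y = \nabla\varphi(x)$ plays the role of the optimal map; it samples pairs of points and records the largest ratio $|y_1 - y_2|/|z_1 - z_2|$ with $z_i = (1-t)x_i + ty_i$, which by the proof cannot exceed $\frac1t$.

```python
import math
import random

# hypothetical choice: phi(x) = <Ax, x>/2 + (x1^4 + x2^4)/4, A symmetric positive definite
A = [[2.0, 1.0], [1.0, 1.0]]


def grad_phi(x):
    # gradient of the convex function phi above; plays the role of the optimal map
    return (A[0][0] * x[0] + A[0][1] * x[1] + x[0] ** 3,
            A[1][0] * x[0] + A[1][1] * x[1] + x[1] ** 3)


def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])


def check_lipschitz(t, samples=2000, seed=1):
    """Largest observed ratio |y1 - y2| / |z1 - z2| with z = (1 - t) x + t y."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x1 = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        x2 = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        y1, y2 = grad_phi(x1), grad_phi(x2)
        z1 = tuple((1 - t) * a + t * b for a, b in zip(x1, y1))
        z2 = tuple((1 - t) * a + t * b for a, b in zip(x2, y2))
        dz = dist(z1, z2)
        if dz > 0:
            worst = max(worst, dist(y1, y2) / dz)
    return worst


for t in (0.1, 0.5, 0.9):
    print(t, check_lipschitz(t), "<=", 1 / t)
```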
In order to have that the optimal transport from the endpoint $\mu_0$ of the geodesic is given by a map, one has to assume something on the measure $\mu_0$ (as we already saw in the Euclidean case, Theorem 1.34). One sufficient condition for the existence of optimal transport maps in nonbranching spaces is absolute continuity of the starting measure $\mu_0$ with respect to a nice enough reference measure $m$ on $X$. This is due to Cavalletti and Huesmann [2], following the idea of Gigli [3].

Theorem 2.17. Let $(X,d)$ be a proper, nonbranching metric space that supports a measure $m$ such that for every compact set $K \subset X$ there exist a measurable function $f\colon [0,1] \to [0,1]$ with $\limsup_{t\to0} f(t) > \frac12$ and a positive $\delta \le 1$ such that
$$m(A_{t,x}) \ge f(t)\,m(A) \quad\text{for all } 0 \le t \le \delta, \text{ all compact } A \subset K \text{ and all } x \in K, \tag{2.1}$$
where $A_{t,x}$ is defined as
$$A_{t,x} := e_t(\{\gamma \in \mathrm{Geo}(X) : \gamma_0 \in A,\ \gamma_1 = x\}).$$
Then for any $\mu_0, \mu_1 \in \mathcal{P}_2(X)$ with $\mu_0 \ll m$ there exists a unique $\sigma \in \operatorname{Opt}(\mu_0,\mu_1)$, for the cost $c(x,y) = d^2(x,y)$, and this plan is induced by a map.

The result of Cavalletti and Huesmann holds for any cost function $c(x,y) = h(d(x,y))$ with $h$ strictly convex and nondecreasing.

The assumption (2.1) on the reference measure means that when we contract a set towards a point, its measure does not shrink to zero too fast. This is used to push the nonbranching of geodesics to nonbranching at time zero. In the Euclidean space $\mathbb{R}^n$ with the Lebesgue measure the control function is simply $f(t) = (1-t)^n$, since the set $A_{t,x}$ is a scaling of $A$ by a factor $(1-t)$.

A sketch of a proof of Theorem 2.17. Let $\mu_0, \mu_1$ and $\sigma \in \operatorname{Opt}(\mu_0,\mu_1)$ be fixed, let $(\varphi, \varphi^c)$ be the Kantorovich potentials associated with $\sigma$, and let
$$\Gamma := \{(x,y) \in X \times X : \varphi(x) + \varphi^c(y) = c(x,y)\},$$
so that in particular $\sigma(\Gamma) = 1$. The first step is to prove that the control (2.1) on the contractions towards points can be transferred to more general targets.
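In more detail, for any compact $\Lambda \subset X \times X$, $t \in [0,1]$ and compact $A \subset X$ one defines
$$A_{t,\Lambda} := e_t\big((e_0,e_1)^{-1}((A \times X) \cap \Lambda)\big) \quad\text{and}\quad \hat\Lambda := (P_1(\Lambda) \times P_2(\Lambda)) \cap \Gamma.$$
Then one shows (and this is the most technical part of the proof) that for any compact $\Lambda \subset \Gamma$
$$m(A_{t,\hat\Lambda}) \ge f(t)\,m(A) \quad\text{for all } t \in [0,\delta] \text{ and } A \subset P_1(\Lambda). \tag{2.2}$$
Next one shows that for any $\Lambda_1, \Lambda_2 \subset \Gamma$ such that $P_1(\Lambda_1) = P_1(\Lambda_2)$ and $P_2(\Lambda_1) \cap P_2(\Lambda_2) = \emptyset$ it necessarily holds that $m(P_1(\Lambda_1)) = m(P_1(\Lambda_2)) = 0$. This is seen by taking $A := P_1(\Lambda_1) = P_1(\Lambda_2)$ and observing that, since $A_{t,\Lambda_i}$, $i = 1, 2$, converge to $A$ in the Hausdorff topology as $t \to 0$, one has
$$\begin{aligned}
m(A) = \lim_{\epsilon\to0} m(A^\epsilon) &\ge \limsup_{t\to0} m(A_{t,\Lambda_1} \cup A_{t,\Lambda_2})\\
&= \limsup_{t\to0} \big(m(A_{t,\Lambda_1}) + m(A_{t,\Lambda_2})\big) \ge \limsup_{t\to0} 2f(t)\,m(A),
\end{aligned}$$
where $A^\epsilon := \{x \in X : d(x,A) \le \epsilon\}$ and the equality in the second line follows from the nonbranching assumption. Since $\limsup_{t\to0} 2f(t) > 1$, this is impossible unless $m(A) = 0$.

Assume then that $\sigma$ is not given by a map. Then there exist a continuous $T\colon X \to X$ and a compact $E \subset X$ such that $\{(x,T(x)) : x \in E\} \subset \Gamma$, for every $x \in E$ there exists $y \in X$ such that $y \neq T(x)$ and $(x,y) \in \Gamma$, and $\mu_0(E) > 0$; in particular $m(E) > 0$, as $\mu_0 \ll m$. Since
$$\{(x,y) \in \Gamma : x \in E,\ y \neq T(x)\} = \bigcup_{n=1}^\infty \Big\{(x,y) \in \Gamma : x \in E,\ d(y,T(x)) \ge \frac1n\Big\}$$
and the projection of the left-hand side to the first factor is $E$, there exists $n \in \mathbb{N}$ such that $m(P_1(\Lambda)) > 0$ with
$$\Lambda := \Big\{(x,y) \in \Gamma : x \in E,\ d(y,T(x)) \ge \frac1n\Big\}.$$
Since $T$ is uniformly continuous on the compact set $E$, there exists $\delta > 0$ such that $d(T(x),T(y)) < \frac{1}{2n}$ for all $x, y \in E$ with $d(x,y) \le 2\delta$. Let us take $\bar x \in P_1(\Lambda)$ such that
$$m(P_1(\Lambda) \cap \bar B(\bar x,\delta)) > 0.$$
Now, defining
$$\Lambda_1 := \{(x,T(x)) : x \in P_1(\Lambda) \cap \bar B(\bar x,\delta)\}, \qquad \Lambda_2 := \{(x,y) \in \Lambda : x \in \bar B(\bar x,\delta)\},$$
we have $\Lambda_1, \Lambda_2 \subset \Gamma$ with $P_1(\Lambda_1) = P_1(\Lambda_2)$ and $m(P_1(\Lambda_1)) = m(P_1(\Lambda) \cap \bar B(\bar x,\delta)) > 0$. On the other hand, for any $y \in P_2(\Lambda_2)$ there is $w \in E \cap \bar B(\bar x,\delta)$ such that $d(y,T(w)) \ge \frac1n$. Hence for any $z \in E \cap \bar B(\bar x,\delta)$ we have
$$d(y,T(z)) \ge d(y,T(w)) - d(T(w),T(z)) > \frac1n - \frac{1}{2n} = \frac{1}{2n},$$
showing that $P_2(\Lambda_1) \cap P_2(\Lambda_2) = \emptyset$. This contradicts the previous step.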
Thus $\sigma$ is concentrated on the graph of a function, from which the uniqueness follows as in the previous proofs.

Proper metric measure spaces satisfying the assumption of Theorem 2.17 are necessarily locally doubling: for every $x \in X$ there exist a radius $R > 0$ and a constant $C > 0$ such that
$$m(B(y,2r)) \le C\,m(B(y,r)) \quad\text{for all } y \in B(x,R) \text{ and } 0 < r < R.$$

Problem 1. What properties of a metric measure space $(X,d,m)$ imply the condition (2.1)? In which geodesic metric spaces $(X,d)$ can one find a reference measure $m$ satisfying the condition? For some sufficient conditions, see [2, Remark 3.5].

3. Optimal transport and curvature

Optimal transportation has been used to give definitions of Ricci curvature lower bounds in metric measure spaces. Before going into Ricci curvature, let us briefly visit Alexandrov spaces, i.e. metric spaces with (generalized) sectional curvature bounds. Curvature bounds in the sense of Alexandrov can be stated by comparing triangles to triangles in model spaces. Here we only define sectional curvature bounds from above and from below by zero.

Definition 3.1. A geodesic space $(X,d)$ is said to be positively curved in the sense of Alexandrov if for every $\gamma \in \mathrm{Geo}(X)$ and every $z \in X$ we have
$$d^2(\gamma_t,z) \ge (1-t)d^2(\gamma_0,z) + td^2(\gamma_1,z) - t(1-t)d^2(\gamma_0,\gamma_1) \quad\text{for all } t \in [0,1]. \tag{3.1}$$
If the reverse inequality always holds, the space is called non-positively curved (in the sense of Alexandrov). Non-positively curved spaces are also called CAT(0) spaces, and complete ones are called Hadamard spaces. If $(X,d)$ is both positively and non-positively curved, it is (isometric to) a convex subset of a Hilbert space.

Theorem 3.2. Assume $(X,d)$ is positively curved. Then $(\mathcal{P}_2(X), W_2)$ is positively curved.

Proof. Let $(\mu_t) \in \mathrm{Geo}(\mathcal{P}_2(X))$ and let $\pi \in \mathcal{P}(\mathrm{Geo}(X))$ be such that $\mu_t = (e_t)_\sharp\pi$ for all $t \in [0,1]$ (see Theorem 2.11). Fix $t \in [0,1]$, $\nu \in \mathcal{P}_2(X)$ and $\sigma \in \operatorname{Opt}(\mu_t,\nu)$.
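Gluing $\pi$ and $\sigma$ together at $\mu_t$ (by first disintegrating both with respect to $\mu_t$, taking the products of the disintegrated measures, and then integrating) we obtain $\alpha \in P(\mathrm{Geo}(X) \times X)$ such that $P^1_\sharp \alpha = \pi$ and $(e_t, P^2)_\sharp \alpha = \sigma$, where $P^1(\gamma, x) = \gamma$, $P^2(\gamma, x) = x$ and $e_t(\gamma, x) = \gamma_t$. Since $(e_0, P^2)_\sharp \alpha \in A(\mu_0, \nu)$ and $(e_1, P^2)_\sharp \alpha \in A(\mu_1, \nu)$, we get
$$\begin{aligned}
W_2^2(\mu_t, \nu) &= \int d^2(\gamma_t, x)\, d\alpha(\gamma, x) \\
&\ge \int \big[(1-t)\, d^2(\gamma_0, x) + t\, d^2(\gamma_1, x) - t(1-t)\, d^2(\gamma_0, \gamma_1)\big]\, d\alpha(\gamma, x) \\
&= (1-t) \int d^2(\gamma_0, x)\, d\alpha(\gamma, x) + t \int d^2(\gamma_1, x)\, d\alpha(\gamma, x) - t(1-t) \int d^2(\gamma_0, \gamma_1)\, d\alpha(\gamma, x) \\
&\ge (1-t)\, W_2^2(\mu_0, \nu) + t\, W_2^2(\mu_1, \nu) - t(1-t)\, W_2^2(\mu_0, \mu_1),
\end{aligned}$$
where in the last step we also used that $\int d^2(\gamma_0, \gamma_1)\, d\alpha = \int d^2(\gamma_0, \gamma_1)\, d\pi = W_2^2(\mu_0, \mu_1)$, because $(e_0, e_1)_\sharp \pi$ is an optimal plan between $\mu_0$ and $\mu_1$. This gives the claim.

Let us next observe that upper bounds on the (sectional) curvature of $(X, d)$ do not carry over to upper curvature bounds on $(P_2(X), W_2)$.

Example 3.3. Let $X = \mathbb{R}^2$ with $d$ the Euclidean distance. Then $(X, d)$ is non-positively (and positively) curved. Still, $(P_2(\mathbb{R}^2), W_2)$ is not non-positively curved. To see this, define
$$\mu_0 := \tfrac12(\delta_{(1,1)} + \delta_{(5,3)}), \quad \mu_1 := \tfrac12(\delta_{(-1,1)} + \delta_{(-5,3)}), \quad \nu := \tfrac12(\delta_{(0,0)} + \delta_{(0,-4)}).$$
Then $A(\mu_0, \mu_1) = \{a\sigma_1 + (1-a)\sigma_2 : a \in [0, 1]\}$ with
$$\sigma_1 = \tfrac12\big(\delta_{((1,1),(-1,1))} + \delta_{((5,3),(-5,3))}\big) \quad\text{and}\quad \sigma_2 = \tfrac12\big(\delta_{((1,1),(-5,3))} + \delta_{((5,3),(-1,1))}\big).$$
Since
$$\int d^2(x, y)\, d(a\sigma_1 + (1-a)\sigma_2)(x, y) = a\,\tfrac12(2^2 + 10^2) + (1-a)\,\tfrac12(6^2 + 2^2 + 6^2 + 2^2) = a \cdot 52 + (1-a) \cdot 40,$$
we have $W_2^2(\mu_0, \mu_1) = 40$, attained only by $\sigma_2$. Similarly one can calculate that $W_2^2(\mu_0, \nu) = W_2^2(\mu_1, \nu) = 30$. From the above computation we also see that the unique geodesic $(\mu_t)$ connecting $\mu_0$ to $\mu_1$ is
$$\mu_t = \tfrac12\big(\delta_{(1-6t,\,1+2t)} + \delta_{(5-6t,\,3-2t)}\big).$$
In particular $\mu_{1/2} = \frac12(\delta_{(-2,2)} + \delta_{(2,2)})$; each of its atoms is at squared distance $8$ from $(0,0)$ and $40$ from $(0,-4)$, so every coupling between $\mu_{1/2}$ and $\nu$ has cost $\frac12(8 + 40) = 24$. But now
$$24 = W_2^2(\mu_{1/2}, \nu) > \frac{30}{2} + \frac{30}{2} - \frac{40}{4} = 20.$$
Hence $(P_2(\mathbb{R}^2), W_2)$ is not non-positively curved.

3.1. Ricci curvature lower bounds. Let us then turn to Ricci curvature lower bounds. Let $R$ denote the Riemann curvature tensor on a Riemannian manifold $M$, and let $x \in M$ and $u, v \in T_xM$.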
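Returning to Example 3.3, its numbers are small enough to double-check by brute force. The following is a minimal Python sketch (an illustration only; the helper names `sqdist`, `w2_squared_uniform` and `comparison_defect` are hypothetical). It relies on the fact that for two uniform measures with the same number $n$ of atoms some optimal coupling is induced by a permutation (Birkhoff-von Neumann theorem), so $W_2^2$ can be found by trying all $n!$ pairings. It also checks numerically that in $\mathbb{R}^2$ the inequality (3.1) holds with equality, which is why $\mathbb{R}^2$ is both positively and non-positively curved.

```python
import itertools
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points of R^2."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def w2_squared_uniform(xs, ys):
    """W_2^2 between (1/n) sum_i delta_{xs[i]} and (1/n) sum_j delta_{ys[j]}.

    Some optimal coupling of two uniform measures with the same number n of
    atoms is induced by a permutation, so for tiny n we try all n! pairings.
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("both measures need the same number of atoms")
    return min(
        sum(sqdist(xs[i], ys[perm[i]]) for i in range(n)) / n
        for perm in itertools.permutations(range(n))
    )


def comparison_defect(a, b, z, t):
    """Left-hand side minus right-hand side of (3.1) along the segment from a to b."""
    gamma_t = ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
    rhs = (1 - t) * sqdist(a, z) + t * sqdist(b, z) - t * (1 - t) * sqdist(a, b)
    return sqdist(gamma_t, z) - rhs


# In R^2 the inequality (3.1) is an identity, up to rounding errors.
rng = random.Random(0)


def random_point():
    return (rng.uniform(-5, 5), rng.uniform(-5, 5))


max_defect = max(
    abs(comparison_defect(random_point(), random_point(), random_point(), rng.random()))
    for _ in range(1000)
)

# The measures of Example 3.3, each listed by its two equally weighted atoms.
mu0 = [(1, 1), (5, 3)]
mu1 = [(-1, 1), (-5, 3)]
nu = [(0, 0), (0, -4)]
t = 0.5
mu_half = [(1 - 6 * t, 1 + 2 * t), (5 - 6 * t, 3 - 2 * t)]

w01 = w2_squared_uniform(mu0, mu1)
w0n = w2_squared_uniform(mu0, nu)
w1n = w2_squared_uniform(mu1, nu)
w_half = w2_squared_uniform(mu_half, nu)
# Right-hand side of (3.1) at t = 1/2 with z = nu, computed in (P_2(R^2), W_2).
bound = (1 - t) * w0n + t * w1n - t * (1 - t) * w01

print(w01, w0n, w1n, w_half, bound)
```

Running it prints `40.0 30.0 30.0 24.0 20.0`, so the midpoint violates the reverse of (3.1) in $(P_2(\mathbb{R}^2), W_2)$. Brute force over permutations is only meant for such tiny examples; its cost grows like $n!$.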
The Ricci curvature $\mathrm{Ric}(u, v) \in \mathbb{R}$ is defined as
$$\mathrm{Ric}(u, v) := \sum_i \langle R(u, e_i)v, e_i \rangle,$$
where $(e_i)$ is an orthonormal basis of $T_xM$. The manifold $M$ is said to have Ricci curvature bounded below by $\lambda \in \mathbb{R}$ if $\mathrm{Ric}(u, u) \ge \lambda |u|^2$ for every $x \in M$ and $u \in T_xM$.

One geometric interpretation of Ricci curvature is in terms of infinitesimal volume comparison. Let $x \in M$ and let $B \subset T_xM$ be a small neighbourhood of the origin. Since $\exp_x\colon B \to M$ is a smooth diffeomorphism onto its image, the density
$$\rho = \frac{d(\exp_x)_\sharp \mathcal{L}^d}{d\mathrm{Vol}},$$
where $\mathcal{L}^d$ is the Lebesgue measure on the tangent space $T_xM$ and $\mathrm{Vol}$ is the volume measure on $M$, is also smooth. For $u \in B$ this density has the Taylor expansion
$$\rho(\exp_x(u)) = 1 + \frac16 \mathrm{Ric}(u, u) + o(|u|^2).$$
Hence, whereas sectional curvature bounds tell us about the tendency of geodesics to converge (in positive curvature) or diverge (in negative curvature), Ricci curvature bounds tell us how volume elements shrink or grow. This effect of Ricci curvature can also be studied using optimal mass transport.

Notice that in the above interpretation of Ricci curvature we compared with the volume measure of the manifold. Unlike in the definition of (non-)positively curved spaces, such a reference measure plays a crucial role here. We will usually write the reference measure as $m$.

Next we look at how Ricci curvature lower bounds can be formulated using optimal transport. Before this, let us list properties we would like the formulation to have. It should at least:
(1) agree with the Ricci curvature lower bound on Riemannian manifolds,
(2) be stable under suitable convergence,
(3) make sense in as general a setting as possible, and
(4) imply useful analytic and geometric properties.
The point (1) is more or less obvious.
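Returning to the volume interpretation of Ricci curvature, the Taylor expansion of $\rho$ can be checked on the unit sphere $S^2$. In geodesic polar coordinates around $x$ the area element is $\sin r\, dr\, d\theta$, while Lebesgue measure on $T_xS^2$ is $r\, dr\, d\theta$, so $\rho(\exp_x(u)) = |u| / \sin|u|$. The Python sketch below is an illustration only: the function name `sphere_density` is hypothetical, and we assume the sign convention under which the round sphere has $\mathrm{Ric}(u, u) = |u|^2$. It evaluates $(\rho - 1)/|u|^2$ for shrinking $|u|$, which should tend to the coefficient of $\mathrm{Ric}(u, u)$ in the expansion.

```python
import math


def sphere_density(r):
    """Density of (exp_x)_# L^2 with respect to area on the unit sphere S^2,
    at a point at geodesic distance r from x (0 < r < pi).

    Area element in geodesic polar coordinates: sin(r) dr dtheta.
    Lebesgue measure on the tangent plane:       r dr dtheta.
    """
    return r / math.sin(r)


# On the unit sphere Ric(u, u) = |u|^2 = r^2, so an expansion
# rho(exp_x(u)) = 1 + c * Ric(u, u) + o(|u|^2) forces (rho - 1) / r^2 -> c.
for r in (1e-1, 1e-2, 1e-3):
    ratio = (sphere_density(r) - 1) / r ** 2
    print(f"r = {r:g}: (rho - 1) / r^2 = {ratio:.6f}")
```

The printed ratios approach $1/6$, in line with $r/\sin r = 1 + r^2/6 + O(r^4)$. Since $\rho > 1$, a region of the tangent plane is mapped by $\exp_x$ onto a region of smaller area, which is the shrinking of volume elements under positive Ricci curvature.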
The requirement of stability is motivated, for example, by the fact that the set of Riemannian manifolds with a uniform Ricci curvature lower bound and upper bounds on dimension and diameter is precompact in a natural topology, the measured Gromov-Hausdorff topology. We will soon look at this topology. Points (3) and (4) pull in opposite directions: the more properties we require, the more restrictive the definition has to be. Let us list some of the properties one could ask for:
• For the classical "analysis on metric spaces": the doubling property (at least locally) for the reference measure $m$, meaning that there exists a constant $C$ such that $m(B(x, 2r)) \le C\, m(B(x, r))$ for every $x \in X$ and $r > 0$, and the local Poincaré inequality, meaning that there exist constants $C$ and $\lambda$ such that for any Lipschitz function $f\colon X \to \mathbb{R}$ and any ball $B = B(x, r) \subset X$ we have
$$\frac{1}{m(B)} \int_B |f(x) - \langle f \rangle_B|\, dm(x) \le C r\, \frac{1}{m(\lambda B)} \int_{\lambda B} |\nabla f(x)|\, dm(x), \qquad\text{where } \langle f \rangle_B := \frac{1}{m(B)} \int_B f(x)\, dm(x).$$
• Locality: if the Ricci curvature lower bounds hold locally, they should hold globally (after all, Ricci curvature is an infinitesimal notion in the classical setting).
• Restriction: a geodesically convex subset should have the same lower bound as the initial space.
• Tensorization: if $X$ and $Y$ have a Ricci curvature lower bound, then so should $X \times Y$.
• Variants of the classical volume comparisons: the Bishop-Gromov volume comparison and the Brunn-Minkowski inequality (both will be defined later).
• Estimates on the size of the cut locus and on possible branching of geodesics.
• Functional inequalities: Sobolev inequality (recall Theorem 1.37), log-Sobolev inequality, HWI inequality.
• Geometric rigidity results: splitting theorem, maximal diameter theorem, ...
• ...
Let us then look at the different topologies used on the collection of metric measure spaces $\{(X, d, m)\}$.

Definition 3.4. Let $(Z, D)$ be a metric space and $X, Y \subset Z$.
The Hausdorff distance between $X$ and $Y$ is
$$d_H(X, Y) = \max\Big\{\sup_{x \in X} \mathrm{dist}(x, Y),\ \sup_{y \in Y} \mathrm{dist}(y, X)\Big\}.$$
This notion can be generalized to metric spaces $(X, d_X)$ and $(Y, d_Y)$ by defining the Gromov-Hausdorff distance between $X$ and $Y$ as
$$d_{GH}(X, Y) = \inf d_H(f(X), g(Y)),$$
where the infimum is over all metric spaces $Z$ and isometric embeddings $f\colon X \to Z$ and $g\colon Y \to Z$.

For non-compact spaces it often makes more sense to consider pointed Gromov-Hausdorff convergence. A sequence $(X_i, d_i, p_i)_{i=1}^\infty$ of pointed metric spaces (i.e. metric spaces $(X_i, d_i)$ with chosen points $p_i \in X_i$) converges in the pointed Gromov-Hausdorff sense to a pointed metric space $(X_\infty, d_\infty, p_\infty)$ if there exist a metric space $(Z, d_Z)$ and isometric embeddings $f_i\colon X_i \to Z$ for $i \in \mathbb{N} \cup \{\infty\}$ such that for every $\epsilon > 0$ and $R > 0$ there exists $i_0 \in \mathbb{N}$ such that for every $i > i_0$
$$f_\infty(B(p_\infty, R)) \subset \big(f_i(B(p_i, R))\big)^\epsilon \quad\text{and}\quad f_i(B(p_i, R)) \subset \big(f_\infty(B(p_\infty, R))\big)^\epsilon,$$
where $(\cdot)^\epsilon$ denotes the closed $\epsilon$-neighbourhood in $Z$. Furthermore, if the spaces $(X_i, d_i)$ are equipped with measures $m_i$, we may consider pointed measured Gromov-Hausdorff convergence, where in addition to pointed Gromov-Hausdorff convergence we require, with the above notation, that $(f_i)_\sharp m_i$ converges weakly* to $(f_\infty)_\sharp m_\infty$ (sometimes the convergence is required to be weak, i.e. narrow).

Let us also give another notion of convergence for metric measure spaces, called the D-distance. It was introduced by Sturm in [4]. From now on we will assume all reference measures to be probability measures with finite second moment. In order to define the D-distance we need the following definition.

Definition 3.5. Given two metric measure spaces $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ (with $m_X \in P_2(X)$ and $m_Y \in P_2(Y)$), we consider the product space $(X \times Y, D_{XY})$, where
$$D_{XY}((x_1, y_1), (x_2, y_2)) := \sqrt{d_X^2(x_1, x_2) + d_Y^2(y_1, y_2)}.$$
A couple $(d, \sigma)$ is called an admissible coupling between $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ if
• $d$ is a pseudo distance on $\mathrm{spt}(m_X) \sqcup \mathrm{spt}(m_Y)$ which equals $d_X$ when restricted to $\mathrm{spt}(m_X) \times \mathrm{spt}(m_X)$ and equals $d_Y$ when restricted to $\mathrm{spt}(m_Y) \times \mathrm{spt}(m_Y)$. (Recall that a pseudo distance is the same as a distance except that $d(x, y) = 0$ does not necessarily imply $x = y$.)
• $\sigma \in P(\mathrm{spt}(m_X) \times \mathrm{spt}(m_Y))$ is such that $(P^1)_\sharp \sigma = m_X$ and $(P^2)_\sharp \sigma = m_Y$. Here the Borel structure on $P(\mathrm{spt}(m_X) \times \mathrm{spt}(m_Y))$ is the one given by the distance $D_{XY}$.
We write $A((X, d_X, m_X), (Y, d_Y, m_Y))$ for the set of admissible couplings.

With the notion of couplings of metric measure spaces we can define the D-distance in the same way as the $W_2$-distance.

Definition 3.6. The distance $D$ between metric measure spaces $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ is defined as
$$D((X, d_X, m_X), (Y, d_Y, m_Y)) := \inf_{(d, \sigma)} C(d, \sigma), \tag{3.2}$$
where the infimum is over all $(d, \sigma) \in A((X, d_X, m_X), (Y, d_Y, m_Y))$ and the cost $C(d, \sigma)$ is defined as
$$C(d, \sigma) := \Big(\int_{\mathrm{spt}(m_X) \times \mathrm{spt}(m_Y)} d^2(x, y)\, d\sigma(x, y)\Big)^{\frac12}.$$
Notice that whereas measured (pointed) Gromov-Hausdorff convergence requires convergence of both the spaces and the measures, the D-distance only sees the supports of the measures $m_X$ and $m_Y$. For example $D((X, d_X, \delta_x), (Y, d_Y, \delta_y)) = 0$ regardless of what the spaces $X$ and $Y$ are.

Remark 3.7. In the definition of the D-distance it is enough to consider couplings $(d, \sigma)$ where $d$ is a genuine distance. Indeed, given $(d, \sigma) \in A((X, d_X, m_X), (Y, d_Y, m_Y))$ and $\epsilon > 0$ we can consider
$$d_\epsilon(z_1, z_2) = \begin{cases} d(z_1, z_2), & \text{if } (z_1, z_2) \in (X \times X) \cup (Y \times Y), \\ d(z_1, z_2) + \epsilon, & \text{if } (z_1, z_2) \in (X \times Y) \cup (Y \times X), \end{cases}$$
which is a distance, and by Minkowski's inequality $C(d_\epsilon, \sigma) \le C(d, \sigma) + \epsilon$.

Proposition 3.8. There always exists an optimal coupling of metric measure spaces, i.e. an admissible coupling realizing the infimum in (3.2).

Proof.
As in the proof of existence of minimizers for the Kantorovich problem, we need to show that the cost $C$ is suitably lower semicontinuous and that $A((X, d_X, m_X), (Y, d_Y, m_Y))$ has suitable compactness properties. The weak compactness of the set of measures $\{\sigma : (d, \sigma) \in A((X, d_X, m_X), (Y, d_Y, m_Y))\}$ is clear, since this set is tight by the tightness of $m_X$ and $m_Y$. For the pseudo distances we will only use the fact that along a minimizing sequence the distances between fixed points stay bounded.

Let $(X, d_X, m_X)$, $(Y, d_Y, m_Y)$ be metric measure spaces as in the previous definition. Suppose that $(d_i, \sigma_i)_{i=1}^\infty \subset A((X, d_X, m_X), (Y, d_Y, m_Y))$ is a sequence such that $C(d_i, \sigma_i) \to D((X, d_X, m_X), (Y, d_Y, m_Y))$. Passing to a subsequence, we may assume that the $\sigma_i$ converge weakly to some $\sigma$. In order to obtain a converging subsequence of the $d_i$ we take a dense sequence $(x_j, y_j)_{j=1}^\infty \subset \mathrm{spt}(m_X) \times \mathrm{spt}(m_Y)$. If there were a subsequence of the $d_i$ such that $d_i(x_j, y_j) \to \infty$ for some $j$, then $C(d_i, \sigma_i) \to \infty$ along it. Thus these sequences stay bounded, and we can take a converging subsequence for $d_i(x_1, y_1)$, a further converging subsequence for $d_i(x_2, y_2)$, and so on, and finally the diagonal sequence, which converges for all $(x_j, y_j)$. By continuity this defines a pseudo distance $d$ on the whole space $\mathrm{spt}(m_X) \sqcup \mathrm{spt}(m_Y)$. It then follows that $C(d, \sigma) = D((X, d_X, m_X), (Y, d_Y, m_Y))$.

We denote the set of optimal couplings (i.e. the ones realizing the infimum in (3.2)) between $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ by $\mathrm{Opt}((X, d_X, m_X), (Y, d_Y, m_Y))$.

Definition 3.9. We call two metric measure spaces $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ isomorphic if there exists a bijective isometry $f\colon \mathrm{spt}(m_X) \to \mathrm{spt}(m_Y)$ such that $f_\sharp m_X = m_Y$. We denote by $\mathbb{X}$ the set of isomorphism classes of complete and separable metric measure spaces $(X, d, m)$ with $m \in P_2(X)$.

Proposition 3.10. $(\mathbb{X}, D)$ is a metric space.

Proof. Let us start by showing that $D$ is a distance.
First of all, $D$ is clearly symmetric, and for isomorphic spaces $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ we have $D((X, d_X, m_X), (Y, d_Y, m_Y)) = 0$. Once the triangle inequality is established, it also follows that $D$ is always finite: compare both spaces with a one-point space carrying a Dirac mass and recall that $m \in P_2(X)$ for all spaces in $\mathbb{X}$.

In order to see the triangle inequality, take $(X_i, d_i, m_i) \in \mathbb{X}$, $i = 1, 2, 3$. We may assume that $X_i = \mathrm{spt}(m_i)$ for $i = 1, 2, 3$. For every $\epsilon > 0$ there exist $(d_{ij}, \sigma_{ij}) \in A((X_i, d_i, m_i), (X_j, d_j, m_j))$ with $(i, j) = (1, 2), (2, 3)$ such that
$$C(d_{ij}, \sigma_{ij}) \le D((X_i, d_i, m_i), (X_j, d_j, m_j)) + \epsilon.$$
Moreover, by Remark 3.7 we may assume the $d_{ij}$ to be distances. Let us then define the (pseudo) distance $d_{123}$ on $X_1 \sqcup X_2 \sqcup X_3$ by setting
$$d_{123}(x, y) := \begin{cases} d_{12}(x, y), & \text{if } x, y \in X_1 \sqcup X_2, \\ d_{23}(x, y), & \text{if } x, y \in X_2 \sqcup X_3, \\ \inf_{z \in X_2} \big(d_{12}(x, z) + d_{23}(z, y)\big), & \text{if } x \in X_1,\ y \in X_3, \\ \inf_{z \in X_2} \big(d_{23}(x, z) + d_{12}(z, y)\big), & \text{if } x \in X_3,\ y \in X_1. \end{cases}$$
This gives a good competitor (pseudo) distance for computing $D((X_1, d_1, m_1), (X_3, d_3, m_3))$. We still need a good $\sigma$ to accompany it; this is given by gluing $\sigma_{12}$ and $\sigma_{23}$ along $m_2$. Another way to conclude the triangle inequality without gluing explicitly is to consider the space $(P_2(X_1 \sqcup X_2 \sqcup X_3), W_2)$ with the distance $d_{123}$. There we get
$$\begin{aligned}
D((X_1, d_1, m_1), (X_3, d_3, m_3)) &= \inf_{(d, \sigma)} C(d, \sigma) \le W_2(m_1, m_3) \le W_2(m_1, m_2) + W_2(m_2, m_3) \\
&\le C(d_{12}, \sigma_{12}) + C(d_{23}, \sigma_{23}) \\
&\le D((X_1, d_1, m_1), (X_2, d_2, m_2)) + D((X_2, d_2, m_2), (X_3, d_3, m_3)) + 2\epsilon,
\end{aligned}$$
and since $\epsilon > 0$ was arbitrary, the triangle inequality follows.

In order to show that $D$ is a distance, we still need to verify that $D((X, d_X, m_X), (Y, d_Y, m_Y)) = 0$ implies that $X$ and $Y$ are isomorphic. Let $(d, \sigma)$ be an optimal coupling of $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$, which exists by Proposition 3.8. Then
$$C(d, \sigma) = \Big(\int_{\mathrm{spt}(m_X) \times \mathrm{spt}(m_Y)} d^2(x, y)\, d\sigma(x, y)\Big)^{\frac12} = 0.$$
Thus $d(x, y) = 0$ for $\sigma$-almost every $(x, y) \in \mathrm{spt}(m_X) \times \mathrm{spt}(m_Y)$. Since $d$ is a pseudo distance and a distance when restricted to $\mathrm{spt}(m_Y)$, for every $x \in \mathrm{spt}(m_X)$ there exists a unique $f(x) := y \in \mathrm{spt}(m_Y)$ such that $d(x, y) = 0$.
By the triangle inequality for $d$, the map $f$ is an isometry. Now $\sigma = (\mathrm{id}, f)_\sharp m_X$, so in particular $m_Y = f_\sharp m_X$. Thus $X$ and $Y$ are isomorphic.

Proposition 3.11. $(\mathbb{X}, D)$ is complete and separable.

Proof. Let us start with completeness. Let $(X_i, d_i, m_i)_{i=1}^\infty$ be a Cauchy sequence in $(\mathbb{X}, D)$. We take a subsequence such that
$$D((X_{i_k}, d_{i_k}, m_{i_k}), (X_{i_{k+1}}, d_{i_{k+1}}, m_{i_{k+1}})) \le 2^{-k-1} \quad \text{for all } k \in \mathbb{N}.$$
There exist $(\hat d_k, \sigma_k) \in A((X_{i_k}, d_{i_k}, m_{i_k}), (X_{i_{k+1}}, d_{i_{k+1}}, m_{i_{k+1}}))$ with $\hat d_k$ a distance such that $C(\hat d_k, \sigma_k) \le 2^{-k}$. Now we recursively attach the spaces to each other by defining $(X'_1, d'_1) := (X_{i_1}, d_{i_1})$ and $X'_k := (X_{i_k} \sqcup X'_{k-1})/\!\sim$, with $x \sim y$ if $d'_k(x, y) = 0$, where
$$d'_k(x, y) := \begin{cases} d'_{k-1}(x, y), & \text{if } x, y \in X'_{k-1}, \\ \hat d_{k-1}(x, y), & \text{if } x, y \in X_{i_{k-1}} \sqcup X_{i_k}, \\ \inf_{z \in X_{i_{k-1}}} \big(d'_{k-1}(x, z) + \hat d_{k-1}(z, y)\big), & \text{if } x \in X'_{k-1},\ y \in X_{i_k}, \end{cases}$$
extended symmetrically. This way we obtain a sequence of nested metric spaces $(X'_k, d'_k)$ with $X_{i_k} \subset X'_k \subset X'_{k+1}$. We define $X' := \bigcup_{k=1}^\infty X'_k$ and $d' := \lim_{k \to \infty} d'_k$, a distance on $X'$. Let $(X, d)$ be the completion of $(X', d')$.

Now all the measures $m_{i_k}$ can be naturally embedded into $(X, d)$, and for them we have (in the $W_2$-distance defined from $d$)
$$W_2(m_{i_k}, m_{i_{k+1}}) \le C(\hat d_k, \sigma_k) \le 2^{-k}.$$
Thus $(m_{i_k})_k$ is Cauchy in $(P_2(X), W_2)$. Since this space is complete, there exists a measure $m \in P_2(X)$ such that
$$D((X_{i_k}, d_{i_k}, m_{i_k}), (X, d, m)) \le W_2(m_{i_k}, m) \to 0.$$
A Cauchy sequence with a convergent subsequence converges, so $(\mathbb{X}, D)$ is complete.

The space $(\mathbb{X}, D)$ is separable, since the countable set
$$\mathbb{X}_{\mathrm{disc}} := \Big\{(X, d, m) \in \mathbb{X} : X = \{x_1, \dots, x_n\},\ d(x_i, x_j) \in \mathbb{Q},\ m = \sum_i a_i \delta_{x_i},\ a_i \in \mathbb{Q},\ n \in \mathbb{N}\Big\}$$
is dense in $(\mathbb{X}, D)$.

Since we will often single out the measures that are absolutely continuous with respect to the reference measure of a given metric measure space $(X, d, m)$, we denote
$$P^a(X) := \{\mu \in P(X) : \mu \ll m\} \quad\text{and}\quad P_p^a(X) := \{\mu \in P_p(X) : \mu \ll m\}.$$
(Recall that $\mu_1$ is absolutely continuous with respect to $\mu_2$, denoted $\mu_1 \ll \mu_2$, if $\mu_2(A) = 0 \Rightarrow \mu_1(A) = 0$ for all Borel $A \subset X$, and that $\mu_1$ and $\mu_2$ are mutually singular, denoted $\mu_1 \perp \mu_2$, if there exists a set $S \in \mathcal{B}(X)$ such that $\mu_1(S) = \mu_2(X \setminus S) = 0$.)

Given a coupling $(d, \sigma)$ between metric measure spaces $(X, d_X, m_X)$ and $(Y, d_Y, m_Y)$ we can associate with it a map $\sigma_\sharp\colon P_2^a(X) \to P_2^a(Y)$ defined by
$$\mu = \rho\, m_X \mapsto \sigma_\sharp \mu := \eta\, m_Y, \qquad \text{where } \eta(y) := \int \rho(x)\, d\sigma_y(x),$$
with $\{\sigma_y\}$ the disintegration of $\sigma$ with respect to the projection onto $Y$. Similarly we can define $\sigma_\sharp^{-1}\colon P_2^a(Y) \to P_2^a(X)$.

Now that we have introduced a suitable topology (or two) for the convergence of metric measure spaces, let us return to Ricci curvature lower bounds. We would like the definitions of Ricci curvature lower bounds to be stable under convergence in the D-distance. Stability will be obtained by defining the bounds through inequalities for semicontinuous functionals on the space of probability measures. Let us next introduce the functionals we will consider.

Let $u\colon [0, \infty) \to \mathbb{R}$ be convex and continuous with $u(0) = 0$ and define
$$u'(\infty) := \lim_{z \to \infty} \frac{u(z)}{z}.$$
With these we can define a functional $\mathcal{E}\colon P(X) \times P(X) \to \mathbb{R} \cup \{+\infty\}$ by setting
$$\mathcal{E}(\mu|\nu) := \int u(\rho)\, d\nu + u'(\infty)\, \mu^s(X),$$
where $\mu = \rho\nu + \mu^s$ is the decomposition of $\mu$ into the part $\rho\nu$ absolutely continuous with respect to $\nu$ and the part $\mu^s$ singular with respect to $\nu$.

The most important functionals for our purposes are the entropy functionals $\mathcal{E}_N$ for $N \in (1, \infty]$. They are given by the functions $u_N$ defined as
$$u_N(z) := N\big(z - z^{1 - \frac1N}\big) \ \text{ when } N < \infty, \qquad u_\infty(z) := z \log(z)$$
(with the convention $0 \log 0 = 0$). Notice that $u_N'(\infty) = N$ for $N < \infty$ and $u_\infty'(\infty) = +\infty$, since
$$\lim_{z \to +\infty} N\, \frac{z - z^{1 - \frac1N}}{z} = N \qquad\text{and}\qquad \lim_{z \to +\infty} \frac{z \log(z)}{z} = +\infty.$$
Thus, for $\mu = \rho m + \mu^s \in P(X)$ and $N < \infty$, using $\int \rho\, dm + \mu^s(X) = \mu(X) = 1$ we have
$$\mathcal{E}_N(\mu|m) = \int N\big(\rho - \rho^{1 - \frac1N}\big)\, dm + N \mu^s(X) = N - N \int \rho^{1 - \frac1N}\, dm,$$
and
$$\mathcal{E}_\infty(\mu|m) = \begin{cases} \int \rho \log(\rho)\, dm, & \text{if } \mu^s(X) = 0, \\ +\infty, & \text{if } \mu^s(X) > 0. \end{cases}$$
In the following we always assume that $\mathcal{E}$ is given by some continuous and convex function $u$ with $u(0) = 0$.

Lemma 3.12. Let $(X, d_X, m_X), (Y, d_Y, m_Y) \in \mathbb{X}$ and $(d, \sigma) \in A((X, d_X, m_X), (Y, d_Y, m_Y))$. Then
$$\mathcal{E}(\sigma_\sharp \mu | m_Y) \le \mathcal{E}(\mu | m_X) \quad \text{for all } \mu \in P_2^a(X), \qquad \mathcal{E}(\sigma_\sharp^{-1} \nu | m_X) \le \mathcal{E}(\nu | m_Y) \quad \text{for all } \nu \in P_2^a(Y).$$

Proof. Let us prove only the first inequality. Let $\mu = \rho m_X$ and $\sigma_\sharp \mu = \eta m_Y$. By Jensen's inequality we get
$$\begin{aligned}
\mathcal{E}(\sigma_\sharp \mu | m_Y) &= \int u(\eta(y))\, dm_Y(y) = \int u\Big(\int \rho(x)\, d\sigma_y(x)\Big)\, dm_Y(y) \\
&\le \int\!\!\int u(\rho(x))\, d\sigma_y(x)\, dm_Y(y) = \int u(\rho(x))\, d\sigma(x, y) = \int u(\rho(x))\, dm_X(x) = \mathcal{E}(\mu | m_X).
\end{aligned}$$

Lemma 3.13. The functional $\mathcal{E}$ is weakly lower semicontinuous with respect to both variables. In other words, for $\mu_n \to \mu$ and $\nu_n \to \nu$ weakly we have
$$\mathcal{E}(\mu|\nu) \le \liminf_{n \to \infty} \mathcal{E}(\mu_n|\nu_n).$$

We prove Lemma 3.13 only in the case where $(X, d)$ is compact, using the Legendre transform to represent the functional.

Definition 3.14. Let $u\colon [0, \infty) \to \mathbb{R}$ be continuous and convex with $u(0) = 0$. The Legendre transform of $u$ is defined on $\mathbb{R}$ as
$$u^*(r) := \sup_{s \in [0, \infty)} \big(rs - u(s)\big).$$

Proposition 3.15. Let $X$ be compact and $u\colon [0, \infty) \to \mathbb{R}$ be continuous and convex with $u(0) = 0$. Then
$$\mathcal{E}(\mu|\nu) = \sup\Big\{\int_X \varphi\, d\mu - \int_X u^*(\varphi)\, d\nu : \varphi \in C(X),\ u'(1/M) \le \varphi \le u'(M),\ M \in \mathbb{N}\Big\}.$$

Proof of Lemma 3.13. The function $u^*$ is continuous on $[u'(1/M), u'(M)]$, so for continuous $\varphi$ with values in $[u'(1/M), u'(M)]$ also $u^*(\varphi)$ is continuous. Thus, by Proposition 3.15, $\mathcal{E}$ is the supremum of the weakly continuous functionals
$$(\mu, \nu) \mapsto \int_X \varphi\, d\mu - \int_X u^*(\varphi)\, d\nu,$$
and hence it is lower semicontinuous.

Let us next prove a suitable Γ-convergence of the functionals with respect to the D-distance.

Theorem 3.16. Suppose $\lim_{n \to \infty} D((X_n, d_n, m_n), (X, d, m)) = 0$ for spaces $X_n, X \in \mathbb{X}$, and let $(d_n, \sigma_n) \in \mathrm{Opt}((X_n, d_n, m_n), (X, d, m))$. Then:
(i) For any sequence $(\mu_n)_{n=1}^\infty$ with $\mu_n \in P_2^a(X_n)$ such that $(\sigma_n)_\sharp \mu_n$ converges weakly to some $\mu \in P(X)$ it holds
$$\liminf_{n \to \infty} \mathcal{E}(\mu_n|m_n) \ge \mathcal{E}(\mu|m).$$
(ii) For any $\mu \in P_2^a(X)$ with bounded density there exists a sequence $(\mu_n)_{n=1}^\infty$ with $\mu_n \in P_2^a(X_n)$ such that $W_2((\sigma_n)_\sharp \mu_n, \mu) \to 0$ and
$$\limsup_{n \to \infty} \mathcal{E}(\mu_n|m_n) \le \mathcal{E}(\mu|m).$$

References
[1] V. I. Bogachev, Measure Theory, vol. 2, Springer.
[2] F. Cavalletti, M. Huesmann, Existence and uniqueness of optimal transport maps, to appear in Ann. Inst. H. Poincaré Anal. Non Linéaire, arXiv:1301.1782.
[3] N. Gigli, Optimal maps in non branching spaces with Ricci curvature bounded from below, Geometric and Functional Analysis 22 (2012), 990–999.
[4] K.-T. Sturm, On the geometry of metric measure spaces I, Acta Math. 196 (2006), 65–131.